Do you do any bulk writes from R to a db? Sqlsave is slow. I'm considering writing a file and picking it up with another tool.
I knew exactly what he was talking about. Sqlsave() is a function to write from R to SQL databases in the RODBC package. I also know that he was likely refactoring my code, or code that he had partially copied from me at one time.
THE SOLUTION
Fortunately, a couple of years ago, I had migrated off of Sqlsave() to another solution. Here was my response to my colleague, in email:
Easier to type on here.
We don't have to do any bulk saves in production, everything there is a single transaction, so we're saving back 6-7 records at any time transactionally (times 100 tx's per minute, yes it's a lot at a time but not a lot at any one execution).As for SQLSAVE. I got fed up with the command and no longer use it. We have an insert stored procedure that we call from R. Much faster. Much more efficient and easier.We basically call an ODBC execute command wrapper, then just execute the proc with parameters to insert our data.I don't know how this would work form your point of view, but you could (sans proc) in theory, create a text string in R that is essentially the bulk insert you want to do, and execute it AS SQL. Which I think is the key... send the command, and let the insert be handled by the ODBC driver as a native ODBC SQL command, not as a hybrid weirdo SQLSAVE.
CONCLUSION
- Sqlsave runs slow in production jobs. Not sure why, but I would guess it is in a general class of problems I call "Meta-data shenanigans"
- Using SQL directly to write back to the database in some way is generally a faster solution.
This comment has been removed by the author.
ReplyDeleteHi Levi,
ReplyDeleteCould you elaborate a little more on using SQL directly? I assume you'd use something like SQLQuery in R with the SQL code as a character variable. How would one reference a dataframe (created in the R Workspace) in the SQL code when connected to teradata? Example:
new_query_3 = "insert into teradata_container.teradata_table
select * from df"
Regards,
Matt
Great question Matt,
DeleteI'm current using RJDBC from Linux. Here's an example of when I pull data (the sourced in code (connectionDynamic.r) holds connection info in a function called get_dbhandle().
library(DBI)
library(rJava)
library(RJDBC)
source("connectionDynamic.r")
dbhandle <- get_dbhandle(ServerName,DatabaseName)
quex <- paste("EXECUTE [dbo].[up_QCGetQCDecisionEngineData] @ServiceRequestID = ", Request_ID,sep = "")
x <- as.data.frame(dbGetQuery(dbhandle,quex))
Your question seemed specific to inserts though. I do this in a couple of ways. Most of the time, live in production, I'm writing back a small number of records in which case it looks something like this:
query <- paste("exec achprocessing.[dbo].[up_InsertACHItemBlacklist] @CreateUser = 2,@RoutingNumber = "
,dx$Routing_Number[i]
," ,@AccountNumber = "
,dx$Account_Number[i]
," ,@StartDate = '"
,toString(Sys.Date())
,"' ,@EndDate = '"
,toString(Sys.Date())
,sep="")
dbSendUpdate(dbhandle,query)}
but if wanting to batch insert from a df, I push the df into a string, then create an insert query from there. Something like this:
insert_data <- paste("(",df$x.precinct,",", df$x.bush,",",df$x.kerry,")", sep="", collapse=",")
query <- paste("insert into precincts values ", insert_data)
dbSendUpdate(dbhandle,query)
Lower price for wholesale with oakley sunglasses? how much? oh only $16 for all products! free shipping over $99, wholesale oakley sunglasses!
ReplyDeleteGreat Article
ReplyDeleteData Mining Projects IEEE for CSE
Project Centers in Chennai
JavaScript Training in Chennai
JavaScript Training in Chennai