Thursday, February 5, 2015

R in Production Diaries: rJava made me slow (not really)

Well.  rJava wasn't really the cause.  Here's a basic architecture overview:

1. The web application (C#) makes a call to a service layer.
2. That service layer creates a unique connection per request to the Linux R server using Rserve.
3. Rserve makes a call into R, running code for the Decision Engine (calling libraries, making database connections, running prediction functions against multiple models, writing log records to the database).
4. R/Rserve returns a response to the service layer: the Decision Engine's answer.

We've had three major rounds of optimization:

1. R service initially runs 10-12 seconds.  Turns out this was due to the monstrous time it took to load a GLM model.  I've blogged on this before... here.

2. R service now runs 2.5 seconds on average.  But occasionally 17 seconds.  This was an issue with the service layer (step 2 from above).  This wasn't my code, but it was eventually fixed.  I've blogged about this before too... here.

3. R service now runs 2.5 seconds.  Always.  Except that when server volume is high, it sometimes falls behind.  Generally, in the business I'm in, 2.5 seconds is fine, and doesn't negatively impact customer experience.  However, with multiple requests and a backlog, it would be nice if it could run just a bit faster.  So I set out to debug.

I implemented verbose logging, and found that most of my process ran in milliseconds, but the server was taking a full 2 seconds to load code libraries, spending most of that time on one large library, rJava.
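For anyone wanting to do the same kind of measurement, here is a minimal sketch, not my actual logging code. The helper name `time_package_load` is made up for illustration, and `tools` and `parallel` are stand-ins for the real libraries (rJava, RJDBC) so the snippet runs on a bare R install.

```r
# Hypothetical helper: time how long it takes to attach one package
# and write a timestamped log line to stderr.
time_package_load <- function(pkg) {
  elapsed <- system.time(
    suppressPackageStartupMessages(library(pkg, character.only = TRUE))
  )[["elapsed"]]
  message(sprintf("%s loaded %s in %.3f s",
                  format(Sys.time(), "%H:%M:%S"), pkg, elapsed))
  elapsed
}

# Stand-in package names; swap in the libraries your process really loads.
pkgs <- c("tools", "parallel")
load_times <- vapply(pkgs, time_package_load, numeric(1))
```

Run it in a fresh R session: a package that is already attached will report a time close to zero, which hides the real cost.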

I call rJava in the process so that I can utilize the RJDBC package, which gives me database connectivity.   It's a necessary library (long story on why I don't use RODBC).

So, it seemed like the solution was to not load rJava on each call, but the way the C# service layer is written gave little ability to do that.  So I figured that if rJava loaded when R itself started up, it would cut that load time significantly.

Here's the solution... to the file:

/etc/R/Renviron.site

add the line:

R_DEFAULT_PACKAGES='utils,grDevices,graphics,stats,DBI,rJava,RJDBC'

The entire process now runs in less than half a second, and I (as well as some IT people) am very happy.
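To check that a setting like this took effect, something along these lines can be run in a new R session. It is a sketch under assumptions: `missing_default_packages` is a hypothetical helper, and it only compares the packages R was asked to attach at startup (the `defaultPackages` option, which R fills from `R_DEFAULT_PACKAGES`) against what is on the search path.

```r
# Hypothetical helper: which requested startup packages are NOT attached?
missing_default_packages <- function(wanted = getOption("defaultPackages"),
                                     attached = search()) {
  # Keep only "package:xyz" entries from the search path and strip the prefix.
  attached <- sub("^package:", "", grep("^package:", attached, value = TRUE))
  setdiff(wanted, attached)
}

# Value R picked up from the environment (empty string if unset).
print(Sys.getenv("R_DEFAULT_PACKAGES"))

# An empty result means everything requested at startup is attached.
print(missing_default_packages())
```

If rJava or RJDBC shows up in the result, the file was probably not read or a package failed to load at startup.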
