Sunday, September 6, 2020

R versus Python: A Practical Paradigm for Choosing Your Production Language


As a Data Scientist fluent in both Python and R (and, to a lesser degree, a few other languages), I'm often tasked with deciding which to use for a specific project or script.  That decision can be tough, but it can generally be made pretty quickly by asking three questions:

  1. Do the specific outcomes of this task make one option obviously better?  Example: if the outcome is a plot, I'll choose R for the ease of the ggplot2 interface.
  2. Do the players on this task make one option obviously better?  Example: if the data engineer involved is a Python person, let's keep it all in Python.
  3. Does the data science work (modeling) tend towards one language?  Example: if the solution requires heavy econometric modeling, panel modeling, or time-series analysis, I'll generally choose R.

But one use case that often doesn't offer a clear decision is the productionization of Data Science processes into microservices.  Both R and Python have tools to handle this, but how do we decide on the correct option for each case?  Here I offer a quick survey of the tools in each language, and my current paradigm for making the final choice.




R

I have blogged a few times about pushing R into production to solve a variety of problems.  Is it possible? Absolutely: it has been done multiple times, mainly by data scientists.  What does that environment look like?

  1. Application code: Normal R applications using functions, classes and code from the 10K packages on CRAN.  I recommend avoiding Tidyverse and pipes due to issues with debugging in production.
  2. Database connectivity: R has several packages for database connections.  I generally use RODBC or RJDBC depending on the use case, but DBI can also be used in prod.
  3. Serving the application: Depending on what you want to do, there are a couple of options.  To easily spin up a quick REST API, you can use the plumber package.  If you want a more robust solution, Rserve provides a binary R server.

General thoughts:  As with most things R, building models and running predict() functions on them is straightforward.  plumber is a very simple system to set up and run for a quick REST API, but that API is usually single-threaded, which makes it inappropriate for production.  Rserve solves this problem on Unix systems with forked processes, but it is esoteric in implementation, and getting to a REST system is more difficult.





Python

I have set up production systems in Python a few times, solving a variety of problems, mainly data engineering tasks.  It is possible, and it happens far more broadly than in data science alone.  Many microservices for general web development are also written in Python, and a lot of people work on them, which gives us a much broader set of tools.

  1. Application code: In this case we would use standard Python Data Science application tools, including but not limited to pandas, numpy, and scikit-learn.  Though this tool set is somewhat inferior to the 10,000 packages on CRAN, it does give us a solid base to complete 99% of Data Science tasks.
  2. Database connectivity: Python also has several options to connect to a database server in production, including pyodbc for ODBC connections and pymysql, a pure-Python MySQL client.
  3. Serving the application: As I stated before, because Python is used for many other application microservices, we have a rich toolset to choose from.  While there are other options, we can use a combination of Flask (API), nginx (web server) and gunicorn (WSGI server that manages the worker processes).
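
To make the database item above concrete, here is a minimal sketch of the connect, query, fetch pattern that Python database drivers share through the DB-API.  It uses the standard library's sqlite3 module as a stand-in so it runs anywhere; with pyodbc or pymysql only the connect call (and, for pymysql, the placeholder style, %s instead of ?) would change.  The scores table, its columns, and the fetch_scores helper are hypothetical names made up for illustration.

```python
import sqlite3
from contextlib import closing


def fetch_scores(conn, min_score):
    """Return (customer_id, score) rows at or above min_score, best first."""
    # closing() releases the cursor even if the query raises.
    with closing(conn.cursor()) as cur:
        # Parameterized query: "?" is the placeholder style used by sqlite3.
        cur.execute(
            "SELECT customer_id, score FROM scores "
            "WHERE score >= ? ORDER BY score DESC",
            (min_score,),
        )
        return cur.fetchall()


# In-memory stand-in database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (customer_id INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [(1, 0.91), (2, 0.42), (3, 0.77)],
)
conn.commit()

rows = fetch_scores(conn, 0.5)
print(rows)
```

Keeping the query behind a small function like this means the service code never builds SQL strings by hand, and the driver can be swapped without touching the callers.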

General thoughts: While Python's data science toolset is a bit limiting, especially if tending towards econometric or traditional statistical models, there are other big advantages.  Specifically, with a rich microservice framework, we can very simply set up REST APIs, serve them with robust web servers, and manage heavy and embarrassingly parallel workloads.
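
As a rough illustration of how small such a prediction service can be, here is a sketch of a JSON endpoint.  To keep it dependency-free it is written as a bare WSGI callable using only the standard library rather than Flask; a Flask app exposes the same WSGI interface, so gunicorn would serve either one the same way.  The predict function, its weights, and the /predict route are assumptions standing in for a real fitted model.

```python
import io
import json


def predict(features):
    """Hypothetical stand-in for a fitted model: a fixed linear score."""
    weights = {"age": 0.5, "income": 0.25}
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def app(environ, start_response):
    """WSGI callable: POST a JSON object of features to /predict."""
    if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/predict":
        status, payload = "404 Not Found", {"error": "not found"}
    else:
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            features = json.loads(environ["wsgi.input"].read(length))
            status, payload = "200 OK", {"prediction": predict(features)}
        except (ValueError, TypeError, AttributeError) as exc:
            # Malformed JSON or a non-numeric feature ends up here.
            status, payload = "400 Bad Request", {"error": str(exc)}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(wsgi_app, method, path, payload=b""):
    """Invoke a WSGI app in-process, without a network socket."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(payload)),
        "wsgi.input": io.BytesIO(payload),
    }
    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], json.loads(body)


status, data = call(app, "POST", "/predict", b'{"age": 40, "income": 100}')
print(status, data)
```

If this lived in a module called service.py (a hypothetical name), something like `gunicorn --workers 4 service:app` would run it across several worker processes, with nginx in front as the web server.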


Making the Decision

When weighing the relative feature sets of R and Python for production, we end up in a prioritization wash.  If ease of setting up robust APIs is a priority, then Python wins easily (yes, plumber is easy, but it has issues with workloads).  If access to the thousands of algorithms and packages in CRAN is a priority, then a move to R is more appropriate, while biting the bullet on Rserve.

But, as with most things, there are some external consequences we need to consider, and I suggest a new paradigm for choosing a production language:  supportability.  If I personally can support R and Python packages, that's great, but in truth, the Data Science team is not the group supporting prod.

More often, that falls to a DevOps team that has to quickly figure out what's gone wrong, sometimes at odd hours.  In my experience, 90+% of DevOps teams have Python experience, and approximately 3% have R experience.  If R produced reasonable error messages, that would be less of a problem, but its errors are often nonsense to non-R users.  As a result, my quick paradigm for Data Science production:

While R and Python come with similarly rich feature sets for production code, in cases when either can be used, Python is often to be preferred as it will ease support from DevOps and other technical resources.
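
For readers who think in code, the whole decision process can be sketched as a tiny function.  This is only one reading of the post: it assumes the three questions from the top are checked in the order listed and that the first decisive answer wins, with supportability as the tie-breaker.  The function and argument names are made up for illustration.

```python
VALID_CHOICES = ("R", "Python", None)


def choose_production_language(outcome_favors=None, team_favors=None,
                               modeling_favors=None):
    """Pick "R" or "Python" for a production task.

    Each argument is "R", "Python", or None when that question has no
    obvious answer.
    """
    checks = (outcome_favors, team_favors, modeling_favors)
    for preference in checks:
        if preference not in VALID_CHOICES:
            raise ValueError("expected 'R', 'Python' or None, got %r" % (preference,))
    # Assumption: questions are asked in order and the first clear answer wins.
    for preference in checks:
        if preference is not None:
            return preference
    # Either language would work: prefer the one DevOps can support.
    return "Python"


print(choose_production_language(modeling_favors="R"))
print(choose_production_language())
```

The second call, where nothing points clearly to either language, is the case the paradigm is really about.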
