Python overtakes R, becomes the leader in Data Science, Machine Learning platforms
The latest KDnuggets Poll asked:
Did you use R, Python (along with their packages), both, or other tools for Analytics, Data Science, Machine Learning work in 2016 and 2017?
Python did not quite “swallow” R, but the results, based on 954 voters, show that in 2017 the Python ecosystem overtook R as the leading platform for Analytics, Data Science, Machine Learning.
While in 2016 Python was in 2nd place (“Mainly Python” had 34% share vs 42% for “Mainly R”), in 2017 Python had 41% vs 36% for R.
The share of KDnuggets readers who used both R and Python in significant ways also increased from 8.5% to 12% in 2017, while the share who mainly used other tools dropped from 16% to 11%.
Fig. 1: Share of Python, R, Both, or Other platforms usage for Analytics, Data Science, Machine Learning, 2016 vs 2017
Next, we examine the transitions between the different platforms.
Fig. 2: Analytics, Data Science, Machine Learning Platforms
Transitions between R, Python, Both, and Other from 2016 to 2017
This chart looks complicated, but we see two key aspects, and Python wins on both:
- Loyalty: Python users are more loyal, with 91% of 2016 Python users staying with Python. Only 74% of R users stayed with R, and only 60% of users of other platforms stayed with them.
- Switching: Only 5% of Python users moved to R, while twice that share, 10% of R users, moved to Python. Among those who used both in 2016, only 49% kept using both, 38% moved to Python, and 11% moved to R.
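To put the switching rates in perspective, they can be converted into percentage points of the whole voter base. The sketch below is a rough illustration only: it assumes the 2016 "Mainly Python" and "Mainly R" shares quoted earlier (34% and 42%) are the bases for the 5% and 10% switching rates, and it ignores flows to and from "Both" and "Other". The variable names are made up for this example.

```python
# 2016 shares (percent of all voters) and switching rates, as quoted in the article.
share_2016 = {"python": 34.0, "r": 42.0}
python_to_r = 0.05  # 5% of 2016 Python users moved to R
r_to_python = 0.10  # 10% of 2016 R users moved to Python

# Convert each rate into percentage points of the whole voter base.
points_to_r = share_2016["python"] * python_to_r
points_to_python = share_2016["r"] * r_to_python

# Net direct flow between the two platforms (Both/Other flows are not included).
net_gain_python = points_to_python - points_to_r

print(f"Python -> R: {points_to_r:.1f} points")
print(f"R -> Python: {points_to_python:.1f} points")
print(f"Net gain for Python: {net_gain_python:.1f} points")
```

Under these assumptions, the direct exchange between the two platforms accounts for a net gain of roughly 2.5 points for Python. The rest of Python's growth would have to come from former "Both" and "Other" users.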
Next we look at trends across multiple years.
In our 2015 Poll on R vs Python we did not offer an option for “Both Python and R”, so to compare trends across 4 years, we replace the shares of Python and R in 2016 and 2017 with:
Python* = (Python share) + 50% of (Both Python and R)
R* = (R share) + 50% of (Both Python and R)
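As a quick check of this adjustment, the sketch below applies the two formulas to the 2016 and 2017 shares quoted earlier in this article. The function and variable names are invented for the example, and the inputs are the rounded percentages from the text, not the raw vote counts.

```python
# Shares (in percent) quoted in this article for the 2016 and 2017 poll results.
shares = {
    2016: {"python": 34.0, "r": 42.0, "both": 8.5},
    2017: {"python": 41.0, "r": 36.0, "both": 12.0},
}


def adjusted_shares(python, r, both):
    """Split the "Both" share evenly between Python and R."""
    half = 0.5 * both
    return python + half, r + half


# Map each year to its (Python*, R*) pair.
adjusted = {year: adjusted_shares(**s) for year, s in shares.items()}

for year, (py_star, r_star) in adjusted.items():
    print(f"{year}: Python* = {py_star:.2f}%, R* = {r_star:.2f}%")
```

For 2017 this gives Python* = 41% + 6% = 47%, which matches the Python figure cited in the multi-year trend.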
We see that the share of R usage is slowly declining (from about 50% in 2015 to 36% in 2017), while the Python share is steadily growing, from 23% in 2014 to 47% in 2017. The share of other platforms is also steadily declining.
Fig. 3: Python vs R vs Other platforms for Analytics, Data Science, and Machine Learning, 2014-17
Finally, we look at trends and patterns by region. The regional participation was:
- US/Canada, 40%
- Europe, 35%
- Asia, 12.5%
- Latin America, 6.2%
- Africa/Middle East, 3.6%
- Australia/NZ, 3.1%
To simplify the chart, we split “Both” votes between R and Python, as above, and also combine the 4 regions with smaller participation (Asia, AU/NZ, Latin America, and Africa/Middle East) into one “Rest” region.
Fig. 4: Python* vs R* vs Rest by Region, 2016 vs 2017
We observe the same pattern across all regions:
- increase in Python share, by 8-10%
- decline in R share, by about 2-4%
- decline in other platforms, by 5-7%
The future looks bright for Python users, but we expect that R and other platforms will retain some share in the foreseeable future because of their large installed base.
Bill Winkler, computational speed, etc.
We are a SAS shop with ~30 new Ph.D.s and others who primarily use R. Some people are starting to work with Python because we are starting to move to a cloud-based environment. For working with national files using software based on rigorous theoretical models, we use very highly optimized C and FORTRAN routines. SAS, R, and even sometimes Python are 100+ times too slow.