
Preferable tools for machine learning - Python - MatLab - R

Published May 11, 2018. Last updated Jul 08, 2019.

AI Q&A sites and data science forums are buzzing with the same questions over and over again: “I’m new to data science, what language should I learn?” and “What’s the best language for machine learning?”

“Which is better” questions usually depend heavily on context, and this one is no exception.

What do you want to achieve with machine learning? If you want to learn it simply to understand machine learning, it is best to choose the language for which you can get the most support from your immediate environment.

If you want to get into machine learning to do something more specific, the choice matters more. Does your machine learning task involve images? Go with Matlab or Python, because you might want to use image processing as well. Do you want to get deep into the theory behind machine learning and use advanced statistical methods for a novel algorithm? Then R is the better choice.

Still, it’s always good to have more weapons in the armory.

Matlab, Python and R have all been used successfully to teach college students the fundamentals of mathematics and statistics. In today’s data-driven environment, big data analytics is a powerful way to study data, especially for decision making.

Matlab can be used to teach introductory mathematics such as calculus and statistics. Both Python and R can be used to make decisions involving big data.

Python is well suited to teaching introductory statistics in a data-rich environment. R is a little more involved, but it offers many customizable, prepackaged statistical routines that can support fairly complex decisions.

If you are an undergraduate, it’s good to start with Python, since you get the advantages of a general-purpose language. If you come from research, it’s good to start with R and explore Octave; later, when you have access to Matlab, your Octave skills will carry over. If you are an employee, it’s best to master both Python and R, because building a product in an enterprise setting may mean working with multiple teams that use different languages. And if you are a tech enthusiast who loves exploring and learning new things, learn Julia, whose killer feature is its speed of execution.


Let’s look at each in more depth.

P Y T H O N

Python is a general-purpose programming language. Its most common implementation is written in C (and is known as CPython). Python is not only a language; it also ships with a large standard library. This library is geared toward general programming and contains modules for OS-specific functions, threading, networking, and databases.
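
As a rough illustration of that breadth, the sketch below touches two of the areas mentioned (OS-specific functions and threading) using only standard-library modules. The file name and the `worker` function are made up for this example.

```python
import os
import threading

# OS-specific module: join path parts with the current platform's separator.
data_path = os.path.join("data", "notes.txt")

# Threading module: four workers append to a shared list under a lock.
results = []
lock = threading.Lock()

def worker(worker_id):
    """Record the square of the worker id (a stand-in for real work)."""
    with lock:
        results.append(worker_id ** 2)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(data_path, sorted(results))
```

Nothing here needs to be installed separately; both modules come with a standard Python installation.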

M A T L A B

Matlab is both a commercial numerical computing environment and a programming language. Like Python, it has a standard library, but its focus is on matrix algebra and an extensive set of functions for data processing and plotting. It also offers additional toolboxes, but these cost the user extra.

R

R is a free, open-source software environment designed to run statistical analyses and produce graphics. Robert Gentleman and Ross Ihaka, colleagues at the University of Auckland in New Zealand, created it in 1993 because they both saw a need for a better software environment for their classes. R has since outgrown its origins, now boasting more than two million users according to an R Community website (“What is R?” 2014).

Let’s briefly go over the advantages and disadvantages of each.

Advantages of Matlab

Matlab has a large number of committed users, including many universities and a few companies that have the budget to buy a license. Matlab is also easy for beginners who are just starting to learn programming, because the package, once purchased, includes everything you need.

With Python, by contrast, you have to install extra packages. One part of the Matlab package is Simulink, a core product for which there is not yet a good alternative in other programming languages.

Disadvantages of Matlab

The main disadvantage is the cost of the license. It is expensive, and the user has to buy and pay for each additional module separately. Another disadvantage is that cross-compiling or converting Matlab code to another language is very difficult, and dealing with all the resulting errors requires deep Matlab knowledge.

Matlab is also not recommended for building products, because it does not provide the usual application deployment tooling (such as setup files and other executables that are copied during installation).

Advantages of R

R is a statistical package that tries to solve problems that are statistical in nature. There are many prepackaged programs in R that address various analytics problems. Matlab, by contrast, is used to teach various aspects of mathematics, such as calculus or graphing equations. In the analytics field, R is preferred over Matlab for statistical analysis.

R is one of the most comprehensive statistical analysis packages available. It incorporates all of the standard statistical tests, models, and analyses, and provides a comprehensive language for managing and manipulating data. New statistical techniques and ideas often appear first in R.

Disadvantages of R

R has a steep learning curve: it takes a while to get used to the power of R, though the curve is no steeper than for other statistical languages. R is not easy for the novice. There are several simple graphical user interfaces for R that offer point-and-click interaction, but they generally lack the polish of the commercial offerings.

Advantages of Python

Python has a wide range of applications in software development, including gaming, web frameworks and applications, language development, prototyping, and graphic design applications.

– User Friendly and Easy to learn
– Cross platform supported
– Vast community support
– Very powerful
– Open source

– Python Package Index (PyPI): hosts thousands of third-party modules for Python.

Applications

  • Web and Internet Development
  • Database Access
  • Desktop GUIs
  • Scientific and Numeric
  • Education
  • Network Programming
  • Software and Game Development

This breadth gives Python an edge over many other programming languages used in industry. Some of its advantages in detail:

Extensive Support Libraries
Python provides a large standard library that covers areas such as string operations, Internet protocols, web service tools, and operating system interfaces. Many common programming tasks are already scripted into it, which limits the amount of code you have to write.

Integration Feature
Python supports Enterprise Application Integration, which makes it easy to develop web services by invoking COM or CORBA components. It has powerful control capabilities, as it can call directly into C or C++, or into Java via Jython. Python can also process XML and other markup languages, and it runs on all modern operating systems through the same bytecode.
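
To make the XML point concrete, here is a minimal sketch that uses the standard-library `xml.etree.ElementTree` module. The XML document and its attribute names are invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# A small, made-up XML document used only for this illustration.
xml_text = """
<tools>
  <tool name="Python" license="open source"/>
  <tool name="R" license="open source"/>
  <tool name="Matlab" license="commercial"/>
</tools>
""".strip()

root = ET.fromstring(xml_text)

# Collect the names of the tools whose license attribute is "open source".
open_source = [
    tool.get("name")
    for tool in root.findall("tool")
    if tool.get("license") == "open source"
]
print(open_source)
```

The same module can also build and write XML, so simple integration tasks often need no third-party package.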

Improved Programmer’s Productivity
The language has extensive support libraries and a clean object-oriented design, which are said to increase programmer productivity two- to tenfold compared with languages like Java, VB, Perl, C, C++ and C#.

Productivity
Its strong process integration features, unit testing framework, and enhanced control capabilities contribute to faster development and higher productivity for most applications. It is a great option for building scalable multi-protocol network applications.
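
The unit testing framework mentioned here is the standard-library `unittest` module. A minimal sketch of how it is typically used follows; the `normalize` function is a hypothetical helper written only to have something to test.

```python
import unittest

def normalize(values):
    """Scale a list of numbers so that they sum to 1 (hypothetical helper)."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [v / total for v in values]

class NormalizeTest(unittest.TestCase):
    def test_sums_to_one(self):
        self.assertAlmostEqual(sum(normalize([1, 2, 3, 4])), 1.0)

    def test_rejects_zero_total(self):
        with self.assertRaises(ValueError):
            normalize([0, 0])

# Run the tests programmatically instead of through the command line.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project the tests would usually live in their own file and be run with `python -m unittest`.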

Disadvantages of Python

Python has many advantageous features, and programmers often prefer it to other programming languages because it is easy to learn and to code in.

However, the language has still not found its place in some computing arenas, including enterprise development shops. It may therefore not fit some enterprise solutions, and its limitations include:

Difficulty in Using Other Languages
Python lovers become so accustomed to its features and extensive libraries that they have trouble learning or working in other programming languages. Python experts may see declaring variable types, casting values, and syntactic requirements such as curly braces or semicolons as an onerous task.

Weak in Mobile Computing
Python is present on many desktop and server platforms, but it is seen as a weak language for mobile computing. This is why very few mobile applications, such as Carbonnelle, are built in it.

Gets Slow in Speed
Python executes with the help of an interpreter instead of a compiler, which makes it slower than languages that are compiled to machine code ahead of time. On the other hand, it is fast enough for many web applications.
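
One way to get a feel for the interpreter overhead is to compare a loop written in Python with a built-in that CPython implements in C. The sketch below is only illustrative: the function name is made up, and the timings it prints depend entirely on the machine, so none are quoted here.

```python
import timeit

numbers = list(range(100_000))

def python_loop_sum(values):
    """Add the values one by one in interpreted Python bytecode."""
    total = 0
    for v in values:
        total += v
    return total

# Both approaches compute the same result...
loop_result = python_loop_sum(numbers)
builtin_result = sum(numbers)  # sum() is implemented in C in CPython

# ...but the timings usually differ; the exact numbers depend on the machine.
loop_time = timeit.timeit(lambda: python_loop_sum(numbers), number=20)
builtin_time = timeit.timeit(lambda: sum(numbers), number=20)
print(f"loop: {loop_time:.4f}s  built-in sum: {builtin_time:.4f}s")
```

This is the idea behind "dropping into C": the heavy lifting moves into compiled code while the surrounding program stays in Python.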

Run-time Errors
Python is dynamically typed, which some Python developers report as imposing design restrictions. It also tends to require more testing time, because type errors only show up when the application is finally run.
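
A small sketch of what this means in practice is shown below. The function `average_length` is a made-up example; the point is that the faulty call is only detected when that line executes.

```python
def average_length(words):
    """Return the mean length of the given words."""
    return sum(len(w) for w in words) / len(words)

# Nothing is checked when the function is defined; this call works.
ok = average_length(["python", "r", "matlab"])

# The mistake below (an int mixed into the list) is only discovered
# when the offending line actually executes.
try:
    average_length(["python", 42])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(ok, error_message)
```

Tests that exercise each code path are the usual way to catch this kind of error before users do.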

Underdeveloped Database Access Layers
Compared with popular technologies like JDBC and ODBC, Python’s database access layer is considered somewhat underdeveloped and primitive. As a result, it is less often applied in enterprises that need smooth interaction with complex legacy data.
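
For readers who have not seen it, Python's database modules follow a common interface, the DB-API described in PEP 249. A minimal sketch with the standard-library `sqlite3` module is shown below. The table and rows are invented for illustration, and a driver for another database would need its own connection call.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (title TEXT, language TEXT)")

# Parameterized queries ("?" placeholders) avoid building SQL by hand.
conn.executemany(
    "INSERT INTO courses (title, language) VALUES (?, ?)",
    [("Intro to ML", "Python"), ("Statistics", "R"), ("Deep Learning", "Python")],
)
conn.commit()

python_courses = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM courses WHERE language = ? ORDER BY title", ("Python",)
    )
]
conn.close()
print(python_courses)
```

Whether this layer is enough for a given enterprise system is a judgment call that depends on the databases involved.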

To sum them up briefly, the following points can be incredibly useful:

MATLAB

– Invaluable for signal processing
– Incredibly broad array of useful libraries
– Simplest and most concise language for anything involving matrix operations
– Works very well for anything that is simply represented as a numeric feature matrix
– Huge pain to use for anything that isn’t simply represented as a numeric feature matrix
– Lacking a good open source ecosystem

Python

– Very fragmented but comprehensive scientific computing stack
– Pandas, scikit-learn, numpy, scipy, ipython, & matplotlib are my most-used scientific computing libraries
– IPython notebook makes a nice interactive data analysis tool
– All the benefits of a general purpose programming language
– Unfortunately slow if you don’t drop into C
– Some of the scientific computing stack is still stuck in Python 2.7
– Very good for problems that don’t come as a simple feature matrix, between tools like pandas and nltk
– Incredible open source ecosystem

R

– As a general rule, if it’s found to be interesting for statisticians, it’s been implemented in R
– High quality libraries with a good focus on unit testing
– Nice interactive data analysis tool through things like RStudio
– Language as a whole is slow and memory-intensive
– Language itself makes me want to gouge my eyes out
– Process for contributing libraries is unnecessarily manual and generally a pain in the ass
– The Incredible Growth of Python
– Recommendation for the Attached Blogs

Python is the most popular language in the AI field.

Why? Because:

Python comes with a huge number of libraries, many of them for artificial intelligence and machine learning. Some of these are TensorFlow (a neural network library), scikit-learn (for data mining, data analysis and machine learning), and pylearn2 (more flexible than scikit-learn). The list goes on.

With other languages, students and researchers need to get to know the language before getting into ML or AI with it. This is not the case with Python. Even a programmer with very basic knowledge can easily handle Python.

Apart from that, the time spent writing and debugging code in Python is far less than in C, C++ or Java. This is exactly what students of AI and ML want. They don’t want to spend time debugging syntax errors; they want to spend more time on their algorithms and heuristics. Not just the libraries but also their tutorials and interface documentation are easily available online. People build their own libraries and upload them to GitHub or elsewhere for others to use.

Python has a solid claim to being the fastest-growing major programming language. It is worth checking Stack Overflow’s statistics on the incredible growth of Python and why it is growing so quickly.

Advantages of Python over Matlab

1. Python code is more compact and easier to read than Matlab code
-- Unlike Matlab, which uses an end statement to mark the end of a block, Python determines block structure from indentation.
-- Python uses square brackets for indexing and parentheses for functions and methods, whereas Matlab uses parentheses for both, which makes Matlab code harder to tell apart and understand.
-- Python’s better readability leads to fewer bugs and faster debugging.

2. Most programming languages, including Python, use zero-based indexing, while Matlab uses one-based indexing, which makes translating code between them more confusing.

3. Object-oriented programming (OOP) in Python is simple and flexible, while Matlab’s OOP scheme is complex and confusing

4. Python is free and open
-- While Python is open source, much of Matlab is closed
-- The developers of Python encourage users to submit suggestions for the language, while the developers of Matlab offer no such interaction

5. There is no Matlab counterpart to Python’s import statement
6. Python offers a wider choice of graphics packages and toolsets
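
Several of the syntax points in this list (indentation-based blocks, square brackets versus parentheses, zero-based indexing, and the simple class syntax) can be seen in a few lines of Python. The sketch below uses made-up data, and `Model` is a hypothetical class.

```python
scores = [0.71, 0.85, 0.64]

# Zero-based indexing: the first element is at index 0, the last at -1.
first = scores[0]
last = scores[-1]

# Square brackets index, parentheses call: max(scores) is a function call.
best = max(scores)

# Blocks are delimited by indentation alone; there is no "end" keyword.
labels = []
for s in scores:
    if s >= 0.7:
        labels.append("pass")
    else:
        labels.append("fail")

class Model:
    """A deliberately tiny class to show the basic OOP syntax."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"model {self.name}"

description = Model("baseline").describe()
print(first, last, best, labels, description)
```

In Matlab the same first element would be written `scores(1)`, which is where the translation confusion mentioned in point 2 comes from.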

See Steve Hanly’s speed test comparing Python and MATLAB for vibration analysis.

Python vs R

Some important differences to consider when choosing between R and Python:

– Machine learning has two phases: model building and prediction. Typically, model building is performed as a batch process and predictions are made in real time. Model building is compute-intensive, while prediction happens in a jiffy. Therefore, the performance of an algorithm in Python or R doesn’t really affect the turnaround time for the user. Python 1, R 1.

– Production: The real difference between Python and R lies in being production ready. Python is a full-fledged programming language, and many organisations use it in their production systems. R is statistical programming software favoured by many in academia; with the rise of data science, the availability of libraries, and its open source license, industry has started using R as well. Many of these organisations have their production systems in Java, C++, C#, Python, etc. So, ideally, they would like to have the prediction system in the same language to reduce latency and maintenance issues. Python 2, R 1.

– Libraries: Both languages have enormous and reliable libraries. R has over 5000 libraries catering to many domains, while Python has some incredible packages like Pandas, NumPy, SciPy, scikit-learn, and Matplotlib. Python 3, R 2.

– Development: Both languages are interpreted. Many say that Python is easy to learn, almost like reading English (to put it on a lighter note), while R requires more initial study. Both have good IDEs (Spyder etc. for Python and RStudio for R). Python 4, R 2.

– Speed: R initially had problems with large computations (say, n×n matrix multiplications). This issue was addressed by the R distribution from Revolution Analytics, which rewrote computation-intensive operations in C and is blazingly fast. Python, being a high-level language, is relatively slow. Python 4, R 3.

– Visualizations: In data science, we frequently plot data to showcase patterns to users. Visualisation therefore becomes an important criterion in choosing software, and R completely kills Python in this regard, thanks to Hadley Wickham’s incredible ggplot2 package. R wins hands down. Python 4, R 4.

– Dealing with Big Data: One constraint of R is that it stores data in system memory (RAM), so RAM capacity becomes a limit when you are handling big data. Python does well, but since both R and Python have HDFS connectors, leveraging Hadoop infrastructure would give a substantial performance improvement. So, Python 5, R 5.
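
The two phases from the first point in this list (model building as a batch step, prediction as a cheap real-time step) can be sketched in plain Python. This is a toy example under simple assumptions: the model is a straight line fitted by least squares, the data is made up, and `fit_line` and `predict` are illustrative names rather than functions from any library.

```python
def fit_line(xs, ys):
    """Model building: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Prediction: a single multiply-add, cheap enough to run per request."""
    slope, intercept = model
    return slope * x + intercept

# Batch step, done once (toy data lying exactly on y = 2x + 1).
model = fit_line([0, 1, 2, 3], [1, 3, 5, 7])

# Real-time step, repeated for every incoming value.
predictions = [predict(model, x) for x in (4, 10)]
print(model, predictions)
```

The fitting step loops over the whole data set, while each prediction touches only the two stored numbers, which is why the prediction cost rarely decides the choice of language.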

So, both languages are equally good. Depending on your domain and where you work, you have to choose the right language wisely. The technology world usually prefers using a single language. Business users (marketing analytics, retail analytics) usually go with statistical programming languages like R, since they frequently do quick prototyping and build visualizations (which is faster in R than in Python).

Python is clearly the most popular introductory teaching language of those on this list. It surpassed Java, which had been the most used introductory teaching language over the past decade. Python has been added to most schools’ curricula because it is easy to learn and use. With Python, beginning students do not have to focus their energies on details like types, compilers, and boilerplate code. Python lets students write code easily and make the program accomplish the tasks they want to see achieved.
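
As a small illustration of how little ceremony a beginner faces, the sketch below is a complete program that finds the most frequent word in a sentence. The sentence is made up, and the program needs no type declarations, no compile step, and no surrounding class or main function.

```python
from collections import Counter

text = "the quick brown fox jumps over the lazy dog the end"

# Split on whitespace and count each word; no type declarations are needed.
word_counts = Counter(text.split())
most_common_word, occurrences = word_counts.most_common(1)[0]
print(most_common_word, occurrences)
```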

Utilization of Python

Python has been gaining momentum as the programming language for novice users. Highly ranked computer science departments at MIT and UC Berkeley use Python to teach their novice programming students. The three largest Massive Open Online Course (MOOC) providers (edX, Coursera and Udacity) all use Python for their beginning programming courses. A variety of professors in other disciplines now recognize the need for novice students to understand Python and its key features.

Conclusion

There is no such thing as a ‘best language for machine learning’

Popularity is not a good yardstick to use when selecting a programming language for machine learning and data science. There is no such thing as a ‘best language for machine learning’ and it all depends on what you want to build, where you’re coming from and why you got involved in machine learning.

In most cases developers port the language they were already using into machine learning, especially if they are to use it in projects adjacent to their previous work, such as engineering projects for C/C++ developers or web visualizations for JavaScript developers.

If your first ever contact with programming is through machine learning, then your peers in the global survey point to Python as the best option, given its wealth of libraries and ease of use. If, on the other hand, you’re dreaming of a job in an enterprise environment, be prepared to use Java.

Whatever the case, these are exciting times for machine learning and the journey is guaranteed to be a mind-blowing one, irrespective of the language one opts for. Enjoy the ride!

Regards
World of Void


Discover and read more posts from Mohammed Innat
Pedro Correa
7 years ago

Hi Mohammed, nice synthesis and a very well structured content. Thanks a lot.
It surprised me your statement that Python is not adequate for corporate, enterprise domain… Why is that? Integration issues? Your thoughts are welcome.

Another question: under “Python Integration” text you meant “COM and CORBA,” not COBRA, correct? Referring to Common Object Request Broker Architecture, right?

Best regards
Pedro

Mohammed Innat
7 years ago

Thanks for your comment. However the point you’ve just noted is under my concern. I can’t get enough time to update my post’s content these days , sorry about that. Including your noted issue i’ll update some more stuff to this documents.

And yes , Python is now adequate for enterprise domain. It’s becoming appropriate for building enterprise software. I always love to visit Python Success Stories. Anyway , another thing you’ve just pointed out is, of course type mistake. Will be fixed soon. Cheers.