Python, Python, Python … I have seen Python mentioned more in the past year than at any time since its creation. You could make a good argument that it is one of the most important programming languages today. It has an enormous number of uses in general, and it is especially compelling for Data Analytics and Machine Learning because it has some excellent tools and APIs in that domain. What follows is a bit of the lore of Python and some arguments for why you would want to consider using it.
A Brief History of Python
I think the following quote from Guido van Rossum says a lot about him and hints at why Python is a great programming language.
…in December 1989, I was looking for a “hobby” programming project that would keep me occupied during the week around Christmas. My office … would be closed, but I had a home computer, and not much else on my hands. I decided to write an interpreter for the new scripting language I had been thinking about lately: a descendant of ABC that would appeal to Unix/C hackers. I chose Python as a working title for the project, being in a slightly irreverent mood (and a big fan of Monty Python’s Flying Circus).
Guido van Rossum is the happily accepted “benevolent dictator for life” (BDFL) of Python. He is one of those rare individuals who has both the intellect and the personality to keep a project like Python from becoming a “Dead Parrot” (that is a reference to a classic skit; what can I say, I’m a Monty Python fan too :-).
I remember compiling Python from source in the early 1990s, trying it out and thinking, “OK, this is kind of cool: a nice C-like scripting language that runs in an interpreter loop.” I then went back to using Perl and shell scripts for that sort of thing. By the time version 2 was released in 2000, Python had become an important general-purpose programming language and was being widely used for critical applications, including “system code” like Red Hat’s Linux install application. By then nearly every Linux and UNIX distribution had a Python install in the base OS packages. Since then Python has seen rapid, continuous growth and wide adoption. It runs on most operating systems, including Linux, Windows, and macOS. There is a lot of Python 2 code (currently at version 2.7) running in important places. The version 2 branch should be supported until 2020. You can look at pythonclock.org for an unofficial countdown (yes, it’s important enough to warrant that clock, but 20 years is long enough).
Why the concern about Python 2? Python 3 was released in 2008! Python 3 was an opportunity for van Rossum to make a decision that only a “BDFL” can make: break some backward compatibility on a very successful project. Python 2 was solid, but there were some inconsistencies and things that could just be better. So he did it. Python 3 is a better Python! [One case where you might need to use Python 2 is if you have to support an “ancient” system OS that doesn’t have Python 3 and you have no control over that. Another case would be having to use some legacy version 2 code that you don’t have any control over either.] The best argument for Python 3 is probably this:
Short version: Python 2.x is legacy, Python 3.x is the present and future of the language
Python 3 is at version 3.6 as of this writing, it’s wonderful, and you should use it (or at least 3.4 or 3.5).
Why Should You Use/Learn Python?
- Python is easy to learn
If you have used another programming language then Python is easy to pick up and start writing useful code with. If you haven’t used another language then, no, it won’t actually be “easy”. Programming can be difficult to understand. However, Python is one of the most beginner-friendly programming languages you could pick to start with.
- Python is “interactive”: It can use what is referred to as a REPL (Read-Eval-Print Loop). This lets you interact directly from the Python “prompt”. You can execute commands one at a time interactively and see the results immediately. That is a great way to explore the language. You can use it like a powerful calculator.
- Python is “interpreted”: That’s as opposed to “compiled”. Python uses an interpreter, which is a program that reads your code and converts it to executable instructions on the fly. A compiler takes the complete source code of a program, translates it into machine instructions, links it to other libraries of executable objects, and creates a final program that can then be executed directly by your operating system. With an interpreter you can write and run your code immediately. With a compiled language you have to go through a series of sometimes complicated steps, compiling and linking your code into an executable, before you can try to run it. With Python you get feedback immediately if you have made a mistake, which makes it much easier to explore and learn.
- Python has a wealth of learning material. I did a quick search on O’Reilly Safari Books Online and got 6272 books, 59 live training links and 252 videos. That’s just on that one resource site. All of the popular online education sites have courses for learning Python. There are plenty of resources to choose from.
- Python has an active community: Python is popular! There are Python user groups in every major city and most of them are kind to new learners. Python has a yearly conference, PyCon, that attracts thousands of attendees. Python was the 3rd most popular language on GitHub in 2016. I did a quick search and there were 390,000 repositories using Python on GitHub. (There are 316 unique programming languages represented on GitHub… that’s kind of scary if you think about it!)
- A few possible downsides:
- Python is not “statically typed”. (A “type” is something like integer, float, string, etc.) That can be considered good or bad. It makes things easy because you don’t have to declare the type of a variable (good?), and a variable can be associated with different types during program execution (bad?). Python uses what is called “duck typing”, i.e. “if it looks like a duck and sounds like a duck … then it is a duck”. A variable’s type is not declared; it is simply the type of whatever value the variable refers to at that moment. For example, if a = 2 (2 is an integer) and b = "2" ("2" is a string), then a + a = 4 but b + b = "22" (in this case + does string concatenation), and a + b gives a type error, thankfully! Sometimes this “dynamic typing” can be confusing and cause hard-to-find errors. Overall I think it is a good thing and it greatly simplifies code, but sometimes a new user can get in trouble.
- Python uses indentation to delineate block structure. That drives some people crazy! Most languages use something like { } to define the start and end of a code block. That gives you some versatility in how you lay out your source code and makes the start and end of sections very clear. In Python that is mostly defined by indentation. If you don’t get your indentation nesting right then your code will blow up or do something unintended. It’s annoying when you are editing and refactoring code. It does result in clean and often more readable code.
- Python uses “C”-like indexing. OK, this is just a personal rant. I have a mathematics background; to me the first element of the list [ 1, 2, 3 ] is the number 1. In Python (and most “C”-based languages) 1 is the zeroth element, i.e. indexing starts at 0. In mathematics you usually call the first “thing” “thing number 1” and you do mathematics based on that. Off-by-one and array indexing errors can be hard to debug. You do get used to 0-based indexing and it’s not really a problem, but it is sometimes annoying to me. Languages like MATLAB and Julia are a lot more comfortable for a mathematician, in my opinion. [Julia is my favorite “new” language and it works together with Python!]
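The typing and indexing points above are easy to try for yourself. Here is a minimal sketch that reuses the a and b values from the typing example; the other names (int_sum, str_sum, mixed, numbers, first, last) are just ones I made up for illustration.

```python
# Dynamic typing: a name has no declared type; the value it refers to does.
a = 2      # an integer
b = "2"    # a string

int_sum = a + a    # integer addition gives 4
str_sum = b + b    # string concatenation gives "22"

try:
    mixed = a + b  # Python refuses to guess how to add an int and a str
except TypeError as err:
    mixed = None
    print("TypeError:", err)

a = "two"  # the same name can later refer to a value of another type
print(int_sum, str_sum, type(a).__name__)

# Zero-based indexing: the first element lives at index 0.
numbers = [1, 2, 3]
first = numbers[0]                  # the number 1
last = numbers[len(numbers) - 1]    # the number 3; note the "- 1"
print(first, last, numbers[-1])     # numbers[-1] is a shortcut for the last element
```

Typing the same lines one at a time at the Python prompt is a good way to see each result as you go.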
- Python is versatile
You can use multiple “styles” of programming with Python: scripting, procedural, object-oriented, and functional. You can use Python for web programming, game programming, scientific programming, business/finance, embedded systems, operating system services, network programming, black-hat and white-hat “hacking”, add-on scripting for other programs, GIS, etc. It is a general-purpose language with LOTS of available add-on modules.
- The “Standard Library” is expansive and useful and the “Package Index” is massive
The phrase “Batteries Included” is used to describe the rich variety of packages in the “Standard Library”. There are 37 section headings in the documentation for the standard library. It covers everything from multimedia support, internet protocols, HTML and XML processing, threading and multiprocessing, cryptographic services, and numeric and math support to OS system support, etc. And that is just the “standard” stuff. PyPI, the Python Package Index, has 98806 packages as of today! There is a Python module for just about everything you can think of doing with a programming language.
- There is a Python API for everything
Well, maybe not everything … When developers make an Application Programming Interface for their programs, libraries or services, it’s pretty likely that they, or someone else, will create a Python interface to make it easier to use and access. For just about any service that allows a programmatic interface, it’s likely that Python will be supported.
- You can get high performance with Python
Traditionally Python has been considered a “slow” language because of its interpreted nature. It’s true that most algorithms written in pure Python will be much slower than if they had been implemented in a typical compiled language. However, that often doesn’t matter much. Why? Because the packages that can be imported into a Python program can contain well-optimized compiled libraries that link to high-performance system libraries. A good example is the very popular “NumPy” package, which is often linked to a highly optimized math library such as Intel’s Math Kernel Library (MKL). If you need to solve a large system of linear equations in Python, don’t start writing “for” loops for matrix operations; just import NumPy and use its functions. There is also support for parallel execution with threads and message passing, and there are interfaces to GPU accelerators for compute. These are advanced features but yes, you can use Python for high performance. There is also the consideration that it can take much less time to get working code running in Python. Even if that code is slower, it may save a lot of time overall.
- Python is a great “glue” language
“Glue language” describes how Python is often used to tie together separate programs, libraries and APIs. Python can interface to so many different “computing objects” that it is great for creating hybrid functionality from programs that normally wouldn’t be able to talk to each other. You can combine C/C++, Fortran, and other code with Python in various ways too. Lots of languages will play nice with Python. It’s also not uncommon to use Python to create user interfaces or wrappers for programs that would otherwise be hard to work with.
- Python is a great operating system equalizer (Linux, Windows, macOS)
Python has modules in the standard library that allow you to make operating system calls in a unified way. That can be a big help in creating programs that will work on more than one operating system. It can effectively put a layer between the OS and your program that hides many of the differences between operating systems, as long as there is similar functionality available. It’s not uncommon to find programs written in Python that run equally well on Linux, Windows and macOS.
- Python is great for Data Analytics, Machine Learning and AI
This is the real reason I’m looking at Python right now! There are some great tools for Data Analytics and Machine Learning in Python. There are the Python modules NumPy, pandas, SciPy, Statsmodels, Scikit-learn, Shogun, Bokeh, matplotlib, and many others. There are interfaces for most of the important machine/deep learning frameworks like TensorFlow, Caffe, Theano, MXNet, etc. Python also works well together with other popular languages for Machine Learning like “R”, Julia, Java, MATLAB, and Lua. It’s hard to be involved with Machine Learning and not do at least something with Python.
- Python is easy to install and get started with
Last but not least, Python is easy to install and get started with. This is pretty important since getting started with a programming environment can be challenging. This is another point in favor of Python: it’s not too hard to get started, and there is what is becoming a standard, recommended setup for Python development on all OS platforms.
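Once you do have Python installed, the “operating system equalizer” point from earlier is a nice one to try first. The following is only a rough sketch using standard-library modules (pathlib, tempfile, platform, os); the function name write_report and the “reports” and “hello.txt” names are hypothetical, made up for this illustration. The same file should run unchanged on Linux, Windows and macOS.

```python
import os
import platform
import tempfile
from pathlib import Path


def write_report(directory, name, text):
    """Write text to <directory>/reports/<name> and return the resulting Path."""
    # Path's "/" operator joins parts with the right separator for the current OS.
    target = Path(directory) / "reports" / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target


if __name__ == "__main__":
    print("Running on:", platform.system())
    # tempfile picks a suitable scratch location on each operating system
    # and removes it again when the "with" block ends.
    with tempfile.TemporaryDirectory() as scratch:
        report = write_report(scratch, "hello.txt", "Hello from Python\n")
        print("Wrote:", report)
        print("Path separator here:", repr(os.sep))
        print("Contents:", report.read_text(encoding="utf-8").strip())
```

Nothing in the script mentions a drive letter or a slash direction; the library fills in those details, which is the whole idea.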
In my next post on Python I’ll go through installing and getting started using Anaconda Python on Ubuntu Linux 16.04 and Windows 10.
Happy computing! –dbk