When talking about coding in data science and the languages, along with R, we need to talk about Python. Now, Python, the snake, is a general-purpose programming language that can do it all, and that's its beauty. If we go back to this survey of the software used by data mining experts, you see that Python's there. It's number three on the list, and what's significant about that is that, on this list, Python is the only general-purpose programming language. It's the only one that can theoretically be used to develop any kind of application that you want. That gives it some special power compared to all these others, most of which are very specific to data science work.
So the nice things about Python are, number one, it's general purpose. It's also really easy to use, and if you have a Macintosh or a Linux computer, Python is built into it. Also, Python has a fabulous community around it, with hundreds of thousands of people involved. And also, Python has thousands of packages. Now, it actually has something like 70 or 80 thousand packages, but in terms of ones that are specific to data, there are still thousands available, and they give it some incredible capabilities.
Now, a couple of things to know about Python. First is about versions. There are two versions of Python that are in wide circulation: there's 2.x, so that means like 2.5 or 2.6, and there's 3.x, like 3.1 or 3.2. Version 2 and version 3 are similar, but they're not identical, and in fact the problem is this: there are some compatibility issues, where code that runs in one does not run in the other. Consequently, most people have to choose one or the other. And what this leads to is that many people still use 2.x. I have to admit, in the examples that I use, I'm using 2.x, because so many of the data science packages are developed with that in mind.
Now let me say a few things about interfaces for Python. First, Python does come with its own integrated development and learning environment.
They call it IDLE. You can also run it from the terminal or command line interface, or from any IDE that you have. Now, a very common and very good choice is Jupyter. Jupyter is a browser-based framework for programming, and it was originally called IPython, which served as its initial version. So a lot of times when people talk about IPython, what they're really talking about is Python in Jupyter, and the two names are sometimes used interchangeably.
One of the neat things you can do: there are two companies, Continuum and Enthought, both of which have made special distributions of Python with hundreds and hundreds of packages preconfigured to make it very easy to work with data. I personally prefer Continuum's Anaconda. It's the one that I use, and a lot of other people use it, but either one is going to work and is going to get you up and running.
And like I said with R, no matter what interface you use, all of them are command line. You're typing lines of code. Again, there are some tremendous strengths to that, but it can be intimidating to some people at first. In terms of the actual commands of Python, you have some examples here on the side. The important thing to remember is that it's a text interface. On the other hand, Python is familiar to millions of coders, because it's very often the first programming language that people learn for general-purpose programming. And there are a lot of very simple adaptations for data that make it very powerful for data science work.
So let me say something else again: data science loves Jupyter. And Jupyter is the browser-based framework.
It's a local installation, but you access it through a web browser, and that makes it possible to really do some excellent work in data science. There are a few reasons for this. When you're working in Jupyter, you get text output, and you can use what's called Markdown as a way of formatting documents. You can get inline graphics, where the graphics show up directly beneath the code that produced them. Also, it's really easy to organize, present, and share analyses that are done in Jupyter, which makes it a strong contender for your choices in how you do data science programming.
Another one of the beautiful things about Python, like R, is that there are thousands of packages available. In Python, there's one main repository. It goes by the name PyPI, which is for the Python Package Index. Right here it says there are over 80,000 packages, and seven or eight thousand of those are for data-specific purposes. Now, some of the packages that you'll get to be very familiar with are NumPy and SciPy, which are for scientific computing in general. Matplotlib and a development of it called Seaborn are for data visualization and graphics. Pandas is the main package for doing statistical analysis. And for machine learning, almost nothing beats scikit-learn. When I go through hands-on examples in Python, I will be using all of these as a way of demonstrating the power of the language for working with data.
So in sum, we can say a few things. Number one, Python is a very popular language, familiar to millions of people, and that makes it a good choice. Second, of all the languages we use for data science on a frequent basis, this is the only one that's general purpose, which means it can be used for a lot of things other than processing data. And it gets its power, like R does, from having thousands of contributed packages, which greatly expand its capabilities, especially in terms of doing data science work.
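As a quick illustration of the packages named above, here is a minimal sketch that checks which of them are available in your own Python installation. It is not from the talk: the `DATA_PACKAGES` table and the `check_packages` helper are names made up for this example, and the code assumes Python 3, since it uses f-strings. It relies only on the standard library, so it runs whether or not any of the data packages are installed.

```python
import importlib.util

# Hypothetical lookup table for this sketch: display name -> import name.
# Note that scikit-learn is imported under the name "sklearn".
DATA_PACKAGES = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Matplotlib": "matplotlib",
    "Seaborn": "seaborn",
    "Pandas": "pandas",
    "scikit-learn": "sklearn",
}


def check_packages(packages):
    """Return {display name: True/False} for each package.

    find_spec() only looks the module up; it does not import it,
    so the check is fast and has no side effects.
    """
    return {
        label: importlib.util.find_spec(module) is not None
        for label, module in packages.items()
    }


if __name__ == "__main__":
    for label, installed in check_packages(DATA_PACKAGES).items():
        print(f"{label:<14}{'installed' if installed else 'missing'}")
```

With a distribution like Anaconda you would expect every line to say "installed"; on a bare Python install, most will probably say "missing" until you add them from PyPI.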