Hello everyone. So I'm Alexis. You can find me around as ultrabug pretty much everywhere. I'm a Gentoo Linux developer, so when I do open source, most of my contributions are related to clustering and distributed databases, and some Python packaging as well. I'm also the CTO at Numberly. Numberly is a data marketing company where we help our customers make sense of their data: we help them collect it and then make sense out of it. Actually, it's this position that has led me to present this topic this year.

So, why am I doing this talk, and why did I name it this way? I've been in this data-driven industry for 15 years now, and I wanted to take a moment to share what happened over, let's say, the last decade, hoping you will find some interest in it. So this will be my opinionated point of view on how Python has become one of the main languages, if not the main language, used to interact with data.

I will start this presentation by asking two questions. The first one: raise your hand if you are a data scientist, a wannabe, or some kind of one. Okay, so maybe half the audience, mostly. And I want to ask you a second question, because I want to acknowledge a fundamental fact that will serve the purpose of this presentation. Think of companies or institutions: who has driven tech over, let's say, the last decade or two, and more precisely, who influenced it the most in your point of view? Company names or institution names? Google, Amazon, Facebook, Twitter. Okay, that sounds pretty consensual as an answer. So, yeah, basically I put up a few names; there are others. And the point is not getting the answer right. The point is: what do they have in common? Where do they get their value from?
Does Google's value come from the fact that they have a lot of topics associated with URLs in their database? Not that much. Facebook? Okay, I think you know where I'm leading you. They are data-rich companies: their value actually comes from your data. That means that data has value, and regulators finally realized that at some point, and we got GDPR, and maybe in the next couple of years we'll hear about ePrivacy as well. This is not a talk about GDPR, so I will stop talking about it right after this, but today, in 2018 especially, because GDPR started applying this year, we all recognize that data has value. This is the fundamental fact that I wanted to settle right now.

But private companies are not the only ones who handle a lot of data, who are data-rich. Some scientific institutions are as well, so we can fairly say that scientific data has value too. We'll get back to CERN a bit later so you can reflect on the kind of data volume they have and how they are facing it. The point is that, overall, companies, whether private companies or institutions, have all been working hard on improving their ability to process and make sense out of larger and larger datasets. Over the last 10 or 15 years there has been a big shift in this, and for it they relied on people: you, me, anyone, but talented people, whom I will split into four communities for the sake of this presentation.

The first ones are software engineers. Software engineering itself has opened up over the years. The time of the solo software engineer working on a monolithic program is over; a lot of jobs and positions have "softwareized" themselves, if I can say that. I don't know if it's correct English, but you'll pardon me, I'm French. This code, if you wonder, because I see some puzzled faces, is actually JavaScript: it implements a donut counter of some sort, and it looks to me that in JavaScript, at least, it takes a lot of code to count donuts. But that's another topic for another talk.

Science and scientists got more and more data, and they got crazy about data manipulation, wrangling, and making sense out of all the new data points they keep getting. A great example is the LHC, the Large Hadron Collider at CERN. Just to give you an idea, this year it is expected to produce 50 petabytes of new data for the LHC alone. Obviously they can't process it only on the LHC infrastructure, so they created the Worldwide LHC Computing Grid, which is an awesome project. There is a talk about it by Oliver Keeble at dotScale 2016; if this kind of topic interests you, I encourage you to watch it. It was very interesting, and they are doing a great job. But it means that science and scientists engineered themselves as well: their engineering and "softwareization" grew too.

Then this data has to be stored, hosted, and made available, and this is the kind of work that SysOps and NetOps do. They are the foundations of the data flow, and at first they didn't interact much with the others. But today, if you think about yourself or your community, you see that this is not the case anymore. It changed over time, and I will try to give you a glimpse of how it happened and how it influenced Python along the way.

Last but not least, data analysis as a field has grown a lot. Half of you want to be a data scientist, or a data engineer, or whatever suffix you can put after "data". That is for the simple fact that data has value when it's made relevant: data that just sits there, that nobody looks at, interests nobody. That's what data analysis is about, and it's a real field, right?
So we can see data as the cornerstone of their relationship. Over the years they started to work more and more with each other, and while doing this they were trying to address the same kind of problems, which were data-centric. So they became data-aware over time and interacted more and more. Here is a good example: a European Python conference. When you interact with people, you need some kind of common tongue; it makes sense, right? I'm French, I'm talking to you in English, you are from all over the world and you all have your own mother tongue, but at some point we converge on a common tongue, because this is the best way to interact, to make sure we understand each other, and to face the same challenges. The same goes for companies and institutions. So while building this relationship, these communities adopted a common tongue named Python. It's not the only one, but we can fairly say that it's the main one today. So I'm going to try to reflect on how it happened.
It started with a simple definition: a general-purpose programming language with a simple syntax that makes its code easy to read and learn, so it has a low barrier to entry, especially for scientists or network people, as we'll see. It has a huge array of third-party packages and libraries available to us, and the community keeps adding to them; this morning we heard Nicole talk about PyPI, which is the greatest example of this. So the term "language" is important: in this presentation, it really means a tongue.

Let's focus on the first two communities. Python was not, and is not, the only language used in software engineering, and that is not the purpose of this talk, but let's assume it was already pretty strong there. On the system and network operations side, they mostly used Perl and Bash; 15 years ago, and maybe still 10 years ago, those were the strong languages in this field. Then 2009 happened. Keep this year in mind, because I will highlight it across the slides: it's actually a key year. The DevOps culture emerged. When I say emerged, I don't mean it was common ground; I mean it was just starting to kick in and influence the industry. Service-oriented architecture design emerged at the same time. This DevOps culture influenced both worlds: software engineers started to collaborate more with system and network ops, and over the years they adopted a common tongue as well. SysOps used Bash to interact with the system and Perl to parse and manipulate files, and they found in Python, "batteries included" as we often say, a good way to do this too. So in this DevOps continuum, Python helped both worlds collaborate. How did it influence Python?
The first thing, I think, is that in 2010 we finally saw Python for the web becoming strong, with the advent of PEP 3333, Python WSGI 1.0.1. That's when you started to hear about Flask, uWSGI, and Gunicorn on the software engineering side, and that's when you started to hear about Fabric, which allowed sysadmins to address a large number of servers programmatically. Then things evolved, and 2011 is when Ansible and SaltStack appeared, which are very strong in this community today, and both are Python-based. That means that SysOps and NetOps, thanks to Python, started to have a language they could use for their own work as well, and it bonded software engineering with them at the same time. Meanwhile, Python also became the de facto language for client-server driver libraries. You wouldn't think of a new server or a new database coming out today without its Python driver, right? It's almost taken for granted now, and it happened over the years like this.

Now, scientists. They were actually early adopters of Python, first of all because of the low entry barrier: they have a solid base understanding of computer science, but they are not programmers. They were sitting on a lot of C and C++ libraries, and they were also using Fortran and R. When they started to use Python, their first move was to work on a numerical computing library; Guido himself worked on this library, called Numeric, back in 1995. And here, for them, and it is a key point actually, lies the main power of Python: its ability to interface with C and C++. This allowed numerical computing to get where it is today, and it helped Python a lot to become what it is today, because data manipulation and interaction are very important in numerical computing, and I'm not even talking about AI. So Python was the right level of glue between their lower-level, optimized or very specific libraries and their day-to-day work. It was a very good glue.

So we started to hear about IPython back in 2001; while doing this research, I was pretty amazed by how far back it goes. In 2006, the Numeric library finally evolved into NumPy. And 2009 again: it's pretty fun to see that this is when pandas and scikit-learn were born. Then we also have non-profit organizations like NumFOCUS that started funding development around scientific libraries, and SciPy packaging helped as well to put everything into place and make it more and more available. So the scientific community amazingly quickly developed the foundations of scientific computing and data science in Python. It is really interesting to see in retrospect that the key libraries we use today, which are the foundations of data science, emerged at the same time as DevOps and service-oriented architecture. What this means, if you look at the three of them together, is that the things that were imagined and developed could be run in production. It's a nice continuum, and Python helped unify it and was influenced by it in return.

Now, data analysis. Data analysis was done a lot in Perl, Java, SQL, R, and some others, but let's say those are the major ones. Data analysts are close to science because they have a strong numerical background. So we saw PyData emerge, as NumFOCUS emerged and started organizing PyData.
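To make the "glue" point concrete, here is a minimal sketch of Python calling a compiled C function directly, using the standard library's `ctypes` module. It assumes a POSIX system where the C math library (`libm`) can be found; the `c_sqrt` wrapper is a hypothetical name for illustration. `ctypes` is only one of several mechanisms (the CPython C API, Cython, and cffi are others), and this is not how Numeric or NumPy themselves were built.

```python
import ctypes
import ctypes.util
import math

# Assumption: a POSIX system. Locate the C math library ("libm"); if it cannot
# be found by name, fall back to the symbols already loaded in this process,
# which on typical Linux builds of CPython include libm.
_libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(_libm_path) if _libm_path else ctypes.CDLL(None)

# Declare the C signature `double sqrt(double)` so ctypes converts the
# argument and return value as doubles instead of assuming C ints.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double


def c_sqrt(x: float) -> float:
    """Hypothetical helper: compute a square root via the C library's sqrt()."""
    return libm.sqrt(x)


if __name__ == "__main__":
    result = c_sqrt(2.0)
    print(result)
    # The compiled C function and Python's own math module agree.
    print(math.isclose(result, math.sqrt(2.0)))
```

Declaring `argtypes` and `restype` is the important design step: without them, `ctypes` assumes C `int` for arguments and return values, and the call would silently produce wrong results.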
This is the PyData track, actually. I didn't file this talk as PyData at first, but it got classified this way, so I guess some of you may be here because of that. So they started to converge with science. Among them were also stock and market analysts, fintech: scientific fintech started using Python more and more. But at some point they had an increasing need for infrastructure, because their datasets were constantly growing. So this also led them to move closer to the sysadmins and network operations people, and to softwareize and operationalize themselves to match the scale of the data.

Then, this technology and infrastructure around data is now pushing a DataOps culture and event-driven architecture, which I will get back to later on. This is pretty recent: it's about three years old and just starting to kick in. This technology and infrastructure is pushing for a DevOps-like movement between data analysis and SysOps, and it's converging towards event-driven architecture.

What's interesting about this field, and you are a great representation of it, and if you talk to a lot of people at this conference I think you will feel pretty much the same, is that everyone wants to be a data scientist today. It's the new cool kid. I'm not one of them; I'm from the SysOps and NetOps field. This is what everyone wants to be next, and that's pretty nice actually. There is a good reason for it: data science is a good mix of what tech is today, daunting science with strong engineering, making sense out of more and more data. So it's attractive and fascinating in a lot of ways, and it's no wonder that people want to be data scientists. And what we are seeing as well, and maybe this is a warning for you, is the DevOps guys.
They say: oh, this is cool as well. So they come with cables attached, but they also want to be in that field. So you see a lot of DevOps people today, or people who claim to be DevOps (I don't know if it's a title or not, but that's another story), who want to do data analysis, who want to be data scientists, or maybe data engineers, or whatever the suffix will be.

Anyway, as we can see here, Python has become ubiquitous. 2009, don't ask me why, was a very cool year in this regard: it brought a lot of fundamental things, Python got influenced, and a lot of key projects and cultural changes started at the same time. So the general-purpose programming language and its vibrant ecosystem evolved to match everyone's needs, and that reaches beyond tech people or scientists. What's fascinating about Python's popularity is that it has gone beyond the circle of programmers. There was a recent article in The Economist; it's about one week old, so I saw it pop up while I was working on this presentation. It shows the popularity of Python in Google Trends over the last eight years, and we can see that it's pretty much in line: 2010 was a key year for Python's popularity, so there is a bit of lag between a culture emerging and it actually having more and more influence. Stack Overflow observes a 25% growth in interest in learning Python. Codecademy also observes growing interest from populations that are not tech-oriented, like marketers, college lecturers, and bank analysts, and The Economist says its own journalists use Python to crawl the web and do other tasks. Python is also now widely taught at school, and it often replaces or complements Java courses because it's a gentler introduction to programming. So the popularity is in line with the ecosystem over the years, and it's not only about tech people, which is good news.

This slide is pretty new in the presentation as well, because since he resigned as our BDFL, Guido has been interviewed a lot by the media, including The Economist, which I mentioned earlier. But this photo comes from an article in Le Monde that I discovered this very morning; it was published yesterday. In the interview, the journalist explains what we already know about Python, basically what we have just shared, and its prominent usage in the data world, and Guido then gives his own explanation of how scientists got to the language. I can't tell you how miserable I would have felt if he had said something different from what I prepared and was just trying to explain to you. But he didn't, so I won't go mad. Okay, and I can fairly say now that this presentation is Guido-approved.

Anyway, what I find fascinating in that is actually this sentence. It can be turned inside out to understand what is happening today. We have seen what happened, where it has led us, and what we can feel around us and around our community. But if we turn it inside out, we can see that while shaping its ecosystem, the Python ecosystem, data has actually taken over the world to become a main subject of concern for every citizen. Think about that. Everyone is feeling wary about their data now, and data is rising and influencing the Python community on its own. So I am a happy man, since as a member of this industry I have been waiting 15 years for this, and today is the data day.
So today is our day. But this era is full of pitfalls, and I'm not talking about the GDPR fines you can face, which can run into multiple millions. Python is now so prominent for interacting with data that it now has duties: to keep up with the challenges and to remain as useful as it is for everyone today. We have this duty as a community now. So in the next slides, I will highlight the challenges that Python has to face to meet these duties, facing other languages in the data industry. And I'm not here telling you that we should replace them and prevail, that it is a war. We are an inclusive community: what we want is that when people come and use Python as a language, they don't feel trapped in it, so they should be free to go and use another one that is specialized in the use case they want to address. But that doesn't mean we should settle for the status quo and say: okay, Python is doing this job well, we are done, thank you, victory.

The first one I want to emphasize is this one: the way we build and deploy apps and platforms is changing. If we look at the software engineering and network operations people, the DevOps culture is still maturing, and the technology around it has evolved over time. I don't know if you went to, or tried to get into, the Kubernetes talk yesterday, but the word Kubernetes in a talk title almost gets you accepted today, right? It's popular. Everyone knows that he or she has to understand what it is and how it's going to influence their work tomorrow, if it's not already influencing it today. The cloud is also a good example of this: we started using the cloud as virtual machines running somewhere else, and now we can use it in very different ways.
So the challenge here is still packaging. I was delighted to hear in Nicole's keynote this morning that we have a working group still working on these issues. There are a lot of talks about packaging issues, or the packaging nightmare, at Python conferences, and I hope that it ends someday. Standalone build and runtime matter too, because the deployment and build process is important: it creates nightmares, and yesterday's talk about Kubernetes was a good example of this. Here Go shines and is clearly ahead of Python. Performance: we still have the GIL; we have not gotten past it. Antonio, from the PyPy project, is in the audience, and I thank him and the whole team very much for the amazing job they do on PyPy. But it's still not mainstream; it's still not "Python" in the sense that we can't label Python as a very performant runtime the way Java or V8 are. Java did it, so we can do it as well as a community. Distributed applications are still not our strong suit either: on PyPI we don't have a lot of libraries that ease distributed application coding, as Java has, or Go, which is intrinsically meant for this. Then, make no mistake: when you go to a data conference, the data community itself agrees on this. Operationalizing data science, putting your models in production, is still not solved. It's still a big challenge. It's not about having results.
It's not about the science in itself; it's about the continuum and running it for a long time. Models show different behaviors over time. Why? Because data changes over time and we change over time, so the production input data is not the training data. This always has to evolve, which is related to how you build and deploy your applications, and to how you build, deploy, and operate your models. So operating data science at scale is still not solved.

Here we have a few main projects. The best one, I think, is obviously Jupyter, which has changed the lives of millions of people, I think I can fairly say that. It's the de facto number one, and it's Python-based. Thank you guys, it's awesome. Distributing pandas using Dask is pretty good. The last one is TensorFlow, which is not Python-based per se. The challenge here is that the production environments that run this kind of code, such as Hadoop, or that schedule it, run on Java: Java is prominent in the infrastructure that runs it for real. So the challenge for Python is integrating with them, running smoothly, and being buildable and runnable in the Java world. Performance is still a problem; Dask is trying to tackle this, and scaling as well. There is also the Ray project, mentioned earlier, which is doing a great job, especially for data scientists, but here we still can't beat Spark and Scala. Graph computation as well: we're not very strong there, and Java, Go, and JS are still the leaders by far in this field. We are making progress, but we could be way better as a community, I guess. Finally, the data paradigm is changing towards event-driven architectures, and this is important to emphasize.
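Event-driven architecture can feel abstract, so here is a minimal, in-memory sketch of its core idea as popularized by Kafka: producers append events to a topic, which is an append-only log, and each consumer group reads that log at its own pace by tracking an offset. Everything here (`EventLog`, `publish`, `poll`, the topic and group names) is hypothetical and is not Kafka's API; real brokers add partitioning, persistence, and replication on top of this model.

```python
from collections import defaultdict
from typing import Any, DefaultDict, List, Tuple


class EventLog:
    """Toy in-memory event log (hypothetical; NOT the Kafka client API).

    Each topic is an append-only list of events. Each (topic, group) pair keeps
    its own read offset, loosely mirroring how Kafka consumer groups track
    their position in a topic.
    """

    def __init__(self) -> None:
        self._topics: DefaultDict[str, List[Any]] = defaultdict(list)
        self._offsets: DefaultDict[Tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, event: Any) -> int:
        """Append an event to a topic and return the offset it was stored at."""
        self._topics[topic].append(event)
        return len(self._topics[topic]) - 1

    def poll(self, topic: str, group: str) -> List[Any]:
        """Return the events this group has not read yet and advance its offset."""
        start = self._offsets[(topic, group)]
        new_events = self._topics[topic][start:]
        self._offsets[(topic, group)] = start + len(new_events)
        return new_events


if __name__ == "__main__":
    log = EventLog()
    # The producer does not know who will consume its events.
    log.publish("page_views", {"user": "a", "url": "/home"})
    log.publish("page_views", {"user": "b", "url": "/pricing"})

    # Two independent consumer groups each see the full stream.
    print(len(log.poll("page_views", "analytics")))  # → 2
    print(len(log.poll("page_views", "analytics")))  # → 0 (already read)
    print(len(log.poll("page_views", "billing")))    # → 2
```

The design choice that matters is decoupling: the producer only appends, so adding a new consumer group (say, a hypothetical fraud detector) requires no change to the producer, which is why this model suits growing data pipelines.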
This is happening now, as we speak. It's two or three years old, an idea coming back to the front of the scene, and the main technology is Kafka, which is written in Java and Scala. So we have the same kind of problem here: the ecosystem around it is Java- and Scala-based. You wouldn't think of writing a distributed database in Python, right? People don't think twice when it comes to that: they do it in Java or Go, because those ecosystems are better for it. We have to work on making sure that we face this challenge.

So here is a quick takeaway, because I want to keep two or three minutes for questions. DevOps culture: 2009, the key year, let's say. 2010: Python for the web is strong. Then data science in Python: we have good funding for this. 2015: DataOps and, most importantly, event-driven architectures and AI emerge, so we are in the middle of this. This data-driven era is coming, and right now we are influenced by the data challenges that the language, and the position we have acquired over the years, have led us to face. So keep on rising. It's very promising, very exciting, and thrilling to be in this community. I'm very proud of how far we have come. We have a lot to do in different fields, and collaboration is key: we have made it this far by collaborating, and we need to go further by collaborating more. So I hope and trust that our community will meet those challenges. Thank you very much.

Thanks very much. So we've got time for some questions.

Hey, so what do you think is the most challenging problem that the Python community should solve to stay relevant or become more relevant?
I think it is runtime integration: making sure that we integrate and scale well in the Java and Scala ecosystems, which are growing and leading. So it's not about doing it our own way and competing with them head-on; it's about making sure we keep being the best at what we are already the best at: interacting. So we need to be there, and we need to be better at this. Kafka Streams is still written in Java. There are attempts and good things in Python, but it's still not Python-native. It could be, and in my opinion it should be, because then you would have all the rest of the ecosystem available to you inside those platforms. Kafka, event-driven architecture, and this culture are getting stronger and stronger; if you don't know them, you should. These are the topologies that are happening; that is where everyone is converging on the data side, just like everyone converged on Kubernetes for apps and deployment. Event-driven architecture is, on the data side, what Kubernetes can be for software engineering. So we have to be good there.

Just a connection to what you mentioned about Python needing to be the best in this space: I had a course with Hadoop, it was in Java, and it was about how to distribute computations. To be honest, it's very painful to use the Java interface, and you lose the expressiveness of Python. I'm wondering whether there is any appetite in your community for a really Pythonic alternative to Hadoop?

Not to Hadoop itself; the work would be around scheduling it and interacting with it, yes. But to be honest, today Scala is seen as the alternative. So instead of doing pure Java, like maybe you did, and because Python is not so great at interacting with it, a lot of people go with Scala, which has a higher level of abstraction than Java, so you can do things more quickly and more fluently. It's more readable.
It's more readable So it's seen today in this industry as the good alternative That's why spark is very strong in in distributing processing and while Python and desk is still not enough Thanks, any more questions Okay, what can we thank Alexis again for his really interesting talk