So our next speaker is Bence Arató from BI Consulting. He also organizes the PyData Budapest Meetup, and he is a visiting lecturer at Central European University, where he teaches courses on data visualization and visual analytics. So welcome, Bence, and let's start. Thank you for your kind introduction, so I can skip my first slide. Just one thing I might mention: this talk is at the intersection of two of my interests, data visualization and working with Python. During the talk, I'd like to share my experience from when I was learning about the different options that exist in the Python world for visualizing data. Let's start with a bit of thanks. This talk would not exist without the help, the work, the presentations and the talks of many people from the Python data visualization community. So thank you all, and special thanks to Jim and Nicholas, who were kind enough to provide feedback while I was preparing this talk, and to my colleague Annette, who helped with the code examples you will see during the talk. Thank you very much. And a word of caution as well. First, the Python dataviz landscape is quite large, so this talk will only cover a subset of the libraries, not everything that's out there. All libraries have pros and cons, and evaluation is always subjective, so what you will hear is my personal opinion. You might have different preferences for how you like to visualize your data. And finally, a technical comment: the code samples were prepared on Google Colab and work well there, but if you want to try them in different environments, you might have to make adjustments. All right, first part: introduction. The reason for this talk goes back to this very nice chart, which I came across when I was researching the different options in the Python visualization landscape.
I was a little bit confused: there are so many libraries, connected in very interesting ways. There seem to be libraries doing the same thing, and there are libraries with different syntax. So I set out to structure them, to create a map that would help me understand what kinds of libraries there are, which are the main ones, and how they build on each other. That's what I will share today, but first a bit of preparation before the main part of the talk. We often say that some libraries are low level and detailed, or imperative, which means that when you work with the library, you have to tell the computer exactly how you want the chart created, step by step. With other libraries, you can go the declarative way, where you just tell the computer what should be done: I need a scatter plot of these two variables. The library then figures out the exact details. This style is typically used by the higher-level libraries, where the code is shorter, easier to write and easier to understand. The examples in the talk use the penguins dataset. It's a very nice and fun dataset, recently released, with observations of several different kinds of penguins. This is how it looks loaded into a pandas DataFrame: we have the species, meaning which kind of penguin it is, the location, some measurements, and other categories. This is the dataset behind all of the example charts. It is loaded into a DataFrame called df, and that's what you will see in the code examples. So the first major part of the talk is the charting libraries. I try to group them, because I believe there are separate groups of charting libraries that are connected to or build on each other. The first, and probably the best-known, library is Matplotlib. There's a saying that everybody starts charting in Python with Matplotlib. This is the chart.
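The talk's examples assume the penguins data is already loaded into a DataFrame called df. As a minimal offline stand-in (an assumption, so that the sketches run without a download), the block below builds a tiny DataFrame using the same column names as seaborn's penguins dataset. The numbers are illustrative placeholders, not real observations.

```python
import pandas as pd

# Tiny stand-in for the Palmer penguins data used in the talk.
# Column names follow seaborn's "penguins" dataset; the values are placeholders.
df = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
        "island": ["Torgersen", "Biscoe", "Dream", "Dream", "Biscoe", "Biscoe"],
        "bill_length_mm": [39.1, 38.8, 46.5, 50.0, 46.1, 50.0],
        "flipper_length_mm": [181, 190, 192, 196, 211, 230],
        "body_mass_g": [3750, 3900, 3500, 3900, 4500, 5700],
    }
)

# With seaborn installed and network access, the full dataset could be used instead:
# import seaborn as sns
# df = sns.load_dataset("penguins").dropna()

# Quick sanity check: mean flipper length per species.
print(df.groupby("species")["flipper_length_mm"].mean())
```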
This is the library that has been around the longest. It does many, many things, so it has a large number of use cases: you can use Matplotlib for anything from academic publications to data discovery, for almost any kind of visualization need. A very nice thing in the Python visualization world is that almost all of the libraries have a nice gallery where you can see what kinds of charts the library supports. Typically, if you click on a chart, you will see a code sample, so it's very easy to understand what code you have to write to create a bar chart, a horizontal bar chart or any other chart type supported by Matplotlib. All right, let's have a look at our first example chart. What we create with every library is a scatter plot that compares bill length and flipper length for the penguins; we color the different species in different colors, and we have a legend and some axis titles. The logic of the code is quite straightforward. We import the library. We set an option for the figure size. Then we loop over the species groups, and for every species we add a set of markers to the plot. Then we show the legend and set the axis labels. It's understandable and it works well, but you will see that higher-level libraries offer a much shorter syntax for creating a very similar chart. So let's start building our data visualization map. The background of Matplotlib is MATLAB, the MATLAB software package: Matplotlib started as an open-source variation of it, and it is a charting library. In a sense, you could use only Matplotlib and never need anything else. But there are two major challenges when working with Matplotlib, which is a very common finding among Python users.
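The imperative steps described above (import, set the figure size, loop over species, add markers, legend, axis labels) can be sketched in Matplotlib roughly as follows. This is a minimal sketch, not the talk's exact notebook code; the placeholder DataFrame, the Agg backend and the output file name are assumptions so it runs as a plain script.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rows with the penguins column names (illustrative values only).
df = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
        "bill_length_mm": [39.1, 38.8, 46.5, 50.0, 46.1, 50.0],
        "flipper_length_mm": [181, 190, 192, 196, 211, 230],
    }
)

# Imperative style: every step is spelled out explicitly.
fig, ax = plt.subplots(figsize=(8, 5))  # the figure size option
for species, group in df.groupby("species"):
    # One set of markers per species; Matplotlib takes the next color from its cycle.
    ax.scatter(group["bill_length_mm"], group["flipper_length_mm"], label=species)
ax.legend(title="Species")
ax.set_xlabel("Bill length (mm)")
ax.set_ylabel("Flipper length (mm)")
fig.savefig("penguins_matplotlib.png")  # in a notebook the figure would be shown inline
```

Note that the legend and the axis labels only appear because we asked for them explicitly; that is the trade-off the higher-level wrappers address.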
The first is that, because Matplotlib is an imperative, low-level library, you have to code many things manually, and sometimes you will wish for a higher-level library with an easier-to-understand syntax. There is a Q&A there, but anyway, we have to continue. So one set of libraries related to Matplotlib tries to solve the first problem, the low-level, verbose syntax. Seaborn is one such library. The goal of Seaborn is to provide a higher-level API, or interface, on top of Matplotlib, so that it is easier to create charts, and it also has quite nice visual defaults: a Seaborn chart typically looks nice without any further adjustments or settings. It has a special focus on statistical charts as well, so if you are looking for statistical plots, Seaborn will have the chart you need. There is a gallery as well, but I'll just flip through it. Okay, let's have a look at the penguin chart in Seaborn. You can see some interesting things there. We are still importing Matplotlib, because some options are not present in the Seaborn API, so we go down to the Matplotlib level and set, for example, the figure size there. The other thing you can see is that the actual code for creating the scatter plot is indeed much shorter and, I would say, easier to understand. You just tell Seaborn that you want a scatter plot, you set the variables, you tell Seaborn that you want the points colored by species and that the data is in the DataFrame called df, and the chart is done. The other library I want to mention here is plotnine. plotnine has the same goal, to be a higher-level library on top of Matplotlib, but it comes from the philosophy of the R world and ggplot.
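For comparison, here is a minimal Seaborn sketch of the same chart, assuming seaborn is installed; the placeholder rows, the Agg backend and the file name are assumptions for a runnable script. As described above, the figure size is still set at the Matplotlib level, while a single `sns.scatterplot` call handles the coloring, legend and axis labels.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running as a script
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Placeholder rows with the penguins column names (illustrative values only).
df = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
        "bill_length_mm": [39.1, 38.8, 46.5, 50.0, 46.1, 50.0],
        "flipper_length_mm": [181, 190, 192, 196, 211, 230],
    }
)

# Figure size is not a scatterplot() option, so it is set through Matplotlib.
fig, ax = plt.subplots(figsize=(8, 5))
# Declarative style: name the columns; hue="species" colors the points and builds the legend.
sns.scatterplot(data=df, x="bill_length_mm", y="flipper_length_mm", hue="species", ax=ax)
fig.savefig("penguins_seaborn.png")
```

By default the axis labels are the column names; they can still be changed afterwards with the usual Matplotlib `ax.set_xlabel` / `ax.set_ylabel` calls.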
If you've ever worked with R, you will know that in the R world, ggplot is the data visualization library, and it has a very nice syntax. People who come to Python from R and have experience creating charts in ggplot will find plotnine very, very similar, and very easy to use, because its syntax closely copies the ggplot syntax. As you can see in the example, it is a high-level library: you need just three lines of code to create the plot. You import plotnine, you set the variables, you color the points by species, and you're done. Everything else, like the axis labels or the species legend, is generated automatically; you don't have to specify them manually. Okay, there are a few summary slides during the presentation; these are for later viewing, so when you review the presentation, you can see what I said in the talk. Let's have a second look at our map. We have a core, low-level library, and if you want something higher level and easier to use, you can go to Seaborn, or, if you already have experience with ggplot, plotnine would be a good library for you. So this is the first group of libraries: a core library and higher-level wrappers around Matplotlib.
Let's move to the second group. I told you there are basically two challenges related to using Matplotlib: the first is the low-level syntax, and the second is that for a long time it lacked interactive features. If you wanted browser-based, JavaScript-powered visualization with interactive features, Matplotlib didn't really do that. That was the reason for the creation of Bokeh, which is, I would say, probably the second most widely used and best-known data visualization library in Python. Bokeh was created to make interactive charts in the browser, powered by JavaScript, but it is actually a little more than just a charting library, because Bokeh also has interactive controls, or widgets, like sliders, checkboxes and radio buttons, and it has features for building applications as well. So you can use Bokeh as just a charting library, or you can build complete applications and dashboards in Bokeh alone. There is a gallery here as well, with examples to check out. Then, if we get to the code, I would say there are two things to see. The first is that the Bokeh code is not short, so you can see again that Bokeh is a lower-level library where you have to specify many things manually. You have complete control over how Bokeh works, but you have to learn the syntax and write the code to set every option the way you want. The second, more interesting part: on the left you can see the chart, and in the top right corner there is a toolbar. This is where you get the interactions: you can zoom into the chart, select points, save it, and there are many more interactive features in Bokeh. This is just a presentation, so I can't show you the interactive features, but I will share the Google Colab notebook with all of the example charts, so you can try the interactive features yourself after the presentation. All right, Bokeh background, let's
move forward. What we have here is our map: now we have Bokeh as a second core, low-level library, and you can probably guess what's coming up. Of course, there are higher-level wrapper libraries for Bokeh as well, which make it easier to create charts with a shorter, denser syntax. The first I want to mention is HoloViews. It is a way of creating charts just by telling HoloViews what your data looks like, and the chart is generated automatically. An interesting part of HoloViews is that it supports not just Bokeh but also Matplotlib; as the documentation says, HoloViews has different backends. The key point here is that once you've learned the HoloViews syntax and API for creating different kinds of charts, then with a single line of code, and maybe very minor adjustments, you can have your chart generated either as a Bokeh-based chart or as a Matplotlib-based chart. There is even new support for a Plotly backend, so HoloViews really tries to be an overall library that serves as a common, unifying interface for several backends. If we move to the HoloViews code example, the HoloViews penguin chart, you can see that again this is just three lines of code, because it's a high-level, declarative library. We import HoloViews, we set the backend to Matplotlib, so the chart here is actually rendered by Matplotlib, and the actual chart creation code is just one line: we tell HoloViews that we need a scatter plot, pass in the list of variables, and set some options for coloring and figure size. Much shorter, I would say, than the pure Bokeh or pure Matplotlib version of the same chart. So HoloViews is a high-level library that wraps either Bokeh or Matplotlib, based on your preference. But there is more: you can see some white space below HoloViews, because there is actually more than one high-level wrapper
library for Bokeh. Let me quickly flip through them. There is hvPlot, which aims to create a common charting interface for several Python data containers. The idea is that you can visualize your data stored in pandas, Dask or xarray with exactly the same syntax. hvPlot reads your data from the data container and generates HoloViews objects, so hvPlot is built on top of HoloViews, and the actual charts are then rendered by Bokeh. Another interesting option is the Pandas-Bokeh library, an independent project by Patrik Hlobil, if I pronounce it right. The goal is similar: your data is in a pandas, GeoPandas or PySpark DataFrame, and you want to plot it using Bokeh. Then you can just import pandas_bokeh and use the plot_bokeh function, so it's a single line of code to get your DataFrame visualized in Bokeh. It's similar in concept and in goals to what hvPlot does: your data is in a DataFrame, and you want to visualize it in Bokeh in the simplest possible way. And it was very interesting to see the sponsor announcement for Spotify, because another charting wrapper for Bokeh is Chartify, and it comes from Spotify. There is a very nice blog post where they describe that they wanted a standard, high-level, easy way to create charts for all the data scientists at Spotify, and they did not find anything they really liked, so they created their own, which is open source, and that's Chartify. I don't have an example for it, but on the Chartify website there are very nice examples of how you can use Chartify to chart your data. Okay, our summary slides for later viewing, and let's have a look at the map. Now we have two low-level core libraries, Matplotlib and Bokeh, and several high-level wrappers for each of them. Let's move forward. The third group of Python visualization libraries in my classification is the Plotly family. It is actually several components building on top of each other, coming
from the same company, the same group of people. The lower-level libraries are the Plotly open-source libraries: there is a Python library, there is an R library, and my understanding is that a Julia library is also in the works. Then there is the plotly.js open-source JavaScript library, which does the visualization for all of them: you create your low-level charts in Python, in R, or hopefully soon in Julia, and they are rendered by the plotly.js library. If you have a closer look at the Python open-source library, you will see that a huge number of chart types are supported. This is just the first part of a long list, but on the left you can see there are fundamentals, basic charts, statistical charts, scientific charts, financial charts and maps. There is even support for 3D charts, and my understanding is that 3D chart generation is kind of a unique feature, which no other Python library does really well. So if you need 3D charts, Plotly would probably be a good solution for you. But I have no code example for Plotly itself, because since last year there has been a new higher-level library from Plotly called Plotly Express, and the recommendation now is that if you want to create Plotly-based charts, you should start with Plotly Express. You can always go down to the Plotly level if there is something you can't do in Plotly Express, but generally speaking, Plotly Express gives an easier and nicer experience for creating Plotly-based charts. This is the documentation site: you can see there are more than 30 different functions in Plotly Express, so there are many, many chart types you can create just with Plotly Express, without moving down to the lower-level Plotly charting API. Let's have a look at the example code. This is how you create the penguin chart in Plotly Express. It's very, very straightforward: you import the library and you just tell it, I need a scatter plot, here is my data in the DataFrame, here is
my variables I want to be plotted, I want the markers colored by species, and we just set the size, and everything is done. A very, very nice, short syntax. So now we have covered three groups of libraries: the Matplotlib family, let's call it that way, the Bokeh set of libraries, and the Plotly group. We have a fourth set of libraries to discuss, and that is a little bit different from the previous ones, because Vega and Vega-Lite are not libraries. They are visualization grammars, which means they are a way of describing your visualization's appearance and interactivity in JSON format, as JSON objects. Vega is a low-level, very detailed specification, and Vega-Lite is higher level and more concise, so it's easier to describe a standard chart. You can see here that the bar chart in Vega needs a very long description, about a hundred lines of code, while the same chart in Vega-Lite is just half a screen, because Vega-Lite is a higher-level way of describing charts. And there is a Python library for generating those descriptions, called Altair, which is built on top of Vega and Vega-Lite. If we move to the code, we can see that Altair is very, very high level: we just import it, and there is just one command to create the chart. We tell Altair that we need the marks to be circles, we encode two variables on the x and y axes, we encode the color by the species variable, and we set some properties. So this is the final version of the charting libraries part: we have four major ways of creating charts, namely Matplotlib and its wrappers, similarly Bokeh and its wrappers, the Plotly line of libraries, and Altair as a fourth way. We're almost finished here, but let me just mention that sometimes you need more than simple charts: you want to build dashboards or data apps, which typically have more than one chart, and some interactivity controls,
selectors, and probably a way to run code and update your dashboard or app with the results of the code. There are four options for that too, and I'm just mentioning all of them. There is Plotly Dash, from Plotly, the company that also created Plotly, and it lets you create nice dashboard-style visualizations and apps. Here is an example: you have your selectors on the left and the visualizations, KPIs and charts on the right. There is Panel, which is Anaconda-related; I would say the same people who write HoloViews are working on it. A very nice feature of Panel is that it supports including different kinds of charting libraries in one simple dashboard. You can see that the same chart is created in hvPlot, in Altair, in Matplotlib and in Plotly, and they are displayed on the same dashboard with some interactive controls. So Panel is a friend to everyone. There is Voilà, which takes a different and, I would say, very nice approach. Voilà aims to provide an easy way to turn your Jupyter notebook into an interactive web app. This is how a Jupyter notebook looks after running it in Voilà: your code cells are hidden, so the end user just sees your widgets and visualization results, but they don't see your code and can't run any code. It's quite a safe and quick way to publish your Jupyter notebooks to a larger end-user audience. And the final one is Streamlit. This is a very interesting library, the newest kid on the block: it's just 10 months old but already has a very large following, and they raised 21 million dollars in funding very recently. They are very much targeting the data scientist and data engineer community, and they say it is the fastest way to build a machine learning app. So if you have a model, for example a GAN for generating images, you can wrap it into a Streamlit app: on the left you have your parameter controls, and when you move
them, the image generation algorithm reruns and you get a new, modified image. I would definitely recommend trying this in your own browser to see how it works. So this is our final map, with libraries coming from different backgrounds. There are core, low-level libraries, which give you absolute control, but you need to learn their syntax and probably write longer code. There are higher-level wrapper libraries, which make creating charts easier. And if you need more than just simple charts, there are dashboard and app frameworks, four of them, and all of them are worth checking out. Final steps. First, I want to mention the PyViz.org website, which is an open guide to all Python data visualization tools. I mentioned that I don't cover all libraries in this talk because of time constraints, but the PyViz website tries to cover everything, so you can see many other libraries mentioned there. There are very nice statistics about the libraries: you can see how many stars each has, how many contributors there are and what the download numbers are, both for the core libraries and for the dashboard libraries. There was a good talk at the Anaconda conference in June, which I would recommend watching as a next step. It's a bit more advanced than my talk, so if you want to learn more about the Python data visualization landscape, I'd recommend watching Jim's talk. There was also a meetup last June where three of the contributors spoke, and you can download their slides and code examples from the blog post I wrote about it. And with that, we are wrapping up the talk. The conclusion is that we are living in a golden age of Python dataviz: there are many great libraries and very active development, a very nice and welcoming community, and good cooperation between the different libraries. I'm very much looking forward to the next five months in terms of the Python data visualization landscape. The material will be posted. I'm very happy to answer your questions in Discord; you can
also find me on Twitter or LinkedIn if you want to chat about data visualization in Python. Thank you very much for your attention. Thank you very much for your nice talk. We have many questions, actually too many, and we will not have time for all of them. So let me answer one for you: can you share the slides? The material will be posted, so you don't have to answer this one anymore. One interesting question is which of these are free and open source; Jonny Zhang is asking that. I believe that basically everything we showed here is free and open source. I know that Plotly has some paid commercial options as well: Plotly has a paid enterprise option that is not open source, but my understanding is that everything else here is free and open source. Another question was: what is your favorite visualization package? It's a hard question. I definitely prefer the high-level libraries, so I try to avoid learning too much syntax of a specific low-level library. For me, Altair is very nice, Plotly Express is very nice, and HoloViews is also very nice. I'm not much of an R person, so plotnine doesn't really provide value for me, because I don't know R. But HoloViews, Plotly Express and Altair, I use all of them very happily, so you can't go wrong with any of them. A question from Paolo Gomez: what are your thoughts on Apache Superset? It's a very good question. There are actually other open-source projects, and one of them is Apache Superset, which I'm not really familiar with, but my understanding is that Superset tries to be more of a business-style dashboarding tool. I can't really comment on it, but I can say that I do see Superset being used by a number of companies, so it has a strong following and a strong user base; it's just not covered in this talk. And then we have a question from Bois Young Chua, sorry for the pronunciation: with the many
libraries, which ones do you usually see used? I would say that all four main sets are quite widely used. Matplotlib is everywhere; many people just know Matplotlib and probably Seaborn. Bokeh is very widely used as well. Actually, I would say the best way is not to ask me but to check out the statistics on the PyViz website. Whether you go by stars or by downloads, you can see that Matplotlib is everywhere; still, everybody's using Matplotlib, with 10 million downloads per month, but Plotly is quite close to that, especially on PyPI, with about 3 million downloads. If you go down to the dashboarding level, Dash, which is the most well-established library, is probably the most widely used of them all, with about 300k downloads per month, while the others are more like 70k, 30k and 27k. But you can see the market dynamics with Streamlit: Streamlit is 10 months old and already has 8k stars on GitHub, so this is something to watch. Okay, the time is up. Actually, there is one more question with a thumbs up, so really the last one: do you have any recommendations for geographical data? Unfortunately, that's not my expertise. I do know there are specific libraries targeting that, so for that kind of use case you probably need something specialized, but I would have to look it up. Let's go to Discord and have a discussion there. Exactly, let's go to Discord. I'm sure Bence will answer all your questions there, and you can ask him all the remaining questions. Unfortunately, we didn't have time to cover all of them, so thank you very much again.