So, before I get started, I thought I would share a little bit more about myself. Like was mentioned, I'm from Harare, Zimbabwe. Zimbabwe is a country in the southern part of Africa, just above South Africa, and I've lived there for most of my life. It's a very interesting country to live in, and I actually flew here from Zimbabwe. Another thing about me is that I'm a developer advocate. I work for Voltron Data, and I started in developer advocacy just a few months ago, last year in November. Before developer advocacy, I was working as a software engineer at NVIDIA on a GPU-based data frame library called cuDF, so I was very focused on performance and speed with Python. I really enjoyed the engineering work, but I was also interested in dev advocacy and decided to switch, and so far it's been good. This is actually my first talk as a developer advocate, so hopefully it goes well. A third and final thing about me is that I'm the outgoing vice chair and an outgoing director at the Python Software Foundation. If you've never heard of the PSF, it's the non-profit organization behind Python, and it does a lot of stewarding of the language and of the community, making sure the community is inclusive and safe for people in different parts of the world. Throughout this talk, if you're enjoying it or want to ask me a question, feel free to tweet at me. My handle is marlene_zw. I'm pretty active on Twitter, so feel free to DM me if you have a question. And if you want to visit my website, it's just marlenemangami.com. Like I mentioned, I'm a developer advocate, and something I enjoyed doing even before I started dev advocacy is traveling and speaking at conferences in different parts of the world, and because I come from Zimbabwe, it's usually quite a long flight.
So I try to add some days either before or after the conference to explore the city. When I was thinking about coming to Dublin, I thought a cool thing to do could be to explore the public art around the city. And because this is a Python conference and this is a talk about databases, I decided to go online and Google open data sets about art in Dublin. I found a very cool website called Dublinked, which shares different open data sets with information about the city of Dublin. On it I found this really cool project called Dublin Canvas. It's an initiative by the city that invites artists, from Dublin and beyond it seems, to apply to create different canvases: they take old traffic control boxes that were gray and paint them with different colors. There's an example of one on the screen; it's not super clear. Has anyone noticed these around the city? It's been cool to see them while I've been here. So I decided to work with this data set. They have CSV files and a GeoJSON file as well, so I thought it could be a cool thing to explore with Ibis for my talk, and a way to learn about art in Dublin. To start, I decided to create a MySQL database and build a relational database from the data, and I faced several issues when I was trying to do this. One of the first things I came across is that I wasn't that familiar with MySQL, because I come from a software engineering background and had done a lot of work mainly on the back end of things, so I was very used to bug fixes, feature requests, and things like that.
But now, trying to write SQL code, I felt super unproductive, because I was writing SQL strings and Googling all the time. It just wasn't Python, so it wasn't feeling good. Another thing is that I had to consistently be in my terminal. When I was taking a look at the art, I wanted to visualize the data I was seeing, and for that I'm used to using a Jupyter notebook, where it's also easier to share the information. But MySQL felt very restrictive in terms of being terminal-only. Something else is that there was geographical data I wanted to work with, and one of the issues with some of these databases is that data types are not really preserved; you can't easily work with certain things, like geographical data types, in MySQL. And finally, as someone who's used to Python, I didn't think SQL strings were very maintainable in terms of writing tests and things like that. I don't really have an issue with SQL itself, but this was not very Pythonic from my perspective. So I kept thinking to myself, there's got to be a more Pythonic way to work with databases. And that's how I was introduced to Ibis. So what exactly is an ibis? In real life, not in Python land, an ibis is a bird. Like I mentioned before, I live in Zimbabwe, and maybe once or twice a year I'll go outside my city to places where there are animals. One of my favorite animals is the elephant. Elephants are these really big creatures, and sometimes I see tiny birds sitting on top of an elephant, and sometimes those birds are ibises. That striking image of a really big elephant with a tiny, elegant bird sitting on it is a good metaphor to keep in mind when we're trying to understand what Ibis is in the Pythonic sense.
So in Python, Ibis is a package that provides a more Pythonic way to work with databases. It was created mainly for analytical SQL. If you want to run computations on larger data sets, and you want to do it with Python rather than SQL strings, this is a really good option. Another thing: if you're working with large data sets, maybe your data fits on disk but not in RAM. Often a library like pandas is not able to load your data so you can edit it or run computations on it; pandas will just crash. Ibis also solves this issue, and I'll talk about how a bit later. It was also inspired by dplyr. If you've used R and worked with data in R, you've probably used dplyr before; Ibis is heavily influenced by it. So how does this relate to the metaphor from earlier? Well, we can imagine, maybe like in the Lion King, a tiny bird on top of this big elephant, telling the elephant where to go and what to do. It's saying, go left, go right, take me to the watering hole, and the elephant is listening to the bird and doing all the work, maybe moving stuff out of the way. We can think of Ibis in this way. The idea behind Ibis is that we want lightweight Python code to tell a larger backend or database engine what to do and have it do the heavy lifting for us.
Usually, if you're using a data frame library like pandas (and it's not just pandas; there are lots of other great data frame libraries in Python), you pull all your data into memory, and then pandas does the heavy lifting when you filter or transform that data. What we want instead is to let users write Python code with Ibis, have that Python code translated into a SQL string or SQLAlchemy expressions, and have those expressions or strings tell the database engine what to do. The data isn't loaded into memory; the database engine just carries out the operations. This is a lot more efficient than loading your data into memory with another data frame library. It takes a lot less time, can often be done with a lot less effort, and it maintains data types as well, which is also a really positive thing. So let's actually look at some Ibis code and see what this looks like in practice. Here on the screen, I hope you can see it; it's kind of small, sorry. I'll try to have a notebook available after this talk so you can always take a look at the code. Basically, what we have is an IPython terminal up there, and this code is very pandas-like. The idea with Ibis is to make the syntax as familiar as possible. The first thing we're doing is importing Ibis, like you would with any other library. After that, you're setting interactive mode to True, and all that does is make sure that when you run an expression, the results are printed to your console, or whatever you're using.
Then we're connecting to the Crunchbase database, and as you can hopefully see, it says ibis.sqlite. You can choose another database engine; you don't have to use SQLite. You could use MySQL or something like that, whichever database backend you want. We're saving that database connection to a variable called con. After this, Ibis lets us take a look around: you can list out the tables available in your database, and you can choose a table and do different operations on your data set. In the example here, we're taking the name column and finding the number of unique values in that column, and next we're finding the number of unique values in the region column. You can do things like group-bys as well. Usually, a lot of the intuitive things you can do with a pandas data frame, you can do with Ibis too. So it's really cool: you're treating a SQL table in a similar way to a pandas data frame, but it's not loading that data into memory, so it's not taking as much energy, and you're also not having to write SQL strings, which, depending on who you're talking to, is a plus; some people find SQL strings fine. So let's talk about what this looks like under the hood. When we start out, like I mentioned before, you're creating your Ibis expression. The user writes that pandas-like syntax, and from there Ibis takes that expression, type-checks it, optimizes it, and converts it into either a SQLAlchemy expression or a SQL string.
Then that string or expression is passed to a database engine; there are lots of backend options you can use with Ibis. It's executed, and the database is the one that carries out whatever operation you want. One thing to keep in mind: if you look at the current example on the screen, you can see the .head method being called. Usually, if you were using something like a pandas data frame, calling .head would load all of your data into memory, carry out the operations on all of the rows in your data set, and then return just that subset of data from the head. With Ibis, because database engines are optimized to perform as efficiently as possible, what happens instead is that the .head call creates a SQL expression from your Ibis expression and adds on a LIMIT clause. On the backend, that LIMIT clause is passed to your database engine, and the engine only retrieves the rows the user asked for, carries out the operations on that small subset of rows, and returns the result. So again, this emphasizes that Ibis uses the most efficient path instead of consistently loading huge amounts of data into memory. Going back to our example of elephants: this is me in Zimbabwe, standing by an elephant. Just so you know, elephants are really big, and they're really awesome creatures if you ever meet one in person. That's me beside an elephant for scale, to show you just how big they are.
In a similar way that elephants are really big, your data can be really big as well. Sometimes, when you have very large data sets, you don't want to keep using the same database engine. You might start off using SQLite, but then want to switch to something bigger, maybe move your data into the cloud and use something like BigQuery. And because you want to switch, you don't want to have to change your code a whole lot; you want to run the same operations, just going from a smaller data set to a larger one. One of the cool things Ibis allows you to do is scale up or down with very few syntax changes. This is what it looks like in practice. In the first cell on the screen, someone is using a SQLite database: you connect to it and run the code, doing a group-by and mutating the data. If you then decide you want to try something like Postgres, all you have to do is change the connection. You don't have to change the code that follows, just the database engine, or the specific database, that you're using. And you can scale that even higher, to something like BigQuery. A cool use case for this is to take a small sample of your data, run your operations on that sample to quickly test whether you like what it's returning, and then take the same exact code and run it on your full data set using a larger engine.
So it's really helpful, I think, for people who often need to scale between data sets of different sizes. Okay, going back to the beginning of the talk, where we talked about art around the city of Dublin. I think this is a great example of a practical way to use Ibis: exploring the art around Dublin. The first thing I did was install Ibis, and to install Ibis you run pip install ibis-framework. This is not really intuitive, honestly; it should be pip install ibis, but the name ibis was already taken on PyPI by another project, apparently. I've made this mistake and installed ibis, just the name itself, and it won't work, because that package is a templating engine, so you're going to have some issues. Just as a reminder: pip install ibis-framework. Once you've installed that, you import it as just ibis, in the same way you can see on the screen. Then, to get started, I set interactive mode to True. I had gone onto that Dublinked website, downloaded the CSV data, stored it in a SQLite database, and named that database DublinArt. From there, I'm saving that database connection to the db variable. After that, we can look through a table; the table I saved everything in was called DublinArtTable. Looking at the data available from that CSV file, we can see the councils the art is in, information about the artists themselves and their names, the titles of the artworks, and the specific locations of the art. Something really cool as well: there's a preview image of what the art looks like in the city.
As you can see again, if you're used to pandas, this is very intuitive to write. As an example of querying the data, a couple of things I decided to do: I used the distinct and count methods to find out how many artists were involved in this project, and there were 420 unique artists, which is amazing; that's a lot of artists in one city. There were also five councils, so it wasn't just Dublin City Council that was involved in this project; several other councils were involved in getting art out there. There are many methods available for you in Ibis, and if you want to take a look at them, one way is, if you already have your table, to hit the period and then the tab key, and just like in a normal Jupyter notebook, the methods will pop up on the screen. Another thing you can do is take a look at the docs; they're over at ibis-project.org. Ideally it would be ibis.org, but I don't know who's naming all these things; this is not intuitive either, but it's fine. So what I ended up doing: I used the filter method to select the locations of all the art specifically in Dublin City. These were the locations I found, and I decided to start with a preview of the art at New Street South, Dublin 8. It was really pretty straightforward to display the art in a Jupyter notebook.
So all I did was choose the location column, look for that specific location, and execute the command to tell the database to get that data for me. Then I used iloc to get the first piece of art located at that location. Here on the screen, I was able to pull up the preview image based on the location where I found that art. You can see it on the screen; it's the one in between. I'm not sure why they didn't paint the other boxes, but they painted the middle one. I'm not sure what it says, but that was the first piece of art that I saw. After that, I decided I wanted to find the art surrounding us here at the Dublin Convention Centre. I'll try to make the notebook available, but even if you don't use the notebook itself, if you see one of these art canvases around and want to take a picture of it and tweet it, that could be kind of cool. So I wanted to find the specific locations with art closest to the convention centre, so I could spot them while I was here. To do this, I decided to switch to PostgreSQL. Postgres has an extension called PostGIS, I guess that's how you pronounce it, which lets you work with geographical or geospatial geometry data, so you can visualize geometry data or pull data from a GeoJSON file a lot more easily than with other databases. Switching over to Postgres was fairly easy, as I demonstrated earlier: all I had to do was change the backend from ibis.sqlite to ibis.postgres and connect to that Postgres database.
I had previously created the Postgres database and loaded in the GeoJSON file I got from the website. What I'm doing on the screen there is saving the specific table I created, which I named dublin_canvas, to a variable called canvas_locations, and then listing out the columns available in that table. The main column we're interested in is the WKB geometry column, which contains WKB data, a format for storing geometries such as longitudes and latitudes. In terms of working with the actual geographical data, I decided to use GeoPandas. Using Ibis, I was able to get the WKB geometries from my Postgres database, along with the respective titles for those geometries, in those first two lines of code. From there, I used the GeoPandas library to convert that WKB data into latitudes and longitudes and save it into a GeoSeries. Then I created a GeoPandas GeoDataFrame containing the titles of the artworks along with the geometries. I'm sure there could have been an easier way to do this, but I was just experimenting, and this was the easiest way I could find. After that, I found the geographical location of the convention centre itself, its longitude and latitude, and added that into the data frame as well, so I could see where the art was relative to the convention centre. Finally, in a Jupyter notebook (one of the reasons I appreciate using Ibis is that you can do almost everything within one notebook), I took all of this data and visualized it using folium and GeoPandas.
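The WKB-to-GeoPandas conversion step can be sketched as follows; the titles and coordinates below are invented (roughly near the Dublin Convention Centre), standing in for the values pulled out of Postgres via Ibis:

```python
import geopandas as gpd
from shapely.geometry import Point

# Made-up stand-ins for the two columns fetched through Ibis:
# artwork titles and their WKB geometry blobs.
titles = ["Rabbits", "Beckett"]
wkb_blobs = [Point(-6.2396, 53.3478).wkb, Point(-6.2410, 53.3472).wkb]

geoms = gpd.GeoSeries.from_wkb(wkb_blobs)  # decode WKB bytes -> geometries
gdf = gpd.GeoDataFrame({"title": titles}, geometry=geoms, crs="EPSG:4326")
print(gdf)
```

From a GeoDataFrame like this, GeoPandas' explore method (backed by folium) produces the interactive map described next.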
Using the explore method from GeoPandas, I was able to show the geographic locations of the art closest to the convention centre. This is just a screenshot, but if I were to hover over the other icons, the titles of the art pieces would show up. It was really cool to see which pieces were closest, and there were two very close ones, on the left and the right of the Dublin Convention Centre. The first is called Rabbits. Has anyone seen this one around? No? Now you have a good excuse to look for it. I've actually seen it; it has a little bit of red graffiti on it at the moment, thanks to some people in the city, but it's called Rabbits, so hopefully you can spot it. The other one, on the other side of the convention centre, is a piece called Beckett, and apparently Beckett was an Irish writer and playwright. I haven't spotted this one yet, but if you do, feel free to take a picture of it. All of this I did using Ibis, in one Jupyter notebook, in about the most efficient way possible: I was writing Python code, but that Python code was being converted directly into SQL that then told the database what to do. So in my opinion, this is one of the most efficient ways to work with databases. That's all the time I have, and the end of my talk. If you'd like to contact me, feel free to email me at marlene@voltrondata.com, or tweet at me at marlene_zw.
I will release a blog post and a notebook containing all of the code I shared in this session. I haven't released it yet, but when I do, it'll be at marlenemangami.com. So yeah, thanks everyone for listening. Okay, thanks Marlene for the very interesting talk; cool package. We have plenty of time for questions. We also have a remote session, so we'll start with one in the room and then switch to remote if there's a question. If you want to ask a question, please come to the mic at the front. Hello, and thanks for the talk. You showed us lots of ways to use Ibis, and I was checking the library a little bit online. It doesn't seem to be as good at insertion and creation of data from Python into your database as it is at analytics. Have you tried that a little bit? Do you have any feedback? Yeah, I think that's a great question, and I do think that's an issue I came across, in terms of creation of data, or if I have a CSV file and want to actually create a database from that CSV file in Python using Ibis. I asked the people who are actively working on it what the plan for that was, and they said they're working on it. At the moment, there is a DuckDB backend, and if you use DuckDB as the backend, I think there's an option to load in a CSV file that way and create a database that way. But I do feel it would be very useful to be able to just load your CSV data in and create, say, a MySQL database right there with Ibis. So I agree with you; I think that's a pain point, but I don't know when they're going to fix it. Yeah, thank you for a great talk and introduction to Ibis.
I think it was on slide 10 that you mentioned that when you have an Ibis expression, you can either compile it to a SQL query, a SQL string, or compile it to a SQLAlchemy expression, and I didn't quite get in which case you would want one or the other. I guess compiling to the SQL string is the performant and obvious choice, so when would you want the SQLAlchemy option instead? Right, that's a great question as well. From what I've seen, it just depends on the backend you're using. If you're using SQLite, I think for sure it compiles to SQL strings, but if you're using something like the DuckDB backend, I'm pretty sure that compiles to SQLAlchemy. I'm not really sure what the rationale is for each one; I think they try to find whichever is the most optimized to get you the fastest result, so that's probably how they decide, based on the backend and maybe also on the type of query you're doing. It kind of varies. I'll also say that it doesn't always compile to SQL strings or SQLAlchemy expressions: sometimes you can use pandas as a backend, or PySpark, or Dask. So if you want to write the Python code you're used to in Ibis and have it translated into Dask or PySpark, you can do that as well. Thanks for the question. Maybe let's quickly check: do we have a remote question? Maybe not. Please go ahead. Hi, Marlene, thanks for that great talk. I also really enjoyed your recent YouTube talk introducing Arrow, which I'd recommend for anyone interested in that. Thank you. Looking at the connection: Voltron Data is obviously the company associated with Apache Arrow.
Are there any plans to integrate Arrow further into this, to move away from a NumPy basis and move more into the Arrow ecosystem? Into Ibis? Yes. Thank you for watching that talk, by the way. I definitely know there are plans to have Arrow as an option for the backend, so an Arrow backend that you can use with Ibis. That's something they're actively developing right now, but it isn't currently available. I know as well that there are some ideas to do this in collaboration with the DuckDB backend, so I'm not sure if you'd just be using DuckDB and that would convert to Arrow. But yes, that's a good question, and it is something the developers are working on right now. Okay, great, looking forward to it. Thanks. Okay, if there are no more questions, our next session will begin at 11:35, so we have about 10 minutes until the next talk. Please join me in thanking Marlene again for her great talk. Sure, thank you.