So, I'm happy to introduce the next speaker, Valerio Maggio. He's with FBK, so please give a big welcome to Valerio. Good morning, everyone, and thank you very much for coming. My talk today is Data Formats for Data Science. A very quick slide about me: I'm a postdoc researcher at FBK, currently in the complex data analytics unit, interested in machine learning and text data processing, and recently branching out into deep learning and related topics. I've been a fellow Pythonista since 2006, and I'm one of the main organizers of PyData Italy, which I invite everyone here to check out. We have a Twitter account, and we've held a couple of conferences in the last two years, this year in Florence, together with PyCon Italia. It was a lot of fun, so please check it out if you're interested. Another thing worth mentioning: there will be a EuroSciPy in Erlangen this year, at the end of August, and the early-bird tickets actually end today. Since you're here at PyData, I think it's definitely a great conference, and you should definitely think about coming. And that's basically it, so thank you... actually, joking. So, back to the serious part of the talk. Data Formats for Data Science: the main goal of my talk is to point you to some very interesting libraries for processing data in Python, according to the different formats the data may come in, and moreover, to see what the most Pythonic way to do that could be.
Data formats come into play in the data processing step, of course, and there the question is: what's the best way to process data? Since we're Pythonistas, the better question is: what's the most Pythonic way to do it? We're going to see some examples of that. Data formats are also involved in data sharing; for instance, what's the best way to share our data? And that's the second part of the pipeline, the presentation of data, so data visualization; one possible answer there is to share an interactive chart. Unfortunately, we're not going into that, but I strongly suggest you follow the next talk about Bokeh, which is a really great library for it. By the way, the most common format to date for sharing data, and indeed data plus code plus documentation, is the Jupyter Notebook. I'm quite sure most of you already know what a Jupyter Notebook is, but in case you don't, please check out that great project. So, back to data processing. The very first example of a data format we're going to see is the textual data format, because it's the most common format we'll work with in our data processing step. Let's consider a textual file containing numbers, so a long sequence of numbers, and let's see the best way to process that kind of format in Python. Of course, the most trivial solution is to open the file, read it line by line, put the content into a list, and that's it. A more Pythonic solution is to use a context manager rather than opening and closing the file by hand. And with that, basically, we have what we need.
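The line-by-line reading just described can be sketched like this (a minimal sketch; the file contents and the temporary path are made up for illustration):

```python
import os
import tempfile

# Write a small numbers file to work with (hypothetical sample data).
path = os.path.join(tempfile.mkdtemp(), "numbers.txt")
with open(path, "w") as f:
    f.write("1.5 2.5\n3.0 4.0\n")

# The Pythonic way: the context manager closes the file for us,
# even if an exception is raised while reading.
with open(path) as f:
    rows = [[float(tok) for tok in line.split()] for line in f]
```

The `with` block guarantees the file handle is released, which is exactly the point made above about preferring context managers to explicit open/close calls.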
We store all the information from the file. Of course, this is not very efficient, because we have to deal with numbers, and Python lists are not very good at that. A better way to do it is to use NumPy, of course. NumPy to the rescue: it provides, out of the box, a very useful function for this. In case you have a textual file containing numbers that form a matrix or a multidimensional array, you can rely on the loadtxt function: in basically one line you get what you need, without being worried or concerned about possible formatting problems in your file. And as output, np.loadtxt returns a NumPy array rather than a Python list, which is of course more efficient for processing numbers. If we take a look at the documentation of loadtxt, we see many, many parameters: we may specify the type of numbers we want in output, handle comments, convert specific columns, or specify the number of dimensions for the file. It's very simple to use. There is another function in the NumPy library, genfromtxt, which does basically the same thing, with the key difference that it is able to load data from a textual file even when you have missing values in it. So loadtxt expects you to have a full matrix, where the number of rows and columns must match; with genfromtxt you have a way to specify a strategy to deal with missing values in the file. Another very common textual format you may come across is, of course, the CSV file.
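Here is a small sketch of both functions, reading from in-memory strings instead of real files (the sample values are invented for illustration):

```python
import io

import numpy as np

# loadtxt: one call turns a full numeric matrix into an ndarray.
data = np.loadtxt(io.StringIO("1 2 3\n4 5 6\n"), dtype=np.float64)

# genfromtxt: same idea, but with a strategy for missing values;
# by default, empty fields become NaN.
sparse = np.genfromtxt(io.StringIO("1,2,\n4,,6\n"), delimiter=",")
```

`data` here is a 2x3 float array, while `sparse` has NaN where the text had empty fields, which is the missing-value handling loadtxt cannot do.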
CSV stands for comma-separated values, but in general you may have values in this format separated by different characters, not only commas: tab characters, spaces, or a combination of them. In this particular case, we have a CSV file whose very first row is the header, so it carries the column names, which is quite often the case when we process CSV files. If we look at the simplest solution: in Python's standard library we have the csv module, which is specifically devoted to processing CSV files. We open the file, create the reader, and that's it: we iterate over the file line by line, and it's up to us to decide how to properly store the information we read from the file. If you're more into the scientific Python ecosystem, I think the first solution that comes to mind when you think of CSV files is pandas, of course, because pandas is great at this. pandas ships with read_csv, and it's very simple: again, just one line of code, you pass the path of the file, and that's it. In output you have a pandas DataFrame, packed and ready to use for data processing. If we look again at the documentation of read_csv, we see many, many options, because when you process CSV files you may come across real differences in formats, in the handling of null values, non-numeric values, and so on. The basic idea in this particular case is that we're not actually dealing with a file containing only numbers, but with data of different types, and the DataFrame is the best structure for that.
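Both approaches can be sketched side by side (the column names and values below are invented for illustration):

```python
import csv
import io

import pandas as pd

raw = "name,score\nada,10\ngrace,12\n"

# Standard-library csv: we iterate row by row, and it's up to us
# how to store what we read. DictReader uses the header as keys.
reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)

# pandas: one line, and the header row becomes the column labels.
df = pd.read_csv(io.StringIO(raw))
```

With the csv module every value stays a string and the storage is ours to manage; pandas infers dtypes and hands back a DataFrame ready for analysis.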
And of course, as you may see in the left corner of the slide, pandas provides many, many functions to process many data formats with just one line of code. In particular, we see read_csv, read_excel, read_hdf, read_html, read_json, some of which are formats we're going to see in a few minutes. Let's have a slightly more complicated, actually not so complicated, example of a CSV file. The difference from the first example is that here the first 10 lines of the file are metadata, not actual data, so the idea is that we want to skip those lines when we load the data into the DataFrame. That's very simple in pandas: as you may see over there, we just need an additional parameter, skiprows, to say how many rows we want to skip, and that's it. So again, pandas is the solution for this kind of thing. To sum up on the textual data format and the first simple examples we saw: to be Pythonic, use context managers; NumPy and pandas are the solutions for data processing, NumPy mostly for purely numerical data and pandas for CSV, with loadtxt and read_csv being the functions we saw, respectively. The textual data format has some advantages: it is very easy to create and share, and very easy to process, as we saw. But of course it's not very storage friendly, although it is highly compressible. Moreover, another drawback of the format is that it does not support structured information: in case we need some hierarchy in our data, text is not the proper format to use.
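The skiprows trick can be sketched like this (the ten metadata lines are fabricated for the example):

```python
import io

import pandas as pd

# Ten metadata lines before the real header, as in the slide's example.
text = "\n".join(f"# meta line {i}" for i in range(10))
text += "\nx,y\n1,2\n3,4\n"

# skiprows drops the metadata, so the first line pandas keeps
# is the header row with the column names.
df = pd.read_csv(io.StringIO(text), skiprows=10)
```

After the skip, the DataFrame has columns `x` and `y` and only the two real data rows.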
So we come to the second example, the binary data format. We start by thinking about how much space, how many bytes, we need to represent numbers: consider integers and floats in their native string representation, as in this example. As you can see, while the storage required for numbers stored as strings grows with the number of characters, the number of bytes required for numbers stored as numbers is basically constant, depending only on the type. So the idea is to use that fact and store the data in its native, binary format. But of course, space is not the only concern with text: speed matters too. When we have numbers stored in textual files, we basically lose time converting those strings into numbers, because the conversion to int or float is not free; it goes through the underlying C functions atoi or atof. The simplest way to store binary data in Python is, for instance, the pickle module, which is included in the standard library. Here we have an array of 10,000 numbers, reshaped to 10 by 1,000, and we store it in a binary file with the pickle.dump function. Then we may load the array back from the binary file using pickle.load. It's very simple to use, and basically we don't need to install anything, because it's the standard library, so it's just Python. But of course, the problem in this case is that when we want to store binary data, it's rarely just numbers: most of the time we also need metadata or some description in the binary format we want to use.
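A minimal sketch of the pickle round trip just described (the file path is a throwaway temporary one):

```python
import os
import pickle
import tempfile

import numpy as np

# 10,000 numbers reshaped to 10 x 1,000, as in the slide's example.
a = np.arange(10000).reshape(10, 1000)

path = os.path.join(tempfile.mkdtemp(), "array.pkl")
with open(path, "wb") as f:   # binary mode is required for pickle
    pickle.dump(a, f)

with open(path, "rb") as f:   # load it back, unchanged
    b = pickle.load(f)
```

The round trip preserves both the values and the shape, with no text-to-number conversion on the way back in.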
So in that particular case, the option is to think of another format, and that's actually the reason for this format, the so-called HDF5 format, the Hierarchical Data Format. It is a free and open-source file format, and it works great with both big and tiny data. It's storage friendly, because it allows compression, which is a very nice feature, and it's also development friendly: it has a domain-specific language to query the data in your structure, basically. It has support for multiple languages, which means you may use this format regardless of whether the person you're sharing your data with is using Python, Java, or any other language. That's a very interesting feature. As for Python, we have many libraries; the two most famous are PyTables and h5py, and I'm going to show you a couple of examples with both of these libraries, just to see the differences. So, if you want to create a new HDF5 file, we just import the h5py module, create a new file, and create a new dataset in it: we specify the number of elements we want, 100 in that case, and the type. We get a new Dataset object in output, which you may see here at the bottom, but when you have to deal with it, it behaves basically like a NumPy array, so it's very development friendly. We may also take advantage of the slicing feature here: we may get the 10th element, or slice with a step of 10, and we get back a NumPy array of the type we specified there, which was a 32-bit integer. With this file format, NumPy arrays are tightly integrated. If we use the other library I mentioned, PyTables, it actually provides out of the box a series of built-in data structures for your HDF5 files.
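The dataset creation and slicing just described can be sketched with h5py (the dataset name is made up, and the file is written to a throwaway temporary path):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    # 100 elements of type int32, as in the slide's example.
    dset = f.create_dataset("counts", shape=(100,), dtype="i4")
    dset[:] = np.arange(100)

    tenth = dset[10]          # a single element
    every_tenth = dset[::10]  # NumPy-style slicing returns an ndarray
```

The Dataset object accepts the same slicing syntax as a NumPy array, which is the development-friendly part mentioned above.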
Those are Array, CArray, EArray, VLArray, which stands for variable-length array, and Table. The syntax is much the same. In this particular case, at the bottom of the slide, we're creating a new NumPy array, and then we're creating a new table, and we're filling this table and accessing it through dot notation, which is very useful. We append the rows of the NumPy array we created before, and we declare it as an array of records with those types over there: an integer for the first field, and strings of at most 10 characters for the second field. Very useful, and very easy to use. The other important feature of HDF5 files is that we may have hierarchies and groups, so we may structure the information in our file. Basically, we start from the root here, and then we may create groups, create datasets, and append those datasets to the groups we created. So we have a specific path to follow when we want to access a dataset in the structured file we created in HDF5. Alternatively, starting from the file, we may also create a new dataset directly by specifying its path, and then access that dataset using the path rather than going through the group we created. It's very easy to use. And finally, the last feature I want to show you regards data chunking, which is pretty useful in case you want to do out-of-core rather than in-core analytics. The basic idea is that a contiguous dataset is stored contiguously, but with chunks you tell HDF5 that you want the data stored in blocks, so you can process it chunk by chunk. And that's very useful in case you want to do that data processing in parallel, which is a feature actually supported by HDF5.
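The hierarchy and chunking features can be sketched with h5py, which exposes the same HDF5 concepts (the group and dataset names here are invented, and the file goes to a temporary path):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "grouped.h5")
with h5py.File(path, "w") as f:
    # Create a group under the root, then a chunked dataset inside it;
    # chunks=(100,) tells HDF5 to store the data in 100-element blocks.
    grp = f.create_group("experiments")
    grp.create_dataset("run1", data=np.arange(1000), chunks=(100,))

    # The same dataset is reachable directly by its full path.
    run1 = f["/experiments/run1"][:]
```

Accessing `/experiments/run1` by path or via the `grp` object are two views of the same hierarchy, exactly as described above.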
In fact, to show an example here: MPI, through the mpi4py library, is integrated out of the box with h5py. In this particular piece of code, the file is being modified by multiple processes: we're writing into the dataset, at the rank index, an array of 4 times 1,000 integers, so we're basically modifying the dataset with this array, and every process accesses its own specific slice of the dataset in parallel. That's very nice. If you want to learn more about HDF5, I highly recommend this book, and we're also going to have another talk about HDF5 in more detail; that's going to be on Friday, I guess. Yes, it's going to be very interesting. Another binary format I want to show you is one I came across very recently: the so-called ROOT data format. I don't know how many of you here already know about ROOT, but yeah, thank you very much. ROOT is actually a framework, a tool, and also a data format; that's why I decided to include it here. It's used for data processing in general, but especially in physics: if you are in particle physics, it's quite likely you use ROOT for data analysis. It's a great tool, actually. It's written natively in C++, but it has a Python extension sometimes referred to as PyROOT. And by the way, ROOT 6, which is the latest version of ROOT, ships with a Jupyter kernel, so you may actually use ROOT's functionality inside a Jupyter notebook. It defines a new binary format, .root, and the basic idea is that it is based on the serialization of C++ objects. So that's, at a glance, what ROOT is. You may see here that ROOT also ships with an interactive shell, just like the Python one, which is very useful.
And you may sometimes write a sort of C++ code in this interactive shell, so you basically have a kind of interactive C++, which is interesting from some point of view. And that's the browser, showing a file: here you may see a very long list of leaves in this particular file, and every time you open a leaf, which is a data container, you see a histogram, because most of the time when you open ROOT files you look at histograms of your data, you know, the distributions. But in case you want to go into more detail and extract the data from ROOT files, it turns out you have to write long and boring C++ code to perform very common operations. Basically, you have to access a tree and a leaf: the idea is that a ROOT file, rather than talking about datasets and groups like HDF5, talks about trees, branches, and leaves. But the general idea is just the same; that's why I decided to show it to you. The other reason is this very weird syntax I want to show you. Here we're accessing a tree, and this is actually a 2D histogram: we're getting the data from the tree with this expression, basically these values plotted against these other values, and we're forwarding the output of this draw into this C++ object, H, which is an anonymous C++ histogram. Then we iterate over the entries and the bins of this histogram to get the contents, and that's it. So originally we would have to write this very awkward C++ code to extract data from this format. Fortunately, we have the PyROOT package, as I already mentioned, and that's the general syntax to do it in Python. But as you can see, the programming style lacks any Pythonic feature; it's very much C++ style, okay?
So basically, you have none of the naming conventions we're used to from PEP 8; it seems like we're basically writing C++ code. But fortunately, there are a couple of projects I want to show you and point you to, named rootpy and root_numpy. I'm going to show you a couple of examples; they're very nice projects and very easy to use. Taking the example that used PyROOT, with rootpy we end up writing much more Pythonic code. First of all: instead of using the Get function on the TFile to fetch the tree by name, which was the Monte Carlo tree in that case, we may access the tree directly using dot notation, just like a Python object. That's very nice. Moreover, another very weird thing ROOT has: when you define a 2D histogram, you have to define the y axis with respect to the x axis, which is sort of counterintuitive. They fixed that in the rootpy project, so here you specify what's most intuitively expected: the x axis with respect to the y axis. And you avoid that weird syntax of piping the output into an anonymous object, by just passing it as an attribute: you say, okay, I want this draw to be stored in this h2, this 2D histogram, which I define here with type F, meaning floating-point numbers, instead of the TH2F originally defined in ROOT. Another example uses root_numpy, which is very useful when you want to get the data without processing those files bin by bin in each histogram. You just say: I want this histogram, I want this tree, and I want all the values in it, in output as a NumPy array. That's the goal, the aim, of the root2array function here.
So we pass the file, the name of the tree, and then the branch we want, and we get in output a NumPy array. The fun thing is that this library is tightly integrated with the PyROOT ecosystem: here we get a NumPy array, we create a histogram using the original PyROOT library, then we fill that object using the root_numpy function, and then we draw the histogram again using the original object. That's very nice to use: you basically get to use the two libraries at the same time without worrying about the details, because that's up to the libraries. And finally, another interesting feature: rootpy ships with a root2hdf5 command-line utility that allows you to convert from the binary ROOT format to the HDF5 format. Okay, that's it for binary files. We're going to see another format; yeah, thank you. I'm going to go very quickly over this one, because it's very common, and I want to talk about it more from a data processing point of view, rather than the very specific reasons why, for instance, in web processing JSON is the format of choice for APIs rather than XML. The reasons are manifold; one of them is that it's less verbose, of course, and from the Python point of view it's easier to process, since we basically deal with dictionaries and Python lists. In case you were wondering where JSON is used in our context: JSON is the format under the hood of the Jupyter notebook, so a notebook is basically a JSON file. But for this talk, I want to talk about JSON because it is the format of choice for document-oriented DBs, the so-called NoSQL DBs.
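The dict-and-list mapping can be sketched with the standard-library json module (the notebook-like structure below is a made-up miniature for illustration, not the real nbformat schema):

```python
import json

# A notebook-like nested structure: JSON objects and arrays map
# directly onto Python dicts and lists, which is why JSON is so
# easy to process from Python.
doc = {
    "cells": [{"cell_type": "code", "source": "print('hi')"}],
    "nbformat": 4,
}

text = json.dumps(doc)     # serialize to a JSON string
loaded = json.loads(text)  # and back to plain dicts and lists
```

After the round trip, `loaded` is an ordinary dict we can index with the usual Python syntax, no schema machinery required.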
And I want to show you a couple of slides about some tests I made, comparing the performance of HDF5 files versus MongoDB, a NoSQL DB. Here we had 100,000 documents, and those documents were structured; I mean, they were textual documents. The basic idea was to build a sort of information retrieval index: I wanted to store, for each document, all the terms and the frequencies of the terms appearing in the documents, and more specifically, I wanted to store the particular zone of the text where each term was gathered. So it was a sort of structured index I wanted to build, and since there was this idea of structure, I decided to test whether HDF5 could be a possible solution. What I got was that, from a processing point of view, the HDF5 format is not so appropriate, because it takes more and more time compared to MongoDB, which I implemented in two different versions: flat storage and compact storage. The difference was in how deeply structured the JSON objects going into MongoDB were, storing the zone information explicitly in a nested object versus encoding it in the terms. But basically, the performance of the two was about the same; it was just a matter of which was the easier way to deal with it. If we look at storage performance, though, HDF5, with the very simple and already-provided out-of-the-box Blosc filter, which is a compression algorithm you may use, is definitely the solution to go for. So in case you have storage constraints, HDF5 is a great tool.
Of course, it's not comparable in terms of efficiency with MongoDB, at least in this very specific case study, and of course there are many, many things one might optimize that this example didn't cover: for instance, the possibility to update the data, or distribute it across multiple clusters, and so on. Okay. Another format of interest for this talk is HDFS, the data format for big data. I'm going to show you a couple of slides taken from a notebook by Matt Rocklin. It's very interesting to finally notice that there is a library called hdfs3 in the Python ecosystem. HDFS, of course, stands for Hadoop Distributed File System; it's the distributed file system built on top of Hadoop, where the data can be organized in chunks and distributed among several machines. It's basically the de facto standard for big data. In Python we have this very nice library, hdfs3. It works very well on Linux machines; I had some issues making it work on my OS X machine, but on Linux it works very well. And it wraps a native implementation of HDFS in C++, so there is no Java along the way, which is very nice; that's the point. So, the example: let's see how we may approach the analysis of CSV files distributed over a cluster. Here we create a new HDFS file system object, over there, sorry. We list all the CSV files we have, and we may read just one file from the file system, using read_csv, and put the data into a DataFrame. But more interestingly, we may read all the CSV files at once, with a wildcard here.
So basically, we're opening all the CSV files matching this pattern, and we're accessing the data through this executor, which is the server that gives you distributed computation. And the fun thing is that if you execute this in the notebook, the interactivity of the notebook is still available, so it's non-blocking. That's very nice. When the computation ends, you have the data in a DataFrame, just like a pandas DataFrame, so it's very easy to use, and definitely worth looking at when you have to deal with HDFS. And finally, yes, we may also operate on the DataFrame to filter the data we have; we get another DataFrame, and we can do further processing on our data. That's very nice. Since we're dealing with big data here, another mention I would like to make is about columnar databases. That's the direction in which big data is shifting these days: we're moving from the so-called row-based databases, the relational databases, to the columnar ones. So far there are two families, two categories, two kinds of columnar database. There is the group A approach, Google BigTable, HBase, or Cassandra, whose data model is based on multidimensional maps, versus group B, the one chosen by these other tools here, which keeps the relational data model. The basic difference is that you have data organized in columns rather than in rows, and that's very useful when you have to deal with analytics, because most of the time you end up analyzing data by going through columns rather than rows, which is very efficient. And the tool I want to show you is this one: it's called MonetDB.
And the reason why I'm showing you this is that it ships with built-in Python support; indeed, you have Python plus R built-in support, so you may write Python or R code for your analytics inside the database. In fact, MonetDB types map directly to NumPy arrays: when you have to process columns in your DB, they are transformed out of the box into NumPy arrays, so you get NumPy processing inside it. That's very nice to use. For instance, here we're executing a query that returns a table; that's a function included directly inside the DB, so it runs in the DB process. We're creating a new table that has just one column of floats, and the language of choice is Python, of course: we create a random array of NumPy values, and we return the values. And that's it: we basically have an interpreter operating on a MonetDB table. To see it working in a more concrete example: here we have two functions in MonetDB, and we're basically using the functions of scikit-learn, so we're writing Python code inside it. Here we compute the confusion matrix for some processing, and then more detailed statistics on the confusion matrix, creating a new table with all the information we want to report: accuracy, precision, sensitivity, specificity, and F1. We store all this information in a very Pythonic way, because it's Python running inside the DB, and we return the values. And the way we use it is just to include it in a query, a simple SQL query: we select the values from the two tables in a nested query, and we pass the values gathered from another query. It's very easy to use. Of course, this is a very quick sample.
I highly suggest you check out the talk from which this couple of slides has been taken, on in-database analytics with Python and MonetDB. Yeah, thank you. Okay, so that's basically the end. There are a couple of things I want to show before closing, a couple of things we missed along the way. They're more tools than formats, actually, and I want to point you to a couple of very interesting, very easy-to-use tools that now belong to the PyData ecosystem: xarray and Blaze. Blaze is fantastic; it's basically one tool for all the formats, and I'm going to show you a couple of examples on the next slide. xarray is a sort of extension; you can think of it as an intermediate between the NumPy structure and the pandas DataFrame, because an xarray is basically a labelled ndarray. The idea is: I want a multidimensional NumPy array, but I want to describe the values and the columns and rows I have in it, so that I can access the rows or columns by name rather than just by index. That's the labelled array. It's a library based on the NetCDF format, a very popular format if you're in physics, and it builds on the so-called Common Data Model, which basically allows you to integrate HDF5, HDFS, and other formats into one single data model. That's very useful. Okay. So, Blaze: some people in the ecosystem consider it a sort of extension of NumPy, I guess, because it allows you to do out-of-core processing, which is basically one of the limitations you have when dealing with NumPy.
In this couple of examples taken from the documentation, you may create the data object from Blaze, which here is actually talking to a database rather than a pandas DataFrame, and it's basically all the same for you when you're dealing with the code. In the case of xarray, you may create a DataArray gathering data from a pandas DataFrame or from a NumPy array, and then you operate over the data just like a NumPy array. So, I think that's it. In conclusion, I would say: complicated data require complicated formats, and complicated formats require very good tools, but fortunately we have Python and all of the PyData ecosystem to tackle these problems. So thank you very much for your attention. Yeah. Thank you very much, Valerio. Unfortunately, we don't have any time for questions, the next session's coming up, but I'm sure Valerio will be happy to answer outside. Thank you very much. I can't believe we're on time.