Thanks for the introduction. I have a short one myself: I'm the author of Polars. My name is Ritchie Vink, and I have a background in machine learning and software development. I started Polars around three years ago, and today I'm going to talk about it. I want to do a short introduction of about two minutes on what Polars is. I cannot jam everything in, so it will be jammed full and I still won't include everything. Then we're going to talk a bit about the promise of Polars. A quick raise of hands: who is already familiar with Polars? Right. And the expression API? Same hands, ish. OK. I'm not going to explain it today, because that's something you can read in the user guide and the documentation itself. We're going to go a bit deeper.

So, the two-minute description of Polars. Polars is a query engine with a DataFrame front-end. Different from most DataFrame implementations out there today, it respects decades of relational database research. Relational databases come with a query optimizer: they do not run your query naively as you describe it, but optimize it and do all kinds of tricks to make sure your query runs as fast as they can reasonably make it run. To be able to do that, we come with lazy evaluation, and that gives us query optimization. With that, we can improve performance by one to ten times, depending on the query and how much we can optimize.

We minimize materializations. That means we try really hard not to construct full intermediate DataFrames. For instance, if you take a pandas DataFrame, first multiply two columns, and then only select the last 100 rows of the result, you still multiply over the whole DataFrame before taking the last 100 rows. We can realize that we only need to multiply the last 100 rows. Those kinds of tricks.

The data types are constructed to amortize access and allocation costs. That means the data types are laid out efficiently in memory, and when traversing the data we don't incur unneeded cache misses; we try to minimize that as much as possible. For instance, in pandas, until recently, a string was always a Python object. A Python object is just allocated somewhere on the heap, and if you traverse that column you jump to a random place on the heap for every value, which is a cache miss, which can be on the order of a 200x slowdown versus a plain C string.

It's written in a low-level language: Rust, which gives full control over memory and C-level performance, but also makes it really easy to write concurrent and parallel code. It's designed for effective parallelism. What I mean by that is that we don't need any pickling or serialization; we can do work in parallel at minimal cost, because we just send pointers around within the same process. It's designed for out-of-core processing, which means we can process data that doesn't fit into memory: we process the data in batches, and if we cannot hold it in memory anymore, we spill to disk and try to finish the operation for you. And we have tight integration with IO: every reader we have written reads directly into memory in the layout we want, not into an intermediate state followed by another copy. I think Polars in two minutes was a bit longer than that, but now we can get to what I really want to talk about.
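[Editor's note: as a rough sketch of the lazy-evaluation and minimal-materialization point from the introduction (the "only multiply the last 100 rows" example). This is not code from the talk; the file and column names are invented for illustration.]

```python
import polars as pl

# Lazy scan: nothing is read or computed yet, we only build a query plan.
lf = (
    pl.scan_csv("measurements.csv")                        # hypothetical file
      .with_columns((pl.col("a") * pl.col("b")).alias("a_times_b"))
      .tail(100)                                           # we only want the last 100 rows
)

# collect() triggers optimization and execution; per the talk, the multiplication
# only needs to run for the rows that actually end up in the result.
df = lf.collect()
```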
We make a promise: if you use Polars as it's meant to be used, you will get these things. You get readable queries. You get optimization. You get performance. You get parallelism. You get out-of-core processing, which means larger-than-RAM data. And you get knowledge extrapolation: in Polars you only need to learn a small part of the API, the API is composable, and what you learn for that small part is applicable everywhere in the API.

We also promise strictness, and this is a huge benefit, in my opinion. For instance, if your data types don't match, we will throw an error, and we do this before the query runs and not twenty minutes into your pipeline. Trust me, that will save you a lot of headaches. We don't use any NumPy or Numba; we have written everything from scratch. And we don't want to see any applies. An apply we see as a failure on our side in providing you with the domain-specific language, the API, to express yourself. Of course, you can still use an apply to do something exotic, like calling a large language model. But if you needed an apply for ordinary data manipulation, we see that as a failure of our API: we were not expressive enough for you.

So this is the starting point. Everything starts with the domain-specific language, which doesn't use any Python UDFs, so we can analyze it, understand what happens, and make sure it runs effectively. The domain-specific language consists mostly of expressions. Expressions are a functional abstraction over Series, and their functional definition is pretty simple: they take a Series as input and produce a Series as output. The Series can represent different things depending on the context: it can represent a column, it can represent a group (in a group-by aggregation, for instance), it can represent the elements of a list, it can represent a single value, or it can represent a literal.

If we look to the right, we see that those functions, those expressions, are composable. Because the input type is the same as the output type, you can combine these expressions indefinitely. This is similar to the vocabulary of a language. The vocabulary of Python is pretty small, keywords like "for" and "in"; I think you can put it on a sheet of paper. But by combining that vocabulary you can do anything, and it's the same with expressions: you can combine them indefinitely, they're super composable, and you can do all kinds of things we would never have thought about, because you can express your own business logic in our expressions. They're lazy, which means they're optimizable on our end. And because you use our expressions, we can understand what happens; if we understand what happens, we understand your intent, and we can see how to do it faster. I will explain that a bit later.

We go pretty far with this; we really want you to use expressions. For instance, if you use an apply with something that could have been an expression and we detect it, we will give you an error and propose the proper expression to use instead. I'm pretty happy with this, because users really use apply way too often where it's not needed. Of course, we cannot catch everything, this is rather new, but we introspect the Python function and give you the option to write a Polars expression instead. And with that you can get a two-digit speedup, because you don't run into Python loops. The apply is doubly evil: one, because you run Python; two, because you take the global interpreter lock, which means we cannot run anything in parallel anymore.
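[Editor's note: a small hedged sketch of the same computation written once as a Python UDF and once as an expression. The column name and the 1.21 factor are invented, and the UDF method has been renamed across Polars versions (apply, map_elements), so the exact name depends on your version.]

```python
import polars as pl

df = pl.DataFrame({"price": [10.0, 20.0, 30.0]})

# With a Python UDF: every value goes through the Python interpreter,
# under the global interpreter lock, so it cannot be parallelized.
slow = df.with_columns(
    pl.col("price")
      .map_elements(lambda p: p * 1.21, return_dtype=pl.Float64)
      .alias("gross")
)

# As an expression: the engine sees the intent and can run a vectorized,
# parallelizable kernel instead.
fast = df.with_columns((pl.col("price") * 1.21).alias("gross"))
```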
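[Editor's note: a minimal sketch of the few knobs a user actually touches here. The file name is made up; POLARS_MAX_THREADS is the documented environment variable that caps the thread pool, and the streaming toggle has changed shape between Polars versions (collect(streaming=True) in older releases), so treat the exact spelling as version-dependent.]

```python
import os
# Caps the global thread pool; must be set before polars is imported.
# If unset, Polars uses all available cores.
os.environ["POLARS_MAX_THREADS"] = "8"

import polars as pl

lf = pl.scan_parquet("events.parquet")          # hypothetical file
query = (
    lf.filter(pl.col("amount") > 0)
      .group_by("user")
      .agg(pl.col("amount").sum())
)

df_default = query.collect()                     # default in-memory engine
df_streaming = query.collect(streaming=True)     # force the out-of-core streaming engine
```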
So if you use expressions, if you use the Polars API, you get the first thing for free: parallelism. We deal with parallelism in two engines. This one is our default engine, and our default engine has parallelism-aware nodes. A node can be a join operator, a group-by operator, a filter, or a select, and those nodes themselves know how to parallelize most efficiently. You don't have to do anything for that. Expressions themselves are thrown onto a work-stealing thread pool. Work stealing means we have a queue of work to do, and we only run as many threads at the same time as the pool is big; if you don't set anything, the pool size is the number of cores. This way the threads don't get in each other's way and we avoid constant context switching, which would be very expensive: if you just spin up threads and have 200 of them running, the OS has to keep blocking and rescheduling them and you waste a lot of time. So we make sure the threads are filled with work efficiently, and when a thread is done with its work, it takes another job from the queue and executes that one. I could explain a lot more about this, but I won't because of the time.

The second engine we have is a streaming engine, which works out of core. Here the parallelism works differently: we have parallel pipelines. You can see the red bars; every red bar represents a thread. We take a batch from the input data and push it into a pipeline, and we can keep pushing until we reach a sink. A sink can be something like a sort: a sort needs all the data, we cannot sort before we have collected everything. An operator, on the other hand, can be something like a filter: if you have a batch, you can already filter out rows without seeing the other batches. So operators can process any dataset size, it doesn't matter if it doesn't fit into memory, while sinks sometimes do need all the data in memory, and we make sure we spill to disk if, say, a sort doesn't fit. Parallelism, again, is handled for you; you don't have to think about it. And which engine do we use? We switch between the two without you having to know, but you can also force a query to run on the streaming engine if you want.

The second thing you get from Polars, if you use it idiomatically, is optimization for free. We take this responsibility away from the programmer. There are optimizations you, as a programmer, could have done but often don't, because you want to express your intent: you just want to write your query readably, and you don't want to have to spell out at the scan, for instance, which columns need to be read. If you read too many columns, you materialize those columns, which is a waste of memory and a waste of time. In Polars we automatically figure out for you which columns you need and which predicates you need. A predicate is a filter: if you do a filter, we look at the statistics in the Parquet file and check whether we can skip a row group. We also apply filters at scan level, which means we don't need to materialize as many rows, and this can save a lot of memory. We will not first read the whole file and then apply the filter; we apply the filters while we're reading the data in.

Here we do a query: some group-bys and some aggregations, and then a join with itself. If you look at this self join, it means we would read the same source twice, right? If you do it lazily and you have a computation graph, it would mean reading the same source twice. That gets optimized away; I will get to this a bit later. If we run this query without any optimization, it runs in about 1.2 seconds. If we run it with the optimizations that are on by default, it's roughly six times faster, just out of the box, because Polars does the optimizations for you. Some optimizations you could apply yourself, like declaring the columns you need; some you cannot, because we can only do them under the hood.

Here we see the optimized query plan. You can see that we cache the join with itself: we evaluate one part of the query plan and then join that part with itself, because we figured out that we can cache it. You could have done this yourself by first collecting that part into a DataFrame and then joining it, but we figured it out for you. You can also see that the filter is gone; the filter, shown with a sigma, is actually done at the scan, so at the source itself. And at the scan there is a pi sign: pi shows how many columns we have projected, and here we selected four out of 17 columns for you; the other columns we didn't touch. The best work is no work, and that saves the most time, right?

Another optimization, one we are about to add (I have the PR ready for it): if you look at this expression, we take the column quantity and multiply it with the column extended price, and for one we compute the mean and for the other the sum. But the multiplication, the intermediate result, is the same, right? You could write it out yourself by first making a temporary column with that intermediate result. This is called common subexpression elimination. We will add this; the PR is ready, and I know one of our benchmarks is slower than it has to be because we don't use this optimization yet. So as you can see, we take this responsibility away from you as a programmer. You don't have to apply these silly optimizations yourself; just write what you want, and we will apply them for you. You can write readable, idiomatic queries that just express your intent, and we will figure out how to make them fast.
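[Editor's note: a minimal sketch of what this looks like from the user side. The file, column names, and filter are invented; the pattern (scan lazily, let projection and predicate pushdown trim the work, inspect the plan, or eliminate the shared subexpression by hand) follows what was just described.]

```python
import polars as pl

lf = pl.scan_parquet("lineitem.parquet")        # hypothetical file with many columns

# The repeated product from the talk's example.
q = lf.filter(pl.col("ship_date") < pl.date(1998, 9, 2)).select(
    (pl.col("quantity") * pl.col("extended_price")).mean().alias("mean_revenue"),
    (pl.col("quantity") * pl.col("extended_price")).sum().alias("sum_revenue"),
)

# The optimized plan shows the projection (only the needed columns) and the
# predicate pushed down into the scan.
print(q.explain())

# Common subexpression elimination done by hand: compute the shared product
# once, then aggregate it twice.
q_manual = (
    lf.filter(pl.col("ship_date") < pl.date(1998, 9, 2))
      .with_columns((pl.col("quantity") * pl.col("extended_price")).alias("revenue"))
      .select(
          pl.col("revenue").mean().alias("mean_revenue"),
          pl.col("revenue").sum().alias("sum_revenue"),
      )
)
```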
These are some of the things we do. The ones in black are things you could sometimes have done yourself; the ones in red you definitely couldn't have, also not in NumPy. For instance, fused arithmetic: we compute column A times B plus C in a single allocation. If you do this in NumPy, it first computes A times B and allocates a new column, and then adds C and allocates a new column again. We do it in one allocation, and allocations are expensive. We also reuse buffers: if you do A times B times C times D, A times B gives a new allocation, but after that, that allocation isn't needed anymore, so when we multiply by C we can reuse the same allocation and store the result there. This really adds up: if you do this over 20 columns or so, you're much faster than NumPy, because NumPy goes to the heap allocator every time, and that also puts a lot of pressure on the CPU caches; the more data you pull from memory, the slower things get. That's the gist of it. We do some devectorization; I won't explain what it is, but it makes stuff faster.

One important one is that we do schema checks before we run the query. This means you get your error instantly, instead of 20 minutes into your data pipeline. Another one is constant folding: if you write a column plus one plus two, we don't go left to right, but replace it with the column plus three, which saves an allocation. Function replacement is also an interesting one. There are many functions where, if you write your query declaratively, we can figure out that, say, you first sort a column and then take the first 100 rows, and that this can be replaced with a top-k algorithm, which is faster because it's implemented efficiently for that specific need (there's a small sketch of this below). There are many such cases where, if we can analyze your query and understand your intent, we can rewrite it in a different way.

Besides optimizations that make things faster, things are also fast simply because Polars is written in a really fast language. We have written it in Rust, which has C and C++ level performance with more memory safety guarantees. Those same guarantees also make parallelism and concurrency a lot easier to work with, so we are very confident in our parallelism and concurrency; I think we can count the deadlocks we've had in the whole lifetime of the project on one hand, and those bugs have already been fixed. Another misconception I want to clear up: Polars is not based on PyArrow. So everybody who says that pandas will use PyArrow and then be just as fast, that doesn't make any sense. They are totally different implementations: we've written Polars from scratch, every compute kernel is written from scratch in Rust. You cannot compare one with the other; MySQL and Postgres are two different things as well.

Then performance: just some benchmarks. These are the TPC-H benchmarks, which are made specifically for databases. On the left you see Polars, which is only a bit slower on the first query, and that is because we don't apply the common subexpression elimination optimization yet, the one I have the PR ready for; we will be faster once that's merged. On the other queries Polars is the fastest, sometimes tied with DuckDB, which is also super fast. We also have pandas and, I cannot quite read it from here, Dask and Spark. As you can see, it's on a whole different level. Here is another benchmark, run by DuckDB, where I want to add the side note that we are faster than shown here: a lot of optimizations were not yet in the Polars version used, while DuckDB ran with their latest version. Interesting to see here is that Arrow is also shown, which is the PyArrow implementation. This is, as the world is today, how fast pandas would be if it swapped out its implementation entirely for PyArrow. So it's not near data.table or Polars or DuckDB yet.
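[Editor's note: coming back to the function-replacement example above (sort, then take the first rows, becoming a top-k), here is a small sketch of both forms. The data is made up, and whether the optimizer rewrites a particular plan depends on the Polars version.]

```python
import polars as pl

df = pl.DataFrame({"score": [3, 1, 4, 1, 5, 9, 2, 6]})

# Written declaratively: sort descending, then keep the first three rows.
# This is the shape of query the optimizer can turn into a top-k operation.
top3_sorted = df.lazy().sort("score", descending=True).head(3).collect()

# The explicit top-k expression, for comparison.
top3_expr = df.select(pl.col("score").top_k(3))
```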
The final thing I want to show is that you get readability and knowledge extrapolation. If you learn the Polars expressions, those composable building blocks, you can use them anywhere in the API. You can use them in a select, in a filter, in a group-by (so you can group by an expression, doing a multiplication in the group-by itself, which then becomes what we group over), in the aggregations, in joins, on list elements, and also in pivot aggregations, and of course within expressions themselves. As I said, this knowledge transfers: if you understand the expression API, you understand the whole of Polars.

So with this I want to go to my summarizing slide. Too long, didn't listen: that's the promise. If you use Polars idiomatically, you get code that is fast, easy to read, and strict. Strict means you get the hangover upfront, but that's always better than a surprise afterwards.

Last words (I should swap those two around): Polars has become a company, and with this company we want to focus more on scaling Polars. We already scale out of core, but we also want to scale to big data, to datasets that don't fit on a single machine; that's a long-term goal of ours. We want to get more expressions in, and this year we want to hit 1.0. As a final note, I added this tweet from DataPythonista, who showed how you can plot with Polars, and that is not a misconception. We don't do plotting natively in Polars, and I often hear that if it isn't built into Polars, you cannot do it, which doesn't make sense to me. You can also plot with NumPy, and NumPy doesn't do it natively either; this shows how you can just pass a Polars Series to any plotting library. And with that I have two minutes left, so I think my timing is impeccable. If you want to talk with us, join our Discord; it's really active, there is a lot of help from our developers but also from a lot of power users, so you can get a real boost getting started. Star us on GitHub, and if you have any questions, hit me.

So thanks, Ritchie, for the talk, and for timing as optimized as Polars. We can take a few questions. We have two microphones on that side of the room.

Hey, Ritchie, nice talk, amazing. Can you go back to the fourth slide, about the performance? It was hard to read a bit. No, the one before. Which one is which? Let me get away from the mic; maybe you can read out the colors. Okay, thanks, Ritchie.

Okay, Ritchie, thanks for the presentation. I have one question. On one of the slides you showed a sample of just-in-time performance optimization, where you call the same query a couple of times and the execution time is different, like the next run is faster. Is this the one? Maybe. Yes, yes, this one. Is that linked to the query object instance, or if I build a completely new but equal query object and run it, will it be optimized or not?

Yes, so I think you're asking whether we do any caching here. We don't do any caching; we take the query, optimize it just in time, and run it. If you swapped these two rows, let's put it like that, you would have seen the same numbers. We optimize within the query, but once the query is finished, it's done, and if you run it again we recompute. I think that answers the question.

Okay, thank you. Hey, so you talked about the schema check, which prevents errors during the pipeline. That's great. I wanted to ask, how do you actually get the schema?
Do you infer it, or do you ask the developer to provide it first, like a JSON file or something? And the second part: do you assume that all data points conform to this inferred schema, or do you actually check it, like scanning all the data to see whether it conforms? I don't think that's the case.

Yeah, so one question is: do we infer the schema? Yes and no. In a Parquet file, the schema is defined, so we just take the schema from the Parquet file; the same goes for an Arrow file. But for JSON or CSV, if you make a lazy query plan, we take the first 100 rows, or however many you specify, and infer the schema, and you can check that schema before you run the query. If you don't agree with the schema we inferred, you can modify that part. Polars knows the schema at any point in the lazy frame, so on any node you can ask what the schema is, and we are strict about it: we will never change the schema while running the query. That's your second question: whatever we compute will adhere to the schema that the lazy computation promised.

Thanks. I'm a teacher at a university. Can my students just use this, or should they first subscribe and pay? No, no, it's MIT licensed; you can do whatever you want with it. Okay, thank you.
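[Editor's note: to make that schema answer concrete, a minimal sketch of inspecting and overriding an inferred schema on a lazy scan. The files and column names are invented, and the exact parameter and attribute names have shifted between Polars versions (for example, schema_overrides was previously called dtypes), so treat this as illustrative.]

```python
import polars as pl

# For CSV, the schema is inferred from the first rows; the window is configurable.
lf = pl.scan_csv("orders.csv", infer_schema_length=100)
print(lf.schema)          # inspect the inferred schema before anything executes

# If the inferred type is not what you want, override it at scan time.
lf = pl.scan_csv("orders.csv", schema_overrides={"order_id": pl.Utf8})

# A Parquet or Arrow file carries its own schema, so nothing is inferred there.
lf_pq = pl.scan_parquet("orders.parquet")
```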