Our next speaker says that it's possible to avoid this by use of an event streaming architecture. Let's welcome Kai Waehner, Field CTO of Confluent, and find out more. Kai, nice to see you. Okay, thank you. Let me just share my screen and say hi, everybody. We can certainly hear you, Kai, and we can see your screen, so take it away. So I hope you can hear me now. We can hear you. We can see the screen. Yeah, a little bit of sound problems, but I will start now and I hope you can hear me. And I'm also sharing my screen, but I can't see that yet, so can you just give me a thumbs up? This is the problem of live conferences, right? Kai, we can see the screen. Everything is okay. We hear you and see you perfectly. Okay, I will start now, and sorry for the technical problems. That's corona times. So here we go. Today, as introduced, I will talk about event streaming. Okay, we're going to take a five minute break. We're having a few audio problems here. So we'll just take a short break and we'll be back. Thanks. Hi, we're back again. Let's see if we're lucky this time around. Kai, can you hear me okay? I hope so. I can hear you very well now. Now you're streaming really well. I hear you perfectly. Okay, cool. So let's try it again. Can I get started? Yes, please.
Okay, cool. So sorry for the technical problems. That's the COVID times, right? As introduced, I want to talk about event streaming and machine learning, and a few different architecture options you have here. So let's directly get started with our story. No worries if you're not working in the automotive industry; you can adapt this to any industry like finance, insurance, telco or whatnot. But it's always easier to show you an example. And we will use the example of a connected car infrastructure today to show you how you can enhance that with machine learning for real-time predictions.
So the scenario and the setup today is that we already have a connected car infrastructure to connect to hundreds of thousands of cars in real-time. That's what we have done with several customers already in the real world too, like Audi for example. And we want to enhance that with machine learning for predictive maintenance. That's in the end the use case I want to talk about today, step by step. At the beginning, just quickly: if you don't know Apache Kafka well, I really want to explain what I mean by event streaming. It means to continuously process data, often at high volume. But the point is really that you correlate and aggregate data from different sources. This can be real-time sources like sensors from the car. Or this can be databases and backend systems like a customer relationship management tool or ERP system. And then you correlate that data in real-time. That's what I mean by event streaming, and that's the heart of Apache Kafka. And based on that, this is such an architecture where we have a connected car infrastructure. On the left side, we have hundreds of thousands of cars which produce data in real-time, continuously. And then the streaming platform is used to integrate with these systems, with the sources and with the sinks, and to process the data. The sinks can be real-time systems like a monitoring and alerting system, so that you send an alert to the driver or the car that he should stop the car in the next 10 minutes. But they can also be batch systems and analytics tools where you create reports, maybe about the driving behavior of the last week or something like that. So this is a very common architecture, and often you have more than just one consumer of the data. Again, in this talk we're not introducing Kafka; we really want to enhance it with machine learning. This is actually the status quo already.
And with that now, we want to add machine learning to the story and see how it works together with Kafka in so many projects. Here are just a few different examples. I think at this conference, you know what machine learning is and how it works. My key point here is really that translation for your Alexa, that's a brand-new use case. But on the other side, machine learning is also there to improve existing use cases like fraud detection or something like real-time tracking of the cars. And that's what we are working on today in this example. We are using it for predictive maintenance, using sensor analytics to do predictions in real time with analytic models. Again, I don't want to talk too much about machine learning here. My point is really that it doesn't matter what algorithm you use. It depends on the use case. In this example today, which we have built as a demo, we're using an autoencoder, which is one kind of neural network or deep learning architecture. It's unsupervised and it's used to detect anomalies. So that's perfect for sensor data in the car engine, because you can detect if things change when they are not expected to change. And then you can send an alert to the driver to stop the car soon. That's the main idea here. But again, no matter which algorithm and use case you have, the same architecture you see today applies. So here we go again. This is the same picture as before with the architecture, but now we enhance it with machine learning. And the great news is you don't have to change your existing architecture. You just add additional components. In this case, in green on the left side, we do the model training, which is the first part of machine learning. You take historical data, either from a data lake or, in an architecture without a data lake, directly from the streaming platform, to train a model on that, to find anomalies in historical data from the hundreds of thousands of cars.
And then as a second part, to leverage that, you deploy this model into an application for real-time scoring, as you can see on the right side. And again, often this is not a new use case, but you want to improve existing use cases. So you might already have an existing real-time alerting system analyzing the OBD2 data from the cars, which is the on-board diagnostics data. But here now we embed an analytic model into the same architecture and application to do real-time scoring on every single event coming from the car. And this can happen in parallel to all the other applications you have in place already. So let's talk a little bit more about how to do that. Because the big problem is that actually writing analytic models is just a small piece of the puzzle. When you go to your favorite machine learning framework's website and tutorials, you see that you can write 20 or 30 or 50 lines of Python code to create models which are accurate and do great predictions. But then when you want to deploy that to production, it's a little bit harder. So how do you do real-time scoring in milliseconds end-to-end? How do you do that at scale for 100,000 cars? And how do you do that 24-7 without downtime? That's typically not what the data scientists' tooling does, like a Jupyter notebook, right? These tools are used for the development part, not for the deployment part. And that's in the end the impedance mismatch we need to solve, and we will solve it today. Just to make this even more clear, this is a great paper by Google, if you have not seen it before: Hidden Technical Debt in Machine Learning Systems. And if you map this to our connected car infrastructure, you can imagine that writing the Python code in a Jupyter notebook is still, of course, an important part and not easy, but it's a small part of the overall architecture and solution. Because first of all, you need to collect all the car data from hundreds of thousands of cars in real-time, at scale, 24-7.
And then you build a serving infrastructure with real-time monitoring and all these kinds of things. Therefore, the ML code you see in the middle here is just a small part of the architecture. The great thing is, we can learn from many other companies which already did this. Here are a few of the examples, like Netflix with its scalable recommendation engine, or like Uber, which does all of its data processing and communication with customers in real-time: for the taxi drivers, for estimated time of arrival, for cost predictions, traffic predictions. All these kinds of things have to run in real-time at scale for millions of users. Or like PayPal, where you don't have just one model for fraud detection, but really plenty of models. What's common about all these use cases is that they all use Kafka heavily, because it's such a good fit for doing real-time processing at scale, 24-7. All of these workloads are mission critical. And that's really what we also want to do now in our connected car infrastructure. If you're working in the finance field or in insurance or telco, it's the same principles, right? But we can learn from other companies which have already done this and have written blog posts and talked at conferences about it. So if you don't know Kafka, or not well, then this is the one slide to introduce it to you. Kafka is an event streaming platform. This means it's built, yes, for real-time messaging for millions of events per second. So it's a scalable messaging system. But it's really important to understand that Kafka is much more than just messaging. It's also a storage system. All these events can be stored in the middle, in Kafka. And the key difference is what you see in this picture. They are not stored like in a data lake, where you ingest everything into a table, for example, and store it there at rest.
But here it's stored in an event-based manner, with timestamps, with guaranteed ordering in the logs. So it's still an event-based system. But it's also storage, to decouple all the producers and consumers from each other. The car sensors continuously produce data, no matter if the consumers are falling behind or maybe even have downtime. And all of these consumers are also decoupled from each other. They don't know each other. They just connect to a Kafka topic to consume the data: some of them in real-time, some of them in batch, and some others via a REST API from a mobile app with request-response. And that's the key difference between Kafka and other systems like a data lake. The data is event-based and in motion, but it's still a storage layer to decouple the systems from each other. In addition to that, Kafka also provides capabilities to integrate different sources and sinks with Kafka Connect. And it provides stream processing capabilities, like you maybe know them from Spark Streaming or Flink or Storm, with Kafka Streams. The key is that all of this is Kafka-native. And this is really what Apache Kafka is. So it's much more than just a messaging system. And I really cannot emphasize enough the point about decoupling. Each client application can use its own technologies: the CRM integration can run with Kafka Connect, while some legacy integration runs with an ESB or ETL tool, which also has a Kafka connector in the meantime. And on the bottom right, you see that in another domain you can use your own code, like Python for a data scientist or Java for the production engineer. So it's totally up to you. And that's the great thing about Kafka and its ecosystem. This is very different from traditional middleware tools like an ESB or ETL tool, where you would have to put everything into one single infrastructure and platform. Here, you can really choose for each client what you need.
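The decoupling described here can be illustrated with a toy, in-memory sketch. This is not the Kafka API: `EventLog`, `Consumer`, and the `car_id` and `rpm` fields are hypothetical names chosen for illustration. The idea it tries to show is that the producer only appends to a log, and each consumer keeps its own read position, so a slow or offline consumer never blocks the producer or the other consumers.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Toy stand-in for one topic partition: an append-only list."""
    events: list = field(default_factory=list)

    def append(self, event):
        # Producers only ever append; they never wait for consumers.
        self.events.append(event)
        return len(self.events) - 1  # offset of the new event


@dataclass
class Consumer:
    """Each consumer tracks its own read position (offset) in the log."""
    log: EventLog
    offset: int = 0

    def poll(self, max_events=10):
        batch = self.log.events[self.offset:self.offset + max_events]
        self.offset += len(batch)
        return batch


log = EventLog()
realtime = Consumer(log)   # keeps up with the producer
batch_job = Consumer(log)  # only reads occasionally

for i in range(5):
    log.append({"car_id": "car-1", "rpm": 2000 + i})
    realtime.poll()        # reads each event as it arrives

# The batch consumer was "down" the whole time and catches up later.
backlog = batch_job.poll(max_events=100)
print(realtime.offset, len(backlog))
```

Both consumers end up having seen the same five events, although one read them one at a time and the other in a single late batch.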
And here you see, I mean, of course you can also use other frameworks and combine them: instead of using Kafka Streams or KSQL, which are the native Kafka technologies, you can combine Kafka with others. They all have trade-offs and pros and cons. There is no single best option. As I said before, Kafka was built for scale. I don't want to talk about that here anymore. That's battle-tested in thousands of companies. What I really want to emphasize is that it's used everywhere in the meantime. And what's even more important is that it's not just used for big data. That's what I always point out to new people. Kafka was built for scale, and that still holds. But over 50% of the projects I see today are not for big data; they are more about transactional data and really important messages, like instant payment platforms, like fraud detection, like all of these customer conversations. That's more like transactional data. So that's not terabytes and more. That's maybe a megabyte per second or even less, right? So maybe 1,000 messages per second. But Kafka is still so great because of its characteristics, like 24-7 uptime, a distributed, highly available system with rolling upgrades, with backwards compatibility, and all these things you need in such a microservice architecture. Unfortunately, in this talk, I don't have so much time to talk about all the Kafka use cases. So I really just want to point out that there are so many, and that's why people use Kafka so much. There is not just one specific use case for Kafka. 10 years ago, people started using Kafka as a messaging and ingestion layer into Hadoop. That was the first prominent use case, also in Europe. But today, we see that really any kind of platform can use Kafka for different reasons. So that's really the key point to know. It's much more than just an ingestion system into a data lake today. And now, with that, let's go back to the machine learning story and how that relates to Kafka.
You can maybe already imagine that. So again, machine learning is two things. It's model training, model building; that's on the bottom right. And for that, you need data collection at scale in real time. And that's what Kafka is. And on the other side, you want to deploy an application where you do real-time scoring, 24-7, in milliseconds, even for millions of messages. That's also what Kafka is. And therefore, it's not just the core of Kafka, the messaging and storage layer, but really the whole ecosystem that you can leverage here. Like in this example, we use Kafka Connect for the data integration, both for the producing applications, where for cars we maybe use the MQTT connector, and for the consuming applications, like a data lake or any other analytics tool. And the data scientist maybe uses the Python client to consume the data from Kafka in his Jupyter notebook and combine it with other tools like NumPy and scikit-learn. So that's totally up to you, and you can mix it as you need. So let me walk through this process in a little bit more detail, how we built it. First of all, there is the data ingestion, in this case mainly the car data. We use Kafka Connect, maybe with MQTT in this case, when you have an MQTT gateway somewhere in the middle, and then you ingest the data into a Kafka cluster. This can be running on premise or in a cloud, or you can use a fully managed service like Confluent Cloud, for example. That's the first step. And what we see in practice in the real world today is that many customers even separate this, so that they use a production Kafka cluster for the mission-critical workloads, with SLAs like almost zero downtime, zero data loss and low latency. And then they replicate the data into a second, analytics Kafka cluster. That often has much less strict SLAs. There it might be okay if it's down for an hour. For the production cluster, it's not okay if it's down for an hour.
And based on that, you can separate by your SLAs and requirements, and also by your teams and business units. No matter which of these architectures you use, you ingest the data into the Kafka cluster as the first step. And then we do the data preprocessing. This is the second part. Typically, when you are a data scientist in machine learning, you use a Jupyter notebook, you do rapid prototyping, and you use your Python code for that. And that's okay for building the model and so on. But if you need to process the data for hundreds of thousands of cars in real time, then your Python script in Jupyter is not the right engine for that. Therefore, at least most of the parts can be outsourced into something like Kafka Streams or KSQL, where you do the standard steps like filtering, transformation, aggregation and extracting features in a Kafka-native, scalable way. That's a huge advantage, because this also solves an impedance mismatch between the data scientist and the production deployment. I want to show you one example here. This is exactly where we solved this impedance mismatch in a very smart way. Here, as you see, we still use Python and our Jupyter notebook for data prototyping and rapid development. However, we also use Kafka-native technologies. In this case, we use KSQL, where you write a SQL query, here for an ETL process like filtering. In the same way, you could use the Python client, where you consume the data into Python. So it's optional, but you can use Kafka-native technologies for all these ETL processes and still do it in your Jupyter notebook, with rapid prototyping. But then, when you have developed something that works well, you can deploy the same query into production, into a KSQL server, without any code changes or any additional code. And this is really solving this impedance mismatch.
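To make the standard preprocessing steps concrete, here is a minimal sketch in plain Python. It is not KSQL or Kafka Streams code, and the field names (`car_id`, `temp_f`) and the window size are assumptions made up for illustration. It walks through the three steps named above on a stream of events: filtering, transformation, and a simple rolling aggregation used as a feature.

```python
from collections import defaultdict, deque


def preprocess(events, window=3):
    """Filter bad readings, convert units, and emit a rolling-average feature."""
    recent = defaultdict(lambda: deque(maxlen=window))  # per-car sliding window
    for event in events:
        # Filtering: drop events with a missing temperature reading.
        if event.get("temp_f") is None:
            continue
        # Transformation: Fahrenheit to Celsius.
        temp_c = (event["temp_f"] - 32) * 5 / 9
        # Aggregation: rolling mean over the last `window` readings per car.
        history = recent[event["car_id"]]
        history.append(temp_c)
        yield {
            "car_id": event["car_id"],
            "temp_c": round(temp_c, 2),
            "temp_c_avg": round(sum(history) / len(history), 2),
        }


raw = [
    {"car_id": "car-1", "temp_f": 194.0},
    {"car_id": "car-1", "temp_f": None},   # dropped by the filter
    {"car_id": "car-1", "temp_f": 212.0},
]
features = list(preprocess(raw))
```

Because `preprocess` is a generator, it handles one event at a time and never needs the whole data set in memory, which is the same shape of processing a stream engine applies continuously, only without the scalability and fault tolerance.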
And no matter how you process the data, you can then finally ingest it, for example into a data lake, where we want to do the model training. The key piece here is that, while in this case we ingest the data into Google Cloud Storage, the data lake, an object store, in addition to that and completely decoupled from it, we also use Kafka with other consumers. Some are real-time, some are near real-time, because they are all decoupled from each other. That's a great thing. But now let's focus more on the machine learning part with the batch mode, right? We have ingested the data into the object store in our data lake. So finally, we can do all these things with, for example, TensorFlow. And this is not Kafka, right? This is your machine learning framework of choice. It is not related at all to Kafka. Here we now do our model training, which can take minutes, hours, days, whatever your scenario is. And then finally, we get our model, which we can use. As I said, in this case I use an autoencoder, an unsupervised deep learning algorithm for doing anomaly detection. But no matter what it is, it's just a binary. It's a model. In the case of TensorFlow, it's Protobuf. With other tools, it's binary, it's proprietary, whatever it is. It doesn't matter. It's a model which you can ship somewhere and deploy. However, now, and this is really a key part of what the title said in the introduction, there is the other option, and this is what we did in this architecture: we don't even use another data lake. In this case, as you can see, we directly consume the data from the event stream, from the Kafka log, into our TensorFlow engine, which is, for example, running in Docker containers or in Kubernetes or something else. And then there we do our model training. So this is really key to your architecture, because it simplifies it.
You don't need another data lake like HDFS or something like that, if you don't want to use it for other use cases too. You can directly consume from Kafka to train your models, like we did here with TensorFlow I/O and its Kafka plugin. And this really simplifies your architecture. I'm not saying you shouldn't use other data lakes anymore, right? This is only the provoking part, of course, but you should think about it because it can really simplify your architecture. And especially because now you can also leverage tiered storage for Kafka, which means that this is really one platform where you don't have just the brokers with disks attached, but you have a decoupled object store somewhere, where you offload all the big data sets. And with this architecture, you can have one simple solution with Kafka. You can now call Kafka the data lake if you want. The term doesn't matter. It's the architecture which is important. But with this, you have one single platform for doing both real-time streaming and processing, and long-term storage for machine learning, for batch, for reporting, for request-response, and so on. So this is the huge advantage of this architecture. Let's think about that in a little bit more detail. If we think about machine learning, we want to consume historical data. So we need to store data in Kafka long-term, right? And this has been possible with Kafka since the beginning, because you can set a retention time, and it can be only a few days. It can be a month. It can be a year. It can be forever. That's possible with the retention time you set; minus one means forever. The big problem with that is that, of course, the storage gets more and more expensive, because Kafka brokers are attached to HDDs or SSDs. That can get very expensive when you have terabytes of data. And scalability also becomes an issue when you have to re-synchronize because a disk is broken.
So honestly, Kafka by itself is not really ideal for big data sets and for storing historical data forever, in big data terms. With tiered storage, you move most of the data from the Kafka broker into a remote object store, and that has huge advantages. Look, the cost savings are there because now, like in any other data lake, you store the data in an object store like S3, or MinIO on Kubernetes, or Pure Storage, or whatever your object store technology is, no matter whether on-prem or in the cloud. With that, you have better scalability and you have reduced costs. And now you can really store petabytes of data in Kafka in a cost-efficient way. Here's a picture which shows that even better. The main idea is to really offload most of the data into an object store. The great news is that this requires no change to the application. The application still uses the same Kafka consumer API as before. So you will not have any breaking changes in your existing Kafka applications. It's just the back end under the hood which changes. And that's the great benefit of that. Confluent Tiered Storage is already available today in GA. And we are also working with the community to add this tiered storage interface into Apache Kafka itself, into the open source framework. Uber has the lead here and we're working with them. It's expected that tiered storage, the open source interface, comes into Kafka in Apache Kafka 3.0 and will be available for several different object stores like S3 or Google Cloud Storage. And there are other reasons to keep data long-term, as you see on the bottom right, for example compliance and regulatory reasons. We have customers, for example, which use Snowflake as a data warehouse. Every time they change the schema, they need to replay the data, because they need to reprocess it and store it in the table differently. And so they reprocess it again from Kafka and ingest it again into the data lake or into the data warehouse.
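Replaying historical data from a timestamped, ordered log can be sketched as follows. This is a toy in-memory illustration, not Kafka client code: the `log` list, the `ts` and `replay` helpers, and the `rpm` field are hypothetical, and the sketch assumes timestamps never decrease along the log. Real Kafka clients provide a lookup from a timestamp to an offset, which plays the role of the binary search here.

```python
from bisect import bisect_left
from datetime import datetime, timezone


def ts(year, month, day):
    """Epoch milliseconds for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp() * 1000)


# Toy log: (timestamp_ms, payload) tuples, appended in timestamp order.
log = [
    (ts(2019, 4, 30), {"rpm": 1900}),
    (ts(2019, 5, 1), {"rpm": 2100}),
    (ts(2019, 5, 20), {"rpm": 2300}),
    (ts(2019, 6, 2), {"rpm": 2000}),
]


def replay(log, start_ms, end_ms):
    """Return events with start_ms <= timestamp < end_ms."""
    timestamps = [t for t, _ in log]
    first = bisect_left(timestamps, start_ms)  # first offset at or after start
    last = bisect_left(timestamps, end_ms)     # first offset at or after end
    return log[first:last]


# Replay only one month of history, e.g. to retrain a model or reload a table.
may_2019 = replay(log, ts(2019, 5, 1), ts(2019, 6, 1))
```

The consumer that replays a month of history does not affect consumers reading the newest events, because each one only holds its own position in the same log.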
So there are really plenty of different use cases for why people reprocess data. And the significant difference, again, is that with Kafka the heart of it is real-time and event-based. So you can have a real-time consumer, like here on the right in green, but you can also have consumers which consume historical data, like your data scientist who uses a Jupyter notebook and consumes historical data. Again, the data is ordered and it has timestamps. You can just take the data you need and process it in your favorite machine learning framework, for example. So I talked a lot about model training and data integration and processing here. In addition to that, let me spend the last five minutes on the model deployment part, because this is typically really separated from the model training part. And often this is where the mission-critical workloads really are. While model training is batch, model deployment often has to be real-time, at scale and 24-7. But this is no problem because, again, you get a model from model training. It's a binary and you deploy that somewhere. With Kafka, there are different options. I had a Kafka Summit talk where I spent 60 minutes just on this topic. So it's complex in the detail, but on a high level it's pretty simple. Either you use an external model server, like your data scientists know it. In this case, we use TF Serving for TensorFlow. Then, on the left side, from the streaming application, you do an RPC call to send a request with the incoming data to the model server. The model prediction is done there and the prediction is sent back to the streaming application. This is okay for many use cases. But keep in mind that an RPC call in the middle is not really a streaming pattern. So you have to worry about things like: what happens in case of an error? What if there are latency problems? Or what if the model server only supports HTTP and this doesn't scale well for your use case? So there are some trade-offs.
And therefore, the other option is to embed your model directly into your streaming application. No RPC call to any other model server. With this, you have the huge advantage that it's very robust and low latency. It's just a single application which includes both the stream processing engine and the model predictions. And for an example like a connected car infrastructure, where you want to score every single event in milliseconds, this is probably the better approach. But again, both approaches have their trade-offs. Just check out my Kafka Summit talk, where I talk about this in much more detail. Here is one example of model deployment. In this case, we use a KSQL query, which you see on the right, which just gets the sensor data from the car. And then we apply a UDF on that. In this user-defined function, I embedded the TensorFlow model we built before in the cloud. But this model is now really executed on every single event coming from the car. And the great thing is, again, this can be developed in your Jupyter notebook with Python or something else. But then you can deploy exactly this query into a KSQL server in production, either in your self-managed Kafka cluster or in a fully managed service like Confluent Cloud. And with that, you can really process millions of events per second by just deploying this query, with no coding around it. This is really exciting, I think. And the reason why that works well is because this also is just Kafka under the hood. This is Kafka-native technology. So this is really one of the coolest examples of combining machine learning with Kafka for model scoring, because you really get real-time scoring at scale, highly reliable, this way, without using another data lake or another streaming engine. This is also important for deploying it highly available, because this is just one ecosystem and infrastructure you have to operate 24-7 now, instead of two or three or four different engines.
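The embedded-model pattern can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `load_model` is a hypothetical stand-in that returns a simple deviation-from-baseline score instead of a trained autoencoder, and the baseline values, the threshold, and the field names are invented for the example. The point it tries to show is structural: the model is loaded once into the streaming application and every event is scored in-process, with no request to a separate model server.

```python
def load_model(threshold=0.5):
    """Stand-in for loading a trained model; returns a scoring function."""
    baseline = {"rpm": 2000.0, "temp_c": 90.0}  # hypothetical "normal" profile

    def score(event):
        # Relative deviation from the baseline, standing in for the
        # reconstruction error an autoencoder would produce.
        error = sum(abs(event[k] - v) / v for k, v in baseline.items())
        return {
            "car_id": event["car_id"],
            "error": round(error, 3),
            "anomaly": error > threshold,
        }

    return score


def run_stream(events, score):
    """Score every event in-process: no RPC to a model server."""
    for event in events:
        yield score(event)


model = load_model()  # loaded once at application start-up
stream = [
    {"car_id": "car-1", "rpm": 2100.0, "temp_c": 92.0},
    {"car_id": "car-2", "rpm": 3500.0, "temp_c": 130.0},
]
results = list(run_stream(stream, model))
```

With the model-server alternative, the call to `score` would become a network request, which is where the error handling and latency concerns mentioned above come in.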
So now, if we come back to our architecture from the beginning, this is what we have now implemented here step by step. As I said, we have some gateway to the cars, often via MQTT. And then we get the data into Kafka. In this case, we leverage tiered storage so that we can store the data long-term. And then we also do real-time processing, like with KSQL, which you see on the bottom left, sorry, on the top left. And then the TensorFlow model training happens here on the historical data. This means either you directly stream the data in real-time into TensorFlow and wait until you have enough data in the TensorFlow engine to train a model, or you start it up every day or every week, consume the historical data from the Kafka log and train your model based on that. Again, the significant difference is that you don't need another data lake anymore. And you have to ask yourself: do I need another data lake? Sometimes, yes. If you want to do MapReduce or shuffling, the things Spark and Hadoop were built for, that's what you still need a data lake for, right? That's what they are great for. But for some other use cases, like training an analytic model and doing real-time processing at scale, you probably get it much easier, better and simpler if you just use the Kafka infrastructure for that. And then, completely independent of that, no matter if you train your model with a data lake or directly from Kafka, you deploy your model into another application. So you can still deploy it into a decoupled, lightweight, scalable Kafka application, like on the top right here. And completely independent of that, you have any other consumer. In our example, we have built a digital twin in MongoDB with that. And this is completely decoupled from all the machine learning stuff. That's a great thing about Kafka. You can combine whatever you want to do, over time, step by step.
And we actually have built what I showed you today as a demo. It is running on top of Kubernetes, and we're using Terraform to deploy it. So you can really check out this demo on GitHub. I also have a recorded 15-minute demo about this. Check that out if you're interested. Just go to my GitHub site and take a look at the demo, take a look at the YouTube video there, or maybe run it by yourself. So my last slide, the summary: what I showed you today is one pipeline to rule them all. And this is, of course, a little bit of a provoking title again. Again, I'm not saying don't use any other data lake. If there are use cases, do that. But I hope you understood today that you can simplify many architectures, because Kafka is much more than just a messaging and ingestion layer. Kafka has capabilities for storage, where you can store data long-term and reprocess it, for example from your Jupyter notebook. The storage is also great for decoupling producers and consumers. If a consumer is down or if a consumer is falling behind, that doesn't matter. The producer produces the data. Cars don't ask. They produce data continuously. And Kafka handles the back pressure for that. In addition to that, Kafka provides Kafka Connect for data integration, both with legacy systems and modern technologies, and streaming capabilities with Kafka Streams or ksqlDB. And so with that, you have a real event streaming platform. That's in the end the main point which I wanted to show you today, especially with the example of machine learning. And I hope this worked well for you. With this in mind, here's the last page, where you can ask questions in the chat window. I'm happy to answer anything, and also feel free to connect with me on LinkedIn and Twitter to stay in touch. So I hope you learned a lot today. And now, if you have any questions, just let me know.
And again, sorry for the technical problems in the beginning. That's the corona world, right? So let's now check for any questions. And otherwise, thank you for listening and have fun at the conference.
Kai, thank you so much. What a fascinating talk. And I see you like provoking, or provocative, titles. It's good to see that data lakes are not going to dry up completely. They still have their place. I very much enjoyed the way that you introduced your talk with the story there, the use of the car industry, and the clarity with which you explained everything was fantastic. Even for someone not so technical, such as myself, it was pretty easy to follow. So congratulations on the talk. Now, you mentioned Uber, Netflix, PayPal, and some of the tech giants that are already using and benefiting from this real-time processing with Kafka. And then you went on to say that actually you could see almost anyone benefiting from this, although you were reluctant to pick a specific use case. Are there other places or sectors where you see this is not being used yet and where it should be or could be used?
Yes. So actually, what's really more the cutting-edge part is storing data longer and then using the old data, like in this machine learning example. This is where people still use data lakes, because they have them. But this is where the advantage of event streaming comes in, as I talked about. And also, I really emphasized a lot that Kafka is not just a messaging layer, but still many people don't understand this, because they learned to ingest data into Hadoop and then process it there. So this is really where people have to do more. But as you said, I didn't cover many examples here. We have a public example for connected car infrastructure with Audi. And there's also public content from Tesla, where they process trillions of messages.
And the great thing is really that there are examples from every industry, no matter if it's financial services, telco, media, gaming or whatever. So what I can really recommend is to take a look at the past Kafka Summits. That's public, that's free, that's on demand, with slides and video recordings. And there you will find use cases for Kafka in every single industry. Right, I was going to ask you, you mentioned checking out your Kafka Summit talk. So where do we find this? Yes, just go to the Kafka Summit website. All the on-demand recordings from the last years are there. And it's really hundreds of recordings and slides from every industry, and a lot of my talks also. Right. If we had to, and I know I'm forcing you here to actually pick one particular example, but if you could pick one example as a way of demonstrating where we can see the benefits more clearly, the before and after, so we can compare and perhaps see how this technology has really benefited the company, which one would you suggest we look at? I mean, the one kind of example where you more or less have to use Kafka is the easiest to identify, and that's typically when you have to process high volumes in real time. That's actually why Kafka was built 10 years ago at LinkedIn. They had to process millions of messages in real time. That's what ETL tools cannot do, because they were built for high volume, but for batch. And on the other side, messaging systems have also existed for 20 years, and they are real time, but they were built for something like 100 messages per second. And therefore, if you have to process higher volumes at scale in real time, then this is where Kafka is the only option. And that's often where people get started, for that reason. Okay. We have a question here which asks, how do you train a model with Kafka with a year of historical data? Yeah. So again, that's also very important to understand. Kafka is not the machine learning part, right? Kafka is the streaming platform. 
And then you connect your machine learning framework to that. But the important thing about Kafka is that it's a log where you append events, and every event has a timestamp. And so if you have one year of events, they are appended to this log. And then when you have your machine learning framework, like in Jupyter with Python, you just say, give me all the events for a specific topic for the last year, or for the last month, or only for May 2019, for example. And you then get all these events into your machine learning engine. And then there you train your models or pre-process the data with your favorite tools like NumPy and scikit-learn or whatever you use, just as you would with data from any other database or from a CSV file, for example. So that's exactly the same. The only difference is that instead of connecting to a file like a CSV or to a batch system like Hadoop, you consume the data from the Kafka interface. That's actually the only difference. Okay. Kai, the time is against us, unfortunately. I would like to thank you once again for the amazing talk. There are lots of congratulatory comments here. So I think everyone really enjoyed it. And I encourage people who have more specific questions to contact Kai directly, maybe via the networking section on the website or via the addresses that Kai gave us earlier. So Kai, thank you once again. Thanks a lot. Goodbye, guys.
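The answer about training on a year of historical data, or only on May 2019, can be sketched in a few lines of Python. This is a minimal illustration and not code from the talk or the demo. It assumes a single partition whose timestamps are in order and stands in for it with an in-memory list, so that it runs without a broker. The field names `car_id` and `engine_temp` and the helpers `to_epoch_ms` and `read_time_range` are hypothetical. With a real cluster, the same timestamp-to-offset lookup is what a client's `offsets_for_times` call provides (for example on kafka-python's `KafkaConsumer`), after which you seek to the returned offset and poll.

```python
from bisect import bisect_left
from datetime import datetime, timezone


def to_epoch_ms(year, month, day):
    """UTC midnight of a date as epoch milliseconds (the unit Kafka timestamps use)."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp() * 1000)


def read_time_range(log, start_ms, end_ms):
    """Return all events with start_ms <= timestamp < end_ms.

    `log` is a list of (timestamp_ms, payload) tuples ordered by timestamp.
    It stands in for one topic partition; the list index plays the role
    of the offset.
    """
    timestamps = [ts for ts, _ in log]
    first_offset = bisect_left(timestamps, start_ms)  # first event at or after start
    end_offset = bisect_left(timestamps, end_ms)      # first event at or after end
    return log[first_offset:end_offset]


# Hypothetical sensor events appended to the log over several months.
log = [
    (to_epoch_ms(2019, 4, 15), {"car_id": "a", "engine_temp": 88.0}),
    (to_epoch_ms(2019, 5, 1), {"car_id": "a", "engine_temp": 91.5}),
    (to_epoch_ms(2019, 5, 20), {"car_id": "b", "engine_temp": 104.2}),
    (to_epoch_ms(2019, 6, 1), {"car_id": "a", "engine_temp": 90.1}),
]

# "Give me only the events for May 2019."
may_2019 = read_time_range(log, to_epoch_ms(2019, 5, 1), to_epoch_ms(2019, 6, 1))

# From here on it is ordinary pre-processing, as with rows read from a CSV file.
features = [payload["engine_temp"] for _, payload in may_2019]
print(features)
```

The design point is the one made in the answer: the time range is resolved to a position in the log, the events are read from there, and everything after that is the same as training from any other data source.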