Hello everybody. Welcome. Great to be here at MesosCon Europe in Prague. Today I want to talk about Apache Kafka and Mesos, a combination we see at a lot of different customers. I work for Confluent, and we work a lot together with the Mesosphere guys. What I want to show you today is especially how to build highly scalable streaming microservices. I know there are a lot of buzzwords in there, but it also really works well, and that's what I want to show in this presentation. Here you see the agenda for the next 30 or 35 minutes. First, a short explanation of what I mean by scalable microservices and what the motivation for this talk is. I will also talk briefly about Apache Kafka and the Confluent Platform so that we all have the same knowledge about that. Then I will go into more detail about Kafka Streams, which is part of the Apache Kafka open source project, to build streaming microservices, or in the end to do stream processing, and I will explain a little bit how it differs from other stream processing engines like Spark Streaming or Flink or Storm and all the others on the market. Then I will talk about why we see many customers using Apache Kafka, both the broker side and Kafka Streams, together with Mesos and DC/OS, and what the benefits of that are. Finally, I have built a use case where I show a short live demo of how to deploy Kafka and Kafka Streams microservices to Mesos and scale them up and down. That's the agenda. Let's start with the motivation for this talk: scalable microservices. I just have one picture here about microservices; we probably all know this term already. I think the main goal is that we do not build one big, huge, complex monolith anymore, but smaller independent services. Here you already see this is not a monolith anymore; these are different independent services with specific functionality.
I do not think each one here is really micro, so the term is misleading, but each of these services has its own function, its own domain. So often I simply say service instead of microservice. And why do we want to build this kind of microservice architecture instead of a monolith? I think there are two key ideas behind it that give us a lot of value. The first is that they are independently deployable. That means we do not have the one monolith which we develop and test and version and deploy; instead we build autonomous microservices independently. And this helps us a lot to scale. With scale, I mean two things. One is about people. That's very important: we don't all have to work on the same project or on the same software package, but we have our independent projects, which we can test, debug, develop, deploy, version and extend with new features independently of the other teams. That's the first very big benefit of microservices; that's more about the organization. And then of course there's the technical perspective: we can scale in infrastructure terms. That's also very important. We don't have to scale our monolith up or down, which is much harder to do. Instead, we can independently scale our business functions up and down. That's a huge benefit of microservices. So there are both technical advantages and organizational ones, of course all with the trade-offs and complexities behind that. But the question is: if we want to use microservices for our new project, how can we build that? And here we have seen at many of our customers that to get there, you need a few characteristics in your architecture. One is loose coupling, which is very important, so that these microservices are really independent of each other. If they are still connected and you can't upgrade one microservice without touching the other ones, then it does not work that way.
They really have to be independently deployable. The other key point is that it's event-driven, so that processing can happen whenever it works well for you. This means when we have a microservice which produces data, that's absolutely fine, but this has to happen independently of the consumers of that data. One might consume the data in batch mode for further analytics; another one needs a real-time update because of alerting or something like that. So we really need this loose coupling and an event-driven architecture, where we also enable things like operational transparency behind all of that. That's more or less how we get to microservices in a scalable way which we can manage. That's the motivation for this talk and what I want to show you here. First of all, I want to briefly introduce Apache Kafka and the Confluent Platform. I assume most of you already know it, but just two or three slides so that we have the same knowledge of it. Apache Kafka was built as a distributed, fault-tolerant commit log. So it's not like traditional messaging, where you have a queue in the middle and you send messages, and once they are read they are gone from the queue. Here you send them to a commit log. This is distributed and scalable, and many different consumers can read from the log whenever they want. Again, one can do it in batch processing, another one in stream processing in real time. One could be Elasticsearch, the other one could be a Hadoop cluster. That doesn't matter, because Kafka in the middle also stores the messages. It's not just messaging; it's also storage of the messages. That was the foundation of Apache Kafka. That was the beginning, seven years or so ago, when some guys at LinkedIn created Apache Kafka. And these are also the people who created Confluent three years ago, where I work now, focused on the Kafka ecosystem. So with that, Kafka was created as a messaging layer. That's what everybody knows it for.
What many people do not know is that it's already much more. Even if you just take Apache Kafka, the open source project which you download from the Apache website, it already includes Kafka Connect and Kafka Streams. So in addition to the messaging, which you see here in the middle as the Kafka brokers, you also have Connect to integrate with different systems. It also leverages all the advantages of Kafka under the hood, like fault tolerance and scalability. And like other integration components or products, you have connectors to things like Elasticsearch, HDFS, S3, relational databases, and so on. So it's pretty easy to configure the integration with other systems. As an option, you can still write your own Kafka producers and consumers with APIs in Java, .NET, Go, Python, whatever, but you can also leverage Kafka Connect if you want. On the other side, we also have Kafka Streams, which is also part of the Apache open source project. If you download and use the messaging layer, you already have Kafka Streams with it. It's built to do stream processing on top of Apache Kafka, and that's what I will explain and use much more later. But that's the foundation: that's Apache Kafka, the open source project. On top of that, there are also the Confluent open source components, which many people also use already. They are also under the Apache 2.0 license. That's, for example, the REST proxy, if you do not just want to use Java or .NET or Python to produce and consume messages, but also HTTP. Another component is the Schema Registry. The Schema Registry is there to define structures in Avro and then to validate incoming and outgoing messages the right way. It leverages all the features of Avro under the hood, like schema evolution, so that you don't have to deploy updates on the producer and the consumer side at the same time.
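To give an idea of what that schema evolution looks like in practice, here is a minimal sketch of an Avro record schema; the record and field names are made up for illustration. Adding the `currency` field with a `default` is a backward-compatible change: a consumer that still only knows the old two-field schema can keep reading new messages, and the Schema Registry can enforce exactly this kind of compatibility before a producer is allowed to register the new version.

```json
{
  "type": "record",
  "name": "Payment",
  "namespace": "io.example",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "currency", "type": "string", "default": "EUR"}
  ]
}
```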
And that's a very important component for microservices, where the components are really independent of each other. The producer can upgrade to version two while the consumer is still on version one, and so on. So these are the Confluent open source components. And then of course, as a company, we also have enterprise tools, like the Control Center for end-to-end stream monitoring, from the producer via the broker to the consumer, to find things like duplicate messages, latency issues, lost messages; there are a lot of powerful things in there. Or, for example, the Confluent Replicator to do data center replication, which is a much more powerful tool than MirrorMaker, for example. So, enough of the Confluent enterprise part. In the end, the summary is that Kafka evolved. It evolved from a messaging layer to much more: a streaming platform. Even with just Kafka you have Kafka Connect and Kafka Streams included, and then a lot of open source and commercial tooling around that. That's the summary and overview of the Apache Kafka ecosystem. And with that, if we go back to the microservice thinking, it helps us a lot, because as we see here, we have a lot of different microservices, and in the middle we have the commit log, where you can send your messages to and where you can read messages from, either via Kafka producers and consumers with an API, or with Kafka Connect, or with Kafka Streams. In this way we can decouple the different microservices from each other, and they can be upgraded and taken down independently of each other. That's the huge benefit if you use Kafka in the middle here.
And that's what we see at many customers when they build their microservice architectures with many different technologies: they use Apache Kafka here in the middle as the central nervous system. With that, we come now to Kafka Streams, which I will then also use in a live demo later to build some real scalable microservices with Kafka and Kafka Streams. Let's briefly talk about what Kafka Streams is and what we mean by stream processing here. The first key point is about stream processing itself; probably most of you are aware of this. We don't talk about request-response, where the data is at rest. That typically means you use HTTP or SOAP or other technologies, you store something in a database or on a disk, and then you request and read it again to get updates somewhere else. In contrast to that, with stream processing we process the data while it is in motion. We continuously process data in an event-driven architecture. That's in the end also what Kafka is doing. Even if you can implement batch processing and request-response via the REST proxy, the core is event-driven here. That's the main definition of stream processing. The point really is that many people still think about stream processing as "well, that's faster than MapReduce". That's how it started some years ago: instead of using Hadoop with MapReduce, we used Spark Streaming, for example, or Apache Storm, and deployed that onto our big data cluster to do stream processing, which was faster, continuous processing, but still on the big data cluster. And here is where Kafka Streams differs a little bit. No matter which one you look at, Spark Streaming, Flink, Storm, Kafka Streams, the concepts and the idea are the same. You have a stream processing pipeline. From the left you have all the input information, from sensors, from social networks, from logs, whatever. And then you process that.
You process that, you do things like filtering, transformations, enrichments, and you can do more powerful things like aggregations, applying contextual rules, or even applying machine learning in such a stream process. The important thing is that you don't do that via request-response, but really continuously, while the data is in motion. Those are the basic concepts behind stream processing, no matter which technology you use. So now let's take a look at when to use Kafka Streams for stream processing. The idea behind the framework is not to have your own big data cluster for stream processing, but to really do it where it is needed. It can be small stream processing; it need not always be powerful analytics. It can be maybe just a transformation at an input or an output layer or something like that. And if you take a deeper look at Kafka Streams, the goal was to build a powerful but still simple tool. That's what I want to show you: how that works, and why Mesos is then a great combination for scaling it up and down without using your big data cluster. Kafka Streams is just a Java library. That means you can embed it into any kind of Java application. Here you see a few different examples of where you can run it. You can use the very cool stuff like Kubernetes or Mesos or Docker and so on to run Kafka Streams processors. But you can also still use the uncool things, like a WAR file for a web application or anything like that. So if you still have existing application monoliths, even there you can add it. That works pretty well because it's just a Java library. I will go into more detail later about how Kafka Streams works.
And the main motivation is that we do not have just one big data cluster where different teams have to think about how to deploy their services there, upgrade them, version them, change them or fix something, because we can focus on our application. Here you see some examples, like a fraud app, a monitoring app, a recommender app and a payment app. You can use your own technology: one team can deploy it locally, maybe in a web application; another one can deploy it on Mesos; the next one on Kubernetes; maybe Kubernetes on Mesos, I don't know. But they all work independently. They can use their own versions and upgrade them when they want. And that means both the technology under the hood, with Kafka Streams, and also the business application on top of that. So you do not depend on the big cluster which you have to manage and whose resources you have to care about, because you have your own infrastructure where you use Kafka Streams. That all has its trade-offs, so I'm not saying do everything with Kafka Streams now. For some situations it might be better to have your big data cluster where everything runs; then Spark Streaming or Flink or so is still the right choice. But this is simply another option if you want simple, lightweight, but still scalable stream processing services. So let's take a closer look at Kafka Streams. As I said, it's just a library which you embed into your project. From the features perspective it's pretty similar to all the others: you have the functions you need to do aggregations, windowing, transformations and so on. The huge benefit is that, as it leverages Kafka under the hood, like Kafka Connect does, it also inherits all the scalability and the high-volume processing. You can do reprocessing if you want to start from the beginning again. All these features which you like Kafka for, you get the same for Kafka Streams under the hood. And it also has local state.
What that means is that you have different Kafka Streams instances, let's say 10 Java applications, which run anywhere: in a Docker container, in Kubernetes, or in Mesos. If one of these fails, Kafka Streams automatically manages the failover. It then only uses the other nine of them, but without losing any data, because under the hood it has a local state store, but it also uses Kafka as backup: it sends the state information to topics, so you can be sure not to lose any information. For that reason, you can use Kafka Streams, which is lightweight but still powerful, and scale it up and down without any data loss. The key operations look pretty similar to other frameworks: you have things like map, filter, aggregate and join. That's the same as if you use a streams library, Scala collections, reactive frameworks and so on. The key difference is that this is already stateful, fault-tolerant and distributed, and you don't have to care about that, because you leverage the Kafka cluster. There's one more feature I want to talk about that's a little bit different from many others: Kafka Streams uses tables and streams. Both concepts are built into Kafka Streams. Streams means you have continuous information which you process. Tables means, like a database in the relational world, you store it and update it, so you only have the newest version in your queue, in your topic. Under the hood this uses compacted Kafka topics, if you know the details about Kafka. So with that you can easily build applications or microservices which also store state, with both streams and tables in one API, in one instance. Here you see one example of such an application. We have an input topic; Kafka Streams always gets information from a Kafka topic and sends it to a Kafka topic.
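As a minimal sketch of how such a topology looks in code (using the newer StreamsBuilder API, and assuming the `kafka-streams` dependency is on the classpath; the topic names and broker address here are made up for illustration), a stream read from the input topic can be aggregated into a table and the table's changelog written back to an output topic:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class StreamTableSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "stream-table-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Stream: every single event from the input topic, continuously
        KStream<String, String> events = builder.stream("input-topic");

        // Table: only the latest count per key, backed by a compacted changelog topic
        KTable<String, Long> countsPerKey = events.groupByKey().count();

        // Send the table's changelog to the output topic
        countsPerKey.toStream().to("output-topic", Produced.with(Serdes.String(), Serdes.Long()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```

This is just a sketch, not a full application; a real deployment would also need a running Kafka cluster and shutdown handling.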
If you also want to store it, for example, in a database, a NoSQL store or Hadoop, you would typically attach another Kafka client to the output topic, or you would use Kafka Connect for that. You could also write directly from the stream processor to some database, but that's not really a continuous flow, because then you have to handle the exceptions and so on there, and that's not the recommended way; but you could do it, as it's just a Java library which you embed anyhow. And then you run it this way: the topics of course live in the Kafka cluster, and Kafka Streams is deployed anywhere, as I discussed before. Here's one example of source code. I don't want to go into detail here, and in the live demo I will focus more on how to run it with Mesos, because I think that's the focus of the talk. But here you see one example of a word count. This is like the MapReduce word count, but it's not batch mode; this is a continuous word count. For every new Kafka event coming in, it continuously updates the counts of the words. The goal here is just to show you how simple that is. You have the app configuration of a few lines; the processing, which is the key part of the class, where you define all the aggregations, the filtering, the transformations and so on; and then you start the processing. And this is embedded into one class which you then package into one JAR file and run in one Docker container, or on Mesos, like I will do later. So I will talk mostly about Kafka Streams now, and I'll also show it in the demo. We also announced KSQL two months ago. KSQL is a SQL-like streaming engine on top of Kafka Streams. That means a KSQL query is a Kafka Streams application under the hood, but you don't care: you just write SQL queries to do continuous processing, continuous queries and so on. It simply makes it much easier, especially for non-Java developers, to write your own streaming queries on top of Apache Kafka, just as an option.
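Just to give a flavor of that, here is a hedged sketch of what such a continuous KSQL query can look like; the stream name, column names and topic name are made up for illustration, and the syntax follows the early KSQL releases:

```sql
-- Declare a stream over an existing Kafka topic (names are illustrative)
CREATE STREAM flights (flightId VARCHAR, origin VARCHAR, delayMinutes INT)
  WITH (kafka_topic = 'flight-input', value_format = 'JSON');

-- Continuous query: count notably delayed flights per origin, per one-minute window
SELECT origin, COUNT(*)
FROM flights
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE delayMinutes > 15
GROUP BY origin;
```

Unlike a database query, this never finishes: it keeps emitting updated counts as new events arrive on the topic.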
But I will now focus more on Kafka Streams and how to deploy it on Mesos. In the same way we could use KSQL if you just want to write scalable queries with a query language instead of writing Java code with Kafka Streams. So that was the intro to Kafka and Kafka Streams. Now let's take a look at how that works together with Mesos and DC/OS; I typically use both words for the same thing, and of course I use DC/OS here for the demos. And how to build scalable microservices with that. Here we see the Mesos architecture. At a MesosCon I don't have to talk much about that. In the middle you see the Mesos master quorum, with one leader and two standby masters. Mesos uses ZooKeeper here, its own instances. On the left side you see the frameworks, like Marathon, and on the right side you see how these are used for the execution of tasks, where the execution happens in the end. That's the architecture. And how does that relate to Kafka? Kafka has different components. First of all, in the middle, we see the Kafka brokers. That's the key part for the messaging, where we send our messages to and where we read messages from. This scales up and down: you can have more and more brokers, so it scales horizontally. On the other side it also needs ZooKeeper, the same way as Mesos. And as you have seen in the keynote this morning, for Kafka there is also the idea to remove the ZooKeeper dependency. Whether we will ever see that, I don't know, both for Mesos and for Kafka, because it's not easy to remove. So right now it's the same status as for Mesos: you need ZooKeeper, its own instances. By default it uses the DC/OS instances, as we will see later, but for big deployments we would recommend dedicated ZooKeeper instances for Kafka. And on top you see the Kafka Streams applications. Those are Kafka Java consumers and producers or Kafka Streams applications which run independently. That's the microservices which we build here.
On the right side we also see some other components, like the REST proxy or the Schema Registry, which you can run in the same way on DC/OS; they are also available in the catalog, as we will see. And if we combine that, we see on the left side that we can choose frameworks like Marathon or Kubernetes, and on the right side we see how we run the executors: on the top the Kafka brokers, and on the bottom the Kafka Streams instances, and they can scale up and down independently. That's the architecture. Before I go to the live demo, let's talk a little bit about why many of our customers combine Kafka and DC/OS. In the end there are three key benefits. The first one is that DC/OS, as you hear here at MesosCon, allows automated provisioning and upgrading of the Kafka components. This is pretty straightforward if you use Mesos and if you know DC/OS, no matter if you script it or use the UI, and you can do it for all the Kafka components. And here again it's important to understand that many people do not just use the Kafka brokers, but also the Kafka consumers, the Kafka producers, Kafka Streams, Kafka Connect and the commercial enterprise components from Confluent. They can all run on DC/OS if you want, or of course you can have some outside. The second big benefit is the unified management and monitoring. As I saw when building the demo, it's pretty easy to do with the UI, and the benefit is you can manage multiple Kafka clusters on one infrastructure, including multi-tenancy. That's one of the huge benefits if you use DC/OS under the hood instead of just AWS, for example. And here, of course, you can also combine it with other big data components. As we have seen in the keynote, many of the DC/OS or Mesosphere customers use many technologies, like Kafka, Spark, Cassandra, and deploy all of them on DC/OS.
That makes it so easy to give the resources to whichever framework needs them at the moment. The third big benefit is the elastic scaling and fault tolerance. These benefits are typically not much different from other components like Cassandra or Spark, so the huge benefit is still the same. But there are some special things. For example, what I liked a lot when I got started here is the Kafka VIP connection, so that you really have one static bootstrap server URL. This doesn't change even if you start new Kafka brokers or a completely new cluster. For example, for my demo I started one on AWS, and then one week later I restarted it again, and I never had to care about things like IP addresses, because I always use the same VIP connection: I have just one URL, which I also use from my microservices that communicate with the Kafka brokers. This works really well together with the Kafka ecosystem. So with that, let's now take a look at a use case. In this case I built a scalable flight prediction microservice. I simply use this one because I use it in a lot of other demos, where I talk more about machine learning with Kafka Streams, so I have reused the same microservice which I had implemented before in Java. Here you see the technologies: I use DC/OS, and then Kafka brokers and Kafka Streams. And of course with Kafka Streams I can use any other kind of library, because Kafka Streams is just a Java library; you combine it with others as you want. In this case I built an analytic model to show a more powerful example: I built the model with the H2O framework and embedded it into my Kafka Streams application. The use case is airline flight delay prediction, so that we can predict whether future flights will probably be delayed or not. I use decision trees here, but that's again just one use case, no matter which kind of stream processor you want to implement.
It would work the same way. So if we go back to our architecture, here we see now what we do. We have Marathon in this case, and on Marathon I have two different kinds of executors: the Kafka brokers, so I have three different brokers running on my Mesos cluster, and the Kafka Streams instances. We will see in a minute how we can scale them up and down; DC/OS of course automatically manages it if an application fails, but the Kafka Streams application itself also manages the scale-up and scale-down. So if one of them is stopped or crashes, the processing is done by the other four instances. So here we are; that's in the end the process. It's a pretty simple stream processor: we have incoming data, we do some filtering, and then we apply the analytic model. Here we just see some examples of the source code, but again, this is Java code which is in the end one JAR file which we want to run. So let's take a look at that now. As DC/OS does not work perfectly on my laptop, I tried it before with Vagrant locally, but the fan got too loud, so I decided to do it all on AWS. That was a little bit easier, and therefore I only have this as a recording, because otherwise you never know at conferences how the connection to the Wi-Fi is. But anyway, you see the same thing. Here you see the dashboard where I have all my stuff running. This is running on AWS with CloudFormation, and that was really great: I could set it up in ten minutes or so, without any knowledge about Mesos before. So this is the dashboard of DC/OS which I have running on AWS with CloudFormation. And here now we have the services running.
So I have a Confluent Kafka cluster running here, which you can select from the catalog. You can see in the DC/OS catalog all the different Kafka components and Confluent components. And here we see that I already have the scheduler running for the Kafka brokers, and three different brokers running. I also switched to the ZooKeeper view, managed by Exhibitor. This is the Mesos default ZooKeeper, and you see that Kafka adds some information there, like the broker information and the topic information. Here, of course, it's important again: this works for small deployments; for huge deployments we typically recommend running a dedicated ZooKeeper infrastructure on top of Mesos for the Kafka parts. But that's for the bigger deployments and for more discussion later. So these are the services I already have running here. What I want to do next is start my microservices. We have the basic infrastructure running, the Kafka cluster, and now we can deploy a microservice. In this case I give it a service ID and I start with one instance, to scale it up later. I say where my Docker container image is; it is on Docker Hub in this case. Then I give it CPUs and memory; I simply selected some more or less random numbers here. And I select the Docker engine to deploy it. The configuration, as you see here, is also done in a minute or so, and now I can start it. It's just a small Java application; it starts in, I don't know, three seconds or so. Only the web UI here refreshes just every few seconds, but it's already running now. So this is the Kafka Streams application, which is waiting for input data to do the predictions and then send output data to another topic. So here now we see one instance running. I had a few others running before; this is the one active one. That's the important part: right now I only have one microservice running. And now we go to the command line, where it's important to see the following.
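As a side note before the command line part: the same deployment I just clicked through in the UI can also be expressed as a Marathon app definition. A minimal sketch might look like this, where the app id, image name and resource numbers are all made up for illustration:

```json
{
  "id": "/flight-predictor",
  "instances": 1,
  "cpus": 0.5,
  "mem": 512,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "example/kafka-streams-flight-predictor:latest"
    }
  }
}
```

Saved as `app.json`, this can be submitted with `dcos marathon app add app.json`, which does the same thing as the UI form.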
So, what I did after I created the cluster on Mesos: I also used the DC/OS command line with the Kafka commands to create the topics. We have an input topic and an output topic. The input topic is where messages get into the streams microservice, and the output topic is where the streams microservice sends data to. Here you see the URL; that's the important part about this VIP connection: I always use the same connection URL, no matter if I restart brokers or if new instances are created or added. That's also a huge benefit, I think, for operations and not just for development. As the next part, we can create a consumer and a producer. So I'm just waiting until it happens. On the left side we have the consumer running, which is waiting for new predictions, and on the right side I have a script which creates new input data. Every line is one new piece of input information sent to a Kafka topic, and this is done continuously, every second, to simply simulate some input information. As soon as I hit return, it shows the predictions on the left side. In this case, I admit I have to improve it, because I always send the same line, so it's always a prediction of "yes" with no more information, but you see the idea behind it: I started producing messages continuously, and the Kafka Streams microservice processes the messages in real time and sends the output, the prediction, to the output topic, which you see here on the left side. So that's what I did with one microservice, and now with DC/OS we can easily scale that up or down in different ways. In this case I used the web UI; of course we can do it via the command line or via some kind of auto-scaling features of DC/OS. What I will do now is start five of these services: five independent Kafka Streams microservices.
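The command line steps just described can be sketched roughly like this. The package and CLI subcommand names follow the DC/OS Kafka packages of that time, and the topic names, VIP host pattern and app id are my assumptions for illustration, so treat this as a sketch rather than copy-paste material:

```shell
# Create the input and output topics via the DC/OS Kafka CLI
dcos confluent-kafka topic create flight-input --partitions 3 --replication 3
dcos confluent-kafka topic create flight-output --partitions 3 --replication 3

# The static VIP bootstrap address stays the same even when brokers move
BOOTSTRAP=broker.confluent-kafka.l4lb.thisdcos.directory:9092

# Produce test data and watch the predictions with the console clients
kafka-console-producer --broker-list "$BOOTSTRAP" --topic flight-input
kafka-console-consumer --bootstrap-server "$BOOTSTRAP" --topic flight-output

# Scale the streams microservice from one instance to five
dcos marathon app update /flight-predictor instances=5
```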
In the end I have Java applications, and as soon as they start, which takes maybe two seconds each, they also start processing. That's what we will see in a second. Each of these Kafka Streams applications registers to the same Kafka brokers and the same topics, because it's the same application, and so Kafka automatically distributes the information from the input topic across the different streams applications. Here you see the old one is still running, still producing messages, but now we will also see that the messages are processed by all five microservices. Every message is processed only once, and each of the five instances processes some of the messages. Here we take a look at one of the log files to see that. This is one of the instances which I started later, not the first one, but still I see some logging here; I implemented the microservice so that it logs every prediction it does, and you see that it has already received some messages. And now we also look at the one which I started first, not one minute ago but four minutes ago, and here we see that it also still processes data. How that happens: by default, as my Kafka messages do not have a key, it uses round robin to distribute the messages, so one, two, three, four, five; one, two, three, four, five. And you can configure anything here. For example, if your messages have a key, then it automatically partitions by key and sends everything with key one to one specific streams application and key two to another one. Or you can write any custom partitioner. That's the same concept in Kafka Streams as in Kafka itself. So that's mainly what I wanted to show you: how you can scale it up and down, and how that works on DC/OS, so that you can scale the microservices up and down. Let's go back to the presentation. Okay, this was the demo. A few more notes on that.
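To make the key-based distribution concrete, here is a tiny self-contained sketch of the idea: a keyed message is always mapped to the same partition, so the same instance always sees all messages for that key. Note this is a simplification for illustration only; Kafka's default partitioner really hashes the serialized key bytes with murmur2, not with Java's `hashCode()`, and the key values here are made up.

```java
public class PartitioningSketch {

    // Simplified stand-in for Kafka's default partitioner:
    // the same key always lands on the same partition.
    // (Kafka actually uses murmur2 on the serialized key bytes.)
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 3;
        String[] keys = {"flight-LH123", "flight-BA456", "flight-LH123"};
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, partitions));
        }
        // The first and third message share a key, so they always end up
        // on the same partition and therefore on the same instance.
    }
}
```

With no key at all, the producer instead spreads messages across partitions, which is the round-robin behavior seen in the demo.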
As I said, on AWS it was pretty easy for me to get started with that. I just used the CloudFormation scripts from Mesosphere; that worked pretty well, because I'm also not the DCOS expert who would like to set all of this up on his own. And then I configured the Kafka brokers. Here you see some of the web UI. The interesting part is on the top left, where you see there are many different components. I just used the Kafka brokers here, but all the other Kafka components from the ecosystem are available, like the REST proxy, the schema registry, the Control Center, and so on. You can start and scale them all in the same way via DCOS. And on the next slide is the only thing where I really had problems when I built this demo for DCOS together with Kafka Streams. On the top left you see the Kafka client which Mesosphere uses in all their tutorials and demos that use Kafka. The problem with it is that it still works, but it's a pretty old version. It uses Kafka 0.9, and the current version is 0.11, which is much, much more sophisticated. For plain Kafka messaging this still works in all the demos and tutorials, but for Kafka Streams it does not work anymore, because Kafka Streams requires message timestamps, and timestamps were added in, I think, 0.10. So the Kafka Streams applications on DCOS throw exceptions, and I had to build my own Mesos Kafka client. This is, in the end, just the command line which I use to produce and consume messages. So simply to give you a hint: if you want to build your own, if you want to follow the tutorials of Mesosphere, this is the only thing which does not work. You have to upgrade that, or simply use my Docker container, which I uploaded to my Docker Hub. So here, in the end, is the Kafka Streams microservice. It's also on GitHub, with the links you will see in the slides later. It's a pretty simple Dockerfile, and it's really just a Java file. A Kafka Streams application is a small application. 
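To make the timestamp problem concrete, here is a rough plain-Java sketch of why records from a pre-0.10 client break Kafka Streams: such records effectively carry no timestamp (modeled here as -1), and Streams' default behavior is to fail on invalid timestamps. The class and method names are illustrative, not Kafka's actual API.

```java
// Hedged sketch of the failure mode described above: a 0.9-era producer writes
// records without timestamps, and Kafka Streams rejects negative timestamps.
// This models the behavior only; it is not Kafka's TimestampExtractor interface.
public class TimestampCheck {
    static final long NO_TIMESTAMP = -1L; // what a pre-0.10 record effectively carries

    static long extractTimestamp(long recordTimestamp) {
        if (recordTimestamp < 0) {
            // Streams' default is to throw on records with no valid timestamp
            throw new IllegalStateException(
                "Record has no valid timestamp: " + recordTimestamp);
        }
        return recordTimestamp;
    }

    public static void main(String[] args) {
        System.out.println(extractTimestamp(1511000000000L)); // fine: a 0.10+ record
        try {
            extractTimestamp(NO_TIMESTAMP); // a record from an old 0.9 client
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This is why the brokers alone looked healthy in the old tutorials while the Streams applications threw exceptions: the brokers don't care, but Streams does.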
It is, in the end, a main method in a Java class, and just, I don't know, 20 lines of code where I use the Kafka Streams API and I use H2O for the model, as I said. So here you see the Kafka Streams code. That's what we also saw in the demo; it's more or less for reference. The whole video is also on YouTube with a bit more detail, if you want to watch it again or try it out. And one last note here. What we did before was Kafka Streams. Since two months you can also use KSQL. It's built on Kafka Streams; you don't see the Streams part, but it uses the same concepts under the hood. And with that you can write SQL queries to do continuous stream processing. So you can do things like anomaly detection, real-time ETL, or building real-time dashboards. That's much easier with KSQL, and we already see a lot of adoption of it. And the same can of course be used in the same way with DCOS: you can run it in a Docker container, for example, via Marathon or via Kubernetes on Mesos; it works the same way as with Kafka Streams. So with that, let's go to the key takeaways of the session: the Apache Kafka ecosystem on DCOS for highly scalable, fault-tolerant microservices. I think the important part is that Kafka is of course already used by many people on DCOS, like Cassandra and so on with the SMACK stack, we heard that in the keynote too, but also the other parts: building the microservices around it. No matter if it's just Kafka producers and consumers, or if you really want to build Kafka Streams microservices, it works in the same way and very well with DCOS. And the great thing is you have a lot of features out of the box. 
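The shape of that roughly 20-line Streams microservice can be sketched in plain Java without the Kafka dependency: every input record is mapped through a prediction function and the result is sent onward. In the real application the Kafka Streams DSL does this with something like `builder.stream(input).mapValues(model::predict).to(output)`; here the "model" is a stand-in function, not H2O, and all names are mine.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java sketch of the Streams microservice described in the talk:
// each record from the input topic is run through the model and the
// prediction goes to the output topic. The model here is a dummy stand-in.
public class PredictionPipelineSketch {
    public static List<String> process(List<String> inputRecords,
                                       Function<String, String> model) {
        // the analogue of stream(input).mapValues(model).to(output)
        return inputRecords.stream().map(model).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // hypothetical stand-in for the H2O model used in the demo
        Function<String, String> dummyModel = line -> "prediction=yes for " + line;
        List<String> out = process(List.of("input-1", "input-2"), dummyModel);
        out.forEach(System.out::println);
    }
}
```

The point of the sketch is how small the service is: the business logic is one function, and Kafka Streams supplies the consuming, producing, and scaling around it.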
Most of them are also true for all the other applications of the SMACK stack and other components, but there are also some specific ones, like the Kafka VIP connection, and that helps a lot to develop and operate Kafka applications on Mesos or DCOS, and therefore this is also true for the Kafka Streams microservices. That's what I wanted to show you. And as we see many customers also using Kubernetes more and more, I would not be surprised if, for example, the brokers keep running out of the box on Marathon as they do right now, and many will use Kubernetes for the orchestration of the microservices. So the stack would then be DCOS, Kubernetes, and the Kafka Streams microservices on top of that. So that was my presentation. I hope I gave a good overview about Kafka and Kafka Streams on DCOS. And now we also have time for questions. Yes. The question was: how does Kafka store the messages, and for how long? Is it forever? The basic answer is that it's configuration, so you can do whatever you want. There is a concept called retention time in Kafka. That can either be configured via time, for example keep the last four weeks, or it can be defined by storage, for example only store 100 gigabytes. And that can be done per topic, very specifically. In that way you can manage exactly what is stored and how long it is stored. It's just storing on disk, so it's not really an object store. There is still some kind of query on top of that, so you can also query information out of it. But the important thing is that it's not a SQL-like layer where you have indexing and these things; that's not what it's used for. It's also not a search engine; for that you typically use things like Elasticsearch. Yes. The question was: is it possible to do joins with Kafka Streams and KSQL? And the answer is yes, there are several different joins available, with all the different aggregation windows. But the key here is that the joins are right now done on the keys. 
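The retention answer above can be sketched as a small simulation: a record is kept until it is older than the configured retention time or until the log exceeds the configured size, whichever is hit first. This only models the policy; the real per-topic settings in Kafka are `retention.ms` and `retention.bytes`, and the class below is my own illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of Kafka's retention policy from the Q&A: drop records from the
// oldest end while either the time limit or the size limit is violated.
public class RetentionSketch {
    static class Record {
        final long timestampMs;
        final long sizeBytes;
        Record(long timestampMs, long sizeBytes) {
            this.timestampMs = timestampMs;
            this.sizeBytes = sizeBytes;
        }
    }

    static void enforce(Deque<Record> log, long nowMs,
                        long retentionMs, long retentionBytes) {
        long total = log.stream().mapToLong(r -> r.sizeBytes).sum();
        // oldest records are at the front of the deque
        while (!log.isEmpty()
                && (nowMs - log.peekFirst().timestampMs > retentionMs
                    || total > retentionBytes)) {
            total -= log.pollFirst().sizeBytes;
        }
    }

    public static void main(String[] args) {
        Deque<Record> log = new ArrayDeque<>();
        log.add(new Record(0L, 50));          // old record
        log.add(new Record(9_000L, 50));      // recent record
        enforce(log, 10_000L, 5_000L, 1_000L); // 5s retention: the old record expires
        System.out.println(log.size());        // 1
    }
}
```

As the answer says, in real Kafka both limits are set per topic, so a compliance topic can keep four weeks while a metrics topic keeps only the last 100 gigabytes.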
So Kafka messages have keys and values, and the joins are done on the keys. You cannot do every kind of join which you can do with some other query or search engines, but some kinds of joins are possible. Yes. The question was: is there an upper limit to the size of the messages which can be put into the queue? I'm not sure if there is one, if you can put in a one-terabyte file, I don't know, but for real-world examples there's not really a limit. You can also use it for storing, for example, pictures or something like that. So the limits are pretty big, if there are limits at all. Your question is if you want to aggregate information from different Kafka topics, you mean, right? Okay, so the question was how to aggregate information from different Kafka clusters which all run on DCOS. In the end, the answer is the same as we would give without DCOS. So you have different Kafka clusters running, and we see these situations in practice: a real-world example is Uber. Uber has Kafka running, for example, in New York and in San Francisco, and they run it locally to improve the processes of the taxis and so on. And then they also want to bring that together for aggregation. And there you typically do that via replication, so you replicate between the different clusters. This can either be done in active-passive mode, which means you have active clusters for the current processing, and a big backend cluster, let's say, where you store the data from all the other clusters, to do analysis, to do machine learning, for example, and find insights. Or you can do active-active, where you have, say, three clusters which are all equally important, and if you update one of them, you also send the update to the others for aggregation. That's also possible. There are two options for doing this kind of replication with Kafka. There is the open source MirrorMaker. 
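The earlier answer that "joins are done on the keys" can be illustrated with a small sketch: a stream record is joined with a table entry only when their keys match, similar in spirit to a KStream-KTable join in Kafka Streams. The data, names, and method here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of a key-based join as discussed in the Q&A: lookups happen strictly
// by key, which is why arbitrary value-based joins are not possible.
public class KeyJoinSketch {
    public static List<String> join(List<Map.Entry<String, String>> stream,
                                    Map<String, String> table) {
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, String> record : stream) {
            String match = table.get(record.getKey()); // lookup strictly by key
            if (match != null) {
                results.add(record.getKey() + ": " + record.getValue() + " / " + match);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // hypothetical example data: a table of customers and a stream of orders
        Map<String, String> customers = Map.of("k1", "Alice", "k2", "Bob");
        List<Map.Entry<String, String>> orders =
            List.of(Map.entry("k1", "order-7"), Map.entry("k9", "order-8"));
        // only the record whose key exists in the table produces a join result
        join(orders, customers).forEach(System.out::println);
    }
}
```

Because partitioning is also done by key, co-partitioned topics can be joined locally inside each Streams instance without shuffling data between instances.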
And there is a commercial tool, Confluent Replicator, which has a few advantages and is, let's say, more enterprise-ready. A comparison of the two is also on the web. But replication is typically the way to do that. What's the question? Can you also run that on Marathon? Yes, I think that's actually what is happening: the Kafka brokers are running via Marathon right now. And for the Kafka Streams applications I chose Docker containers, but you could also do the same with Marathon. So that works the same way, yes. Yes. The question was: how can you expose the Kafka interfaces to the outside world, outside of DCOS? Is it possible via REST or some other way? That simply depends more or less on the security configuration you have. What many actually do is use REST for that, with the Confluent REST proxy, which is open source, because it is the easiest way. On the other side, you then have all the trade-offs of REST and HTTP: lower throughput and higher latency. If you can open the right ports and configure security the right way, you can also expose Kafka directly. Kafka supports security standards like Kerberos and so on. So it more or less depends on your configuration and your setup, but you can expose it to the outside world. And I think that's done at most customers, because many of the producers and consumers are outside the DCOS cluster. Okay, then thanks a lot for coming. Also, connect with me on LinkedIn if you want more information. And thanks a lot.
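For the REST-proxy answer above, here is a hedged Java sketch of what producing a message from outside the cluster looks like. It only builds the HTTP request rather than sending it; the host and topic name are placeholders, and you should check the REST proxy documentation for the exact content types in your version.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch of producing to Kafka from outside DCOS via the Confluent REST proxy:
// an HTTP POST to /topics/<topic> with a JSON records payload. The request is
// only constructed here, not sent; host, port, and topic are placeholders.
public class RestProxySketch {
    public static HttpRequest buildProduceRequest(String proxyHost, String topic,
                                                  String jsonBody) {
        return HttpRequest.newBuilder()
            .uri(URI.create("http://" + proxyHost + ":8082/topics/" + topic))
            .header("Content-Type", "application/vnd.kafka.json.v2+json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }

    public static void main(String[] args) {
        // hypothetical endpoint and payload for illustration only
        HttpRequest req = buildProduceRequest("rest-proxy.example.com", "input-topic",
            "{\"records\":[{\"value\":{\"field\":\"some input\"}}]}");
        System.out.println(req.method() + " " + req.uri());
    }
}
```

This is the convenience side of the trade-off mentioned in the answer: any HTTP client can produce this way, at the cost of lower throughput and higher latency than the native Kafka protocol.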