Well, we're live streaming this on Facebook, and we are very short on time, so I'm going to let them introduce themselves. They've been long-time OpenShifters, so they've got a lot to say, and they're doing some really cool stuff, so take it away.

Thanks, Diane. I'm from Amadeus. We're doing IT services for airlines, and we have been on the road with OpenShift since it started working on Kubernetes, so it's about four years now. We love it. Every year we find new uses for it. We'll talk about this today: Pierre, Olivier, and myself, Nenad. Pierre, please start.

Okay, so I will introduce what a data streaming architecture is, or what is commonly called stream processing or real-time data processing. It's a new kind of architecture that is event-driven, so everything is based on an event log. You need a persistent event log, where every event that your application logs is persistently stored in a strictly sequenced manner, so you have the guarantee that what you consume is consumed in exactly the same order as it was produced. You also need to produce immutable events, because since the log is persistent, you have the nice flexibility of being able to replay your events, and the events must be immutable so that you can replay them. So that's the principle: your application writes events to this event log, and you can have different processors that handle those events. The very first obvious use case is to synchronize different data stores. It's very common these days to have an application writing to several data stores, and instead of putting the stress on your application to write directly to all of those data stores, the application just logs the data change to this log, and you can synchronize your different data stores asynchronously, in an eventually consistent manner. Here I put some examples, but you can plug in as many processors as you need in parallel.
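The event-log principle described above can be sketched in a few lines of plain Python (an in-memory toy, not Kafka's actual API): events are immutable, appended in a strict order, and any number of independent consumers can replay them and converge to the same state.

```python
# Toy sketch of the event-log principle: immutable events, strict append
# order, and replay by independent consumers. Names are illustrative.
from dataclasses import dataclass, field
from typing import Callable

@dataclass(frozen=True)  # frozen = the events are immutable
class Event:
    key: str
    value: str

@dataclass
class EventLog:
    events: list = field(default_factory=list)

    def append(self, event: Event) -> int:
        """Persist the event; its offset fixes the global order."""
        self.events.append(event)
        return len(self.events) - 1

    def replay(self, processor: Callable[[Event], None], from_offset: int = 0) -> None:
        """Re-deliver events in exactly the order they were produced."""
        for event in self.events[from_offset:]:
            processor(event)

log = EventLog()
log.append(Event("user-1", "created"))
log.append(Event("user-1", "updated"))

# Two independent "data stores" consume the same log in the same order
# and therefore converge to the same state.
store_a, store_b = [], []
log.replay(store_a.append)
log.replay(store_b.append)
assert store_a == store_b
```

This is the whole contract the talk relies on: as long as consumers read the same ordered, immutable log, adding a new data store is just adding a new `replay` consumer.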
And the cherry on the cake for this kind of architecture is that you can write your own applicative processor that performs some transformation or business logic on your events and produces a new event log as a result, and you can compose a full application as a graph of processors, handling events and producing new events. You can design your application to do things in parallel or in sequence that way. You can see those processors as a kind of microservice, but streaming microservices. So it's a very nice way to design an asynchronous, event-driven application. This picture is not from me; it's from a guru of this principle called Martin Kleppmann, who wrote a very nice book on this. In terms of tooling, Apache Kafka is the leading open source option for implementing this kind of event log. It is famous for its performance: it's very simple, it scales, and the latency is very low. You have companies like Netflix or Uber that are handling more than 12 million events per second through a set of Kafka clusters. So in terms of real-time treatment, it's there. And then for the processors, you have a choice. If you need just stateless handling, you can write your own processor by calling the Kafka API directly. Things get a lot more complicated when you need to go stateful, and in that case there is a set of open source options available. One of them is Kafka Streams, which is quite simple: it's just a library that you embed in your own processor. But you have Flink, which is very popular as well, and another set of processors. Spark Streaming is also very present in this world, but more on the big data side; even though they released a continuous processing mode recently, Spark is in a better position for big data, so for doing data aggregation asynchronously.
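The "graph of streaming microservices" idea can be sketched with plain Python generators (this is a conceptual toy, not the Kafka Streams API): each processor consumes one stream of events and produces a new one, and processors compose into a pipeline.

```python
# Conceptual sketch of composing stream processors: each stage consumes
# an event stream and yields a new one. Stage names are illustrative.
def processor(transform):
    """Wrap a per-event transformation into a stream-to-stream processor."""
    def run(stream):
        for event in stream:
            yield transform(event)
    return run

# Two illustrative stateless stages: parse the raw event, then enrich it.
parse = processor(str.strip)
enrich = processor(lambda e: {"event": e, "length": len(e)})

raw_events = ["  booking-created \n", "booking-cancelled"]
output = list(enrich(parse(raw_events)))  # parse -> enrich pipeline
assert output[0] == {"event": "booking-created", "length": 15}
```

In the real architecture each stage would be a separate deployable service reading from and writing to Kafka topics, which is what makes the stages independently scalable and replaceable.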
In terms of advantages, as you decompose your application into this kind of asynchronous stream processing, you get loose coupling, so you can tune and plug and play your microservices in this graph very easily. In terms of resilience, it's very nice: you can deploy your application widely on the cloud without any problem, and you run as many instances of each microservice as required for each stage of your application. It's very flexible because you have a graph, so you can plug a new microservice into your graph anywhere. In the example I gave, I had three data stores to update in parallel. If I need to add a new one, I just plug a new consumer onto my event log that writes to this new data store, and it will not impact the other microservices at all. And for auditability and error recovery, it's very nice because every event is still present in the log, so you have high visibility into what happened on your system. If you put a high retention on your event log, one week, or several months if you have the capacity, then for error recovery you can replay your events as far back as required to recover from your bug. So this is an example of a typical architecture that you can build with this streaming principle. It's a standard architecture with three layers. The first layer ingests your data and parses it, archiving the data in parallel, so you can also replay very, very old events from this archive if you need to. You transform your data into whatever materialized views your events require, and then on the next stage you can detect the functional events that occurred on your input stream. In parallel, you can store your events in a hot data store for real-time consumption, for example via a REST service. So that's the data ingestion part. And then you have the two other layers, big data, which handle those data via a Hadoop store first.
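The error-recovery pattern enabled by log retention can be shown with a tiny sketch (plain Python, hypothetical event format): a processor bug corrupted a derived view, so after fixing the code you simply replay the retained events from the start to rebuild the view.

```python
# Sketch of recovery by replay: the events are still retained in the
# log, so a fixed processor can rebuild its derived view from scratch.
retained_log = ["price:100", "price:250", "price:40"]  # still in the log

def rebuild_view(log, parse):
    """Replaying is just re-reading the retained events in order."""
    return [parse(event) for event in log]

buggy_view = rebuild_view(retained_log, lambda e: e.split(":")[1])       # bug: strings
fixed_view = rebuild_view(retained_log, lambda e: int(e.split(":")[1]))  # fixed parser
assert fixed_view == [100, 250, 40]
```

With Kafka, "replay" is just resetting the consumer group's offsets; nothing has to be restored from backups as long as the retention window covers the bug.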
And you can see here that the data are stored in parallel in Hadoop, in the ODS, and in different parts of the architecture. So big data on the top, and the real-time business layer on the bottom, where you can have business rules, for example, interpreting your functional events and taking actions, or any kind of business process that you plug into your graph. This is just a typical graph; you compose your microservices as you need. This kind of architecture is very nice for what we call data-driven, because in that model your big data layer analyzes your data in an offline process, produces insights from what it analyzed, and pushes those insights back into the data ingestion part, so that your business layer can interpret those insights and decide to take actions according to them. That's what we call data-driven: you put intelligence into your application based on the data analytics you did on the BI side. For example, the business rules could use those insights. Another aspect of data-driven is machine learning. You can have your machine learning model built on the big data layer, and you can dynamically deploy the microservice hosting the machine learning model into the business layer, in real time. So it's a very nice way to do this kind of data ingestion and data processing in real time. So that's it for the high-level concepts, and I will leave the floor to Nenad for the implementation part of Kafka on OpenShift.

So, how do we run architectures like this on OpenShift? First thing, we need to run Kafka. And, well, Kafka and OpenShift didn't fit very well together some time ago, because with Kafka you have a stateful application, you have a cluster, every broker has a unique identity, you need persistent storage, and you actually need another cluster, which is ZooKeeper. And in OpenShift, your pods would get random-ish names, like there. So thankfully, we have StatefulSets now.
When we started, they were called PetSets, and they were beta, not supported. But they are there now, and we can use them. So what are the great things that come with StatefulSets? First, they provide stable pod identity, meaning that our Kafka brokers can now have proper names like kafka-0, kafka-1, kafka-2, no longer random things. They provide stable storage: a given pod will always get its own persistent storage, even if it moves. And there are new things like ordered startup, ordered shutdown, and rolling upgrades. So it actually runs fine. We have been running loads like this for one year, I think, and it's working very well.

And some experiences. Kafka is a stateful application, and its performance is disk-bound and network-bound, so you want to run it on a good machine, one with SSDs. Using node affinity allows you to do this: landing on the machines with SSDs, using a disk label. And you want to spread the brokers across different machines, because you don't want to lose your cluster when one machine goes down; that's where anti-affinity comes in. And then some counterintuitive findings. Common wisdom says: use persistent volumes, because if you lose your pods, you will lose your data. But in a data streaming architecture, if the lifetime of your data is a few minutes, you will not have an enormous amount of data, so you can actually rely on Kafka replication, which basically means a restarted broker can start empty and re-replicate from the others. On the local storage side, local persistent volumes are not there yet, but they are coming; meanwhile you can rely on a local volume on the SSD and get very high performance from Kafka. And we are using Prometheus and JMX for monitoring this. So it's fine: we can have Kafka brokers running there, and we can consume from this Kafka. But it's not the only thing you need when you're running Kafka.
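A minimal StatefulSet for Kafka along the lines described above might look like the sketch below. This is an illustrative manifest, not the one used in the talk: the image, labels, replica count, and storage size are all placeholders.

```yaml
# Hedged sketch of a Kafka StatefulSet: stable names (kafka-0..kafka-2),
# stable storage per pod, SSD node affinity, and broker anti-affinity.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka
  replicas: 3                         # pods get stable names kafka-0, kafka-1, kafka-2
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      affinity:
        nodeAffinity:                 # schedule on the SSD-labelled nodes
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: disk
                operator: In
                values: ["ssd"]
        podAntiAffinity:              # spread brokers across machines
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: kafka
            topologyKey: kubernetes.io/hostname
      containers:
      - name: kafka
        image: example/kafka:latest   # placeholder image
  volumeClaimTemplates:               # stable storage that follows the pod
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi
```

The `volumeClaimTemplates` section is what gives each broker its own stable volume; with the ephemeral-storage approach mentioned above, you would drop it and let Kafka replication rebuild a restarted broker.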
You need to configure it for the applications, for the topics. We have dozens, several dozens, of microservices running at once on this platform. Each one consumes from a topic and produces to one or several topics. We want to be sure that those topics exist in each and every environment, and that they're deleted if they're no longer used. We want to be able to react to how much disk space there is, so reduce or increase retention time, and give credentials to the clients. And ideally, we want developers to be able to express all this themselves. It's not that they want to write a work order to someone who then has to type these commands. What we would really like to use, and what we do, and I think it will come up again in a later talk, is an operator. As I mentioned in the previous talk, we have a Kafka operator which is basically monitoring a resource existing in OpenShift. In our case it's a ConfigMap; it could be a custom resource. This resource describes the topic and its characteristics: how many partitions, what the replication factor is, specific properties. And whenever it changes, the operator applies those changes automatically on the Kafka cluster, with control, so there is no human error involved anymore; the operator replaces all of that. It's great. When you mention the service catalog, it's actually equivalent to the provision and deprovision actions of the service catalog. And it can also deliver credentials to your microservices if you want to have a secured environment. So we solved running Kafka inside OpenShift. We also have this operator, and we actually share these ideas with different actors, Red Hat also; I think there will be some announcements at this summit about similar kinds of tools. And then the next thing is: you have your platform with dozens of microservices on it, and it can get kind of difficult to understand what's going on inside.
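The kind of ConfigMap such a topic operator watches might look like the sketch below. The label, field names, and values are illustrative assumptions, not the exact format used at Amadeus (operators such as Strimzi use a custom resource with a similar shape).

```yaml
# Illustrative topic descriptor: the operator watches ConfigMaps with
# this label and reconciles the Kafka cluster to match.
apiVersion: v1
kind: ConfigMap
metadata:
  name: bookings-topic
  labels:
    kafka-operator: topic          # hypothetical label the operator selects on
data:
  name: bookings
  partitions: "12"
  replication-factor: "3"
  config: |
    retention.ms=604800000
    cleanup.policy=delete
```

The point is that the descriptor lives in version control next to the microservice that uses it, so developers declare topics the same way they declare deployments, and the operator removes the manual `kafka-topics` commands from the loop.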
Because it's no longer a monolith. It's easy to deploy one service; they're all decoupled. But somehow your platform as a whole is not decoupled: you have to understand what's going on. And it would be great if you could define your platform in advance, test it, and replicate it again. So here comes another operator. We call it a platform template operator, and it gives architects, developers, and designers the possibility to design the workflow inside a data-driven application. When you look here, there's a screenshot from it: each blue box represents a DeploymentConfig inside OpenShift, and each blue arrow represents a communication channel that is mapped to a Kafka topic. So the operator monitors another ConfigMap and says: okay, I need to deploy all these microservices, I need to define all these topics, I will configure it, and I can keep my cluster in a consistent applicative state. You can do it visually, or you can do it as code, because eventually it's YAML. But it can be used not only for the design part; you can also use it for the operations part, because from a view like this we can go directly to the OpenShift console, or, since we are using OpenTracing to trace messages across the different microservices, we can identify whether microservices are behaving or not. So all of this is a great thing for data-driven applications. At Amadeus, we are also using Kafka for some other things, and I'm leaving it to Olivier to talk about that.

Okay, so I will finish with a concrete application, where we stand with the use of Kafka for a large-scale application. We are speaking about thousands of nodes, especially for what we call shopping: the capability to search and price the products that we sell at Amadeus. Typically, the business files new availability or pricing, and then we need to propagate these changes to the nodes, the thousands of nodes.
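One useful thing an operator like this can do from the boxes-and-arrows descriptor is a consistency check before deploying anything. The sketch below is hypothetical (the descriptor schema and function are illustrative, not the actual tool): it verifies that every topic a service consumes is produced by some other service or declared as an external input.

```python
# Hypothetical consistency check over a platform descriptor: boxes are
# services, arrows are Kafka topics. Flag consumed topics nothing feeds.
platform = {
    "parser":   {"consumes": ["raw-events"],    "produces": ["parsed-events"]},
    "enricher": {"consumes": ["parsed-events"], "produces": ["enriched-events"]},
}
external_inputs = {"raw-events"}  # topics fed from outside the platform

def dangling_topics(platform, external_inputs):
    """Return consumed topics that no service produces and no input feeds."""
    produced = {t for svc in platform.values() for t in svc["produces"]}
    consumed = {t for svc in platform.values() for t in svc["consumes"]}
    return consumed - produced - external_inputs

assert dangling_topics(platform, external_inputs) == set()  # graph is wired
```

Because the whole platform is "just YAML" in the end, checks like this can run in CI, so a broken wiring diagram is caught before the operator ever touches the cluster.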
We have a first level of cache, but if all the nodes target this cache at the same time, then it's a recipe for failure. So we have a second level of cache at each node, and we need to handle the invalidation of those caches, with large bursts of invalidations. We are speaking about 20,000 invalidations per second, with large bursts coming all of a sudden. So the algorithm is simply: you send the notification, and then, if you're interested, you pick up the data from the central cache. With Kafka in this picture, we are not at the streaming level yet, but this is where we are targeting to go. We use very small messages, around 200 bytes, because we know that scales very, very well. We have good stability, and we manage to contain the Kafka cluster size. We use JSON format for sending the messages. Today it's deployed with Ansible, and typically what we're looking at is a deployment with OpenShift and an operator. The stream analysis and the decision mechanism to fetch data are being enriched, and we'll try to convey more and more metadata in the messages, while keeping the messages as small as possible. In this picture, we do have Couchbase: this is the central cache, where we store terabytes of data, and it changes frequently. We dissociate the notification from the content in this application because the content is far too big, the number of fetches being much smaller than the number of notifications sent to the nodes. Where do we see our evolution? Today, we use Kafka as a database. We use the cross-data-center replication mechanism when we transfer the data from one data center to another, from one region of the cloud to another. And we have a large interest, again, in the use of Kafka with operators, because it drastically simplifies the operation and deployment of the clusters.
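The notify-then-fetch split described above keeps the messages tiny. A sketch of what such a ~200-byte JSON invalidation might look like (the field names and key format are illustrative, not Amadeus's actual schema):

```python
# Sketch of a compact invalidation notification: it carries only the
# key and metadata, never the cached content. Field names are made up.
import json

def invalidation_message(cache_key: str, version: int, origin: str) -> bytes:
    """Build the small JSON notification; nodes fetch the real data
    from the central cache only if the key interests them."""
    msg = {
        "key": cache_key,   # which cache entry changed
        "v": version,       # monotonically increasing version
        "src": origin,      # which system filed the change
    }
    return json.dumps(msg, separators=(",", ":")).encode("utf-8")

payload = invalidation_message("fare:NCE-JFK:2018-05-10", 42, "pricing")
assert len(payload) < 200   # stays well under the ~200-byte budget
```

Keeping the payload to keys and metadata is what makes 20,000 messages per second cheap for Kafka while the terabytes of actual content stay in the central cache, fetched only on demand.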
We can speak about deploying a new Kafka cluster on the spot as the load increases, and scaling up the cluster. Basically, we are at the beginning of how streaming can be used at a very, very large scale, as we look at the architecture presented by Pierre, and we have a large interest in the Kubernetes operators. This is the end of my talk. We don't have time for questions, but you can reach us, and we will be happy to answer your questions. Thank you.