Welcome back. You know, in 2008 the American psychic Sylvia Browne said that in the year 2020 a severe pneumonia-like illness would spread throughout the globe, attacking the lungs and bronchial tubes and resisting all known treatments. Spooky, eh? But then again, she also thought that a volcanic eruption would completely wipe out Japan, which obviously didn't happen; that the US would leave Iraq in 2004, which it didn't; and that she would live to the age of 88, when in fact she died at 77. So not all that good. You see, prediction is hard, as Yogi Berra once said, especially when it's about the future. But that hasn't deterred our next guest. Kai Waehner is here, he's Field CTO of Confluent, and he's going to predict the top use cases for data in motion in 2022. So let's welcome our very own Nostradamus. Kai, are you there? I can't hear you yet, Kai. Let's see if we can get your audio working properly.

Can you hear me now?

I can hear you now. How are you?

Good. I'm glad to do some predictions now.

Have you got your crystal ball?

Yeah, something like that.

I can't see it. All right, well, best of luck, Kai. We're looking forward to your predictions. Take it away when you're ready.

Okay, thank you. So hi everybody, I'm glad to be talking here. Of course this is not really just predictions; I'm looking at the market these days and at what our customers are planning for next year. I think it's clear today that everybody agrees you need to process data in real time for many use cases, and around that there are a lot of different scenarios and architectures. I simply want to show you where the trend is going from our customer base, and this will be very practical: I will show you several different architectures and real-world examples from real-world deployments, so that you can really get a feeling for what's coming and what you can take a look at.
When I prepared this presentation, I thought not just about what our customers are doing and planning, but also about what the market is seeing, and Gartner is always a good place to look, because they coin a lot of new buzzwords every year (they do a very good job at that). But no matter which of their trends you look at, the topics are really the same: it's about data, it's about being more intelligent, it's about elastic and scalable infrastructure, and so it's about increasing the flexibility and agility to be more innovative in your business, no matter what industry you are in. I think that is really the lesson learned for 2022, and that's what I want to present today.

Just so we all have the same background: what I mean by data in motion, as it says in the title, is that for many use cases you need to process data in real time, because real-time data beats slow data in many use cases, no matter whether you want to increase revenue, reduce cost, or reduce risk. There are always business goals behind it, but in the end it's about a better customer experience and better business results. Here are just a few examples across industries: think about fraud detection in banking, where you need to detect the fraud before it happens, not in a batch process overnight; or ordering a taxi, where you want to know when it's coming, not just get a notification after it has arrived. You can go on and on with these use cases. That is the foundation of this talk, data in motion everywhere, and we will see where the trends are going in the architectures and use cases.

This is still a fundamental paradigm shift for many people. Everybody agrees on the added value of real-time data, but it's similar to what the cloud did maybe five years ago; today most people are going to the cloud for at least some of their use cases and architectures. In the same way, when we talk about data in motion and the technology behind it, event streaming, thinking about data as something you process continuously is a huge added value, but it is also a paradigm shift for your architectures and for how you implement your projects. That's what I want to talk about in more detail today.

One more introductory slide, so that we really have the same understanding: when I talk about data in motion and event streaming, in most cases this means Apache Kafka, which became the de facto standard open source framework for processing data in real time. There are other open source frameworks and products in the market, of course, but the de facto standard is Kafka, which is why I use it for most of the examples. The point really is that it's not just about sending data from A to B. That's important, but that's a traditional messaging system, something we have been able to do for 20 years already. The point is that you can integrate different data sources and data sinks, and process, aggregate, and correlate the data in real time, at scale, even for high volumes. All of that together is what event streaming and Kafka are about, and that's the backbone for the use cases we talk about today.
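As a minimal illustration of that produce-and-consume idea, here is a hypothetical sketch using the confluent-kafka Python client. The broker address, the payments topic, and the fraud threshold are illustrative assumptions added for this write-up, not details from any deployment mentioned in the talk.

```python
import json
from confluent_kafka import Producer, Consumer

# Producer side: publish every payment the moment it happens.
producer = Producer({"bootstrap.servers": "localhost:9092"})
payment = {"account": "A-123", "amount": 250.0, "currency": "EUR"}
producer.produce("payments", key=payment["account"], value=json.dumps(payment))
producer.flush()

# Consumer side: a fraud check that reacts while the payment is still in flight,
# instead of discovering problems in a nightly batch job.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "fraud-check",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["payments"])
while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    if event["amount"] > 10_000:          # illustrative threshold
        print(f"possible fraud on account {event['account']}")
```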
So here are the five use cases I want to show you. This is a little bit my prediction, of course, but it's also what I see from customers that are, let's say, early adopters of new paradigms, and therefore, as I said, I will also show you several real-world examples for these topics today.

The first one, and the principle is not new, it has existed for several years, is the kappa architecture. You might wonder what that is, so let's start with another principle first: the lambda architecture, which many of you might know. It has often been used in big data architectures over the last five to ten years. It means that you have different data sources, and some of the data is processed in a real-time layer within a few milliseconds or a few seconds, while on the other side you still have batch processes for ETL jobs, analytics, and reporting, plus a serving layer so that the different consuming applications can get the data out of there. That's one variant of the lambda architecture. In a second variant you have a completely separate real-time layer and batch layer, two completely different infrastructures, and on the client side, the consumer side, you sometimes need to mix them together.

No matter which of these two variants you choose, the lambda architecture has problems for many use cases. Here I'm referring to a talk we saw at the last Kafka Summit from Disney, where they explained their problems with the lambda architecture, and this is really what we see in the real world from many of our customers: you often have to write duplicate code, because many pipelines are implemented twice, once for the real-time stream and once for the batch stream; under the hood this adds a lot of complexity; and in the end you have to operate two different infrastructures, one for the real-time pipeline and one for the batch pipeline. This makes many things much harder than they should be.

With the lambda architecture in mind, here is the kappa architecture. That's a term Jay Kreps coined a few years ago; Jay Kreps is one of the inventors of Apache Kafka, so I think he knows what he's doing. What he defined is this: in the middle you have just one pipeline, a real-time pipeline, data in motion. This doesn't mean that everything is real time or will be real time in the future. No, you still have your batch processes, for reporting, for example, or for training analytic models; those run in a data lake or with a business intelligence tool, and that's totally fine. Some other applications, of course, need to be real time, like an alerting system. The key difference with the kappa architecture is that you only have one single infrastructure to build, which is much less complex, and because the heart of the infrastructure is real time, you can provide the data both to the real-time and to the batch systems.

Let me go into a bit more detail here. In the past, the ingestion layer only stored the data for a few hours or a few days, and then you deleted it out of Kafka, because the long-term copy already lived in the data lake or another system. The benefit today is that many event streaming platforms also provide tiered storage. This means that even in the event streaming platform you can store data long term, for a month, for a year, or forever, and this can be gigabytes, terabytes, or even petabytes, so you don't have to worry about storing more data in Kafka in a cost-efficient way.
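As an illustration of that long-term storage idea, here is a hedged sketch that creates such a topic with the confluent-kafka Python AdminClient. The topic name, partition count, and replication factor are assumptions; retention.ms set to -1 simply tells Kafka to keep the data indefinitely, and whether older segments get offloaded to cheap object storage depends on whether tiered storage is enabled on your cluster, which is a broker-side concern that differs between distributions.

```python
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# One topic that serves both real-time consumers and later batch/replay consumers.
# retention.ms = -1 keeps the events indefinitely instead of deleting them after days.
orders = NewTopic(
    "orders",                      # hypothetical topic name
    num_partitions=6,
    replication_factor=3,
    config={"retention.ms": "-1"},
)
futures = admin.create_topics([orders])
futures["orders"].result()         # raises an exception if creation failed
print("topic 'orders' created with unlimited retention")
```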
This is super important if you want to build a kappa architecture, because in kappa many consumers consume the data in real time, but there is often batch consumption as well, where you consume historical data, so in many cases you need that long-term storage.

With this in mind, let me show you three real-world examples of a kappa architecture, all of them presented at a former Kafka Summit, so you can listen to their talks in much more detail if you're interested. The first one is Uber. Here you see exactly what I showed you before: Uber has many data producers on the left side, they have built one streaming pipeline with Kafka, and on top of that, on the right side, many of the consumers are real-time consumers. Some consume directly via Kafka clients, others use technologies like Apache Flink, and you can choose this per service or per application. Uber still has batch processes, like Hadoop in this example, but the difference to a lambda architecture is that even Hadoop as the batch layer gets its data via Kafka. Again, this simplifies the architecture a lot, because you only build one pipeline to all these different consumers, and that's the huge added value.

Another example is Shopify. Shopify described in their Kafka Summit talk how they leverage the Kafka log as the source of truth for many different systems. It's always the same story: you store data in Kafka for real-time consumers, for batch systems, and for replaying the data later in historical use cases. There are plenty of other examples; I won't go into detail here, because that would be a talk of its own, and it is the talk these companies gave at Kafka Summit. The third example is Disney, whose trade-offs with the lambda architecture we saw before; Disney is using the kappa architecture to keep the architecture simple, to reduce code duplication, and to operate one infrastructure instead of two for all their data processing. I know I'm more or less running through these examples, but again, you can look at their presentations in detail. With this first point you have hopefully learned that the lambda architecture can often be simplified a lot while still combining real-time and batch data.

The second trend I see more and more is hyper-personalized omnichannel. This is not just about retail, although retail is the most prevalent example; you need omnichannel everywhere you have customers looking at your data. The challenge is to build innovative new business models and provide a better customer experience, but at the same time to provide a better operational backend that is efficient and real time. That is the core challenge we see at our customers. Here's a real-world example: AO.com, an electrical retailer. AO.com had a lot of stores across the country, but now they are also providing a hyper-personalized online experience, and this is much more advanced than what you may have known from Amazon for 20 years. Twenty years ago Amazon already showed you that customers who bought this product might also be interested in another product; what's happening here is much more context-specific, in real time, per customer.
While the customer is on the website, moving the mouse to look at products or attributes, they get context-specific information: upselling offers, additional information about the product, or maybe a discount or a coupon. All of that works because under the hood, in the backend, all the data about the customer is correlated in real time; the data comes from the loyalty platform, from the CRM system, from log analytics, and so on, so you can act in real time while the customer is still on the website. This is a super powerful example of omnichannel, and the point again is that with the event streaming platform in the middle, you can use the data however you need it: for many use cases in real time, and for some others still in batch. So this is very complementary to the kappa architecture we discussed before.

Another example: when you are a car retailer, you correlate different events from the past, like the newsletters from 60 and 90 days ago and the visits to the car configurator 10 and 8 days ago, and then, when the customer is walking into the car dealership on site, you provide a real-time, location-based service so that the salesperson gets all this context-specific data before the customer actually enters the store and can make the right recommendations. All of this is omnichannel and real time.

With this real-time infrastructure in place, the same data can also be used by other business units, like the data science team or the business intelligence team. Here, once again, we see how these concepts are complementary: with a kappa architecture you build the pipeline once, for the real-time use case, and then other teams can also consume the data, and not all of them are real time. A data science team, for example, uses a Python client for Kafka and consumes the historical data once, to train analytic models with Python and machine learning frameworks like TensorFlow (a small sketch of that kind of one-off historical consumption follows at the end of this section). That's the beauty of such an architecture: you provide omnichannel for the customer experience, but you also serve all the other business units you have.

There are plenty of real-world examples for this. Walmart, which is the biggest employer in the US, leverages Apache Kafka as the heart of everything they are doing: for omnichannel and the customer experience, but also in the backend for real-time inventory and supply chain optimization. And once again, the good news is that for all the examples I'm talking about today, you can take a deeper look at our Kafka Summit events; all of that is free and on demand, with the end users presenting themselves. Walmart has built a real-time inventory system, and that is super important if you want to provide context-specific recommendations and information to your customers. It doesn't matter whether the customer uses the mobile app and buys something online or wants to pick it up in one specific Walmart store: you always need to correlate the information in the backend in real time, reliably and at scale, so that every customer gets the right experience, and on top of that you can then offer discounts or upselling or whatever your use case is. Walmart is just one of the examples we have for this.
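As mentioned above, here is a small, hedged sketch of that one-off historical consumption with the confluent-kafka Python client: a fresh consumer group starts at the earliest offset, reads the full history of a hypothetical customer-events topic into memory, and hands it to whatever training code you use. Stopping when poll() times out is a simplification for the sketch, not a robust end-of-topic check.

```python
import json
from confluent_kafka import Consumer

# A one-off consumer group so the read starts from the very beginning of the topic.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "ml-training-2022-01",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,
})
consumer.subscribe(["customer-events"])   # hypothetical topic holding the full history

rows = []
while True:
    msg = consumer.poll(timeout=5.0)
    if msg is None:        # nothing new within the timeout: assume we reached the end
        break
    if msg.error():
        continue
    rows.append(json.loads(msg.value()))
consumer.close()

# 'rows' can now be turned into a pandas DataFrame or TensorFlow dataset for training.
print(f"collected {len(rows)} historical events for model training")
```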
Let me now go to the third point, which cuts a bit more across all industries: multi-cloud deployments. This is a clear trend we see everywhere now. Many of our customers have a cloud-first strategy today, but many things are still running in their data centers, and some customers start with one cloud provider but, over time or through mergers and acquisitions, end up with more than one cloud. That is super complex to integrate. As you might know from many other examples over the last years, Kafka is a perfect tool for integrating these systems, and in many cases not just within one data center but across different data centers or different clouds. On the one side Kafka is a real-time streaming system, but on the other side, and this is the unique differentiation compared to traditional middleware or messaging systems, Kafka is also a storage system. With that, it truly decouples the producers from the consumers, so you can connect legacy systems that are maybe batch or file based with real-time streaming systems. That is where Kafka is super strong.

When we talk about different data centers or different clouds, this is first of all a logical view. Many people today use a new buzzword for it, the data mesh, but no matter whether you call it data mesh or still talk about domain-driven design and microservices, which are also part of the data mesh idea, the point is that you have different applications and infrastructure in different data centers, and you need to connect them to each other. As we heard in the beginning, real-time data beats slow data, so for the replication between the different regions, locations, and data centers the Kafka protocol is a perfect fit as well, as you see in this picture. With that, different domains can build their own data products and replicate the data in real time to another domain where someone else consumes it; maybe that someone is not real time, and that doesn't matter, because under the hood it's still the same kappa idea: if the replication between the domains is real time, the consumers can do whatever they want with the data.

Here is one example of such a hybrid multi-cloud architecture, and this is really what we see in the real world more and more, across industries. I don't want to go into the specific technologies shown here: in the cloud you might use a data warehouse like Snowflake or a security tool like Splunk for log aggregation, and on premise you might be running an SAP system or a mainframe; that is different for every single deployment. The point is that with event streaming you can integrate all of these different systems and infrastructures, and the heart of that infrastructure is real time and reliable, including the linking of the clusters themselves. As part of the event streaming platform we can link clusters together no matter where they are, so you can link an on-premise cluster to your cluster in AWS, to your cluster in GCP or Azure, and then in each cloud you connect all the other systems to the related Kafka cluster. That is a huge benefit, because even across data centers and clouds the heart stays real time, reliable, and scalable, no matter what you then do in one particular cloud or data center.
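In practice this linking is done with purpose-built tooling such as Confluent Cluster Linking, Confluent Replicator, or Kafka's MirrorMaker 2 rather than hand-written code. Purely to illustrate the idea of bridging two clusters, here is a hedged Python sketch that consumes from an on-premise cluster and republishes into a cloud cluster; the broker addresses and the orders topic are placeholders.

```python
from confluent_kafka import Consumer, Producer

# Source: on-premise cluster; destination: cloud cluster (placeholder addresses).
source = Consumer({
    "bootstrap.servers": "onprem-broker:9092",
    "group.id": "bridge-orders",
    "auto.offset.reset": "earliest",
})
sink = Producer({"bootstrap.servers": "cloud-broker:9092"})

source.subscribe(["orders"])
try:
    while True:
        msg = source.poll(1.0)
        if msg is None or msg.error():
            continue
        # Forward the event as-is; cloud-side consumers then decide whether they
        # process it in real time or in batch (the kappa idea again).
        sink.produce(msg.topic(), key=msg.key(), value=msg.value())
        sink.poll(0)   # serve delivery callbacks
finally:
    source.close()
    sink.flush()
```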
This is really a clear trend we see across all industries and markets: leveraging Kafka and integration for these kinds of architectures. And last but not least, one note on this: because we are going more and more to the cloud, you should really keep in mind that in the cloud an event streaming platform should typically be a fully managed service, because one main advantage of the cloud is that you can focus on the business problems and use an elastic, scalable service. The problem, however, is that when you go to the cloud today, most vendors just provision infrastructure for Kafka, and you still have to do everything else yourself: the sizing, the performance tuning, the bug fixing, and so on. That shouldn't be the case, so as a reminder, when you look into the cloud, really check what fully managed means, because many vendors just use it as a marketing term. That's a hint I want to give you when you're migrating event streaming into the cloud.

With that, let me go to the fourth item on today's agenda: edge analytics. We have now talked a lot about the cloud and moving to the cloud with a cloud-first strategy, and that is super important and makes sense for most use cases. However, not everything can or will go to the cloud. There are many different reasons: security reasons, latency reasons, cost reasons. Therefore the edge is getting more and more important for many use cases, and we see more and more deployments where customers complement the cloud Kafka cluster with Kafka at the edge. The edge really means outside a data center: that can be a retail store, a factory in manufacturing, a ship; it can be anywhere. This is another clear trend we see: doing edge analytics, completely decoupled from the data center or cloud, for some of the use cases.

There are plenty of examples where the edge makes sense. One is low-latency requirements, often combined with 5G, where you really need something like 10-millisecond processing end to end; the cloud is not the right infrastructure for that. Examples are innovative services around a stadium when you go to a soccer match, or gaming, where you need to move data in real time between millions of users, and many other cases where low latency is required and some kind of edge computing is needed.

Here is one example of a hybrid edge architecture; since I've talked a lot about retail examples, let's stay with that. At the top you see the traditional event streaming cluster, which can run in a data center or in the cloud. There you build your traditional monitoring systems, connect to the CRM, Salesforce in this example, build your cloud applications, and integrate with a cloud data lake. For some other use cases, however, you need to do edge computing, like in the retail store at the bottom. In this case each retail store has its own event streaming platform, because in the store it is also important to process data in real time. Think about the location-based service: while the customer is walking through your store, you want to send them a recommendation or a discount, and you have to do that with low latency, before the customer has left the store, because once they have left, they will go to a competitor's store and not buy from you anymore. This only works if you act on the data in real time.
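A production system would typically implement this kind of in-store logic with a stream processor such as Kafka Streams or ksqlDB running on the local cluster; the following plain-Python sketch only approximates the idea. The topic names (store-clickstream, offers), the broker address, the 10-minute window, and the three-views threshold are illustrative assumptions.

```python
import json
import time
from collections import defaultdict
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "edge-broker:9092",   # the broker running inside the store
    "group.id": "in-store-offers",
    "auto.offset.reset": "latest",             # only react to what happens right now
})
producer = Producer({"bootstrap.servers": "edge-broker:9092"})
consumer.subscribe(["store-clickstream"])

views = defaultdict(list)   # (customer_id, category) -> timestamps of recent views

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())            # e.g. {"customer_id": "...", "category": "..."}
    now = time.time()
    key = (event["customer_id"], event["category"])
    views[key] = [t for t in views[key] if now - t < 600] + [now]   # keep a 10-minute window
    if len(views[key]) >= 3:                   # third look at the same category: push a discount
        offer = {"customer_id": event["customer_id"],
                 "category": event["category"],
                 "discount_pct": 10}
        producer.produce("offers", key=event["customer_id"], value=json.dumps(offer))
        producer.poll(0)
        views[key] = []
```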
The other advantage is that you can do this even in disconnected environments. In the US we have already deployed this with many customers who said: we have hundreds of retail stores in big malls, and during the day the malls have very bad wi-fi, so replication to the cloud doesn't work, and we need to do the edge analytics in the store. The benefit of a Kafka environment is that you can do exactly that: on the one side you connect to the customer's mobile app with low latency, and on the other side you connect directly to the point of sale where the payment transaction happens. So this is really not just about analytics data, but also about transactional data. Then, when the connection to the cloud is better again, maybe during the night when there are no people in the mall and the wi-fi is better, you replicate all the transactions that happened during the day to the cloud. And because Kafka is not just a messaging system but also a durable storage system, reliable and with no data loss, you can do this very easily, out of the box, with a single technology and infrastructure.

Again, I'm not just talking about theory. Here is one real-world example where exactly this happens, and it's actually the extreme case: Royal Caribbean. Royal Caribbean has cruise ships at sea, where internet connectivity is very bad and very expensive, so they need to run a mission-critical Kafka cluster on every single ship, disconnected from the internet. They need to communicate with the customer and process transactional data on the ship, and then, when they get back to the harbor after a three-day cruise, they replicate all the data from the last three days into the cloud, and then they go on to the next journey and the same thing happens again. For most of those three days they are totally disconnected from the cloud and the internet, and then for a few hours they replicate the data while they're in the harbor. This is a perfect example of a hybrid scenario, even if you're disconnected most of the time. And again, this is not just theory; we have deployed this with a lot of different customers, not just in retail or, as in this case, on a cruise ship, but also for example in manufacturing or in oil and gas.

With that, let me come to the last trend we see, and this is definitely the biggest prediction I will make for next year: with all the successful ransomware attacks and other security issues we have seen in the last 12 months, every executive across every industry will have budget for cyber security. And if you think about cyber security like all the other use cases we've talked about before, it has to act in real time. The big problem, and this is true for many of these real-world attacks, is that often it's not just a problem in your own IT, but a problem at one of the vendors whose software you use. Many of the most famous successful attacks are actually supply chain attacks: it's not you who has the vulnerability, it's your vendor, and that vendor might be much smaller than you and therefore much easier to attack. With this in mind, it's true again that real-time data in motion beats slow data, just as I've shown you with all the business cases before.
The same is true for cyber security: if you don't have situational awareness and threat intelligence in real time, it doesn't work, and it has to work at scale and reliably, because if you only find out overnight, in a batch process, that you had an attack, it's too late; the data is already stolen or encrypted by ransomware. You need to act proactively, or even predictively, on attacks, and this is again where event streaming is a perfect fit, because with it you can connect many different systems, data sources, and technologies and correlate the data in real time. In this picture, all the yellow circles are what you build for your business applications; that can be enterprise IT for your customer 360, industrial IoT or OT for manufacturing, anything. The red circle is where, connected to all of this, you build cyber security in the same cross-cutting way with event streaming, and that's the huge benefit. Once again, this overlaps with the other concepts we talked about, like the kappa architecture or hybrid multi-cloud architectures.

We have customers that deploy Kafka everywhere: yes, part of it is in the data center, which you see on the right side, and in the cloud at the top right, but a lot of the processing also has to happen at the edge, like on a ship, or even with a single embedded Kafka broker in a drone or on a very small piece of hardware. You then need to integrate all of these with each other, in real time, in a scalable way, and the Kafka ecosystem can be used for all of that. That's the huge benefit: you use one technology and one architecture and deploy it across all these different locations. Of course they differ in scale; in the cloud you're elastic and don't have to worry because it's a serverless offering, in the data center you maybe have five or ten nodes, and in a drone you only have one node, but the technology and the architecture are the same everywhere, which again simplifies the architecture and reduces the effort, the cost, and the risk.

To give you one concrete example, this is what we implemented with what we call Confluent Sigma. Sigma is an open, vendor-neutral signature format for describing suspicious patterns in log events, so whenever you do log analytics, no matter whether with Elasticsearch, with Splunk, with Kafka and Confluent, or with a combination of them, you can define your detection rules and log structure against that common format, which is why it's very popular. You can then either process the messages from all these different data sources with the Sigma rules in a batch process, in a data lake or in Splunk, or you can process the data directly, in real time and at scale, with Confluent and stream processors such as Kafka Streams or ksqlDB. This is a perfect example of how you continuously create situational awareness: processing the data in motion is very different from processing it in a data lake at rest, where it's often too late to detect a threat or a fraud.
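The actual Confluent Sigma project parses real Sigma rules and evaluates them with Kafka Streams and ksqlDB; purely as an illustration of the pattern, here is a hedged Python sketch with one hard-coded, Sigma-inspired rule. The auth-logs and security-alerts topics, the field names, and the five-failures threshold are assumptions for this example.

```python
import json
from confluent_kafka import Consumer, Producer

# One hard-coded, Sigma-inspired rule: alert after repeated failed logins from one source.
RULE = {"event_type": "login_failed", "min_count": 5}

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "siem-detector",
    "auto.offset.reset": "latest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe(["auth-logs"])

failed = {}   # source_ip -> consecutive failed logins seen so far

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    log = json.loads(msg.value())
    ip = log.get("source_ip", "unknown")
    if log.get("event_type") == RULE["event_type"]:
        failed[ip] = failed.get(ip, 0) + 1
        if failed[ip] >= RULE["min_count"]:
            alert = {"rule": "repeated_failed_logins", "source_ip": ip}
            producer.produce("security-alerts", value=json.dumps(alert))
            producer.poll(0)
            failed[ip] = 0
    else:
        failed[ip] = 0
```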
Once again, here's a practical example, because this is not theory, this is practice. Intel is one of the examples we often show of how to build a cyber intelligence platform, and they do exactly what I described on the last slide: they combine Confluent as the heart of the infrastructure for data integration and data processing at scale, in real time, with many different technologies integrated, and as part of that they also use Splunk for some things like anomaly detection, where Splunk is a great fit. So this is yet another success story, and once again, for all the examples you have seen today, you can take a look at the links in the presentation or just google them; all of them are available as public presentations if you want to learn more about one of these use cases.

And as you have seen in this talk, this is of course what we do with our customers. Apache Kafka was created over 10 years ago, and today I would really say it's the de facto standard for data in motion: every bigger company is using Kafka across projects, and most of them with Confluent, because Confluent was founded by the inventors of Kafka. Today we are listed on the Nasdaq, we have many customers across the globe and across industries, and a lot of Kafka expertise, because event streaming is the only thing we do. So as a conclusion, you should look at it from this perspective: you can of course always use just Apache Kafka, the open source framework. I typically see it as a car engine; it's battle-tested, it's scalable, and you can build your own car with it. On the other side, many of our customers don't want to build the car; they want to focus on the business logic, so they buy the complete car, and that's what we provide as Confluent. While we provide over 80 percent of the commits to the Kafka project and have all the expertise around that, we also provide a complete car, including security, connectors, data governance, and so on. And if you're in the cloud, you're even luckier, because in the cloud we provide you the self-driving car, level five: it's truly and completely serverless, so that you can focus one hundred percent on the business logic, and you get mission-critical SLAs and consumption-based pricing.

With that, this was my talk. I hope it was a good overview, and next year we will see how many of these predictions or trends come true and what you are doing about them. Feel free to reach out to me and connect on LinkedIn, stay in touch, and take a look at our other use cases. And with that, I'm handing back to the moderator. I hope you learned a lot in this discussion today.

Thank you so much, that was great. We actually don't have very much time for questions; we're overrunning, and a lot of people here are getting very nervous, so I'm going to have to be very brief, and I'm sorry about that. But I encourage everyone here to get in contact directly with Kai and ask your questions directly to him. Just one thing we do need to know, of course, Kai: what is your track record at predicting things in general? Should we take you seriously or not, as a great psychic?

I did a similar presentation last year, and it turned out pretty well; most of that happened. And as you saw, most of this was really real-world examples, and some cutting-edge tech giants are doing it already, so I'm pretty sure this is coming. Let's talk again next year and we will see.

Okay, fantastic. And if we can't find you, then I guess we'll know you're hiding because your predictions weren't so good after all. But I'm confident we'll see you, Kai. Thank you so much, Kai Waehner from Confluent.