Welcome back, everyone. We're on the ground at data Artisans' user conference for Flink, called Flink Forward, at the Kabuki Hotel in lower Pacific Heights in San Francisco. The conference kicked off this morning with some great talks by Uber and Netflix, and we have the privilege of having with us Chinmay Soman from Uber. Welcome, Chinmay.

Thank you for having me.

You gave a really interesting presentation about the sort of pipelines you're building and where Spark and Flink fit. But you've also said there's a large deployment of Spark. Help us understand how Flink became a mainstream technology for you, where it fits, and why you chose it.

Sure. About a year back, when we were starting to evaluate what technology makes sense for the problem space we're trying to solve, which is near-real-time analytics, we observed that Spark stream processing is actually more resource intensive than some of the other technologies we benchmarked. More specifically, it was using more memory and CPU at the time. That's one. And I actually come from the Apache Samza world; I was on the same LinkedIn team before I came to Uber, so we had in-house expertise around Samza, and I think reliability was the key motivation for choosing Samza. So we started building on top of Apache Samza for almost the last one and a half years, but then we hit a scale where we felt Samza was lacking. Samza is tied into Kafka a lot, and you need to make sure your Kafka scales in order for the stream processing to scale.

In other words, the topics, and the partitions of those topics: you have to keep the physical layout of those in mind at the message queue level, in line with the stream processing.

That's right.
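The coupling Chinmay describes can be sketched in plain Python. This is an illustrative toy, not a Kafka or Samza API: records are hashed to a fixed number of partitions, each partition is owned by at most one consumer instance, so a job's effective parallelism is capped by the partition count chosen at topic-creation time.

```python
# Toy sketch (no Kafka client) of parallelism capped by partition count.
NUM_PARTITIONS = 4  # fixed when the Kafka topic is created


def partition_for(key: str) -> int:
    """Stable key -> partition mapping, as a producer would do."""
    return sum(key.encode()) % NUM_PARTITIONS


def assign(consumers: int) -> dict:
    """Round-robin partitions over consumer instances.

    With more consumers than partitions, the surplus instances get
    nothing: effective parallelism is min(consumers, NUM_PARTITIONS).
    """
    assignment = {c: [] for c in range(consumers)}
    for p in range(NUM_PARTITIONS):
        assignment[p % consumers].append(p)
    return assignment


if __name__ == "__main__":
    # 6 consumer instances, 4 partitions: two instances sit idle.
    a = assign(6)
    active = sum(1 for parts in a.values() if parts)
    print(f"effective parallelism: {active} of 6 instances")
```

To scale such a job past four workers, you first have to repartition the Kafka topic itself, which is exactly the operational coupling being discussed.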
Yeah, the parallelism in Samza is tied to the number of partitions in Kafka. And furthermore, if you have a multi-stage pipeline, where one stage processes data and sends output to another stage, all these intermediate stages go back through Kafka. So if you want to do a lot of these use cases, you end up creating a lot of Kafka topics, and the I/O overhead on your cluster shoots up exponentially.

So when you create topics, or create consumers that do something and then output to producers, if you do too many of those things you defeat the purpose of low latency, because you're storing everything.

Yeah. The trade-off is that it's more robust: if you suddenly get a spike in your traffic, your system is going to handle it, because Kafka buffers that spike. So it gives you a very reliable platform, but it's not cheap. That's why we started looking at Flink, because in Flink you can build a multi-stage pipeline with in-memory queues instead of writing back to Kafka. So it's fast, and you don't have to create multiple topics per pipeline.

All right, let me unpack that a little. To be clear, the in-memory queues obviously give you better I/O, and if I understand correctly, they can absorb some of the back pressure.

Back pressure is interesting. If you have everything in Kafka and no in-memory queues, there is no back pressure, because Kafka is a big buffer; it just keeps going. With in-memory queues there is back pressure, and the question is how you handle it. Some systems actually degrade and can't recover once they're under back pressure, but Flink, as we've seen, slows down consuming from Kafka, and once the spike is over, once you're over that hill, it recovers quickly.
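The two buffering strategies being contrasted can be shown with a toy in plain Python (illustrative only, not Flink internals): an unbounded queue absorbs a spike the way Kafka does, while a bounded in-memory queue pushes back on the producer, whose put() calls block until the slow consumer catches up, after which the pipeline recovers.

```python
# Toy demonstration of back pressure from a bounded in-memory queue.
import queue
import threading
import time


def run_spike(maxsize: int, burst: int = 50) -> float:
    """Push a burst through a queue with a slow consumer; return the
    seconds the producer spent blocked waiting for space."""
    q = queue.Queue(maxsize=maxsize)  # maxsize=0 means unbounded

    def consumer():
        for _ in range(burst):
            q.get()
            time.sleep(0.001)  # consumer is slower than the producer

    t = threading.Thread(target=consumer)
    t.start()
    blocked = 0.0
    for i in range(burst):
        start = time.monotonic()
        q.put(i)  # blocks while the bounded queue is full
        blocked += time.monotonic() - start
    t.join()
    return blocked


if __name__ == "__main__":
    print(f"unbounded producer stall: {run_spike(maxsize=0):.3f}s")
    print(f"bounded producer stall:   {run_spike(maxsize=5):.3f}s")
```

The bounded run stalls the producer noticeably, which is back pressure propagating upstream; in Flink's case that stall ultimately slows the Kafka source, and the job drains the backlog once the spike passes.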
So it's able to sustain heavy spikes. Okay, so this goes to your issues with keeping up with the growth of data, that there are multiple levels of elasticity. And then resource intensity: tell us about that, and the desire to fit as many jobs as possible into a certain level of resources.

Right. Today we are a platform where people come in and say, here's my code, or here's my SQL, that I want to run on your platform. In the old days they were telling us, I need 10 gigabytes per container and this many CPUs, and that really limited how many use cases we onboarded and made our hardware footprint pretty expensive. So we need the infrastructure to be really memory efficient, because what we have seen is that memory is the bottleneck in our world, more so than CPU. A lot of applications consume from Kafka and buffer locally in each container, in the local JVM memory. So we need the memory component to be very efficient, and that means you can pack more jobs on the same cluster if everyone is using less memory. That's one motivation. The other thing that Flink does, and Samza also does, is make use of a RocksDB store, which is a local persistent store.

Oh, that's where it gets the state management.

That's right. You can offload state from memory onto disk, into a proper database, and you don't have to cross the network to do that, because it's sitting locally.

Just to elaborate on what might seem like an arcane topic: if it's residing locally, then anything it's going to join with also has to be residing locally.

That's a good point. You have to be able to partition your inputs and your state in the same way; otherwise there's no locality.
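The co-partitioning point can be sketched with a toy in plain Python. All names here are illustrative, not Flink APIs: events and per-key state are routed by the same deterministic hash, so every state lookup stays local to one worker; partition them differently and the lookup would have to cross the network.

```python
# Toy sketch of co-partitioning inputs and state for locality.
NUM_WORKERS = 3


def owner(key: str) -> int:
    """Same deterministic hash for routing events and placing state."""
    return sum(key.encode()) % NUM_WORKERS


# Per-worker local state, standing in for each task's RocksDB instance.
state = [dict() for _ in range(NUM_WORKERS)]


def process(key: str, amount: int) -> int:
    """Update a running per-key total using only the owning worker's
    local state dictionary; no cross-worker access is ever needed."""
    w = owner(key)
    state[w][key] = state[w].get(key, 0) + amount
    return state[w][key]


if __name__ == "__main__":
    for key, amt in [("rider-1", 5), ("rider-2", 3), ("rider-1", 2)]:
        print(key, "->", process(key, amt))
```

Because the same `owner()` function decides both where an event goes and where its state lives, a keyed join or aggregation never leaves the worker, which is the locality being described.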
Okay, and you'd have to shuffle stuff around the network.

And more than that, you need to be able to recover if something happens, because there's no replication of this state. If a node with the hard disk, that YARN node, crashes, you need to recreate that cache somewhere. So either you go back and read from Kafka, or you store that cache somewhere. Flink actually supports this out of the box: it snapshots the RocksDB state into HDFS.

Got it. So that's more resilient, and more space and resource efficient.

That's right.

All right, so let me ask one last question. Mainstream enterprises, or at least the very largest ones, have been trying to get their arms around some very innovative open source projects. The pace of innovation is huge, but it demands skill sets that seem to be most resident in large consumer internet companies. What advice do you have for them, where they aspire to use the same technologies you're talking about to build new-era systems, but they might not have the skills?

That's a very good question; I'll try to answer it the way I can. I think the first thing to do is understand your scale. Even if you're a big banking corporation, you need to understand where you fit in the industry ecosystem. If it turns out the scale isn't that big and you're using it for internal analytics, then you can just pick the off-the-shelf pipelines and make them work. For example, if you don't care about multi-tenancy, if your hardware spend is not that much, almost anything might actually work. The real challenge is when you pick a technology, make it work for a large set of use cases, and want to optimize for cost. That's where you need a huge engineering organization.
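The recovery story just described can be sketched with a simplified toy in Python. This is not Flink's checkpointing API: local state is periodically snapshotted to durable storage (HDFS in Flink's case; a temp file here), so a crashed node can rebuild its state from the last snapshot instead of replaying the whole history from Kafka.

```python
# Toy sketch of checkpointing local state to durable storage.
import json
import os
import tempfile


class CheckpointedState:
    def __init__(self, snapshot_path: str):
        self.snapshot_path = snapshot_path
        self.state = {}

    def update(self, key: str, amount: int) -> None:
        self.state[key] = self.state.get(key, 0) + amount

    def checkpoint(self) -> None:
        """Write the current state durably: dump to a temp file, then
        rename into place so readers never see a half-written snapshot."""
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.state, f)
        os.replace(tmp, self.snapshot_path)

    def recover(self) -> None:
        """Rebuild local state from the last snapshot after a crash."""
        with open(self.snapshot_path) as f:
            self.state = json.load(f)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "snapshot.json")
    s = CheckpointedState(path)
    s.update("trips", 3)
    s.checkpoint()
    s.update("trips", 4)  # an update made after the checkpoint...
    s.state = {}          # ...then the node "crashes"
    s.recover()
    print(s.state)        # state rolls back to the last checkpoint
```

Updates made after the last checkpoint are lost and must be replayed from the source, which is why Flink pairs these snapshots with replayable Kafka offsets.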
So in simpler words, if the extent of your use cases is not that big, pick something that has a lot of support from the community; the more common things will just work out of the box, and that's good enough. But if you're doing a lot of complicated things, like real-time machine learning, or if your scale is in billions of messages per day or terabytes of data per day, then you really need to make a choice: either you invest in an engineering organization that can really understand these use cases, or you go to companies like Databricks to get support, or a cloud vendor, or things like Confluent, which gives you Kafka support. So I don't think there's one answer. For us, the reason we chose to build an engineering organization around it is that our use cases were immensely complicated and not really seen before, so we had to invest in this technology.

All right, Chinmay, we're going to leave it at that and hopefully keep the dialogue going offline.

Sure.

We'll be back shortly. We're at Flink Forward, data Artisans' user conference for Flink, on the ground at the Kabuki Hotel in San Francisco, and we'll be right back.