Welcome back everyone. Once again, I'm Austin Bingham from Sixty North up in Norway, and I want to present to you Konark Modi, who will be talking about near real-time stream processing in Python with Storm and Kafka. Take it away.

Hi, good morning all of you. I understand I'm the only one standing between you and lunch, so I'll try and make it interesting for you guys. All right, so the title of my talk is designing near real-time stream processing systems. Now, what do we mean by data pipelines and near real-time systems? Every one of us knows that in real time we are generating a lot of information as we visit websites, browse applications, and so on, and we need reliable and quick mechanisms to collect that data. Once we are able to collect that data, we need to process and analyze it as well. Systems like Hadoop let you do that efficiently, but Hadoop is a batch processing system, and when we talk about batch processing systems we mean there is going to be high latency of updates in your query results. When you have an application where you want to provide recommendations, provide better features to users, provide a feedback-loop mechanism, you want a layer which helps you work with the recent data and update with low latency. That is where the essence of near real-time data pipelines comes in.

How I treat data pipelines is in three major parts. One is the input: that is basically the messaging layer, where you're collecting all the user data that is coming in on your streams. Second is the processing layer, where you process the recent data. Now, when I say recent, that means say the last n minutes or n days; it's a defined window of data that you're going to process on your near real-time stream processing layer. I'm not going to process one month of data on my near real-time processing layer, right? And then, when I process that data, there has to be a layer where I can present my data in a queryable format, or a format that my other applications can consume in a feedback loop.

So today's talk is focused on two components. One is the messaging layer, that's Kafka, which I'm going to cover in the second part of the talk. Before that we're going to talk about an open-source project which was open-sourced by Twitter and is now an Apache project, known as Apache Storm. Primarily Apache Storm is 50% Clojure and 50% Java, but that does not stop us from using it in Python, and that is why I'm here today: to present how we can use Storm and its capabilities fully using Python itself. Just before I get into how the talk is structured, can I have a show of hands: how many of us actually know what Storm is?
Cool. And how many of us know what Kafka is? Right, so we have a majority of people that are still new to Storm and Kafka, so I'll give a brief overview of what Storm is and what Kafka is. Given the time frame and the amount of content I have, I might speed up a little bit, but do feel free to catch me after the talk and I'll be happy to show you demos and other stuff.

All right. When we talk about stream processing, there are a lot of challenges that you need to tackle, but this slide only fits a few of them. First of all, for me a near real-time processing system like Storm is an infrastructure that does never-ending data processing. What I mean by that is that it's not like a MapReduce job, which will finish at some given point of time. A real-time processing system, for me, means data keeps coming in at huge volume, and the system keeps processing it and keeps pushing it ahead. It never ends. That is what I mean by near real-time data processing pipelines.

You have to store messages somewhere before your consumers can consume them. You might want to replay your messages; you might want to drop a few messages, and so on. All these things need to be taken care of by your near real-time processing layer. You need to route your messages: for example, you're collecting a few messages from application one and a few messages from application two, and you want all of application one's messages routed to one particular worker and application two's messages routed to a different worker. These are a few of the nuances that revolve around a real-time data pipeline processing system. You also want to scale for high throughput: you design a system today that takes care of thousands of messages, but tomorrow you might grow and want a system which processes a hundred thousand messages per second. So you need a design philosophy around your system that takes care of all that.

Storm takes care of almost all of these things. Once you use Storm, you have already solved a wide variety of these challenges. Say you just want to process the Twitter feed that you're getting from the firehose and dump it to your DB. In systems like this you really need to take care of back pressure: what if my processing layer dies, where do all the messages coming from the Twitter firehose go? Or my persistence layer is down, so how does the processing layer deal with the layer ahead of it, and what happens to those messages? Apache Storm lets you handle an extremely broad set of scenarios. For example, you may want to run a query on the fly, and that has to be distributed too, because your data is distributed across the cluster, or it would take a lot of time; Apache Storm has something known as DRPC topologies that help you do that as well.

It is scalable. When we go into the architecture of Storm, you'll understand what we mean by scalable in terms of Apache Storm: you have different services that do different tasks in Storm, and that lets you scale as well. And when we talk about scalability, we obviously talk about fault tolerance as well. What happens if one of the workers goes down? What happens if one of the services that coordinates all the workers goes down? As I mentioned earlier,
at no point in time do I want my data processing layer to go down. One of the components might go down and come back up, but in the end I never want the whole system to crash, because that is simply not acceptable.

It is programming-language agnostic; that is how we are able to use Python with Apache Storm. It supports other languages as well. As I mentioned, the core is in Clojure and Java, but it has a multi-lang API that lets you achieve a lot of this. There are a few advanced mechanisms that we are still not able to use from Python, but I believe in their roadmap they are planning to open those up through the multi-lang API as well. And it achieves parallelism across components: when we go through the various components of Storm, I'll brief you on how, at each level, we can configure what kind of parallelism we need.

So these are the Storm components. I have tried to depict them in two parts. One is the conceptual view; this conceptual view helps you when you're actually programming your system for Storm. Then there is the physical view; these are the actual services that will be running on your systems.

First is the spout. Treat a spout as an ingestion layer, an ingestion component. The spout is your interface to the outer world. You write a spout that interfaces with, say, the Twitter firehose, or a RabbitMQ layer, or a Kafka layer, or anything like that, and the spout generates a stream, which we're going to talk about. A stream is nothing but an unbounded sequence of tuples that go to bolts and other components.

Here in the example, what I've shown is a small code snippet of a spout. It's doing nothing useful; in the real world you'll never come across this scenario, because it's not connecting to the outer world. It's just generating random sentences within itself and emitting them to the further processing layers. What it's actually doing is saying: I want to pick random sentences from the list of sentences that I have, and just emit each sentence.

Streams we've talked about. If you look at the code snippet, I'm emitting each sentence over here. A stream is an unbounded sequence of tuples, so each sentence is part of the stream; each message in the stream is the emitted message that comes from the spout.

These streams then feed into bolts. There's a fine grey line, which you probably cannot see, but basically a bolt can have N inputs and N outputs. When we talk about inputs, treat a stream as an input to a bolt. A bolt can take input from a spout; a bolt can take input from another spout as well; a bolt's output can be the input to another bolt; a bolt's output can go to a persistent store or anything else; or a bolt might not emit anything at all, it might be a dead end for your topology.

How we knit all these together is what we call a topology. In a topology you will have multiple spouts and multiple bolts doing a lot of stuff. As I mentioned, the spout is just an interface for the outer world to get data into your Storm cluster, and bolts are essentially your processing layers, where you write your logic: your filtering, your aggregation logic, and all that. The whole architecture, when we knit it together, is what we call a topology.
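Since the slide snippets aren't reproduced in this transcript, here is a minimal sketch of what such a random-sentence spout and a downstream sentence-splitting bolt might look like. It is written in the spirit of streamparse's API of that era; the import paths, class names, and method signatures are assumptions and may differ in your version of the library:

```python
import random

from streamparse.spout import Spout
from streamparse.bolt import Bolt


class RandomSentenceSpout(Spout):
    """Toy spout: emits random sentences instead of talking to the outside world."""

    sentences = [
        "the cow jumped over the moon",
        "an apple a day keeps the doctor away",
        "snow white and the seven dwarfs",
    ]

    def next_tuple(self):
        # Storm calls next_tuple repeatedly; emit one random sentence per call.
        self.emit([random.choice(self.sentences)])


class SentenceSplitterBolt(Bolt):
    """Splits each incoming sentence into words, one emitted tuple per word."""

    def process(self, tup):
        sentence = tup.values[0]
        for word in sentence.split():
            self.emit([word])
```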
So to your Storm cluster you always submit a topology. Now, the flexibility is such that your topology can be a mixture of technologies. You might write a spout in Java and a lot of bolts in Python, and that helps a lot, because, for example, in my environment we are a mixture of Java and Python devs. I would obviously not rewrite something that is already written in Java, and those guys might not rewrite what has already been written in Python, so defining a topology this way helps me do that.

How do we define this topology? There are multiple ways, because there are multiple libraries in Python that help you connect with Storm. This is an example using the streamparse library, which I'm going to showcase throughout the talk today. It's actually a Clojure DSL that lets me wire up my Python spouts and bolts. What is happening over here is: I define that I have a word spout. Then there is a bolt which takes input from the word spout (we'll get to the meaning of "shuffle" in a moment), and I've told this bolt that the program it needs to run when it receives input is the sentence splitter. So it's receiving sentences and splitting them into words. The next bolt receives its input from the sentence splitter, so essentially it's getting the words from there, and over here we've asked it to group by words (we'll come to what groupings mean). Then the word count saver is essentially doing nothing: it takes input from the previous bolt, which is the word count bolt, and processes it. As I mentioned, it's a dead end; it's the last bolt in my topology, so it need not emit anything. You can make it emit something, or not. So this is what happens in a basic topology.

What do we mean by groupings? Groupings help you tackle challenges like wanting to group by a particular user and then process that user's data on one worker, because in a distributed system, if your data gets spread around, you might have problems computing aggregates. Take the case of the simple word count example. Your spout is emitting a lot of words, your bolts are processing them, and at the end of it you want counts by word. One way to do that is to have a persistence layer, dump everything over there, and use, say, a Redis INCR mechanism: you push words over there and INCR takes care of the counting. In terms of bolts, what I'll do instead is group by words on the bolt itself, which guarantees that the same word always goes to the same task, and there I'll maintain the aggregates. The other way is shuffle: shuffle makes sure that each bolt sees roughly the same number of tuples from the stream, but they are shuffled across tasks, not grouped by any key. There are a lot of other grouping mechanisms too; the documentation has the complete list, with good examples as well.
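To make the grouping point concrete, here is a minimal sketch of the counting bolt, assuming the topology applies a group-by (a fields grouping) on the word, so that every occurrence of a given word lands on the same task. Again this follows the spirit of streamparse's API, so the class paths and signatures are illustrative:

```python
from collections import Counter

from streamparse.bolt import Bolt


class WordCountBolt(Bolt):
    """Counts words in memory. This is safe only because the fields grouping
    on the word guarantees all occurrences of a word reach this same task."""

    def initialize(self, conf, context):
        self.counts = Counter()

    def process(self, tup):
        word = tup.values[0]
        self.counts[word] += 1
        # Emit the running count downstream, e.g. to a saver bolt.
        self.emit([word, self.counts[word]])
```

If we had used a shuffle grouping instead, occurrences of the same word would be spread across tasks and each task's in-memory count would only be partial; that is exactly the trade-off the talk describes.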
Coming back to the physical view (I'm sorry about the size of the diagram): on the extreme left you see Nimbus. If you're coming from a Hadoop world, Nimbus is just like the JobTracker. What you do with Nimbus is submit jobs to it, and Nimbus makes sure these topologies are then distributed to the workers. Nimbus takes care of managing and monitoring your topologies around the cluster: when you submit a new topology it takes care of the deployment, and when a task is assigned, or in case of the failure of a task, it takes care of the reassignment.

Now, on the second slide I had mentioned fault tolerance, and here comes the first part: if my Nimbus goes down, my topologies will still keep running. What gets affected is that we will not be able to submit new topologies, but the topologies already running are not affected. So even if your Nimbus server is down for, say, two or three hours, your topology will still run intact.

ZooKeeper is the coordinator between the supervisors and Nimbus. Supervisors are the actual services running on the workers that take care of the topologies. Nimbus and the supervisors communicate via ZooKeeper; ZooKeeper is a cluster coordination tool that holds the cluster state so that Nimbus and the supervisors stay in sync. A supervisor, again, has a few parts, like workers, which have tasks and executors. If one of my workers goes down, the supervisor will make sure that another worker is spawned. In that way, fault tolerance is maintained at each and every layer.

Coming to the main part: how Storm and Python fit together. First is streamparse; that's the library I'm going to talk about today. It's a library by Parse.ly, open-sourced recently, and we're going to look at it in depth. streamparse lets you write your spouts and bolts in Python, but you have to write the final definition of your topology using a Clojure DSL. The advantage of that is that you can then mix multiple languages, for example a Java spout and Python bolts, and that is why I prefer writing the definition in the Clojure DSL. But if you want to write everything in pure Python, there's a library known as Petrel, which was open-sourced by AirSage last year; it lets you write the complete topology, spouts, bolts, everything, in Python. Then we have another talk on the 25th, which I believe is going to cover integration using Jython; Jim is the guy who's going to be talking about that. He has a library called clamp that takes care of mixing your Java and your Python stuff, so that's an interesting library too, and I think you should visit that talk as well. And the native Java project of Storm itself lets you write topologies, but it will not let you define a topology in Python: you have to write your topology in Java. You can write bolts in Python, and there are a few implementations where people have written spouts in Python, but those are not that successful. So that's the last option, I would say.

Why I love streamparse: this is basically the architecture diagram of streamparse, if the colors are visible to you. All the grey part is the Storm cluster. What streamparse lets me do is create an environment around my whole Storm cluster. So I am on my dev machine, I'm creating a Storm topology, and I'm defining bolts and spouts. What I need is sparse: sparse is the command-line utility that comes with streamparse. When I say sparse quickstart with a project name, it gives me this beautiful project layout, and each folder and each file has a significance for sparse.
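For reference, a quickstart layout from streamparse of roughly that vintage looked something like the following; the exact files vary by version, so treat this as an illustration rather than a definitive listing:

```
wordcount/
├── config.json        # cluster hosts and worker definitions
├── project.clj        # JVM-side dependencies for the topology
├── fabfile.py         # hooks for custom deploy steps
├── src/               # Python spouts and bolts live here
├── topologies/
│   └── wordcount.clj  # the Clojure DSL topology definition
└── virtualenvs/
    └── wordcount.txt  # pip requirements installed on each worker
```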
What it lets me do is configure all my workers. Why I need the worker configuration is that when you submit a Storm topology via sparse, via streamparse, it SSHes to each of your workers and deploys your Python environment over there. For example, if I'm using something like a WebSocket client or a Kafka client, that's an external dependency that does not come with native Python, and streamparse makes sure that it first goes onto the worker node and creates a virtual environment there, where it installs all my dependencies. How do I define my dependencies? In one of the files, where you list them all; they go into virtualenvs, and you create a requirements file for that.

topologies is where you define your Clojure DSL: topologies/ plus whatever name you want to give your topology. This is an example of a topology; it's a wordcount.clj file. What it says is: I have a spout, which is words.py, and I have a bolt, which is wordcount.py. It's as simple as that. What you see in the bottom row is :p 2, which essentially defines the parallelism your bolt runs with.

Enough of talking; I have built a very small application around this which I would like to showcase. The application is built around Wikipedia edit trends: what is happening on Wikipedia. Whenever somebody edits an article, they release the edit log for it, so it's real-time. There are a lot of tools available to capture that feed from Wikipedia, so I've just taken one of the tools, and it gives me a nice JSON message which looks like this. Basically we're going to trend three metrics out of it. First is the action: when you edit an article on Wikipedia you can have multiple actions, edit, delete, create, update, etc. Second, it tells you whether the article was edited by a bot or not: is_bot false, is_bot true. And third, whether the user was logged in or not while making the changes: in case the user is logged in you have a username; if not, you get an IP over here. So we'll trend three metrics: one is action-wise trends, so how many deletes, how many edits, how many updates; then humans versus bots, a bar chart of humans versus bots; and last, logged-in users versus anonymous users.

Now, I'm not parsing the logs live, because I wasn't sure about the internet connectivity, so I've taken a subset of logs from the Wikipedia feed, dumped it onto the Kafka layer, and I'm going to use Kafka for that; after this we'll come to what Kafka is all about. So what I did was I just did sparse quickstart wikipedia_edit_logs_trends, and it gave me that beautiful project layout I was talking about; all the files are there. Now what I do is define a very simple Kafka spout. I mean, there's nothing Kafka-specific about it; it's a simple pub-sub that I'm using here. What it's doing is reading a particular topic, getting all the log messages, and emitting each message. When we say message over here, it's essentially a JSON log that's being generated.
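A minimal sketch of such a Kafka-backed spout follows. It uses the modern kafka-python KafkaConsumer interface (at the time of the talk the library's consumer API looked different), and the topic name and class name are illustrative assumptions:

```python
from kafka import KafkaConsumer  # kafka-python
from streamparse.spout import Spout


class WikipediaEditSpout(Spout):
    """Reads JSON edit-log messages from a Kafka topic and emits them downstream."""

    def initialize(self, conf, context):
        self.consumer = KafkaConsumer(
            "wikipedia_edits",                # illustrative topic name
            bootstrap_servers="localhost:9092",
        )

    def next_tuple(self):
        message = next(self.consumer)
        # Each Kafka message value is one JSON edit-log entry; emit it as-is
        # and let the parsing bolt turn it into fields.
        self.emit([message.value.decode("utf-8")])
```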
The next bolt parses that JSON. What it does is check what kind of action there is, check whether the editor is a bot or a human, and check whether the user is logged in or not. streamparse lets me emit in batches, so I say emit_many, and the argument is a list of lists. So basically one output of this would be: human, g2_edit, logged_in or not_logged_in. Now, I'm prepending keys like g2 and g3 because I'm not that good with JavaScript; that helps me at the web-interface part of it, so please bear with me on that.

This is how I count it. What I've done over here is that I have a WebSocket server running; when I count, I dump the whole dictionary onto my WebSocket, and that lets me plot my D3 graphs on the web interface. So that's the simple final word-counter bolt that I have for this particular example. This is the DSL over here: I'm saying you have to use the Kafka spout, then forward the output of the Kafka spout to the parse-JSON bolt, and once the parse-JSON bolt has done its task, pass its output to the count bolt.

So this is what it looks like after parsing; this is real-time parsing going on. This is the final output coming in on my WebSocket server, and these are the graphs I was talking about. The first one is create, edit, and I think the last one is delete. Then you have bots versus humans, and finally logged-in versus anonymous users.

The whole point of this demo is that it's pretty easy to get started with Storm. It's very easy to write your spouts and bolts, and at the end you have a real-time application, and it does not take much effort to manage the cluster either. One of the headaches that comes with systems like Hadoop is that you have to do a lot of cluster management; with systems like Storm, you really don't. And because, as we know, once a topology is running it never stops (the only way it stops is if I manually kill it; otherwise it will always be in running mode), as and when the logs come in they get parsed, and that is what real-time data pipelines mean, at least for me.

So this is how you run it. When you do a sparse run, it starts a local instance of a Storm cluster to test your topology right there on your system. Once you've tested it and you're sure it works fine for you, you do a sparse submit and name your topology. All topologies get submitted to Nimbus, and Nimbus takes care of shipping your code to the workers. But in the case of sparse, it will first SSH to each of your worker nodes, because we need to create a virtual environment for Python that takes care of all your dependencies. That's an extra step that sparse does; otherwise it's the same whether you submit a Java topology or a Python topology, because it gets shipped to Nimbus via Thrift.
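The corresponding commands look roughly like this; the topology name is the one from the demo, and the exact flags may vary across streamparse versions:

```bash
# Run the topology locally against a temporary in-process Storm cluster
sparse run

# Ship the code to the real cluster via Nimbus and start it there
sparse submit --name wikipedia_trends
```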
Coming to Kafka. Why do we need systems like Kafka? We've already got so many messaging layers around, we've got a lot of databases, we've got a lot of caching solutions, so where does Kafka fit in, and what is Kafka? For me, Kafka is a high-throughput, distributed, persistent messaging system: it takes care of pub-sub, it takes care of high throughput, and it takes care of storing my data in a distributed fashion.

In the challenges I mentioned earlier, the first point I made was that I need a robust messaging layer that takes care of the messages until they are consumed. And also, once they are consumed, I still want to retain those messages, because there might be other consumers that want to consume them. When we're talking about a distributed processing layer, there might be N consumers that want to consume the same message. Consider a topic, for example the Wikipedia edit trends topic: one consumer is reading from that topic, and there might be another consumer that is dumping all those log files to my Hadoop layer. So what you need is multiple consumption of the same messages. You also don't want all your messages on one box, for failover and high availability of your data; you want your data partitioned across multiple machines. Kafka lets you do all that, and along with that it maintains high throughput, so you can have a high number of writes and a high number of reads. How it does that is that Kafka is fundamentally a filesystem-oriented queue: it persists everything on disk, and it does so by design. It's not an add-on feature that is disabled by default and that you switch on to persist messages to disk. Kafka stores everything on disk and lets you retrieve it via the partitions and the various brokers that you have.

These are a few important concepts for Kafka, if you're getting into it; these are the basics you need to know. Clusters: a cluster is basically a set of brokers. You can have a single node with a single broker, a single node with multiple brokers, or multiple nodes with multiple brokers; it totally depends on how you want to configure your Kafka cluster, and Kafka lets you do all of that. Even if you are in a multi-data-center environment, Kafka lets you replicate your data across data centers, something known as Kafka mirroring. Topics: for everybody coming from a message-queue background, a topic is just a layer which lets you group your messages together. The advantage of Kafka is that I can have multiple consumers reading the same topic, and each consumer can start from a different point and read to a different point.
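As an illustration of that independent consumption, here is a sketch with kafka-python where two consumers in different consumer groups each read the full topic from their own position. The topic and group names are assumptions, and the interface details depend on your kafka-python version:

```python
from kafka import KafkaConsumer

# Two consumers in *different* consumer groups: each keeps its own offset,
# so both see every message on the topic, independently of one another.
realtime = KafkaConsumer(
    "wikipedia_edits",
    group_id="realtime-trends",
    bootstrap_servers="localhost:9092",
)
archiver = KafkaConsumer(
    "wikipedia_edits",
    group_id="hadoop-archiver",
    bootstrap_servers="localhost:9092",
)

for message in realtime:
    # Consuming here advances only the realtime-trends offset;
    # the archiver's position is unaffected.
    print(message.offset, message.value)
```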
It's not like, say, once I have read n messages with one consumer, every consumer would then have to read from point n+1. Each consumer keeps its own position, and that is what is known as offsets in Kafka. Along with ZooKeeper you can maintain your offsets, where you record that this topic has been read up to this point. It's like a simple tail -f that you do on a log file. Partitions are basically what you use for parallelism: you can partition your topics across multiple brokers.

How I treat all of this is as a rethinking of how logs work. In your environment, don't treat these as mere logs; treat it as a log-centric environment where your logs are used by multiple services. For example, you're pumping logs, and one application is processing them for real-time analysis, while another application is processing them for, say, daily analysis, persisting them to Hadoop. It's the same log, but it's getting used at a lot of different layers.

How can we use Kafka with Python? There are currently two implementations. One is kafka-python; that's the first module in the community. The other one is samsa, again open-sourced by the guys at Parse.ly. I'm not sure what version of Kafka samsa works with, but kafka-python works with the latest version of Kafka; I'm not very sure about samsa, I haven't tried it out. Kafka plus Storm really makes a near real-time data pipeline a very robust and fault-tolerant mechanism for you, so if you're looking at implementing that, you should give it a try. Both of the systems work independently: it's not a mandate to use Kafka with Storm or Storm with Kafka, and you are free to use either one of them on its own.

There are a lot of advanced features of Storm as well, for example DRPC topologies, transactional topologies, Trident topologies. When you're writing a near real-time data processing pipeline, you have to make sure that each of your messages gets processed; otherwise, at the end of it, your counts will start to differ. Now, when we say each message gets processed, we tend to forget that it's not necessarily exactly once. There are two guarantees here: making sure that my messages get processed at least once, and exactly once. Storm lets you handle both, but the basic implementation of a Storm topology that we saw will only give you the first part, at-least-once, because it comes with a reliability API. What the reliability API does is let you acknowledge whether you have processed a tuple or not; if you have not processed a tuple at any layer of the topology, Storm will replay that particular tuple. Exactly-once is something you have to achieve with transactional topologies and Trident topologies, where you say: I want to process this tuple exactly once; that's where transactional topologies come in. DRPC is supported by the multi-lang API, but in the current implementation of the multi-lang API, Trident topologies are not supported; they probably have that in the roadmap, but I'm not sure when they'll release it for the multi-lang API.
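To make the reliability API mentioned above concrete, here is a minimal sketch of a bolt that explicitly acks or fails tuples. It is streamparse-style; in streamparse, bolts typically auto-ack unless you take manual control, so the auto_ack flag and method names here are assumptions based on that library's conventions:

```python
import json

from streamparse.bolt import Bolt


class ParseJsonBolt(Bolt):
    """Acks a tuple only when it was handled; fails it so Storm replays it."""

    auto_ack = False  # take manual control of acking (assumed flag)

    def process(self, tup):
        try:
            record = json.loads(tup.values[0])
            self.emit([record["action"]])
            self.ack(tup)   # processed successfully
        except (ValueError, KeyError):
            # Business-logic failure: ask Storm to replay the tuple.
            # (In practice you'd want to avoid replaying permanently bad input.)
            self.fail(tup)
```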
These are the resources that I usually follow, and they're pretty good. The first one is the official GitHub repository for the streamparse library. They're currently working on it extensively and have a few releases ahead as well, but even the current version lets you manage your whole Storm cluster pretty well: for example, you can tail the logs from your various workers and see where your bolts are failing, and you can submit your topologies onto your Storm cluster, as we just saw. Then you have the Kafka and Python libraries; the official documentation is pretty good for both systems, so it's a must-read for all of you. And if you want to understand how Storm parallelism works, there is a blog post by Michael Noll; it's one of the best posts for understanding how parallelism works in Storm. Just like a topology, we could continue this discussion forever, but I'll put a stop here and open it up for questions.

Just before I forget, I wanted to show you the Storm UI that comes with it. This is the Storm UI that helps you keep track of how your cluster is behaving. So I submitted a topology. Okay, so my topology was wikipedia_trends; it's been running since... not one minute; let me refresh that. It tells you the uptime of your topology and how the other parts are behaving on the cluster. So it's been up for 50 minutes. If you click on that, you see the different spouts and different bolts you have. I had one spout and two bolts, count and split, and it shows how many messages are being emitted and so on.

Why is there a difference between the total emitted and transferred? Okay, yeah: there's a difference between emitted and transferred because one of my bolts does not emit messages; it's a simple WebSocket push. Right, thank you.

Yes, it's still the case. So the question is: in an earlier version of Storm, to update a topology we had to restart the topology; is that still so? Yes, you still have to do that. There are multiple design challenges around that which don't allow it. It also depends on what kind of change you're doing: for example, if you're doing a simple parallelism-count change, it then depends on how your tasks will get redistributed, so you have to bring the topology down. There is a feature request for this, known as storm swap, I think; that would take care of this part, where you define your parallelism on the fly and just do a storm swap, which rebalances your whole cluster according to the new parallelism you've defined. But I was reading an interesting thread on the mailing list itself that had another side to it: why it is very difficult to create systems that you can change on the fly and still have them behave the same way, because your counts, your aggregates, and how your messages have been grouped across different workers really get affected by that. So I don't think that's coming in one of the near-future releases.

You mentioned the reliability guarantee... So the question is around the reliability API. Suppose we've written a topology in Python and one of the components starts failing; what happens to the whole data pipeline? Now there are two parts to it.
First of all, I can probably go back to the code snippet. This is how you define acknowledgement and failure in your system. A failure might be because of a component failing, or it might be because of business logic failing; in either case I can explicitly fail my tuple, and it will then get replayed. And yes, if one of my workers goes down, the supervisor will make sure to restart the workers. If, for example, I'm using a module that is not available on one of my workers, streamparse will not even let me submit that topology. So that's how that is taken care of. In the case of failing each message individually, the reliability API explicitly lets you acknowledge or fail any of the messages, both in cases of business-logic failure and in any other scenarios you have.

The next question is how we can scale by adding more machines and increase the throughput. The supervisor is what runs your workers, so all you need to do is add more machines and start running a supervisor on them; it will communicate with Nimbus, and that is how you scale your topology. Yes, you have to restart; that's the second part. And it will not take care of dynamic loads: for example, you know that your data load is going to increase, so you would like your Storm cluster to scale automatically; Storm does not come with that facility. But it's very easy to scale, because all you have to do is start a new worker, and all your tasks will then get redistributed.

Yes, so essentially, if you do a ps -ef on your system and grep for JVMs, you will see different JVMs running. What's happening is that each supervisor runs a set of workers, each topology has its set of workers, which in turn have executors and threads, and each one of them is a different process. That is how you guarantee fault tolerance across multiple systems: if one of your components fails, the other components will still run for you. That is how it happens inside a topology.

Okay, so the question is how Storm compares with the different frameworks available in the market. Samza is one of the products, open-sourced by LinkedIn itself; LinkedIn does a lot of its near real-time processing with Samza. Then there's another product open-sourced by Yahoo, which is S4. I'm not really sure about those products; I've never used them, but yes, there has to be some design philosophy behind each, which is why these different systems exist. There is one used by Facebook; I'm not sure whether that's open source or not. But I've never really given any of them a shot, because for me the components and features that Storm lists were pretty strong, and the multi-lang API is pretty strong for me as well. Those are my reasons for using Apache Storm.

Yeah, I can't hear you... Sure, so the question is how exactly-once and at-least-once processing work. There's something known as guaranteed processing, wherein you flow all your messages across your topology; once that is done, you are processing each message at least once. Exactly-once is what I was talking about in the advanced features of Storm, where you need to write transactional topologies, which are an abstraction over
your basic topologies and let you achieve exactly-once. So for at-least-once, all your tuples flow through the topology; that is what Storm takes care of with its communication mechanisms. Yes, there are a few internals of Storm in which it communicates via ZeroMQ and other mechanisms. I have a few architecture diagrams for that, and I'll be happy to share them after the talk. They explain how each message flows through the different workers, for example from spout to bolt and from bolt to bolt, and what happens when we explicitly fail a tuple, and so on. So I'll be happy to share that.