Hi everyone, my name is Raghu. I work as a principal architect at Flipkart, with the engineering team that delivers the user-facing services, which includes the website and the web apps. So I guess you can guess who is to blame if the site goes down. And for those who are interested, I can hang around after this talk and we'll discuss exactly what went wrong when people tried to buy — I think it was precisely this phone. It's always great to be here at Fifth Elephant; this is my third year in the running. I was here last year — anyone here from last year? We had just open-sourced one of our systems, Flipkart Phantom, which is essentially a fault-resilient proxy that we use on our website, and you'll see me talk about how we use it today as well. And the year before that I came here to talk about Aadhaar, where I was a principal architect from 2010. So today's talk is about Facebook-style notifications using HBase and event streams. The motivation for this talk is to share some of our learnings in building a real-time, or near real-time, notification system — something others in this audience might also want to build — along with the patterns and technologies we used. Before I get into it: how many HBase users here? Okay, and how many are using HBase to serve real-time, user-interactive traffic? Why not? We've always known HBase as a columnar data store — related data stored together so you can do a lot of analytics, yes? In this talk I'm going to show how you can use HBase as one of the technology elements in serving user-facing traffic, and as we go along you might realize you've experienced this feature on Flipkart yourself. So the first one: it's about serving user intent. The one at the top is a screen grab from an email I got from Flipkart. Now, I don't think we have any of our marketing folks here.
What are we thinking when we send such an email? It says "stock up on daily essentials, get started". What am I really telling you? "Flipkart is an e-commerce company, buy stuff from us" — very low on intent. There's pretty much nothing we know about the user; we probably got your email when you browsed the store or registered with us. The second example is another email, which said you can buy school supplies at a certain percentage off, and only for today. So it's time-bound, it has improved relevance, and it's possibly derived from category affinity and recommendations — it's becoming more relevant, say, to a parent around the school season. Now let's look at user intent in a completely different context: social. Facebook, and Facebook likes — some of our friends here will know that even Myntra has a tagline, "Live for likes". Did I get that right? So what's so nice about these likes? On Facebook you put up a picture or a post and you just stay on the page — why? You're waiting for someone to like it or comment on it. There are multiple reasons why this works, but the two main callouts are these. First, it's quick updates from friends — actions on things that most affect you — so it's very personalized and very relevant to what the person is doing. And the other interesting thing is that it's non-intrusive: the notification comes up inside Facebook, so you can consume it when you want to. It's not staring you in the face like an email.
The advantage with this is that it's real time when you're in the app, it can be pushed to you if you're using, say, a mobile app, and you can consume it at your own leisure. And what's so good about it for a company like Facebook — or for that matter even for someone like Flipkart — is that you can actually push a lot more content this way than the user would ever have seen if you'd sent it as, in this case, 134 emails. Nobody's going to read all of that. Now, how can you apply something like this in the Flipkart context? That brings me to in-app notifications — anyone used this feature on Flipkart? You add an item to your wishlist, or you browse a product, and when a price drop happens, you're notified. How does this drive intent? I was talking to a friend of mine who says he doesn't buy products outright; he just adds them to his wishlist and waits for the notification telling him there's been a price drop. So now we're engaging with the customer on a dimension that is of interest to him. User intent here can be derived from someone just getting to a product page: you searched for a Samsung Galaxy S3, you went to the product page, you're logged in — we know you like this product, or are at least somehow interested in it. Stronger intent: you add it to your wishlist, saying someday I might want to buy it. Stronger still: you add it to your cart but haven't yet decided to buy. These are the places where sending you a notification might make you complete a purchase. We could also do it after you've bought — at that point we're pushing content post-purchase: soliciting a review, telling you the order is out for delivery, and so on.
Now, the real value is when this can be real time. And when I say real time: you're on Flipkart, you browse a product or add it to your wishlist, you're still hanging around on the site, and there's a seller out there who has just got a new lot of Samsung S3 phones and reduces the price. You're there on Flipkart and you're notified right away that there's a price drop. The value is in making this as real time as possible. So what is a price-drop notification, and how do we build it? One of the design realities we had to work with is this: a user sees a product at a certain price at a point in time; on another dimension the price keeps changing; and the user comes back, or is still connected, when the product is available at a different price. All of these are important transitions. Now, one way to generate this is quite easy. Every time the user connects to Flipkart and is browsing or adding items to his wishlist, you gather the user intent. When he visits the next time, you retrieve all the products he has ever browsed or added to his wishlist or cart, and then query the product data store: have there been price changes? Now try to do this at scale. Our product catalog today is about 30 million items, with maybe 5 or 6 million updates happening through the day, and the number of product page views, or the number of user logins, is in the order of a few million. Imagine running such a query again and again — and if you want it to be real time, it means you're constantly querying the same database: fetch all of his intent products, go to the product database, check the prices, compute the notification, try to show it. What's the problem with that?
It might seem compute-optimal — you're only querying the data when you need it, so that should be roughly the minimum compute you want to spend. The cons are that the gathering, processing and serving of data are coupled, and the big hit you take is the latency at which you can serve the data to the user. When he logs on to Flipkart, maybe he spends a couple of seconds on the home page; if you haven't shown the notification by then, he has probably navigated away. So you want that to be extremely fast. The read path here is computationally expensive: high latency, as we just discussed. Importantly, you also need versioning support for product data, because every user has seen the product at a different point in time — how do you get that? And another problem is repeated computation: for a really popular product there might be hundreds of people who have seen it, and every time one of them lands on the page, you're computing the same thing again and again. Now, how can this change? If you really look at it, the data here isn't on a user dimension — it's actually on a product dimension. All you're saying is: independently, this user came, saw the product, maybe added it to his wishlist; and separately, product price changes are happening because a seller from Bombay got a new lot and went and updated the stock. That has nothing to do with the user or his act of seeing the product at a point in time. The consumer is, yes, the end user, but the data dimension is really the product. So flip it around: why don't we instead compute these notifications as changes happen in the system, and just serve them out when the user comes to the site? Suddenly that changes a lot of things.
When we set out to build this, we said it has to be extremely fast on the serving side — we want low latencies when the user retrieves the data — and it has to be relevant. So let's look at what it takes to pre-create notifications in real time and serve them on demand. Stepping back: what leads to a notification? You have an intent event stream — by intent I mean users browsing, adding to wishlist, adding to cart — with a large number of users generating it. And you have a change event stream: all the changes happening to products in our supply chain systems, inventory systems and so on. What notifications are really about is the intersection of these two streams. That's what I meant by saying it's a product dimension and not so much a user dimension. What you need, therefore, is an intersection of millions of user intent events with millions of product changes. So what do we do? We have an intent-capturing system — you capture the intent in some way (we'll come to specifics) and write it to an intent store — and then an event-processing and matching system which also receives the product changes and writes to the store. And here everything is append-only, which brings in an interesting construct: every event is immutable. The fact that you saw a product at a point in time is an event that occurred, and it is immutable. Borrowing a term from data warehousing, it's a fact — it occurred. So is a price change; so is any intent. So you get the ability to accumulate all of these events, these facts, in a store, and then match them and create the intersection.
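To make the append-only fact model concrete, here's a minimal Python sketch — this is not Flipkart's actual code; the event shapes and names are assumptions for illustration. Both streams are only ever appended to, and the intersection is computed by matching a product's change facts against the intent facts accumulated for it:

```python
from collections import defaultdict

class FactStore:
    """Append-only store: every intent and every price change is an
    immutable fact, accumulated per product (the 'product dimension')."""

    def __init__(self):
        self.intents_by_product = defaultdict(list)  # product -> [(user, ts, price_seen)]
        self.changes_by_product = defaultdict(list)  # product -> [(ts, new_price)]

    def append_intent(self, user, product_id, ts, price_seen):
        self.intents_by_product[product_id].append((user, ts, price_seen))

    def append_change(self, product_id, ts, new_price):
        self.changes_by_product[product_id].append((ts, new_price))

    def match(self, product_id):
        """Intersect the two streams for one product: notify every user
        who last saw a higher price than the current one."""
        if not self.changes_by_product[product_id]:
            return []
        _, latest_price = self.changes_by_product[product_id][-1]
        return [(user, latest_price)
                for user, _, seen in self.intents_by_product[product_id]
                if latest_price < seen]
```

Note that because the facts are kept in raw form, a later price rise naturally cancels an earlier drop: `match` compares only the latest price against what each user saw, so recomputing from the stored facts always yields the current truth.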
Once you have the notifications, you can serve them out using a pull, push or hybrid delivery mechanism — to the web, to email, to a user on an app, wherever we want. A couple of patterns are being introduced here. We use a staged event-driven architecture, SEDA — or a variant of it. Why? Because the two incoming streams could be arriving at very different rates, and you don't want to be forced to consume both at the same rate. There might be a spike during the day when users are browsing pages, while your product changes happen mostly at night; you're not trying to process both simultaneously at the same pace. SEDA also allows throttling, and consumption in a controlled, balanced way. The other part — the intersection — is really a case for complex event processing, CEP. What that means is: there might be thousands of users viewing a product, and what's of interest is "this person saw this product at this particular price"; but the fact remains that hundreds of others have seen the same product. So with respect to listening for changes, your scope suddenly narrows down to the top few products. To take a typical case: while we might have, say, 5 million product page views in a day, the number of distinct products actually of interest that day will probably be a few thousand, because you're creating an intersection. What the CEP engine does is let us query the event stream, saying: I'm interested in the change stream only for products that appear in the unique set built from all the intents of the day. Those of you who know CEP engines and have used them will relate to this.
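The narrowing that the CEP engine does can be sketched as a standing filter over the change stream. This is a drastic simplification of what an engine like Esper provides, and the names are illustrative:

```python
def cep_filter(change_stream, interesting_products):
    """Standing query: let through only changes to products that appear
    in the day's intent set -- millions of catalog changes collapse to
    the few thousand products someone actually expressed interest in."""
    for product_id, new_price in change_stream:
        if product_id in interesting_products:
            yield product_id, new_price
```

In the real system the interesting set is itself maintained continuously from the intent stream, and each change that survives the filter triggers the matching and notification writes against the store.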
CEP really flips the database model. With a database, the data is at rest and you execute queries against it; here the data is in motion and the query is at rest — you're basically applying a standing query to a stream flowing past. So let's move on to the data store. What would help here? We want to store large sets of data: products and users in the tens of millions; activity — whether it's a browse, a view, or an add-to-cart — again tens of millions of events per day; and notifications which, depending on when the user comes to consume them, could be in the order of up to a hundred million. Why is that? Some users come back to Flipkart right away and want to see it immediately; others come back a week later, a month later, and ideally you still want to show them that the product may be available at a reduced price. So the notifications need to be kept for a very long time — a bit like your Twitter timeline: you go to Twitter and see the latest tweets from the people you follow. So it's a data store that needs very high write throughput and high read throughput over sets of data — both intents and facts. The read throughput matters when you're matching the two streams: you got a product price change, and you need to find all the users who expressed interest in this product. You need to be able to query the store for exactly that — all users who expressed interest in this product — because whenever a price update happens, you want to determine whether there's a change from what each user saw, possibly compute a notification for him, and then store that data.
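That "all users interested in this product" query — alongside the "all notifications for this user" query on the serving side — is why (as comes up again in the Q&A) the facts are written under two key layouts. A small sketch, with an in-memory sorted key list standing in for HBase's sorted row keys; the key formats here are purely illustrative:

```python
import bisect

class DualDimensionStore:
    """Each fact is written twice so that both 'all intents by a user'
    and 'all intents on a product' are contiguous prefix (range) scans,
    mirroring how HBase keeps row keys in sorted order."""

    def __init__(self):
        self._keys = []  # kept sorted, like HBase row keys

    def put_intent(self, user_id, product_id, ts):
        # Duplicate the fact under a user-keyed and a product-keyed row.
        for key in (f"u:{user_id}:{ts:013d}:{product_id}",
                    f"p:{product_id}:{ts:013d}:{user_id}"):
            bisect.insort(self._keys, key)

    def scan_prefix(self, prefix):
        """Return all keys starting with `prefix`, in sorted order --
        the equivalent of an HBase range scan."""
        i = bisect.bisect_left(self._keys, prefix)
        out = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            out.append(self._keys[i])
            i += 1
        return out
```

Storage is cheap, so the duplication costs little, and both access patterns stay sequential on disk.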
So you're looking at high write throughput, high read throughput, and finally low-latency reads on the user path, when the user comes to Flipkart the next time — you're logged in, you see the notification icon, you click on it, and you see the data. What can be suitable here? We went back to basics and said: let's look at HBase. Now, yes, it's a columnar store, but the way it organizes data is quite interesting. Data is always stored in sorted order of the row keys, and related data is stored together. It also has a cache, and we figured we could get the best out of a data store, in terms of read throughput and read latency, if we can hit memory — serving data from memory is the fastest. And if you do have to hit disk, you're better off operating at disk transfer rates than at seek rates: a lot of disk seeks will make you very slow, but if you do one seek and can then stream out a lot of data together, that's the next best case after memory. And none of the data we're storing here is a single-record lookup. The easiest analogy is your inbox: when you open an inbox you never see one email, you see tens of emails ordered by time. You might group them differently, but you're always looking at a set of data together. So you want to optimize for querying a set, or a range, of data together. Those of us who have used HBase know it's efficient for range scans. Why? You do need to work a little for it, but the fact that it's an LSM tree — a log-structured merge tree — gives you a good chance of keeping related data together.
So the way we construct a row key is something like: user ID, then a reverse timestamp, then the product ID. When you query for a user, you're always saying "give me notifications for this user", and you want the reverse timestamp because you're always interested in the latest data up front. Now, the way this goes to disk: when you write data to HBase, it first writes to the write-ahead log, and also to a MemStore; only after the MemStore grows beyond a certain size does it get flushed into an HFile. Similarly, on the read path there's the block cache — data is surfaced into the block cache and you read off it. So you try to leverage these memory stores, and when you do have to hit the disk, all of the related data can be read at transfer rates — the cost of the seek is amortized across the data transferred. That's the approach we took with HBase. Now, the tech stack to realize all of this. The intent-capturing system: for all intent that happens via Flipkart — say you're browsing a product — we capture it through Phantom, the topic of my last talk, which is a reverse proxy. We have code in the reverse proxy that keeps capturing this intent and appending it out. We also have some batch-based systems, which pull in data like wishlists, where we're okay with some latency. All of this feeds data, as append-only intents, into HBase. Then there are the product changes.
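The row key construction described above can be sketched as follows — the exact layout Flipkart uses isn't shown in the talk, so this is one plausible encoding. Subtracting the timestamp from the maximum signed 64-bit value means a plain forward scan returns the newest entries first:

```python
import struct

LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE, the usual reference in HBase key design

def notification_row_key(user_id: str, ts_millis: int, product_id: str) -> bytes:
    """user ID | reverse timestamp | product ID, as raw bytes.
    HBase sorts rows lexicographically by key bytes, so for one user the
    most recent notification sorts (and therefore scans) first."""
    reverse_ts = LONG_MAX - ts_millis
    return b"|".join([user_id.encode(),
                      struct.pack(">q", reverse_ts),  # big-endian keeps numeric order
                      product_id.encode()])
```

A scan with the `user_id` prefix then walks that user's notifications newest-first — exactly the "inbox" access pattern: a set of recent items ordered by time.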
In our back-end systems — planning systems, inventory systems and so on — events keep being emitted whenever changes happen, and we listen to those events. The event processing is built on top of SEDA, with Esper for CEP and RabbitMQ for messaging. I'll tell you why this is very low latency: the bulk of the work actually happens right here, and the results get appended into the HBase store. Even this whole CEP system, at Flipkart scale, we run today on just one VM with about 4 cores. You're not processing a whole lot, because you're doing an in-memory intersection over the products being tracked, writing the data out in two different dimensions into HBase, and when you do the intersection you query HBase with range scans — so all the heavy lifting lands on the HBase cluster. Okay, once the notifications are created, they're ready on the serving path, and a few other things come into play there. First: while we want low-latency reads, we don't want the user experience to suffer just because HBase is slow or because we've screwed up somewhere. So all the fetches happen through Phantom — the query retrieval goes through it. When we want to push to a device, it goes through Flipcast, which is our notifications broadcast system. Tomcat simply serves the "get notifications for this user" call — and even at that scale we run it on just two boxes, because it's only doing I/O: an HBase read, then sending the result back. And here's something interesting: we also use memcached and a CDN. Why? Because the notification data from HBase is just "this is the product ID, this is the last price you saw, this is the new price" — only that comes from HBase — whereas the other details, like the product description...
...and the product image, come from elsewhere. An image you're best off serving from a CDN, so that comes from a CDN; the product description comes from memcached, because a memcached lookup for just that set of, say, 20 products is probably the fastest — you don't want to keep recomputing that data again and again. So a combination of these three systems sits on the serving path. Now, the tech stack — and the good thing is that you can build out a very similar system yourself. All of it is open source: either public domain (by which I mean non-Flipkart) or born open source at Flipkart. Phantom is open source, Trooper (our batch and SEDA runtime) is open source, Flipcast is open source, and then there's RabbitMQ and Esper. Interestingly, you can replace parts of this with your own pet technology. Can I use Cassandra here instead of HBase? By all means. Instead of Tomcat, can I use Jetty? Yes. Instead of RabbitMQ, do you want Kafka? Yes — though I'd say Kafka is overkill for something at this scale. As I've shown you, the only closed-source part here is the system we use for target group generation, but that is very specific to the Flipkart context; the rest you can use as-is. Now, for operating such a system and keeping it running, you need a lot of supporting systems. One: when we launch, we have a standard A/B testing framework — most people building user-facing systems will know what an A/B test is.
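The three-way split on the serving path described above — HBase for the small notification facts, memcached for product details, the CDN for images — can be sketched like this; the function names are stand-ins, not Flipkart APIs:

```python
def render_notifications(user_id, hbase_scan, memcached_get, cdn_url):
    """Assemble the payload the app shows. HBase returns only the small
    per-user facts (product ID, price seen, price now); the heavier,
    shared data comes from the caches so it is never fetched per user
    from the primary store."""
    rendered = []
    for product_id, price_seen, price_now in hbase_scan(user_id):
        rendered.append({
            "product": product_id,
            "title": memcached_get(product_id),  # shared product description
            "image": cdn_url(product_id),        # static asset, CDN-served
            "was": price_seen,
            "now": price_now,
        })
    return rendered
```

The design choice is that the expensive store only ever holds what is genuinely per-user, while everything shared across users is pushed to caches closer to the edge.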
So you can always say 50% of my users see notifications — that's standard. Then there are the Phantom dashboards. If you look at the query-serving part, at this point the cluster was doing about 40 requests a second, and we're getting a median of — I was just checking before I came in here — about 10ms. Ten-millisecond reads from HBase. So if it's used well, you can use it even for serving data in real time. A few more dashboards track how your data ingestion is going, and there are other dashboards you need for determining business impact. The point I'm trying to make is that to operate such a system day after day, you need enough metrics and alerting to tell you how it's working and how it's affecting users. And users do react: people like it, people use it to buy products, to discover products — they use it a lot. This last one I do want to call out: somebody received a notification, went and looked at it, and saw a higher price than the one we were telling him about. That leads to what can be a problem in this system — consistency challenges — and I'll come to that in a bit. Some of the pros: a very low-latency read path; resilience to failure — we're okay to not show a notification, but not to bring down the site (we have other things that bring down the site); it scales well, because of the LSM tree, the key-value store, and the CDN for images. And a marked difference is this idea of immutable facts: using immutable data stored in an append-only database gives you the ability to recompute data. Those of you who do real-time analytics, and some of the newer functional programming approaches that talk about immutable data structures, will recognize this as a very useful construct. Now, consistency challenges. HBase itself is a consistent data store, but the problem we had was that between the intent happening and the person coming to the website, the product sometimes got
sold out, because others came and bought it, and we couldn't generate an invalidation fast enough to retract the notification we had created for him. So there's an eventual-consistency problem there — but those are cases we can perhaps live with. Then there's the perceived waste of pre-creating notifications: you pre-create them even though the user may never come back. But the cost of storage is so cheap that it doesn't even figure — we don't even have a dedicated cluster; this runs off our shared HBase cluster. Some references: "HBase: The Definitive Guide" — that's a book where you can pick up material on the block cache, among other things. And, very interestingly, the last one is about Facebook Messages. Some of you might know that Facebook uses HBase for all of its messaging. That's what got us thinking — why are these guys using HBase on the hot path? — and once we understood it a little better, that really was the seed for us to start thinking about using what was otherwise a columnar analytics store to serve data for online traffic. There are some other open source projects you can leverage too. With that, I'm done. Before we go to questions, just a little announcement: there are flash talks happening in auditorium two at two o'clock. We're giving you all an opportunity to talk for five minutes about something you're passionate about, and we're going to bring an audience from this community to auditorium number two at two o'clock. We've been making announcements, but if you want to speak, kindly write down your name and the topic you'll talk about for five minutes, and hand it over to me at the end of the session. Now let's go to questions — we have exactly eight minutes. Questions, please put your hands in the air. Is there a question on the screen? I don't think so.
Yes, I'm answering that one — okay, someone just connected back. Here, you can see what I was telling you about the low-latency reads. This is from production: the median is about 8ms — 8ms reads at about 70 QPS — and there's nothing else in this path, so it's definitely hitting the store. [Audience: when is the next lot of phones coming?] Okay — any other questions pertaining to this talk and presentation? Yes, all the way at the back, black t-shirt. [Audience: Sandeep Mukhopadhyay, Agni Software, Pune. Two things you didn't talk about. One is the schema — the user schema, how do you store that? That's really important. The other is triggers: why didn't you use triggers? I've heard HBase has a feature like triggers. Say for the catalog you put a trigger there, and then you change the user data — can you do that, and if you can, why didn't you?] Sure. First, on the schema — good question. When we write the intent, or fact, data, we actually write it in two dimensions: one is the user dimension, the other is the product dimension. That's because — go back to how I said HBase organizes data — everything is in naturally sorted order. Sometimes you query this data from the user dimension and you want that data to be together; at other times you query from the product dimension and you want that data to be together. Yes, it's duplication of data, but like I said, storage is so cheap that it's okay to store duplicates. It's not conventional — don't use HBase like an RDBMS; understand what it can do, and sometimes this kind of duplicated data is perfectly all right. In terms of schema it's otherwise pretty straightforward,
because it almost looks like a sparse hash map. We use a single column family, and pretty much all the data we want is stored in one or more columns. I don't think we could have gone too badly wrong either way with the choices there, but largely we duplicate data because when we want range scans to happen, we want them to hit a lot of blocks of related data together. Okay, coprocessors and triggers. Coprocessors fire when you update data, but here we also need to know what to update, so coprocessors wouldn't have helped us. What we do want is lots of efficient scan queries — if there's a way of doing more efficient scans, yes, that would be useful. Okay, we'll go to another question now, from the web. Can you just repeat the question, please? Right — the question was: do we push data to the user using WebSockets on the desktop? The answer is no. What we do is a long poll — these are just Ajax calls that refresh the data. But because that pipeline exists, we can also use the same setup to push a notification to Flipcast, and Flipcast will then do a push notification to the end user. So the same pipeline can be used to either push or pull. As for the resources on our side: the fact that we protect our back end through Phantom helps when there are queries that are going to take a lot of time, and we also use the bulkhead pattern. Phantom uses Hystrix, and Hystrix has this notion of bulkheads:
the number of threads dedicated at any point in time to fetching this data is limited, and that way we have a means of containing it to a very large extent — though we rarely see those issues in practice, because by then the data is usually in the block cache and we get it back quite fast. Next question — please put your hand in the air. Green shirt — just hold on, we'll give you a microphone. [Audience: How do you process multiple notifications for the same product? Say after some time there's another price drop, or the price drop gets cancelled out?] We do go and invalidate that. The reason every event, every fact, is stored in its pristine form is precisely so that, if required, we can use it to recompute and invalidate the notifications. And because it's stored in its as-received form, for any corrections we can rerun the processing and genuinely recreate the data. That's what allows us to do it. We'll take one more question — all the way in the front, white shirt. [Audience: hold the mic a little closer — I think there are user-initiated notifications, and if, say, a notification expires or the price goes back up, there has to be a way for the database to refresh the data.] Every price change on the catalog system comes to us; the catalog system doesn't know whether we've stored any notification. It's like a stock price — it goes up, it goes down — and someone is simply listening to the stream. [Audience: So essentially you have a way of filtering out false positives — say someone wishlisted at a particular price and the price goes up, he's not notified when he comes back?]
Yes — those are the rules applied when the price change finally comes in. And it varies with the attribute you're handling: some of it can be stock availability, where an increase is the interesting case; for price, it's a drop. [Audience: The question was about, say, the globally lowest price — if you have competing sites for the same product, is there an engine that ensures you are the lowest-priced?] That's outside of notifications per se — that's our pricing engine and pricing crawlers. We do have such a system, but I'm not going to talk about it here, and it doesn't find its way into this pipeline; it's really used as input to the retail team or the sellers, who go and adjust their prices accordingly. But yes, the idea of this system is that when they do get those signals and make a change, you want to be able to inform your customers the fastest. That's the goal. We have a lot of people putting their hands up — I'm really sorry, I'm going to have to cut you off there.