Cool, so good morning, everyone. Yeah, I underestimated Bombay traffic earlier, but it's alright — we're all here on time. I'm Pushpesh, co-founder and CTO at Moonfrog. Some of you might have heard of us; most of you probably haven't, so I'll go through that. But what I especially want to talk about here is our journey in the early stages of being a startup.

Some of you might already be founders, or working at a startup, or might want to do a startup, or build a new product from scratch at some point. And we all run into this: we're building a consumer product, and we want to understand how users are behaving, what's happening in the product, where the drops are, where the different flows are going, which features are working and which aren't. That means analytics, a data platform — and we've all been focused on building the product, the application, but not the data analytics platform. So this is our journey, how we did it early on, from day one itself, because we knew we would have to build multiple products, so we would need data analytics again and again, all the time, from day one of every product launch. These are a few steps, a few guidelines that we think one should follow and generally keep in mind. Again, this is our journey — you can obviously take your own path — but these are guidelines from our learnings and the kinds of mistakes we made.

Just to give a brief idea of who we are: we are a mobile game development company that makes mass-market mobile games. We're not talking small scale. Mass market means anybody out there — anyone here, on the street, in the shops, driving cabs, working in a big corporate company — everyone is a potential target audience for us. And not with just one product, but with a suite of multiple products, again and again. So you realize that we as a company have to make new products continuously, every interval, every cycle, and we need data from day one for each new product to figure out product-market fit, again and again. That's very important to us.

What's our current scale? This is slightly old data, but we're at approximately six to seven million daily active users — daily active players — more than 15 million, and more than 25 to 30 million monthly active users across all our games. On the right side you can see some of the games we've made. These aren't all of them, but they're the prominent ones. The top one is Teen Patti, which is famous in India; we're one of the top players there, and top-grossing across the board in India, both as a Teen Patti game and as a gaming company. The other prominent one is the official Baahubali game — a real-time multiplayer strategy game, think Clash of Clans or Clash of Kings in shape and size, made for India. And at the bottom you can see Ludo Club, which is currently very popular, not so much on the Google Play Store but on Facebook Messenger. Across all these games we have more than six to seven million daily active users, and there are real-time features — players playing against each other in real time on Indian and subcontinent networks.
We have been doing this for five years. Obviously internet speeds have improved dramatically now and will continue to improve, but that was not the case back in 2013, so we've been building for India for quite some time. All the games are cross-platform, optimized for our primary markets — India and the subcontinent — and as a company we are profitable.

So this is our journey, and in short, our current scale in data analytics: approximately 20 billion unique events per day as of now, and more than 800 GB of data ingested every day. This is not small scale. The user numbers are what they are, but for each user we capture tons of information continuously — anything you do while playing our game, we are tracking it, as close to real time as possible. Obviously this is not where we started, but we knew that if we got to scale, we would see a lot of scaling requirements in analytics. So how did we go about it?

Rolling back four to five years: what did we actually want on day one — the non-negotiables? First, access to data from any product or game launch, immediately. You can't say, "we'll launch the product and find out how it's doing next week or next month." That was a strict no-no for us: if we do not have data, we will not launch the product. That was the rule from day one. And we should be able to query at row level. In the beginning, when you've just launched, you don't have a hundred thousand users; you have one user, two users, the third just arrived. You need to know what's happening: what is that user's journey, what is he doing, what is he not doing, where is he dropping off, what is he not liking? For that you need every event fired by that user in that session, as quickly and as close to real time as possible. So we set the rule up front: less than five minutes of latency to the warehouse, where we can query at row level. That was the day-one requirement.

Second, cost sensitivity — in both senses. Infrastructure, because as a bootstrapped startup there was no money to start with, so obviously we would use the AWS free credits; that was the first thumb rule. And ops-light, because you can't spend your mindshare and time cycles fixing analytics ops when you're busy building the product. Third, resources. No startup has unlimited resources, and certainly none to spare on building analytics instead of the main product. So in the beginning we said we'd start with a shared engineering resource only, write our requirements down very clearly, and increase to one dedicated person after launch. That's it — not ten people, not a hundred. This is reality; this is our journey. I'm not saying it was good or bad, but this is what it was. And the last requirement: the ability to scale up, a lot.
We knew games. We had worked in games before — this was not our first foray. My co-founders, our early people, everyone had worked at Zynga; you might have heard of it — games like FarmVille, really large-scale games with 30 to 40 million daily active users. So we knew what scale means and what it would look like if we succeeded. We knew we would need to scale up at some point, but we also knew we didn't want to over-architect. So we said: it should not cause headaches until one million daily active users. It's alright to plan for some future, but not necessarily for eternity.

So I've listed seven steps — I've made it as concise as possible — which I feel we ended up following. Obviously it's much easier in hindsight, but take it as a guideline and see if it helps you in your own journey of building a data pipeline for your products.

First: understand. Understand your requirements, understand your business constraints. Each business is a little different from the others; do not take what's written in the blogs verbatim. Understand what is a must for your business and for figuring out product-market fit, what is good to have, and what is okay to delay for the future. That will be very different for you as a business and as a team. And if nobody is going to look at the data, what's the point of capturing and storing it for its own sake? As an engineer, I feel this is the difference between what you need and want versus what is cool, and it's very important to differentiate early on, because you don't have resources. I'll go into the details.

Second: think generic. This is more from our perspective — if you're building one product you don't necessarily need to think generic to its full definition, but we had to support multiple games. We knew this was the first game, we'd make another one next quarter, and we'd have to make four games the next year. We knew we would have to rinse and repeat the same analytics platform, the same data pipeline, so we had to think generic. I'll give examples of how we went about it — a lightweight way of looking at a generic data schema.

Third: produce data well. If you do not have a foolproof, clean system of data production, your whole pipeline is not going to be very useful. A lot of the ML and data analysis that you or your team will do later becomes more headache than help if the incoming data is noisy or corrupt. So keep it clean, keep it simple, and produce it well.

Fourth: design v1.0 of your data pipeline. Do not design for eternity — design v1.0. I'll talk about how we did it: where to cut corners, where not to, and how it panned out for us.

Fifth: open up — enable many data interfaces. Once you have captured the data, it's very important to open up everything you've captured to your team as quickly as possible. If you're not using the data and have merely captured it,
it's useless. So know how to use it — for the benefit of the product, the business, your tech, whatever it is. But open it up.

Sixth: tune and repeat — and specifically, optimize for your own usage. Not because some new tool has arrived in the market or because somebody says so. Optimize for your own usage, because you have certain people and certain business requirements; fix for those first, and then go to the next step.

Seventh: upgrade to v2.0 — but only after you know you are capturing data well, storing and ingesting it well, and using it well. Before that, do not try to upgrade to a new system, no matter what. This is especially important because games do not give you a warning before they scale up. If someone tweets about a game, it can go to the next level immediately — I've seen games go from 1 million to 5 million daily active users in a matter of a day. Games do not give you warning. Still, our experience is that it's alright to work within this framework: even with market forces acting on their own will, you can catch up and do the right thing.

So let's go deeper. First: understand requirements and constraints. This slide is an example of our own business requirements and constraints, listed out — you can obviously do the same for your own products — divided into three columns: business, tech, and ops. We knew we were a tech company, but with a real business; we did not want to make games just for games' sake. Games as a product is part of it; games as a business is what we do.

So first, the business requirements and constraints. Real-time ingestion is important — and by real-time we don't necessarily mean the engineering definition; we mean the business definition. A user clicked a button; I need to know about it within some time — not an hour later, not a day later — but it doesn't have to be within a microsecond or a millisecond. Know what the real-time definition for your business is, and stick to that.

Second, fast query speeds. We did not have dedicated reporting teams, so from the beginning we had to be able to run a query and get the result — not submit a query and receive the result later in an email or a notification. No: I want to run the query. And who will be running it? Your founder, or you the developer, or the QA, or the product manager — people who are effectively one shared resource at the end of the day. So you need to run the query and get the result immediately, rather than submit it and see it later.

Third, a SQL query interface. Again, this was our choice. We came from a gaming company, we were already using SQL, and we knew that whoever we worked with would know SQL. So let's stick to that. We put SQL up front as a constraint — let's not kid around, we don't have time — SQL as the primary query interface. And the good part:
all our product managers have been engineers or generally have exposure to SQL; SQL is very easy to learn; all your engineers know SQL; your QA can pick it up too — anyone in the company can potentially learn SQL quickly and write queries.

Fourth, row-level granularity: we need to know what each user is doing.

And fifth among the business requirements: rich events. You cannot have some other team trying to figure out how to join things together — "I have this event, but I don't know what this user's metadata is." Try to make each event as rich as possible, so that the event itself carries contextual information. I'll go through an example and you'll see what we mean by rich. The goal is that one person, whoever that person is, can answer whatever the business question is as quickly as possible, alone, without requiring an army of people.

On to the tech constraints and requirements. We wanted a generic data design, because we knew we would rinse and repeat for new games. Forward and backward compatibility: with games as products you are releasing more or less every day — we've been in business five years and I don't know of a week where we've done fewer than three releases on a game. We do releases continuously, so everything has to be forward and backward compatible; you cannot keep breaking the whole data pipeline. Third, simple architecture, because it's simpler to manage and less resource-intensive. Fourth, immutable data — we put that in as a constraint: once written, an event will never change. I'll show how that impacted our design.

Ops is equally important. If we call ourselves a data-first company and need analytics on day one, but cannot keep the uptime 24/7, then we're not really a data company. So: hosted services — we had to use hosted services, because we didn't have the bandwidth for DevOps and SRE and all that. Scale-out capability, required from day one. And resilience to bad queries. This is very important, and it follows from the choices we made: if anyone in the company can access the data, then probably half of them — most of them — will write inefficient queries. That's alright; we will handle it, we will take care of it. I'll show how that matters in the design.

At the bottom I've given examples of the kinds of questions we ask in the business, to give you a flavor. One question: how many users played at least one game yesterday? Very simple; anyone asks it. We ask even more complicated ones: how many users who played at least one game yesterday also saw the bonus pop-up? More complicated still: how many users who played at least one game yesterday saw the bonus pop-up for the first time? And this is not a complicated question for the business to ask — it's very simple; we'll ask it while walking down the hallway. "Yesterday we added this pop-up — how many people saw it for the first time?" And we have to answer it; we have to convert the question into a SQL query as quickly as possible, so your schema has to back it up. And this is not where the business question ends — it's just the opening of the rabbit hole. The next questions: how does it compare to last week? What's the trend? What about the other pop-up we added last week?
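To make that concrete, here is a minimal sketch of how the "first time" question can compile into one query — written in Go against Redshift, since that's the stack that comes up later in this talk. The table and column names (m_table_count, counter, kingdom, event_date, and the 'game_start' and 'bonus' values) mirror the generic schema described further down, but the exact columns are assumptions for illustration, not our production schema:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // Redshift speaks the Postgres wire protocol
)

// Hypothetical query: users who played at least one game yesterday and
// saw the bonus pop-up yesterday for the very first time.
const firstTimeBonusPopup = `
SELECT COUNT(DISTINCT c.player_id)
FROM   m_table_count c
WHERE  c.counter    = 'popup'
  AND  c.kingdom    = 'bonus'
  AND  c.event_date = CURRENT_DATE - 1
  -- ...who also played at least one game yesterday...
  AND EXISTS (SELECT 1 FROM m_table_count g
              WHERE g.player_id  = c.player_id
                AND g.counter    = 'game_start'
                AND g.event_date = CURRENT_DATE - 1)
  -- ...and had never seen the pop-up before yesterday.
  AND NOT EXISTS (SELECT 1 FROM m_table_count p
                  WHERE p.player_id  = c.player_id
                    AND p.counter    = 'popup'
                    AND p.kingdom    = 'bonus'
                    AND p.event_date < CURRENT_DATE - 1)`

func main() {
	// Placeholder connection string.
	db, err := sql.Open("postgres",
		"postgres://user:pass@redshift-host:5439/analytics?sslmode=require")
	if err != nil {
		log.Fatal(err)
	}
	var n int
	if err := db.QueryRow(firstTimeBonusPopup).Scan(&n); err != nil {
		log.Fatal(err)
	}
	fmt.Println("first-time bonus pop-up viewers yesterday:", n)
}
```

One person with SQL access can answer this alone, which is exactly the point of the rich-event and SQL-interface requirements.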
And so it goes on. Your whole data pipeline has to support that kind of questioning; otherwise you hinder the questioning from the product side itself, which will show up in your speed at the end of the day. But that gives you a flavor of what we do.

At a high level, there are three buckets where all these requirements and constraints matter. First, data design: the design should support multiple features and games, so a generic data model is required, and frequent schema changes will be required in the beginning — that's the reality, and it's alright. Second, interface and usage: there should be a SQL interface, because SQL is what the teams can use, and there should be resistance to errors in queries as well as to inefficient usage. People will write bad queries — extra joins, inefficient filters. It's okay; we have to make it work. Third, scale: have a system that scales up and down, because daily game traffic genuinely goes up and down (I'll show you the graph at the end), and that scales out easily, because beyond the daily cycle you'll also see the ceiling of total users going up.

Now to point number two: think generic. On the right side of the slide, in the box, are some high-level guidelines — keywords, really; if you don't remember anything else, it would be good to remember some of these. The first rule: the data schema has to be generic. That was the requirement — nobody gets to ask why it has to be generic. It has to be generic, and we'll add more columns as we go along; that's okay, that's the reality. So how do we support that? Then, normalization: how much to normalize and how much not to is something we all go through in schema design. We did have some normalization requirements, but we didn't chase the full definition of all the normal forms; we stopped somewhere in the middle, knowing our business requirements.

Here's the example — a subset of the tables we created. First is m_table_count. m_table_count is for all generic events in the game — the equivalent of the default event dump you'd get in Google Analytics or Flurry; every event lands there. Second is m_table_economy. If you've played games, you know there's virtual currency: currency coming into the game, going out of the game, users gaining it from bonuses, buying it, and then spending it.
For a game as a business, that's the most important thing to worry about — in some ways more than the game itself, because if there is no economy, you are not making any money. So the economy gets its own table to be managed. m_table_user is for all user-specific events and user metadata. m_table_open is for all app-open events: when you're doing app development, you know that half the time people open the app, hit some tech problem, and drop off at the loading screen, or something of that sort. That doesn't necessarily require user data; we just need to know how many people opened the app and whether the loading screen loaded or not — it tends to be a more aggregate-level analysis, so it got a separate table. And m_table_payment — we all know why payments are important.

Now, why this table design helps. If you just join m_table_count and m_table_user, you can answer questions like: how many users opened the bonus pop-up? How many users from Rajasthan played Poker and Rummy yesterday? For each question, the person writing the SQL thinks about which two tables to join, writes just that, and it works. Very basic — and it works for every single person. Engineers can do it easily, of course, but so can everyone else in the company; I'm telling you this from practical experience. Or you can join any other pair, like user and payment: how many users made successful transactions in the last five minutes? How many unique users on iOS faced transaction failures yesterday? Same pattern. And since everything is joined by user, you can ask the follow-up question: which user exactly? So that I can pull up the whole session — every single event he fired in that particular session — to retrace his or her steps and see where the problem was, or what exactly happened. This lets us ask the next question very fast and answer it ourselves. You don't need to run a query somewhere and then submit the result to a central team saying, "go find out more about this." You can find out yourself.
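As a sketch, those joins might look like the following. The table names are the ones from the slide; every column here (state, game_id, os, status, event_date) is an assumed illustration of the kind of metadata each table carries:

```go
// Package queries holds illustrative SQL only: the table split follows
// the talk, but all column names are assumptions.
package queries

// Users from Rajasthan who played both Poker and Rummy yesterday —
// m_table_count holds the play events, m_table_user the metadata.
const rajasthanPokerAndRummy = `
SELECT COUNT(*)
FROM (
    SELECT c.player_id
    FROM   m_table_count c
    JOIN   m_table_user  u ON u.player_id = c.player_id
    WHERE  u.state      = 'rajasthan'
      AND  c.counter    = 'game_start'
      AND  c.game_id    IN ('poker', 'rummy')
      AND  c.event_date = CURRENT_DATE - 1
    GROUP BY c.player_id
    HAVING COUNT(DISTINCT c.game_id) = 2
) both_games`

// Unique iOS users who hit a payment failure yesterday —
// same pattern, just a different pair of tables.
const iosPaymentFailures = `
SELECT COUNT(DISTINCT p.player_id)
FROM   m_table_payment p
JOIN   m_table_user    u ON u.player_id = p.player_id
WHERE  u.os         = 'ios'
  AND  p.status     = 'failed'
  AND  p.event_date = CURRENT_DATE - 1`
```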
Nobody is like having a kind of a Special moment or hold on what is kingdom? What's the difference between phylum? Frankly the usage of data doesn't have to care about what is kingdom and phylum and frankly the data analytics doesn't have to care What is inside phylum and class and genus and etc the context is up to you? Data pipeline doesn't care about context it cares about the structure and that's it So data is structured for the sake of tech But at the same time it is unstructured the context is up to you the way you want to use it now This is the example It's example when like we say start session is the counter. So whoever is the product manager or the Responsible guy for that part of the business looks up to that kingdom can be like player load It's up to them to define context is theirs. They can say player load or a player load Not whatever they want to name it. It's up to them. We just say it has to be where care of 50 characters max Now whatever they want to put Player load started where it started loading screen, which button did it click button 120? Genus empty. I don't want to use it. It's okay, which is the player ID Okay, here's the player ID What is the client timestamp or the timestamp on the server side when we saw it? So there can be differences and other things fair enough to capture OS which OS was he playing we say G G play Google play iOS web Whatever it is game ID, which game are you talking about? So these are the generic parts for data pipeline So when the data engineer is looking at it He's just looking all kingdoms are coming fine or not all philips are coming fine or not not looking at okay The player load is coming fine or no. We just look at kingdom is coming fine or not The same game for the some other event will look like start session player load player load started now player load finished Now what kind of questions you can ask by just by this these two events How many users try to player load? You can ask just group by kingdom. How many Start session have fired a started event, but not finished That also you can How many times user successfully loaded by clicking on button 120? Because you will change buttons, right? You will add more buttons remove more buttons and other things this also you can do So it helps you do all this stuff and data engineer data pipeline guy doesn't have to come into question He doesn't know what is inside the kingdom and it's all right So that was example now come to data production part produce data very well if you don't produce Properly from all different sources. It will bite you tomorrow and it will doesn't help with the next level Some thumb rules that we followed Keep your data producer dumb Keep them dumb Do not put brains on that side because then it will be very hard to monitor them change them fix them Where the errors are happening another keep it very simple very clean very dumb less transformations, that's a rule we followed we allowed data to insert the way we want to consume as Simply as that and the data structure that we talked about helped us do that in the rich the data Now you saw like a kingdom phylum class genus play family genus five levels so add more context The row itself will make sense to the right person By itself you don't have to oh, what is this saying? I do not know that itself has enough information and misuse is okay It's okay. People will Do a typo player load they wrote something else. It's okay. Why do you care as a data engineer? You should not care. It's okay. 
So that was the example. Now to the data production part: produce data well. If you don't produce it properly from all your different sources, it will bite you tomorrow, and it doesn't help at the next level. Some thumb rules we followed. Keep your data producers dumb. Really — keep them dumb. Do not put brains on that side, because then it becomes very hard to monitor them, change them, fix them, or see where errors are happening. Keep it very simple, very clean, very dumb: fewer transformations. The rule we followed was to let data be inserted exactly the way we want to consume it — as simple as that — and the data structure we just talked about helped us do that. And enrich the data: you saw the five levels — kingdom, phylum, class, family, genus — so add more context; the row itself will make sense to the right person by itself. You shouldn't have to wonder "what is this saying?" — the row has enough information on its own. And misuse is okay. People will make a typo, write something else instead of player_load — it's okay. Why do you care, as a data engineer? You should not care. Let them fix the context themselves. Rather than a very tight coupling between producer and pipeline, we removed the link completely.

So what we did here: identify all the data producers and understand their requirements accordingly. In our case we obviously have Android, iOS, and all these apps generating data, since users are clicking right there. In one of the games we even captured, for a short time, every single swipe users made, because we were trying to figure out where they were dropping off and why the swipe action wasn't landing — in games that matters. We enabled that on the fly, and it was fine. Then some constraints come in from reality. You cannot keep sending each event over the network, so we solved that with batching. You cannot lose data, even if the app crashes or is killed by the user — we want to capture as much as possible; one hundred percent isn't possible, but as much as we can, so that we know why someone uninstalled or where they dropped off. So: use local disk storage and back it up. And keep it out of the application's own context. Remember, we do three releases a week on average, and in some weeks multiple releases in a day, so the application code is in flux at a tremendous pace. Your data pipeline cannot be caught in that fire; it has to be separate enough, but at the same time not breakable just like that. So develop it as a library, and keep it generic.

The second part of our stack is a lot of microservices, which most of you must also have. So we have a lot of data producers, and they will scale at their own will: each team decides whether it's a monolith or a full-on microservice, whether they scale horizontally or vertically. Your data production has to scale along with them. Again: you cannot send each event over the network — your microservices cannot push so much over the internal lines that it starts to become a bottleneck at some point. So again, batching; use local disks on the boxes; and work out how to do the network sync easily. And keep data collection agnostic of the microservice itself. You don't know how many microservices that team is going to write — today they have three, tomorrow they'll have thirty; this is a reality, they will. So develop it, again, as a library. You should not care which microservice is calling you or sending you data. Data is the first-class citizen, as it is: whoever wants to send data sends it like this; we're not going to make exceptions or accept anything else.

So those requirements — batching, local disk, and standardized common libraries — became the requirements for the data production layer: the SDK, or your own microservice that captures data. In the example I've shown, you'll see three functions — count, count_sample, and visit — and they all take similar parameters: counter, count, kingdom, phylum, class, family, genus, exactly like the data structure as it will be in the table.
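As a sketch, the SDK shape this implies might look like the following. The names follow the slide (count, count_sample, visit, and the taxonomy fields); the signatures, the JSON-lines file format, and the sampling behavior of count_sample are assumptions, since the talk doesn't spell them out:

```go
// Package stats sketches the client-side SDK: dumb producers, no
// transformation, batch to local disk, ship over the network later.
package stats

import (
	"encoding/json"
	"math/rand"
	"os"
	"sync"
	"time"
)

// Event maps one-to-one onto the generic table: the SDK serializes
// exactly what the caller passes, context untouched.
type Event struct {
	Counter  string `json:"counter"`
	Kingdom  string `json:"kingdom"`
	Phylum   string `json:"phylum"`
	Class    string `json:"class"`
	Family   string `json:"family"`
	Genus    string `json:"genus"`
	PlayerID string `json:"player_id"`
	ClientTS int64  `json:"client_ts"`
}

// Collector buffers events and appends them to a local file in batches,
// so nothing crosses the network per event and a crash loses at most one
// unflushed batch. File rotation and archiving are omitted here.
type Collector struct {
	mu    sync.Mutex
	buf   []Event
	path  string
	limit int
}

func New(path string, batchSize int) *Collector {
	return &Collector{path: path, limit: batchSize}
}

// Count is the generic entry point: one call, one row, no brains.
func (c *Collector) Count(counter, kingdom, phylum, class, family, genus, playerID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.buf = append(c.buf, Event{
		Counter: counter, Kingdom: kingdom, Phylum: phylum,
		Class: class, Family: family, Genus: genus,
		PlayerID: playerID, ClientTS: time.Now().Unix(),
	})
	if len(c.buf) >= c.limit {
		c.flushLocked()
	}
}

// CountSample records only a fraction of calls — one plausible reading of
// count_sample, useful for firehose events like per-swipe tracking.
func (c *Collector) CountSample(rate float64, counter, kingdom, phylum, class, family, genus, playerID string) {
	if rand.Float64() > rate {
		return
	}
	c.Count(counter, kingdom, phylum, class, family, genus, playerID)
}

// Visit is sugar for an app-open event (the m_table_open shape).
func (c *Collector) Visit(playerID string) {
	c.Count("visit", "app_open", "", "", "", "", playerID)
}

// flushLocked appends the batch as JSON lines; a dumb producer drops the
// batch on error rather than crashing the game around it.
func (c *Collector) flushLocked() {
	f, err := os.OpenFile(c.path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		c.buf = c.buf[:0]
		return
	}
	defer f.Close()
	enc := json.NewEncoder(f)
	for i := range c.buf {
		enc.Encode(&c.buf[i])
	}
	c.buf = c.buf[:0]
}
```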
So it's a direct mapping — not much data transformation is happening, and for good reason. All these functions are just wrappers, just abstractions. You give that abstracted layer to the application developer so that they do the right thing, but your data structure remains the same: you're ingesting everything in the same format and just offering different interfaces. A sample usage would be something like stats.counter.count("start_session", "player_load", "finished" or "cancelled", …) — the player-load example we talked about.

Now the fourth step: design v1.0. Very, very important — the v1.0 is the most important part. The image on the slide is not what we built; it's a very typical data pipeline architecture: data sources, then an integration/ETL layer, into a data lake, then a data warehouse, and then dashboards, reporting, and a BI engine on top. Very standard. But you don't have the resources or the time, so how do you build it? First we defined guidelines. Fewer layers: this standard picture has too many layers for us to handle practically in the beginning. Keep it transparent: what you ingest is what you get — that's the rule for the data pipeline engineer. You ingested player_load, so you will get player_load; I do not care beyond that. No product manager can come to a data engineer and say, "I ingested player_load but I can't find it," or "I put my pop-up in kingdom but I can't find it." The data engineer is agnostic to all of that: you entered it in kingdom, so it's in kingdom; no transformation was done, so it cannot have gone anywhere else. Did you put it in kingdom? Did you search in kingdom? That's it. And remember it's only v1 — know that it will change. You will change it yourself; more requirements will come. You can and should push back — enough questions should be asked — but it will still change, and that's okay.

So this is what we started with, at a high level. We merged the data ingestion and ETL layer into the left side — the data sources themselves. Microservices, apps, whatever it is: everyone gets an SDK or a client to send data. We've given you the library; you send us data using this, and it ingests directly into our data warehouse — not into a data lake. Why? Because we needed to query immediately. The moment a row has been ingested, there's an end user hovering: "Can I query that? Can I do a join? Can I do a select count(*)?" So we had to insert into the data warehouse first, and in parallel sync the data to the data lake. The data lake became our secondary. In multiple discussions with AWS and data folks around Bangalore, I've since realized this is an anti-pattern — we go data warehouse to data lake rather than data lake to warehouse. If your data consumers can wait — if they're fine submitting a query and coming back after an hour for the results — then the usual pattern works fine. For us that was not the case, so we chose the anti-pattern. Users query the warehouse directly, and the dashboards, reporting, and BI tools also query the data warehouse. It works.
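A sketch of that fan-out. The warehouse write is the primary — that's what keeps the less-than-five-minutes promise — and the lake copy runs in parallel as the secondary. The connection handles, bucket name, and the choice of a pre-built multi-row INSERT for v1 (INSERT versus COPY shows up under tuning later) are all assumptions for illustration:

```go
// Package ingest sketches the v1.0 write path: warehouse first, data
// lake in parallel as the secondary.
package ingest

import (
	"bytes"
	"database/sql"
	"sync"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
	_ "github.com/lib/pq" // Redshift speaks the Postgres wire protocol
)

// IngestBatch pushes one batch both ways: insertSQL is a pre-built
// multi-row INSERT for the warehouse, raw is the same batch serialized
// for the lake. Either failure is reported; neither blocks the other.
func IngestBatch(db *sql.DB, svc *s3.S3, key, insertSQL string, raw []byte) error {
	var wg sync.WaitGroup
	errs := make(chan error, 2)
	wg.Add(2)

	// Primary: straight into Redshift so the rows are queryable in minutes.
	go func() {
		defer wg.Done()
		_, err := db.Exec(insertSQL)
		errs <- err
	}()

	// Secondary, in parallel: park the raw batch in S3.
	go func() {
		defer wg.Done()
		_, err := svc.PutObject(&s3.PutObjectInput{
			Bucket: aws.String("example-data-lake"), // placeholder bucket
			Key:    aws.String(key),
			Body:   bytes.NewReader(raw),
		})
		errs <- err
	}()

	wg.Wait()
	close(errs)
	for err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}
```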
This was the first cut, and it helped us do the early things on our products. We'd launch a product and know immediately whether users were dropping or not, what the install funnel looked like: a user installed — what did he do in the first minute? For games that is, like, the most important question; we will practically kill each other over it. Tell me: what did he do in the first session, the first minute, the first 30 seconds? If we don't know, we're blind, and then we're screwed. Games are not transactional platforms — it's not "he wanted to buy a shoe, so he bought the shoe." Users don't know what they want to do. They clicked because Shah Rukh Khan's picture was there, and then it didn't turn out to be that. So we have to know; we have to know their favorite color before they do. That's important.

Some details. You might have seen the squirrel — that's the name we gave our internal client. It was a data collector, but a thick client, not a thin one, because we merged the whole data integration and ETL into it. Written in Golang — we knew it had to be performant to some extent. Deployed homogeneously: we don't care which microservice deploys it where; game team, take it, deploy it wherever you want. It rotates and moves files in a local folder, and it processes events exactly once — guaranteed once, for sure. The input interface was a file being written locally by the SDK: batched reads, file rotation, archiving, cleanup, so that the disks stay clean. The output interface went directly to Redshift and S3, Redshift being the data warehouse we chose in the beginning. Again, the free credits worked, and the SQL interface is beautiful to work with in the early days; we didn't have to pay at first, so in that sense it was a good decision. Writes were batched to Redshift and S3: you're making network calls, and the S3 API versus the Redshift API work at different paces, so you tune the batch size for each — I'll come back to that in the tuning part. Redshift started on dc2.large, the smallest node — two vCPUs, I think, and 15 GB of RAM. We started with that because we thought it was enough, and since it can scale horizontally, it was okay to start there. Different users and queues for ingestion and usage: it's all landing in one place, and the compute happens there too, so compute should not starve ingestion and ingestion should not starve compute — keep different queues. And keep a moving window of data: we started with 120-day retention so the disk stays bounded, with daily loads.

The fifth step: open up many data interfaces. Keep it simple and transparent; misuse is okay; and from the start, use third-party interfaces. We gave direct access through DBVisualizer and SQL Workbench to do whatever people wanted, plus several tools like Metabase and Redash. Daily business reports ran through JDBC/ODBC scripts. And we did a hack: we used Datadog as our charting library — we sent data over at five- to fifteen-minute intervals and used their charting, dashboarding, and alerting. As a startup you obviously can't build the whole UI up front. There are a few other things we did too, but there's no time; we can talk about them later. And since we kept a limited window of data live, we had to load data back if somebody wanted it — rarely, maybe once a week, someone wants older data — and we enabled that restore with scripts.
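Two of those ops details, sketched against Redshift: routing the ingestion session to its own WLM queue so loads and user queries don't starve each other, and trimming the 120-day moving window daily. Whether it ran exactly like this is an assumption; the queue label and table name are placeholders:

```go
package ingest

import "database/sql"

// DailyMaintenance keeps the warehouse inside its moving window.
func DailyMaintenance(db *sql.DB) error {
	stmts := []string{
		// WLM routes statements tagged with this query group to the
		// queue reserved for ingestion/maintenance work.
		`SET query_group TO 'ingestion'`,
		// Moving window: drop rows older than 120 days...
		`DELETE FROM m_table_count
		  WHERE event_date < DATEADD(day, -120, CURRENT_DATE)`,
		// ...then reclaim space and restore sort order.
		`VACUUM m_table_count`,
	}
	for _, s := range stmts {
		if _, err := db.Exec(s); err != nil {
			return err
		}
	}
	return nil
}
```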
Now, tune and repeat. Know that not all data is important. Always back up, at all levels. Be able to drop and rebuild quickly: on any issue, drop the data from your warehouse immediately and rebuild it fast — having that capability enables you to do a lot more. And data will increase with time, not just in the number of rows but in the number of columns, and the columns themselves will grow in complexity. In Redshift you have to enable compression on certain things, and knowing when to and when not to also comes into the picture. Bottlenecks in ingestion we had to worry about: INSERT versus COPY in Redshift, parallelization requirements, and the thick client causing problems as the application itself scaled up. Bottlenecks due to usage: we realized we needed more columns as we went along, and your data scientists and product managers ask a cascade of queries — one query, then another, then another, ten in a row — to answer a single business question. That means if the table is big and you're only ever looking at a subset of it, it becomes an issue. So we introduced split tables on the bigger ones: m_table_count was split into count_popup, count_button, and so on, to give faster query speeds.

Then comes the last step. This is roughly what v2.0 looks like. On one side are all the applications — different kinds, from mobile clients to web services to microservices, some 20 of them. They all send data to what we call the Badger pipeline: our new, centralized system, written in Golang. NSQ — if anybody has used it for distributed queuing, it works beautifully for us there: a very small footprint, but it handles our current scale. And the data-lake anti-pattern we still hold on to: we push to the data warehouse directly — still Redshift — but we've added a real-time layer powered by MemSQL, so the SQL interface is maintained while some of the data is available much faster than before. Over time we've also added priority queues for different data, with different priorities and different SLAs — I'm not covering that here. But at a high level, we've moved from the thick client to a thin client, which we call Badger. It does more or less the same thing but doesn't upload anything itself; it just sends data over TCP to NSQ, and that's it. That makes it faster and very lightweight, and it works beautifully for us. Again: keep them dumb, fewer transformations, misuse is okay — and centralize for performance, not for its own sake. Centralize only when performance becomes a bottleneck for you.
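The thin client really can be that small. A sketch using the go-nsq library — the nsqd address, topic name, and event fields are placeholders, with the fields mirroring the generic schema from earlier:

```go
// The v2.0 thin client in a nutshell: no local files, no uploads —
// marshal the event and hand it to NSQ over TCP; the centralized Badger
// pipeline owns everything downstream.
package main

import (
	"encoding/json"
	"log"

	nsq "github.com/nsqio/go-nsq"
)

func main() {
	producer, err := nsq.NewProducer("127.0.0.1:4150", nsq.NewConfig())
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Stop()

	event := map[string]string{
		"counter":   "start_session",
		"kingdom":   "player_load",
		"phylum":    "finished",
		"player_id": "p-42",
	}
	body, err := json.Marshal(event)
	if err != nil {
		log.Fatal(err)
	}
	if err := producer.Publish("events", body); err != nil { // placeholder topic
		log.Fatal(err)
	}
}
```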
And the current scale: more than 20 billion total events per day — unique events, not counting redundancy; more than 800 GB of data per day; 200,000 events per second on average. For peak events per second, look at the bottom graph: games see high traffic before and after dinner, and when everyone is sleeping it's low. Peaks go up to 350,000 events per second, and even at the lowest point it's around 65 to 70 thousand — it's never zero. So our infrastructure has to scale up and down accordingly, and also be ready for higher scales.

So this is what it looks like, and I think this journey is what got us there: build it incrementally, don't wait for the big-bang aha moment, and know the business inside out — know the usage. And the most important thing I'll emphasize: collect well, but make sure you are actually going to use the data, and that you are using it. Make sure data is a first-class citizen in your company and in your product. Otherwise all of this is just a good tech problem to solve, not a business problem — it's not going to help the company or the product much.

So, just to recap, these are the steps. Happy to talk about any of this in more depth if anybody wants to go deeper into specific parts. Thank you.