This is still a pretty rough project, so if I end up screwing something up, that's fine. So, how have you enjoyed the conference so far? Were the talks what you expected them to be? When I was younger, the one takeaway I would get from a conference was the free t-shirt, something I could actually take home. Everyone has already talked about how to do things with automation and so on, the basics; when this talk is done, I just want there to be something you can take home and start playing with. I started on this about two weeks back and tried to get something usable ready for the conference, so it's a really quick thing.

I'd like to start with a joke; I really like this one. Why do programmers mix up Halloween and Christmas? Because OCT 31 equals DEC 25. No reaction? Okay, let's move on.

So, how many of you deal with SaaS companies? You do. Okay, next question: how many of you deal with customer-facing SaaS companies, with end users, people who use the product in the browser? So, I co-founded a company called seminars.com. It's a SaaS platform for training and online education. One of the biggest problems we faced was that we were never able to quantify our end-user traffic. It's an internet company, not installed software, so everything revolves around the web, and the currency of web traffic is basically static media: images, JavaScript, video, audio. If that doesn't work, it doesn't matter how fast or how optimized your website is; as long as the last mile is not optimal, it's all a waste.

So what does static content usage look like? With a lot of the new JavaScript frameworks, sites are served directly from your CDN or your S3 bucket or wherever; requests never reach a middle server, and the JavaScript talks directly to your API. All of this has to keep getting faster, so we started looking at options like Amazon's CloudFront and other CDNs, and S3 itself does a very good job of serving this. But even though we banked so heavily on these, there was still no way to get tangible metrics out of them. What exactly is the bandwidth going out? How are people using it? Being a SaaS product, it's very important for us to optimize our pricing.

Let me roll back a step and explain how I landed on this. Most of us, when we talk about data, see data as an end state: we have huge databases where we store our data, and we say, okay, this user is here, this is that, and we run our analysis on it. But data is not really an end state; we should look at data as streams. Every activity eventually leads to a state, and the activity happening inside your product is what is of real benefit to you. That, I believe, should be the basis of our pricing, and probably of our infrastructure as well. Most of us miss out on this metric when we price things. We look at API logs, we look at a lot of system health reports, but we miss the final output, which is the CDN, the static usage.
So this small tool that I wrote is just an effort to address that problem and make it easier to handle. Now, what can this sort of rich analysis do for us? One thing is the amount of data transferred per tenant. If you're a SaaS product, every customer running on top of your platform shares your infrastructure cost, so how much of that share is each tenant costing us? With that, we can come up with fair pricing. We may have pricing that says a flat $10 standard plan, but one customer might only be using $1 of the actual cost. We can't charge per EC2 resource, because everything is shared infrastructure; the one thing each customer genuinely bears a cost for is bandwidth consumption. That's one thing we can do.

We can also build a demographic picture of usage, and then we know where we have to move our application servers. If all the backend API servers are sitting in US West but most of the traffic is coming from Singapore, I know I have to move my API to Singapore, and that is something that will really make the site faster. We can also generate creative pricing models based on data consumption, as we just discussed.

We can check what the average bandwidth of video and audio consumption is. We serve a lot of video and audio on our platform, and we also serve a lot of traffic in India, where bandwidth is low. What that meant was that the time it took to serve a video was sometimes as much as 31 minutes. That's far too long. Initially we encoded videos in high definition, because the US market could handle it, and we served the same video to Indian users as well. We realized this was not right. Had we not analyzed this sort of usage of our media, we would never have reached the conclusion that this was the real problem we were facing, and that what we needed to do was also generate low-bitrate encodings of our videos so that people could watch them faster.

This is another aspect of the same thing. Every time a video was uploaded, we could define the most appropriate encoding streams to serve, because if a file is taking that long to serve, we know what the optimal bandwidth to target is. So at our transcoders we could decide, okay, generate four or five streams of it, and as and when we see more traffic coming from regions where data is delivered very fast, we can re-encode the existing videos at high definition and start serving those. The next time you log into the website from the Indian region, based on our past metrics we know what traffic in India has been like, we pick the appropriate encoding up front, and we serve that, so it's way faster than it normally would be.

Now I'll take you through a bunch of real examples; probably I should show you a demo of this, that would be nicer. This is a graph generated from the number of bytes transferred per day. This is the number of requests that came in per day: the red line is the number of requests that failed, the blue is the total, and the green is the successful requests.
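[To make the per-tenant roll-up mentioned above concrete, here is a minimal sketch, not part of the tool itself, of attributing bytes sent to tenants from parsed access-log records. The LogEntry type and the assumption that the first segment of the object key identifies the tenant are both made up for illustration.]

```go
package main

import (
	"fmt"
	"strings"
)

// LogEntry is a hypothetical, simplified view of one parsed S3/CloudFront
// access-log record; the real tool's schema may differ.
type LogEntry struct {
	Key       string // e.g. "tenant-acme/videos/intro.mp4"
	BytesSent int64
}

// tenantOf assumes the first path segment of the object key names the tenant.
func tenantOf(e LogEntry) string {
	if i := strings.Index(e.Key, "/"); i > 0 {
		return e.Key[:i]
	}
	return "unknown"
}

// bytesPerTenant rolls up bandwidth per tenant so each customer's share of
// the CDN bill can be attributed fairly.
func bytesPerTenant(entries []LogEntry) map[string]int64 {
	totals := make(map[string]int64)
	for _, e := range entries {
		totals[tenantOf(e)] += e.BytesSent
	}
	return totals
}

func main() {
	entries := []LogEntry{
		{Key: "tenant-acme/videos/intro.mp4", BytesSent: 52428800},
		{Key: "tenant-acme/js/app.js", BytesSent: 120000},
		{Key: "tenant-beta/img/logo.png", BytesSent: 30000},
	}
	for tenant, total := range bytesPerTenant(entries) {
		fmt.Printf("%s: %d bytes\n", tenant, total)
	}
}
```

[With totals like these, a bandwidth-based share of the infrastructure cost falls out directly per customer.]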
Now, let's come back to this part; this is what we can eventually do with it. Just on the basis of this front-end consumption, we could come up with fine-grained statistics. For example, revenue share: here I'm comparing three tenants. This one consumes most of the bandwidth, this one not much, and this one barely anything. Unique visitors are pretty easy too. Since most of this consumption was happening via the front end, it was static resources, which means we could eventually mine the information down to the point of saying, okay, how many sign-ups happened? Because certain media would only show on a certain page after a sign-up, that was really easy for us to tell. And none of this was available before; there was no easy way for us to access this information. Sure, I could integrate Google Analytics on top of it, and we could add other analysis on the API side and so on, but that would be extra work. This is information that is already there and that we are simply not harnessing; your CDN, your S3, your static assets are already giving it to you, it's just that we don't capitalize on it.

Now, I don't know if you can see this line clearly; it's the completion ratio. This is an online training product, so one of the most important metrics for us is how much of a course a user has actually completed. Just based on this front-end consumption, we could also work out how many people were completing their courses.

So this is the tool I put together. It's called CDN analysis. All you need to do is take it home, install it on your machine, point it at your S3 bucket, provide your access key and secret key, and it will start doing its thing. As for what it uses at the backend: how many of you are aware of InfluxDB? A few have heard of it. InfluxDB is a time-series database, a really nice tool, written in Golang.

Now, this is too much data, so let me give you a small demo first. Can you guys see this? Right. This is the amount of data we are mining from one month ago. It's, what, 7 million? Yes, around 7 million requests served, although there's something wrong with that figure, it shouldn't be 8.3. Anyway, it's small. Can we switch off the lights? That's a lot easier. Or turn the lights back on. Okay, fantastic.

Now, these are the requests coming in. One moment, this thing over here. This is what I was telling you about. You see, over here there's a spike where it took 2.2 hours to serve one single request. That's when I realized, hey, there's a huge problem here; we probably need to bring this time down and start producing much smaller encodings of these videos as well. This one was probably high definition, so it has to come down to 480p or maybe even 240p. That's the sort of information that can be mined on the fly.

Changing the view is really easy. If I want to sort by bytes sent or bytes received, I can, and I can change the time range to show, say, just the last seven days. It takes a little while. So this is the front-end side of it; most of the tools out there will give you this sort of feature, and there are a handful of them. Oh yeah, so see, this is from the last seven days.
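[Under the hood the ingested records end up as time-series points. The tool's actual measurement, tag, and field names aren't shown in the talk, so the following is only a minimal sketch of writing one parsed log record into InfluxDB, assuming the github.com/influxdata/influxdb1-client/v2 package and made-up names such as cdn_logs.]

```go
package main

import (
	"log"
	"time"

	client "github.com/influxdata/influxdb1-client/v2"
)

func main() {
	// Connect to a local InfluxDB instance; address and credentials are placeholders.
	c, err := client.NewHTTPClient(client.HTTPConfig{
		Addr:     "http://localhost:8086",
		Username: "writer",
		Password: "secret",
	})
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// Points are written in batches; "cdn" and "cdn_logs" are assumed names.
	bp, err := client.NewBatchPoints(client.BatchPointsConfig{
		Database:  "cdn",
		Precision: "s",
	})
	if err != nil {
		log.Fatal(err)
	}

	// One parsed access-log record: tags carry what you filter or group by,
	// fields carry the measured values.
	pt, err := client.NewPoint(
		"cdn_logs",
		map[string]string{"status": "200", "edge_region": "ap-south-1"},
		map[string]interface{}{
			"bytes_sent":    int64(52428800),
			"time_taken_ms": int64(2300),
		},
		time.Now(),
	)
	if err != nil {
		log.Fatal(err)
	}
	bp.AddPoint(pt)

	if err := c.Write(bp); err != nil {
		log.Fatal(err)
	}
}
```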
Now it goes over all of those seven days of records, and the charts come back really quickly. All of this is also accessible via an API; InfluxDB has its own DSL you can use from any language, and it's pretty much like SQL, so you just write something like SELECT * FROM cdn_logs WHERE some_tag = 'some value' AND time > now() - 7d, and it gives you all that data.

So, what does this tool take care of? If you start doing this yourself, you'll run into a bunch of problems. One is that it's a huge amount of data, and S3 has a pretty bad API for this; you can't really ask it, hey, give me all the data in the bucket. All of these logs are saved as gzipped files in your bucket, each file has a header, and the header has rows corresponding to it. One file can have one row or multiple rows, depending on how many requests came in within that particular time frame; I think it's five milliseconds, so if multiple requests come in within five milliseconds, they get stored in one single file. The tool takes care of all of that.

It also makes sure that records are never duplicated, so you can run the tool over and over again, which is really important. I just showed you how flexible the queries can be; that's the front end of it, which is where you start using Grafana. It also takes care of user permissions: because the front end is JavaScript based, someone could go to the console and see what the user and password are, but since it hands out only read-only and write-only credentials, you're safe in that respect. And if there's an error, it handles the recovery as well.

Now, here's something interesting. While I was working with these S3 logs, I realized that logs don't always come in the same consistent format; their headers keep changing every now and then. The tool takes care of that too, so if there are header variations it handles them easily. Currently it ingests at a rate of about 500 requests per second, because that's the rate at which the Amazon API starts failing; beyond that, requests start failing and won't show up in the results. So yes, it takes a fair amount of time to feed the data in initially.

What was this about? Right, the design. Listing S3 logs can get really slow, so let me explain a bit of how the tool has been designed. There are S3 logs and CloudFront logs, and all of them are ingested at about 500 per second. The whole tool is written in Golang, so it's pretty much a single binary that you download and ship to your server, wherever that is. It's rather easy; you can also build a Docker instance out of it, there's a configuration on my GitHub that you can download, and that's your server.

As the logs are pulled in, they are sent to a high-capacity messaging queue, and there are constant readers on that queue, so it basically follows a producer-and-sink pipeline model at almost every step, a standard design pattern you're probably aware of. From the queue, every file's content is first un-gzipped and the records are parsed. Once the records are parsed, they are sent to another messaging queue, which is persistent and can hold a lot of log records. From there they can be taken to any endpoint of your choice: if you don't agree with InfluxDB, you can pretty much use Mongo or any other datastore as the sink as well. This is the standard format that it emits.
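[Here is a highly simplified sketch of that producer/parser/sink pipeline, using plain Go channels in place of the messaging queues and a made-up log format. It only illustrates the shape of the design described in the talk, not the tool's actual code.]

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"sync"
)

// rawFile represents one gzipped log object fetched from the bucket.
type rawFile struct {
	key  string
	body []byte
}

// fetcher would normally list and download log objects from S3/CloudFront;
// here it just feeds pre-fetched files into the first queue.
func fetcher(files []rawFile, out chan<- rawFile) {
	for _, f := range files {
		out <- f
	}
	close(out)
}

// parser un-gzips each file and splits it into individual log lines,
// which stand in here for parsed records.
func parser(in <-chan rawFile, out chan<- string, wg *sync.WaitGroup) {
	defer wg.Done()
	for f := range in {
		gz, err := gzip.NewReader(bytes.NewReader(f.body))
		if err != nil {
			fmt.Println("skip", f.key, err)
			continue
		}
		sc := bufio.NewScanner(gz)
		for sc.Scan() {
			out <- sc.Text()
		}
		gz.Close()
	}
}

// sink would batch records into InfluxDB or another datastore; here it prints them.
func sink(in <-chan string, done chan<- struct{}) {
	for rec := range in {
		fmt.Println("record:", rec)
	}
	done <- struct{}{}
}

// gzipped builds a tiny gzipped payload so the example is self-contained.
func gzipped(s string) []byte {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	io.WriteString(w, s)
	w.Close()
	return buf.Bytes()
}

func main() {
	files := []rawFile{
		{key: "logs/2015-06-01.gz", body: gzipped("GET /app.js 200 1234\nGET /intro.mp4 200 52428800\n")},
	}

	rawQ := make(chan rawFile, 8)   // stands in for the first messaging queue
	recQ := make(chan string, 1024) // stands in for the persistent record queue
	done := make(chan struct{})

	go fetcher(files, rawQ)

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ { // a few parser workers reading from the queue
		wg.Add(1)
		go parser(rawQ, recQ, &wg)
	}
	go sink(recQ, done)

	wg.Wait()
	close(recQ)
	<-done
}
```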
As for the project itself, there are two repositories: one is the syncer, and the other is the InfluxDB sink that consumes from it. The logs just get added to the database and you can start querying them. This is a small snapshot of how it works; probably I should show you a live demo. I actually started this job a little while back. It's throwing errors here because I re-ran the process and it says these files are already processed. There's a small marker that comes along with it, which keeps pulling log entries out of the bucket and transfers them over to the queue; from the queue, if they haven't already been processed, they get added to the database.

All of this ingests reasonably quickly: as I just showed you, last month there were about seven million records, and it ingests that in around four or five hours. But there is a whole set of performance improvements I think I should still make. One is that there is only one feeder at the moment, which goes through your whole S3 bucket; I think I can split it folder-wise or directory-wise and make it distributed, which might require some sort of consensus about which feeder owns which prefix, and I haven't really figured that out yet. The queue also needs to change; I'm looking at Apache Kafka for this, because the volume of logs can get really, really high at some point, and Kafka can take over from there. The sink side can then continue to scale out for heavy loads. I'm also working on exposing this as a service. Currently it's a binary you can download, or source code you can compile yourself and run on your own servers, but if you'd rather use it as a service, you'll be able to do that as well.

Yeah, that's it. That's all I have. Any questions? No? We have eight minutes more if you have any questions.
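[The talk doesn't show how the marker and the already-processed check are implemented, so purely as an illustration, here is one way a resumable, non-duplicating feeder could work, assuming the aws-sdk-go v1 S3 client, a hypothetical local marker file, and placeholder bucket and prefix names.]

```go
package main

import (
	"fmt"
	"os"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

const markerFile = "last_processed_key" // hypothetical local marker store

// loadMarker returns the key of the last log file we finished with,
// or "" if the feeder has never run before.
func loadMarker() string {
	b, err := os.ReadFile(markerFile)
	if err != nil {
		return ""
	}
	return strings.TrimSpace(string(b))
}

func saveMarker(key string) error {
	return os.WriteFile(markerFile, []byte(key), 0644)
}

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := s3.New(sess)

	// List only log objects that sort after the last processed key, so a
	// re-run never re-ingests files already handed to the queue.
	out, err := svc.ListObjects(&s3.ListObjectsInput{
		Bucket: aws.String("my-log-bucket"), // placeholder bucket name
		Prefix: aws.String("cdn-logs/"),     // placeholder log prefix
		Marker: aws.String(loadMarker()),
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "list failed:", err)
		os.Exit(1)
	}

	for _, obj := range out.Contents {
		key := aws.StringValue(obj.Key)
		fmt.Println("would enqueue:", key) // hand the file off to the parser queue here
		if err := saveMarker(key); err != nil {
			fmt.Fprintln(os.Stderr, "could not persist marker:", err)
		}
	}
}
```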