So I am here to present a tool called StopStalk. Let's see what it's about. My name is Raj. I am the founder of StopStalk, and I also work at BrowserStack as a software engineer. So how many of you know what competitive programming websites are? Okay, doesn't matter, I'll explain.

Let's go through a few terms that I'll be using in this presentation. The first one is a problem. A problem is basically a problem statement that requires some code to solve it. For example, say you have to write a program that accepts two numbers and prints their sum. You take 2 and 5 as input and you print 7 to stdout. Then there is something called the constraints of the problem. Constraints basically tell you what the input numbers will look like. In our case, every number in the input has exactly a thousand digits.

Then the submission. A submission is the actual code snippet that solves the problem. So let's see what the simplest program would look like. You declare two variables, x and y, read them, and just print the sum. Will it work? Should it work? Let me go to the previous slide. The constraint here is that each number has a thousand digits, so you have to handle that many digits. How would this work in Python? It should work, because Python handles arbitrarily large integers automatically; you don't need to worry about the constraint on the digits. So that is a submission: the code snippet which solves the problem.

Then there is something called submission status. These websites have an option to upload your code, and they have a list of test cases which they run against it. They give you one of these statuses based on what your code outputs. The first one is accepted: that is where every test case has passed.
Wrong answer means one of the test cases has failed. Time limit exceeded means your code may be correct, but it is taking too much time to execute. Memory limit exceeded means you have declared more memory than the solution actually requires, so it gives you memory limit exceeded. Runtime error is when your code gives a segfault or something similar on some test case.

Then there are contests. In a contest there are a few problems which you need to solve in a limited time period, and what matters is your speed and your accuracy. If you submit a wrong answer the first time, you are penalized by some amount. There are various difficulty levels, so each problem can be easy or a bit hard to solve. Once the contest is over, ratings are computed and allocated to the users, and based on your performance in the contest you may get prizes or something.

And this one is generic: a handle. On a particular website your username, or handle, is the unique identifier of the user on that website. So, any questions so far on the terminology?

So what are competitive programming websites? These websites have a bunch of problems. You make submissions there and they give you the various statuses for them. That is basically how you learn algorithmic programming and logical thinking.

So where did the idea of StopStalk come from? I was in my second year, and what we used to do is look at each other's profiles, my friends' profiles, across various competitive programming websites. I would open every URL and check what the person was doing: what kind of problems he was solving, what categories those problems fell into. So say my friend's name is Frankel and his Codeforces handle is Frankel R34.
I will open codeforces.com/profile/ followed by his handle, Frankel R34, and see what problems he is solving. If I want to check on CodeChef, I will open codechef.com/users/ with the same handle. On SPOJ I will open spoj.com/status/ with the same handle. But on HackerEarth he has a different handle, so it is hackerearth.com/submissions/ followed by something else entirely. Now how do I remember all of these handles? And this is just one of the friends I am checking. Say I have 10 friends: remembering all the handles across all the websites is pretty cumbersome, right?

So how does checking profiles help? I get to know what kind of problems he is solving. I can attempt them, and if I am stuck I can get help from him. And also, in vacations you are not doing anything productive apart from Netflix or something. So that was a great source of motivation to work harder and get an internship in the third year. That was basically it.

How it all started was that I began writing scripts which would retrieve data from each of the sites, given a specific set of handles. I don't have to remember the handles anymore. I don't even have to ask my friends; I will just Google it, find the handle, and add it to my script. And I created a dashboard which looks something like this. Each row in this picture is a submission, which is the code snippet that was submitted. First is the name, then the site name, then the site profile, the time of submission, the problem name, the language of the submission, and the submission status. So this was the basic dashboard that I came up with at the start. It was just crawling and getting this data.

What I realized is that over a span of five to ten days I had collected a lot of submissions across all my friends. So why stop at just showing submissions when you can analyze a lot of data from them?
Now these submissions are not just submissions; they are actually the complete competitive programming progress of that user. Everything is covered, every site is covered, and I just need to open one profile.

So how many of you know this graph? This was the first thing I built on top of the submission database. Every green dot basically represents the number of submissions on a particular website. Cool. And this was just one of the graphs, so let's go through the many features that I could pull out of that submission database.

The main idea behind StopStalk was to analyze your friends' progress and, along with that, improve your own progress too. So how does it help? The basic thing is that if you see your friend's submissions, you will just go to your friend and ask him to explain some problem. But there are also editorials for the problems; every problem has an editorial. If the editorial were enough for a person to learn from, we wouldn't need friends. But editorials are often quite cryptic and also very short, so a friend explaining the editorial always helps. Then there is the fact that you read other people's code and learn from it. That is the best thing you can do, because software engineering is all about it. You have to read code. If you are good at that, you are good at software engineering.

We started with a friend request feature. Users would register on our website and send friend requests to each other so that they could see each other's submissions. But we eventually dropped that feature, because people would not accept the friend requests, and there is no data as such that is private. So why not just add the friend directly? We removed that feature, and now you can just add a friend and see their submissions. So the dashboard looks something like this right now.
Each row is a submission: user, site, profile, the same thing as before. You also have an option to view or download the code directly from there.

Now what you can also do is search problems by tags. Every problem has some specific tags. A tag is basically something like dynamic programming, or easy, or greedy, or binary search: what the problem requires you to use to solve it. Say our problem was just to add two numbers. It could have an easy tag, or it could have an implementation tag. So every problem has these tags. Now, every website has different tag names. Dynamic programming might be written as just DP. On Codeforces you will have only DP as the tag name. But if a user wants to search, he would actually want to search for dynamic programming and not just DP. Similarly, for bit manipulation you will have bitwise operation on one site and bit manipulation on another. So what you want is something which clusters these for you and gives you all the problems from that category.

So this is what the search problems by tags page looks like. On the left you can search by the problem tag, which can be the actual site tag, just the name. But if you enter it on the right side, we club the tags for you and give you all the problems which are about dynamic programming. You can see the tag names there; these are the actual site tags. So we search for dynamic programming and we get all the dynamic programming problems. The user can just search, choose any of the problems, and solve it.

Now there is a page called trending problems. Say a user wants to see how other coders are doing. He will have to see which problems people are solving. What people usually do is go to each site, look through the problems, and pick one to try. But it would actually be very useful to just tell him: these are the problems you can solve, the ones being solved by most of the users.
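The tag clustering described for the search page can be sketched roughly as follows. This is only an illustration, not StopStalk's actual code: the `TAG_GROUPS` mapping, the function name, and the sample problems are all made up for the example, and the tag spellings are assumptions based on the ones mentioned in the talk.

```python
# Hypothetical mapping from one canonical category to the tag names
# that different sites might use for it (spellings are illustrative).
TAG_GROUPS = {
    "dynamic programming": {"dp", "dynamic programming"},
    "bit manipulation": {"bitwise operation", "bit manipulation"},
}


def problems_in_category(category, problems):
    """Return names of problems that carry at least one site tag
    belonging to the given category's group (case-insensitive).

    If the category has no group, fall back to matching the tag itself.
    """
    wanted = TAG_GROUPS.get(category.lower(), {category.lower()})
    return [
        name
        for name, site_tags in problems
        if any(tag.lower() in wanted for tag in site_tags)
    ]


# Made-up sample data: (problem name, tags exactly as written on the site).
SAMPLE = [
    ("Problem A", ["DP"]),
    ("Problem B", ["Dynamic Programming", "greedy"]),
    ("Problem C", ["Bitwise Operation"]),
    ("Problem D", ["easy"]),
]

print(problems_in_category("dynamic programming", SAMPLE))
# ['Problem A', 'Problem B']
```

Searching for the category rather than a single site tag is what lets "DP" and "Dynamic Programming" land in the same result list.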
So this is what the trending problems page looks like. The left side is trending amongst friends. These are the submissions in the last 15 days across all the websites. All you need to do is go to this page, select the most trending problem, and solve it. Because a lot of people are solving it, it might be an interesting problem to solve.

Then we have a problem page, which has a lot of analytics on the problem. Let me show it to you. This is a problem named Messenger. The green part here is the link to the problem, so you can actually go to the problem and solve it. Then editorials: there are a lot of editorials which are added on the website itself, and you can also suggest editorials, as I will show you in the next slide. The right side is the accuracy graph for the problem, covering all the submissions made on that problem across all the users. You see the accuracy graph so that you get to know what kind of problem it is. And there are three tabs: my submissions, friends' submissions, and global submissions, so you can filter the submissions on that problem. You can also write an editorial and submit it so that the community knows about it and learns.

Then there is a filter page. Say I remember a friend's submission. It was on some site, and it gave him memory limit exceeded. So I will just search for it by name; a friend of mine has this handle. You can select the site and the problem name, and you will get all the submissions there. So you can look at the code and see what gave him memory limit exceeded. And if the user later solved it, you can see what fixed the problem. What fixed the memory limit exceeded?

Then there is a to-do list. What people normally do is go to every site.
They remember the problem names and come back the next day to solve those problems. Now this does not scale, because people have to stick to one site and can only remember very few problems, three at most. But now you have a unified to-do list across all the competitive programming websites. You just add the problem to the to-do list and come back later to StopStalk, and you get the problems that you need to solve. You can look at the problems a friend has solved and add them to your to-do list. Or you can monitor the trending problems every day and keep adding them to your to-do list and keep solving them. Or you can search problems by tags. There are many more use cases for the to-do list. So this is what it looks like: the problem name is shown, then the problem site, and the number of submissions and users. You can also remove a problem from the to-do list when you are done solving it.

Then there is the upcoming contests page. As I explained earlier, contests are organized by all these sites at particular intervals, say weekly or monthly. Users want to know what the next upcoming contests are so that they can register for them. What they do is open this upcoming contests page. They can see the list of all the contests which are coming up, and the ones which are already live. You can see the first four are already running. You can go to that contest page, or you can add a Google reminder so that it reminds you 30 minutes before the contest starts, and you can register for it if it requires registration.

Now there is the leaderboard. All the competitive programming websites are on StopStalk. So if I define a formula for it, it will actually represent the competitive programming progress of the user.
What it includes right now is the streak of the user, the accuracy with which he is solving problems, the number of problems he has solved and attempted, and a lot more; the formula is fairly complex. And we have leaderboards for country, global, institute, everything. Each row shows something like the user's handle, the institute he is from, and the rating. And the last column is per-day changes.

So what is per-day changes? Say a user registered on StopStalk today. I will compute a value called the per-day value. From 1st January 2013 until now, I take the number of days and the number of submissions, and compute the number of submissions divided by the number of days. Say the user makes 1.5 submissions every day. That is the per-day value that I store. Once I have stored that value, I keep monitoring it and show the difference to the user. If the difference is positive, it means he has been making more submissions after registering on StopStalk. Negative means that he has been solving and submitting less on the websites. So this is one way in which users get motivated to come every day, solve problems every day, and learn.

Then there is the profile page. The profile page is the main thing that attracts users the most. They can see a cumulative profile across all the sites, and they can see much more data inside. I will just show you the screenshots. There is accuracy, streak, problems solved, and also the contest graphs. The graphs are basically of ratings. Every competitive programming site has something called a rating, which is determined by the performance of the user in contests. So contest graphs are the graphs in which the ratings are plotted for each site. I will show you that. Then the submission graph, and the list of solved and unsolved problems. So this is what the profile page looks like.
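As a rough illustration of the per-day value described above, here is a minimal sketch. It assumes the counting starts on 1 January 2013, as mentioned in the talk, and that the stored baseline is simply compared with a freshly computed value; the function names and the sample numbers are made up, and the real formula may differ.

```python
from datetime import date, timedelta

# Assumption: submissions are counted from this date, as stated in the talk.
INITIAL_DATE = date(2013, 1, 1)


def per_day(total_submissions, today):
    """Average number of submissions per day since INITIAL_DATE."""
    days = (today - INITIAL_DATE).days
    return total_submissions / days if days > 0 else 0.0


def per_day_change(stored_per_day, total_submissions, today):
    """Difference between the current per-day value and the stored baseline.

    Positive: the user submits more than before registering.
    Negative: the user submits less.
    """
    return per_day(total_submissions, today) - stored_per_day


# Made-up example: 1500 submissions over the first 1000 days.
registered_on = INITIAL_DATE + timedelta(days=1000)
baseline = per_day(1500, registered_on)
print(baseline)  # 1.5
```

Note that because the denominator is the whole period since the start date, the difference moves slowly; a user has to keep submitting above the old average for the value to stay positive.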
This is the accuracy across all the websites. The day streak and the accepted solutions streak. The number of problems solved and the total. The accuracy graph. And this is the contest graph. What I am talking about is on the right side: CodeChef Long, CodeChef Cook-Off, CodeChef Lunchtime. These are the names of contests which happen at particular intervals. People try solving problems in them, and based on that they get ratings. And you can see all the ratings on StopStalk itself. Then there is this: this is what the profile of a competitive programmer looks like when he is submitting a lot. Then this is the list of solved problems, grouped by category. So you can just check. And what are these greens and reds? Green means the logged-in user has solved the problem. So I am viewing someone else's profile, and I can see all the problems that both I and the user have solved. The blue ones are the ones that the user has solved but I, the logged-in user, have not. So I can add one to my to-do list, or ask the user for help if I get stuck on that problem.

So what is happening behind the scenes? Obviously there is crawling happening, a lot of crawling. To give you numbers: if there are 6,000 users and six sites, that is 36,000 HTTP requests at the minimum. Where it exceeds that is the number of pages of submissions. Not every website shows all the submissions on one page, so there is pagination that you have to crawl through.

The first thought is basically to write all the handlers, say a CodeChef handler and a Codeforces handler. You get a username and you return the list of submissions. That is the first thing you think of when asked how to solve the problem. How can we do better? We can store the last retrieved timestamp. Every time our handler is called, we stop as soon as we reach a submission older than that last retrieved timestamp.
That way we only fetch the diff, the submissions the user has made since the last retrieval. So that is a better step. One more optimization on top of it: why hit the sites for every user? Some users are not really active on any competitive programming website; they just added their handles. So why hit for all of them every day in a cron that runs every 24 hours? Even after this stage of optimization, we were still at around 9 hours for 6,000 users, because there is also some processing around the submissions that happens along with the cron.

So how do we scale? That is the issue. What you do is make the user a participant in the system. How do you make the user help us in the retrieval process? One insight about submissions is that not every user submits every day on every site. A user may, but realistically a user will not. The other question was how to capture this information and build a solution around it.

What we came up with is a delay value that we store. The delay value tells us how many days later the next retrieval should happen. So basically we take the last retrieval date plus the delay value divided by 5 (I'll tell you what this 5 is), plus 1, and check that against the current date to decide whether to skip the retrieval process for that site. Otherwise you retrieve the submissions. If no submissions were retrieved for the user, you increment the delay value. That is how the delay value sometimes gets past 5. And if there were some submissions, that means the user was actually active. If the user was active today, he will probably be active tomorrow too. So that is what it captures. With this solution, as long as there are submissions, the delay value keeps getting reset to 0 and the user's data is retrieved next.
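The retrieval logic above can be sketched as follows. This is a minimal sketch under stated assumptions, not the actual StopStalk implementation: it assumes the skip rule means "skip while today is earlier than last retrieval + delay // 5 + 1 days", that the delay resets to 0 whenever new submissions are found, and that each site returns submissions newest first. All function names and the sample data are hypothetical.

```python
from datetime import date, datetime, timedelta

DELAY_DIVISOR = 5  # the "5" from the talk


def only_new(submissions, last_retrieved):
    """Keep only submissions made after the last retrieved timestamp.

    Assumes `submissions` is a list of (timestamp, problem) pairs ordered
    newest first, so we can stop at the first old one.
    """
    fresh = []
    for submitted_at, problem in submissions:
        if submitted_at <= last_retrieved:
            break  # everything after this was already stored
        fresh.append((submitted_at, problem))
    return fresh


def should_skip(last_retrieved_on, delay, today):
    """Assumed rule: skip until last retrieval + delay // 5 + 1 days."""
    next_due = last_retrieved_on + timedelta(days=delay // DELAY_DIVISOR + 1)
    return today < next_due


def next_delay(delay, new_count):
    """Assumed update: reset for active users, back off for idle ones."""
    return 0 if new_count > 0 else delay + 1


# Made-up example: two submissions on a site, one already stored.
subs = [
    (datetime(2024, 1, 3, 10, 0), "Problem B"),
    (datetime(2024, 1, 1, 9, 0), "Problem A"),
]
fresh = only_new(subs, datetime(2024, 1, 2, 0, 0))
print(len(fresh), next_delay(4, len(fresh)))  # 1 0
```

Under these assumptions an active user is crawled every day, while a user with no submissions for five retrievals in a row starts being crawled every second day, then every third, and so on, which is what cuts down the number of requests.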
Then there's the refresh button which we provide, so that the user can get his submissions retrieved right then. And we reset the counters when the refresh button is pressed.

So the current data set is 7 million plus submissions, increasing every day, from around 7,000 users across the world. And the total number of problems that we have is around 45,000.

What would the future plans be? Now we have all the data of the user. What we could do is track the last submissions of the user and, based on that, recommend some problems for the user to solve. That would be very helpful for the user. Next would be a mobile application, a discussion forum for people, and adding more sites to StopStalk.