long. You will be kicked or dragged off the stage if you exceed the time. There will unfortunately be no time for questions because we have so many talks. So without further ado, let's start with Harshal, who is going to talk about TweetStormy.com. Oh, and I have a couple of other announcements to make quickly. There is a talk in the Banquet Hall downstairs by Arpitha about sleep in the brain at 17:45. There is also an ongoing off-the-record talk about AWS cost optimization, which started at 5 p.m. If you want to catch the end of that, it's happening. One last-minute thing. Ah, yes. The first shuttle for the party venue leaves at 6:10 p.m. People should gather near the registration desk. Shuttles will leave every 20 minutes. The party is at BrewSkiPub. You must wear your badge to have access to the party. Do not lose your badge, because you will need it again tomorrow to get back into the conference venue. Okay. Over to you, Harshal. Hi, everybody. You might have seen me this morning; I was a volunteer. So what am I doing up here? Let me warn you: it's not related to DevOps at all. I'm a drug addict. Okay, now that I have your attention: that drug is Twitter, and I think this is a relevant audience for it. I've been working on a very, very little side project. If you are a pro Twitter user, you might have seen a lot of threads on Twitter. Those are called tweet storms, and like a lot of famous internet things, the term was coined by Marc Andreessen of Netscape; you might have heard of his VC firm. So I'll show you a few famous tweet storms. This one was by a guy called Sidharth, about sexism, but not in the way you would expect. There are other famous tweet storms as well. MKBHD, a famous tech reviewer, just goes on. Here's another by Dan Abramov. He's an open source contributor.
So I'll show you what I mean. Right now it's not publicly accessible and the logic doesn't fully work, so I'm looking for beta testers. That's why I'm here. You type here, and instead of manually sending a tweet and then replying to it, you can see the tweets in advance. Maybe when the logic is more advanced, it will break the text at or before 140 characters so that each tweet in itself is coherent, and you can preview the tweets. We have a good mobile view as well. It will be a completely free project, just for learning. I'm a computer science student, and you might have heard of the principle of least power: everything that can be made in JavaScript will eventually be made in JavaScript. So I was fascinated. Let me learn Angular, that sounds fancy, and let me learn Koa; I already knew Express. By the time I made the front end, I was very tired. Oh, what is this? All this Angular shit. Then I decided to go with Express itself because I didn't have the patience to learn anything else. So that's it. We'll be launching very soon. If you are a pro Twitter user and would be interested in this kind of thing, please go and join the wait list or email me; the contact info is there in the photo. Sorry for putting you through this blatant advertisement. Thanks. Okay. Our next talk is going to be Docker on ARM by Siraj Nawad. Is he here, Siraj? No. Okay, we skip that one. If he shows up later, then in that case we'll come back to it. For now we go to the next talk, which was identifying anomalies using Graphite functions, by Aditya. Just sign the disclaimer. Hi. So I wanted to talk about the way we are using Graphite to expose metrics and basically find out when things are going wrong, in a way that you don't need a developer to understand what is going wrong. To give you some background, I work at Capillary Technologies.
We have integrations with a lot of external vendors for various things, and we also run a lot of services in house. So we have a few modules which need to integrate with maybe 20, 30, 40 services, externally or internally. At any point of time, we would like to figure out whether these services are running fine or not. The first part, as many of the talks today have already covered, is collecting the data. So we started doing that, using the Codahale metrics library with Graphite on top of that to get our metrics. We quickly ran into a scenario wherein we were exposing upwards of 800 or 900 metrics for a couple of modules to find out how the system is running. We drilled down, we tried our best, and we got it down to something like 200. Now, if you have something like 50 services running and you want to know the health of each service, plus some metrics of the module itself, you require a couple of metrics per module. So we were still looking at upwards of 100 metrics which needed to be monitored on a regular basis to find out if things were working fine or not. Obviously, someone who doesn't have insight into the system can't really make sense of that. So we tried using Graphite functions to make sense of what is happening. Now, I'm not sure if everyone here has used Graphite functions; they are a little cryptic to start off with, but you can basically write regular expressions to get all your series of a specific type to show up on one dashboard. We ended up having dashboards with upwards of 50 graphs, which doesn't make sense to anyone. So then you start applying functions on top of this to start making sense of it. One of the basic things that we tried was applying a maximumAbove on top of an averageAbove function. Now, what would you want out of something like that?
You would basically say: I have 50 series over here, out of which, if any of the services that I'm interacting with takes, let's say, more than a second or two to respond, I would like those services to show up on my graph. So when something goes wrong, I just look at the graph, I have one or two or three services showing up, and I know exactly where things are going wrong, as opposed to looking through 50. If you go through a legend of 50 items, it is basically incoherent, and if you are like me, basically colorblind, it's very, very difficult to tell the difference between the shades of all those 50 series. So you apply something like a maximumAbove on top of an averageAbove, and you bring that down to two or three or four series, which is very, very easy to understand. Now, assuming you want to take this to the next step, you typically would apply statistical means or a machine learning algorithm on top of your graphs to find out if things are working fine or not. Going by the typical mathematical approach, you could apply some statistics on top of it. One of the ways people do this is applying exponential smoothing functions. Graphite by default gives you the functionality of a Holt-Winters operation. Let's say you apply that on the time series data you are getting. Basically, what this does is look over the data you have over a certain amount of recent time and find out the possible deviation in that data which is acceptable. You can set how strict or lenient you want this to be. And whenever the deviation exceeds these limits, you see a spike. This, in combination with your maximumAbove or minimumBelow graphs, basically gives you pointers to find out exactly when something is going wrong and where it is going wrong. So now we are in a position wherein we just tell certain people, some particular team, to just look at these graphs.
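The function chaining described above can be sketched as Graphite render targets. The series paths and thresholds below are hypothetical, not the speaker's actual dashboards; the function names (averageAbove, maximumAbove, holtWintersAberration) are Graphite built-ins:

```python
def graphite_target(series_glob, avg_threshold, max_threshold):
    """Chain averageAbove and maximumAbove as described in the talk:
    keep only series whose average exceeds avg_threshold AND whose
    maximum exceeds max_threshold over the render window."""
    return "maximumAbove(averageAbove(%s,%s),%s)" % (
        series_glob, avg_threshold, max_threshold)

# Hypothetical vendor-latency series: drop everything that stayed healthy,
# so only the one or two misbehaving services remain on the graph.
target = graphite_target("stats.timers.vendor.*.latency.mean", 1000, 2000)

# Graphite's built-in Holt-Winters aberration: the rendered series is
# non-zero only when the data deviates from its predicted confidence band.
hw_target = "holtWintersAberration(stats.timers.vendor.*.latency.mean)"
```

Pasting such a target into a Graphite dashboard collapses a 50-series legend to just the outliers, which is the filtering effect the speaker describes.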
If anything shows up on the graph, you basically raise an alert. So that's how we solved a lot of our monitoring issues. Our next speaker is Rahul Menon, speaking on self-driving Kubernetes. Hi, my name is Rahul. I work with Waisapay. I'm just going to take about three minutes of your time to tell you what self-driving Kubernetes is. It's basically running Kubernetes in Kubernetes. I see a lot of value in it, mainly because, as you saw in the demo this morning, you can actually scale out your Kubernetes cluster by executing one single command. It helps with deployment, upgrading, scaling, a lot of things. I've been trying to work on this for the last three months or so, trying to get this thing working and out into production. I've still not succeeded, but I can see the light at the end of the tunnel. So, I lost my train of thought. If you have been following the Kubernetes community, there is a project in the Kubernetes incubator called bootkube. What it essentially does is bring up a temporary Kubernetes control plane, an API server, a scheduler, and a controller manager, which then tells your kubelet to spin up your real API server, your controller manager, and even your etcd cluster. You can actually host the etcd cluster, which your Kubernetes server stores things in, on Kubernetes itself. The people behind this, obviously, have been CoreOS. I've been following up with the project maintainer and trying to get bugs sorted out. As in the demo this morning, if you want to upgrade your Kubernetes cluster, you can do a live edit like the one shown this morning: change the version and apply it. It's that simple. When upgrading the API server, you get a minute or two, I'm sorry, a second or two where your API server does not respond, but then it's back, your cluster is functional, and it just works. So yeah, that's pretty much what I had to say. Anybody who wants to talk about it?
You can find me outside. Yes, I do have a blog post. Yes, sure, we can talk after as well. Our next speaker will be Juber, giving a talk on digital transformation. Is he here? No? Okay, he transformed himself out of the venue. So then it's Jananjay with a cloud for robots. Okay, guys. This is sort of a continuation of the work we did at ETH back in 2013, which was presented at PyCon. Back then, Docker didn't exist, so we set out to build our own cloud platform using Linux containers and Twisted. As you may imagine, it was quite a mess. We managed to scale to about 15 nodes, and that was sort of the end of it. We went back to our research and our jobs. Then we decided to take another shot at it at the beginning of last year. The vision we have is quite simple. Robots, as you see in the slide, were supposed to be part and parcel of our lives. We all grew up with the Jetsons, and we all grew up expecting robots to surround us. And that's really not happened. I mean, this is where robots are: they're in factories, or they clean your floor, or they cost a few million dollars and kill a few people. That's not what we want from robots. Robots, the way I see it, are assistants to our day-to-day life. But building a robotics company is really, really hard. You need to put together people from so many varied disciplines to get a simple product out, and that's often really hard to do. So we want to take an approach similar to smartphones. Think about it: 10 years ago, you had devices that had a bunch of processing power and were connected to the internet, and it was all monolithic. You had just a couple of companies, like Nokia and BlackBerry, and it was really hard to build a mobile phone. But today, someone sitting in Shenzhen can make a mobile phone because they know how to make great hardware. Someone sitting in a garage can make apps. All the complexities are handled by a single platform, and that has allowed us to democratize mobile phones.
It's created the mobile revolution. We actually think there is scope to do this for robots. So in May this year, in fact, in 10 days from now, we're launching the first service powered by our cloud, which is basically autonomous delivery drones that use the power of the cloud to do complex computation, storage, and processing. As we progress, we see this vision in a way that allows us to orthogonalize, federate, and commoditize robotics. If you're a guy who knows how to write a great application using JavaScript and knows nothing about robotics applications, algorithms, or hardware, you can still contribute. If you're a person who is an expert in building hardware, you can focus on that and provide drones as a service. If you are a person who wants to write crazy algorithms for routing and navigation, well, you could come on board and create routing, navigation, and picking algorithms. The idea is to open all of this up to as many people as possible. And since this is Rootconf, going into the tech stack: we are working on our own fork of OpenShift and Kubernetes, and we've added a bunch of controllers. Each robot is now responsible for its own compute, in a bulkheaded design, and what this allows us to do is scale in and scale out. Cloud computing does not mean providing an API to a bunch of machines. It actually means consuming compute, storage, and network orthogonally, as required. I think this is a key enabler required for robots to succeed and for us to see them everywhere. And yeah, that's all I have to say. So look out on reputa.org. We'll probably open source components of a lot of these things, and we hope to push back to the community. If you find this interesting, hit me up and have a chat. Thank you. I've got a minute, so I am going to play a little video.
So basically, right now we are full stack, and our full stack extends to designing our own hardware, designing our own chips, designing our own devices, and also writing front end code, all of that in one piece. So, well, that's the vision. Imagine the potential of connecting these agile machines and giving them infinite computation and storage. Thank you. Pratik. Anshu Pratik. Aha, here he comes. You need to just sign this quick disclaimer. Also, if Sriram is here, maybe you can sign the form ahead of time. Hey guys. So someone, I think Aditya, already talked about using Graphite functions. This slightly builds on top of that. Essentially, my problem was: we had outages. Everyone has outages. When we went looking through the post-mortems of the outages, yes, there was a graph, there was an alert, there was everything; everything required to tell people that there was going to be an outage was there, but still we would only come to know about it in the post-mortem. Everything was there, but still we missed it. The problem was there was too much noise: this is alerting, that is alerting. How do we get around it? Let's say we want to alert on latency. This is how the typical latency graph looks, right? When it comes to monitoring, do I really care about whether it's one second, two milliseconds, 200 milliseconds, 900 milliseconds? No. What I care about is: what was it in the last minute, five minutes ago, and what is it now? As long as it's the same as it was a few minutes ago, I'm all fine. Be it one millisecond, be it 1,500 milliseconds, I do not care. The other thing is that if you want to put monitoring in place, especially in AWS, you have to put a CloudWatch alert, or whatever monitoring tool you use, on each and every specific resource. That becomes time consuming. So what we did was import all of the data into Prometheus.
Prometheus is a time series database; I think we spoke about it earlier today. Out here, let's say, are the two specific pieces that we started off with. For my two regions, I have the unhealthy host graph. Out here, it's all zero. This is all zero; this spike was a deployment that was happening a few minutes back. So now this essentially filters out the noise. The moment there's anything non-zero, I know that I should get alerted on it and act upon it. This is what I was talking about specifically in terms of latency. We just saw how the typical latency graph looks, and this is my graph specifically used as the alert basis. Here, if you look at it, everything is between zero and one. I have set my threshold to three. If anything goes above three, that's the only time I care: okay, there's an issue and it should alert. Other than that, as long as it's all fine, yeah, it's going up and down; basically, the delta is going up and down. Last second it was 200 milliseconds, right now it's 190, next second it's 320. I don't really care about that as long as it's within the threshold. So this is what we do now. Essentially, this is a delta function. If we take a quick look at it: drop_common_labels removes the label information that I don't care about; then a delta function over all of my ELB latency series, where I have filtered out the other ELBs that I don't care about; and I'm doing it over five minutes. So this is one way of doing it. This is built on top of Prometheus; let's say this is the same graph in Prometheus. Right now the alerts sit on top of Grafana. Grafana 4.0, the latest Grafana, lets you put alerts there; that wasn't available when I started off. So you can put alerts either in Prometheus or in Grafana; now we do it in Grafana. And let's talk about noise, right? This is all the Grafana alerts that I get now from this system.
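The change-not-absolute-value idea above can be sketched in plain Python. This is a simplified stand-in for the PromQL delta-over-five-minutes the speaker describes: the window here is counted in samples rather than wall-clock time, and the threshold of three is the one from the talk:

```python
def deltas(samples, window):
    """Change in value over a trailing window of samples, a simplified
    stand-in for PromQL's delta(metric[5m])."""
    return [samples[i] - samples[i - window] for i in range(window, len(samples))]

def should_alert(samples, window=5, threshold=3.0):
    """Alert only when the value *changed* by more than the threshold,
    regardless of its absolute level: 200ms steady is fine, a jump is not."""
    return any(abs(d) > threshold for d in deltas(samples, window))

# Steady latency, whether high or low, never fires; a sudden spike does.
steady = [200, 201, 199, 200, 202, 200, 201]
spike = [200, 201, 199, 200, 202, 200, 850]
```

In the actual setup this role is played by the delta function evaluated inside Prometheus, with the alert rule attached in Grafana 4.0 as described.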
And if you look at the count, let's say May 3rd, May 4th: every day, the number of alerts is in single digits. Only when there's a specific outage, let's say this was the day, did I have 12 alerts. If you talk to any monitoring guy, that is pretty, pretty low, right? Otherwise you get drowned in alerts. So yeah, this is how it is. We are planning to expand this to more things. Right now, these four basic alerts take care of covering most of my things at a high level. For other things, we still have the rest of the monitoring in place; this is just to ensure that my downtime is minimal. In fact, with the help of this, especially the latency alerts, we are now able to avoid downtimes and outages, because the moment any of them starts spiking, I know immediately something is going wrong, and we can act on it even before the outage starts. Since we put this in about two weeks back, we actually caught one particular outage in progress during a deployment, and we were able to avoid it. So yeah, that's how it is. What we are trending towards is detecting these anomalies and preventing the outages rather than, you know, fixing the outages in the post-mortem. So yeah, that's what it is. Thank you. Thank you very much. Our next talk is by Sriram, and it's entitled restful email. Okay. My colleagues actually just put my name on the board, so I have to speak for it. My name is Sriram. I work for Endurance. I don't have slides as such, but you can go ahead and sign up on our platform, Bluehost. We're launching a new product called restful.email. Basically, we give developers the power to send emails via API calls, track and get their quota, and also determine whether the recipient has successfully opened the mail, along with click-through rates. This is launching pretty soon.
Go ahead and explore the tool; I encourage all of you to check out DevCloud and use it. I think DigitalOcean doesn't have the integration, so that's one of our selling points. So just go ahead and check it out. Anything else you guys want to know? I can take questions, but I think it's just like 50 seconds. That's it. Okay, go check it out. Yeah, thanks. All right, the next talk is nursery rhymes as applied to DevOps, by Shakti. Yeah. So the motivation is to use nursery rhymes that everyone knows to share DevOps experiences and best practices. I've tried to put this together; I hope you enjoy it. Please read along. Great, let's begin. Jack and Jill went up to the server to run the test with Docker. Jack pushed code and broke his test, and Jill never spoke to him thereafter. Tester, tester, have you any bugs? Yes, sir. Yes, sir. Three code dumps. One for my master scrum, one for my lead, one for my manager who's a friend indeed. One little, two little, three little containers. Four little, five little, six little containers. Seven little, eight little, nine little containers. Oh, BSD has had jails forever. Humpty Dumpty sat to debug. Humpty Dumpty squashed a bug. All the team members and stakeholders gave Humpty a big tight hug. Goosey goosey DevOps, sir, where shall I wander? GitHub or Bitbucket, this had on something sooner. Clone a project, fork the code to make it better. See collaboration working closer, and it shall be an eye opener. Twinkle, twinkle, unit test, how I wonder where you exist. I will write unit tests until the project is laid to rest. Code, code, code your way gently down the screen. Commit early, commit often, and life is but a dream. See my little hands go hack, hack, hack, and my little tests run back to back. I just have one word to say to you: come learn DevOps, and I say I'm happy for you. One, two, pick your crew. Three, four, shut the door. Five, six, write your scripts. Seven, eight, test them straight.
Nine, ten, make them your zen. Eleven, twelve, time to sell. Thirteen, fourteen, customers are keen. Fifteen, sixteen, customers are seen. Seventeen, eighteen, repeat your routine. Nineteen, twenty, get paid aplenty. Project issues in the way, in the way, in the way. Project issues in the way, my fair user. Fixing bugs right away, right away, right away. Fixing bugs right away, my fair user. Merging PRs as I say, as I say, as I say. Merging PRs as I say, my fair user. All the tests are passing, hey, passing, hey, passing, hey. All the tests are passing, hey, my fair user. As a client you should pay, you should pay, you should pay. As a client you should pay, my fair user. DevOps really saved the day, saved the day, saved the day. DevOps really saved the day, my fair user. Last but not the least: when I say clap your hands, give me two claps. Okay. If you're happy and you know it, clap your hands. If you're happy and you know it, clap your hands. If you're happy and you know it, then your face will surely show it. If you're happy and you know it, clap your hands. If you apply software patches, clap your hands. If you apply security updates, clap your hands. If you apply software patches and you apply security updates and your application still survives, clap your hands. If your package install worked, clap your hands. If your gem install worked, clap your hands. If your package install worked and your gem install worked and your npm install also worked, clap your hands. If your CI tool is running, clap your hands. If your code is compiling, clap your hands. If your CI tool is running and your builds are always passing, then you're happy and smiling, clap your hands. If you have a plan A, clap your hands. If you have a plan B, clap your hands. If you have a plan A and if you have a plan B and you always use plan C, clap your hands. If you notice the containers crashed, clap your hands. If you build the containers from cache, clap your hands.
If you fix them in a flash and you redeploy them in a bash and your manager didn't give a trash, clap your hands. Thank you. Okay, the next talk we have is SSH key management with Python and Jenkins, by Mehul. Hello. Yeah, so I wanted to show a small little script that I've written. One of the pain points I usually face: even though we are a small team, managing SSH keys was becoming a problem as our inventory and the number of servers started increasing. At times people would change their SSH key and we didn't really have it available all the time. Sharing of SSH keys was becoming a problem. I tried to look for quite a few scripts around which would allow me to update the SSH keys and manage them, and I didn't find anything useful. So we had two requirements. One, the user should be able to upload the SSH key by themselves. And second, we work with the Rackspace cloud and Google Cloud, and Google Cloud in particular has provisions in its API to update the SSH key, which makes some things very simple. So what I did was sit down one evening and write a simple Python script using a couple of packages that are already available in Python. One is called sshpubkeys. You just pass the SSH key to this library, and it parses the SSH key and separates it out in a way that you can use. So what I was doing is take the SSH key from the user and first take the user name part, and since I tied it with Jenkins, I was able to validate whether the person is uploading an SSH key for their own name. What we had was that the user name on the server, the user name in Jenkins, and the email address for the user would always be the same, so whatever SSH key you uploaded would only be yours. That validation was done whenever an SSH key was uploaded. And I used a library called ssh-authorizer.
Basically, the library takes the list of hosts that you have and pushes the key you pass to it to the given hosts. Then at first I thought: I have the script, but how do I give it out to the users? First I thought about building a small web API for it, but then I realized we already have Jenkins, and I didn't want to build an authorization layer; that would be a lot of work. Jenkins can handle this: all the developers in our company have access to Jenkins. So I just wrote a Jenkins task where a user can pass in the SSH key that they have. It will be parsed by the script, and if everything validates correctly, it will be pushed on to the servers. The next step that I'm looking to do is to pass it on to Google Cloud using their APIs. So basically, the good part of it is: while the machine is running, you can replace the SSH key even if you don't have access to the machine; you just need access to the APIs. So yeah, that would be it. Sorry, it would make a little more sense if I had shown you what I'm doing.
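The validation described in this talk might look something like the sketch below. The real script uses the sshpubkeys package; this stand-in does only a minimal stdlib version of what that library's parsing provides (base64-decoding the key blob and splitting out the comment), and the user names are hypothetical:

```python
import base64
import binascii

def parse_pubkey(line):
    """Split an OpenSSH public key line into (type, blob, comment).
    A minimal stdlib stand-in for the sshpubkeys library's parsing."""
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError("not an OpenSSH public key line")
    ktype, b64 = parts[0], parts[1]
    comment = parts[2] if len(parts) == 3 else ""
    try:
        blob = base64.b64decode(b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("key blob is not valid base64") from exc
    return ktype, blob, comment

def validate_upload(line, jenkins_user):
    """Enforce the rule from the talk: the key comment (user@host or an
    email address) must match the Jenkins user submitting the job."""
    ktype, _, comment = parse_pubkey(line)
    owner = comment.split("@", 1)[0]
    return ktype.startswith(("ssh-", "ecdsa-")) and owner == jenkins_user
```

Note that a real key blob has internal structure (the algorithm name is repeated inside it), which sshpubkeys actually checks; the sketch above only verifies that the blob is valid base64 and that the comment's user part matches the Jenkins user name, which is the ownership rule the speaker relies on.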