Yeah, it was nice to learn about SEO. So that's my VPN, actually; I was working. Let me close the VPN stuff. OK. My name is Senthil Kumaran. I work as a software engineer at Twitter, and thanks, all, for coming to hear something about what's happening at Twitter.
I joined one and a half years ago. I used to work in Singapore at different companies, and I got an opportunity to work as a Python developer at Twitter, and I took it up. It's interesting. And I have to say that it's one of the best experiences I've had, because I get to work with extremely sharp developers from all over the world. Being part of a set of developers who are really skilled and really passionate about what they do, first, motivates me to stay in the development line for a long time and be a good developer for a long time, and it also helps me appreciate the craft, which is software development. It's not just a business about software. It's a craft. It's how you develop the software, how you make the software stable, and what complexities are involved in developing software which is usable by millions of people all over the world, all through the day.
I can definitely appreciate that. Twitter has something like 250 million users, every day, every second. Facebook has a billion users, actually, per day, which is four times bigger than Twitter. And then Google is bigger by a large margin. All these companies teach us how you develop software which is used, proven, and stable in the hands of all those different people. Developing at that scale has challenges, and the team I am in is solving those challenges: how do you develop software at that scale which is fast, stable, and accurate? So that's what I do. Within our team, Python is the major language. Outside of my team, Scala and Java are the languages which are mostly used at Twitter.
And there is C++ for the core infrastructure, but very little. Those are the languages, and there is a huge development group, around 1,000 engineers, working with them. So with that intro, let me get into it.
As I said, there's a variety of languages: Scala, Java, Python, and C++, mostly. Back-end services are in Scala. Revenue and machine learning stuff is in Java. Just a year ago I saw actual production-ready machine learning, the kind which gives you the promoted tweet or the promoted content. There is a team called the revenue team. That's what they call it. The revenue team's whole job is to write the programs which make the money for the rest of us to work, actually. The machine learning programs go about doing that job. It looks like machine learning is the hot field in the industry right now. I'm not into it myself, but I have seen friends who are good Java developers, good Scala developers, very good at algorithms, and usually from top universities, and they are on this revenue team which writes the money-generating stuff. And what it generates is huge, millions of dollars per second or something like that.
And then Thrift. Thrift came out of Facebook. Thrift is for language-agnostic sharing of code. You write a Thrift specification and you give it to someone else. He can generate Scala code from it, and I can generate my Java code. So, using the Thrift template, he can write his Scala server and I can have my Java client, and we can both interoperate. By doing this we share the Thrift file, and Thrift becomes a language-agnostic way of sharing. A huge lot of code is written in Thrift, surprisingly much. And by writing one Thrift file, you can generate Java, Scala, Python, Ruby, all the other languages.
I think it started with Facebook. What Facebook and Twitter saw was that each person is an expert in a different language. No one is an expert in all the languages. One person may be good in Java, another may be good in Scala, another may be good in Python. But to have a big company like Facebook or Twitter, you need to have these good, focused people on the same team. And you cannot ask everyone to learn the same language again, because they would lose their expertise. So the solution seems to be: let's write Thrift, let's generate code in the language you are comfortable with, and then let's write the clients and services over Thrift. Let's write the client which talks to the Thrift server in the language of your choice, and go on from there. It's huge. I have very little knowledge of it, but we deal with it constantly, every day.
Okay, infrastructure and tools are written in Python. Before 2010 or 2011, Twitter was mostly Ruby. It started off as a Ruby on Rails app. But later, when the World Cup and all happened and huge numbers of people started using it every day, in order to have the scalability and the power of the JVM, Twitter started using the JVM.
So, when I talk about all the languages, let me start something in the background. This is my Twitter repository. I'll start a line count on the major repositories we have, let it run in the background, and we can come back to it later. It's to give an indication of all the languages we use. And it's not the full picture, because I just chose three of the major repositories which we work on. I wanted to give an indication of how huge the software development is. Once we get the results, we can come back to it. It's still scanning, so let's give it some time.
So, how the development workflow happens: Git is the version control system, we all work on a branch, and then the branch is submitted for review. For review there's a tool named Review Board, okay, reviewboard.org, which is an open source tool that anyone can use. The patch we work on is submitted for review in Review Board, and there are experts for each part of the repositories. They are usually listed as the owners in the repository. The person who's listed as an owner needs to approve and give a "ship it", actually. And even when the review is given a "ship it", when you submit, the change doesn't merge into master yet. It goes into a submit queue, which is like a Jenkins instance. So the submission goes through Jenkins, and the whole test suite is run against the entire world of Twitter code to check the change you are making. Even a single print statement change will run the entire 1,000 tests or so for your repo. After that has run and passed, the change gets merged to master by Jenkins itself.
Sometimes, if you are making a change in a major core library, it takes a couple of hours after you complete the code, get it reviewed, and submit it for it to go into master. But once it's in master, there's a kind of assurance that we have written well-reviewed, tested code.
And the reason these kinds of things have happened: when I started in the software industry, working at Dell, we didn't have these kinds of processes. They had manual test teams, okay. The test team's job was to download the binary and run the tests against it. At the new web 2.0 companies, there are no manual test teams at all. That industry is gone actually, unfortunately.
But that's because it has been replaced by tools which have such a streamlined process of review and automated testing, and which give you a good graphical result of whether the change is correct or not. So if the submit queue says okay, then the change is in master.
And then deployment depends on the team. For some, the change is immediately deployed to the hosts, wherever they are. And it's controlled by a decider, okay. That's what we call it. So there's a web framework named Decider which has all the features.
The twitter.com you saw on the website has something like 120 services, okay. By that I mean this: this page is a service. The following and followers counts are different endpoints of a service. There's "who to follow"; this is a machine learning algorithm working out who you should follow. And the trends are basically provided by the search and analytics team, whose service goes and searches through everything and then gives the result. And this is a tweet service. And each one is running on hundreds of machines, okay. So the Twitter page consists of about 100 services, and each of the 100 services has about 10 engineers. So 10 times 100: that's the thousand engineers.
And I was surprised to learn about Google Docs, right? I know this because a team member I work with used to work on the Google Docs team. You see that in Google Docs there is the star button to make something a favorite. You remember that? The star button, okay. That seems to be a service at Google, and there are 30 people working on it. 30. And all smart engineers, by the way. I knew nothing about that. Just a star. Yeah. It was surprising to learn that that seems to be the state of the world right now.
Yeah, depending on the team you are on and the kind of service, your code is immediately deployed. Application engineers are encouraged to turn experimental features on for a subset of users. That is followed by all the web companies, I think, because you can randomly select 1% of the users and turn on a particular feature for them to get user feedback. That's it.
And in order to support the architecture of Twitter, which is a real-time web used by millions of people per second, with different clients and the TV channels and everything, what we needed to support the development was a scalable version control system. That is Git, but with a lot of engineering energy gone into it. The review process was something I contributed to in a major way; we had a tool named Git review, which is not open source yet. But that's one thing. The scalability of Git is itself a huge thing. Facebook faced the problem, and they went about trying to use Mercurial. I used Mercurial before using Git, and I think Git is better than Mercurial. Some of the engineers think so too. So Twitter is investing in Git again, thinking: let's see if we can enhance Git using standard scalability techniques. Instead of putting effort into the Git software or the object model itself, let's shard the servers where the Git servers are installed, so that our latency is small and Git can cope better. Which works fine. I don't have any complaints, but there are people who have a lot of complaints about it. Facebook seems to have complaints, and they are moving to Mercurial. I have no idea about that.
Next, a build system to support multiple languages. This is interesting, because I had no idea what that was before I joined Twitter. If you are a Java developer, you may be using Ant or Gradle or Maven. Gradle is used by Android Studio for Android development.
Ant is mainly in the Eclipse world, and Maven is another one. I have used none of them, because I had never used Java before. I didn't know what a build was. Python doesn't use that. Python doesn't build; Python is interpreted. But in order to support multiple languages, it's easier and better to build the software: create a binary executable, attach the dependencies, and create a jar which contains all the dependencies together, so that you can deploy the jar as a single binary at deploy time. And the idea came about like this: they wanted the whole of twitter.com to be a single jar which can be thrown into a cloud and run. So that is why, I think, historically they went forward with it.
And, yeah, I ran this count, a lines-of-code count, just for the three major repositories. There is Java, Scala, Ruby, Python, Bourne shell, SQL, and a little bit of CSS. In order to have all these combined together into a single build, there is nothing ready-made. Ant cannot do it; Ant simply will not support all these languages. Okay, Maven cannot do it either.
So Google has something called Blaze, okay, their build system. I'm not sure if there's a page about the Google Blaze build system. So here: "Build in the Cloud", how the build system works; that's the thing. Facebook has the Buck build system. Oops. So that's a Java build tool, okay. And Twitter has Pants, okay. So this is the build system we use at Twitter. And this is what I work on, the team I work on. It's about compiling and building everything which is written in the variety of languages I showed you. So, given this much introduction, I can dive a bit into the build system and show you how things work, okay. If I speed up, please don't mind. We can come back to things later.
And by the way, about all these build systems: Facebook's is open source too, Buck, but it's for the JVM, Java and Android, only. Twitter's is open source too; it's on GitHub, in Twitter Commons. It's for Java, Scala and Python, and it's written in Python. I don't know, Facebook's is written in Jython perhaps; I have never looked at it. But all of them are inspired by Google's build system, called Blaze. And all of them have been developed by engineers who worked at Google and who found that build system to be very good for building good, stable software. They got inspired by that, and then at whichever company they went to, like Facebook or Twitter, they went about implementing something similar, okay. So the parent of them is Blaze, okay, and people still appreciate it for many things. It's not open source. These children, which got forked from that idea, are open source. So you can clone them and look at what they are.
The review process, as I said, is publish and submit: the change is published to Review Board and then submitted. Okay.
And Twitter Commons itself includes a build system. It's something similar to Ant, Maven and Gradle, and that's what we are going to talk about. But let's look at an example, okay. Let's dive directly into an example. I have the code checked out so that we don't have to do that now. I'll increase the font. So I'll go to the Python Twitter Pants directory and open it from there. So this is how Pants looks. I'll explain the directory structure from here, and the binary which we have, okay. I'm thinking about how I should give the introduction, because if I dive in directly, it will not be obvious. I can give the directory names, but let's look at an example of something which we are going to compile. Okay.
That's another quick question: at Twitter, does the Python code run on Jython or...? CPython. CPython.
So let's look at src/java/com/twitter/common/examples. pingpong is the name, and let's look at its BUILD file, okay. This pingpong BUILD file is creating a JVM binary named pingpong, which has a basename and a main class, which is the main it points to. And it has a dependency, a pingpong lib. The pingpong lib in turn lists its dependencies: it uses Guava, which is Google's Java library, actually, and Guice, which is again a Google Java library. And Sun has provided a Jersey client. Since I don't know much Java, I don't know what they are for. I'm giving an example of a Java compilation, and then I'll give an example of a Python compilation. Okay. And then it depends on other modules, like the application and HTTP modules. All these are separated into modules.
If I go into one of them, say application, that's the common application directory under src. If I open it, again there is a BUILD file, with a target named application, and it's declared as a Java library. It says that it provides a jar named application, which is published in the com.twitter namespace. It has the repo specification for wherever it wants to publish to. And it has its own dependencies, okay. So, as we saw, there's application, HTTP, and everything else. The build will pull in all these dependencies and bundle them together in the form of a Java library, and this Java library can be used to create a JVM binary. Now it makes sense, right? We wanted to create a JVM binary, a binary which can run on the JVM. The JVM binary depends on a Java library, and the Java library in turn depends on multiple Java libraries.
And in order to build it, it's pants goal, and let's say compile, okay: pants goal compile src/java/com/twitter/common/examples/pingpong.
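The dependency chain just described, a binary that depends on a library which in turn depends on further libraries, is essentially a graph walk. What follows is only a minimal sketch of that idea in Python, not how Pants itself resolves targets; the target names in `DEPS` and the function `build_order` are made up for illustration and are not copied from the real BUILD files.

```python
# Hypothetical target graph, loosely modelled on the pingpong example.
# Names are illustrative assumptions, not the real Twitter Commons targets.
DEPS = {
    "pingpong": ["pingpong-lib"],
    "pingpong-lib": ["guava", "application", "http"],
    "application": ["guava", "base"],
    "http": ["base"],
    "guava": [],
    "base": [],
}


def build_order(target, deps=DEPS):
    """Return targets so that every dependency comes before its dependents."""
    order = []          # final build order
    done = set()        # targets already placed in the order
    in_progress = set() # targets on the current path, used to detect cycles

    def visit(name):
        if name in done:
            return
        if name in in_progress:
            raise ValueError("dependency cycle at %s" % name)
        in_progress.add(name)
        for dep in deps[name]:
            visit(dep)
        in_progress.discard(name)
        done.add(name)
        order.append(name)

    visit(target)
    return order


if __name__ == "__main__":
    print(build_order("pingpong"))
```

Asking for the order of the hypothetical `pingpong` target walks every library it needs exactly once, which is the property that lets a build tool compile shared modules a single time.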
That compile command will go about compiling the Java library and its pieces, okay. Then let's do pants goal bundle, okay. It creates a bundle from the Java library and the different pieces, creating this pingpong.jar and so on. And you can also have pants goal run. I'll get the full example from here. Let's run this, okay: pants goal run. This runs whatever sources we pointed it at, and it's running here. And we can see the various endpoints which it has exposed. This is just an example which everyone gets used to. So there are things like contention and the logging aspects of the library. The developers don't write those, okay. These endpoints are not written by the developers. If you look at the code, none of these endpoints are written there. They are inherited from the dependencies. And there are graphs for different things, like class loading, showing how your JVM is performing. All this kind of stuff is available in the form of libraries from Twitter Commons.
The important and interesting part was a JVM binary created from a Java library. Now let's look at src/scala/com/twitter/common/examples, okay. Here it's a JVM binary which is created from a Scala library. Because Java and Scala are bytecode compatible, you can create a JVM binary from a Scala or a Java library. You can interoperate. And it uses the same build system again. So we have abstracted a couple of layers now: same build system, same commands, same interfaces, same binary, but internally you can substitute one library with another. You can use Java or Scala, okay. That's it.
And lastly, it's not just for Java and Scala, it's for Python too: src/python/twitter/pants. Let's look at pants. Okay, pants is the tool which we use, and it is itself written in Python; the source code is in Python.
What I'm trying to show is the bootstrap. See, I used the pants command initially, right? So now I'm going to look into the pants code itself. I open the BUILD file, and I see that it's a Python binary. Just like I showed you, there was a JVM binary, and now it's a Python binary, named pants. It has an entry point, which is where it starts. And it has dependencies, what it depends on. If you look, here it depends on the pants lib, okay? And it says which platforms it can build for. It can build for the current platform, and it can build for Mac OS X or Linux. And the pants lib, if you see, is a Python library. Okay, and that Python library depends on many other Python libraries. Like app, which is our application framework, or confluence, which is the tool for the wiki-like thing, okay? It's for publishing. And then it depends on contextutil and some other libraries, like process utilities and so on. So the encouragement is to write libraries in a modular way, so that your library can be used by some other product.
Once you have the dependencies in place, you can build pants in a similar way: pants, then src/python/twitter/pants. This will go about building the pants bundle, okay? Just like with the Java library, we have a pants bundle. And it wrote it inside the dist directory. See, sorry. Now, previously we created pingpong.jar, right? And pingpong.jar has all the class files as its contents. Similarly, pants.pex has the contents of the PYC files which are needed for running it. Okay, this single executable can now go and compile your Java, Scala and everything, create binaries, run tasks, and do multiple things.
That's pretty much it. I wanted to give an example of how it feels to have a BUILD file and compile a Java or Scala binary and so on. And it's interesting that other companies are adopting it.
Foursquare is a major company which is using the Pants build system, and StumbleUpon and others too. When Twitter engineers go and form other companies, since they got used to it, they go about using it. And since heavy effort has been put into the common libraries, and they're open source too, it's easy to make use of them, okay? I sometimes don't like some portions of it, because there's too much abstraction, but it's still good to have a look and then develop something better.
A lot of things have been featured in the media. If you go here, there are plenty of projects available. There's Storm, there's Scalding. Scalding is for running Hadoop jobs using Scala. Finagle is our RPC system, which is one of the core pieces of Twitter. And then there's the JavaScript stuff. I'm not a web engineer, but there are a lot of web engineers who do plenty of good work at Twitter. I have no idea about that. Zipkin is for tracing, and there is twemcache, and twemproxy, which is used by Wikipedia as well. And this one, by the way, is a Scala library; I have no idea what it does. But there's plenty, and we've only talked about Commons so far. There is so much other stuff.
And Twitter got featured in a number of media articles recently, like the tweets-per-second record. This one was interesting: the new tweets-per-second record. In August 2013, very recently, it hit a new tweets-per-second record, about 150,000 tweets per second. It was huge. I mean, you remember, just a few years back there was the C10K problem, a web server handling 10,000 connections and so on. Now look at this: it's 150,000 tweets per second. It's huge. And this blog post explains how it went about doing that, and what happens when so many things happen at once. And Twitter didn't break under that; that's why it was news.
And the event which was taking place was, I think, in Japan, yeah: it was the airing of Castle in the Sky. I have no idea what it is. Okay. So when Japan was watching Castle in the Sky, there were so many people tweeting about it that it hit the high mark of about 150,000 tweets per second. It's a mark of how much. When Obama says "we won the election" or something like that, that's huge, okay. Or, what was it again, the Thank You Sachin campaign which happened. That was huge, I guess. And there's a lot going on.
There are other media articles too. Like this one, "The Second Coming of Java: A Relic Returns to Rule Web". This explains how Java is used, okay. By Java it means the JVM: the JVM, and then other languages on top of the JVM. And then there is "Return of the Borg: How Twitter Rebuilt Google's Secret Weapon". This is about our cloud infrastructure, named Mesos, okay. It's open source too. It's an Apache project, mesos.apache.org. It's about making it easy to build resource-efficient systems. Airbnb seems to be another consumer of it. It is based on a UC Berkeley project. There's a professor named Stoica, a researcher very famous in the distributed systems world, who leads it.
The idea is this: your company doesn't want to spend money on Amazon Web Services, because they are costly, but you want the same scalability which Amazon Web Services provides, and the same kind of abstraction which AWS provides. Instead of investing in AWS, you run your own AWS. Well, that's what it is about. Download Mesos, install it on all your hosted machines, and then provide it to your engineers like AWS instances. It's a build-your-own-cloud kind of thing. So Twitter doesn't use AWS. We use Mesos, because it's developed in-house and installed on all the machines. Our jobs run on Mesos.
And it can be monitored, and it's standard distributed systems stuff, okay. So that got a good feature article. And that, as this article shows, is again inspired by Google, okay. Google has done really good work here, not just for Silicon Valley but for all over the world. So there are a lot of technologies which are built very similarly, and sometimes, when something is built the second time, it's built better. This is another example of that.
That's what I wanted to share. And if you have any questions, I can take them. I hope it was interesting. A lot of information was presented. And if you have any specific questions, please feel free to reach out to me. Yeah. Thank you.
Questions? Are there any questions? Yes, yes, yes. (Audience question, partly inaudible, on whether Pants does the compilation itself.) No, the underlying compilers are still GCC, javac, the Scala compiler, SBT and so on, okay. But in order to bundle all the projects together, it's like a Makefile. Make is the traditional build system which we know, make and the Makefile. Make by itself doesn't compile; inside the Makefile, it invokes GCC, right? In a similar way, these tools structure the projects in such a way that you can have the correct compilers, with the correct flags, getting invoked to build the binary. They are not compilers by themselves.
Pardon? Python 2. But it's compatible with Python 3. Pants is written in Python 2.7, and it's upward compatible with Python 3. We have no particular reason why we are not using Python 3. I mean, I guess you would treat a major version change like a different language. Yes. No, not for the Python in here. There are compatibility-layer libraries which have been written; I'll show you an example. Let's look at this code. It's compatible with Python 3.
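The code shown on screen at this point is not captured in the transcript. As a stand-in, here is a minimal sketch of what such a Python 2/3 compatibility layer commonly looks like, assuming a version check that selects the right imports and type names. The names `PY3`, `string_types` and `is_string` are hypothetical and are not taken from the Twitter Commons source.

```python
import sys

# True on Python 3 and later; everything below branches on this flag.
PY3 = sys.version_info[0] >= 3

if PY3:
    # On Python 3, text is `str` and StringIO lives in the io module.
    string_types = (str,)
    from io import StringIO
else:
    # On Python 2, `basestring` covers both str and unicode, and StringIO
    # is a separate module. This branch is never executed on Python 3.
    string_types = (basestring,)  # noqa: F821
    from StringIO import StringIO


def is_string(value):
    """Return True if value is a text string on either major version."""
    return isinstance(value, string_types)


if __name__ == "__main__":
    buf = StringIO()
    buf.write(u"same code on both versions")
    print(is_string(buf.getvalue()))
```

Callers import `StringIO` and `is_string` from the compatibility module and never check the interpreter version themselves, which is what allows a codebase written for 2.7 to be switched over later.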
In the compatibility layer, if it's Python 3, the import statements are written one way, and if it's not Python 3, they're written another way. It's just some example which I recollected, but the code is written to be ready for Python 3. That's because the developers who work on this care about abstractions, generality, and everything. If I were to write it, I'm not sure I would have done it like that, because I think I sometimes prefer a simpler, straightforward approach. But some people prefer things more abstract, general, and overly scalable. So it works with both Python 2 and Python 3, and at any time it can be switched to Python 3.
And Blaze, Google's build system, seems to be written in Python too. Now, the BUILD file which I showed, the one in src/java/com/twitter/common/examples/pingpong. See, I'm opening a file named BUILD. But look at the syntax of this. Imagine that this is a function call, and the function call has a named parameter called name. And this is the second parameter, named dependencies, which is a list, and each element of the list is a call to a function named pants. This whole BUILD file is a compiled Python file. How it works is, Pants traverses all the BUILD files and PYC-compiles them. So if I run a find for PYC files, you will find that there are BUILD.pyc files too, yeah, here. So the BUILD files, which look like text files for writing down your dependencies, are Python files themselves. They can be compiled and made into a single compiled object, which can be integrated.
The choice of Python as the language is historical.
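The point that a BUILD file is just Python function calls can be illustrated with a small sketch: define the functions, then compile and execute the file's text with those functions in scope. This is an assumption about the general technique, not the actual Pants loader. The function names `jvm_binary`, `java_library` and `pants` echo the talk, while `load_build`, `TARGETS`, the target specs, and the main class are hypothetical.

```python
# Registry filled in as the BUILD text executes (illustrative only).
TARGETS = {}


def pants(spec):
    """A reference to another target, identified by a path-like spec."""
    return ("ref", spec)


def _target_type(kind):
    """Create a function such as java_library(...) that records a target."""
    def declare(name, dependencies=(), **options):
        TARGETS[name] = {
            "kind": kind,
            "dependencies": list(dependencies),
            "options": options,
        }
    return declare


# The only names a BUILD file can see besides Python builtins.
BUILD_SYMBOLS = {
    "pants": pants,
    "jvm_binary": _target_type("jvm_binary"),
    "java_library": _target_type("java_library"),
}

# Hypothetical BUILD contents; the specs and main class are made up.
BUILD_TEXT = '''
jvm_binary(
    name="pingpong",
    main="com.example.pingpong.Main",
    dependencies=[pants(":pingpong-lib")],
)
java_library(
    name="pingpong-lib",
    dependencies=[pants("3rdparty:guava")],
)
'''


def load_build(text, symbols=BUILD_SYMBOLS):
    """Compile BUILD text as Python and run it against the given symbols."""
    code = compile(text, "BUILD", "exec")
    exec(code, dict(symbols))
    return TARGETS


if __name__ == "__main__":
    for name, target in load_build(BUILD_TEXT).items():
        print(name, target["kind"], target["dependencies"])
```

Because the file is ordinary Python, the same mechanism gives loops, variables, and helper functions in build definitions without inventing a separate configuration language.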
Python was chosen before I joined, probably because Google's Blaze was using Python, and possibly because you can write your build configuration file in the language itself to support multiple languages. And since it's a language, you can do a lot more things in it than just writing a configuration file. And there's bootstrapping: Java has the cost of bootstrapping. Okay, Java processes are very fast, but there is a startup cost. When you write a tool and then run it, it needs time to start, whereas Python doesn't have that. So these are some of the technical reasons, perhaps, I think.
Is the Scala build tool here compatible with SBT? It uses SBT, but it uses Zinc as the comparable piece. There's a good article. Is it Typesafe which is creating the Scala compiler? It's called Typesafe. Yeah, Typesafe. So let's search: Typesafe, Scala, Pants... okay, sorry. "Zinc and incremental compilation", okay. So this Typesafe company, I don't know exactly how they are associated with Scala, but they are mainly creating Scala tools. They mention using Zinc to increase the speed of Scala compilation, and Pants enables Zinc by default. And I think here, yeah: Zinc has already been integrated with the Scala Maven plugin, and it's currently being integrated with a new build tool developed by Twitter called Pants. It increases the speed by a couple of points.
The major speedup is gained more by caching. Once your build is created, here inside the .pants.d directory, sorry, here, there's a whole lot of things which are cached. And the caching seems to work in effect. The second time, there's no compilation; the result is just copied. And if there are multiple developers, and you and I use the same module, as with a shared library, and my Pants build compiles it for me, then when it comes to compiling for you, it will not compile, because it can simply download the result from the network for you.
So, a lot of tricks in caching and so on are used to speed up the process. Yes, not just local, but remote as well. The open source Pants definitely has it. And that's how Foursquare, which is a 100% Scala shop, seems to have increased their build speed when using Scala. Cool, thank you, thanks.
Yeah, sorry. Okay, it's a SHA. The SHA is calculated from the Git SHA at the moment, as well as the path of the... If you look at the names, I think you can figure it out. There will be various different SHAs which are computed and matched for different things. Basically, it's the name, the path, and the Git version number. All of them are taken together to create a unique artifact cache key. And I think that's it.
Okay, cool, thanks a lot. Thank you, thank you for listening. Thank you. Thank you.
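The caching described in the answers above, a key built from the target name, its path, and the Git version, checked first locally and then on the network, can be sketched as follows. This is a minimal illustration under stated assumptions: the talk does not say how Pants combines the fields or which hash it uses, so the SHA-1 over NUL-separated fields, the `cache_key` function, and the `ArtifactCache` class are all hypothetical.

```python
import hashlib


def cache_key(target_name, path, git_sha):
    """Hash name, path and Git version into one key (assumed scheme)."""
    digest = hashlib.sha1()
    for part in (target_name, path, git_sha):
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return digest.hexdigest()


class ArtifactCache:
    """Local cache backed by a shared remote store (a dict stands in here)."""

    def __init__(self, remote=None):
        self.local = {}
        self.remote = remote if remote is not None else {}

    def get_or_build(self, key, build):
        """Return (artifact, source), building only on a full cache miss."""
        if key in self.local:
            return self.local[key], "local"
        if key in self.remote:
            # Someone else already built it: copy instead of compiling.
            self.local[key] = self.remote[key]
            return self.local[key], "remote"
        artifact = build()
        self.local[key] = artifact
        self.remote[key] = artifact
        return artifact, "built"


if __name__ == "__main__":
    shared = {}
    mine, yours = ArtifactCache(shared), ArtifactCache(shared)
    key = cache_key("pingpong-lib", "src/java/example/pingpong", "abc123")
    print(mine.get_or_build(key, lambda: "pingpong-lib.jar")[1])
    print(yours.get_or_build(key, lambda: "pingpong-lib.jar")[1])
```

In this sketch the first developer's build populates the shared store, and the second developer's identical request is served from it without compiling, which mirrors the "download it from the network" behaviour mentioned in the talk.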