Good evening, everyone. Thanks for spending your day with us. I'm just going to quickly start us off. It all kind of started with a question: why would we ever build a distributed computing platform in Node? So let's start with a little bit about who I am. My name is Gord Tanner. I'm on Twitter at @gordtanner. I am the co-founder and CTO at bitHound. bitHound is a small startup; we're pretty new, about a year old. The founding team was Dan Silivestru, PJ Lowe, and myself. We're from Kitchener, Ontario, up in Canada, just a little bit west of Toronto. Our office is actually in an old restored house in the downtown core, which is pretty nice for Canada: during our three months of summer we can sit on the porch and work.

But a little bit of backstory about what bitHound does. The whole premise of what we're doing is that, as developers, we generate a lot of data just by doing our jobs. As we work, we generate code. There are diffs we create. There's code we write, code we delete. There are tests we write, tests we run. There are issues we open and close, and bugs we open and close. There's just an inherent wealth of data generated as we work on software projects. Our goal at bitHound is to take all this data and use it to determine an overall quality or health score, to give a baseline metric for how projects are doing.

But with that comes an inherent problem: there's a lot of this data to process, and it takes time. Imagine running JSHint on a thousand commits in your project, or doing other static code analysis, or digging through history, or even just counting the number of lines in a file. Everything we want to do is inherently a problem of time, and there's a lot we want to look at. So from the very start of bitHound, as we were throwing around ideas of what we actually wanted to make, we knew this was going to be a concurrent problem. This wasn't something we'd be able to do without really using the concept of concurrency. A quick definition, actually from Wikipedia: concurrency is simply having computations that execute simultaneously, where those computations (the functions, the code that's running) have the potential to interact with each other.

So with that being said, the question I first get asked is: why JavaScript? Of all the languages out there, why would we pick JavaScript? There's Go, there's Ruby, there's C#, there's Python, there's Haskell. There are a lot of languages out there with much richer concurrency primitives built in. As with any real tech decision, there's a little bit of backstory to why we made this choice. To start, there was a company called Tiny Hippos, whose employees included Dan Silivestru, PJ Lowe, and Gord Tanner, amongst a few others. We were a small consulting shop doing basically hybrid apps, using things like PhoneGap, JIL, and WAC, and building these apps for clients. But there was a problem we noticed, and we ended up writing tools internally that became our product at Tiny Hippos, called Ripple. All it really was was a mobile phone emulator that ran in Chrome as an extension. Pretty cool, we had a lot of fun, sold it to BlackBerry, but the key point is that the entire code base was JavaScript. If any of you have written a Chrome extension, you know your backing code is JavaScript.
So we just had a lot of early experience dealing with JavaScript in a larger product. With that being said, we had a lot of core competency on our team with JavaScript, and we'd been using Git for a while. So in the spirit of, pardon the pun, eating your own dog food, we decided that the first language bitHound would target would be JavaScript, analyzing Git repositories. And what better place to start than analyzing your own code base? So on day one, without even thinking about whether it was the right choice, we chose JavaScript.

To go back to that concept of concurrency, I've highlighted a couple of other words here. The idea is that code can run on multiple cores, on time-shared threads on the same core, or on physically separated processors or physically separated machines. And there are a couple of ways of handling concurrency in computer science. The more traditional way people think of concurrency is threading, the concept of shared memory. One way to share information between computations is to alter the contents of a known location in memory: you have a shared memory location, you alter its state, and usually, with some form of locking, you signal other threads that they can come in and read or mutate that state. That's where thread safety and things like mutexes, semaphores, and monitors come in. It's a pretty robust and expressive form of concurrency, of having multiple processes talk to each other and work together. But as anyone who has ever spent a weekend trying to figure out what caused their service to deadlock knows, it is inherently complicated; it is inherently hard to reason about all the potential timing interactions between threads, between memory accesses, between processes.

Another form of concurrency is message passing, which was kind of brought to us by Scala, Erlang, and actually JavaScript. You asynchronously or synchronously pass messages around, and the act of passing the message is what sends data between processes or threads.

So if I go to the actual description of, say, Node.js as our platform: the Node.js site describes Node as being for building scalable network applications that are data-intensive and run across distributed devices, which are all big key points for us, knowing we need to build a distributed computing platform. Node's big claim to fame is its concept of asynchronous IO. We all know there's an inherent cost in IO, and if you were to time-slice what your processes are actually doing, IO usually ends up eating a lot of that time. Just reading from RAM, from the CPU cache, going to the disk, to the network: there's an inherent cost to dealing with IO. The concept of asynchronous IO in Node is the idea of having your event loop send messages off to a backing thread pool, and having the results come back onto your event queue and be fed back into your event loop. Node didn't really invent this; it uses what JavaScript was designed for. JavaScript was written first for the browser, and when they were proposing it, every browser writer probably said, "you want user code to get in my rendering pipeline?" Hence why we have the evented DOM, why we have the concept of one thread running but handling messages from mouse move or hover or window ready.
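Just to make that concrete, here's a minimal sketch of the difference between blocking the one thread and handing IO off to the event loop (this isn't bitHound code, and the file name is made up):

```js
var fs = require('fs');

// Blocking read: the single thread sits idle until the disk answers,
// and nothing else on the event loop can run in the meantime.
var data = fs.readFileSync('commits.log', 'utf8');
console.log('blocked to read %d bytes', data.length);

// Asynchronous read: the request is handed off to the backing thread
// pool, the event loop keeps servicing other callbacks, and the result
// comes back to us as a message on the event queue.
fs.readFile('commits.log', 'utf8', function (err, contents) {
  if (err) throw err;
  console.log('read %d bytes without blocking anyone', contents.length);
});

console.log('read scheduled; the event loop is free to do other work');
```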
We'd been working with this on the web for a long time, and Node just took the same concept JavaScript already had and brought it to IO. But this doesn't come without controversy. With one thread, one process sitting there pulling stuff off a queue, running your code, sending messages off, you can block. There was a pretty interesting post talking about how you can block the event loop just by doing heavy CPU-intensive work. And it's a bad thing: it causes your requests per second to go down. A lot of people were throwing around different ways of dealing with it, but there's actually an inherent beauty in this limitation. So bear with me for a second. Trying to move a window over there, but that's not working. Oh, I give up. Anyway, the concept of having one thread is good, because a lot of that weird concurrency mumbo-jumbo, all that preamble you have to deal with in other languages, disappears.

So am I going to stand up here and create yet another distributed framework? If you go to npm right now and type in the keyword "distributed", you'll get about 2,000 results of distributed frameworks, averaging around version 0.0.1, maybe with two or three commits from three years ago. The short answer is no, I'm not going to develop yet another distributed framework for real use. It's a very personal choice: choose the tools that work for you, the network design that works for you. Just to make things easier, I've open sourced a section of the code we use at bitHound. You can get it on our GitHub, the bithound farm repo, linked from bithound.io. It's just an example; use it at your own risk. I might clean it up later, but I'm going to give you the story of how we built our backing tools.

To start: while working on another startup, Thalmic, I was using something called ZeroMQ, and it's a really beautiful messaging library. ZeroMQ basically looks like an embeddable networking library but acts like a concurrency framework. We've all worked with sockets in the past. We've used WebSockets, we've used TCP sockets. You know the concept of writing to and reading from sockets, and that's exactly how you work with a ZeroMQ socket; it's just that there's extra goodness hidden inside it. ZeroMQ sockets carry atomic messages across transparent transports. Rather than instantiating a socket and saying "that's going to go over TCP", you choose, through the construction of these sockets, whether they use in-process messaging, inter-process messaging, TCP, or even multicast, spamming your entire subnet with the same message. But that's all transparent: as far as you're concerned, you're reading and writing to a socket.

Something else that's cool with these sockets is that you can connect them end to end in concurrency-style patterns, so you can do things like request-reply. I can have one socket that sends out requests and five listeners which will round-robin the replies, or five requesters talking to one replier. We're all familiar with standard request-reply, but there are also patterns like publish-subscribe, so you can do eventing: publish an event, subscribe to events. And task distribution, so, as I mentioned, round-robin. But, oh, I went a couple slides ahead.
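As a rough sketch of what that looks like from Node (using the zmq npm module; the endpoint and message contents here are made up, and both ends would normally live in separate processes):

```js
var zmq = require('zmq');

// One requester binds; any number of repliers can connect to the same
// endpoint, and each request round-robins to exactly one of them.
var req = zmq.socket('req');
req.bindSync('tcp://127.0.0.1:5555');

// A replier: read a message off the socket, write a reply back.
// Swap 'tcp://...' for 'ipc://...' or 'inproc://...' and nothing
// else changes; the transport is transparent.
var rep = zmq.socket('rep');
rep.connect('tcp://127.0.0.1:5555');
rep.on('message', function (msg) {
  rep.send('processed: ' + msg.toString());
});

req.on('message', function (reply) {
  console.log(reply.toString()); // "processed: lint commit abc123"
});
req.send('lint commit abc123');
```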
But you can kind of connect them together. So with that being said (I'll jiggle that), the concept of tasks comes into play. I'll do that one more time. When I say tasks, all I want you to do is think of atomic chunks of work in your app: those bigger pieces of work that you can share, where you can say, "I can run this concurrently." Once you think you've identified them all, the problem is, you're probably wrong. The whole idea is to first understand your project, assess your actual application's needs, and only then start to define your tasks. And as you're doing this, build in a way that lets you adapt. You want to be able to change; you want to be able to adapt as you're coding.

So with that being said, once we had that idea (oh God, now my computer's frozen; okay, I'm just going to try one quick display thing), we had the idea of roles. To give you an idea, we started off with roles in a master and slave relationship. A master would be, for example, something like our web server. Its job is to start jobs, to break down the work, and to listen for results. The master process would just send work off to the slave processes, which would connect, and whose only job is to process these messages. The concept was that a master could send out, "hey, can you do this thing for me?", and some slave process out there would pick it up, process it, say, "yeah, sure, here you go", and send it back to us. The idea is that the master doesn't know how many slaves are out there. Through the magic of ZeroMQ sockets, we don't even have to have any slaves out there; the message will just sit in a queue until a slave connects, and then it gets sent to it. And if we had a bunch of things to do, we had the ability to send a list of things, an array of messages, and have them worked on in parallel. If we sent a list of ten, we could have ten concurrent slave processes pick them up and send their results back, and we just waited on the whole thing.

But as we started developing with this, and it was working, we quickly hit a problem with chained jobs. Our web server was typically our master, and say, for example, we wanted to process a repository. The first thing we needed was a list of commits. We didn't want our web server sitting there cloning the Git repository, listing all the commits, building up messages, sending them off, then waiting and going, "okay, we have 50 messages back", and working with them. The idea was we just wanted to send out a process-repo message, and whatever slave process happened to pick it up would clone the repo, build up a list, and go, "here are all the commits." And rather than sending that back to the master to do some big orchestration, we'd just have the slave throw the commit messages out into the world, wait for them to come back, and then return the result to the original process that wanted the repo processed. That raised some existential questions: what's the difference between our master and our slave processes at this point? Because essentially we had slave processes doing work and elevating themselves to almost a master.
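Here's the rough shape of that master-and-slaves setup, sketched with ZeroMQ push/pull sockets (an illustration, not bitHound's actual code; the ports and task fields are made up, and both halves would normally be separate processes on separate machines):

```js
var zmq = require('zmq');

// Master: push tasks out, pull results back.
var tasks = zmq.socket('push');
tasks.bindSync('tcp://127.0.0.1:6000');
var results = zmq.socket('pull');
results.bindSync('tcp://127.0.0.1:6001');
results.on('message', function (msg) {
  console.log('result:', msg.toString());
});

// Nobody has to be listening yet; these queue up until a slave connects,
// and a batch fans out round-robin across however many slaves exist.
['a1f3', 'b2e4', 'c3d5'].forEach(function (sha) {
  tasks.send(JSON.stringify({ type: 'processCommit', sha: sha }));
});

// Slave: pull a task, do the work, push the result back.
var work = zmq.socket('pull');
work.connect('tcp://127.0.0.1:6000');
var out = zmq.socket('push');
out.connect('tcp://127.0.0.1:6001');
work.on('message', function (msg) {
  var task = JSON.parse(msg);
  out.send('done ' + task.sha); // pretend we analyzed the commit here
});
```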
So the key was: there was no difference. At the end of the day, what we ended up with was just a collection of workers. Workers are generic processes out there that sit listening for jobs. There are some workers that just want to tell other people to do things and don't actually want to do any of the work; I'm sure we can all think of examples of that, such as our web server or, I don't know, a manager of some sort. But the concept is that we can spin up as many workers as we want, and the more we add, the more they join in and just work together.

So this is where my slides get interesting. We made a module called farm. It's open source; you can check it out on the bitHound GitHub if you have the interwebs. We broke it down into the concurrency primitives we wanted for working with our data. The first was the standard "I have a message and I want someone to do it, I don't care who." That's basically farm.jobs.send: you give it an object literal and a callback, the job is sent out into the compute farm, somewhere somehow a process picks it up, and then your callback is called with the result.

Then we also had the idea of distribute, because we'll often have a list of tasks, and there's a difference between saying "I have a list of things and I don't care what order they're processed in" versus "I have a single job I want you to do." So we had a separate method, farm.jobs.distribute, which we send a list of tasks, with a function at the end that gives us the results of all of those tasks.

Something that was actually really cool: because ZeroMQ offers publish-subscribe sockets, we were able to add pub/sub, so we can do farm.events.publish and farm.events.subscribe. We just added it because we could, but because we ended up with a lot of processes out there, having a global pub/sub that works exactly like the EventEmitter you're used to in Node turned out to be really powerful, and we started routing really neat things through there: maintenance tasks saying "clean up this directory", or sending out stats, so all of our metrics on the overall health of our cluster are sent via pub/sub events. So if anything, look at the repo just for that: if you have clustered processes and you want some way to send events back and forth, or consume events, or just have a general overall monitor, you can do some pretty cool things.

And, oh goodness. So there we are. Lastly, there's that opt-in concept of a worker. All we do is call farm.worker and pass a callback; that callback takes a task object, and we give you a callback that you call with (error, result). The worker doesn't need to care whether the job was sent to it via an explicit send or distributed as part of a batch. All the worker cares about is that there's a task it has to do, an object it needs to work on, and a callback it can call when it's done.
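Pulling those primitives together, a usage sketch looks something like this. I've reconstructed the signatures from the description above, so treat it as a sketch and check the farm repo for the real API:

```js
var farm = require('farm');

// Send one job into the compute farm; some process, somewhere, picks
// it up and the callback fires with its result.
farm.jobs.send({ type: 'cloneRepo', url: 'git://example.com/app.git' },
  function (err, result) {
    console.log('cloned:', result);
  });

// Distribute a batch of tasks whose order doesn't matter; the final
// function fires with the results of all of them.
var commits = ['a1f3', 'b2e4', 'c3d5'];
farm.jobs.distribute(commits.map(function (sha) {
  return { type: 'processCommit', sha: sha };
}), function (err, results) {
  console.log('processed %d commits', results.length);
});

// Cluster-wide pub/sub, shaped like Node's EventEmitter.
farm.events.subscribe('stats', function (data) {
  console.log('cluster stats:', data);
});
farm.events.publish('stats', { queueDepth: 42 });

// Opt in to doing work: the worker doesn't care whether its task came
// from send() or distribute(), it just calls done(err, result).
farm.worker(function (task, done) {
  done(null, { ok: true, type: task.type });
});
```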
So I want to tell you a little bit about launch day. We built this out, we had a closed launch of about 50 to 100 people, and it didn't go that well. We learned a lot, and one of the core things we learned was that what works well on one machine doesn't really work as well when you're running it on 200 machines.

Our developer environment at the time was typically OS X. We had our web app running, a broker process running, and we started up worker processes for every core we had on our laptop, or even doubled up some, but it was all contained on one laptop. In production we had hundreds of machines that didn't talk to each other very well: our collection of web apps, a couple of brokers, and hundreds of machines, some physically separated from each other, some running in virtual machines. Imagine a hundred machines that all try to clone a repo at the same time; if you ever want to get rate limited by GitHub, that's a good way to do it. And because we'd been developing this on one machine, all of our sharing assumptions were wrong. We were working this out and realizing that a worker could get a clone-repo task but was not guaranteed to ever get a commit task, or a worker would get a commit task without having been the one that got the clone-repo task, because the work got spread out.

So we quickly realized that if we're going to develop for a distributed environment, we should develop in a distributed environment. We use a tool called Vagrant. It's a whole other talk; I'll talk about Vagrant later over beers if you want. Vagrant gives you disposable VMs: it lets you fire up VMs and clone them really quickly, and this allowed us to build a miniature prod with a single script. We can do vagrant up and it provisions three or four VMs for us, set up in a production-like environment. The beauty of this was that it ended up using the exact same shell scripts and configuration that we use in production. As we fire up a new physical machine or a new VM to run in production, it uses the same scripts as when I fire up a VM for development. It really, really helped us: it lets us test our production scripts and our development scripts, and it got our developers working exactly how things work in production. Those things you forget as you're working, like "oh, this is going to run across a bunch of machines", it lets you physically experience that.

And something I learned from my DevOps guy, a quote I picked up from him, was the concept of pets versus cattle. The way he'd put it: you have your laptop, and you can say it's your pet. We love our laptops; we put stickers on them, we name them cool things, we have all our bookmarks and our favorite stuff on them. But our VMs and production machines are named things like Northwest Server 002, Blade Enclosure 1, and the whole concept with them is: I don't really care if one breaks, I'll get another one. If you've worked with Amazon Web Services, you just spin up a new instance if one starts misbehaving. And we use that pets-versus-cattle philosophy across our whole architecture: our services, our computers, our jobs. If there's an error, destroy the environment, build it up again, and have everything just retry. Because failure is always an option; you should expect things to fail. If something fails, try it again, or schedule it to run later; if it keeps failing, you can look into it.
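That "expect failure, retry, escalate only when it keeps failing" stance is simple to express in code. A minimal sketch, with processTask standing in for whatever actually runs the job (a hypothetical helper, not something from farm):

```js
// Run a task; on failure, schedule it to run again later with a
// growing delay, and only give up (for a human to look into) after
// a handful of attempts.
function runWithRetry(task, attempt, callback) {
  processTask(task, function (err, result) {
    if (!err) return callback(null, result);
    if (attempt >= 5) return callback(err); // keeps failing: look into it
    var delay = Math.pow(2, attempt) * 1000; // back off and retry later
    setTimeout(function () {
      runWithRetry(task, attempt + 1, callback);
    }, delay);
  });
}

// Hypothetical worker body: fails sometimes, like real networks do.
function processTask(task, done) {
  Math.random() < 0.3 ? done(new Error('flaky network')) : done(null, 'ok');
}

runWithRetry({ type: 'processCommit', sha: 'a1f3' }, 0, function (err, res) {
  console.log(err ? 'gave up: ' + err.message : 'result: ' + res);
});
```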
It was beautiful. As someone who was doing DevOps part time, when a developer came up and said, "yeah, it stopped working on my laptop," I could say: destroy your VMs and recreate them. Does it still break? "No, it works now." Okay, I don't care. Or if there was some Amazon instance out there misbehaving, taking a little longer or erroring out every once in a while: destroy it, recreate it, start again. That's the whole concept of them being cattle; they're just useful chunks of stuff.

So with that being said, as we built this stuff out, we ended up with almost a giant Rube Goldberg machine that follows the UNIX philosophy: we developed a series of tools that could be chained together and run atomically. Rather than having one giant orchestration at startup, we had a bunch of surgical tools we could run, to pick jobs up halfway through, or retry one section, or spin up a bunch more processes to work on some extra thing that was feeling a little slow. And once we had this Rube Goldberg machine, something really interesting was realizing that we were part of the overall machine; we were a cog in the wheel. If a job failed, you want to be able to get your tendrils in there to restart it, to tweak things, to really allow yourself to embrace failure. When you have a lot of code running over a lot of machines and a lot of latent networks, you want to be able to reach in and push things along.

So why Node? At the end of the day, why Node, why JavaScript? We actually found a few wins. The async IO and the simple concurrency primitives (really, there are no concurrency primitives; everything's just events) really helped us. We didn't have to worry about really complicated re-entrant code or shared state; we just coded around the idea that we have one thread and a process will only be doing one thing at a time. It was also an amazing language for gluing a wide array of tools and services together. It almost felt a little Perl-like, a little shell-like: we were able to read from standard in, write to standard out, interact with the shell. If there was a section of code that felt a little slow running in JavaScript, we could just pipe it down to a cat command and pipe that to awk, which is really good at stream processing. It gave us a really good glue tool that allowed us, within an overall architecture of message passing and concurrency, to shell out and do something in a native component or a bash script when we needed to.

So basically, lastly, that's about it. Thanks for bearing with my technology. Questions are probably not the best; you're probably going to ask a question and all I'll hear is "cloud," because my hearing is really bad. If you have a question, try to find me after; I'm totally willing to talk. I brought some t-shirts I can probably hand out, but I'll let you all go home, or listen to the closing remarks. Cheers.