All right, this is awesome. I don't know if I knew how well produced this conference was going to be, but it's amazing. The organizers are doing an awesome job, and I'm super excited to be here. I am another Aaron, but on the internet, in most places, I go by AQ. I started doing that when I was like 13 years old, because there were a lot of Aarons, and AQ was original. Recently I found out there's a rapper in Nigeria named AQ, and he gets to be the real AQ on Twitter. Something about that makes me really happy. Some of you might know me from the podcast I host with my good buddy Michael Bernstein. It's called Beats, Rye & Types, and it's about music, food, and programming, which are my three obsessions, though we end up talking about a lot of other stuff. We actually had Tenderlove, Aaron Patterson, on a couple of weeks ago, and it was a really good show. I'm also hosting an event with some friends. I'm from Kingston, New York. My house was actually burned down. I wasn't living in it at the time, but it was burned down in 1777 when the British burned Kingston, so I have a tie to those crazy Brits burning my house down. The event is kind of a retreat. I don't know how many of you will come all the way from Spain, but I came all the way here, so come on. I also worked for a long time at startups, and now I'm doing some consulting, so feel free to hit me up about that. And I have a friend who would like to figure out how to smuggle a leg of ham back to America. If anyone has tips about that, let me know.

OK, anyway, on to the actual meat of the talk, pun intended. There was a lot of good setup for this, actually. These days we talk a lot about inter-app communication: how we communicate between the many different applications we have. Sometimes that means browsers and clients, but it can also mean apps within our own infrastructure. There have been many, many talks over the last couple of years at Ruby conferences and beyond about microservices and service-oriented architecture, because we wrote a lot of applications, they kept growing, and eventually we hit a wall and realized that maybe it's hard to keep working this way. But I don't know how much weight I want to put on the whole microservices trend; services as a whole have been around for a really long time. They don't have to be micro, and there are really good ideas behind services in general. I personally am not going to tell you to go out there and split every single piece of your code into separate microservices running on Docker. But extracting parts of our application into individual applications and having those service boundaries, no matter what side of the debate you fall on, is a good practice. I think we can all agree on that. And because we're now splitting things up, there has to be a lot of talk about how we communicate between these different applications. So I want to introduce some terms, so that when I get into this conversation we know we're talking about the same thing. The first is protocols.
When I talk about protocols, I mean the transportation mechanism for delivering information from one application to another. In this case we're talking about two applications. They may be two Ruby applications; they might be in two different languages. But the protocol is the means of communicating back and forth. Then there's format. Here I really mean serialization format: how we take that data and package it into something that can be understood by both sides. It isn't the means of communication; it's how we structure the data in between. And usually when we talk about these two things together, we phrase it as format over protocol, meaning the shape of the data and then how it's delivered. The most common one we've fallen into in the Rails and Ruby communities is JSON over HTTP. That works, but it came from an evolution of different data formats and protocols, and I want to talk a little bit about that. It works for the browser-server relationship, but it might not work as well if you have two completely backend applications communicating with each other.

So, before the internet. I just love this photograph, so I had to show it. This is a woman whose name I couldn't find, and all of the code for an early military system is in those punch cards; I think it was five megabytes of data. So this is what five megabytes of data looked like in 1956. The format was punch cards, and the protocol was sneakernet: literally walking the punch cards from one machine to another. Then we moved on. Obviously there's a lot of stuff in between, but I'm moving quickly. We moved to XML over HTTP. If you worked on the internet in the 90s and worked on applications, chances are you wrote some XML. And I mean beyond HTML: whether it was SOAP-formatted or something else, you worked in it. We can all cry on each other's shoulders later at the party about all those years we spent doing that. This was also the best, most awesome image I could find for the internet superhighway. How awesome is that? It's so cool.

So around now, we move to JSON over HTTP. This is kind of the format. And not only is there a rapper in Nigeria named AQ, there's also a Christian rapper in St. Louis named Json. Who knew? If you Google JSON, you'll find this guy. I'm into hip hop, but I couldn't get into his music. I'm sorry, but I do love this one. So for whatever reason, we moved to doing JSON over HTTP for pretty much everything. It became the de facto standard for a lot of what we do, and we kind of defaulted to it. I don't even have to ask for a show of hands, because I guarantee everyone here has written a JSON service that communicates over the HTTP protocol. There are some reasons for this, but it's been a while, in the Ruby community especially, since we reevaluated whether this was really the best thing we could possibly do. The default answer for why we chose it is that it's easy, a lot easier than what came before it. But if we break that down, there are more details. A big one people cite is that it's human readable and writable.
Well, certain humans can read and write it, probably a lot of the people in this room. But if you didn't have your JSON display formatter in your browser, is it really human readable? I don't know. I mean, it's words, it's not binary, but it's still not exactly the easiest thing to parse by eye. A really big thing, especially in the past couple of years, is that literally every language now ships with a JSON library; if you're writing a new language, it has to. So if you're communicating across multiple languages, maybe it's a good choice. It's pretty fast to parse and read because it's plain text. It does have a few explicit types, which is nice: there are strings, true and false booleans, actual numbers and floats. And probably the main reason everyone jumped here wasn't just that JavaScript was becoming popular, though that was part of it; a lot of it was that compared to our recent forays into writing and reading XML, this seemed like a breath of fresh air. Everything was so small and compact and easy to use.

But probably the biggest problem with JSON is that it's easy at first. This is a parallel to the problem of putting all of your code in one Rails application: it's easy at first because they always start small. So in your typical JSON, you have your id and your title. OK. Now the product team says we have to add an author. OK, now we have to nest the author because we need a bunch more information, and we want to nest some favorites in there, and now the favorites also have to have users nested in them. Everyone has seen this story; if you've written a Rails application, you've pretty much done this. There's that phrase "big ball of mud," but I like to call this the bag of glass, because code can be a ball of mud, but this is kind of what Aaron was saying just before this about the env hash: when you stick a bunch of stuff into a hash without a lot of thinking, then as the product requirements evolve, every time you add something and reach in to get it, you're going to get cut. That's just what's going to happen. So that's why I call it the bag of glass.

If we step back and think about what we actually want here: we want something that works well in many languages. We want something that's relatively fast. We want something that's compact, especially if we're doing inter-app communication over the network; we want things on the wire to be relatively small, and that's probably one of our primary concerns. In an ideal world, it would be awesome to have something with real types beyond just strings and numbers. And most importantly, from my perspective, having worked on a bunch of apps over time, I wanted something that actually handled change, that thought about change in an interesting manner, or at least thought about it at all. I don't think Douglas Crockford, who coined JSON and wrote the specification, ever imagined we would use it the way people use it today. So what if I told you that this already exists? There is a format that does all this, it's from Google, and it's called Protocol Buffers. Brian already said that all great things come out of Google, and it's true.
This is pretty amazing. I think a lot of people are scared of Protocol Buffers because the name sounds really scary; that's probably 50% of it. Also, the documentation is pretty horrible. Google is great at writing code and pretty good at writing documentation, but at least for Ruby, it's a big leap to get into. How many people have heard of or used Protocol Buffers? OK, so a couple of people, but not most of the audience, which is perfect. Those of you who raised your hands can ignore the rest of the talk.

There are a couple of people here who probably work at Google and could tell you this better than I can, since I have never worked inside Google, but Protocol Buffers came out of Google for exactly the reasons I was just describing. They wanted a format that handled change well. Google is built on a big platform of services. I won't say microservices, but services: lots of different applications talking to each other, all doing some form of RPC, and they needed a way to handle change between those services, because not only were these different apps, there were usually different teams working on them, and they worked in different ways. So they created Protocol Buffers. There's some really good material on their site about why: they basically wanted to get rid of all the if statements in their code that said, if version equals one do this, if version equals two do this. I'll show you how Protocol Buffers handles that in a second. They say it has become the lingua franca for data formats inside of Google, and there are something like 48,000 different Protocol Buffer message types defined in the Google code tree, which is pretty insane.

And I should say this isn't just for inter-app communication; it's used inside applications too, by Google and other developers. I've been working a lot recently inside the Chromium project and Chrome itself. When you boot up Chrome, each tab boots up a separate process on your machine; if you've ever looked at the task manager or something like that, you've probably noticed this. And the way Chromium and Chrome handle messages between the different processes in your Chrome application is by serializing them into Protocol Buffers. So that's pretty cool: you're already using Protocol Buffers today, even if you didn't know it.

So what actually is a Protocol Buffer? What does it mean? How does it work? What does it look like? The general idea is a little different from what you're used to with JSON, but the end result is pretty much the same. Instead of not defining at all how you parse or read the data, because JSON can just be a bucket of data, you define a schema for it, a lot like the way you would define a schema for a database table. The schema is written in a format from Google called the .proto format, and it looks like this: you define different message types, those types have attributes or keys, and those keys have their own defined types, like integers and strings. Then you run some code that generates the encoding and parsing code for the language you want to communicate in, so you don't even have to write the reading, parsing, or encoding code yourself.
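To give you a feel for it, here's a rough sketch of a .proto file in the style I'm describing. This is proto2-era syntax, and the Image and Request types and their fields are hypothetical stand-ins for what was on my slides:

```proto
// A hedged sketch of a .proto schema (proto2-era syntax); the
// message and field names here are illustrative, not from a real app.
message Image {
  required int64  id        = 1;  // the trailing number is the field tag
  required string file_name = 2;
  optional int32  width     = 3;
  optional int32  height    = 4;
}

message Request {
  required string method = 1;
  optional Image  image  = 2;  // message types can be nested
  repeated string tags   = 3;  // repeated fields hold lists of values
}
```

Those tag numbers on the right are what actually go on the wire, and they're what makes the backwards-compatibility story work, as we'll see in a minute.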
You run this protoc tool with a Ruby output, or, in my examples, a Go output, because the Go programming language is where I use this primarily. You get these pb.rb files for each of those different types, and in Go you get one app.pb.go file. In Ruby, the packages do a lot of this through metaprogramming, defining methods at runtime. In Go, which is obviously a compiled language, the generated code has to actually define all the getters and setters for all the attributes on these types. But you don't have to write any of it by hand: you write the .proto file and the generator does it for you. And we should all be used to generators; we use them all the time. If you're a Rails developer, this isn't that scary. I've heard people complain, "oh, I don't want to use generated code," but if you've run rails new, you're already using generated code.

So here are two examples. The top is what it looks like in Ruby to encode and decode a protocol buffer for this Image type, and the bottom is what it looks like in Go. You'll notice the names are a little different: in Go it's marshaling, in Ruby it's encode and decode, but they're doing exactly the same thing, potentially on either side of a protocol we're sending this information back and forth on. The only real difference in Go is that you also have to do a lot of explicit error handling, which is nice too.

To go through the .proto format in a little more detail: you have these explicit types, and, this is the key thing I haven't mentioned yet, the reason backwards compatibility works in Protocol Buffers. It's "the simplest thing that could possibly work," which is how Google describes it. Each attribute gets an ID, or they call it a tag, or a field number; there are a lot of names for it, but basically it's just the number at the end of the line. The only rules are that the number has to be unique within the message, there's a small reserved range around 19,000 that you can't use, and if you remove a field, you can never reuse its number. That's it. And if you think about it, that's a really simple way to make backwards compatibility work, because I can keep adding fields, and if I regenerate my Ruby side but not my Go side, Ruby will know about the new fields and Go won't. Go will just throw them out, because the wire format parses down to these ID numbers rather than actual names. And on the other side, if I remove fields, say I removed the file name field from Image, then when someone sends field number seven containing a file name, the side that doesn't know about it just throws it out. You keep moving forward and evolving the format without having to update both sides at the same time, which is really, really powerful when you're writing real apps.
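As a concrete sketch of the Ruby side, here's roughly what encoding and decoding looks like with the Beefcake gem, one of the Ruby protobuf libraries in my benchmarks coming up. I'm defining the message inline instead of loading a generated pb.rb file, just to keep the example self-contained, and the names are made up:

```ruby
require 'beefcake'

# Hand-defined here for brevity; a protoc-generated pb.rb file
# produces an equivalent class. Names are hypothetical.
class Image
  include Beefcake::Message
  required :id,        :int64,  1
  required :file_name, :string, 2
  optional :width,     :int32,  3
end

img     = Image.new(id: 1, file_name: 'cat.png', width: 100)
encoded = img.encode.to_s        # a compact binary string
decoded = Image.decode(encoded)  # back to an Image object, not a hash
decoded.file_name                # => "cat.png"
```

The point is that what comes out of decode is a real object with real attributes, not a bag of values.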
You can also have repeated fields. And another really important thing, given that JSON we saw with the deep structure: you can have deeply nested types in these protocol buffer definitions too. You can see here that Request has an Image inside it, and I'm sending that Image, but Image is itself a type. Finally, you can mark things as optional or required, though this is going to change a little in future versions of Protocol Buffers.

So I did a little benchmark of different serialization formats and libraries, not all of them, just some common ones, measuring how long it took to encode and decode. You don't have to absorb all these numbers, but if you're interested in the actual source, it's on GitHub. The interesting thing here is that encoding and decoding protocol buffers is a little slower in Ruby. One of the main reasons is that I kind of had to fake it: if I didn't turn the Yajl and JSON output into actual objects, it was extremely fast, but that's kind of a lie, because we're not just going to pass hashes around everywhere. Or maybe we would, but that would be crazy. The big thing I didn't quite mention is that when I have an image in Ruby, it's an Image object, not a hash of values. It's an object with methods and attributes that match my schema. I can pass that object around, add additional methods to it, whatever I want, because it's Ruby. Same thing on the Go side.

So the only caveat is that it's a little slower to encode and decode, and I think Ruby's handling of binary is the primary reason it's not as fast as other languages. And that's kind of BS, because Yajl is C, so it's really just C doing the JSON work. But the most important thing is at the top: Beefcake is a Ruby library for protocol buffers, and so is ruby-protobuf, two separate libraries, and the encoded output for this really simple message of about six values is half the size of the JSON one. So maybe it takes a little longer to encode and decode, but if you're doing a lot of network I/O, sending messages back and forth, maybe the real concern is how tightly it packs on the wire.

So there are some clear wins. A big one is that there's a centralized schema. A lot of people will point out that there are some cool projects for JSON schemas, like the one literally called JSON Schema, and a bunch of others that have emerged, and they're good, but it's hard to enforce them. This is an explicit schema; there's no getting around it. Like I said, you're not just serializing into hashes, you're serializing into objects, which is a big thing. Additionally, it handles adding and removing fields really, really cleanly, which is really good, and it has real types: we're converting into Images, and a field on an Image can be an integer, a float, a bool, all of these things. Especially when you're dealing with a strongly typed compiled language like Go, or basically any other non-object-oriented language you might convert back and forth with, you're going to want real types. It's really annoying to have to do that conversion by hand every time if you're doing JSON in those languages.
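If you want a feel for the size difference yourself, a quick check looks something like this. It's a rough sketch reusing the hypothetical Image class from the earlier example, and the exact byte counts will vary with your data and library:

```ruby
require 'json'

img = Image.new(id: 1, file_name: 'cat.png', width: 100)
protobuf_size = img.encode.to_s.bytesize
json_size     = { id: 1, file_name: 'cat.png', width: 100 }.to_json.bytesize

puts "protobuf: #{protobuf_size} bytes, JSON: #{json_size} bytes"
# Protobuf never puts the key names on the wire, only the tag numbers,
# which is a big part of why it packs so much tighter.
```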
And for me, one of the strongest things, the one I was really excited about, is that it's really compact on the wire; it's really small. There are some downsides. We were talking about human readability: the encoded output is binary, so you can't read it. Does that really matter? I don't know. You're going to be reading and writing all this stuff programmatically anyway, and like Aaron was saying, we're moving to HTTP/2, where things aren't necessarily human readable without dissecting and decoding them anyway. Ruby encoding and decoding is not as fast, but maybe we can fix that. And the Ruby libraries for this are not the best, though I'll talk about a new development in a second.

So what about the protocol? We could use HTTP. You can just stuff protocol buffers in the body, use HTTP as the protocol between apps, and call it a day. I didn't like this, because I was condensing this entire request-response cycle down to only the things I needed, and I didn't want to speak full HTTP and carry all that extra weight on the wire. So I invented my own protocol, yay! It was basically the easiest thing I could think of. It's not actually my own protocol; I didn't invent it. Other people have done this exact thing for many, many years, but the format is really simple. Oh, and I should say that's pronounced T-C-P-E-Z. You have a client and a server talking back and forth, and the way it works is you have a header, which is just an unsigned int containing the number of bytes in the body, and that's it. You read the header to get the number of bytes in the body, then you consume that many bytes from the stream, and the response works exactly the same way. The body can be anything; tcpez doesn't care what you put in there. But in the applications we were actually working on, we used protocol buffers as the serialization format, so this is protobufs over tcpez.

tcpez is really simple. There's a client and a server. It doesn't really do multiplexing, and so far I've only written a Go server and a Ruby client. A cool thing about the way tcpez works is that you don't write a separate load balancer: the client is actually the load balancer across all of these different services. Request and response encoding are not part of the protocol; it's just bytes. In Ruby it happens to be a string, but it's a byte stream, and each tcpez project defines its own request and response format. But I wanted to build a bunch of things into this, because we were using it over and over again. If I made a little standard request-response format, I could add all the things I actually wanted for every application I was building: good logging, stats built in, protocol buffers, and pipelined requests, all just built in. The pipelined request thing is really cool; I had a slide for it, but I thought I was going to run out of time, so I didn't include it. Following the Google spirit, it's the simplest thing I could possibly get to work: you send the negative of the number of messages, and then you read that many messages in the tcpez format. That's it. It's pretty interesting, and it works.
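To make the framing concrete, here's a minimal sketch of length-prefixed framing in Ruby. To be clear, this is an illustration rather than the actual tcpez source, and the four-byte big-endian header is an assumption on my part:

```ruby
require 'socket'

# Write one frame: a length header followed by the raw body bytes.
# Assumption: a 4-byte big-endian unsigned int header; the real
# tcpez details may differ.
def write_frame(sock, body)
  sock.write([body.bytesize].pack('N'))  # 'N' = 32-bit big-endian uint
  sock.write(body)
end

# Read one frame: parse the header, then consume exactly that many bytes.
def read_frame(sock)
  length = sock.read(4).unpack1('N')
  sock.read(length)
end

# The body is just bytes; here it could be an encoded protocol buffer:
#   sock = TCPSocket.new('localhost', 2222)
#   write_frame(sock, Image.new(id: 1, file_name: 'cat.png').encode.to_s)
#   response = read_frame(sock)
```

Pipelining rides on the same idea: a negative count up front tells the other side to read that many messages in a row.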
So, I was the CTO of Paperless Post for six years, and we built this there. We wrote multiple apps using it and have sent hundreds of millions of messages through this tcpez protocol over the past couple of years, and it just works. We haven't had any problems with it. We've had lots of problems with lots of other things, and I'm happy to tell you about those, but never with this.

A quick aside about why I was trying to design my own protocol and what I was interested in. Another everything-good-comes-out-of-Google moment: I don't know if there's a Papers We Love Barcelona. Is there? Does anyone know? Has that made it over here yet? There's this really cool meetup format that started in New York, where people get up and present academic papers to a group at a meetup, just like a Ruby meetup or something like that. It's called Papers We Love, and there are branches all around the States. In that spirit, I've been reading a lot of academic papers over the past couple of years, because there's a lot of really interesting stuff there. If you've never read an academic paper, it might seem daunting, but I highly, highly stress that you should just try it, because a lot of them are really, really fascinating. And there's a lot of stuff where you think, oh, I'm going to invent this thing no one's ever done before. Someone's done it before. That's my lesson to you. If you take one thing away from this talk: someone's done it before, probably in academia, and probably has written a paper about it and found all the problems with it so you don't have to.

Some of the most interesting ones I love reading come from giant production systems. I've never really worked in a gigantic production system; I've worked in large ones, but Google's is larger than anyone else's, so they've written a lot of really interesting papers. The Dapper paper is about how you build telemetry and introspection into these nested request-response systems, which is really exciting, and you should read it. A lot of people have implemented ideas from it; Twitter has Zipkin, and there are a bunch of others too.

Similarly, bouncing off of Aaron's previous talk: Google has been working on a new RPC framework called gRPC, which is basically protocol buffers over HTTP/2, taking advantage of the streaming and everything like that. It's really interesting. It's not quite there yet; basically the only server implementation right now is in Java, so you'd have to run a Java server, which I'm sure none of you actually want to do, but you can write clients in lots of languages. And like I was saying before, there are some new developments: Google has now written a Ruby protocol buffer client and library that they haven't really publicly announced or talked about, but it's part of the gRPC project, and I'm hoping it will become the new de facto standard and be faster. Like I said, gRPC is a layer above protocol buffers: it's protocol buffers on HTTP/2, but not just that; it defines a specific way of doing request-response RPC. It also relies on the new version of protocol buffers, still in beta, called proto3, which has some really cool new features. It's cool that the format is evolving. JSON is never evolving; like Douglas Crockford basically said, this is it, it's the simple thing.
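To show the shape of it, here's a hedged sketch of what a proto3-style definition with a gRPC service might look like, including the reserved-tags feature I'll mention in a second. The service and message names are hypothetical, and the Image type is the one from my earlier sketch:

```proto
// A sketch of proto3 syntax; names are illustrative, not a real API.
syntax = "proto3";

message GetImageRequest {
  reserved 8;            // a removed field's tag can be retired for good
  reserved "file_name";  // fields can also be reserved by name
  int64 id = 1;          // proto3 drops the required/optional keywords
}

// gRPC layers an RPC definition on top of the message types.
service ImageService {
  rpc GetImage (GetImageRequest) returns (Image);
}
```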
So it's cool that the protocol buffers serialization format is actually evolving, with new things like reserved tags and compaction and stuff like that. So, I'm about to wrap up, and the moral of the story, if you take one thing away, is: think about what else is possible before accepting the default. That doesn't mean just grabbing at new technology. But it does mean that a lot of other communities out there have done really awesome work in web and application development, and we should be taking and stealing from them, not just facing inward. You should always check out what alternatives there are. Even if you don't use them, you can still use JSON over HTTP; I won't be mad. But it's just really cool that there are other ideas out there, and that's how Rails and the greatest things in our community got started: people looking outward to other communities and pulling their ideas into ours. So thank you, I'm AQ, thanks for your time.

I have a question. You compared the sizes of the formats on the benchmarking slide. My question is whether the non-binary formats were compressed or not.

The question is whether the non-binary formats were compressed. They aren't compressed by default, but because the end result is binary, you can just gzip protocol buffers, and that just works, and it works really well. In proto3, actually, they're going to start compacting things by default, but that's not part of the format now.

So is there an established mechanism to deprecate fields within the format? You said it handles change.

The current way is you just remove the field from the .proto, and then you never use that number again. But in proto3 now, not only can you remove it, there's this reserved tags thing where you add the number to a list of reserved numbers at the top without having to define the field. So let's say I removed tag 8, which was the file name: I would put 8 in the reserved tags at the top of that type, and then no one could ever use it again, and it would error out if they tried.

OK, we've got one question up in the balcony.

Yeah, I was kind of wondering the same thing, because I may have a field like foo with an ID of eight, I remove that one, and two months later somebody else comes in and adds bar with an ID of eight, and that would completely mess up everything. So why, maybe it's for keeping the size smaller, but why not hash the value of the key or something and use that as the ID?

I think it was just for simplicity. But the way we've done it as a best practice, before the reserved tags thing, was to keep a comment in the .proto file with the commented-out fields. You don't actually have to remove them; you can just comment them out, and they won't get generated, but people know those numbers are taken.

I think we've got time for two more questions. We have one down here in front.

Hi. Did you compare protobuf to other JSON variations, like BSON? It's a binary JSON; they added a few more types, I think, and it's a lot smaller.

I should add that to the benchmarks. I haven't added BSON, but MessagePack is in there, and I think MessagePack and BSON are about comparable in terms of size.
BSON might be a little smaller depending on what types you have, but it's still not as small as protocol buffers, because there's no schema, basically.

Hi, great talk. Just to play devil's advocate here: you're describing a format where you specify a schema, you generate code, and you work with objects over any protocol, including sockets and HTTP. Isn't that SOAP?

Yes, I mean, it is something like that. But the big difference between SOAP and gRPC or these newer formats is basically the ease with which you can move between languages, what the support is like, and things like backwards compatibility. But yeah, people at Google and in the community have thought about this. Yes, this is RPC, and RPC systems are always going to give people a twitch that takes them back to the early days, but I think they've solved a lot of the problems people had with those earlier formats. Thank you.

Great, thanks everyone for your questions, and now let's go get coffee.