Welcome to my talk, Implementing Binary Protocols with Elixir. I hope you're curious about this. I got curious about binary protocols about a year and a half ago, but we'll talk about that in a second. I'm pretty stoked to be here; as you might guess, coming over from Germany is quite a trip. It's not my first time here, though. I was here 26 years ago; you can judge by my awesome clothes.

Anyway, in order to start the talk and make it a real story, we have to take a little detour. It might take the first 10 minutes of the talk, and then we'll get to the beef. But let me talk about the web first. I've been a web developer since I started developing software, and what I felt is that the web is screwed, right? It has been since the beginning, and you feel it more and more. I think that's because it has completely outgrown its intent. Tim Berners-Lee imagined the web about 25 years ago as a way for research institutes to share their results as documents. His idea was even that the web browser would be a kind of editor. All of that did not happen. Now we have Tinder; he might not have thought of that.

The web is slow, for multiple reasons. If you take a look at an average website two years ago, it had 101 assets, which together weighed 2.6 megabytes, spread over 38 connections. The connections especially made this website slow, because HTTP runs over TCP, and with TCP you have this handshake process that goes back and forth at least three times, and even more if you're using an encrypted connection. The inventors and the people in charge of the protocol, namely the IETF, have known about this for a long time, almost 10 years now, and about three years ago
they started working on fixing it. So they came up with the idea of the next version of the HTTP protocol, namely HTTP/2.

First of all, let me tell you a little bit about HTTP/2. It's compatible, so it's not breaking the web at all. The scheme will stay, so the user will never see an "http2" somewhere in the URL. All the semantics we have (headers, methods, and the request-response cycle) will stay. And here's how they fixed the connection problem: they multiplex a single TCP connection, so you can transfer multiple resources over the same TCP connection. You can think about it like this: we have one physical TCP connection, on this connection there are logical streams, and on this connection there are frames. Each frame belongs to a logical stream. Technically, it's only frames being sent over.

Now I want to talk about these frames, and these frames are binary. So here you see the connection to where we're going now. The frames in the HTTP/2 protocol are binary frames, and they're binary for a good reason: they want to save some space on the connection and make it fast. The idea of the HTTP/2 protocol was to be really fast.

Here we have an example of a typical frame layout. Every binary protocol usually follows a frame layout; here you see the one from HTTP/2. It starts with a length. Every frame layout, or every piece of binary data, usually has the length in there, because the parser has to know where it should stop parsing, especially for a network protocol. It might not be the same for files on disk. Then there is a type, and we have flags in here; the type defines the set of flags. Then there's the so-called reserved bit. It's always zero, so you should not bother with that. And then there is the stream identifier. This is what I just talked about: frames belong to a logical stream. Also, frame layouts are always drawn in these 8-bit chunks.
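The frame header described here can be sketched as a single Elixir pattern match. This is an illustration, not code from the talk: the module name `FrameSketch` and the `:more` return value are assumptions, and the field widths (24-bit length, 8-bit type, 8-bit flags, 1 reserved bit, 31-bit stream identifier) follow the HTTP/2 specification.

```elixir
defmodule FrameSketch do
  # Hypothetical module: splits one HTTP/2 frame off the front of a binary.
  # `len` is bound first and then reused as the size of the payload.
  def parse(<<len::24, type::8, flags::8, _reserved::1, stream_id::31,
              payload::binary-size(len), rest::binary>>) do
    {:ok, %{type: type, flags: flags, stream_id: stream_id, payload: payload}, rest}
  end

  # Fewer bytes than the header announces: wait for more data.
  def parse(_incomplete), do: :more
end

# A HEADERS frame (type 1) with the END_HEADERS flag (0x4) on stream 1,
# carrying a 3-byte payload.
frame = <<3::24, 1::8, 0x4::8, 0::1, 1::31, 0x82, 0x86, 0x84>>
IO.inspect(FrameSketch.parse(frame))
```

Because the length comes first, the parser knows exactly where the frame stops, which is the point made above about why binary layouts usually start with a length.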
So, in binary chunks. And this is just the header; there's also the body following. There are 10 different frame types, and I want to take a look at one specific one, which is the HEADERS frame. You all probably know HTTP headers, right? It's metadata per request. It has type 1, because it usually initiates a request.

This is the frame layout of the HEADERS frame. You only see the body here, so on top of that there's the frame header layout you saw on the previous slide. It has a pad length, which specifies the length of the padding, so that every parser knows what is header block fragment and what is padding. The padding is a security mechanism against attacks called BREACH and CRIME. I'm not that much into security stuff, so this is all I really want to tell you about that; if you're curious, just Google BREACH and CRIME. Then there's also the exclusive bit, a stream dependency, and a weight. That's not our main focus here; you could fill a complete other talk with that. And then there's the header block fragment, and here we are getting even closer to binary protocols, because not only is the HTTP/2 protocol binary: HPACK, the format of the header block fragment, is also a binary protocol.

If we fill this up with real, well, fake data, I'd say, this is a frame, and it should be valid, at least judging from the implementation I have. I want to talk about this block here. This is the so-called header block fragment, and all headers transferred on the protocol are encoded like that.

Now I want to talk a little bit about my motivation for this talk. I fumbled around with the HTTP/2 protocol for quite some time, and I found it pretty interesting. It solves a lot of problems
we have nowadays. It kind of solves the same problems all the weird asset pipelines are trying to solve, but on the right layer. I got interested in that, so I was tinkering around with the protocol, and I stumbled upon the HPACK RFC. The HPACK RFC is the RFC where the whole header block fragment parsing is described. It is decoupled from the RFC where HTTP/2 is described, for the same reasons you would decouple things in software: you can update them independently, and it could be used in other places. So if you have a use case with headers that you want to compress, and you're not running on the HTTP protocol, you could also reference it. There are a few points that really point in the direction of it being used in HTTP, and at the moment that is the only use case, but it's not really closely tied to it. And also, I might or might not write my own web server in Elixir, but that's definitely not something we are here to talk about.

Well, how does HPACK work? This is to give you some context for the Elixir code we will see in a minute. The HPACK protocol works with a so-called header compression table, and this is the header compression table. It consists of two parts, a static part and a dynamic part, but it is one table, so the index continues to run. If I have a given header, like :method GET here, the encoded version will look like this: you can encode the header :method GET just by an index into the header compression table. It's the same for the scheme. The host is encoded differently here, and then there's the path, for example. And if there's a value which is not in the static or dynamic table,
you just encode the string with Huffman encoding. I won't talk about Huffman encoding in this talk, but I think it's an interesting encoding, and if you want to know more about it, just check it out. You could learn it in half a day or a few hours. There's an awesome YouTube video explaining it, which I have linked at the end of my slides. I will upload them later, so I really recommend checking it out.

It is encoded like that because :method GET is something you would see on the whole internet quite a lot, right? It is probably the most-sent header ever, because it's retrieving information from any web page. This is why you can have both the key and the value in the header compression table. Then there are headers where the key is always in a request, like :path. There's always a path in a request, but the value mostly differs. This is why it is decoupled here: you can have entries in the header compression table where only the key is specified and the value can be dynamic.

As I mentioned, it's a binary protocol as well, so it also has this binary format, and I want to look at it now. This is an indexed header field; think of :method GET. I really want you to pay attention to the first bits in the first line when we take a look at all the different formats now, because they all follow the same scheme. If you're receiving an indexed header field, the first bit is set to one, and then there's the index the header is actually referring to. Then there are literal header fields with incremental indexing, which means that the headers are added to the dynamic table. These start with a zero in the first bit and a one in the second bit, and then there's the index if it's an indexed name, or zero if it's a new name. It's the same for literal header fields without indexing, but again the first bits differ, right?
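The leading-bit signatures that this part of the talk walks through can be sketched as a small classifier. This is an illustration only: `RepresentationSketch` and `kind/1` are hypothetical names, not part of the speaker's library, and the sketch deliberately has no clause for other prefixes such as `001`.

```elixir
defmodule RepresentationSketch do
  # Hypothetical helper: names a header representation by its first bits.
  # 1xxxxxxx -> indexed header field
  def kind(<<1::1, _::bitstring>>), do: :indexed
  # 01xxxxxx -> literal header field with incremental indexing
  def kind(<<0::1, 1::1, _::bitstring>>), do: :literal_incremental_indexing
  # 0000xxxx -> literal header field without indexing
  def kind(<<0::4, _::bitstring>>), do: :literal_without_indexing
  # 0001xxxx -> literal header field never indexed
  def kind(<<0::3, 1::1, _::bitstring>>), do: :literal_never_indexed
end

IO.inspect(RepresentationSketch.kind(<<0x82>>))
IO.inspect(RepresentationSketch.kind(<<0x41, 0x0F, "www.example.com">>))
```

Each clause only looks at as many bits as it needs, so the order of the clauses does not matter here: no two signatures overlap.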
There the first bits are four zeros, and then the index, or all zeros if it's a new name. And then there's also never indexed, which is for intermediaries on the protocol. There it's three zeros and a one, and then the index or zero. So you see, every possible format, every possible way to handle a header, has a different signature, and this is how you can determine which kind of header you're actually seeing. And this is what we're going to do now. I really want to guide you through this library I wrote to parse the HPACK protocol.

So now that we have the context set, let's take a look at decoding the header block fragment. Before we do that, there's one basic which is super crucial, and I really want to take this short digression: binaries and strings. Basically, binaries are strings. Or better: a string in Elixir is a UTF-8 encoded binary. I was checking the schedule of this conference this morning, and I saw that there's an entire talk about this, so if you want to know more, check out that talk. We'll just do this really quickly here, but I think that statement covers enough to make my point on the next slide.

So if we switch over to the console and fire up IEx, you can see the code point of any character with the question mark helper. If I do question mark e, I can see that e has the code point 101. You can also do it the other way around and specify a string by writing a binary. This is the binary construct, and if I just write 108 and add the utf8 modifier (modifiers are usually used more for pattern matching), you will see I get the string out. IEx always displays a string, even though it is a binary, when every code point in this binary has a printable representation. You can also just leave out the modifier, as you see here.
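The console session described here is not part of the transcript; a rough reconstruction, written as a script rather than as IEx input, might look like this:

```elixir
# ?e returns the code point of the character e.
IO.inspect(?e)                     # 101

# A binary whose bytes are all printable is displayed as a string.
IO.inspect(<<108::utf8>>)          # "l"

# For an ASCII code point the utf8 modifier can be left out.
IO.inspect(<<108>>)                # "l"

# Strings and binaries are the same data type.
IO.inspect(is_binary("l"))         # true
```

This equivalence is why the parsing code later in the talk can take a "string" as input and match on individual bits of it.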
So it's the same result, because the code point is something displayable. And if we clear up this terminal and just type the string "Jose": here's another trick. If you want to see each code point of a string, just append a non-displayable character, the zero for example, and then you will see each code point in here. And this obviously also works for emojis and whatever.

Let's switch back. Now let's take a look at the actual HPACK module. (Okay, I'd better stand here.) All the code you're going to see now is inside this HPACK module. There are also two supporting modules we're not going to look at: the table implementation and the Huffman implementation. But we'll see that in a second. The HPACK module defines its own type, because we are dealing with headers a lot. So we're defining our header type, and a header is just a tuple of two strings, key and value. So :method GET would be the two strings ":method" and "GET".

Then there's the public function decode, so now we are going to take a look at how decoding works. It has an arity of two; this is important later on. Its signature is that you receive the header block fragment, which is a string or a binary, which is the same thing, and the pid of the table. The table is implemented as a GenServer; it's just a GenServer holding the state of the dynamic and the static table. Again, I don't show the implementation of that, but all the code is on GitHub, so if you want to see how the table and the Huffman stuff work, just check it out.

Now let's come to the interesting part. How does decode work? Well, it calls parse. So we're going to see how parse works. Due to the fact that a header block fragment can hold multiple headers, we're going to call parse recursively, and this is why we are calling parse with an empty list: because we are going to add the headers
we're parsing to the list.

Let's get back to this image. As I said, this is a valid header block fragment, and I want to parse this header block fragment together with you on the following slides. So pay attention to the first few bits: it's a one, a bunch of zeros, and then a one and a zero, which is a two, right? And then there's some other stuff. If you call the parse function with the header block fragment (I have multiple implementations of the parse function), this is the one which is going to match.

If you take away one thing from this talk, and if you came here to learn how binary protocol parsing works, this is really what I want you to take away, because this is basically all it takes. We're using a few Elixir patterns here, and the first one is that you can deconstruct binaries in function heads and then use function head pattern matching. So I just pass the complete binary as the first argument, and then I'm deconstructing the binary and pattern matching against the pattern I know from the protocol. And this is why you have the format in the comment above, right?
This is how the library came to life: I just pasted the formats into my library and then wrote function heads, and the implementation is, I mean, you'll see this in a second, but it's not that complex. Note that the rest variable we have here is already without the first bit, because we pattern matched the one out. And then there's the empty list and the pid.

The next thing we have to do is parse out the integer, the index, which is an int7. There are multiple representations of integers; we'll see those in a second. For now, let's just take it as given that parse int7 gives you the index. Then you just do a lookup in the table to see what the actual header is. As I mentioned, it's a two, and according to the HPACK table, index two refers to :method GET. So we know the first header. What we do then is prepend the header to the list and call parse again with the rest.

Next, it's the same function head which is matching, so we're going over this a bit faster. You see, the header we just parsed out is in the list now. This time it's index six, and six is :scheme http, and then we are calling parse again.

Again, it's the same clause matching, so it's again an indexed header field. You see the list now has both headers, in reverse order. This time it's index four. So you see there can be path variations in the table, like the slash here. You look up the header, you get this one, prepend it again, and then we're calling parse again.

And now something different is going to happen, because it's not an indexed header field. What we have now is a literal header field with incremental indexing.
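The three indexed lookups walked through above can be sketched as a recursive function with pattern matching in the function head. This is a simplified illustration, not the speaker's code: `IndexedSketch` is a hypothetical module, the table is a three-entry excerpt of the HPACK static table held in a module attribute instead of a GenServer pid, and the index is assumed to fit the 7-bit prefix. A terminating clause is included so the sketch runs.

```elixir
defmodule IndexedSketch do
  # Excerpt of the HPACK static table; the real table is much longer.
  @static %{
    2 => {":method", "GET"},
    4 => {":path", "/"},
    6 => {":scheme", "http"}
  }

  def decode(header_block_fragment), do: parse(header_block_fragment, [])

  # Indexed header field: first bit is 1, followed by a 7-bit index.
  # Assumption: the index is below 127, so no continuation bytes follow.
  defp parse(<<1::1, index::7, rest::binary>>, headers) do
    parse(rest, [Map.fetch!(@static, index) | headers])
  end

  # Nothing left to parse. Headers were prepended, so restore the order.
  defp parse(<<>>, headers), do: Enum.reverse(headers)
end

IO.inspect(IndexedSketch.decode(<<0x82, 0x86, 0x84>>))
```

Prepending and reversing once at the end keeps each step cheap, since prepending to an Elixir list does not copy it.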
So this means only the index of the name is known, but the value is a string. And again, we're doing the deconstructing pattern matching here. There's one thing you might wonder about, which is the modifier that is just an integer. It's a shorthand, basically syntactic sugar for the size modifier. So when you see 0::1, it is basically 0::size(1). That's basically it, and you can even combine modifiers by adding a dash.

So what do we do now? This time the index is encoded in six bits, or in six-plus bits, and I'm parsing that one out. Then we're parsing the string. We're passing the complete block, the two lines at the bottom, to the parse string function, because strings are always encoded in that format. Then we're looking up the header in the table, because we only have an index, and we have to know the actual header name, the key. That's :authority. And due to the fact that it's incremental indexing, we have to add the header to the table, because the next time someone wants to send us this header, they can just reference the index.

What is in the table and what's not is actually not something which is transferred between client and server. If I'm the client and you're the server, and I'm sending you something and say "save it", I know which index it will get if it's the first thing I'm sending you, because I know that the static table only goes up to 61, so the next thing I send you and say "save", I know it's 62. So I know what you have in your table. This is how this works.

And then again, we are prepending the header we just parsed and calling parse again. Now something different is happening again. This is how we terminate recursive function calls, right?
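The size shorthand and the combined modifiers mentioned above can be tried out directly. A minimal sketch with made-up values:

```elixir
# `0::1` is shorthand for `0::size(1)`; both build the same bits.
short = <<0::1, 1::1, 1::6>>
long = <<0::size(1), 1::size(1), 1::size(6)>>
IO.inspect(short == long)          # true

# Modifiers are combined with a dash, here: a binary of exactly 3 bytes.
<<key::binary-size(3), rest::binary>> = "abcdef"
IO.inspect({key, rest})            # {"abc", "def"}
```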
Now the header block fragment is empty; we've parsed everything, so we have to wrap up. We just reverse the header list, because headers have to stay in the order that's mentioned in the RFC. So we're just reversing it, and that's it: we have parsed the headers. This is how a header block fragment gets parsed.

[Audience question] I've never played around with that. You mean speed differences? Yeah, I mean, you won't send that many headers, right? I think if the header list has a hundred headers, that's immense, and with 100 it's maybe not that big of a difference.

Okay, let's wrap up how header parsing works and what the header block fragment we have just parsed looked like. The first byte was the :method GET header, the next one was :scheme http, the next byte was :path /, the next byte was just the key :authority, and all the rest was the string www.example.com. So if you want to use your headers efficiently, and this is hopefully another takeaway from this talk: always use indexed headers as much as you can, and if you can't do that, always add them to the table. But usually this should only be of interest if you're writing a website or something, because otherwise the clients should handle that for you, right? But maybe don't have headers which change all the time when it's not necessary. That could be a takeaway for, you know, usual web consumers.

Okay, then there are two things left in decoding, which are parse string and parse integer. Let's take a look at how string parsing works first, because I want to save the most complicated one for later. String parsing is this. You have this format, and as I mentioned, it's always the same. The first bit indicates whether the string is Huffman encoded or not, so it's one or zero, then there's the actual length, and then there's the string. So how does this work here?
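The byte-by-byte recap of the parsed fragment can be written down as one pattern. This sketch assumes the variant of the example where the host name is not Huffman encoded (the bytes `82 86 84 41 0f` followed by the ASCII text); the slide in the talk may have shown the Huffman-encoded variant. `LayoutSketch` is a hypothetical name.

```elixir
defmodule LayoutSketch do
  # One pattern for the whole example fragment, field by field.
  def split(<<1::1, method::7,             # indexed field, :method GET
              1::1, scheme::7,             # indexed field, :scheme http
              1::1, path::7,               # indexed field, :path /
              0::1, 1::1, name::6,         # literal, incremental indexing
              0::1, len::7,                # raw string, 7-bit length
              value::binary-size(len)>>) do
    %{method: method, scheme: scheme, path: path, name: name, value: value}
  end
end

fragment = <<0x82, 0x86, 0x84, 0x41, 0x0F, "www.example.com">>
IO.inspect(LayoutSketch.split(fragment))
```

Three headers cost one byte each, while the one header whose value is not in the table costs seventeen, which is the efficiency point made in the recap.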
We have the rest. The first thing we do is parse out the length, and this uses the same integer encoding; we will see that in a second. Then we do a pattern match again. You don't have to do this pattern matching in a function head; you can do it just like a pattern match in all your other Elixir code. So we are pattern matching out the actual string value and the rest, because you never know how much there's left.

While we are on this slide, I want to talk very briefly about the binary and the bitstring modifiers. The binary modifier basically says it always has to be complete bytes, so the size always has to be a multiple of eight bits; otherwise it's a match error. With bitstring, it doesn't have to be complete bytes. It could be just, you know, seven bits. That is okay for bitstring, but it's not okay for binary. That's the difference.

Okay, then we have parsed out basically everything, and we are returning the string value and the rest. Oh yeah, and this is how it looks with Huffman encoding. It's basically the same code, except that the function head is different. Again, I've defined parse string two times: the first time the function head says zero in the first bit, the second time it says one, and then, before we return the string value, we actually call Huffman decode. If you're interested in how Huffman decoding works, you can check out the library on GitHub. If you know Huffman coding and are wondering about the decoding table: the decoding table is specified in the protocol. This is something you don't have to transfer; it's given.
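String parsing as described can be sketched with two function clauses that differ only in the first bit. This is an illustration under stated assumptions: `StringSketch` is a hypothetical module, the length is assumed to fit the 7-bit prefix, and Huffman decoding is left out, so the second clause returns the raw bytes with a tag instead of calling a decoder.

```elixir
defmodule StringSketch do
  # First bit 0: the string is sent as raw bytes.
  def parse_string(<<0::1, len::7, rest::binary>>) do
    # An ordinary pattern match in the function body, not in the head.
    <<value::binary-size(len), rest::binary>> = rest
    {value, rest}
  end

  # First bit 1: Huffman encoded. Decoding is out of scope for this
  # sketch, so the still-encoded bytes are returned with a tag.
  def parse_string(<<1::1, len::7, rest::binary>>) do
    <<encoded::binary-size(len), rest::binary>> = rest
    {{:huffman, encoded}, rest}
  end
end

IO.inspect(StringSketch.parse_string(<<0x0F, "www.example.com", 0x82>>))

# binary requires whole bytes, bitstring does not.
IO.inspect(match?(<<_::binary>>, <<1::7>>))      # false
IO.inspect(match?(<<_::bitstring>>, <<1::7>>))   # true
```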
Yeah, so if you check out the library, you will see, I think, 250 lines of just table definition.

All right, now let's come to what I think is the most complicated part: how integers are encoded. Let's recall, this is how the format looks for the indexed header field. For the indexed header field, you have to transfer the index in seven bits. Seven bits is quite good; you can encode quite big numbers in there, so for this example it might work. But in the protocol you can also find integers encoded in four bits, and this might not be enough if you want to say, here's my string, "www.my-super-long-url.com"; the length of that might not fit in four bits, right? So this is why it is called seven-plus or four-plus, because it basically says the integer you're going to parse now takes seven bits, or seven plus however much space it needs. So it's basically unlimited integer size, and I want to talk about how this works now.

Here is a little CS 101 on how you know which decimal is encoded in a binary. You assign the values 1, 2, 4, 8, 16, 32, 64 and so on to the bits, starting on the right side, take only the ones where the bit is set, and then you just do a sum, and you know this binary is 42. But we have computers; they do this stuff pretty well. It's just a little recap here. 42 is quite a small number, so encoding it just works, and this is how it would work in the given header block fragment: you would just encode the number in seven bits, and that's it. But what if we have a bigger number?
Well, a bigger number follows this pattern. If the number does not fit into the space you are given (in our example it's five bits, so think of int5+), you first set all bits to one, and this indicates to the parser that the integer is bigger than the given prefix. In our example that's a prefix of 31, and then we have a rest of 1306. 1306 is still bigger than 127, which is the most we could encode in the seven value bits of the next byte. So we have to encode it with a modulo and a division: we are dividing 1306 by 128 and first encode the remainder, 26. What is important for the parser is that you always set the most significant bit, which is the bit on the very left, to one here, so the parser knows the next byte also belongs to the integer; when the most significant bit is zero, it knows the integer ends here. What we have left now is the quotient, which is 10. The 10 easily fits, so we encode the 10, and this is done.

I think this is very complex. It took me a long time to wrap my head around this. I really want to give a huge shout-out to the Cowboy implementation of this stuff, because the Elixir implementation I have is pretty much inspired by, or maybe copied from, that. So how does this work?
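The prefix integer scheme just described, sketched as an encoder and a matching decoder. This follows the algorithm in the HPACK specification rather than the speaker's or Cowboy's actual code; `IntSketch` and its function names are hypothetical. The worked example is 31 + 1306 = 1337 with a 5-bit prefix.

```elixir
defmodule IntSketch do
  import Bitwise

  # Small values fit into the n-bit prefix directly.
  def encode(int, n) when int < (1 <<< n) - 1, do: <<int::size(n)>>

  # Otherwise fill the prefix with ones and continue in 7-bit groups.
  def encode(int, n) do
    max = (1 <<< n) - 1
    <<max::size(n), encode_rest(int - max)::binary>>
  end

  # Last byte: most significant bit 0 tells the parser to stop.
  defp encode_rest(int) when int < 128, do: <<0::1, int::7>>

  # More to come: most significant bit 1, low seven bits first.
  defp encode_rest(int) do
    <<1::1, rem(int, 128)::7, encode_rest(div(int, 128))::binary>>
  end

  # Returns {integer, rest}. Expects the prefix at the start of `bits`.
  def decode(bits, n) do
    max = (1 <<< n) - 1
    <<prefix::size(n), rest::binary>> = bits
    if prefix < max, do: {prefix, rest}, else: decode_rest(rest, max, 0)
  end

  defp decode_rest(<<0::1, value::7, rest::binary>>, int, shift) do
    {int + (value <<< shift), rest}
  end

  defp decode_rest(<<1::1, value::7, rest::binary>>, int, shift) do
    decode_rest(rest, int + (value <<< shift), shift + 7)
  end
end

IO.inspect(IntSketch.encode(1337, 5) == <<31::5, 154, 10>>)
IO.inspect(IntSketch.decode(<<31::5, 154, 10>>, 5))
```

The middle byte is 154 because it carries the remainder 26 with the continuation bit set (26 + 128), and the final byte is the quotient 10 with the continuation bit cleared.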
Well, this is parse int4, as I mentioned, and here's a pattern match for when we are decoding. If all the first bits in the rest we got passed are one, we know that we have to call parse big int with the rest, and we also pass the prefix size, because we have parse int4, parse int6 and parse int7. Then we have kind of the catch-all: if the first one doesn't match, this one will match, and we know the integer is smaller than the prefix we just mentioned, so we can just parse it out and return it.

Then we have parse big int. Parse big int has a first clause that matches when the most significant bit is zero, so we know that we can stop parsing here. We parse out the rest, then we have the value, and then we do some bit shifting magic here, and that's it. And the other clause is for when the most significant bit is a one: then we know we have to call parse big int again.

Now let's talk about encoding. To make my job here easy, I could say encoding is just decoding in reverse. It basically is, but I want to walk us through this with a few more examples. We have the public function encode, and it gets passed a list. That's why you see the guard here, because we want to make sure it's always a list of headers and never just one header. Then we just call encode with an arity of three, because it's again a recursive call, and we have the header block fragment here; the header block fragment for the starting call is obviously an empty binary.

Then, as you see, we have the private function encode to encode the headers, and this does a case on the table lookup, because we can have three different states. The first one is that the key and the value are known to the table, like :method GET. The key and the value are in the table, so we know we can just encode an indexed header. It could be a key match, so that only the key is in there, like in the authority
example, right? So :authority is in there, but the value has to be encoded literally. And then there's the no-match case, where the header is not in the table at all. The HPACK library I wrote only has these three cases implemented, so there's no never-indexed handling, and the variant that is not added to the table is also not there. We call one of the functions depending on how we want to encode the header, then we have the partial header, and we just append this to the header block fragment, and we have everything encoded.

Now let's take a look at how this encoding works, because this is where the magic happens, right? This is encode indexed, and I think it is pretty straightforward. Again following the format, it just sets the first bit to one, and then you encode the integer as int7.

Up next, there's the literal header field with incremental indexing. The first thing we do is look up what kind of header we have, because we want the header to be added. So we add it to the table, and then we just encode the header. We set the bits according to the scheme, so the first bit is zero and the second is one, then we have the index, and then we append the encoded string as a binary.

Now let's also take a look at how these primitives are encoded, starting with strings: this is encode string. In the HPACK library there's only the case where strings are Huffman encoded. There's never the case, in this library at least, where a string is not Huffman encoded, because I think it's okay to always do it. I mean, there could be the downside of speed, because it uses a few more CPU cycles than the other one.
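The three encoding cases (full match, key match, no match) can be sketched as follows. This departs from the library described in the talk in several labeled ways: `EncodeSketch` is a hypothetical module, the table is a tiny static excerpt in module attributes with no dynamic part, nothing is added to a table, strings are written raw instead of Huffman encoded, and all integers are assumed to fit their prefix.

```elixir
defmodule EncodeSketch do
  # Excerpts of the HPACK static table: full entries and name-only entries.
  @full %{{":method", "GET"} => 2, {":path", "/"} => 4, {":scheme", "http"} => 6}
  @names %{":authority" => 1}

  # The guard makes sure we get a list of headers, never a single header.
  def encode(headers) when is_list(headers), do: encode(headers, <<>>)

  defp encode([], fragment), do: fragment

  defp encode([{name, value} = header | rest], fragment) do
    partial =
      case {Map.fetch(@full, header), Map.fetch(@names, name)} do
        # Key and value known: indexed header field.
        {{:ok, index}, _} ->
          <<1::1, index::7>>

        # Only the key known: literal with incremental indexing, indexed name.
        {:error, {:ok, index}} ->
          <<0::1, 1::1, index::6, encode_string(value)::binary>>

        # Nothing known: literal with incremental indexing, new name.
        {:error, :error} ->
          <<0::1, 1::1, 0::6, encode_string(name)::binary,
            encode_string(value)::binary>>
      end

    encode(rest, fragment <> partial)
  end

  # Raw string: Huffman bit 0, then a length assumed to be below 127.
  defp encode_string(string), do: <<0::1, byte_size(string)::7, string::binary>>
end

headers = [
  {":method", "GET"},
  {":scheme", "http"},
  {":path", "/"},
  {":authority", "www.example.com"}
]

IO.inspect(EncodeSketch.encode(headers), binaries: :as_binaries)
```

Run on the four headers from the decoding walk-through, this produces the same bytes that were decoded earlier, which illustrates the "encoding is decoding in reverse" remark.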
I don't know if that's important. But regarding size: if the string is kind of short, the size would be equal, and if the string is longer and has more repeating characters, the encoded string will be shorter. So I think it's good to always do that, at least for a start.

And then, how are integers encoded? Well, that's the same kind of bit shifting magic. Let's take a look at encode int6. If the integer is small enough, we plainly encode it. If it's not, we set the prefix bits in the binary here and encode the big int. Then there's encode big int: if the value is smaller than 128, which is whatever fits into the seven bits of one byte, we just encode it, setting the most significant bit to zero; otherwise we call encode big int again.

Awesome. If you've read the abstract and the talk description, I also promised to talk a little bit about RFC-based tests. I think "RFC-based tests" sounds like you could fill a complete talk with it. You can't. But let's take a look.

This is the test setup I have. It obviously uses ExUnit, and for setup we just start a table, so here you see a little bit of table code for the first time, at least how it's initialized. And here is a screenshot of the actual RFC, so this is the IETF spec, and I want to zoom in a little bit. They have really sophisticated examples of how the protocol functions. They always have the header list to encode, the plain header list. Then they have a hex dump of the encoded data, then an example of how the decoding process works, and then how the dynamic table looks afterwards. And they don't have this for just one header block fragment; they even have it for subsequent requests. So the first request looks like this, then we make this request, then the table looks like this, and the encoding looks like this.
So I think this is really awesome. When I was implementing this library, I was just copying over their examples as tests, because if I have the tests from the spec passing, I can be super sure that my implementation is correct, right? Because they have such good examples.

Here you see one of these RFC tests. I really shortened lots of stuff here, but this is basically how it works. This is the test for the literal header field with indexing. The first thing I did was setting up the header block fragment mentioned in the test. But as you can see, it looks a little bit different from the one in the actual spec, because they have this hex dump, and that looks different. So what I did was just pasting it in, and then I went over it with a lot of multi-line editing and search and replace, splitting up the four-digit blocks into two-digit blocks, prepending a 0x, and then copying out the comment.

Okay, so this is what the test does: it passes the header block fragment and the table to the HPACK decode function, and then it asserts that the headers I receive are actually what is specified in the test. I also check the table size, because this is also something the spec provides: the expected table size.

But if you remember how the spec looked, I think it would be awesome if my test could just look like that, right? There is a change for that in the actual code, and you can see where this happened. I want to have that one just in the code. I want to have this representation of the hex dump in my test, and not transform it manually all the time. But if you look at this code, it's not valid Elixir code, right? It's not going to work. So what could you do at this point?
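One way to get a spec-style hex dump into a test, sketched here as a plain helper function rather than as the custom sigil the talk goes on to describe. The steps are similar (split into lines, drop the comment column, split into groups, convert hex), but the names `HexDumpSketch` and `to_binary/1` are hypothetical and the conversion uses `Base.decode16!/2` instead of the speaker's approach.

```elixir
defmodule HexDumpSketch do
  # Turns an RFC-style hex dump into a binary. Assumes hex groups are
  # separated by spaces and comments start at the first "|" on a line.
  def to_binary(dump) do
    dump
    |> String.split("\n", trim: true)
    |> Enum.map(fn line -> line |> String.split("|") |> hd() end)
    |> Enum.flat_map(&String.split/1)
    |> Enum.join()
    |> Base.decode16!(case: :mixed)
  end
end

dump = """
8286 8441 0f77 7777 2e65 7861 6d70 6c65 | ...A.www.example
2e63 6f6d                               | .com
"""

IO.inspect(HexDumpSketch.to_binary(dump), binaries: :as_binaries)
```

With a helper like this, the test data can stay in the same shape as the example in the spec, so a mismatch between test and spec is easy to spot by eye.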
We can just come up with our custom sigil to make this work. This is a test helper, and as you can see, the lines in there are kind of long, so I want to walk through it really quickly. Basically, what we do here is split line by line, then we remove the comments, then we split by space into two-byte chunks, then we flatten and filter out the junk, and then we convert the hex into actual integers. I am sure that this is not the optimal solution. I am sure that it works. So if you have improvements to that, again, it's on GitHub, so just feel free to jump on that and improve it. If we jump to the terminal now and execute the RFC tests, you see that even though I have a lot of helper code, Elixir is so awesome that this runs really fast, and it doesn't slow down the tests at all.

Again, the code is all open source and on GitHub. So if you're interested in that, if you want to poke around in it, if you want to work with it a little bit, if you want to improve it, just jump on that and send some pull requests. I mean, it's not the biggest open source library in the world, right? Basically, there's nothing going on, to be honest. There's one pull request. There's one guy, and I really appreciate that, because he's writing an HTTP/2 client and he found a bug, and I don't know if he found a bug in nginx or if he found a bug in my code. We are actually in the middle of figuring this out. So if you want to read really long pull request texts, you're also invited.

This is the thing I might or might not write. If you're interested in how web server code could look, I also want to invite you to come to me and talk. I really want to give you some final thoughts about all the stuff
I did and other stuff I was doing. I think Elixir is awesome, and Erlang as well. So Elixir and Erlang could be swapped one for the other here in this example. As you see, we have Cowboy, right? Cowboy is awesome and has a lot of this stuff, so you can also do this stuff in Cowboy. But Elixir is really great for it: we get a lot of syntactic sugar on top of what you could already do, and I think parsing out this stuff is pretty awesome with pattern matching and function head matching.

Even though time is running out, I really want to take five seconds of fireworks for my company, Jimdo, because they basically enabled me to be here, so they basically enabled you to listen to this stuff. And if you want to chat about how a website builder works, or how working in the most beautiful city in Germany is, just drop by.

My name is Ole Michaelis. If you've got any feedback on my talk, on the stuff I do or don't do, or on what I mentioned or did not mention, please just follow up on Twitter. This is the thing I do best and do the most. I don't have business cards; I don't know why people still carry that dead tree in their pockets. So if you want to do that, just follow me. You can also follow me on GitHub, if you're into that. No one does that. Or check out my own page, my side project. This is really all I've got, and I want to thank you very much for listening to me.