We are a meetup group for people who design and build APIs. Thank you for coming. The topics we usually discuss during our meetups include API design and best practices; the API development process, from specification to development, testing, and so on; the business of APIs; and the future direction of technologies related to APIs. This is our second meetup. We had our first meetup last year, and I think that was pretty awesome, and I hope this one will be pretty cool as well.

Today we have an interesting set of topics from three speakers. First is Joss, who will be sharing on Beyond JSON, exploring different serialization formats. Then we have Gao Xiang, who will be sharing on API gateways and microservices. Then we have Pei Song, who will be sharing about APIs with domain-driven design. So without further ado, let's start with our first speaker.

I'm the first speaker. Hi, everyone. I'm Joss. I'm a software engineer working at a company called BandLab. We're a music startup. Today I'd like to share with you Beyond JSON, or fantastic serialization formats and where to find them. Over the course of preparing for this talk, I realized that "cereal" sounds awfully similar to "serial," as in serialization. Do you see that? Yes. So I'll be using a lot of cereal-related examples in this talk.

For the benefit of those of us who might not be familiar with APIs, I'm going to start with an introduction to what a web API is and what exactly we're talking about. First of all, a web API is basically like a website for your software. It lets software communicate with other software over the network or the internet. My talk will be covering alternative serialization formats, and serialization is a key step in this software-to-software communication process. It answers the question: what language do the systems communicate in? So, some examples of web APIs.
I think all of us are familiar with logging in with Facebook. Logging in with Facebook is essentially calling Facebook's API, Facebook's "website for software," to communicate with Facebook's underlying authentication system. Other APIs include third-party services like Twilio, which is an SMS communication API; Stripe, which is a payments API; and HelloSign, which is an electronic signatures API. And there are also services that allow APIs to communicate with each other. For example, IFTTT is short for "if this, then that." It lets you set up triggers: for example, if you receive an email, trigger something else, maybe an SMS using Twilio's API, or a charge using Stripe's API, or something else. It wires up different APIs to communicate and creates actions that are triggered through APIs. And Zapier is kind of similar. Both are third-party services that utilize APIs to build new things.

So this is the structure of my talk. I'll start with a brief introduction to what serialization is. Then I'll talk about JSON, which is the most common format in today's APIs, followed by MessagePack and Protocol Buffers. And finally, a conclusion. And that's cereal.

OK, so serialization. What is that? Anyone have an idea? OK. So serialization is the process of translating object state, maybe a row in your database, into a format that can be transmitted and reconstructed later. One analogy: you have a letter and you want to send it to someone else. So you put it in an envelope and send it over the wire, and someone opens the envelope and retrieves the state, the message inside. Another analogy: you have a bowl of cereal. You put it in a box and send it to someone, and that someone just opens the box, pours it out, and eats the cereal. For APIs, communication between systems is key, so the main use case for serialization is to transfer data between different systems. This is because systems might be implemented in different languages.
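The serialize, send, and reconstruct cycle described above can be sketched in a few lines of Python. This is a minimal illustration with a hypothetical "cereal order" object, using JSON as the envelope since that is the format we look at first:

```python
import json

# A hypothetical "cereal order" object on the sending system.
order = {"id": 42, "flavor": "corn flakes", "quantity": 2}

# Serialize: translate the object state into a wire format (the envelope).
wire = json.dumps(order)

# ... the string travels over the network ...

# Deserialize: the receiving system reconstructs the state (opens the envelope).
received = json.loads(wire)
print(received == order)  # True
```

The same cycle applies to every format in this talk; only the envelope changes.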
For example, you have a Ruby API and a Node API. They might have different data types, different ways to represent strings. To communicate with each other, they need some kind of shared language that they agreed upon a priori, before they start talking. For example, some of us here might be bilingual. We speak English, but maybe we also speak another language. For the sake of this meetup, though, I'm speaking to you in English. It's a shared language we both have and we both understand. A serialization format is that shared language.

So in this example, you have different software systems written on different platforms, in different languages. They each have their own way of encoding data types. To communicate with each other, they need to somehow translate the data types from their own language into another language. But the problem with this is that for every pair of systems, you need some kind of integration or middle layer to handle that translation. So instead of doing this, we use a common data serialization format that all the systems agree to speak.

And so this is what we'll be looking at: different data serialization formats. The most common is JSON. Your Ruby API returns a JSON response and you consume JSON. It's the de facto serialization format of the web. In this talk, we'll be looking at three different formats: JSON, MessagePack, and Protocol Buffers. In looking at these three formats, we will evaluate a few key attributes. How readable is the format? Does it support types and validation? Does it support schema evolution? And so on. We'll revisit these attributes as we look at each of the three formats.

And with that, let's start with JSON. JSON is today's standard serialization format. It's easy to parse, it's easy to generate and read, and it's human readable. But the drawback is that it has no schema and no type checking.
To sum up, it's easy to work with but not very efficient over the wire, and it has no schemas. So this is what JSON looks like. It's based on JavaScript. It's an object with keys and values. And yeah, this is JSON.

I think of JSON, compared to the other two formats, as like a postcard. You don't need an envelope. You can just send it as is, and it can be read by the browser natively. But the drawback of using plain old JSON is that it has limited types. JSON's types are limited to strings, numbers, booleans, arrays, objects, and null. Beyond that, other types that might be available in statically typed languages, like enums for example, cannot be represented natively in JSON. So some types are lost in translation.

And because JSON is dynamically typed, if you want to validate the format, the shape of the messages, you have to do it by writing code. For example, checking whether an attribute exists, or whether an attribute is an integer or a boolean: these things have to be written down in actual code, rather than having a schema or type system to validate them automatically. So checking that a required attribute exists, checking the type of an attribute, and other validations all have to be done in actual code. And that's not nice.

Has anyone heard of JSON Schema? Yes. So this does exist. JSON Schema is a way to describe the shape of your JSON messages and validate them. So for example, this is an example of a JSON message. You could say that, hey, the messages I'm sending have a firstName that is a string, a lastName that is a string, and other attributes. We can specify a format that we can then use to validate messages, without having to manually write code to check every single one. But this is not perfect.
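To recap the pain of the hand-written approach, here's a minimal sketch of the kind of validation code plain JSON pushes you toward when there's no schema. The field names and rules are hypothetical, just for illustration:

```python
def validate_person(data):
    """Hand-rolled checks that a schema language could express declaratively."""
    errors = []
    # Required attribute exists?
    if "firstName" not in data:
        errors.append("firstName is required")
    # Type check, written out by hand.
    elif not isinstance(data["firstName"], str):
        errors.append("firstName must be a string")
    # Optional attribute, but if present it must be an integer.
    if "age" in data and not isinstance(data["age"], int):
        errors.append("age must be an integer")
    return errors

print(validate_person({"firstName": "Joss", "age": "old"}))
# ['age must be an integer']
```

Every new rule means more code like this, repeated for every message type, which is exactly what a schema language tries to replace.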
And as we'll see, other serialization formats have a more expressive way of solving this problem than JSON Schema. Any questions at this point? Okay, thank you.

So next we'll look at MessagePack. How many of you have used MessagePack? Oh, cool. What do you use it for? (An audience member describes using it to serialize objects for storage.) Ah, yeah, that's a common use case, thanks. (Another audience member mentions a similar format, one closer to Protocol Buffers, the third format we'll look at later.) Yeah, that's pretty interesting. I'll send you a link later.

So, MessagePack. This is the best way I can describe MessagePack: it's like a clown car of dogs, I don't know. MessagePack is like JSON, but it has an efficient binary encoding, meaning it can be packed very compactly. First of all, it's not human readable. But it's smaller; it takes less space. As a result, you can cut your client-server exchange traffic, because the messages are smaller. It has schemas and types as well, but we won't go into that in this talk. The key strength of MessagePack is that it's useful for systems that require low latency and high throughput. It's also used in storage, to reduce the amount of storage you need for your data. In real-life use, it's used for real-time games and other systems that really need that low latency and high throughput. And it can very easily be a drop-in replacement for JSON. If you're already using JSON, switching to MessagePack is like three lines of code.

So this is what JSON looks like, and that's what MessagePack looks like. It's a binary format. Again, it's not human readable. It's not supposed to be, but it takes less space. In most cases it's around a 50% reduction, but it depends on your use case. So I'll do a quick demo of MessagePack. Is it visible? Okay.
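Before the demo, here's a sketch of where that size reduction comes from. This hand-encodes a tiny map following the MessagePack spec (small maps are a single fixmap byte, short strings a single fixstr byte plus the raw bytes, small integers a single byte), so there are no quotes, braces, or spaces on the wire. The values are hypothetical, and in practice a msgpack library does this for you:

```python
import json

# {"id": 7, "name": "Joss"} hand-encoded per the MessagePack spec:
#   0x82          fixmap with 2 entries (0x80 | 2)
#   0xa2 + "id"   fixstr of length 2 (0xa0 | 2), then the raw bytes
#   0x07          the integer 7 as a positive fixint (one byte)
#   0xa4 + "name" fixstr of length 4
#   0xa4 + "Joss" fixstr of length 4
packed = (bytes([0x82, 0xa2]) + b"id" + bytes([0x07, 0xa4])
          + b"name" + bytes([0xa4]) + b"Joss")

as_json = json.dumps({"id": 7, "name": "Joss"}).encode()
print(len(packed), len(as_json))  # 15 vs 25 bytes for the same data
```

That's a 40% reduction even on this tiny message; the structural overhead JSON repeats in every message (quotes, colons, commas) is what MessagePack squeezes out.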
So let's say we have a cereal object, and let's say we have an API that usually returns a JSON string. Using MessagePack, we just convert it to a packed message. Oh, sorry. And that's it. You send this instead of a JSON string. And then on the other side, when you want to consume this message, you just decode the MessagePack back into a JSON string. So it's a simple serialization into MessagePack, and deserialization back into whatever it was. And it's not limited to JSON. It could straight up encode, for example, a Ruby object, or any language's data structures.

So here's a side-by-side comparison between JSON and MessagePack. MessagePack uses its own binary encoding, so it's not readable. So that's MessagePack. The key difference between MessagePack and JSON is that MessagePack has this idea of compressing the messages into a different format for efficient transport. That's the first idea. Next we look at protobuf, which has a couple more interesting ideas.

So Protocol Buffers is, according to Google, "a way of encoding structured data in an efficient yet extensible format." The three key ideas here are: structured, efficient, and extensible. For internal services at Google, apparently they don't use JSON; they use Protocol Buffers for their internal service communication. It's a compact binary format like MessagePack, as we've seen, so it's also efficient. It compiles messages into a smaller, more compact form before sending them over the network. It has schemas, which we'll look at next, and it has client generation, which we'll also look at. So this is an example of a Protocol Buffers schema; they call it a proto file. Essentially, this describes the shape of the messages that you send between systems.
So this Person message has three attributes: an ID, which is an integer; a name, which is a string; and an email, which is also a string. We have annotations that describe whether or not these attributes are required for the message to be valid. And if you notice, there are these numbers next to each field. These field numbers are used for schema evolution. Essentially, over time you'd expect that your system will need changes to the messages it sends, and so how do you support that? We'll take a look at that soon.

The rationale behind Protocol Buffers is this quote. It says: we carefully craft our data models inside our databases, maintain layers of code to keep those models in check, and then allow all that forethought to fly out of the window when we want to send that data over the wire to another service. You might have a very carefully written database schema, but when you send it over the wire using JSON, it just ends up as a string or something else that's not exactly the schema you came up with. So that's schemas.

A key feature of Protocol Buffers is that, given a schema, it can generate the client that consumes the messages automatically. For example, given this schema, you just run a CLI command, and it generates Java client code that can be used to consume messages in this format, and to produce them for other services. And not shown here is the fact that if we didn't set the ID, for example, and then built the message, it would throw an error. All that is done automatically by the generated client.

Schemas are awesome. It's a very expressive type system. It's not just required and optional; there are other features like enums and composite types. You can define a new type, for example, and then use that type in your schema. So it's a pretty expressive system for describing the shape of the data you're sending between services. Next we look at schema evolution.
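To make the field numbers concrete, here's a sketch of how Protocol Buffers puts a field on the wire, following the protobuf wire format: each field is prefixed with a tag of `(field_number << 3) | wire_type`, encoded as a varint, so field names never appear in the bytes, only the numbers. The message contents here are hypothetical, and real code would use the generated client rather than doing this by hand:

```python
def encode_varint(n):
    # Protobuf varints: 7 bits per byte, little-endian groups,
    # high bit set means "more bytes follow".
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_field(field_number, value):
    if isinstance(value, int):                     # wire type 0: varint
        return encode_varint((field_number << 3) | 0) + encode_varint(value)
    data = value.encode()                          # wire type 2: length-delimited
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data

# person { id: 150, name: "Joss" } -- only the numbers 1 and 2 hit the wire.
msg = encode_field(1, 150) + encode_field(2, "Joss")
print(msg.hex())  # 08960112044a6f7373
```

Nine bytes for the whole message, and the tag bytes (`08`, `12`) are exactly why the numbers in the proto file matter: they are the identity of each field on the wire.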
So in most cases, when we're building systems, we only really know something once we start doing it. A few commits later, we might discover that, hey, we're doing this wrong, we shouldn't have this field, or, oh, we're missing a field, so we have to add it. But we also have to support existing clients that are already using messages of a particular format. So the challenge here is: can we evolve the schema over time, adding new fields, without breaking old clients? This is schema evolution, and it's not available in JSON Schema.

So this is pretty interesting. Remember the field numbers? This is where they're used. For example, let's say the previous version of our app uses this particular message format, with an ID, a name, and an email. In the next version of our app, we've changed the schema: we removed one field and added a new field, with a different field number. When this happens, the old code will happily read new messages, because the email field was optional anyway, and to the old code, any optional fields that were deleted in the new schema just get a default value. And the new implementation that uses the new schema will also read messages in the previous format just fine. I know it's a little confusing. Any questions at this point?

(An audience member asks whether you can also take away required fields.) No, unfortunately it doesn't work like that. You cannot delete required attributes, and you cannot add new required attributes either. So it's limited in that sense. Yeah, thank you.

So, that's pretty cool. Just to quickly summarize: when is JSON a good fit? When you want data to be human readable, and when the data is consumed directly by browsers or by JavaScript, so you don't need to serialize it into a different intermediary format.
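Why old code tolerates new messages can be sketched with a toy decoder. This is a simplification, not the real protobuf library (it only handles the varint and length-delimited wire types), but it shows the mechanism: because every field on the wire is tagged with its number and wire type, a decoder knows how many bytes to skip for any field number it doesn't recognize. The message contents are hypothetical:

```python
def decode_varint(buf, i):
    # Read a protobuf varint starting at index i; return (value, next index).
    shift = result = 0
    while True:
        b = buf[i]
        i += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, i
        shift += 7

def decode_known(buf, known_fields):
    """Decode only the field numbers we know; skip the rest by wire type.

    Simplified: only varint (0) and length-delimited (2) wire types."""
    i, out = 0, {}
    while i < len(buf):
        tag, i = decode_varint(buf, i)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:
            value, i = decode_varint(buf, i)
        else:
            length, i = decode_varint(buf, i)
            value, i = buf[i:i + length], i + length
        if field_number in known_fields:      # unknown numbers: silently skipped
            out[known_fields[field_number]] = value
    return out

# A "new" message: id=150 (field 1), name="Joss" (field 2), plus a field 4
# the old schema has never heard of.
new_msg = bytes.fromhex("089601" + "12044a6f7373" + "2203313233")
old_fields = {1: "id", 2: "name"}
print(decode_known(new_msg, old_fields))  # {'id': 150, 'name': b'Joss'}
```

The old decoder reads the fields it knows and steps cleanly over field 4, which is the additive evolution story in miniature: new fields with new numbers don't break old readers.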
And when it's not important for the data model to be tied to a schema. When should we use MessagePack? When low latency or high throughput is key, or for storage considerations, when you want to reduce the amount of storage you need for your data. Both MessagePack and Protocol Buffers are really only recommended for internal communication, because if it's a public API, it's the browser that's consuming it anyway. And Protocol Buffers is pretty interesting because you can serialize structured data and use schemas, and there's this client generation: given a schema, you can generate a client for any language and platform that is supported by Protocol Buffers. It also has a somewhat useful feature for additive schema evolution. Okay, thank you. Thanks.

(Audience: Is there a specific reason, other than your young age, that you didn't look at XML?) Thank you for addressing the elephant in the room. I figured it's close enough to JSON. Sorry. Actually, I did consider looking at other formats, like Avro; I just figured there was not enough time. Thank you.

(Audience: Your MessagePack demo worked with JSON, right? But we also said JSON has no schema, so how does MessagePack handle that? You said MessagePack had a schema.) Oh, that part is optional. In most cases, when people use MessagePack, they don't even use it. And you can actually still use JSON Schema; MessagePack is just converting the string into a smaller string and sending it over the wire. So you can definitely use JSON Schema and MessagePack at the same time.

(Audience: Is there a use case for MessagePack in the browser, or is the deserialization step too costly?) I have seen an implementation using MessagePack to send state updates for games, for example, where you're sending a lot of packets over the wire. And yes, that was in the browser.
So you would have that deserialization step in your game client to receive the state updates. Does that answer the question? (Audience: Yes, a game client running in the browser.) Oh, okay.

(An audience member asks whether each language defines the format independently.) Something I should mention is that MessagePack and Protocol Buffers have implementations in different languages, and they all pack and unpack to the same MessagePack and Protocol Buffers formats. So MessagePack and Protocol Buffers have clients in multiple languages, and they all follow the same spec. It should be language-independent in that sense. Thank you.