Cool, good morning everyone, and welcome to "How to write Rust instead of C and get away with it". Antonio and I are both software engineers at Yelp, and we're here to guide you through it. First of all, I work for Yelp. Yelp's mission is basically connecting people with great local businesses. So, let's get started.

First, why are we here? Yelp is mostly a Python shop. You can imagine a user visiting the Yelp website and hitting some Python code, and our use case today is data serialization. We rely a lot on a format called Apache Avro. You can think of it as JSON, but binary, with schemas. So you have your user hitting some Python code, we need to do data serialization in Avro using the official Avro library written in Python, then we get back the data, magic happens, and you get great reviews for great local businesses.

Very quickly, here is what Avro looks like. As I said, you define a schema. This is what a schema looks like: it's just JSON. It doesn't go inside your data, but you need it to encode your data. So it's a serialization format, and whoever says serialization usually says CPU-heavy, which is not where Python usually shines. A quick example of our data: if you take this schema with a quick instance in JSON, it could look like the first line, and in Avro it's just bytes.

Quick flashback: in 2004, when Yelp was founded, it was all Python. Now flash forward to 2018, with the advent of Docker, services and so on. Yelp is still mostly Python, but we've seen some services written in Java, some in Go, and some in Rust nowadays. So one of the new use cases could be this: a user visits Yelp and hits some Rust code at some point along the path, and then you need a Rust library to do data serialization using Avro. You get back the data, magic happens, and you get great reviews for great local businesses.
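Here is a minimal Rust sketch of how Avro's binary encoding lays out a record, to make "JSON, but binary, with schemas" more concrete. It follows the Avro specification's rules as we understand them. `int` and `long` values are written as zig-zag variable-length integers. A `string` is its byte length (as a `long`) followed by the UTF-8 bytes. A record is just its fields encoded in schema order, with no field names. The schema in the comment and all function names are illustrative assumptions, and this is not the avro-rs implementation.

```rust
/// Zig-zag maps signed to unsigned so small magnitudes stay small:
/// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
fn zigzag(n: i64) -> u64 {
    ((n << 1) ^ (n >> 63)) as u64
}

/// Appends `n` as a zig-zag varint: 7 payload bits per byte,
/// high bit set on every byte except the last one.
fn write_long(buf: &mut Vec<u8>, n: i64) {
    let mut v = zigzag(n);
    while v >= 0x80 {
        buf.push(((v & 0x7F) as u8) | 0x80);
        v >>= 7;
    }
    buf.push(v as u8);
}

/// Avro strings: the byte length as a long, then the raw UTF-8 bytes.
fn write_string(buf: &mut Vec<u8>, s: &str) {
    write_long(buf, s.len() as i64);
    buf.extend_from_slice(s.as_bytes());
}

// Encodes a record for the assumed schema
// {"type": "record", "name": "test",
//  "fields": [{"name": "a", "type": "long"}, {"name": "b", "type": "string"}]}
// Fields are written in schema order; field names never appear in the output.
fn encode_test_record(a: i64, b: &str) -> Vec<u8> {
    let mut buf = Vec::new();
    write_long(&mut buf, a);
    write_string(&mut buf, b);
    buf
}
```

With this schema, the JSON `{"a": 27, "b": "foo"}` becomes the five bytes `36 06 66 6f 6f`. The field names live only in the schema, and the reader needs that schema to decode the bytes.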
So you can see a pattern emerging: you have Python code with an Avro library written in Python, or some Rust code, and so on. The good thing about Rust, which Antonio will get into a bit later, is that it's safe, it's fast, etc. So why stop here? What if we could get something like this: a user on Yelp hits some Python code, which does data serialization using a Rust library instead, one that is hopefully faster than Python at serialization. Magic happens, and then you get great data. That's pretty much it.

Before we move any further, let's stop for a second on what you will hopefully bring home by the end of the talk. First, how to write Rust code in Python packages. Why and when would you do that? Some Python, C and Rust facts, and maybe jokes. There will definitely be jokes, but they might not be worth bringing home. So, before we go any further, let Antonio introduce you to the problem.

As Flav said before, from time to time we are in production, looking at our Python code, at the metrics, at the application. Everything is going fine, but it feels kind of slow, a little bit sluggish. What we want to do here is take Python from being a snail to something a little bit faster. The first thing you should try to make your Python code much faster is scaling your application up or out. And that's it, really. End of the talk, we can go. 99% of the time this is going to work. Questions?

Okay, great, I've got a smart one in the audience. From time to time this isn't possible because of cost: if you want to spin up a hundred instances on AWS, it's going to be a bit of a deal. Or your company has other constraints.
Maybe it takes ten days to get a server, or something like that. In those cases you may try changing the interpreter, and by that I mean using PyPy or others. For the few who don't know it, PyPy is another interpreter for Python. It optimizes your code based on just-in-time compilation, type inference, caching, and this kind of stuff. It's very, very good, and it should be your second option as soon as you want to speed up your code. Or you can use the first and the second option together; they work.

But in case this is not enough, it's time to bring out the big guns, and for Python the big one is the C programming language. As you know, the main Python interpreter is called CPython, and it is developed in C. Because it's in C, you can write C extensions for it. That means you can write C code, compile it, and have your Python application run that C code seamlessly. Just to get a feel for the crowd: how many people have ever written a C extension? Wow, nice. How many people like writing C extensions? Yes, you get the hang of it.

Nowadays you'll find a lot of projects that use ctypes, or even better CFFI, instead of writing C extensions. ctypes is a module in the standard library. CFFI is a library developed by, among others, the people who develop PyPy. So what can you do with ctypes or the CFFI library?
You call, via the ABI (we'll go through it later), functions that are defined and compiled in a C binary. This way you don't need to fiddle with all the internals of the CPython interpreter. You just call these external functions, pass arguments to them, and use the result. It also works on both CPython and PyPy without much difference and with good performance on both, which is where C extensions are kind of lacking at the moment. Our library, pyavro-rs, actually uses this approach.

The third option is Cython. Cython is a compiler for Python that provides a superset of the Python language: you write Cython code, and it gets compiled into C and executed by your application. fastavro, the library we're going to compare against, uses this approach. One thing to remember about Cython: if you want your application to be fast on both PyPy and CPython, you kind of have to duplicate your code. That's because the code you write differs between the two, and PyPy support is, I believe, experimental. Anyway, fastavro does exactly this: it has two versions of the same code, one for PyPy and one for CPython. Since we're going to use CFFI for our code, let me go through it very quickly.
So this is basically the code you need to use CFFI in your application. You need a header file with the declarations of all the symbols you're going to use, plus the binary file where the actual functions and structs live. Then you can call all the functions, or use the structs, through the object you just created. As I mentioned before, the trick here is the ABI. ABI stands for application binary interface. It's a set of rules that lets your Python application, or any application really, talk to C functions: how to put arguments on the stack, how to read return values, that kind of stuff. So the ABI is what lets Python talk to C. How do we make our Rust code callable from a Python application, then? We write those parts in Rust as if it were C, and then it just works.

At this point of the talk you may ask: sure, this all works, but it works in C as well, so why should I bother writing Rust instead of C? Well, if you're at this talk, you've probably heard about Rust before, right? Say yes, nod, please. Great. Rust has some very cool features compared to C. The first one is guaranteed memory safety. Rust gets this from the ownership model encoded in the language, the borrow checker, and a couple of other things. It guarantees that you won't double-free memory, that you won't access memory you shouldn't access, that kind of stuff. If you've ever programmed in C, you know this is fantastic. The second one is concurrency without data races.
Again, that's thanks to the very smart compiler. The third one is zero-cost abstractions: you only pay for the abstractions you use, and these abstractions are much nicer than what C gives you. You have maps, lists, arrays and everything, and they just work. Fourth, modern syntax: Option, Result, pattern matching and much more. And the last one is awesome tooling. I really love the tooling. Cargo is amazing, if you've ever tried it: compiling, installing new applications, linting, formatting. It does everything for you, and it just works. Even cross-compilation just works. Now that I've laid out the problem and our approach, I'll let Flav guide you through the Avro library we wrote. It is now used as the main Avro serialization and deserialization library for Rust.

Thank you. So, let's begin our journey of writing Rust instead of C and embedding everything in Python. First we need some Rust code, so first: how do you write Avro using Rust? We did what anybody would do and opened Google and Stack Overflow, looking for a library that did serialization and deserialization. We wanted it to support Serde. For those of you who don't know it, Serde is a Rust framework for serializing and deserializing any Rust data structure. You only have to implement a serializer and a deserializer. It gives you code generation and lots of cool features you don't even need to worry about, because they just work. Last, we wanted a library that supported the whole Avro specification, which is more complicated than it seems. We looked through all of this and couldn't find a good implementation that met all of those requirements. So we did what any software engineer would do in their free time: we built our own. This is avro-rs, the library we wrote, and it does everything we just mentioned. Very quickly, how to use it in
Rust. So this is just a test struct. The only thing to note here is these two attributes, Deserialize and Serialize. These are the Serde derives, and they're all you need to interact with the library using this struct. Serde does all the rest for us, which is awesome.

Next, I mentioned schemas at the beginning. This is how you define a schema: you pass in a string, and it gives you back a schema. Now, if you want to write data, you need something that can write, which is a writer. You give the writer the schema, so that the data fits the schema, and you also give it something to write into. For the purpose of the talk that's just a Vec, basically a byte array. But it can be anything you can write into: a file, a socket, anything. Then you create a record. This is the automated way to create a record. There's also a more manual way, like creating a record and setting a to 27 and b to "foo". Then you just append records to the writer and flush it. There is some internal buffering, which is why the flush is needed. Now you have Avro bytes.

Once you have your bytes, you might want to read them and get those data structures back. So you create a reader from these bytes and iterate over all the records in the reader, and that's all you need to worry about. That's it for writing Avro in Rust: it's pretty straightforward once you have the library. I'll let Antonio take you through the FFI layer of the presentation.

How are we doing on coffee so far? Still awake? Good, because we're going to have more code. First thing: remember the library Flav showed you twenty seconds ago.
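The writer described above needs two things: "something you can write into", and an explicit flush because of internal buffering. In Rust that maps naturally onto the `std::io::Write` trait. The following std-only sketch shows just that shape. `RecordWriter`, `append`, and the record count returned from `flush` are hypothetical; this is not avro-rs's API. A real Avro writer would also encode values against the schema and, in the object container format, write a header and sync markers. All of that is left out here.

```rust
use std::io::{self, Write};

/// Hypothetical buffered writer: appended records stay in memory until `flush`
/// pushes them to the sink (a Vec<u8>, a File, a TcpStream, ...).
pub struct RecordWriter<W: Write> {
    sink: W,
    buffer: Vec<u8>,
    pending: usize,
}

impl<W: Write> RecordWriter<W> {
    pub fn new(sink: W) -> Self {
        RecordWriter { sink, buffer: Vec::new(), pending: 0 }
    }

    /// Buffers one already-encoded record; nothing reaches the sink yet.
    pub fn append(&mut self, encoded: &[u8]) {
        self.buffer.extend_from_slice(encoded);
        self.pending += 1;
    }

    /// Writes all buffered bytes to the sink and returns how many records that covered.
    pub fn flush(&mut self) -> io::Result<usize> {
        self.sink.write_all(&self.buffer)?;
        self.sink.flush()?;
        self.buffer.clear();
        let flushed = self.pending;
        self.pending = 0;
        Ok(flushed)
    }

    /// Returns the sink, e.g. to get at the bytes written into a Vec<u8>.
    pub fn into_inner(self) -> W {
        self.sink
    }
}
```

The sink is generic, so the same writer works with a `Vec<u8>` in tests and a `File` or `TcpStream` in production. Anything appended but not yet flushed exists only in memory.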
There are two things we need to expose from it. The first one is structs, and we can expose them in two ways. The first way is defining a struct. For example, we define this AvroString here, which we use to pass around smart strings instead of bare pointers. You define all the struct's fields in Rust using some weird types. If you've ever programmed in Rust, you don't usually use `*mut` and friends: these are raw pointers, and they're only used in these FFI layers (FFI is what the acronym stands for: foreign function interface). This way the user of the FFI library can access all the fields of the struct directly. Always remember to specify the `#[repr(C)]` attribute. It makes the Rust compiler lay out the struct in a way that C code can understand.

Otherwise, if you don't want your users to be able to access the struct's fields, you can use opaque handles. We use them a lot in our application as well. You define a handle here, you keep your internal struct, and you cast your struct to this type when you return it. I'll show you in a second.

The second, and last, thing we need to expose is functions. They're pretty easy, but the first time you look at them they feel a bit intimidating (this one is from our library as well). So let's destructure them. The first thing they do is call functions defined in the Rust library. This one is the first function Flav showed you, the one that parses a string to get an Avro schema out of it. What we do in this function is massage the arguments a bit. In this case we dereference the JSON pointer, because it's still a C-like interface, and convert it to a Rust string. That's it.
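The pieces just described can be put together in a minimal sketch: a `#[repr(C)]` string struct, an opaque handle, and a "parse schema" function that massages its argument and boxes the result. All names (`AvroStr`, `AvroSchemaHandle`, `schema_from_str`) are hypothetical. The `Schema` struct only stores the JSON text instead of parsing it, and invalid UTF-8 is replaced lossily where real code would report an error. The `#[no_mangle]` attribute a real export needs is left out; the talk covers it next, together with the macro.

```rust
use std::os::raw::c_char;
use std::slice;

/// A string passed across the FFI boundary as pointer + length (no NUL terminator needed).
/// `#[repr(C)]` gives it the same layout as the equivalent C struct, so C code (and cffi)
/// can read `data` and `len` directly.
#[repr(C)]
pub struct AvroStr {
    pub data: *const c_char,
    pub len: usize,
}

/// Opaque handle: callers only ever hold a `*mut AvroSchemaHandle` and cannot see inside.
/// A zero-sized `#[repr(C)]` struct with a private field is a common way to spell this.
#[repr(C)]
pub struct AvroSchemaHandle {
    _private: [u8; 0],
}

/// The real Rust type behind the handle; a stand-in for the library's parsed schema.
struct Schema {
    json: String,
}

/// Hypothetical "parse a schema from a string" entry point, before any error handling.
///
/// # Safety
/// `input` must point to a valid `AvroStr` whose `data` spans `len` readable bytes.
pub unsafe extern "C" fn schema_from_str(input: *const AvroStr) -> *mut AvroSchemaHandle {
    // Massage the arguments: dereference the C struct and copy into an owned Rust String.
    let s = unsafe { &*input };
    let bytes = unsafe { slice::from_raw_parts(s.data as *const u8, s.len) };
    let json = String::from_utf8_lossy(bytes).into_owned();
    // Box the result so it outlives this call, then hand C an opaque pointer to it.
    Box::into_raw(Box::new(Schema { json })) as *mut AvroSchemaHandle
}
```

The handle came from `Box::into_raw`, so it must eventually go back through `Box::from_raw` to be released, typically inside a dedicated free function. C code should never call `free` on it.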
The second thing we do is the same for the result. Here we use Box, which is basically Rust's way of putting something on the heap instead of the stack; otherwise it would be deallocated at the end of the function. And here, as you can see, we cast it into the opaque handle I showed you before. Given this, the signature of the function becomes a little clearer: you get a pointer to an AvroString, and you return a result with a pointer to an AvroSchema.

The real magic happens in this macro we wrote, which wraps all the functions in the FFI layer. If we look inside, this is what you get after the macro is expanded. First, there's an attribute on top, `#[no_mangle]`. It tells the compiler: please don't mangle the name of my function, because I want to access it afterwards. Then it adds `unsafe extern "C"` to the signature, which tells Rust that this is an unsafe function that is going to be called from C code. Finally, it wraps the logic we wrote before in a little utility that takes care of unwind safety.

Unfortunately, when you write FFI layers in Rust, you are dealing with C code on the other end, so there are some gotchas you have to take care of. The first one, and the reason we wrote that utility, is unwind safety. There is no common way between Rust, C and the other languages to unwind the stack when there is a panic. So we need to catch the panic and unwind the stack the Rust way, so that everything is clean when control is handed back to the C application calling our library. The second one is error codes. The library is going to look like a C library, so we need to define error codes and a way for users to get them. And we need explicit memory management because, yes, the library is pure
Rust, but this thin layer is going to be used by C applications. So we need to be very careful to tell users what is managed and what is owned, what they need to free and what they don't. We also need to duplicate the enums, because unfortunately Rust has this shortcoming at the moment. Complex arguments are a bit of a pain too. A Python dictionary, for example, has to be transformed into some kind of object before you can feed it into your library. And there are many, many more. This is just too much, really.

So what's the solution to this problem? Basically, copy-pasting. Go to our library and copy-paste the utility functions, the macro, all the utility structures and everything, and you're 90 percent there. I would like to point you to a decent library instead of copy-pasting, but there's none I really like. This is our second project with an FFI layer, so on the third one we'll probably write the library, and then I'll point you to it.

The last thing we need is the header file. Remember, to use CFFI we need a header file with the declarations of the symbols, so they can be found in the actual binary. The header file looks exactly like all the definitions I showed you before, but in C syntax: `char *data`, this weird `uintptr_t` (what is that? I don't even know), typedef structs and everything. Again, this is a pain: code duplication, in another syntax. But we have a solution for that too: cbindgen. cbindgen is a fantastic open source project that you can find on GitHub. You just hook it up in your Makefile, and with this little command it auto-generates the header file for you. It even turns your doc strings into comments. It's amazing, trust me. Again, copy-paste it and you're done.
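The macro and the first two gotchas described above (unwind safety and error codes) can be sketched in a few dozen lines of std-only Rust. The sketch also shows a duplicated enum. It is not the macro from our library: `ffi_fn!`, `guard`, `LibError`, `ErrorCode`, and `checked_div` are hypothetical names chosen for illustration. The last error message is kept in a thread-local so C callers could query it. As with the expanded macro described earlier, a real export also needs `#[no_mangle]`. It is omitted here so the sketch compiles unchanged on both older and 2024-edition compilers.

```rust
use std::cell::RefCell;
use std::panic::{self, AssertUnwindSafe};

/// Library-side error: idiomatic Rust, carries data, so it has no plain C equivalent.
pub enum LibError {
    Invalid(String),
}

/// FFI-side duplicate of the error as a plain C enum with fixed discriminants.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCode {
    Ok = 0,
    Invalid = 1,
    Panic = 2,
}

thread_local! {
    /// Message for the most recent failure on this thread.
    static LAST_ERROR: RefCell<Option<String>> = RefCell::new(None);
}

fn set_last_error(msg: String) {
    LAST_ERROR.with(|e| *e.borrow_mut() = Some(msg));
}

/// What a C-facing getter would read (a real one would copy into a caller-provided buffer).
pub fn last_error_message() -> Option<String> {
    LAST_ERROR.with(|e| e.borrow().clone())
}

/// Runs `body` so that neither an `Err` nor a panic escapes into C:
/// errors become codes, and panics are caught and unwound on the Rust side.
fn guard<F: FnOnce() -> Result<(), LibError>>(body: F) -> ErrorCode {
    match panic::catch_unwind(AssertUnwindSafe(body)) {
        Ok(Ok(())) => ErrorCode::Ok,
        Ok(Err(LibError::Invalid(msg))) => {
            set_last_error(msg);
            ErrorCode::Invalid
        }
        Err(_) => {
            set_last_error("panic inside Rust code".to_string());
            ErrorCode::Panic
        }
    }
}

/// Hypothetical macro: adds the `unsafe extern "C"` signature and wraps the body in `guard`.
/// A real exported version would also emit `#[no_mangle]`
/// (spelled `#[unsafe(no_mangle)]` on the 2024 edition).
macro_rules! ffi_fn {
    (fn $name:ident($($arg:ident: $ty:ty),*) $body:block) => {
        pub unsafe extern "C" fn $name($($arg: $ty),*) -> ErrorCode {
            guard(|| $body)
        }
    };
}

// Example export: divides `a` by `b` and writes the result through `out`.
ffi_fn!(fn checked_div(a: i32, b: i32, out: *mut i32) {
    if out.is_null() {
        return Err(LibError::Invalid("out must not be null".to_string()));
    }
    if b == 0 {
        panic!("division by zero"); // caught by `guard`, never crosses into C
    }
    unsafe { *out = a / b };
    Ok(())
});
```

Calling `checked_div(1, 0, &out)` from C returns `ErrorCode::Panic` (value 2) instead of unwinding into the caller. Afterwards, `last_error_message()` reports what happened.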
You don't need to write header files by hand anymore. Now that we know how to FFI, Flav, our resident Python expert, is going to explain how we wrap everything into a decent Python library, so that users won't even notice we're using Rust under the hood.

Cool. So, on to the last step of our journey: Python. Now we have the CFFI layer and the Rust layer, and we don't even need to worry about the Rust layer anymore. The FFI is what we're going to deal with. The only thing we have to do is write a thin layer of Python. It's not great, but that's how it is. If you want to serialize an integer, for instance, you just call the CFFI function `avro_value_int_new`, give it an integer, and that's it. As Antonio said, support for complex data structures isn't really there. So if you want to serialize a Python list, for instance, you create an array and give it a capacity at the beginning. Then you manually append each item to the Avro array, and you return the array at the end. So it's basically writing C in Python.

But Python is a little bit better than C in some cases, and code readability is one of them. Our library is just an example here, and this approach could be reused anywhere. We defined a small class that maps each primitive Python type to a CFFI function. Depending on the type of the data you want to serialize, it calls the right function. If you have a nested value, like a dictionary, it just calls itself recursively. All your users need to worry about is this property at the end, `value`. They don't need to worry about anything else.
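On the Rust side, the functions that the thin Python layer calls could look roughly like this. There is one constructor per primitive type, an array constructor that takes a capacity, and an append function, with the ownership rules spelled out in doc comments. These names (`value_int_new`, `value_array_new`, `value_array_append`, `value_free`) and the `Value` enum are hypothetical stand-ins, not the actual avro-rs FFI.

```rust
/// Hypothetical Rust-side value type behind the handles the Python wrapper juggles.
#[derive(Debug, PartialEq)]
pub enum Value {
    Int(i32),
    Array(Vec<Value>),
}

/// Allocates an int value; the caller owns it until it is moved into a container or freed.
pub extern "C" fn value_int_new(n: i32) -> *mut Value {
    Box::into_raw(Box::new(Value::Int(n)))
}

/// Allocates an empty array with room for `capacity` items.
pub extern "C" fn value_array_new(capacity: usize) -> *mut Value {
    Box::into_raw(Box::new(Value::Array(Vec::with_capacity(capacity))))
}

/// Appends `item` to `array`. On success (returns 0) the array takes ownership of `item`,
/// so the caller must not free it; on failure (returns 1) nothing is consumed.
///
/// # Safety
/// Both pointers must be null or live pointers obtained from the `value_*_new` functions.
pub unsafe extern "C" fn value_array_append(array: *mut Value, item: *mut Value) -> i32 {
    if array.is_null() || item.is_null() {
        return 1;
    }
    match unsafe { &mut *array } {
        Value::Array(items) => {
            // Take ownership of `item`: from now on it is freed together with the array.
            let boxed = unsafe { Box::from_raw(item) };
            items.push(*boxed);
            0
        }
        _ => 1,
    }
}

/// Frees a value and everything it contains. Null is ignored, like C's `free`.
///
/// # Safety
/// `v` must be null or an owned pointer from this API that has not been freed yet.
pub unsafe extern "C" fn value_free(v: *mut Value) {
    if !v.is_null() {
        drop(unsafe { Box::from_raw(v) });
    }
}
```

Serializing the Python list `[1, 2, 3]` then takes one `value_array_new(3)` call, followed by three `value_array_append(arr, value_int_new(n))` calls. The small Python class automates exactly that by dispatching on each item's type.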
They just use the `value` property, and that's all they're going to see. So what does it look like for a Python end user? If you have a schema, you define it by passing the schema as a string. If you want a writer, you create a writer with the schema. If you want to write data, you just do it like this. Then you flush the data, get the bytes, create the reader, and iterate over the items. That's how it looks for the end user: it's just Python code. Underneath, it could be written in Python, in C, in Rust, in anything, and it still looks like Python. We kind of like it.

Now that your users can use your library, you need a way to package it and distribute it to the people who want to use it. First you need the FFI code; we just use git submodules for that. Then, every time there's a change in your FFI layer, you need to make sure you recompile it. The way we do it is with cargo, Rust's package manager and toolbox, which can do the compilation for you. So every time your code changes, make sure you run this command. To package your Python package, you need to define a setup.py, run the commands and everything.

So this is what the setup.py would look like. The only thing you need to worry about is `package_data`. That's the non-Python stuff you're going to include in your package as well. Make sure you include the header file Antonio just mentioned and the actual Rust binary. You also need to make sure `zip_safe` is set to False, because Python packages that contain data files, like binaries in our case, are not zip-safe.
That's just how it is. Last, you need to worry about the wheel. Here is a function that's overcomplicated. You need to make sure you compile for the right Python version, the right ABI version, the right platform. It's a pain, too many things to worry about, and I don't want that.

But milksnake is awesome. Milksnake is an open source tool built by a company called Sentry, and what it gives you is basically... it's awesome. Let's go into it. First, you can specify in setup.py: okay, I have something external that I want to build. In our case it's a Rust package, but it could be anything, C if you want, anything you need to build. In our case we just give it the build command. The next thing to worry about is where the binary is. It's platform-independent: you just tell it where the binary should be and what it should look like, and that's it. Last is the header file. This doesn't change; you just tell it where the header file is. So in the end, this is what your setup.py looks like. The only thing you need to worry about is adding this, and that's it. You can build a Python package, it's going to call the Rust build, and in the end you have your Python package. Trust me, milksnake is awesome.

So now you have a Python package that runs Rust code. But in the end, was it really worth it? Was it really faster? Why bother? I'll let Antonio get into it.

All right. Since we're basically all scientists and engineers, this should be the best part of the talk for us: numbers, charts, graphs. Let's start. This is just a random benchmark. Well, not really random; it's a pretty good one for us. Basically, don't focus too much on the numbers.
It's just encoding and decoding of Avro messages. The left axis is in seconds, so shorter is better. We can see that our code performs decently on both CPython and PyPy. It gets optimized on PyPy for writes, not much for reads, but you still get some optimization on PyPy, so it works fine. Let's compare it with fastavro, the library that uses Cython instead of Rust extensions like we do. As you can see, sometimes they're faster, sometimes we are, but we're more or less there: they're comparable. This means our approach using Rust extensions is comparable to Cython, which is, I would say, the state of the art for running C code in your Python application. Now, this is very important: we actually deliver much better performance on reads. Who knows why, but we do.

Now, just for the sake of it, let's compare both fastavro and pyavro-rs against the pure-Python Avro library, the one maintained by Apache. I'm going to change the scale to help you understand what's going on. This is the scale. As you can see, on CPython the gains are real. On PyPy the gains are still real, less than half or so. But as you can see, PyPy optimizes your Python code really well. That's why, before embarking on this very fun journey, just try changing the interpreter. That journey means writing Rust, exposing it through FFI, and writing a Python wrapper, or else writing a C extension or Cython. Really, do it for me, do it for yourself.

Okay, before closing: I'm sure that deep inside you really, really want to know how much faster the Rust Avro library alone was, even though it's a different language. Well, it's the red column. Yes. So I'll change the scale again.
I'll compare it against our library only, and this is the green column. Yes, Rust is fast, very fast, basically as fast as C.

To conclude, here is the last part of the talk, since we promised it in the title: how to get away with it. That basically means answering two questions. The first one: how do I convince my colleagues? Well, Rust has been the most loved language in the Stack Overflow survey for I don't even remember how many years in a row, and if so many people love Rust, there must be a reason. Second, compilation just works. Again, cargo is awesome, it just works, and cross-compilation just works, as you're aware by now. Third, a fast release cycle: Rust's release cycle is only six weeks, if I remember correctly. So you don't need to wait years to see your favorite feature land, at least in the nightly build. And it's a ton of fun; otherwise Flav and I wouldn't be here.

The second question is actually the most important one: how do I convince my company? Because, you know, they pay the bills. First: wheels. Compile once, install everywhere.
Well, everywhere on a given platform. At Yelp, this is how we distribute this kind of package: we build a Python wheel somewhere, in Docker on Jenkins machines. Then we install it on the same platform in production, without even requiring a cargo installation or the Rust compiler. You don't need to compile any code if you've already packaged the build for the right platform.

Second is FFI interoperability, and this is a plus that only comes with the ctypes/CFFI option. If you write a Rust library and package it behind an FFI layer, it's accessible to basically every language in the world: Java via JNI or JNA, Objective-C, C++. In fact, at Yelp we have another project with some Rust code that is shipped to our website, our Android application and our iOS application via the same mechanism. Rust is as fast as C, which means it's cheap to run, so there are gains there. And it's safer than C, so it's cheap to maintain, which means more gains for your company. Finally, it's used in production by many companies: Yelp, Mozilla, Dropbox and many others, probably bigger than yours. So why not try it out?

So again, what to bring home: write Rust instead of C. Well, that was easy; it was the title. cbindgen is awesome, remember that. Milksnake is awesome, remember that as well. And those are the links for the copy-pasting.

Remember that we are hiring, both in London and in Hamburg. But instead of going through the usual shameless hiring plug, I'm going to tell a story. I think it was three or four months ago. I was pretty pissed off, drinking a coffee with Flav, and I was working on the other Rust project. I was like: "Oh man, I really have a couple of cool features to implement, but there is no good Avro library in Rust. So I guess we'll just drop it and wait for someone to implement it." And Flav was like, "Ah, I see.
Poor you, poor thing." Then, a couple of months later, again drinking coffee (we do love coffee), Flav comes up to me and says: "Oh, by the way, I've written the library you needed. Do you want to try it out?" So if you want to work with awesome colleagues who do your job for you, come join us. Follow us on any social network too. Thank you. Questions?

So, any questions? Right, I don't know how the microphones work over here, so just shout your question. Okay, just talk normally and we'll repeat it. Yes, let's do that.

Sure. No, as I said, we just distribute wheels. For these kinds of libraries we have a Jenkins pipeline with different Docker containers for the various platforms we build for. Every container installs the right dependencies and builds the wheel. Then we upload the wheel to our internal PyPI, and users can just install it directly. So, the question was: how do you distribute these packages? Do you use Python wheels, something else, or do you distribute the source, that kind of thing? And what I said before was the answer. Is that okay? Did that answer the question?

We do not. Yes, what about macOS? We don't ship any macOS code. We just have our iOS library, and for that one I don't know; I'm not on the iOS team. You know what? We just have some Mac Pros that build the thing for Mac.

Next. Okay, I'll just go in order. Oh, we have a mic, great.

Just to complete the previous question: do you have external dependencies when you build the binary library in Rust? I mean, you could. In your case, do you depend on something, or is it standalone? Yeah, we just use standard Rust. We don't have any dependencies, really.
This is just an encoding and decoding protocol, so it's CPU-intensive, with a lot of bytes going up and down, but no more than that. For the other project we do have external dependencies, like SSL and I don't remember what else. What we usually do is install them on the machine, or in the Docker image if it's running in Docker. They're just dependencies installed at the operating system level; we don't package them ourselves.

Thank you very much for sharing. That was a very detailed journey, and very insightful. But the FFI part seemed very painful. I'm just curious: why did you do the whole FFI thing yourselves, instead of using one of the libraries like PyO3 or rust-cpython that would do this for you?

That's a very good question, actually. Let's take PyO3, for example. For anyone who doesn't know it, PyO3 basically makes it possible to call Python from Rust and Rust from Python, and it's very similar to the C extension syntax. You can use it, it's going to work, and it might even be faster than our library. The only problem is that if you use PyO3 or this kind of library, you're bound to Python. With our approach, you can use your FFI layer from any other language if you want to build a wrapper for it. In a company like Yelp, where we support a lot of languages even though Python is our main one, that's a very good pro. And as Flav showed at the very beginning, having different Avro implementations in many languages means they can come with different quirks. Instead, we have just one, and everything gets encoded and decoded by the same code. So I guess that's a good pro. But yes, it was a very good question. There is also another talk about PyO3;
I don't remember which day. Okay, that was one of the shady advertisements, but anyway, go to the talk. Any other questions? We are free. Yep.

You mentioned that you'd write a generalized library for the FFI layer in Rust if you did another project or two in the future. When can we expect that? How much free time do you have? Too much? Soon. Next.

Do you have the same code base for Python 2 and 3, or are there some differences? Very good question: it's exactly the same. Yeah.

We're going to bring the microphone back to the front.

Sorry, from what I saw at the beginning of the talk, you were using CFFI's ABI mode, right? Yes. According to the CFFI project, at least judging by the documentation, that seems to be the more problematic mode and also the slower one compared to API mode. Did you experiment with something like generating a C bridge that compiles into an extension? No, to be fair, no. Okay, because from my experience, for small functions the cost of the call is much higher than the execution time of the function itself. ABI mode gives you the advantage of generality, so it works for many languages, not only Python. But especially on Windows it's quite problematic if you have different toolchains, compiled with GCC or MSVC. Code that you just ship as a shared library might not work that easily on Windows, for example. Yes, you're correct. To be fair, we don't target Windows at Yelp, and I don't own a Windows machine, so I don't really know. But I'll try that out. Why not?
How was it bringing Rust into the organization? When you decided to do this project, I'd imagine you already had a lot of C++ programmers with C++ experience. Didn't you face a bit of an uphill struggle? I'll let Flav answer that. Let me think... I don't think we have that many C++ programmers at Yelp; at least they don't show it. And I guess we just came in with something that was fast, and people were happy about it. Okay.

Actually, I have quite a similar question, but more specific: what about the learning curve for Rust, and how much do you like the borrow checker? I love it when my code compiles. The rest of the time, well, it's just mad at me, and I don't like it. But as for the learning curve, I tried Rust maybe three years ago and didn't like it; it was too hard for me. Then I came back six to eight months ago, and it's fine. You just need to take the time. All the documentation, all the tooling, the community, everything is great, so you have all the tools you need to succeed in writing Rust code. Thanks.

So if you have any other questions, or you just want to offer us a coffee, since they're free, just come and talk to us directly. Thank you very much.