Okay, I guess we can start. Congratulations for being here on Sunday morning and still doing well. We're going for the fourth talk. We have here Andrew and Jay Lee, and they're going to talk to us about software engineering practices in Rust specifically. So, applause for them, and let's go.

Hi, I'm Hoverbear, or Anna, and this is my friend Jay, or BusyJay. As you can imagine, he's very busy. I'm gonna talk the first half, he's gonna talk the second half. We work at a company called PingCAP, which makes distributed databases. We maintain TiKV, which is a CNCF project. It's a distributed transactional key-value store. Those are some big words; doesn't matter. We operate at a really big scale. We were one of the first major Rust projects in production, and we've been in production for over three years. We're responsible for 15 petabytes of data at over 300 customers worldwide. As you can imagine, we do a lot of engineering. Some of our customers are the largest companies in China. We're used in commerce, where some of our customers have over 290 million users; in banks, where we offer really good SLAs with no data loss; and on ride-sharing platforms that are consuming over 30 terabytes of data every day. This is a very high-performance, safety-critical system, and we really like Rust here. It's perfect.

So how do we go from beginning with Rust to actually doing some proper engineering? You might be at the point where you've read the Rust book, you've published your first crate, you're pretty happy, but now you get the big questions. How do I share my work? How do I migrate to Rust 2018? How do I do all of these other things I don't really want to do, because I just want to hack? And you're probably going like this guy over here. So we're gonna go on a bit of a whirlwind tour. I'm not gonna deep dive on anything; if you want, we can talk about it afterwards outside.

So first, connecting and sharing your work. I've had the pleasure of helping organize quite a few of these conferences and events, or being at them. The Rust community has over 90 meetups worldwide and six conferences. As you can see, there are some in Europe, Russia and Asia, and a couple in the US, but they don't really matter. There's generally an event every few days. Most of them are really welcoming to newcomers, especially if you have a project or something interesting to talk about. So please do; we want more people in more speaker slots.

Okay, next one. How do you get to Rust 2018 from Rust 2015? Because you probably didn't start your project in the last month. It's really easy, because editions operate on the crate boundary, not on some other boundary like Python does. You can just use cargo fix and it will do most of the work for you. The best part is, if you're on 2018, you don't really need to worry about people on 2015, because they can still use your stuff if they use a recent compiler. It's a little bit different when we're talking about compiler releases, because we don't provide the same guarantees there. Travis lets you configure testing against multiple versions. Please do so, and please test on nightly. It's very important; we want to know when things break. And tell people to actually look at your CI script to see which versions you support.

Okay, designing your APIs. When you first make your API you probably just go for the simplest possible one, and that's good; simple is best. But when you're trying to move into real-world use, Rust has a lot of trade-offs. Nothing's a silver bullet.
You're gonna fight over complexity versus convenience, readable errors versus flexibility, and speed versus cost. There's not always a right answer.

So for example, say you just want to accept references to strings. And I want to emphasize: if you're actually cloning the string internally, take a String. Do not clone without the user knowing. It's really mean; they'll hate you. But if you're taking a string reference, you can go impl AsRef<str>, and then in your code you can just call as_ref(). This means you can pass a String, a string reference, or something else, anything that implements AsRef<str>.

Similar with collections. On your first attempt you might accept an Iterator, and then you're gonna try to pass a Vec, and you're gonna realize that doesn't work, and you're gonna look like this guy. The proper way to do it is using IntoIterator, and now you can pass a vector or an iterator. That's because iterators implement IntoIterator. What a surprise.

Now, when you've been writing Rust you've probably seen this warning when you call a function that returns a Result and don't use it. You can actually make your own: if you've got something that needs to be used, you can put this #[must_use] attribute on your code.
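As a minimal sketch of the two signatures just described, assuming nothing beyond the standard library: the names shout, total_len and Label are made up for illustration and are not from the slides.

```rust
/// Accepts anything that can be viewed as a string slice:
/// &str, String, &String, and so on.
pub fn shout(input: impl AsRef<str>) -> String {
    // as_ref() only borrows the input as &str; nothing is cloned here.
    input.as_ref().to_uppercase()
}

/// Accepts a Vec, an array iterator, or any other iterator whose
/// items can be viewed as string slices.
pub fn total_len<I>(items: I) -> usize
where
    I: IntoIterator,
    I::Item: AsRef<str>,
{
    items.into_iter().map(|s| s.as_ref().len()).sum()
}

/// When the value is stored, ask for a String up front so the caller
/// decides whether a clone happens.
pub struct Label {
    text: String,
}

impl Label {
    pub fn new(text: String) -> Self {
        Label { text }
    }

    pub fn text(&self) -> &str {
        &self.text
    }
}
```

The trade-off mentioned in the talk applies: the generic signatures are more convenient to call, at the price of slightly harder-to-read signatures and error messages.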
The #[must_use] attribute is great, because now when someone goes to use your type incorrectly they get the same message, and they go, damn it. If they consume the value, it goes away. Great.

Now, quite often you'll have functions that take optional arguments. Generally it's best to use a builder, but that's not always a good solution for your problem. Remember: trade-offs. If you accept an Into<Option<u64>>, or an Into<Option<T>>, you'll find that when you call into() in there, you get your Option out. This means you can call it with the value, or an Option, or your happy little None. Your user doesn't have to worry about writing Some. This is quite handy.

You can do something similar for variadic arguments, which we don't have. But if you're okay with having some really ugly shit in your code, you can simulate them with tuples. I'm gonna show you a couple of slides on how this works. First you define some structure, because you can't implement foreign traits for types you don't define. Then you write some From implementations for what you want to support being passed in. Then you can accept an Into of whatever. Now all of these calls over here work just fine. It's really ugly, so don't use it all the time, please.
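The three tricks above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the slides: Ticket, page_size, Args and sum are hypothetical names, and the default of 50 is arbitrary.

```rust
/// Ignoring a returned Ticket produces an unused-must-use warning,
/// the same kind of warning an ignored Result produces.
#[must_use = "a Ticket does nothing until it is redeemed"]
pub struct Ticket(pub u64);

impl Ticket {
    /// Consuming the value makes the warning go away.
    pub fn redeem(self) -> u64 {
        self.0
    }
}

/// Optional argument without a builder: callers may pass a u64,
/// Some(u64), or None.
pub fn page_size(limit: impl Into<Option<u64>>) -> u64 {
    limit.into().unwrap_or(50)
}

/// A local type, so that From can be implemented for it.
pub struct Args(Vec<i64>);

impl From<i64> for Args {
    fn from(a: i64) -> Self {
        Args(vec![a])
    }
}

impl From<(i64, i64)> for Args {
    fn from((a, b): (i64, i64)) -> Self {
        Args(vec![a, b])
    }
}

impl From<(i64, i64, i64)> for Args {
    fn from((a, b, c): (i64, i64, i64)) -> Self {
        Args(vec![a, b, c])
    }
}

/// "Variadic" function simulated with tuples of different lengths.
pub fn sum(args: impl Into<Args>) -> i64 {
    args.into().0.iter().sum()
}
```

Calls then look like page_size(10u64), page_size(None), sum(1i64) or sum((1i64, 2i64, 3i64)); the explicit integer suffixes are used here only to keep type inference out of the picture.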
The tuple trick is not for abuse.

One cool trick that we found in the last little while with Rust 2018 is that you can now actually destructure and take from arrays. This was much harder in Rust 2015, and this is thanks to non-lexical lifetimes. This makes designing APIs that take fixed-size arrays much easier, and you can do some really cool optimizations around that. This seems really harmless, and if you're a beginner you might be going, what the hell is special about this? But if you've been writing Rust for a while, this might surprise you. Wish will actually be talking about this later on in his metrics talk at, I believe, one.

A lot of people ask us how we deal with errors. We're doing some really nasty things with errors, and we're trying to move to using the failure library, because we found that the standard Error trait is a little bit insufficient, and a lot of people agree. Why? Because you can't downcast the cause of an error, and you also can't get backtraces. That sucks. The failure crate allows you to have both of these things, and it's really well documented. It goes out of its way to describe four different ways of dealing with errors in your code; three of them I think are good ideas. You've got the prototyping way, where you just pass around some strings. You've got the lazy way, where you just have a normal error type. And if you're writing some really high-reliability software, it describes this other way called the Error and ErrorKind pattern. The first two you can probably figure out just from hearing about them, and this link will take you to the book, which describes them all in excruciating detail.

I'm gonna give you a little bit of a demo of Error and ErrorKind, just so you get an idea of what it is; I don't have time for more. First we define this Error, and it contains this Context thing, which comes from failure. It holds an enum, which we define down here. This is not a fat enum.
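The Error and ErrorKind shape in this demo can be approximated with only the standard library. The sketch below is not the failure crate's API: it swaps failure's Context and Fail for std::error::Error and its source method (which, on a current toolchain, does allow downcasting the cause), and the names ErrorKind::Parse, ErrorKind::OutOfRange, with_cause and parse_port are invented for illustration.

```rust
use std::error::Error as StdError;
use std::fmt;

/// A plain label enum: no payloads, cheap to copy and compare.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum ErrorKind {
    Parse,
    OutOfRange,
}

impl fmt::Display for ErrorKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ErrorKind::Parse => write!(f, "could not parse the input"),
            ErrorKind::OutOfRange => write!(f, "value is out of range"),
        }
    }
}

/// The library's error: a kind plus an optional underlying cause.
#[derive(Debug)]
pub struct Error {
    kind: ErrorKind,
    cause: Option<Box<dyn StdError + Send + Sync + 'static>>,
}

impl Error {
    pub fn kind(&self) -> ErrorKind {
        self.kind
    }

    /// Attach library-specific context (the kind) to a lower-level error.
    pub fn with_cause<E>(kind: ErrorKind, cause: E) -> Self
    where
        E: StdError + Send + Sync + 'static,
    {
        Error { kind, cause: Some(Box::new(cause)) }
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&self.kind, f)
    }
}

impl StdError for Error {
    /// Proxy to the stored cause so callers can walk and downcast it.
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        self.cause.as_ref().map(|e| &**e as &(dyn StdError + 'static))
    }
}

/// A bare kind is enough when there is no underlying cause.
impl From<ErrorKind> for Error {
    fn from(kind: ErrorKind) -> Self {
        Error { kind, cause: None }
    }
}

/// Hypothetical library function using both construction paths.
pub fn parse_port(input: &str) -> Result<u16, Error> {
    let port = input
        .trim()
        .parse::<u16>()
        .map_err(|e| Error::with_cause(ErrorKind::Parse, e))?;
    if port < 1024 {
        return Err(ErrorKind::OutOfRange.into());
    }
    Ok(port)
}
```

A caller who cares can match on kind() or downcast source() to std::num::ParseIntError; a caller who does not care can simply unwrap.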
The enum is just labels, and the variants have these attributes which give a display message. And you can see we have to derive Fail on it. Okay, you've all looked at it long enough; I'm gonna change the slide now.

We implement Fail for our error, and we have to define these two functions, backtrace and cause. Right now we're just proxying to the inner value, but you might do more in your implementation. You also have to implement Display, but that's not a surprise to you. Then we implement some Froms; as you can tell, From and Into are just crazy useful. It's important that you implement From for both the enum and the Context, because this is important on this side. Oh, the next slide, sorry. We also provide some way to get at that context.

Now we can call something that gives you an error, and you can actually add some context that's specific to your library, and the user will be able to downcast and get that original error and all of the context that comes along with it. This means that if your user actually cares about the error, they can dig down really deep and figure out what's wrong. If they don't care, they can still just unwrap it and crash. That's fine: if they don't care, you don't have to care. You don't need context, though; you can just use the enum. That's why we have those two From implementations.

Okay, now let's talk about fuzzing. I know you all like fuzzy things. I did not bring any fuzzy crabs for you, I'm sorry; I'm actually talking about the programming technique. Fuzzing doesn't replace your unit tests, but it does complement them, because it finds out if you've forgotten tests, or you didn't know about them because you weren't that smart. But please note, when you're implementing fuzzing, it shouldn't be part of your normal CI test runs when a new contributor makes a PR.
That's mean, because if you get a failure, which doesn't always happen, that poor contributor is very confused. Travis lets you determine the event type. If it's a cron job, which you can set up, they're not hard, you can detect it, and that's when you should run your fuzzing. That way, every night, or every morning when you wake up, you get a nice failure report saying how bad your code is.

So fuzzing is a way to randomly explore the state space of your code. This means you're not always going to run over the entire space, and this is a good thing, because if you're running over, say, a u64 state space, it could take forever to find one edge case. Fuzzing is only going to run a few thousand cases every time you run it, so it's a great way to find bugs eventually. As you get more and more complex with your fuzzing, you're gonna have a bigger and bigger state space. You might think, ah, it's no problem to run through a u32, and I agree, but a u64 is a lot bigger than a u32.

We recommend that you explore proptest, which is kind of the newest one out. We're actually not using it yet, but it's a really great crate, we've been playing with it, and we'd like to adopt it. It's from the QuickCheck family of fuzzers, so essentially the way you use it is you define some properties that get fuzzed over. It supports this idea of minimization: when you're fuzzing over, say, a string, and it finds one that doesn't work that's really long and crazy, it will go through a whole bunch more iterations to find the most minimal example it can, so that you're only looking at very simple examples that break things. It also remembers regressions for you and keeps testing them to make sure you fixed them. So you can't be lazy, because we all like to be lazy, and tools that keep us honest are good. The way you define how to fuzz over things is quite a bit different in proptest compared to the other QuickCheck-family things: you actually define these strategies, which you can customize quite closely. I think it's a really good alternative to libFuzzer or AFL or QuickCheck itself. You might have different opinions. That's great, I support that, I like opinions.

Here's a little bit of a demo of how proptest works. This is the most basic one I could think of, and it's actually part of their documentation, because it's the most basic one you can think of. We define some add function, and then we use this macro called proptest!, and we define some ranges: you can see a in 0 to 1000, b in 0 to 1000. This means the fuzzer can pick any numbers between 0 and 1000 for both a and b, try to add them together, and make some assertions. So when I run cargo test, it's gonna run a whole bunch of them. This state space is actually quite small, so it will probably only take me maybe five or ten fuzz runs to get through the whole state space. As you can tell, though, this code has no problems. It's very simple. But if we were exploring all of the i32 state space, I might have some problems, because there's such a thing as overflow. But you're probably like, yeah, I don't really care about testing over u32s, I use a whole bunch of complex types. So what about something like a key-value?
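To make the idea concrete without depending on any crate, here is a hand-rolled sketch of what a property test does: sample inputs from a range, check a property, and shrink a failing input toward something smaller. It is not proptest's API and is much cruder than what proptest actually does; XorShift, check and shrink are invented names, and the fixed seed is an arbitrary choice.

```rust
/// Tiny xorshift generator so the sketch needs no external crates.
pub struct XorShift(u64);

impl XorShift {
    pub fn new(seed: u64) -> Self {
        // The xorshift state must never be zero.
        XorShift(seed.max(1))
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in lo..hi (hi exclusive); assumes hi > lo.
    pub fn in_range(&mut self, lo: i32, hi: i32) -> i32 {
        let span = (hi - lo) as u64;
        lo + (self.next_u64() % span) as i32
    }
}

/// The function under test, as in the talk's demo.
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

/// Greedy shrinking: keep halving a, then b, toward `lo` for as long
/// as the property still fails on the smaller input.
fn shrink<P: Fn(i32, i32) -> bool>(mut a: i32, mut b: i32, lo: i32, prop: &P) -> (i32, i32) {
    loop {
        let half_a = lo + (a - lo) / 2;
        let half_b = lo + (b - lo) / 2;
        if half_a != a && !prop(half_a, b) {
            a = half_a;
        } else if half_b != b && !prop(a, half_b) {
            b = half_b;
        } else {
            return (a, b);
        }
    }
}

/// Runs `cases` random cases with a and b drawn from lo..hi.
/// Returns Ok(()) if the property always held, otherwise a shrunk
/// counterexample.
pub fn check<P: Fn(i32, i32) -> bool>(
    cases: u32,
    lo: i32,
    hi: i32,
    prop: P,
) -> Result<(), (i32, i32)> {
    let mut rng = XorShift::new(0x5EED);
    for _ in 0..cases {
        let a = rng.in_range(lo, hi);
        let b = rng.in_range(lo, hi);
        if !prop(a, b) {
            return Err(shrink(a, b, lo, &prop));
        }
    }
    Ok(())
}
```

With the range 0 to 1000 from the demo, the property that the sum is at least as large as each operand holds for every sampled case, while a deliberately wrong property such as add(a, b) < 1000 comes back as a counterexample.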
A key-value is a non-simple type that holds two byte arrays. You can go and implement a function which returns this Strategy thing, and here on that tuple you can see I have these regexes. I can define these regexes to be whatever regex I want, and there are various other things, such as ranges, I can use for this. Then I call prop_map, and I can use it to map to this new structure I defined. So this function will actually return to me an arbitrary Kv value with two vectors with random garbage in them. Now when I write my proptest, I can say kv in this arbitrary function, and it will go and get out a whole bunch of random ones. This is great. As you can guess, though, the state space on this is gigantic, so I could probably run this a million times and still have new outputs. So it doesn't explore the whole state space. It's not a silver bullet. Okay, Jay.

Okay, thank you, Anna. Anna just introduced some methods that help you find bugs. Actually, it's quite easy to find out that an application doesn't work, because it may panic or it will report errors. But reasoning about the bug can usually be hard, because bugs can be unpredictable: a bug may reproduce, or it may not. And usually you will need more information to reason about a bug. I have been in situations where I add some prints and run it again, and repeat this procedure over and over again. So it also needs luck. You might think, I wish time could just step back so that I can collect as much information as I need. Fortunately, there's a tool that can do this for you.
It's rr, a tool produced by Mozilla, just like the awesome Rust. It can record the failure once and then debug the recording deterministically. You can set breakpoints and watchpoints, and execute and reverse-execute. The command to use this tool is quite simple: just rr record, and once you hit a failure, just rr replay.

I will show you an example of how to use rr to debug a Rust program. I implemented a simple program that just simulates a calculation that has a race condition. A fun fact is that when you write code in safe Rust, it's guaranteed that there's no data race, but race conditions are unavoidable. So we have an account. An account has a name and its data; when you initialize an account, it writes its remaining money to the data directly. You can query the remaining, and you can also set the remaining. What we are going to do is the transfer. Transfer will first check if the source account has enough money. If it does not, it returns false. If it does, it sets the remaining on the source and the target. As you can see, this program is just written as an example, so the logic is quite simple.

We can try to run this code. Well, surprise, it failed immediately. Usually we are not that lucky. But it's a good day today. It's Sunday, right? Okay, yeah, Sunday seems a good day. The point is, once we find out that it's hard to reproduce the bug, that it always passes, how can we do it? How can we solve the problem?
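A program of the kind described could look roughly like the sketch below. It is a reconstruction under assumptions, not the code shown on screen: the balance is kept in memory behind a Mutex, and the names Account, remaining, set_remaining and transfer simply follow the spoken description.

```rust
use std::sync::Mutex;

pub struct Account {
    name: String,
    remaining: Mutex<i64>,
}

impl Account {
    pub fn new(name: &str, remaining: i64) -> Self {
        Account {
            name: name.to_string(),
            remaining: Mutex::new(remaining),
        }
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    /// Query the remaining money. The lock is released on return.
    pub fn remaining(&self) -> i64 {
        *self.remaining.lock().unwrap()
    }

    /// Overwrite the remaining money. The lock is released on return.
    pub fn set_remaining(&self, value: i64) {
        *self.remaining.lock().unwrap() = value;
    }
}

/// Racy on purpose: every individual access is locked, so there is no
/// data race, but the check and the update are separate steps. Another
/// thread can change the balance in between, so two concurrent
/// transfers may both pass the check and overwrite each other.
pub fn transfer(from: &Account, to: &Account, amount: i64) -> bool {
    let from_remaining = from.remaining();
    if from_remaining < amount {
        return false;
    }
    // <- the window in which another thread may run
    from.set_remaining(from_remaining - amount);
    to.set_remaining(to.remaining() + amount);
    true
}
```

Run on a single thread this behaves correctly, which is exactly why the bug is hard to reproduce: it only shows up when the scheduler interleaves two transfers inside that window.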
By using rr. Due to the uncertainty of the network, I'm not going to run rr live, so I'll just show you some pictures of how it works. In the command you can see that I use rr to record the program with the -h flag. The -h flag enables chaos mode in rr. rr actually emulates a single-core machine, so it will normally switch to other threads when there is a syscall or when some instruction count is reached. Chaos mode just switches whenever it can, so that we can reproduce bugs very easily. As you can see, there's a 78 at the end. That means I tried 78 times to reproduce this bug.

Finally I reproduced the bug, so I can replay it to see what's happening. When I replay, you can see it shows an interface that is very similar to GDB; actually it uses GDB under the hood. When you replay the failure it will pause at the very beginning. When you type c, which means continue, it will just reproduce what we just recorded. So it panics at line 73, which is here: Anna should already have no money, but she has 20. So let's check out the transfer function. It turns out that the value set here is 20. So we can add a breakpoint on that line; that is, I add a conditional breakpoint: if the remaining minus the amount equals 20, then stop. And I use a special command here, reverse-continue, because the program has panicked and we need to run it backwards to the very point where the error happens. After reverse-continue we see that thread 2 hits the breakpoint, and we just enter an empty command, which repeats the last command, reverse-continue again. It shows that two different threads hit the same breakpoint. That means that's the very place where the race condition is happening.

That's how we use rr. As we just said, it emulates a single-core machine.
So if you are debugging a multi-threaded application, you will have some decrease in performance. The second limitation is that it only works on Linux. Of course, that shouldn't be a problem at all.

Okay, so suppose you use rr to debug an application and you find a bug, and you believe that you've fixed it. How do you prove it? Usually we should write a test case to check that it will never fail again, and to prevent the bug from being brought back accidentally in the future. But writing a stable test case can be hard. Of course, if you are lucky enough, it will be easy. Yeah, Sunday. To stabilize the bug we just saw, we may need a special scheduling policy for the threads. But unfortunately we can't control this; it depends on the operating system. So how can we do it?

We wrote a crate to do these things, named fail. It's a fail point primitive in Rust, and it is inspired by FreeBSD's fail points. Fail points are code instrumentations that allow errors and other behavior to be injected dynamically at runtime. That means you can just use this library to make some threads pause or sleep or yield whenever you want, to simulate failures like an I/O failure or a syscall failure or anything else. This is the GitHub. Yeah, you're welcome to star it.

So let's see how to define fail points. There are three methods to define fail points.
The first one is pretty simple: just define a fail point using some unique name. Generally, when the application runs into the fail point, it will just skip the fail point as if it were never defined. For the second one there is a caution here; I will talk about it later because it's used in a very unique case. The third way is that we add a condition in the middle, and the condition means that the fail point will only be triggered if the condition is true. So if you run some tests concurrently, you can make the fail point take effect only in the specific test you want.

So you have defined a fail point; then how can you configure the actions for a fail point? You can do it either via environment variables, or you can just use the API. It is quite simple: just a pair of the name and the actions. Actions are defined in a format like this. There can be many actions, and only when the first action is not triggered will the second action be triggered. Every action is defined as p, count, task and arguments. p means the probability that the action is triggered, and count means the maximum number of times the task will be executed. Currently the supported tasks include this many, and I think the task names explain themselves.
Well The only truth needs to it's way here is the delay and the sleep delay means that You can just pause the flat for the Given seconds and delay will spin the flat is a busy waiting So let's let's take a look to the example about a fair point configuration This means that the fair point has 20 percent probability to bring a light still alive and have 80% probability to just panic and if it brings still alive, it will bring as many as three times And Let's see how to stable the Race condition we just described and we can see that I Defy a fair point slow update here Because we know that the race condition is that the light 39 fetch Remaining can be can be mutated before like 44 so Actually we should we should hold the lock the whole in the whole function, but it's not so we put a fair point here and We air configuration to make it sleep 100 milliseconds So that in In general if there's no bugs in the kernel Fresh will all block at this at this point so that The steroid will will be always hold so we can run the test to see how it happened I think something turns out wrong. Are we in the same direction? Oh? It's not my computer so Yeah Okay, what are they so it works um Yeah, and so no matter how many times I try it always panic and this this is what we want We use fair point to make a test stable. So when we fix the fix the transfer it should be Not panic again, but how to fix it It's beyond the stock and so you how does fair point work under the hook? Actually, it will maintain a global registry that map the fail point to its configuration so every time the fail point is Sqt it will query if if there is any configuration for the fail point if it is if you execute it the Configuration if not if you just enough So you may wonder that if I define so many fail points in my source code It may affect the performance of the application. It's true. 
So We offer a feature gate We offer a feature guard that can make you this Disable all the fail points at combined time So is is also one of the rest philosophy just pay for what you use Yeah, so it seems we both timed ourselves out to be about 20 minutes and we were both about 12 minutes So thank you very much. We we hope to see what you can build with some of these tips and tools We're hiring distributed systems engineers. You can email me if you want a job We really like remote people we really like open source we like to read research papers We write rust go and see plus plus We hire anyone. We don't really care what you look like or how you identify We like creativity passion and teamwork But we have some time for questions. We have a whole 10 minutes So if you have questions, please ask them if you have comments we can talk in the hall Hmm. So the question was there's some tools that exist that will actually mutate the code to see how robust your code is Those are that sounds really cool. I haven't actually encountered any for rust. Have you Okay, okay, I Think oh, I think that sounds really cool, and I really want to try that now. 
So thank you. More questions?

So the question is: is it easy to automate some of these practices that we talked about using tools? Clippy will get you quite far. As I kind of discussed at the beginning, a lot of these are trade-offs. For example, if you're using some of the Into tricks, you should try to minimize their surface so that they only exist in the public API, because there is a compile-time cost associated with them and some of them do have a small runtime cost. So you do need to be aware that if you're just throwing these everywhere, you're gonna be paying for it. A lot of these work best when you limit them to the public API, and I think a lot of tools might not do that. Definitely, I highly recommend you run Clippy on every build, though.

So in TiKV itself we're currently using libFuzzer, which does not provide the same level of abstractions and tooling, so we have to do a lot of implementation to get it to work properly. It's very useful in tools like databases, though, because when you're storing arbitrary strings and byte arrays you need to be very careful you're not mutating them. Certainly, you only get so much mileage out of fuzzers, though. There are other tools that you might want to explore when you're doing multi-threaded programming, such as Namazu, which will go and chaotically change how some of the threads are scheduled, to help you find bugs in logic instead of just in properties.

So the question is: are we aware of anything that reads contracts and helps us define properties automatically?
Unfortunately, Rust does not have anything like Hoare notation, similar to what you might find in Ada, so it's kind of hard to make assumptions about what the contracts are. I have seen some libraries trying to add things like contracts to functions, but most of them aren't really at the stage where I think they're usable in production. That's certainly something that would be very useful to a lot of people. So if anyone's interested in implementing that, please do, and tell me about it.

So the question is: how do we choose the language we use on projects?

Because, you know, we are writing TiKV, and actually we are part of a bigger project named TiDB. TiKV is the underlying storage for TiDB, and we want it to be more deterministic. Actually TiDB is written in Go, but if we used Go in TiKV it would have, like, the GC problem, and its performance is not as good as Rust. Also, the storage is the foundation of the database.
So we want it to be safe and fast, and that's why we chose Rust to build TiKV.

Yeah, I think the way that drop semantics work in Rust is extremely useful when you need predictable performance, because you always kind of get really reliable function call run times.

So the question is: what would make it so that we would choose Rust for writing TiDB, the thing on top of us, instead of Go? I think we can both agree it's easier to hire Go engineers than Rust engineers right now, particularly in the distributed systems space, because a lot of Go engineers are out there slinging distributed systems code and not a lot of Rust people are writing distributed systems.

And I will say that, although I don't want to admit it, Go is more flexible than Rust, because you don't meet too many compilation errors in Go, so you can get things done very quickly. And in the SQL layer there are many functions and many other quirks that you need to keep compatible with MySQL. So Go will be a more appropriate choice in this case.

We've got time for maybe two more questions.

I would be very happy to chat with you and just give you a huge list, because we work on this stuff all day, and it's our favorite thing in the world. Certainly, there are lots of great blogs out there. You can look at Nick Cameron's blog, withoutboats's blog, Aaron Turon's blog. They all talk about a lot of tools that are coming out, and they try to highlight some really cool things. Definitely, I think the Rust blogging ecosystem is really, really vibrant. Matias actually runs a great blog.
He just presented, and you can find it. We find most of the cool tech we find from blogs. And certainly there's a lot of effort right now to get Rust things into research papers. As you may be aware, there's the RustBelt project that's trying to formalize Rust and things like that, and they're discovering a lot of cool stuff you can do. So certainly I think one of the most beautiful things about Rust is the really heavy research bent on it.

So the question is: does adding a lot of fail points make it hard to maintain your testing suite?

Yes, it does. Actually, we use fail points in production. Currently we execute all the fail point tests in sequence, so there are no races in this situation. But we are planning a feature for fail points that supports context-aware fail points, so that every fail point is bound to specific test cases and they can work concurrently. So I think in that case there will be no more headaches about too many fail points. I think this feature will come before the 1.0 release of fail-rs.

So we're at 43 minutes, so we should probably stop. It's been a real pleasure, and we can definitely continue the conversation outside if you want.