Today I'm going to introduce a library called rust-prometheus. It's a simple library, but you will see how Rust makes this library safe and fast. So let me introduce myself first. I'm Wish, an infrastructure engineer from PingCAP. You may have already noticed this name if you attended earlier lectures. At PingCAP we mainly build two products. One is a distributed transactional SQL database called TiDB, written in Golang, and the other is a distributed transactional key-value database called TiKV, written in Rust. TiDB is just a SQL layer built on top of the key-value database; the key-value database is the storage layer. TiDB and TiKV have many, many customers worldwide, and we have been adopted by banks, internet companies, and enterprises, with more than 15 gigabytes of data. It's pretty large data, and these are all production use cases.

So let me introduce the architecture of our products. TiDB speaks the MySQL protocol, so your application can just use MySQL drivers to talk to it. TiDB acts as a stateless SQL computation layer, and the underlying layer is TiKV, a distributed key-value store built in Rust. At PingCAP we also created and maintain many other Rust crates. For example, rust-prometheus is the library I'm talking about today; rust-rocksdb is a binding and wrapper for the RocksDB database; raft-rs is an implementation of the Raft distributed consensus algorithm; grpc-rs wraps the gRPC C core to be high performance; and fail-rs provides fail points.

So what is Prometheus? It's a systems monitoring and alerting toolkit. Here is a common flow of using Prometheus: usually your application collects metrics using a Prometheus client, then you push the metrics to Prometheus or let the Prometheus server pull your metrics, and finally you use a visualization tool such as Grafana to visualize the metrics you collected.
rust-prometheus is just a Prometheus client; it is not a Prometheus implementation in Rust. Sorry for that, but I promise you will find many interesting things even in this simple, small client library. So let's get started and take a look at a small example of how to use this library.

First of all, you need to define your metrics. There are many different kinds of metrics, like Counter, Histogram, Gauge, and so on. Here, as the code demonstrates, we create a histogram metric named http_request_duration using the macro register_histogram_vec!. The histogram has one label called method; as you can see on the fifth line, there is one label called method. This is actually a histogram vector, because each value of the label gets its own metric, counted independently. That's why it is a histogram vector.

For the second step, you record the metrics. In this example I just use a random generator to produce the values; in a real application you should use an instant to record the duration instead of what I did here, which is just for demonstration. Here the code simulates a request. As you can see, its duration is randomized from 0 to 2 seconds, and its HTTP method is one of get, post, put, delete. We use the line request_duration.with_label_values(&[method]).observe(duration) to record the duration. It means that there was a request with the label specified by the method variable that took the time specified by the duration variable.

Finally, let's serve the metrics to the Prometheus server. There are generally two ways: pull and push. The code here provides a metrics service using the hyper library for Prometheus to pull metrics from. So here the code is for pulling, but the library also provides push functions. To serve the metrics, you just need a TextEncoder, created with TextEncoder::new(), and this text encoder will encode the gathered metrics into the output response.
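The encoded output follows the Prometheus text exposition format. A hypothetical response for the histogram above might look roughly like this; the metric name matches the example, but the bucket boundaries and numbers are made up for illustration:

```
# HELP http_request_duration The HTTP request latencies in seconds.
# TYPE http_request_duration histogram
http_request_duration_bucket{method="get",le="0.5"} 129
http_request_duration_bucket{method="get",le="1"} 271
http_request_duration_bucket{method="get",le="+Inf"} 500
http_request_duration_sum{method="get"} 243.7
http_request_duration_count{method="get"} 500
```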
And the output response looks just like this. So far, you have already collected your metrics into the Prometheus server, so it's time to visualize them. Here, I use Grafana to display the histogram using the Prometheus query language, as you can see in the blue text, and then you can see the graph here. Of course, you are free to use any other visualization tool, but commonly, Grafana is a wise choice.

So far, it looks normal: just a simple metrics library providing metric recording and encoding features. Next, we will see how rust-prometheus provides some unique features by utilizing Rust's advantages. First, Rust makes the library very safe. Why do we care about safety? We use rust-prometheus in our key-value database, TiKV, as you already know, to record and report all kinds of metrics; there are hundreds of metrics. And safety is very critical for TiKV. For example, we don't want crashes. If TiKV crashed, your service would be unavailable. Although TiKV is distributed and fault tolerant, we want to minimize this possibility; we don't want to see that happen, so we want to eliminate crashes. And, most importantly, we want to eliminate data corruption. As a key-value database, it stores data permanently. So if there is a memory safety issue, memory can be corrupted; for example, there may be garbage in a buffer that TiKV is going to flush to disk or transfer to other peers. If this data is corrupted, it's horrible. It means that we lose our data permanently, so we will not let it happen. That's why we care so much about safety.

So now let's begin our case study to see how Rust enables safety. Let me introduce some background. In the library, you can define labels for metric vectors, just like this one: http_request, with method = post, ip = some IP address, and path = /api.
Then, after defining and recording these metrics, you will be able to query a metric for a specific label. For example, you can know how many requests come from this IP address, and how long requests to the /api endpoint take for 99% of requests. So as you can see, labels are pretty useful. In fact, in TiKV we heavily use the label feature. When defining a metric, you use CounterVec; for example, CounterVec::new(), and I provide three label names here. And when I record metrics, I use counter.with_label_values() and provide three label values here.

Here is a restriction: the number of labels you defined and the number of label values you provide must be the same. So how can we enforce that? Normally, we can just check the length at runtime, and if the lengths don't match, we can panic or throw errors. This is very simple, but there are some disadvantages. For example, the check may be hidden in a branch, so your tests may not cover it. And in production, all errors can happen, which means that if your tests don't cover it, this error will surface in production. I can't tolerate that. Also, there is a runtime cost, because you are checking the length at runtime.

So how can we fix that in the Rust way? We can use the type system to enforce label lengths. Here, I first declare a trait called Label. Then I implement Label for different kinds of string arrays: for example, an array containing one string, an array containing two strings, and one containing three strings. These types are all Labels. Next, I take some Label type T in the new function and create a CounterVec containing this type T. And for the recording path, with_label_values also accepts this T.
So as you can see in the usage, if you pass an array with two strings when you create the CounterVec, its T is a two-element string array. And when you use this metric, you also need to pass an array with two strings. If you pass an array with one string, or three strings, or any other length, the compiler will reject your code. In this way, you ensure that the number of labels you pass when you record the metric is exactly the same as when you created it, while you don't have to check the length at runtime. So it's both safe and fast.

As improvements, we may also want these features: we want to be able to define the labels using Into&lt;String&gt; and to use the labels via AsRef&lt;str&gt;. And also, we may want many lengths, for example four, five, six, up to 32 labels, and we don't want to repeat so much code. As you can see here, I repeated three lines for array lengths one to three; if we were going to support 32 lengths, it would be a lot messier. This is possible in Rust, but it's quite complicated, so I just paste a link here; you may refer to it later to see how it works.

Here I demonstrate a code snippet that uses this improved version. As you can see, when we create the counter, we can pass either a string reference or an owned string, because the bound is Into&lt;String&gt;. If you pass an owned string, there is no cost; if you pass a string reference, there will be a string clone. That's what we expect, because we will store these strings in a structure, so it must hold owned strings. But when we use the counter, the bound is AsRef&lt;str&gt;, and as you can see here, you can also pass string references or owned strings; it's all fine. And the fanciest part is that if you define a metric using owned strings, you can also use it with string references, and if you define it using string references, you can also use it with owned strings. It's all fine.
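The core of this trick can be sketched in plain Rust, independently of the actual rust-prometheus code. All names here (Label, CounterVec, with_label_values) are simplified stand-ins, and a plain u64 replaces the real atomic counter:

```rust
// A minimal sketch of compile-time label-length checking, not the actual
// rust-prometheus implementation. The idea: implement a `Label` trait for
// fixed-size string arrays, and make the vector generic over the array type.
use std::collections::HashMap;

trait Label: Eq + std::hash::Hash {}
impl Label for [&'static str; 1] {}
impl Label for [&'static str; 2] {}
impl Label for [&'static str; 3] {}

struct CounterVec<T: Label> {
    // One independent counter per distinct set of label values.
    counters: HashMap<T, u64>,
}

impl<T: Label> CounterVec<T> {
    // `_names` fixes the label *type* (and thus the length) at creation time.
    fn new(_names: T) -> Self {
        CounterVec { counters: HashMap::new() }
    }

    // Recording must use the same array type `T`, so passing a wrong number
    // of label values is a compile error, with no runtime length check.
    fn with_label_values(&mut self, values: T) -> &mut u64 {
        self.counters.entry(values).or_insert(0)
    }
}

fn main() {
    let mut vec = CounterVec::new(["method", "path"]); // T = [&str; 2]
    *vec.with_label_values(["get", "/api"]) += 1;
    *vec.with_label_values(["get", "/api"]) += 1;
    // *vec.with_label_values(["get"]) += 1; // rejected by the compiler
    assert_eq!(vec.counters[&["get", "/api"]], 2);
}
```

Because T is fixed to `[&'static str; 2]` by the `new()` call, any call site that passes a different array length fails to type-check, which is exactly the compile-time guarantee described above.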
It also has all the features we saw previously, that is, the length is ensured at compile time. So if you pass an array containing only one string or three strings, it won't compile.

There are some other interesting cases in the rust-prometheus library. For example, we utilize the Send and Sync markers in Rust. What are Send and Sync? Send means that a type can safely be sent to another thread. Sync means that the type can safely be shared between threads. For example, consider thread-local variables: they are not Send, because if you hold a value in one thread and send it to another thread, the value no longer holds. So thread-local variables are not Send. And we'll see a non-Sync example soon.

We also utilize the must_use attribute. must_use means that the value must be used. For example, in Rust, the Result type is must_use: if you have a Result and you don't put it in a variable or call a method on it, the compiler will warn about your code. In rust-prometheus, we provide a timer that records the elapsed time when it is dropped. If you don't use it, it will be dropped immediately, and that is not what a developer normally expects; developers expect the timer to run for the whole scope. So it is a must_use timer.

Now let's see how Rust enables the library to be very fast. Sorry, I made a mistake on the slide; it should say "fast". So why do we care about performance? Because we record metrics very frequently. We record a lot of metrics, for example durations, scanned keys, skipped keys, and so on, and we have hundreds of metrics recording every second. We also record metrics for all the operations TiKV provides, for example get, put, scan, and so on. So the overhead of the metrics should be very minimal, so that we can know what is happening without sacrificing performance. Now let's study a case: local non-Sync metrics.
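The drop-based timer can be sketched like this in plain Rust. This is not the real rust-prometheus type (that one is a histogram timer); here a hypothetical ScopeTimer records elapsed nanoseconds into a plain slot when it goes out of scope:

```rust
// A sketch of a must_use, drop-based scope timer.
use std::time::Instant;

#[must_use = "the timer records on drop; an unused timer stops immediately"]
struct ScopeTimer<'a> {
    start: Instant,
    total_nanos: &'a mut u128,
}

impl<'a> ScopeTimer<'a> {
    fn new(total_nanos: &'a mut u128) -> Self {
        ScopeTimer { start: Instant::now(), total_nanos }
    }
}

impl<'a> Drop for ScopeTimer<'a> {
    // The elapsed time is observed when the timer goes out of scope.
    fn drop(&mut self) {
        *self.total_nanos += self.start.elapsed().as_nanos();
    }
}

fn main() {
    let mut total = 0u128;
    {
        let _timer = ScopeTimer::new(&mut total); // kept alive for the scope
        std::thread::sleep(std::time::Duration::from_millis(10));
    } // dropped here: the duration is recorded
    assert!(total >= 10_000_000); // at least 10 ms, in nanoseconds
    // Writing `ScopeTimer::new(...);` without binding it would trigger the
    // must_use warning, because the value is dropped immediately.
}
```

The `#[must_use]` attribute is what turns the "dropped immediately" mistake into a compiler warning, as described above.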
Normally, our metrics are global metrics implemented using atomic variables, as you can see here. The advantage of atomic variables is that they can be updated from everywhere, for example from multiple threads. But as you can see, for atomic variables we need to use atomic operations to modify the value; for example, fetch_add takes about 10 nanoseconds on my laptop. To improve that, we can introduce some local variables. This is what a local counter is. Local counters are not Sync, and we flush them back to the global variable periodically. So you achieve the speed while avoiding data race issues: you can just use x += 1 to increase the local counter, and the counter is flushed to the global counter every two or three seconds, for example. Although the flush takes 10 nanoseconds, it's fine, because it only happens once every couple of seconds. So in this way, it is very fast.

The local counter is not Sync because of data races: if you create two threads updating the same non-atomic counter, you won't get the final number you expect. It's a simple data race. Rust provides the Sync marker to express whether a type is shareable between threads, and the local counter is !Sync. By using this technique, we achieve both speed and safety. You can use the local counter in a very fast way, and you can never share it across threads in a wrong way. So it's both fast and safe.

Let's study another case, which is caching in metric vectors. Again, let me introduce some background. As you may already know, metrics with different labels are counted independently. For example, here the code creates a CounterVec with two labels, and then I record with post /, then with get /, then with get /api, and finally with post /.
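The local-counter idea can be sketched like this; the names are assumptions, not the real rust-prometheus API. A Cell makes the type !Sync automatically, so the compiler itself prevents sharing it across threads:

```rust
// A sketch of a cheap, non-atomic local counter that is !Sync and is
// flushed into a shared atomic counter periodically, instead of paying
// for an atomic operation on every increment.
use std::cell::Cell;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

struct LocalCounter {
    local: Cell<u64>,       // Cell is !Sync, so this type cannot be shared
    global: Arc<AtomicU64>, // the real metric, updated only on flush
}

impl LocalCounter {
    fn new(global: Arc<AtomicU64>) -> Self {
        LocalCounter { local: Cell::new(0), global }
    }

    // No atomic operation here: just a plain add.
    fn inc(&self) {
        self.local.set(self.local.get() + 1);
    }

    // Called every couple of seconds; the only atomic operation.
    fn flush(&self) {
        self.global.fetch_add(self.local.replace(0), Ordering::Relaxed);
    }
}

fn main() {
    let global = Arc::new(AtomicU64::new(0));
    let local = LocalCounter::new(global.clone());
    for _ in 0..1000 {
        local.inc();
    }
    assert_eq!(global.load(Ordering::Relaxed), 0); // nothing visible yet
    local.flush();
    assert_eq!(global.load(Ordering::Relaxed), 1000);
}
```

Trying to share `LocalCounter` between threads (for example, wrapping it in an Arc and moving it into a spawned thread) is rejected by the compiler because the type is not Sync, which is the safety half of the fast-and-safe argument above.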
So post / happens two times, and when we get this value, it should be two. get /api was recorded only one time, so when we get it, it should be one, and of course get / is also one. For any other label values, it should be zero. So as you can see, although it is the same metric with the same name, different label values are counted independently.

So what happens inside the with_label_values function? It actually does these things. First, it hashes your labels; here, it hashes post / and gets a u64. Then it performs a hash map lookup. If you have already used these label values, the hash map entry exists, so it can just be returned to you. If these label values are fresh and you have not used them before, it creates a new entry, which is just a zero-initialized atomic variable. So as you can see, the with_label_values function actually does a lot of things: a hash, a lookup, some branching, and finally it returns the metric.

You may notice a simple optimization. Instead of looking up the counter with get /api and then incrementing it, repeated 100 times for the whole process, you can look it up only one time and then increment it 100 times. This is pretty fast, as fast as using a counter without any labels. In TiKV, there are many service endpoints, for example transaction get, transaction batch get, prewrite, commit, and so on. We can use this piece of code to accelerate them in just this way, writing it manually to cache these labels. But you can see it is not DRY at all. Here we have only one label, called service; but what if there are two labels, service and status?
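A simplified sketch of what happens inside with_label_values, and of the caching optimization, might look like this. It is not the real implementation (the real one hashes the labels to a u64 itself and uses real counter types); here a Mutex-protected map and Arc&lt;AtomicU64&gt; stand in:

```rust
// A sketch of a label-to-counter map: look up the label values, create a
// zeroed counter on a miss. Caching the returned handle skips the hash and
// lookup on the hot path.
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

struct CounterVec {
    children: Mutex<HashMap<Vec<String>, Arc<AtomicU64>>>,
}

impl CounterVec {
    fn new() -> Self {
        CounterVec { children: Mutex::new(HashMap::new()) }
    }

    fn with_label_values(&self, values: &[&str]) -> Arc<AtomicU64> {
        let key: Vec<String> = values.iter().map(|s| s.to_string()).collect();
        let mut map = self.children.lock().unwrap();
        // Existing entry: return it. Fresh labels: insert a zeroed counter.
        map.entry(key).or_insert_with(|| Arc::new(AtomicU64::new(0))).clone()
    }
}

fn main() {
    let vec = CounterVec::new();
    // Slow path: one lookup per increment.
    for _ in 0..100 {
        vec.with_label_values(&["get", "/api"]).fetch_add(1, Ordering::Relaxed);
    }
    // Fast path: look up once, cache the handle, then increment directly.
    let cached = vec.with_label_values(&["get", "/api"]);
    for _ in 0..100 {
        cached.fetch_add(1, Ordering::Relaxed);
    }
    assert_eq!(cached.load(Ordering::Relaxed), 200);
}
```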
As you can see, I would have to repeat, for example, transaction get success, transaction get fail, batch get success, batch get fail. It's a lot of code that even exceeds my screen. So how can we solve it? We can solve it using Rust macros, thanks to the powerful Rust. Yeah, I'm quite a fan of Rust. As you can see in the code, this is the macro provided by the rust-prometheus library, called make_static_metric. When using this macro, you just provide your labels: for service, I give it these services, and for status, I give it these statuses, and it takes care of the rest. Using it is pretty simple: you can just write m.transaction.get.success to get a counter and increase it.

So how is this macro actually implemented? It is pretty cool, right? But you may not know how it is written. So let me use some simplified cases to illustrate how this static metric macro is implemented. (What happened to my laptop? Okay, that's fine.) Let's take a look at a simple case. We want a macro that expands to something we can use like this: here is a counter with four label values, and this is what the macro should expand from. After expanding, I should be able to just increase the counter, or get it, or call some other functions. The implementation is pretty simple: I can just create a struct called MyStaticMetric that has two counters, and when you new the struct, it caches the metrics using with_label_values, just like what we did before. So this is the code we write, and this is what we expect: our macro should expand the code on the left into the code on the right.

But as you may know, in rust-prometheus, more than one label is available. For example, you can supply three labels; for the first one, there are two values.
For the second label, there are three values, and for the third one, there are two values. So in order to implement this kind of metric, you actually need some more complicated code, like the code shown on the right. For example, we want to access foo.user.success. You will need three structs: the first struct contains foo and bar fields of the inner-two type; the inner-two type contains three fields of the inner-three type; and the inner-three type contains the actual counter. Here, declarative macros won't work, because they cannot concatenate identifiers to build these nested names. You may also notice that the repeat logic is not the same for every label: the first and second labels expand into a struct we create, but the final label expands into an actual counter. So declarative macros won't work.

Then let's use procedural macros. What are procedural macros? Procedural macros allow creating syntax extensions as the execution of a function. For example, you can create function-like macros; you can create derive macros, for example a custom derive created by me that you can derive for your own types; and you can also use attribute macros. To use procedural macros, first you need to declare an entry in your Cargo manifest file. It stays inside the [lib] section: it's just proc-macro = true. This indicates that your crate is a procedural macro crate, and the compiler will recognize it. Then you need to write a function that accepts a parameter of type TokenStream and produces a TokenStream. It's just a transform function: taking a token stream, transforming it, and producing a new token stream. For example, here I didn't write the body of the function; I just print the debug form of the token stream, so you will see what the token stream looks like.
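The manifest entry described above is just this fragment in the Cargo.toml of the macro crate:

```toml
[lib]
proc-macro = true
```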
And remember that for procedural macros you need to add the #[proc_macro] attribute to the function, so that your function will be called when the macro is invoked. So let's see what happens. On the left side is what our macro users will write: just make_metric! and pub struct MyStaticMetric { foo, bar }. You may notice that the syntax here is very similar to a Rust struct, but actually it's different, because foo and bar have only names and no types. So it is actually invalid syntax in Rust; it's just similar to a Rust struct. In the compiler output, you can see that for this kind of code, it generates these token streams: the first is the identifier pub, the second is the identifier struct, the third is the identifier MyStaticMetric, and then there is a brace group; inside the group there is the identifier foo and a comma punctuation, and then the identifier bar and another comma punctuation. This is what your procedural macro function will get.

Now you will write a function to transform these token streams into the token streams you want. In order to do this, let's first parse them. Normally we use a crate called syn to parse these tokens. As you can see on the right, this is a syn parser implemented by ourselves, called MetricDefinition. It contains a visibility, which here is the pub (if you don't write pub, its visibility is different). It also holds the name you provided, and a list of identifiers, which is just the list of label values you will provide; here it should be foo and bar. So this is the structure that we want to parse from these token streams. Then we write the parser using syn's facilities. First, we just call input.parse(): we are parsing a visibility, so we parse it into the visibility variable. Then we expect a keyword called struct, so we also parse a struct token.
You may ask what happens if the user doesn't write struct; for example, he may write pub enum. In that case, because there is a question mark here, the error propagates, so you just get the error you expect. Next, I expect a name here, so I also parse an identifier. Finally, here come the braces, so I use the braced! macro provided by syn; it parses the braces, and the content inside the braces, as you can see, is just a list of identifiers separated by commas. So I can also use a facility provided by syn: parse_terminated, with the comma token, so it's a comma-separated list, and each item inside is an identifier. Finally, we parse the tokens into a vector: it creates an iterator and collects it into a Vec.

Let's see what happens. I just wrote a parser, and I will use it in this way; you can refer to the syn documentation to learn more, and here I just demonstrate a simple use case. It parses the input token stream using your MetricDefinition parser and generates the structure you want. Here is the structure we parsed from this input: as you can see, the visibility is pub, the name is MyStaticMetric, and there are two values, foo and bar. So far so good.

For the final step, now that we already know what the user supplied, we want to reassemble the token stream into another token stream. Here we can use the quote crate. For example, I want to reassemble the code for the yellow part, so I write the code on the right. The most important thing is the piece marked in yellow: as you can see, I just write quote! with the visibility, the struct keyword, the name, and a counter field for each value. Since there is a pound sign and parentheses followed by a star, it's a repetition: it repeats over every value and repeatedly produces these tokens. So you can write it in a way that looks much like the source code.
As you can see, finally I output this expanded token stream, and it looks just like this. Although the whitespace is different (for example, here there are newlines and there there are none), it's fine; if we ignore the whitespace, they are exactly what we want.

Now let's generate the rest: the impl MyStaticMetric part. I want to transform the token streams into this part, so I naively write code like this: it's the same repetition, and inside the repetition there is #values wrapped in double quotes. Let's see what happens. In the compiler output you will find something you didn't expect: the parameter to with_label_values is the literal string "#values". quote! does not interpolate inside string literals; it just produces that string for you. So what's wrong? Your values are identifiers, but here you want to produce strings. You need to transform the identifiers into strings, and then it will work.

So let's do it: let's transform the identifiers into strings. It is just a simple map. As you can see, data.values.iter().map(...), so I get every identifier, and I create a string literal using LitStr::new. In this string literal I just put the content without modifying it; actually, you can modify the content, for example concatenate it or add a prefix or suffix, whatever you want. You can freely manipulate the identifier and create a string. Finally, we use #value_strings instead of #values here; the value strings are strings, not identifiers, so now it works. This is exactly what we want: we want the macro to generate this piece of code, and the macro generates this piece of token stream. So it's exactly what we want. For more information, you can check out these two links.
For our toy macro, you can check out this gist, and for the full implementation used in rust-prometheus, you can check out this link. It's a pretty complex implementation, because the macro provided in rust-prometheus offers many functionalities; it has a very complex and very powerful syntax, so to support all of this, the macro itself is very long. But as you can see, the core idea is the same: you first parse the tokens, then manipulate them, and finally you generate a new token stream.

So finally, let me talk about some future plans for this library. Currently this library is not 1.0; it's just 0.5 or 0.6, I don't remember, but anyway it's not 1.0. We are planning the 1.0 release. We will adapt it to the Rust 2018 edition, and we will also clean up the API, because this crate was written several years ago, and during these years some features of Prometheus have changed. For example, there is no protobuf exposition format anymore, so we will remove the protobuf support. We will provide type-safe labels; it's currently a prototype, but we will provide it. We will also aim for a smaller library size, and maybe we will support no_std environments, so you can use it in WebAssembly or on some embedded devices. That's fine. In the future, we will also support the summary metric kind. We also continuously want to make the metrics very fast, and we already have an idea: core-local metrics. We are still exploring how to implement it, but with core-local metrics we expect even greater performance. We will also provide an easier-to-use pulling handler, and more to come. Contributions are appreciated. Thank you very much.

You may ask some questions, but I'm not a native English speaker and I may not understand your accent very well, so please speak slowly so that I can understand; I'm very sorry for that. I think maybe there is time for two questions. If there are more questions, you may ask me afterwards.
He asked what type-safe labels really are. So let me turn back. Here, this is the type-safe labels slide. Type-safe means that the number of labels you provide when you define the metric will be the same as when you use it, and it is checked by the compiler instead of at runtime. So it's type-safe. Since it is checked at compile time, you don't need to worry about, for example, the check not being covered by tests, and there is no runtime cost. We are doing it in a type-safe way. So that is type-safe labels.

Oh yeah. He asked a question about whether it's possible to just switch off all of these things so that the performance can be greater. It's of course approachable, but this is not implemented. It could be done with, for example, a feature gate: you configure a feature gate for these things and hide the recording behind it, so you can switch it on and off. When you switch it off, the compiler will not generate the code, so there will be no cost. Are there any other questions? Maybe one more.

So you are asking whether it is fast to use atomic variables; do you mean compared with other kinds of approaches? Okay, so let me show it here. We have actually benchmarked the performance of atomic variables. Usually an update takes about 10 nanoseconds in a single-threaded environment, and when there are more threads, the situation may be worse, because there may be cache contention. That's why we introduced local metrics, and we are also investigating another approach: core-local metrics. Core-local metrics are also implemented using atomic variables, but they avoid cache contention, so in practice they will be faster. But we are open to discussing other implementations; we always want it to be very fast. Thank you.
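The feature-gate idea from the second answer can be sketched like this. The "metrics" feature name and the observe function are hypothetical examples, not part of rust-prometheus:

```rust
// A sketch of compiling metrics away behind a Cargo feature gate.

#[cfg(feature = "metrics")]
fn observe(value: f64) {
    // With the feature enabled, this would record into a real metric.
    println!("observed {}", value);
}

#[cfg(not(feature = "metrics"))]
#[inline(always)]
fn observe(_value: f64) {
    // With the feature disabled, the compiler keeps this empty version,
    // so call sites compile down to nothing.
}

fn main() {
    observe(1.23); // a no-op unless built with --features metrics
}
```

With the feature off, only the empty function exists, so there is genuinely zero runtime cost, matching the answer given in the talk.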