So, hello, I'm Eugene Syromyatnikov, I'm a strace developer, and I'm going to talk about netlink decoding in strace. Probably everyone who decided to come here is familiar with what strace is. It's a diagnostic utility, a syscall tracer, but it's not only a syscall tracer: it has some other capabilities, though they're not so prominent. This is mostly a result of the fact that the interaction between user-space processes and the kernel is not limited to the syscalls themselves, and strace, being a debugging tool and tracer, tries to capture as much as it can of this user-space/kernel interaction. Some of the items mentioned here, like multiplexer syscalls, io_uring, or BPF, are of course themselves syscalls. The issue with them is that they don't behave like other syscalls: they do not impose any specific semantics on their arguments and leave it to the rest of the kernel to implement. The famous, or infamous, example is ioctl, which basically has no semantics of its own; it's just a way to do something with something associated with a file descriptor. But there are also multiplexer syscalls that are pretty much a kitchen sink. For example, fcntl is mostly used for controlling flags associated with a file descriptor, but it also controls locks and seals associated with file descriptors, and some kinds of locks are not associated with the file descriptor itself but with the underlying file. The same goes for prctl, which is an abbreviation for "process control", and so on and so forth. io_uring is particularly egregious, because it basically hides from the usual user-space/kernel boundary that can be inspected with ptrace by introducing an asynchronous kernel mechanism that has to be inspected in a different way.
But we're here mostly to talk about netlink, and it's probably worth noting why netlink is needed in the first place, because, as I already mentioned, there already exists a universal syscall, ioctl, that basically provides the ability for the kernel to offer any kind of service associated with a particular file descriptor. And by virtue of issuing these file descriptors, for example with special syscalls like perf_event_open, signalfd, memfd_create, and so on, you can basically produce file descriptors at will and are not confined to the devices that live in /dev. So ioctl is a vehicle that is used for implementing many kinds of kernel interfaces. Most of them are, of course, device-bound: various kinds of devices like MTD, Video4Linux, RTC, and so forth. But also, by virtue of producing virtual file descriptors, some parts of the kernel that are not directly associated with devices can be controlled via ioctl, such as seccomp or the device mapper. And that is actually kind of problematic. Well, it's not problematic per se, but the usage of ioctl, from both the user-space side and in terms of kernel implementation, involves dealing with lots of different kinds of problems that people are most often not aware of. For example, if you implement some ioctl, you probably want it to work on architectures that support compat processes; the most notorious one is x86, but there's also ARM, MIPS, and several others. You probably also want the ability to extend the interface without adding a new one to replace the existing one every time you need a new field or a new flag — but if the kernel forgot to check the reserved bits of the flags field, user space can pass garbage there, it will pass garbage there, and you basically can't use the remaining bits of that field anymore. And none of this knowledge is part of any specification imposed by ioctl.
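The flag-bits pitfall just described can be illustrated with a small sketch. This is a hypothetical ioctl argument structure and check (none of the names below come from any real driver): the kernel side must reject unknown flag bits and non-zero reserved fields up front, or those bits can never be reused for extensions.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical ioctl argument structure; names are illustrative only. */
struct foo_ioctl_args {
    uint32_t flags;      /* only the low bits are defined so far */
    uint32_t reserved;   /* must be zero to stay extensible */
};

#define FOO_FLAG_A 0x1u
#define FOO_FLAG_B 0x2u
#define FOO_KNOWN_FLAGS (FOO_FLAG_A | FOO_FLAG_B)

/*
 * Kernel-side validation sketch: rejecting unknown flag bits and
 * non-zero reserved fields keeps them available for future extensions.
 * If this check is forgotten, user space will pass garbage there, and
 * the bits are lost for good.
 */
static int foo_ioctl_validate(const struct foo_ioctl_args *args)
{
    if (args->flags & ~FOO_KNOWN_FLAGS)
        return -EINVAL;
    if (args->reserved != 0)
        return -EINVAL;
    return 0;
}
```

A new flag or field can then be added later without defining a whole new ioctl number, because old binaries are guaranteed to have passed zeroes in the unused bits.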
There are numerous guidelines: how to use request numbers, how to implement ioctls, how to handle compat, how to handle extending the interfaces, how to design the structures used by these interfaces in a way that is extensible. But there are still issues associated with using ioctl: people often don't follow any of this and just hard-code ioctl numbers in their code, which brings its own problems, and so on. One of the issues that netlink addresses is basically trying to be a better ioctl: providing a better general facility that allows various parts of the kernel to implement their user-space interfaces. By the way, how many here know what netlink is? Okay, so I don't need to explain what netlink is. Great. So netlink does a lot of heavy lifting by imposing a specific protocol and structure, basically mandating that every part of the message has its type and its length. And the kernel part of netlink has certain facilities in place for parsing and verifying netlink messages passed by user space. So even though there is a lot of boilerplate code, it is still much better than ioctl. Many interfaces, existing ones as well as new ones, decided to switch to netlink, and there is basically a certain shift towards netlink compared to ioctl. Routing configuration has used netlink since netlink was historically created as part of iproute2, ethtool now also uses netlink instead of its historical ioctl interface, and so on. So it would be nice for strace to handle it — and it does, since 2016. It's not a recent feature; it has been there for several years. The initial implementation was done as part of two Google Summer of Code projects: first by Fabien Siron under the mentorship of Gabriel Laskar in 2016, and then by JingPiao Chen, who created a lot of the implementation for handling the netlink protocol, NETLINK_ROUTE, and several other protocols. Yes.
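The type-plus-length framing that netlink mandates can be sketched as a minimal attribute walker. Real code would use the macros from `<linux/netlink.h>` or a library like libmnl; the structure below merely mirrors the UAPI `struct nlattr` layout so the example stays self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors struct nlattr from the kernel UAPI: every attribute carries
 * its own length and type, which is what makes netlink messages
 * parseable and verifiable generically, unlike opaque ioctl payloads. */
struct my_nlattr {
    uint16_t nla_len;    /* header + payload length */
    uint16_t nla_type;
};

#define MY_NLA_ALIGN(len) (((len) + 3u) & ~3u)
#define MY_NLA_HDRLEN ((size_t) MY_NLA_ALIGN(sizeof(struct my_nlattr)))

/* Count well-formed attributes in a buffer, stopping at the first
 * malformed (truncated or undersized) one. */
static int count_attrs(const unsigned char *buf, size_t len)
{
    int count = 0;
    while (len >= sizeof(struct my_nlattr)) {
        struct my_nlattr a;
        memcpy(&a, buf, sizeof(a));
        if (a.nla_len < MY_NLA_HDRLEN || a.nla_len > len)
            break;  /* length must cover the header and fit the buffer */
        count++;
        size_t step = MY_NLA_ALIGN(a.nla_len);
        if (step >= len)
            break;
        buf += step;
        len -= step;
    }
    return count;
}
```

Because each element is self-describing, a parser can skip attributes it doesn't understand — exactly the extensibility property ioctl structures lack.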
Yeah, and since then the netlink decoding implementation in strace has been maintained and extended as time permits. So, a bit about the implementation. Actually, there's not much to talk about, because netlink is a pretty straightforward protocol without many peculiarities. Probably the one major wrinkle strace has is that we handle memory differently: we don't have the netlink messages in local memory, but rather retrieve them from the tracee's memory. So we have pretty elaborate error handling, and instead of handling the full netlink message at once, we read it piecewise and handle the possible errors that may occur during this piecewise retrieval and handling. So rather than relying on libmnl and its model — which basically lets you pass in a netlink message and a set of attribute descriptions and get back a table with all the attributes parsed — we perform a kind of progressive parsing using per-type decoder tables. With regard to testing, it's mostly done the same way as testing of most other parts of strace's decoding capabilities: we synthesize payloads we want to check the parsing of and perform specific syscalls on a netlink socket. Basically, we write the synthesized payload to the socket and check whether the message argument has been printed the expected way. Here's an example of this kind of test. As shown on this slide, we have a set of various helpers that aid this kind of testing: various implementers of netlink interfaces use attribute types in various ways and devise different hierarchies of attributes, and as a result we have a quite extensive set of test helpers. This is one of the simpler ones: it basically checks whether an attribute payload that is interpreted as an object is printed properly, by trying to supply a shorter message and a message that bounds unreadable memory.
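The piecewise approach just described can be sketched as follows. The names here are hypothetical and the real strace decoders are far more involved; this only illustrates the idea of fetching the header and the payload as separate, individually failable steps, with a stand-in for tracee memory that can "fail" past a readable boundary, like an unmapped page would.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors struct nlmsghdr from the kernel UAPI. */
struct my_nlmsghdr {
    uint32_t nlmsg_len;
    uint16_t nlmsg_type;
    uint16_t nlmsg_flags;
    uint32_t nlmsg_seq;
    uint32_t nlmsg_pid;
};

/* Stand-in for tracee memory: reads succeed only below `readable`. */
struct fake_tracee {
    const unsigned char *mem;
    size_t readable;
};

static int fetch(const struct fake_tracee *t, size_t off, void *dst, size_t n)
{
    if (off + n > t->readable)
        return -1;              /* EFAULT-like failure */
    memcpy(dst, t->mem + off, n);
    return 0;
}

enum parse_result { PARSED_OK, HDR_UNFETCHABLE, PAYLOAD_UNFETCHABLE };

/* Progressive parse: fetch the header first; only if that succeeds, try
 * the payload.  A failure at either step is reported separately instead
 * of discarding the whole message. */
static enum parse_result parse_one(const struct fake_tracee *t, size_t off)
{
    struct my_nlmsghdr hdr;
    unsigned char payload[256];

    if (fetch(t, off, &hdr, sizeof(hdr)))
        return HDR_UNFETCHABLE;
    size_t plen = hdr.nlmsg_len > sizeof(hdr)
                  ? hdr.nlmsg_len - sizeof(hdr) : 0;
    if (plen > sizeof(payload) || fetch(t, off + sizeof(hdr), payload, plen))
        return PAYLOAD_UNFETCHABLE;
    return PARSED_OK;
}
```

This is also why the tests deliberately supply truncated messages and messages ending in unreadable memory: each fetch boundary is an error path that has to be exercised.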
So here, on the second line, you can basically see unreadable memory, which is denoted by its address, and some successful parsing. Here's an example of strace output when it traces the ss (show sockets) binary, which is part of the iproute2 program suite. You can see that, even though I tried to trim it down a bit, it's still quite elaborate, because the messages are quite elaborate and some of them have quite extensive headers, so for reference the associated message is illustrated on the right. But you can see here that we try to handle various kinds of attributes and various kinds of data being parsed, like internet addresses, big-endian (network-byte-order) data, and this kind of stuff. The interface index is not decoded, probably because it is zero — likely because the request doesn't have the relevant flags set. Basically, it's the same thing you expect from other parts of strace in terms of decoding capabilities. So let's turn to a more interesting part, which is... well, I think decoding is not so boring — it's not so boring when people break something. For example, one of the sock_diag protocol implementations, namely the SMC protocol, decided — after successfully implementing IPv6 support, the protocol being able to be tunneled on top of IPv6 — to supply the address family it's tunneled over in the inet_diag header field that strace uses to discern which protocol an inet_diag message is associated with. So it basically became impossible for strace to decode this protocol correctly anymore, and it stayed that way for three kernel versions.
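Decoding the network-byte-order payloads mentioned above is mechanical but easy to get wrong; here is a minimal sketch (helper names are mine, not strace's) of converting a big-endian port field and formatting a 4-byte IPv4 address payload the way a tracer would print it.

```c
#include <stdio.h>

/* Many rtnetlink attribute payloads (addresses, ports) are stored in
 * network byte order, so a decoder has to convert before printing. */
static unsigned get_be16(const unsigned char *p)
{
    return ((unsigned) p[0] << 8) | p[1];
}

/* Format a 4-byte big-endian IPv4 address payload as a dotted quad.
 * Byte order conveniently matches the textual order here, so the bytes
 * are printed as-is. */
static void fmt_ipv4(const unsigned char *p, char *out, size_t n)
{
    snprintf(out, n, "%u.%u.%u.%u", p[0], p[1], p[2], p[3]);
}
```

A decoder that forgets the byte-order conversion would, for example, print port 8080 (0x1f90) as 36895 (0x901f) on a little-endian host.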
The funny thing is that it mostly went unnoticed initially, because the main user of this netlink protocol, which is ss (show sockets), doesn't implement dumping of sockets for all protocol families at once, so it doesn't need to discern between messages belonging to different address families; rather, it performs the dump for each address family separately and just ignores this family field. Another part is that most of the time you can figure out how something should be interpreted by looking at the attribute type and knowing where in the message you are, but that's not always the case, because sometimes these attributes or attribute hierarchies are protocol-specific or address-family-specific. One way some parts of the kernel implement this is to provide an additional kind- or address-family-specific attribute that tells what kind of address family it is — which works nicely when you parse the whole netlink message at once, but doesn't go well with progressive parsing. As a result, you have to perform some context tracking and pass this information between the decoders. Luckily, all the parts of the kernel that provide this information about the protocol or address family an attribute hierarchy is associated with first provide the protocol information, and only then the rest that is protocol-specific. So far this works well, but who knows how some implementer will decide to use it in the future, because there is actually another way to provide this kind of information: basically, using the protocol as the type of the containing nested attribute. Unfortunately, at least one place that provides the information this way botched it: almost all families do exactly that, except for the bridge one, which decided it had to be special and doesn't provide this hierarchy. And unfortunately it can't be fixed, because the netlink attribute hierarchy is part of the UAPI, and you can't break the UAPI.
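The context-tracking idea can be sketched like this. All names and attribute numbers below are hypothetical, not taken from any real protocol: the point is only that the family recorded while decoding an earlier attribute decides how a later attribute's payload is interpreted, which is why the kernel emitting the family attribute first matters for progressive parsing.

```c
#include <stdint.h>

/* Hypothetical attribute types for a family-dependent hierarchy. */
enum { ATTR_FAMILY = 1, ATTR_ADDRESS = 2 };
enum { FAM_UNKNOWN = 0, FAM_INET = 2, FAM_INET6 = 10 };

/* Decoder context threaded through progressive parsing: the family
 * seen earlier decides how later attributes are interpreted. */
struct decode_ctx {
    int family;
};

/* Returns the number of address bytes an ATTR_ADDRESS payload should
 * be decoded as, based on previously recorded context; 0 means "family
 * not known yet, fall back to opaque decoding".  An ATTR_FAMILY
 * attribute updates the context as a side effect. */
static int addr_len_for(struct decode_ctx *ctx, int attr_type,
                        const void *payload)
{
    if (attr_type == ATTR_FAMILY) {
        ctx->family = *(const uint8_t *) payload;
        return 0;
    }
    if (attr_type == ATTR_ADDRESS) {
        switch (ctx->family) {
        case FAM_INET:  return 4;
        case FAM_INET6: return 16;
        }
    }
    return 0;
}
```

If the kernel ever emitted the address attribute before the family attribute, a one-pass progressive parser like this would be forced to decode the address opaquely — which is exactly the fragility described above.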
There are also some minor wrinkles. For example, as I mentioned, since netlink is a better ioctl, it is supposed to let netlink interface implementers avoid all, or most of, the issues associated with ioctl. Unfortunately, all of this goes away once someone decides to pass structures as-is as payloads of netlink attributes, which brings back all the issues associated with extending the structures, maintaining compat compatibility, handling variance in the alignment of not-naturally-aligned fields across different architectures, and so on. Despite the fact that this is more or less well known, and these mistakes are less frequent than they used to be, they still happen from time to time. Luckily for strace, most of the time it is at least possible to discern between various versions of the structure based on its size — which is not always possible with ioctl, because some ioctl interfaces do not populate the structure size properly in the ioctl request number. Another fun fact: some netlink interface implementations just ignore the fact that the netlink attribute type field is for types, and use it as an array index.
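Discerning structure versions by payload size, as described above, can be sketched as follows; the structure and field names are hypothetical, but the pattern — a later version appended a field, so the attribute length tells the versions apart — is the common one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical versioned structure passed as a netlink attribute
 * payload: v2 appended a field, so the payload size distinguishes
 * the versions. */
struct foo_stats_v1 { uint32_t packets; uint32_t bytes; };
struct foo_stats_v2 { uint32_t packets; uint32_t bytes; uint32_t drops; };

/* Decode into the newest layout, zero-filling fields absent from older
 * versions; returns the detected version, or -1 for unknown sizes. */
static int decode_foo_stats(const void *payload, size_t len,
                            struct foo_stats_v2 *out)
{
    memset(out, 0, sizeof(*out));
    if (len == sizeof(struct foo_stats_v1)) {
        memcpy(out, payload, sizeof(struct foo_stats_v1));
        return 1;
    }
    if (len == sizeof(struct foo_stats_v2)) {
        memcpy(out, payload, sizeof(struct foo_stats_v2));
        return 2;
    }
    return -1;
}
```

With ioctl, where the size encoded in the request number may be wrong or absent, even this size-based heuristic isn't always available to a tracer.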
Currently we have several protocols supported in strace. Probably the most prominent support is NETLINK_ROUTE: I would say that almost all message types are supported almost fully. We have some issues here and there — for example, some protocol-specific attribute hierarchies, mostly associated with tunnels. And sock_diag is not actually fully supported, because there is one attribute, associated with protocol-specific socket information, that is not properly decoded. But yeah, it's in good shape, at least for these several protocols; there are of course many more. Now, some fun statistics. I have already compared netlink and ioctl several times, and they are pretty much comparable in terms of decoding implementation: the ioctl decoders are by far the most extensive set of decoders present in strace — they account for more than 10% of all of strace's code — and netlink decoding support is actually quite close to that. As I mentioned, we have quite extensive NETLINK_ROUTE support, and one of the most prominent parts of NETLINK_ROUTE is the link set of messages; it's basically the second largest file in the strace code base, right behind the main strace file and the file with utility functions. As for future plans, one area where significant work is lacking is support for generic netlink. Several attempts have been made to support it, and coincidentally we've gotten yet another pull request basically this week, so it can probably be addressed and pave the way for adding various generic netlink protocols to strace. Another big area, which is probably quite important, is the lack of proper netfilter decoding: while we support the IP part of what iproute2 covers, we don't support much of the TC and netfilter side of things. And since the kernel is finally moving to machine-generated code and machine-readable specifications for netlink, it would probably be nice to support those as well, similarly to
the way we support generation of ioctl decoders based on declarative specifications. And this is probably it — any questions? Yes, please.

Q: You mentioned ynl — what is that?

So, it's relatively recent, probably two years old. ynl is a set of netlink protocol specifications for various protocols, written in YAML. This is going slowly: one part is describing existing netlink protocols — for example, there is a lot of effort in fully describing ethtool — and another part is generating the parsers and handlers for this netlink code in the kernel, because, as everyone probably knows, netlink decoders and netlink parsers in the kernel are also boilerplate code that can be easily generated, I would say.

Host: You still have five minutes — do we have any other questions?

Q: In continuation of the previous question: the good thing is that if the kernel used these generated parsers and interfaces, it would reduce the number of all these oddities strace has to work around. What could be done to convince kernel developers to actually do this?

Well, it's basically the same as with ioctl. The first several historical netlink protocols are hand-written and have all these peculiarities, and we basically have to live with them. But as the subsystem matures, it becomes more streamlined, and newer parts — for example, generic netlink — actually have some provisions, like strict specifications for how to handle structures, how to handle arrays, this kind of stuff, and provide specific netlink policies that allow implementing this in a uniform, extensible way. So it will probably be the same as with ioctl: we have a part that is historical and contains all these peculiarities, and we have a generated part that supports the newer interfaces and newer protocols. Yes?
Q: With regard to the output — there are a few examples in the slides, and I must admit it isn't all that comprehensible. What could be done about it?

Almost all strace output is currently tokenized; there are several remaining bits, but a specific push could definitely be made to at least finish that tokenization. The next step is probably for decoders to provide additional information to the output generators that can be used for producing more structured output. We actually have a pull request for that, but it again went nowhere — probably for the third or fourth time. But it's in better shape now: when it was first attempted, in 2015, it was basically a rewrite of every decoder in a structured way, and that was unsustainable. The latest iteration is pretty close to something that is actually upstreamable. At least the tokenization can be completed: we are already almost everywhere — there are some weird parts, for example the device mapper decoder, the decoding of s390-specific syscalls, and ptrace commands, which are not very well tokenized — but otherwise the output can pretty much be interpreted as a stream of tokens, at least.