All right. Hello, and welcome to another stream, the first stream of 2019. It's pretty exciting. As some of you may be aware, there's a voting site that you can now use to vote for what you want the next upcoming stream to be. We did a stream where we built this website, and it runs ranked choice voting, so you can rank all the things you would like to see, and it will run an election every time I'm about to run a stream. That way we can collectively decide what we're going to work on next. The current winner is that we're going to port flamegraph to Rust. Flamegraph is a code profiling tool that gives you things like this for your code, saying: this is where your program is spending its time. We're going to take the tool that generates these things and try to port it to Rust. That's going to be a little bit different from any of the streams we've done in the past, because the past streams have often been focused on async computation and that kind of stuff, whereas this is going to be much more straightforward: basically writing a program that does parsing and performance analysis. So it'll be very different from the past streams, but I think it'll be interesting. I don't know what the next thing will be. At some point I will mark this as completed, and then the algorithm is going to run another election and we'll find out what all of you want the next stream to be. All the recordings of past streams are up on YouTube, so if you're watching this live and you didn't know they were recorded: they are recorded, and they're all here. Most of them are under the Rust live-coding playlist.
I'm also running a class at MIT at the moment called Hacker Tools, together with Jose and Anish, and we're basically doing walkthroughs of how to use your computer more efficiently: how to set up programming environments, how to use editors, version control, all of that stuff. These lectures will all be video recorded and posted online, so I will probably link to those either from Twitter or from my YouTube account at some point. Just so you're aware that those will be coming. They will not be under Rust live-coding, of course. One thing I wanted to mention before we start is that I've gotten a bunch of emails and Twitter messages and such where people ask: what is a good project for me to start working on if I want to use Rust but I don't know what to build? So dtolnay released a repository a little while ago called request-for-implementation, and it's pretty neat. It's basically a list of things that they wish existed, and you can submit more if you want. (Sure, I can do that.) They're not really supposed to be research projects or anything, just things that are well defined in scope, small, and easy to implement. So these are a really good starting point if you want to build something on your own. Dashboard, category... I'm guessing there's not a programming... Science and technology, huh. Great. Programming, great. Rust. All right, great.
It's now in the science and technology category. Great. But yeah, this is a really cool place to look at things. Most of them have a design as well for how you might implement it, so you should really go in here and take a look if you're looking for projects to start on. The other thing is, if you're looking for something larger, like if you've already written some programs and you want to have an impact, however you want to define that: I have a bunch of crates that I have built up through the years. Some of them I maintain; some of them I built and now I'm not really doing much with them. These are listed as either passively maintained or looking for maintainer. So if you go to crates.io and go to my user, take a look. You probably want to sort by recent downloads, because that will show you the most active crates. If you see anything here where you're like, that's cool, I want to help maintain that, reach out, because I totally want more maintainers for these projects. Fantoccini, for example: I would love to have more maintainers for that. (For the next live-coding session? That's fine.) So yeah, if you want something bigger to maintain, take a look here and see if anything takes your fancy. So that's those. Are there any questions before we start dealing with flamegraph? Because that's the next thing we're going to do. No? Great. Okay, so: flame graphs. It turns out that when you're writing any kind of performance-sensitive software, very often you want to figure out why your program is as fast or slow as it is. There are lots of tools to do this; one of the most common ones is called perf. And so what perf will do, in fact I can show you here. Is that font large enough? I think it probably should be, but I'll make it a little bit larger. So now I'm on a remote machine that has Noria, one of the research projects I'm working on, and I'm going to run a benchmark now.
I'll make it run a little bit shorter. So this is going to run a benchmark. If I SSH to this machine, you'll see that it's really busy. It's doing a lot of work. So we're just going to let that run for a little bit. "You forgot to click update." Damn it, did I? Let's try that again while the benchmark is running in the background. You are totally right, I did forget to click update. Programming, and also probably software development. English. Sure, let's try that again. All right. So when I run this benchmark, it generates results and says, hey, this is how well I did. But this doesn't really tell me anything about what was slow and what was fast. What was my CPU doing such that all eight cores of this machine were totally busy? Well, this is where perf comes in. Perf has a bunch of different commands that are useful. If you run perf stat, it will print information about the process that's running... Apparently it did not want to do that. Oh, sorry, when it finishes. "On YouTube the video is a bit pixelated on and off." That's interesting. Well, I don't know what that is. Let's see, it seems to be sending a full 1080p stream, so I'm not entirely sure. Bitrate 2400. Well, everything seems fine as far as I can tell from here. Maybe my Wi-Fi is acting up. The recording should be 1080p anyway, so what I'll do is just increase the screen size a little, and then hopefully the pixelation isn't too bad. How's that? Is that somewhat readable?
Okay, so notice that I ran this command with perf stat, and what that does is, after the program has finished running, it tells me a bunch of interesting stats about my program: how much time it spent executing, how many cycles, how efficient the execution was, how many times it switched between threads, how many branch misses; lots of information that may be useful in performance debugging. But this still doesn't tell me anything about where in the code that I wrote, and there's like 60,000 lines of code in this project, I am spending my time. Well, for that there is perf record. Perf record is going to sample your program every so often and basically check where your program is. It sort of stops the CPU and asks it: what are you currently executing? And if you sample this lots and lots of times over the execution of a program, you learn which places in your code are hot, which are the things that are executed more often and where you're spending most of your time. Top-level stats: they're aggregated over all threads and the entire execution of the program. So perf record is going to sample throughout the execution and then afterwards give you information about where the CPU is spending the most time. Now, you probably want to include -g, which, in addition to asking where it is, also keeps track of the call stack. So you don't just learn that a bunch of time was spent in HashMap get, but you learn that it was HashMap get called from somewhere else, to keep the call graph. You often also want to include --call-graph dwarf. This uses the DWARF debugging symbols, which gives you a lot more information about the names of functions and basically gives you a more precise call graph, but also causes it to generate much larger files. So let's run with just -g first to see what it looks like. So now it's going to run my program.
It does come with a little bit of CPU overhead. But just to show you why this matters: this is going to execute for like 20 seconds or so, and what it does is produce a file called perf.data. So once this finishes... Great. You see perf record here says it was woken up a bunch of times to write data, and it wrote 42 megabytes of perf data. So there's now a file called perf.data in my current folder. And if I do perf report, then perf report parses that file and prints out where time was spent, and then I can expand. For example, a bunch of time was spent in Future poll. You see here there are a lot of things that I can't really dive deeper into. The call stack is there, but it seems to stop at weird points, and this is one of the drawbacks of not using DWARF information: often the backtraces you get, especially for complicated programs, just aren't all that good. So we're going to record with --call-graph dwarf, and that's going to record a lot more information about what's going on at each sample. The result is that there's a little bit higher performance overhead when recording, and the file also ends up being a lot larger, because there's a lot more information in it. You saw the previous file was 42 megabytes and had 622,000 samples. This one generated a four gigabyte file with 506,000 samples, so fewer samples and significantly more data. But if I now run perf report, I think -g is the default, but we'll see. So now... That's annoying. There's a bug in perf on my particular kernel that sometimes means that perf can't read the file when you include DWARF. That's a little unfortunate.
I don't know if I have a way to work around that. That's really annoying. Yes, you notice DWARF produces a lot more data, and if we're lucky... We were not lucky. Fine. Let's run it for shorter and see if that helps. What this would give you, if it did in fact run, is a better view. So this was a runtime of 10 seconds, and it still generated about two gigabytes of data. This was already fixed. So it gives you a better call graph, and that call graph that you saw me navigate earlier is really handy, but it's kind of hard to parse, because it's given to you as just lists that you have to walk, sorted by how frequently some top-level function was called. This might fix it. But what flamegraph lets you do is take the output of a perf report and generate this. Notice that at the bottom is the entry point to your program, and then it says this function called this function, this function called that function, this function called these three functions, and so on all the way up. It lets you inspect visually, really nicely, where time is spent. In particular, you can look for things that are wide: they spent a lot of the overall compute time. And then you can zoom in by saying, I want to look at this function. Okay, so this function called these functions, and most of the time was spent in this particular function. So it's a really nice way to dig through the performance of your program visually. There's a question: "Woken up n times means that it also sampled the execution state that many times?" No. Perf actually samples very frequently, but if you notice, the message it gives is "woken up that many times to write data". If it tried to write to file every time it sampled, it would be far too slow, so instead it will only occasionally write, basically to flush out the memory contents. It's
a little annoying that I had to do this, but it might not even be fixed upstream. So flamegraph is written by a guy called Brendan Gregg, and it's basically a two-step program, where the first thing you do is take the output of some tool that generates profiling information and create what's called a folded file. It keeps information about which functions called which functions, not in a standardized format, but in a format that the flamegraph binary understands. So it's basically a translation process, where you can take output from many different profiling tools and produce the same kind of file, and then use flamegraph to produce an image from that file. No, I don't do that. Apparently I have to do this. We'll let that work in the background. So the way you would actually use this program is, if you have some perf.data file, you would run perf script and then pipe it through stackcollapse-perf. Stackcollapse is one of these programs, specifically the one that's built for perf. Perf script produces a textual version of the recorded samples, in a format that's somewhat reasonable to parse, and then stackcollapse-perf is going to produce that folded file. Let's ignore the --all flag. And then you would pipe that through flamegraph, which is going to produce the final flame.svg. "Perf is Linux only; on other systems you can use DTrace."
Yeah, exactly. So this is why it supports multiple input formats: perf is a Linux-only tool, but you can use other tools like DTrace, and I think there's a DTrace one... I can't see it now. There's also bpftrace, which is really, really handy. So whatever kind of profiling information you have, there's probably a stackcollapse tool for it that will produce the appropriate folded file. The pipeline looks a little bit like this: you take the output of your profiler, stick it through a stackcollapse program, pipe that through flamegraph, and that generates your image. And of course there's even one for GDB, sort of, although it's not really GDB, because GDB lets you do debugging but doesn't really let you do profiling. You sort of can do profiling with it, but it's not really what it's intended for. In any case, this is the pipeline, and in this stream we're going to focus on perf, because I think it's not realistic for us to port all the stackcollapse things in one go. What we're aiming to do is to port both stackcollapse-perf and flamegraph, so that we can run this pipeline and have it work correctly. What I'm going to do is take the perf script output from the Noria run and store it in a file, just so we can operate on that, because the four gigabytes you saw is large enough that the existing stackcollapse actually takes a while to run. So hopefully, if this machine comes back online at some point, what I'll do is measure how quickly stackcollapse-perf runs, and then we'll try to compare that to how fast the Rust version will be, if we manage to get it working in time. Let's do... Basically, I just want a perf script file. I could also benchmark some other program locally, but it's handy to have something that... right. So this is a two gigabyte file, so that is hopefully at least... Please work. Now it's being difficult about this, but fine.
So in that case, what we'll do is just record without DWARF information, and record it running for longer, just so we have a decently big file to work on. It's not terribly important what the contents of this file are, as long as it's large enough that we can benchmark with it. "Another cool version of the flame graph tool." Oh yeah, there are a couple of other things, like, I think "a stack" is the name of one. Maybe that's the one that was linked in chat. This one. They're all pretty much the same. I don't know if there's a demo image of this. Yeah, right. So, generate... Oh, that's not great. Yeah, there are other tools that do the same thing. So the programs we're going to port are stackcollapse-perf and flamegraph. They're currently both written in Perl, so in theory it shouldn't be that hard to produce a program that has higher performance. It does mean that we have to understand what is in the perf script output. And so we have two options here: either we could just start reading the output of perf script and figure out how to parse it, or we could take advantage of the fact that someone has already done that work, and all we really need to do is port this file. We're going to go with the latter, because if we try to parse perf script output on our own we're probably going to run into really weird cases. But let's take a look at the help pages at least. So, let's see. The question is what perf script outputs. As with all previous streams, I've not looked at these in advance, because I think it's more interesting for us all to figure them out at the same time. "Comma separated list of fields to print", so that sounds like it's almost generating a CSV for us. In fact, I wonder, for this... "Parses a list of multi-line stacks generated by perf script, and outputs a semicolon-separated stack followed by a space and a count. If memory addresses are present, they're stripped." Should include stack traces. Oh, this is another thing.
That's important: this is CDDL licensed, which does mean that... I'm not entirely sure. I looked up this license a little bit, and I think the idea is that we basically have to open source the end result, and we probably have to use CDDL when porting, because technically it is a modification of the original source. I'm not terribly bothered by that. As you see, stackcollapse-perf is not that large. I'm just trying to see how it parses the input that it gets. Okay, so it looks like there's some line in the perf script output... Okay, so if you haven't read Perl before: Perl has a lot of special variables. Dollar underscore, for example, is the current line. So this is basically iterating over all the lines of the input. This is matching the current line against that regex, and it's saying: starts with a pound, followed by command line. So this just means that there's some line in the script output that contains the name of the program that was run. Skip comments. Ignore filtered samples. So what is pname here? Oh, right, so it sounds like there's a way to say that you want to filter. Yeah, event filter, probably. Okay, and then it's parsing out event records, and those are space separated. So it sounds like for this we can actually just use the Rust split_whitespace method on strings to parse out these fields. Okay, and then it's matching the line against whatever, up to a colon, followed by non-space up to a colon. So it's parsing out this part of the line. This regex is one or more of anything that's not a space, followed by a colon, followed by any amount of whitespace, followed by the end of the line. So that's parsing out the last word on those lines. Right, and then it's doing event filtering. Okay, so it looks like there are a couple of different types of lines. If you look at the example that's further up here... Where's the example input here?
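As a rough illustration of the split_whitespace idea, here is a minimal sketch that pulls the command name and the event name out of an event line without a regex. Everything here is an assumption for illustration: the function name parse_event_line is made up, the sample line is only a guess at what perf script prints, and command names that contain spaces are not handled.

```rust
/// Extract (command, event) from an event line.
/// Hypothetical helper; assumes the command is the first field and the
/// event name is the last field, terminated by a colon.
fn parse_event_line(line: &str) -> Option<(&str, &str)> {
    let mut fields = line.split_whitespace();
    let command = fields.next()?;
    let last = fields.next_back()?;
    if !last.ends_with(':') {
        return None; // not an event line as far as this sketch can tell
    }
    Some((command, last.trim_end_matches(':')))
}

fn main() {
    // Assumed sample line, made up for this example.
    let line = "swapper     0 [000] 158665.570607: cpu-clock:";
    match parse_event_line(line) {
        Some((command, event)) => println!("command={} event={}", command, event),
        None => println!("not an event line"),
    }
}
```

Skipping the regex keeps the hot loop cheap, but the original regexes have presumably seen many odd inputs, so a sketch like this would need checking against real data.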
So there are some lines that look like this, and there are some lines that look like this. I think each of these is an event, and then this is the stack when that event happened, is the way I read this. And so this is why in the example output... What this is saying is: for this event, I guess swapper is the current location, and then notice that it's producing a semicolon-separated list of the stack at that event, followed by a space, followed by the number of events with this stack. I think that's basically what it's counting. And so that's why here it's parsing out the lines that look like that, the one that had swapper earlier. So we're parsing out those here. Otherwise it's a stack line, which is not the start of an event. So this is an event line and this is a stack line. And a stack line... I sort of want to see the stack line down here. Oh well. Any number of spaces, anything that is just ASCII characters basically, spaces, anything... This is a weird regex. I feel like we could probably do better than a regex here. Worst case, we could also parse this with the regex crate and use the same regexes, because my assumption is that these have actually been battle tested. "What causes an event to get in perf?" So perf can run in a bunch of different modes. Usually your CPU has built-in support for event sampling. Basically you can tell the CPU to sample at certain rates or when certain events happen, and ping you, as some external process, and say: this thing happened. So an event is just: perf decided to sample the program at this time. And there are a bunch of different types of events.
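Returning to the folded output format described a little earlier, here is a minimal sketch of what producing it could look like. It assumes each sample's stack is already available as a list of frame names ordered from the entry point to the leaf; the function name fold and the sample frames are made up for illustration, and a BTreeMap is used only so that the output order is deterministic.

```rust
use std::collections::BTreeMap;

/// Collapse identical stacks into lines of the form "frame;frame;frame count".
/// Hypothetical helper for illustration; not the actual stackcollapse-perf logic.
fn fold(stacks: &[Vec<&str>]) -> Vec<String> {
    let mut occurrences: BTreeMap<String, usize> = BTreeMap::new();
    for stack in stacks {
        // One key per distinct stack; the value counts how many samples had it.
        *occurrences.entry(stack.join(";")).or_insert(0) += 1;
    }
    occurrences
        .into_iter()
        .map(|(stack, count)| format!("{} {}", stack, count))
        .collect()
}

fn main() {
    // Made-up samples: two with the same stack, one with a different leaf.
    let samples = vec![
        vec!["swapper", "start_kernel", "cpu_idle"],
        vec!["swapper", "start_kernel", "cpu_idle"],
        vec!["swapper", "start_kernel", "rest_init"],
    ];
    for line in fold(&samples) {
        println!("{}", line);
    }
}
```

The point of the format is that a profile with hundreds of thousands of samples collapses to one line per distinct stack, which is why the folded file ends up so much smaller than the perf script output.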
So that's what this business is: this is the kind of event that triggered the sample, sort of. I recommend that you take a look at perf help record. It explains all these different event types, and it specifically explains the ones that are available on your machine. Okay, so this is parsing out the line. Stripping out symbol offsets, okay. So inline, I think, is... With certain debug symbols, there are some functions in your code that might not be treated as a separate function, so they don't really get their own stack line, because their code is just folded into whoever called them. But I think you can extract that inline information. I'm guessing that's what this is about. I guess we'll see when we start parsing. "Reminds you of eBPF." Yeah, eBPF is not quite inspired by perf, but it is a strictly more general thing for event sampling in the kernel. Split by arrow. Why split by arrow? There's no arrow in the stack lines. Oh, in raw... Interesting. I guess we'll figure out what that means. Oh, this is probably for... So the names of functions that are included in the debug information are going to depend a little bit on the language you're using. For example, if you're sampling a Java program, it might actually give you the module path as well. And so I think what this is trying to parse out is the module name of the function, so that you can highlight things that are modules separately from things that are functions, basically by trying to parse the function name that's given. And this is tidying up of generic names. Often the debug symbols are going to contain names that are sort of obfuscated. Now, there are actually tools for doing this. What's the one in Rust? I think fitzgen wrote this. Demangle, I think it's called. That's the one by fitzgen.
Yeah, I was right. So cpp_demangle basically demangles symbol names, so it might be that we end up using that to tidy up these names. Right, so tidy generic, tidy Java, detect things that are inlined or kernel things. Okay, so it looks like it's basically just parsing out events and stack lines, so that shouldn't be too bad. Let's look at whether we actually ended up with some samples here. Okay, so this is now 160 megabytes of perf data, so let's get the report and see what we got. Right, so this gave us a bunch of samples. It is a little sad that these are all weird. I wish we'd gotten this before, but I guess we'll have to make do for now. So I'm going to stick this in a file. Let's see how long it takes to do that. So pv is a really handy command-line tool for measuring how fast something is consuming data. This here, for example, with perf script, basically measures the throughput. Notice that perf script actually takes a while to run, because it has to parse the perf.data file and produce this other format that we're looking for, and you'll notice it can do about 80 megabytes per second. And then I'm going to steal that file. We're going to do cargo new --bin... what are we going to call this? So flamegraph is the original. What's a rusty name for flamegraph? Rust catch... Is there a way to make actual rust catch fire using some kind of chemical reaction? Because that would be kind of fun. "Does rust catch fire", that's not what I meant. Maybe not. Oh, thermite is good. I feel like there's already a thermite crate. Thermite is good though. "Experimenting with a toy browser engine." We're going to go with thermite. Although it's a little sad there already is a crate. How busy is this crate? Last updated two years ago, three years ago. We're going to go with thermite.
I like thermite. Okay, so we're going to steal this perf.data.script file and stick it here. "Don't you need to recompile your binary to include DWARF metadata?" Yes. So by default Rust will compile with debug symbols in debug mode and without debug symbols in release mode. However, in this particular project I've set in my Cargo.toml that in release mode it should always build with debug information, so debug information is always included with this particular program. And you'll notice the perf script file actually ends up being fairly large, because the perf.data file is a binary format, whereas perf.data.script is a... "There was a fire at the amusement park in Rust, Germany." That's pretty funny. I mean, it's awful, but... Okay, what's the name, Pascal? What's the name of the amusement park? I think that might be true. Let's see. I mean, I could also... I shouldn't have done that; that's going to take a while to rebuild. But fine, I guess we're going to rebuild it and try later. Okay, so we now have this perf.data.script file, and let's just for fun see what happens if I take this... I really don't have pv here either? That's kind of silly. "Europa-Park." Okay, so now that we have this perf.data.script, let's pipe it through pv and then pipe it through stackcollapse-perf, and we're going to stick that in perf.data.folded. Let's see what kind of throughput we're getting here. Okay, so stackcollapse-perf is giving us, what, about 16 megabytes a second, a little bit higher. This is the Perl version that's in the repository. So notice that parsing this file, which isn't even all that large of a perf report, takes a while. Right. "Inferno." I like inferno.
That's even better, and if it's not taken, that's perfect. Sorry, thermite, it's going to be inferno instead. "Oh, that's why Redox is called Redox: rusting and burning are known as redox chemical reactions." Okay, no, inferno is good. "What does your dev workflow for Noria look like when builds take so long?" Builds don't take all that long as long as you don't have to cargo clean, which I just did. Okay, so as we noticed, it took almost 30 seconds to parse this perf.data.script file. It's half a gigabyte large, but it should not take 30 seconds to parse that file. This is one of the reasons I want to port this to Rust, because I feel like that should be faster. Once you have this folded file, though, notice that the folded file is actually a lot smaller. This is in part because we don't have DWARF debug information, so there are a lot fewer strings in here. That's also why I'm recompiling this, to see if I can get a DWARF one. The reason I'm doing these steps separately is so that I can benchmark each of them in isolation. So this is going to generate perf.data.svg, and this is probably going to be pretty fast. Yeah, because the folded file is so small, piping it through flamegraph doesn't take that long. So it sounds like the thing we want to focus on is stackcollapse-perf, which is the thing that's taking a long time. Okay, so let's look at our perf.data.script file and what we've got here.
I should not have done that, should I? Okay, so it looks pretty much the way it was documented in the file, right? There are a bunch of event samples, and for each one there's a stack. So that seems handy. I guess we start by... I'll keep the script open here, just so we have a reference. I guess eventually we're going to bring in clap, but for now we're just going to look at standard in. Eventually we're going to want to let the user specify a file. There's a lot of stuff that we can do, but let's start with something very basic: I just take a file from standard in, and I'm going to write to standard out, nothing else. So we're going to go look at... "Building Noria eats up all CPU cores for me and hangs my PC with Linux." Yeah, that's all right. Specifically, I want to lock standard in, because we don't want to take the lock on every iteration. Standard in is basically synchronized, because you don't want multiple threads to read from it at the same time, since then they'll each get subsets. So if you read from standard in directly, it has to take the lock each time. The same is true if you do a print. Whereas if you lock it once, the overhead of reading from it is a lot lower. We're probably going to use a BufReader, because it's really inefficient not to use one. A BufReader basically keeps a memory buffer, so that whenever it does a syscall to read from its input, instead of reading just one byte at a time until we get a newline, it reads as much as it can into the buffer and then looks at the in-memory buffer instead. And so in particular what we're going to do is... That's interesting. So this internally uses a buffered reader. That seems fine. I mean, we do just want to look at one line at a time. So sure, let's do that.
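As a rough sketch of the locking and buffering idea just described (not necessarily the code that ends up in the stream), the program below locks standard in once, wraps it in a BufReader, and reads it one line at a time into a reused String. The helper name count_lines is made up for illustration. Note that the value returned by lock already implements BufRead on its own, so the extra BufReader here is mainly to mirror the explanation above.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Count the lines and bytes available from `input`, reading through a BufReader.
/// `count_lines` is a hypothetical helper used only for this illustration.
fn count_lines<R: Read>(input: R) -> io::Result<(usize, usize)> {
    let mut reader = BufReader::new(input);
    let mut line = String::new();
    let (mut lines, mut bytes) = (0, 0);
    loop {
        line.clear(); // read_line appends, so reset the buffer each time
        let n = reader.read_line(&mut line)?;
        if n == 0 {
            break; // zero bytes read means end of input
        }
        lines += 1;
        bytes += n;
    }
    Ok((lines, bytes))
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    // Take the lock once, rather than on every read.
    let (lines, bytes) = count_lines(stdin.lock())?;
    println!("{} lines, {} bytes", lines, bytes);
    Ok(())
}
```

Taking any Read as input, rather than standard in specifically, is a design choice that makes it easy to feed the same function a file or an in-memory byte slice later.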
Oh, but that doesn't lock. I see. No, I do want to keep the lock, because we're going to read many, many lines. "So like io.Reader for files in Go?" Yeah, it's pretty similar. "I'm trying to learn Rust from only having a Python background. It's pretty complex." That is true. Rust requires you to understand more of the underlying mechanics of what's going on than Python does, but the upshot is that you have a lot more control over the performance of your program, if that is something you care about. I also find that I really like the more expressive type system. It does mean that the compiler is going to stop you from doing things more often, but it also leads to fewer runtime bugs in your program, so I think it's a worthwhile trade-off. "If you're using BufReader, it locks standard in internally, which is pretty efficient." How can it do that? It doesn't know that the reader it's given is standard in, unless it's using specialization. I'm not sure. In any case, we're going to make this explicit, and then we can always tweak it later. So what I'm going to do is... We need to have a BufReader. We're going to have a String buffer that we're going to read lines into. And this is going to take standard in. This doesn't need to be... And then if you look at BufReader, down here it implements... where do we have BufRead? So it implements BufRead, which is a trait saying that it allows you to do certain operations on the stuff that's in memory more efficiently. In particular, what we want here is read_line. So we're going to do something like while let... Actually, it's a good question. I think for now.
We're just going to do while let with reader.read_line, and we're going to read into the line string. "Is BufReader tied to standard in?" No, BufReader is not tied to standard in. That's why we're constructing a BufReader here with new and explicitly passing in standard in. BufReader can operate over anything that is Read, and standard in is Read. We're just going to unwrap this. We don't even really care about what the return value here is. It's going to be basically the length of the line, which I don't think we really care about. Does this include the newline? "All bytes up to and including the delimiter will be appended." Okay, so it will include the newline. That's good to know. Right, so what we want to do is, if you remember from stackcollapse-perf, if we go back to here... Oh, this doesn't have any comment at the top, which is also a little weird. It means we don't actually learn what program is running. I wonder why. "Do while let and unwrap at the same time make sense?" Yeah, I mean, the thing here is... While requires that you give it a Boolean. This does not give me a Boolean; it gives me a number, the number of bytes read. While let lets me match on the thing that read_line unwrap returns. And so the while let is just always going to be true. This is basically the same as a loop that does this. I think it's a slightly nicer construct. It's also because eventually we're probably going to restructure this to use question mark instead. And once we do that... actually, I wonder, can main return errors yet on stable? I don't remember. Probably not, but it's on its way. Right, so the things we want to parse out. This is basically the equivalent loop in stackcollapse-perf. It looks for comments, and so we're going to do the same thing: if line starts with... Does it take any pattern? I don't remember. We go up here. I feel like the expansion state of things is never what I want in the Rust documentation.
I feel like I always want... the default I want is this: all the method blocks are expanded so I can see all the methods, but I don't want any of the docs. "It can return if they implement Debug." That might be true. Yeah, normally I would while let on the result. The reason I don't make this while let Ok is because we would just silently drop the error, which I don't want to do. "is_some can do the trick." That's true, but then we would also just drop the error. It would actually be is_ok. "You're supposed to be careful with unwrap." Yeah, you are totally right. This is not the way you would normally write this code. I'm only writing this to prototype. The way you would probably write this is something like a function, operate_on, that takes a BufReader and returns an io::Result of nothing. Then inside here we would do this. In fact, sure, we can just do it now. It's not terribly important, right? And then you would do this, where question mark would return to main, and main would just call operate_on. So this is the nicer way to do it, and now in main you could unwrap, or you could print a message on the error. The only reason I wasn't doing that right now is because for the time being I think it will only complicate the code. But you are right that that is the correct thing to do, and what we will do eventually. "You can return errors from main on stable. Just tried it yesterday." No? Maybe my stable is old. I don't think it is. "Why not and_then?" We can't use and_then here because we want the loop. We need the loop here, so there's nothing to and_then, right? Specifically here, we don't care about the result. I could and_then to map it to a Boolean, but that just doesn't seem worth it. All right, so we're going to say that if the line starts with a pound, then we're going to continue. I'm just going to ignore anything
that's a comment for now, similar to what this does. Although notice that there is this business where we might want to parse things out of a command line. But for now, we don't know what those are, and we don't see them in our files, so we're just going to ignore them. This is saying that if we encounter an empty line, then something has to happen. Over time we're accumulating a stack, and once we reach an empty line, that means the stack has ended and we're about to encounter a new event. In that case we probably want to save whatever stack we're currently on. Notice, this is where it keeps track of the stack that it has accumulated so far. So while we're doing this counting, it's probably going to be something like occurrences, which is going to be a HashMap. And if line is empty... well, I guess actually we have to trim the newline. Is there even a trim? I think there might be a trim. "You may need to return Result." Yeah, I may have to make main return Result, that's true. And yes, the unwrap will panic in the case of an error, that is totally true. We will get back to error handling,
I promise. Okay, so this is saying that if the line is empty, modulo a newline, then... So we're going to have to keep track of the current stack, and the current stack is going to be a Vec. Here we're going to say occurrences.entry(stack.join(";")).or_insert(0), and then plus-equals one. So this is pretty funky. If you haven't seen this entry API before, it is really neat. occurrences here is a HashMap. With entry, you give a key, and if the key already existed in the map, it gives you back basically a reference to the current value; otherwise it gives you back basically a pointer to the empty slot. Then you can do or_insert, which says: if it was a pointer to the empty slot, put the following value in there, and give me a pointer to the value. So after we've done this, what we have is a pointer to the value, no matter whether it existed before or not. And then I'm saying I want to plus-equals that value. I might not even need the star. So that's kind of neat. And then we want to stack.clear(), right? Because every time we finish a stack, we don't want to keep adding to that stack as we move on to later entries. So we're going to clear the stack, and then we're going to continue. This is basically the same as this business. I don't know what this pname is. We need to figure out what pname means, but we're going to just sort of ignore that. And I guess here we're only going to do that if not... Oh, yeah, great: the links in the chat show up on a blue background with blue text. It's very unhelpful. "It's probably process name." I agree. I guess we'll get back to it when we're parsing later. Okay, so let's leave some comments for ourselves.
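The counting step described above can be sketched as follows. This is a simplified illustration, not the stream's code: the function name fold_stacks is hypothetical, every non-empty line is treated as one stack entry, frame ordering is left as-is, and the check for a non-empty stack before counting is an added assumption so that repeated blank lines are not counted.

```rust
use std::collections::HashMap;

/// Hypothetical helper: groups non-empty lines into stacks separated by blank
/// lines and counts how often each joined stack occurs.
fn fold_stacks(lines: &[&str]) -> HashMap<String, usize> {
    let mut occurrences = HashMap::new();
    let mut stack: Vec<String> = Vec::new();
    for line in lines {
        // Lines read with read_line keep their newline, so trim it first.
        let line = line.trim_end();
        if line.is_empty() {
            // Assumption: skip the count if nothing was accumulated.
            if !stack.is_empty() {
                // entry() yields the existing slot or an empty one; or_insert(0)
                // fills an empty slot and returns a mutable reference to the value.
                *occurrences.entry(stack.join(";")).or_insert(0) += 1;
                stack.clear();
            }
            continue;
        }
        stack.push(line.to_string());
    }
    // Note: a final stack with no trailing blank line is not counted here.
    occurrences
}

fn main() {
    let lines = ["main", "parse", "", "main", "parse", "", "main", ""];
    let counts = fold_stacks(&lines);
    println!("main;parse seen {} time(s)", counts["main;parse"]);
}
```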
So first, this is: end of stack. So emit stack entry. This counter is just going to keep track of how many times we've seen a particular stack. And then down here, one of two types of line is possible: either we have an event line, like so... and let's give an example of some event lines. Or we have a stack line, and for the stack lines, let's take the examples from up here, and let's also include some examples from our own file, just so that we have them for later reference. This is just for our own purposes later; it will be easier to keep track of. Oh, yeah, Vim commands are really handy. As I mentioned earlier, we're doing a lecture series at MIT currently on hacker tools, and that includes things like editors. Learning Vim commands is super handy. Someone mentioned in a YouTube comment that I should enable screen caps so that you can see all the keys I press. I figure it would probably be mostly noise, so I don't want to turn it on, but I might do one stream where I have it on just so people can see. Or we have a stack line, which shows one stack entry. All right, so how do we know whether it's an event? He parses it like this. I guess I don't have a good sense for how to parse the difference between these. I think I actually want to do things a little bit differently. We know that either we are in an event or we're not. So what we're going to do is this, and then we're going to say, instead of just seeing whether the line matches: if we're not in an event, then we must be about to enter an event; otherwise, this goes in here. So if we're not in an event, then we are now in an event. And whenever the event finishes, we are no longer in an event.
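The in-event flag described above amounts to a small state machine. The sketch below is one way to write it, with assumptions stated: the Line enum and the classify function are hypothetical names, the sample lines are made up rather than taken from real perf output, and comment lines are skipped as discussed earlier.

```rust
/// Hypothetical classification of an input line.
#[derive(Debug, PartialEq)]
enum Line<'a> {
    Event(&'a str),
    Frame(&'a str),
    EndOfStack,
}

/// Classifies lines using only an in_event flag: the first non-empty line
/// after a blank line is the event line, the rest are stack frames.
fn classify<'a>(lines: &[&'a str]) -> Vec<Line<'a>> {
    let mut in_event = false;
    let mut out = Vec::new();
    for &raw in lines {
        let line = raw.trim_end();
        if line.starts_with('#') {
            continue; // comment
        }
        if line.is_empty() {
            // A blank line ends the current event.
            in_event = false;
            out.push(Line::EndOfStack);
        } else if !in_event {
            // Not in an event, so this line must start one.
            in_event = true;
            out.push(Line::Event(line));
        } else {
            out.push(Line::Frame(line.trim_start()));
        }
    }
    out
}

fn main() {
    // Made-up sample lines, only loosely shaped like perf script output.
    let lines = ["# comment", "prog 1/2 1.0: cycles:", "    aa func (mod)", ""];
    for line in classify(&lines) {
        println!("{:?}", line);
    }
}
```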
I think this is a more robust way of doing it. "I can't find the docs for Vec::join." It's not on Vec. It's on slice, where the element type is a string, I think. Or I'm just misremembering. Okay, so if we have an event, then... what's the way that he parses it? That's such an interesting way to parse this. Hmm. "You get it on Vec through Deref." I think that's true. Okay, so what is it we care about here? What's comm? Actually, let's have a look at... actually, this is finished now. Let's do help script and see what it says. The field options are those things. Does it say what the default is? These are the fields. Oh, that's so unhelpful. Why is it not saying what these are? comm means "command name of the task", so this is probably the name of the thread, would be my guess. I do wish that this would say what the default... Why does it not say what the default for perf script is? That's very unfortunate. Although here, I guess, this is the default for perf script: "default fields". That's kind of unhelpful. Well, I guess, when in doubt, look at the source code. Where is the source code? This one, really? And where do we want to go? tools, perf, builtin-script. Output fields. Uh-huh. Are these the same? ip, sym, symoff, dso, period... Okay, so the first fields are always the same. Oh, no, those are just the available fields. It's not the default fields. So what are the default fields? Let's search for... Uh-huh. Hmm, maybe it's not even declared here what the defaults are. No, they're only used there. "Casual kernel code hunting." Yeah, I know, right? I wonder if the default is just all fields. But that's a little weird, because if you remember back to this file, in this case the assumption is that whatever is last is the event that occurred. But that's not the case here, right? The event name is there, not anywhere else. So: parse output fields. Look at that. Oh, I see. Okay, fine. Okay.
Oh, this is awful: argument parsing directly by string comparison. Great. Where does it say... "No fields requested." Okay, where does it decide? So this is parsing the stuff that follows dash e. So I guess the question is where this is called from. Valid types, fields... It doesn't say what the defaults are. cmd_script, here we go. Yeah, those are the fields. That's fine. Uh-huh: set up scripting, sub-command, has record args, forks to do some stuff. Uh-huh. I wonder where it decides to print. Okay, where does output come from? Field to string. Okay, I think that the default is not well defined. I think the default depends on whether you say you want hardware or software stuff. I think it really is just this list of fields. And so therefore I feel like this is wrong. I feel like this is assuming that you did... I don't even know, tracepoints, maybe. I guess we could just look at the output we got. So the output we got has comm, PID... this is the TID. So let's see if that matches any of these: comm, tid, cpu. Sure, that's probably CPU time, I can buy that. Output time? No, not output time. Well, maybe, actually. Maybe this is CPU and this is output time. Event name... Oh, maybe this is output time. Sure, it does seem to be monotonically increasing. Okay, so that is output time then, and this is the event name. Okay, let's go with that, and then we can adjust it later. Right, so that means that what we're going to do over here is a line.split_whitespace(). It's going to be that. And we know that the first field is going to be comm. And again, we're going to unwrap; later we're going to not do that and instead probably just skip the line. So we'll probably have a function that's like parse_line, I would guess. The next thing was tid. It's going to be fields.next().unwrap(), and we're going to parse the tid as a number, probably. We may not even care. In fact, is the tid even important? Do we need the tid for anything?
See here: he parses out the tid by this. This is why this parsing becomes pretty important. So it's any string... This non-greedy match is tripping me up, though. So it's any string without a whitespace, followed by any amount of characters, maybe, followed by some spaces, followed by a number. Because it's parsing everything up to the first number, it's not really split by whitespace. Right: this, for example, this thing is still the comm. That's good to know. Oh, it's the slash. It's the slash that does it, right? I'm just going to trust that this guy figured out what was going on. So in that case, let's undo that, and instead say... sure, let's bring in the regex crate. I'm going to assume that this regex is correct and then take it from there, probably. So in that case, let's switch over to Cargo.toml and do regex 1.0, and we're going to do... match line, it's going to be Regex::new. So that's going to be, I guess, match_event_line, and then we're going to have this, match_stack_line. I still don't believe that that's true. I feel like these should be easier to parse than that. But so, matching a stack line is going to be like this. And then we're going to say here... Oh, that's right. No, you're totally right: the regex crate does not require the slash. Good catch. Oh, and I also need to unwrap them. Well, you know, error handling. And then I guess what I want is captures. Not quite captures; actually, I want just a single capture. Yeah, that's what I want. So if we're in an event, then we're going to match on match_event_line.captures on the line. Actually on line.trim_end(), because BufRead includes the newline. "F12?" Oh, what do you mean by F12?
Right, so we're going to match on that, and if it matches, then we're going to get the captures, or I guess we could call that fields. If it does not match, then something is really weird, but we're just going to continue. We're going to say that we're not in an event after all, that this was just an unexpected line, I guess, whatever that line was. "Question mark after the plus makes it match lazily instead of greedily." Oh, yeah, I mean, question mark is super handy. I just don't understand why it's in that particular match. That's the weird part. I guess it's because... okay, so I suspect that the reason it's there is... actually, this must be why it's there: it's to make sure you don't match past the first number, which is going to be the TID. Right, so you want to match up until the first number, even if this includes spaces. So that's why it's there. All right, so if we get back fields, then... and notice how straightforward this porting is, right? We're actually just transliterating the code to Rust. So comm is going to be fields.get... it's one-indexed, right? Zero is the entire match, yeah. And then the next thing that comes in is the TID. No, the PID, and then the TID. And... that's a good question: what does captures give you for a capture group that didn't match? If the capture group did not participate in the match, then None. So I think that means that we're going to match on fields.get(3), and if that is Some, then that is the TID. Which means that, I guess, let pid... "You want fields[1]." No, I specifically want get; this is a Captures. Oh, you mean here. I mean, if Captures implements Index, then sure. This one, I know that fields[1] will always participate. Sure.
That's fine. Here, I guess what we want is (tid, pid), or (pid, tid) if you will. If the third field participated, that means that we got a TID, which means that PID is fields[2] and TID is, well, fields[3]. However, if the third field was not there... so basically we're matching on whether there was something following the slash. In this case, this is PID/TID; in this case, there's only a TID and not a PID. Right. So three, which is the thing after the slash, did not participate, so we get None. In that case, what we want to return is a question mark for the PID, and fields[2] as a string as the TID. So this is equivalent to what's being done over here, where if the TID is None, then set PID to question mark. "It also gets rid of the asterisk." Does it? That's handy. Sure, then I'm all for it. Great. So we've now parsed out those, and then there's this parsing. This is the business that extracts the event, and this is where I'm very confused as to why it's okay to assume that the event name will always be last. But if that's what's being done here, then I guess we're just going to assume that that's fine, and we can always tweak it later. So we're going to do match event line event... "Any reason for regex versus the text_io scan! macro?" No, really, I've never heard of the text_io scan! macro, but regex is straightforward enough. It's in part actually because the existing code, the one we're porting, already uses regular expressions. If text_io already does that, then that seems fine. Yeah, I don't particularly mind either way. Great, so this is if we can match out an event. Okay, so this means that we were able to match out the event at the end. Then parse out the event, which is not really used. Then event is going to be captures[1]. And then there's this event filtering business, which I guess we're going to just ignore. So here we're going to do a TODO: filter by... I guess, what did they do?
They don't really have this issue, so we're just going to leave it like that. What is m_pid and m_tid? Why does it keep track of that? Are those used somewhere here? No. Why this reassignment, I wonder? Because that is assigning to globals, but I don't see those globals being used anywhere else. So why are they globals? I think that must be left over from some old code. Okay, so there's an option here of whether we include the TID in the output. So I'm going to say here: TODO, make including pid/tid an option. And then... oh, I see, this is where the pname is being set. Right, so here we're going to set pname, and I think what we're going to do is comm.replace, space with underscore, because you can't have spaces in comm names in the final output. "Too many if-else. What about branch prediction?" What about branch prediction? I'm not too concerned about that. The branch predictor is going to be very good at this; this will not be a problem. Also, this will still probably be orders of magnitude faster than the Perl code, but we'll see. So we're going to set the pname here. to_string... this to_string is a little sad, but it sort of has to be there. Right, so that's basically all of this code. This is just allowing you to include pid and tid in, basically, what's going to be the root of the stack: is it going to include the process ID and the thread ID, or are we going to collapse by thread, or collapse by process, into just the name? In fact, we could also have an option that says remove the thread name entirely. But for now, let's just keep the pname as the comm. Great, so that is: if it was an event, then we've now set pname, which is basically the only thing that really matters. And notice here that we're ignoring tid, I guess, for now.
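A small std-only sketch of the two decisions just described: choosing the PID and TID depending on whether the optional third capture group participated, and turning the comm into a process name. The function names are hypothetical, and the inputs stand in for the regex capture groups, which are passed here as plain string slices so the example does not depend on the regex crate.

```rust
/// Hypothetical helper. `second` is the number right after the comm, and
/// `third` is the optional number after the slash. Returns (pid, tid).
fn pid_tid<'a>(second: &'a str, third: Option<&'a str>) -> (&'a str, &'a str) {
    match third {
        // "PID/TID": both numbers are present.
        Some(tid) => (second, tid),
        // Only one number: it is the TID, and the PID is unknown.
        None => ("?", second),
    }
}

/// Hypothetical helper: the process name is the comm with spaces replaced,
/// since spaces are not wanted in names in the final output.
fn process_name(comm: &str) -> String {
    comm.replace(' ', "_")
}

fn main() {
    let (pid, tid) = pid_tid("1234", Some("5678"));
    println!("{} pid={} tid={}", process_name("Web Content"), pid, tid);
    let (pid, tid) = pid_tid("5678", None);
    println!("pid={} tid={}", pid, tid);
}
```

Note that replace always allocates a new String, which is the allocation the stream goes on to discuss.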
Actually, I do want the warnings for this, to remind us of the TODOs. "Doesn't replace produce a new String?" It's a good question. In theory... well, I guess if you have a mutable reference to a String, you could do this transliteration directly. Yeah, that's a good question. Now, it does return a String here, right? Great. Okay, so now we're at the point where we got a stack line. For a stack line, we're going to use the other match that we have. We're going to do basically the same thing as we did down here; we're going to match on this. I mean, it's going to have to produce a String regardless. That is what we have to store in pname, because by the time you move on, the reference isn't going to be okay anymore. We can't have pname be a reference, because it would be a reference into the current buffer, which may be reused for later lines. So it does have to be turned into a String regardless. In some sense I'm happier for replace to do it than for us to have to do it after. Right, so fields: if it does not match, then it's the same thing as here. Let's make this "weird event line" and "weird stack line". "What theme is this?" Which theme? Oh, my Vim theme. It's one called Atelier Dune, right? Let's see. So if that matches, then... see, this is where it gets interesting. The way it filters out events is by not setting pname. But how... well, why would pname ever be unset? Filtered samples? Why are they filtered? Let's see where else pname is used. Is pname ever cleared? Oh, it is. I see what's going on. Okay. So pname is being used in this sort of weird way where, if you're filtering the event, then the event line you just skip, and because you skip the event line, pname is going to be unset from where it was cleared at the empty line. And so by the time you get to the stack line, pname is going to be unset, so you know to ignore it. Okay. So the question is, do we want to do it the same way?
So it's going to be something like if whatever, then... So we could do it the same way they are, of just not setting pname. But in some sense, I don't know that I want to. I guess we could just do pname.clear() here. That's fine. So in the case where you filter, you just continue and pname is not set, and then down here... I guess it means that we don't even have to match the stack line, so that's going to speed things up a lot if you filter. So if pname is empty... it's a little weird for that to be empty. I think we actually want this to be an Option. Although that's also a little unfortunate. The reason I'm hesitant to make pname an Option is because it means that we're going to keep reallocating the String instead of reusing it. But at the same time, I think that's what we end up doing anyway because of the replace, because in theory we wouldn't need to allocate a new String here. So maybe something like: TODO, reuse existing memory in... Yeah, this could save us a lot of allocations. We're going to split this file into multiple functions and such after a while, just like we're going to do error handling later as well. This is more to get the first pass started. "We could do a skip_event bool." That is true. I mean, at this point I sort of want this to be an enum. It might become an enum at some point. Don't clear pname. And then here it's going to be: if the filter matches, then skip_event is true, and we want that to reuse the pname if possible, which we can do. I mean, this is arguably premature optimization. I wonder if there's a transliterate here. Part of the problem is that transliterating characters in strings is a little troublesome in UTF-8, because they might actually take up a different number of bytes, and so it's not clear that we can just do this. But it would be nice to not have to allocate a new String each time. But for now, I'm just going to ignore that.
Right, so that would set skip_event to true. In general skip_event is going to be false, and down here, if skip_event, then continue. This is what's being parsed out from that line. Let's see what we can do about that. So pc is the program counter. It's basically the address that it was on when it took the sample, and so that's going to be field zero. We may want to parse these out as numbers at some point, but it might not matter. raw_func is going to be field one, so that is the actual function that got called. And module is the stuff that comes at the end here, which tells us in which module this function resides. So kernel.kallsyms means that it lives in the kernel. This is this section of the kernel, this is this particular library file, and this means "I have no idea where that ran". The unknown you tend to get if you don't include DWARF debug symbols, for example. If you try to run perf on a program that doesn't have any debug information in it, most of the things will just be unknown, because there's no way to map from program counter to function name. And so all of these are just unknown. Let's see. Sure. Right, so that's these, and I think we just don't care about those. Strip... or trim_end_matches, is that what it's called? For each character, if the character is... well, so here's a good question, whether we can do this, because I think this is given one character at a time. Can I give it an FnMut? Like, what implements Pattern?
Pattern is implemented for... what is a string searcher? See, I feel like this is promising, because we basically want to trim everything... well, we don't want to trim everything from a 0x, because you could have a function that contains 0x, in theory. Yeah, we specifically need to match on hex, which is where it's a little awkward. So we could do this with a regex. That also makes me a little sad. But if trim_end_matches lets me give a mutable closure, then we can do this pretty easily by saying... it doesn't even have to be that. Here's what I think we're going to do. I think the efficient way to do this is: look for the last occurrence of +0x, walk from there to the end of the string, and then trim it. Or we just do it with a regex. What do we think? Right, so what I'm thinking is something like: if let Some(offset) = raw_func.rfind("+0x")... Well, that's another question, whether I can say "everything from this point onwards". It gets really weird because this is a string, so they're all UTF-8 code points. So I think we're going to do this with a regex. It's going to be nicer, because I don't think I can split this at a character. Oh, I can. So I can say raw_func.get(offset..).unwrap(). And I sort of want offset, I guess, plus three. If end dot... I don't really want to assume that it's all ASCII, if I can avoid it. Is there an is_hex? is_alphanumeric is too nice. What I actually want is is_ascii_hexdigit. Great. Then it's going to be raw_func from offset plus three, because we want to skip over these characters; specifically, x is not a valid hex character. So we could do this with a regex. It's a little bit stupid code, but let's try it and see what happens. We are going to have to debug this; some things are going to be wrong, but that's fine.
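The approach described above (find the last +0x, check that everything after it is hexadecimal, then cut it off) can be sketched without a regex. The function name strip_offset is hypothetical, and requiring at least one hex digit after +0x is an assumption. Note that is_ascii_hexdigit also accepts uppercase A to F.

```rust
/// Hypothetical helper: removes a trailing "+0x<hex digits>" offset from a
/// symbol name, and leaves the name untouched if the suffix is not pure hex.
fn strip_offset(raw_func: &str) -> &str {
    if let Some(offset) = raw_func.rfind("+0x") {
        // "+0x" is three ASCII bytes, so offset + 3 is a valid char boundary.
        let tail = &raw_func[offset + 3..];
        // Assumption: an empty tail does not count as an offset.
        if !tail.is_empty() && tail.chars().all(|c| c.is_ascii_hexdigit()) {
            return &raw_func[..offset];
        }
    }
    raw_func
}

fn main() {
    println!("{}", strip_offset("do_syscall_64+0x5b"));
    println!("{}", strip_offset("operator+0xnot_hex"));
}
```

Because the function returns a sub-slice of its input, it does not allocate.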
If show inline... and module does not match that. TODO: something about inline. "The get should just be a slice; it can't panic here." No, get returns an Option in case the index is out of range. Oh, you mean this get, that I should just unwrap it. Maybe, but we don't really need to, so it's fine. Skip process names, that's what it's telling us to do. "rfind gives position n; you may slice it at n plus three." Yeah, that's what I'm thinking. So what you're saying is unwrap here. Oh, sorry, you're right. Yeah, I guess we do know that offset is valid. I guess we do technically know that offset plus three is valid too. So fine. It would be very weird if this did not work. Can you slice directly on a string? That certainly makes that code a lot nicer, if I can do that. Okay. And now, this is where it splits the function name by arrow, which I think is a Java thing. Were there any examples from the lines here that had an arrow? Not really. If we look through our file, are there any arrows in here? Not really. I guess here we could actually make it split by double colon. It's a little weird to me that arrow is specifically the thing they're splitting by. "Just not single indexing." Okay, nice. Yeah, I don't know why they're doing this. I don't know why they're splitting by this specifically. That's what's weird to me. But I guess we will... so down here, this is sort of walking the parts of the name of the function. "It's Java lambdas." But it looks a lot like it's not just Java lambdas, because this is being used for other things than Java too. Although I guess in general split is going to return just one thing if there's no arrow. Oh, yeah, the Patreon page: I had to close the Patreon because I'm an international student in the US, so I'm basically not allowed to have income, and so sadly I'm not allowed to have one. Quite unfortunate, but such is life.
Thanks though. But we do want this code, right? So this is the thing that's using the module part. It's basically trying to be smarter about things that are unknown. If the function is unknown, but the module is not unknown, let's set the function to be the module, and then replace everything up to the first slash with nothing. Module is the file, so that makes the function be just the file name. It's a little weird. "Line 89 should..." Oh, you're totally right, good catch. Massage function name to be nicer. I don't like this split, so I'm going to ignore the split for now and just say if... "Don't need that; seems to have something to do with inlined functions." Yeah, I mean, this business down here is weird, but notice that none of these have these arrows in them. Even the things that are inlined, even this Java stuff, right? None of those have these arrows. So where are the arrows? Actually, here's what we'll do: blame, and we'll find... where's the line? Down here. "This includes the complete inline stack to the code location in the form..." Hmm, that's interesting. I think this is only for Java code, like inlining in Java code. Yeah, specifically for perf-java. Okay, that means that I'm just going to ignore that feature for now, specifically ignoring it. Oh, yeah, Perl code in general is not great, but this is not bad Perl. This is not at all bad Perl compared to how bad Perl can be. So what we're doing is: if raw_func is unknown... I guess we should... oh, we already have. Great.
So if raw_func is unknown, and if the module is not unknown, then we're going to set raw_func to be the... is there a trim-left-until? So we're going to do module.find('/'), unwrap_or zero, and everything from there and onwards. I guess actually, no, we don't need that; raw_func as a string. Right, so this is: find the first slash, increase the index by one if you do find it so you skip past the slash, otherwise start from the beginning of the string, and give me all the characters. Comment: use module as function name, using everything following the last slash of module. Otherwise raw_func is going to be unknown. This is just stripping off the square brackets. If include_addrs... this is where we're going to add command line options, and we're going to have a command line option parser and whatnot. It's going to be... oh, that's kind of awkward. Now it's going to have to be a String. "rfind." Oh, sorry, you're right, rfind. Good catch. Oh, that's sad. We are changing raw_func to... oh, that's so silly. That's so inefficient. This is a lot of unnecessary string manipulation. If include_addrs, then it's going to be unknown followed by... because you might get the module, yeah. I think raw_func is going to end up being a String, which is a little sad. "We can make it a Cow." So I think we're going to end up having to do something like this. It makes me a little bit sad to not reuse the memory here. Same thing here. Actually, we don't even have to do that here. We can do a little bit better.
It's going to be... oh, actually, no, we can... yeah. I sort of want func to reuse memory across executions rather than allocate on each run, but let's ignore the allocation for now. So we're going to instead say let func be equal to this. Down here, it's going to be a format!: it's going to be raw_func and pc. Otherwise, it's just going to be raw_func in square brackets. "Can you do a match on func and unknown?" I could. Sorry, this should say module. In fact, okay, let's see how we would write this with a match. We're going to match on func, module, and include_addrs, and we're going to say if... then func is going to be... if include_addrs, then it's going to be this. See, the problem here is we're going to have to write out the rule for include_addrs multiple times. Well, actually, no, that's not quite true. We could do this, then unknown... I guess this is sort of what you meant. This is a little bit nicer to read than this, right? And otherwise... and then let func is going to be... well, see, this gets weird too, because now we're going to double square bracket if it already had a square bracket, right? Whereas the previous code will not do that: it will not add square brackets unless it already was unknown. And in fact, it will add square brackets around the module being used as function name. I guess we could do... no, I think I want to keep it the way it was, even though I agree with you that it looks a little yucky. Yeah, the problem is it's not really a Boolean table, because this happens regardless of which of these happen, but these have to happen first. So if we were to split this into a binary decision matrix, we would have to repeat this code in each of these branches, which is a little unfortunate. Whereas if they happen one before the other, then it's a little bit easier.
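The two-step structure discussed above (first pick a name, then decide on brackets and address) can be sketched like this. Several things are assumptions: the function name resolve_func is hypothetical, the module is taken to be a bare path without surrounding parentheses, the exact bracketed output format is a guess at what the original script prints, and the last slash is used via rfind, as suggested in chat.

```rust
/// Hypothetical helper: produces the display name for one stack frame.
/// Known functions pass through unchanged. Unknown ones fall back to the
/// module's file name when available and are wrapped in square brackets.
fn resolve_func(raw_func: &str, module: &str, pc: &str, include_addrs: bool) -> String {
    if raw_func != "[unknown]" {
        return raw_func.to_string();
    }
    // Step 1: pick the name, using the part of the module after the last slash.
    let name = if module != "[unknown]" {
        let start = module.rfind('/').map(|i| i + 1).unwrap_or(0);
        &module[start..]
    } else {
        "unknown"
    };
    // Step 2: bracket it once, regardless of which name was chosen.
    // Assumption: this output format is illustrative only.
    if include_addrs {
        format!("[{} <{}>]", name, pc)
    } else {
        format!("[{}]", name)
    }
}

fn main() {
    println!("{}", resolve_func("[unknown]", "/usr/lib/libfoo.so", "7f00", false));
    println!("{}", resolve_func("[unknown]", "[unknown]", "7f00", true));
}
```

Doing the name selection before the bracketing avoids repeating the include_addrs branch in every arm, which is the trade-off weighed above.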
So I guess here try if we I think documentation is going to help a little here try to use module part of module name as function And this is going to be include adders uh All right So back to this, uh tidy generic this tidying business is uh Is weird So And the string manipulation is a little bit sad So I guess what we're going to do here is this is going to be a bunch of string manipulation So Is there an inline string replace? um On string Place range I want to replace that doesn't allocate a new string Because replace will Um, I guess the question is whether there's a thing on stir, but I don't think so because it would have to Shift things over So my guess would be that the only replace on stir is a I think that returns a new string like this one extend index Display Yeah, I think the only thing we have is uh Well, there is this replace range Of the range the x-range really, you know what I want either A little surprised there isn't one although I guess it is because it would have to shift things um Yeah, this makes me kind of sad This is going to allocate a lot of strings. So I'm not going to optimize this yet. 
I'm just going to Uh Try to write it in the way where it's going to reallocate strings and then we can see if we can optimize it later Um, so funk is going to be funk dot replace This is technically under if tidy generic I guess this is replace any semi colon with colon right And then it's saying if it doesn't match If it doesn't match Dot Buy Oh, yeah, my the plan is totally to profile our own tool once it's finished um Interesting this doesn't look like a go method names everything after the first open paren Is just noise I wish there were examples of what this noise is Let's uh keep this blame open Because some in some cases it's really handy to link to the commit or look at the commit that introduce that filter Here I mean, I guess this is similar to the the business we had further up with Uh, if it starts with a is if it starts with a just an open bracket then skip it but That's a particularly unhelpful Commit, uh, let's look at the further back history of that Oh, this is what this used to look like Oh, that's weird prior to this change Okay, so this change Okay, that introduced the The splitting It did not introduce. Oh, it's just the indentation. That's awful. All right, let's go further back Uh down here As far as changed a lot over the years This business here. Okay. What is this? Um, yeah, I have there's another video on my youtube that where I go through the entire setup I have so if you just look at my youtube channel In all the context indicate the beginning of noise I still don't understand what this noise business is So it's not really noise It's just argument list Right So when they say noise, they really just mean cut out all the argument lists Okay, that's fine. I mean, I guess we can cut that Um, except if Right, so this anonymous namespace business is also kind of weird Um, where was that? 
So anonymous namespace is also something you don't want to remove remove argument list, uh from from function name, but Don't remove Go method names Like that Also link to Let's be good citizens for other people might look at our code And link to the appropriate pull requests. Let's see the other one here D plus plus and non see that so Now how are we actually going to match this? um If it doesn't look like that then Substitute so what are they replacing? Any parentheses That doesn't contain Just the word's anonymous namespace I see Yeah, I think a regular expression is probably the way to go there I wonder if these can be combined though with um I think with negative look ahead assertions you can do this Specifically what we want is as follows Um, I guess we're gonna have to do this Uh tidy stack generic It's going to be replacing sort of this but, uh Actually, let's look at the docs for regex Oh lax look around in back references That's too bad So we can't actually do that But does it even support question mark exclamation? It might not even support that see if there's a That's probably just going to say not supported, huh? Yeah, exactly But can I use question mark exclamation mark? What is that considered? uh regex look around Yeah, let's consider it a look ahead That's awkward So without that we can't really do it with uh Just a regex we're gonna have to Uh man this tidying is gonna Not be cheap We could tokenize this ourselves But I think the thing to do is probably to match Here's what we're gonna do Okay Tidy stack generic what it's gonna search for is anything that is not a dot Maybe So it'll take it if it can Uh Followed by anything Uh I guess followed by every anything But non greedy No Matching brackets is a pain because once you have nested brackets Yeah, basically that the plan is to do a double search It's a little little annoying, but we'll do it. 
So nested brackets are going to be a problem, but let's just assume that they're well formed. Actually, we can't even do that. This is going to trip up if you have Go with multiple methods in brackets, although the current code has the same problem. This basically means that we have to resolve brackets, which you can't really do with regular expressions, because matching nested brackets is a context-free problem rather than a regular one. Maybe the way we do this is by just walking the string. It's a little sad, but it might be the right thing to do. Yeah, because it'll be linear.

So I guess what we're going to do is make a simplifying assumption. We're going to look for only the last opening bracket, because if it really is just noise, as in just argument lists, then there shouldn't be parentheses in the argument list, right? Alternatively, we look for the first, and if the first one looks like a... Yeah, okay, I think here's what we do. We look for the first open bracket. If it is preceded by a dot, then we assume that it's Go. If it is followed by "anonymous namespace", then we assume that it's C++. Otherwise, we remove everything following it. It's going to fail, but it's going to fail in different ways than this one. Still, I think it's mostly the same.

So: if let Some(first_paren) from func.find, then if func, first... Can I do reverse ranges in string lookups? I don't know if there's a way for me to go to the previous character in a UTF-8 string. It's a good question. I have a string, what can I do with it? The question is what the index is by: is it by character or is it... Actually, no, this is all fine: func.get(first_paren - 1). If that is Some... If it's None, then that means the previous character must have been a multi-byte UTF-8 thing, and so therefore it can't be a dot. If c is a dot, then it's Go. So then we do nothing, assume
it's a Go method. If it isn't... func, first... why do I keep writing "parent" instead of "paren"? Check whether func[first_paren..] starts with "anonymous namespace". Kill it with fire: so then we're going to set func equal to func[..first_paren]. The question is what happens in the else case, "can't be a dot". We still need to do the check, don't we? Ah, so the trick is to do this first: is_go is going to be true. We can tidy that up a little by saying...

Oh, yeah, the regex crate is really well optimized. It is not stupid. It also does really well if it turns out that your regex is just a string: it will do really good things like use memchr to search for the literal prefix and then only do the regex search from there. It is very, very clever, and I recommend using it whenever you can.

is_none: can't be a dot. So this could probably use a function, because then we could return. So let's do a TODO: turn this into a... Yeah, BurntSushi is a wizard, you're right. For what it's worth, I know some of you are thinking this code is pretty hairy. I agree. Refactoring it a little bit is going to help.

Okay, so that's tidy generic. What else do we have? Oh, it's also doing this, whatever that is. Is that actually where that was introduced, or is that just the indentation? No, that was there before this. So where does this come from? "tidy a horrid javascript frame". Gee, thanks, that's helpful. So this is a tidy-up-ugly-JavaScript thing, and then we're going to replace any double quote followed by single quote with nothing. That seems odd, but sure. func is going to be... See, then we're going to have to do that. Oh, no, this is just func.truncate to there, right? On String itself. Okay, great.
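The first-open-paren heuristic described above could be pulled out into a function roughly like the following. This is a sketch under the stated simplifying assumption, not the stream's exact code, and the name `strip_arg_list` is made up. It checks the preceding byte rather than the preceding character: a byte equal to `.` in valid UTF-8 is always a real dot, so no character-boundary handling is needed.

```rust
/// Hypothetical helper: drop a trailing argument list from `func`.
///
/// Heuristic from the discussion:
/// - first '(' preceded by '.'  -> looks like a Go method, keep as is
/// - first '(' starts "(anonymous namespace)" -> C++, keep as is
/// - otherwise truncate at the first '('
fn strip_arg_list(func: &mut String) {
    if let Some(first_paren) = func.find('(') {
        // '.' is ASCII, so comparing the raw previous byte is safe even
        // if the previous character is a multi-byte UTF-8 sequence.
        let is_go = first_paren > 0 && func.as_bytes()[first_paren - 1] == b'.';
        let is_anonymous = func[first_paren..].starts_with("(anonymous namespace)");
        if !is_go && !is_anonymous {
            // '(' is ASCII, so `first_paren` is a valid char boundary.
            func.truncate(first_paren);
        }
    }
}
```

One known limitation of this simplification: a name such as `(anonymous namespace)::f(int)` keeps its argument list, because only the first parenthesis is inspected.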
So here This is going to have to be I think we're going to do this if funk contains In that case We're going to do replace This just to avoid the reallocation in the common case where it does not contain that string Um really weird like why strip those particular characters? Um, I'm going to ignore tidy java because I don't care about java To do tidy up java annotations. Okay, sure so Right, so this is where this inlining comes from. Um All right, we don't actually need to care about inline Inline Because we're we're ignoring this like java inlining business Um Scalar inline this means that we've gone through the loop before Uh, which will never be the case for us I see so really all we care about is the other ones specifically we care about is if Falls and Ooh, what is this business? Um, I guess this is going to be uh to do annotate kernel because that's the flag we're going to need to do uh and A module that starts with either starts with open square bracket or module starts with The um, what's the order presidents here? It's like, um Oh, I see that's what they're going for So either if it ends with vm linux or it starts with a square bracket Module, I guess this is not equals Not module starts with All right, I guess that would only ever be unknown Right and not and module is not equal to Then Uh Funk string Kernel need to do annotate jit Which we like may care about there's like javascript code and stuff Uh So it falls And what is this business? The module Um matches that That's a good question. I think we are going to be a little bit more liberal and say mod starts with that and This is module and module ends with And then push in line funk. We don't actually care about um, so this is uh Put in line at the front of stack It's a little weird So is the output in reverse order? 
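The kernel and JIT annotation conditions being worked out here might look like the sketch below. Several details are assumptions and not stated in the stream: the `_[k]` and `_[j]` suffixes, the `/tmp/perf-` prefix used for the "more liberal" starts-with/ends-with check, and the use of `contains("unknown")` for the "not unknown" test. The function name `annotate` is hypothetical.

```rust
/// Hypothetical helper: append an annotation suffix to `func` based on
/// which module the frame came from.
fn annotate(func: &mut String, module: &str, annotate_kernel: bool, annotate_jit: bool) {
    // Kernel frames: module starts with '[' (e.g. "[kernel.kallsyms]")
    // or ends with "vmlinux", but is not the "[unknown]" placeholder.
    let looks_like_kernel =
        (module.starts_with('[') || module.ends_with("vmlinux")) && !module.contains("unknown");
    if annotate_kernel && looks_like_kernel {
        func.push_str("_[k]"); // assumed suffix
    }

    // JIT frames: a perf map file. Prefix and suffix checks stand in
    // for a regex match (assumed path shape: /tmp/perf-<pid>.map).
    let looks_like_jit = module.starts_with("/tmp/perf-") && module.ends_with(".map");
    if annotate_jit && looks_like_jit {
        func.push_str("_[j]"); // assumed suffix
    }
}
```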
It is. Huh. That's interesting. That is very interesting indeed. I see, so this is like up here. I think what we really want to do here is just reverse the stack. false is a TODO. Does it include pname? Yeah, we're going to have to figure out what the ordering here is. So this vector: we want the vector to be in the order so that it will do this, but that means that we have to push the last thing we encounter first. There are a couple of ways we could get around this problem. I think the way we're going to do it is... all of this is going to allocate so many strings, that makes me really sad. But I wonder whether we could be more efficient here. We might be able to use a VecDeque here, actually. I wonder if VecDeque has a join, because that way we could push from either end and it would still be efficient. I might have been hoping for too much. That's totally something people should add, though. In fact, where does join live? Slice concat. Yeah, so that should exist for anything that implements this. And VecDeque, do you have that? So let's make the stack a VecDeque, and what we're going to do is push the pname to the front. See, this too is going to have to take the entire pname, which makes me a little sad. It doesn't deref to a slice? Really, VecDeque does not deref to a slice? I don't believe you. That's insane. Oh, because it can wrap around. Oh, of course. Okay, the other way to do this then is... that makes me really sad. Okay, here's what we're going to do. What is the condition for not pushing to the stack? Is it empty including the pname or not including the pname?
It is including the p name Okay, so What that means is if the stack Is empty See that just gets annoying too The old vex slice does also create a new string um It's just that now we would have to create two strings because we also need the p name Um, I don't even want to do this Like let's take a step back here and look at it. Um Maybe Okay, maybe we do keep this as Just a stack Uh, the clone is pretty unfortunate But I guess it's fine Let's ignore it for now And then here we're gonna say, uh That the stack string is going to be See this is why I sort of want joined to be over iterators instead of But that's fine. Um We're gonna have to do something like if the stack length Is one Basically have the special case it to deal with the semicolon Then stack dot Pop back this should basically never be the case Uh Otherwise We're going to say that okay, let me It's a little bit hard to articulate this while typing it. So I'll type it out and then say why I'm doing what I'm doing Um, and it may or may not make sense Okay, so that in theory should do the right thing if we want it to be even more efficient what we could do is, um Allocate a single string with the appropriate length And push on to that in fact, maybe that's what I want to do Um Let you Uh Stack line and we're going to say here So if the stack only is one item that we need to make sure that we only Uh, then we have to make sure that we don't omit any semicolons. Uh, otherwise What we're going to do is push the first thing All right, so we know there's a first thing because the the length is like it's non-empty All right, so we push the first thing and then for each remaining thing actually now we don't need these anymore Great that makes me much happier So we're going to keep this um We're going to keep this uh this stack line thing around So that's the thing we're going to be writing the current stack line into whenever we iterate through Just so we avoid allocating the memory for it. 
Um, and we're going to clear it at the beginning Oh, we could be even fancier than that maybe It might have a way to avoid this clone. Let's deal with that in a second So, um, if the stack is non-empty, we're going to clear that sort of buffer that we're keeping around We're going to push the first thing from the stack and then for each remaining thing We're going to push a semicolon and the string in that and that guarantees that you have semicolon separated things Um, now this is just going to be stack line Uh, and this needs to be owned and so therefore stack line doesn't doesn't help to try to store between these So we're just going to do this And in that case This might as well just be this I tried to be clever and I was not Is that roughly makes sense? So we're trying to construct this like semicolon separated list of the stacked And we're going to the way we're going to do that is push the first one and then for each remaining one out of semicolon in that that string We could here optimize a little bit by trying to Basically by by measuring how long the string is going to be and then allocating that much memory And that might save us some reallocations But i'm just going to ignore that for now. I don't think it's terribly important This is a to do that's fine um What we are going to have to be careful though is the order in which we push things onto the stack So we know that we'll want the peanut to be first regardless and I think what this What this structure implies is that whenever we get to something we always want to push front Right Yeah, we're always going to want to push to the front because that way we're basically reversing the order So that when we walk it in order, we'll get the bottom thing first Uh, so it's where we previously had stack.push Oh me Oh, right. We never got to that part. That's down here great So now we're back to looking at um Each of these we ignored the tidy java. 
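To make the "push the first one, then a semicolon and each remaining one" idea concrete, here is a small sketch. It assumes the stack is a `VecDeque<String>` filled with `push_front`, as discussed; the function name `join_stack` is hypothetical. It also does the optional pre-sizing mentioned as a possible later optimization.

```rust
use std::collections::VecDeque;

/// Hypothetical helper: build the semicolon-separated stack line.
fn join_stack(stack: &VecDeque<String>) -> String {
    // Reserve enough room up front: every frame plus one separator.
    let capacity: usize = stack.iter().map(|frame| frame.len() + 1).sum();
    let mut line = String::with_capacity(capacity);

    let mut frames = stack.iter();
    if let Some(first) = frames.next() {
        // First frame has no leading separator...
        line.push_str(first);
        // ...every later frame is preceded by exactly one ';'.
        for frame in frames {
            line.push(';');
            line.push_str(frame);
        }
    }
    line
}
```

Because perf prints the innermost frame first, pushing each frame to the front as it is read leaves the deque in outermost-first order, which is the order the folded format wants.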
We've done this business and this is where it unshifts Inline onto the stack although for us inline is not an array instead. It's just funk And so now we're going to take That's still tidy inline So that should have ended up here This is matching on the stack line And so here we're gonna Stack push front Know the drain iterator adapter doesn't set the length of the vec until it is dropped And it doesn't shrink to fit just set lend So I think actually drain Does set the length of the very beginning because it cannot it's not safer to do the opposite It can't set the length of the end Um Because then you might drop an element twice if you have a panic. So drain I believe Sets the length of the vector before the drain So changes the length of the vector before the drain Then does the drain and then sets the length At the end it you are right though that it doesn't shrink to fit it only sets length And this is intentional because you might want to reuse the memory. Let's do a cargo check just to see So there's some stuff in there Um So what is left the part from the to dos well There's this inline business which Where is it it calls inline? Here this is if show inline Which I think we added it to do for if not we should do that Uh, that's just below where we parse out the raw funk which is like Here somewhere to do uh show inline So we're not doing any inlining yet Um, what we are missing though is once we've gone through all the lines then This occurrence is is what we have to print out, right? We want to print out Sorted by value really? Or is this sorted alphabetically Why is this sorted at all? Why do they care that this is sorted? Oh, that's unhelpful It's like the origin commit Great. So what you're saying is we have No idea why this is sorted Why is it alphabetically it's sorted by key. It's not sorted by the count. Why is it sorted by key? 
I guess it's so that it... I guess it's sorted by key. Actually, yeah, I guess it's sorted by key to ensure that the display remains the same across different runs. Any given function is going to appear in roughly the same place, maybe.

"Why do you clone the pname?" Here? Well, the type of the stack is a VecDeque of String, so what else would I do? pname is kept around, right, so I sort of have to clone it. I guess I could take it. The problem is that the whole reason we have pname is to keep reusing that memory, right? And so if I take it, I'm basically removing that memory and saying reallocate it on the next iteration. Whereas here, this clone will only happen if you choose to include pname, so otherwise it won't incur a cost. Although split_off would do the same thing. I think fundamentally this is going to require cloning something in the case with include pname. I guess you could optimize it away by starting to construct the stack string here. That would avoid the clone, but you are still doing a string allocation, so it might not matter all that much. Yeah, I don't think it's super important. I also don't want to optimize this too much before we actually profile it and see how it works.

It is true, though, that at the very end, after we've read all the lines, this is where we want to print out all the keys. I wonder whether, for HashMap, is there a way... I guess there is an ordered hash map, which you can walk in order. In fact, ooh, we could make this a BTreeMap instead of a HashMap. Arguably it should be a trie map, just because a lot of the stack frames are going to share the same prefix. The advantage of a BTreeMap is that we could walk it in order by key. The HashMap is faster, but the BTreeMap lets us not have to sort all the keys in the end. This is what makes it weird to me that it's requiring that we emit them in any kind of order.
I would think that random is fine. Because I don't really want to... I guess we could sort all the keys at the end. It's a good question, I don't know which is better. It depends on the ratio of number of lines to number of keys. For every sample you're going to have to do one map operation. A BTreeMap is log n to do any operation, I think, so with a BTreeMap the total complexity is going to be n log n plus n, where n is the number of events. Whereas for the HashMap it's going to be n plus k log k, where n is the number of events and k is the number of distinct stack lines. So I guess we don't know whether... well, we should know this, right? We know there can't be more distinct stack lines than there are events, because every event has exactly one stack line, and some of them will be shared. So it's better to sort it at the end. Great.

So what that means is that at the end here we're going to do Vec::from_iter(occurrences.keys()), and then keys.sort(), and then for key in keys, and then we're going to print out whatever this is. Right, which is really just println! of this and this, and nothing more, which is going to be key and occurrences[key]. And now we have some compiler errors. That's good.

Oh, weird, YouTube is being weird again. Can people from YouTube still see me? YouTube is really not doing well with the streaming business. It's so unreliable, the stream to YouTube. "Can't find func" on line 122: that's because this should say raw_func. What else do we have? 59. Okay, good, thanks. Okay, well, I don't know.
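The "count in a HashMap, sort the keys once at the end" plan can be sketched as follows. The function name `folded_output` is hypothetical, and returning the lines as a `Vec<String>` rather than printing them is a choice made here so the result is easy to inspect; the "stack, space, count" layout follows the folded format described on stream.

```rust
use std::collections::HashMap;

/// Hypothetical helper: produce one "stack count" line per distinct
/// stack, ordered by stack string.
fn folded_output(occurrences: &HashMap<String, usize>) -> Vec<String> {
    // Collect references to the keys; this costs O(k) for k distinct stacks.
    let mut keys: Vec<&String> = occurrences.keys().collect();
    // One O(k log k) sort, instead of paying O(log n) on every insert.
    keys.sort();
    keys.into_iter()
        .map(|key| format!("{} {}", key, occurrences[key]))
        .collect()
}
```

Sorting `String`s this way compares bytes, so uppercase ASCII letters order before lowercase ones.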
It's just uh, there's a status panel that's saying that youtube is still Like zero viewers and retrieving data um The size of string cannot be known. That's fine I guess this is gonna have to be It's weird how my terminal will have a font this large Why is it complaining about that? really Cloning a character is just Taking a reference of it. It's fine. It should be at least what does Where's my where's the car's iterator? cars Presumably gives me an iterator over Yeah Oh, it's um All just gives you a reference No, it's uh, it's the problem is all the problem is not cars Alternatively, I could just collect it, but yeah, I don't want to import that trade to find it 59 7 Slices these are type u size or ranges of your size. I mean this should be a What am I missing? Oh, it's the inverse of what I know, but then it should just work Because all gives you a reference Very strange. I guess. Oh, maybe. Oh, maybe all doesn't know you're right. So actually Yeah, no, you're totally right. You're totally right Uh, you can't single index into strings. Oh, this is the thing we talked about earlier I specifically just want that one string though Isn't there a car at am I misremembering? Oh, maybe that's in So I would have to do This they're a futable wallet. Yeah, we're aware of that. Oh, right. Did this ever run Her report gosh gee Oh I feel like I had a way to get around this but now I can't remember what it was Maybe we could try it on a different machine instead and see if that has the same issue How do you structure your research code? Uh, do I structure my research code? That is a very good question um We do We do generally deal with pr. It's like most of it is just committing to master For smaller changes. 
Um, but when we're doing like larger Sort of tests, uh, or larger research efforts We usually do them in the branch and then we do not quite code review But at least that way we can look at the changes when they're ready to merge Um So we do for larger changes we do pull requests if you look at the noria repository You'll see that there are some pull requests in the past. Uh, there's also so the branch I'm currently working on is the one called, uh, rpc rewrite Which is a fairly large change So that might be something that it's like of interest to look at the diff there Uh, well it compiles. So that's kind of interesting So I guess now what we're going to do is run our thing the cat birth data script Pipe actually we're gonna first Have it compile that crashed Oh Here we don't need to backslash this Uh recover type text from typing auto. Yeah, I know I've been worried about that to myself Um Sadly there's not really a good way for me to work around it But there I'm also not really typing any sensitive passwords for this Mostly You're right. It's a good concern. I I am aware of the concern Great. Okay, so it runs That's pretty interesting. So perf data script pv stack collapse per Perf data folded. This is the command we ran ran last time, right? Okay, so it's still like Around 16 megabytes per second to parse that file I I actually have no idea Uh Whether we'll be faster or slower, whether it'll even work But we're about to find out Took about 30 seconds last time. It's all right. Uh, actually I'm gonna move Folded to perf data Uh, pearl folded Now Rust folded There's a partially a moment of truth. Let's see what happens Whoo That's not quite what we wanted Huh, what was this? I mean, this just looks like it's printing its input Seems like not what we want. Why is it? Something is definitely weird Why is it printing anything there's now I guess there's weird stack line, but It's just like remove all the prints and see if it still does that It does not. Okay. 
So it's one of these that are triggering. But then why isn't it including the "weird" part? It should say "weird stack line" somewhere. Okay, so let's also put that into error.log. Let's look at error.log. That's a big file. It keeps accumulating all of the lines. Okay, so that's definitely wrong. Oh, it's because we're not clearing line. That's awkward. We need to clear. Okay, fine, this is now going to become a loop, with line.clear() in this. Okay, that makes a lot more sense. So let's look at the error log now. Great, now it doesn't have lots of weird things in it. So let's now go back to not doing that. Okay, that's a lot faster. It still is not as much faster as I would like it to be.

"You could just use the lines iterator from BufRead." It's true. I don't like this part. Why is it doing that? What is it being slow about? Where is it stuck? Let's try that, I guess. Does this give me Strings? That's another question. Or does it give me strs? It gives me Strings. See, this is the reason I didn't want to use it, because it doesn't actually reuse the line buffer. Let's try that. Let's see if it actually finishes. Okay, so this is currently about 2x faster. Well, let's see if we can do some profiling and bring that up.

But first of all, let's see whether this is even right. So the Rust code is definitely doing something wrong. There are a lot of zeros going on in there. This looks like it's picking out the wrong... specifically, it looks like it's not picking out the comm correctly. So here... oh, here, this should be one, two. Oh, the line count is another good example, but that will only work given that we're deduplicating by function names. The line count only works if all of the other stuff is correct. That looks a lot more right. Right.
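The bug found here is a common one: `BufRead::read_line` appends to the buffer it is given, so a reused buffer has to be cleared on every iteration. A minimal sketch of the corrected loop, with a hypothetical `for_each_line` wrapper and callback that are not from the stream:

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: call `f` once per input line, reusing a single
/// String buffer instead of allocating one per line.
fn for_each_line<R: BufRead>(mut reader: R, mut f: impl FnMut(&str)) -> io::Result<()> {
    let mut line = String::new();
    loop {
        // read_line appends, so without this the buffer would keep
        // accumulating every line read so far.
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break; // 0 bytes read means end of input
        }
        // Strip the trailing newline (and any other trailing whitespace).
        f(line.trim_end());
    }
    Ok(())
}
```

The `lines()` iterator is simpler to use, but it yields a freshly allocated `String` for every line, which is the trade-off being weighed in this part of the stream.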
So now the only difference is that we're not including the module name Right and including the module name is pretty straightforward. Let's uh Make these be Or to do so Include p name. We're going to set to true If you're going to ignore that for now I guess we can include that option. So that was include tid So where's the place where that got used here? So this is going to be Uh P name is going to be if Include tid this Uh, and so this is going to be either Regardless, I see No, that is not awkward. It does mean oh, actually we can just do that first This and then say here We're going to do p name P name dot push string Pid or Pid slash And then tid and now we need to declare those so So include tid is false Include tid is false Um, okay, so that's probably true by default So this is also going to slow us down a little because now we have to do more processing And actually it did right But only very marginally we went from 33.8 32 point or 30 Yeah, like high 32s How about now That's looking an awful lot better, huh? Okay, we're stripping single quotes, which we should not be doing This is because This is because replace replaces anything in the pattern as opposed to the exact string Um a parser library like nom. I actually considered doing that. Um, but I figured I wanted to do a more direct port instead Uh It's also this parsing is a little weird because it's like Uh, I don't know it could be but it is kind of nice to be able to do line by line parsing when it is a line by line format Okay, so this replace Won't work because the replace has to Wait, why did this line doesn't include double quote single quote? So why is single quote being replaced at all? That's okay. So that's weird Oh, the pearl version is missing the quote. 
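For the pid/tid handling mentioned above, a sketch of appending to the reused `pname` buffer might look like this. The `-` separator before the pid is an assumption about the Perl script's output format, and `decorate_pname` and its parameters are hypothetical names.

```rust
/// Hypothetical helper: append "-pid" or "-pid/tid" to the process name.
/// Including the tid always includes the pid as well.
fn decorate_pname(pname: &mut String, pid: &str, tid: &str, include_pid: bool, include_tid: bool) {
    if include_tid {
        pname.push('-'); // assumed separator
        pname.push_str(pid);
        pname.push('/');
        pname.push_str(tid);
    } else if include_pid {
        pname.push('-'); // assumed separator
        pname.push_str(pid);
    }
}
```

Pushing onto the existing `String` keeps the buffer-reuse goal intact, since no intermediate formatted string is allocated.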
You're right. Huh. So their tidy generic removes both, not either. So now the question is which is more expensive: allocation and copying, or scanning. I wonder whether we can do... In some sense, I guess what I want to do here is any... I don't know if that's more efficient than just always copying the string. Do I need to remove it? I don't really need to. There's an argument for the Perl tidying being wrong here. It's a good question, I'm not sure why it removes the single quote at all, come to think of it. Why is it important that that gets removed? Oh, you know what, it is probably for the SVG generation. So I think what I want to do is not remove these. The Perl version strips... So we're just not going to split those, and instead what we're going to do here is substitute any single quote or double quote with nothing. Now these are looking pretty similar, so let's do that.

There's some difference here in sorting. "This horrible thing"? What do you mean? Oh, from the commit message. So why is this ordering different? That's weird. That's a little disturbing that they're sorting differently, because these are strings. Does Perl sort capital letters before lowercase letters and Rust the inverse? Yes, apart from that, though, this is all the same. Great, okay. So we wrote something that is about 2x faster. Let me push this, just so if people want to look at the code they can, and so that we have a tracker before we start splitting this up. Inferno: a Rust port of FlameGraph, used for debugging.
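On the allocation-versus-scanning question: one option that avoids allocating a new string at all is `String::retain`, which filters characters in place. This is an alternative sketch, not what was typed on stream, and `strip_quotes` is a made-up name.

```rust
/// Hypothetical helper: remove every single and double quote from
/// `func` without allocating a second string.
fn strip_quotes(func: &mut String) {
    // retain keeps only the characters for which the closure returns
    // true, shifting the survivors down within the same buffer.
    func.retain(|c| c != '"' && c != '\'');
}
```

This does one linear pass whether or not any quote is present, so a separate `contains` check beforehand is not needed to avoid a reallocation.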
We're also going to ignore any log files, and then we're going to... "Uppercase is lower than lowercase in ASCII order." Yeah, but then why are we sorting them in the opposite order? That sounds like Rust is not sorting in ASCII order. Yeah, so the next step is going to be to run perf on our perf parser. Before I do that, though, I'm going to do a little bit of cleanup, because currently we have all these parameters that I sort of want to parse with clap, and I also want to split this into doing error handling properly and whatnot. So let's do that first. We're going to go to Cargo.toml and we're going to say clap, whatever clap's version is right now. "It's probably using Unicode collation." Hmm, that's true. 0.1, actually; this is inferno-collapse. Let's see. "structopt instead of using clap directly"? I haven't used structopt before. I'm very happy with clap; I like the fact that it gives me low-level control. I'm sorting just by using sort: I have a vector of strings and I'm calling sort on it. It could be that the Perl ordering is wrong, I don't know. Sorry, you want me to use structopt instead? Sure, seems fine, we can do that. Version... All right. Oh, I do need to macro_use it, don't I? Even though I'm on the new edition. Normally you can just use macros now, but I don't think you can do that for derives. I could be wrong. So let's follow the example, shall we? We're going to derive Debug and StructOpt. I guess we should also change Cargo.toml to say bin. I can never remember the... I think it's this: src/main.rs to src/... Oh, I don't need the macro_use anymore. Nice. "It's because of the tick-a." Of course, so the Perl version doesn't see the tick whereas Rust does. Problem solved. Thank you, good catch, Thomas. Right. So now we get rid of these. That is kind of nice.
I agree. And the things that we want, let's see... perf, no, stackcollapse-perf. We sort of want to mirror this as well as we can, right? So how are these written? I guess what I want is include_pname. It's going to be a bool. Short is going to be... none of them have short, so I guess long. And the default value is the default of the type, so false. So I will want default true for pname. This is not pid, this is pname, which... really, you can't turn off? Well, then I guess include_pname is just going to be on. It's not actually an argument. That is so weird. So this is an option you can't turn off. Great. Well, we're going to let people turn that on and off by saying, I guess... What is pname again? It's the process name. Yeah, it's the process name. So we're going to call that names. "What does comm..." Because that's what it's called. Oh, yeah, it's probably missing the "TODO: make this configurable". It's true. include_pid, which is going to be false by default. How do I express that? Oh, I guess, okay, fine. So for this I'm going to have to write something; it's going to be whatever this says. I sort of want to say that include_tid implies include_pid. Is there a way I can do that in StructOpt? I feel like there probably is, but... Argument types, help messages. Right, yeah, that's pretty similar. That's fine. Optional subcommands, flattening, string parsers. That doesn't really help me, does it? Okay, so I can't say that it depends on that. Default value... I sort of want to say something like requires or includes. But I guess it's not terribly important. What else do we have? include_addrs, which is just called addrs, which was off by default. And for addrs, this. What else do we have?
Inline, which we haven't dealt with yet, annotate jit and annotate kernel. JIT... okay, I don't know whether it's on by default or not, but we can find out. They're all off by default except for tidy. So that's gonna be jit. But here I need to sort of say that all implies both. I don't know if there's a way to do that. Okay. Looks like tidy is also not configurable. Yeah, if you look at tidy_generic, there's no way to turn it on or off. So I guess let's just not include those; just hardcode them to true, just like the original code does. "You can use something like magic stuff in the attr." Oh, so in the attribute I can give clap commands? Because that would be pretty handy. raw. How does raw work? Well, that's sort of what I want. I'm very surprised that there isn't a straightforward way to express this. Because, okay, basically what I want here is this. In some sense, I don't want to declare this as being tied to any particular variable, right? I mean, I guess I can. It's fine. I can just do it in post-processing. It just makes me a little sad. What else do we have? There's context and inline, which are things we haven't dealt with, and there's event filter, which we haven't dealt with. Right. And then how can I define a help text to come after the main help? "The help message that will be displayed when passing help." Okay, great. So there I want sort of this. Can I declare positional arguments, is the other thing. "The type of the field gives the kind of argument": optional positional argument or option. There are three types of arguments. Okay, so how do I do positional? Oh, if it just doesn't have anything. I see. Okay, so we're gonna have input be an Option of String, and that's going to be... infile is what it's called. And then that's going to be something like standard in if... Event filtering we haven't added yet. This is what we're gonna have to include up here. The question is where does that go?
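A minimal sketch of the "just do it in post-processing" idea above: after the flags are parsed, apply the implications by hand. The struct and field names here are hypothetical stand-ins for the real options struct, and `normalize` is not a function from the stream.

```rust
/// Hypothetical, simplified options struct; the real one is derived with StructOpt.
#[derive(Debug, Default, Clone)]
struct Opt {
    include_pid: bool,
    include_tid: bool,
    annotate_jit: bool,
    annotate_kernel: bool,
    annotate_all: bool,
}

/// Apply the implications that could not be expressed declaratively.
fn normalize(mut opt: Opt) -> Opt {
    // Asking for thread IDs implies showing process IDs as well.
    if opt.include_tid {
        opt.include_pid = true;
    }
    // "all" implies both the JIT and the kernel annotation.
    if opt.annotate_all {
        opt.annotate_jit = true;
        opt.annotate_kernel = true;
    }
    opt
}
```

The downside, as noted above, is that the relationship between the flags no longer shows up in the generated help output.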
I sort of want a help message that comes at the end. I don't know if there's a way for me to do that. Basically the same way as happens here, I guess. If I run this it should tell me... Oh, did we ever get the stuff from here? Something is very broken with perf on later kernels. I keep seeing it; it could be related to Rust code too, I'm not entirely sure. I see this error a lot. And you're saying I should not need to call... and then of course I need all the business. Is there an after_help? I don't think there's an after_help. I mean, there is in clap. So I guess the one way I could do it is... Oh, I see, these are passed directly to clap. Oh, so this actually does specify it. It feels weird, but okay. And I guess this we probably don't need anymore. Oh, I see. Okay. Yeah, that makes sense: if the values aren't strings, then you need raw. All right, great. So now I just need to actually parse this out, which is this Opt from_args here. And anything that says include is going to be, except for those, opt dot those. All nice. Where did it get the author name from? Oh, it took it from the Cargo.toml, probably. Yep, that looks pretty good. "How long have you been using Vim for?" That's a good question. I'm gonna say like 12 years. Maybe more. A while. Okay, so this looks pretty similar, so that makes me pretty happy. I don't know if I want the author and version info there, or the summary. I don't think I want that. Can I have it not do version? Is that an option that I have? Like if I do version equals nothing, or is it gonna hate me for that? Oh, that's unhelpful. Like, all three equals nothing. Is it then gonna leave an empty line there?
Yeah, no, it did not. Great. So I guess then I can just set the name empty and the version empty... it's leaving empty lines. And also I don't want to set the version to empty, because I want it to be correct when you pass dash V. So I guess it's gonna have to be like this. But that seems fine; I'm not too bothered by that. Okay, so we have our flags. What now? Now we're gonna tidy this up a little. So, oops: parse, take command line flags. And now we're gonna tidy up this main routine a little. Specifically, what we're gonna do is... oh, because now, I guess, infile. Yeah, we're gonna have a handle_file method that's gonna be generic over R, which is BufRead. It's gonna take an R. It's gonna return... also, I guess, an opt now. Here, this is gonna match on opt dot infile. If it's Some, I'm gonna do some stuff; otherwise we're gonna use standard in. Here it's gonna be handle_file, with some error handling, and this is gonna be the same, with some error handling. "Takes two parameters." That is true. It takes opt, which it might not be very happy about giving me because I'm borrowing it. So f is gonna be... let's call it r, for reader. Do this, and then that's gonna pass opt as well. Great. So File open can fail. So here we sort of want to figure out what we want to do with errors. I think what I want here is, I sort of want to use question mark, really. Actually, that's a good question: what does this do if it fails to open the file? Oh, this is just Perl's default handling, so it does whatever Perl does. So I think what we're gonna do is, if it's an error... The other option we have here is to allow multiple infiles, which I think is actually permitted in this version. I think in theory if you gave multiple infiles, it would sort of just do the right thing. But I don't think that generally makes sense. I could make my main return Result.
The reason not to make main return a Result is that I don't have as much control over the error message that you get. It does mean that you don't really get quite as nice error messages. But maybe it's fine. It does save us a lot of... sure. And then I guess here we could even just do this. Why is it complaining here? Oh, this is fine. Now the question was, do we ever really want to return an error? I guess let's find these unwraps that we put in earlier. These definitely should be unwraps, because the regular expressions should be parsable regardless. In fact, they can even be const, if I remember correctly. The const regex business... is it lazy_static you have to use to get that? Mm-hmm. Yeah. I don't think that's worthwhile for us; it's fine to just do it there. But that one we do really want to be an unwrap, because... "Don't use const, use lazy_static." Yeah, that's what I meant. It's not actually const. But I don't really care about it being lazy_static; it seems fine to just call it in handle_file. Although I guess with lazy_static, I would always learn whether it was valid or not, whereas here I would have to actually run it on a file. Actually, no, even an empty standard in would do it. What else do we have? unwrap: this is now a question mark. This is fine. Actually, this could be an if let Some, just so we avoid the unwrap. That's an unwrap_or, so that's fine. Yeah, I had to shut down my Patreon because I'm an international student, and that means that I'm not allowed to be paid in the U.S. So I had to shut it down, sadly. "The regexes are compiled on each call to handle_file." But handle_file is only called once, so that's not really a problem. Oh, I see, that's fine. Yep, that's fine. This needs to be mutable. What else? 53. See, NLL would fix this, but I thought NLL was on in the new edition. Use of moved value. Oh, that's because I'm being stupid. This should be okay.
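A rough sketch of the shape described above: a handle_file function that is generic over any buffered reader, called with either an opened file or standard input, with errors propagated using the question mark operator. The `Opt` struct here is a simplified stand-in, the line counting is only a placeholder for the real per-line processing, and `run` is a hypothetical name for what main would call.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Simplified stand-in for the parsed command-line options.
struct Opt {
    infile: Option<String>,
}

/// Process every line from any buffered reader.
/// Here it only counts lines; the real function would parse them.
fn handle_file<R: BufRead>(_opt: &Opt, reader: R) -> io::Result<usize> {
    let mut lines_seen = 0;
    for line in reader.lines() {
        let _line = line?; // a read error ends processing and is returned to the caller
        lines_seen += 1;
    }
    Ok(lines_seen)
}

/// Pick the input source, then hand it to `handle_file`.
fn run(opt: &Opt) -> io::Result<usize> {
    match opt.infile {
        // `ref` borrows the path instead of moving it out of `opt`.
        Some(ref path) => {
            let file = File::open(path)?;
            handle_file(opt, BufReader::new(file))
        }
        None => {
            let stdin = io::stdin();
            let result = handle_file(opt, stdin.lock());
            result
        }
    }
}
```

A main function with the return type `io::Result<()>` could call `run(&opt)?` and end with `Ok(())`; the trade-off, as mentioned above, is less control over how the error is printed.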
Let's do some more cleanup here. What else do we want? Well, we could do something that handles each line, by making all of these things be parts of a struct. That would probably be a little bit nicer. So we're gonna have a struct. It's basically a state machine, but what do we call the state machine? Yeah, so NLL is on; it's just that I needed to match with ref. Otherwise I'm moving the infile, which is not what I wanted to do. It's gonna be something like PerfState. It's not very descriptive, but it's fine. The things we're gonna have here are in_event, skip_event... so this is: lines until the next empty line are... This is really skip_stack: skip all stack lines in this event. Oh, this is Neovim. It's not Vim, but Neovim, or Vim in general, is extremely productive when you have the right plugins. The stack is basically... I guess actually we're gonna derive Default, because all of these are default values. Great. The stack is... well, I mean, the stack is the stack, but how do we even document that? Function entries on the stack in this entry thus far. The other question is whether occurrences should be a part of PerfState. I think probably. Number of times each call stack has been seen. "How are you faster with IntelliJ?" Like, what makes you say that you're faster with IntelliJ? See, this really is the comm name. It's not pname; pname is a weird old name, I guess. "Switching back and forth between terminal and IDE." What about that? I do that only to run things. Oh, yeah, the stopping me while typing is really annoying.
This is why I had RLS disabled for the longest time. "You run the program to see compiler errors, while I see them immediately during typing." I see them immediately during typing too. Control-J gives me the next error, Control-L is all, and they're highlighted on the left. So they're all there in my editor too. Where were we? Right, so we're gonna have this state, and then I guess what we're really gonna do is... on stack state; actually, we could call this PerfState. So we're gonna have a function for on_event_line, we're gonna have after_event, and we're gonna have, I guess, on_stack_line. One of the annoying things is we might need the options in here too, so I guess this will also contain an opt. And so the idea would be that this loop is basically the thing that's going to be doing the reading. Here we would do state dot after_event, and after_event is gonna do this. One question is what happens at the end of the file: is there always a newline even at the end? perf data script... there is. Okay, great. See, these are just weird to me, like, why are these variables in the... Oh, I never remember. What is the... Control-T? No, Control-I? What's the Vim hotkey for uppercasing something? Control-A? No. This should be... I guess this will really just be self dot. I think it'll just be, right, and this will do the: if in_event, then on_event_line. So this will be on_line. Oh, shift-U, yeah. Yep, it's a pretty handy shortcut, but I can never remember it because I use it so rarely. Okay, so I guess down here, we might want to split this into multiple files at some point, because currently it's getting pretty long. I guess it was organized this way, so we might keep it that way. And so if it's not an event, then... sorry, on_event_line, that is easier to reason about. So, on an event line.
We want to say that we are now in an event. Yep. And then we're gonna do... oh, see, this is where we're gonna need the lazy_statics to get the patterns. So, Simone, I think the chat can probably help you out instead of me stopping to do that. Thanks, chat, I appreciate you. continue is going to be weird. Why does this do continue? There's nothing happening further down. I don't know why that was continuing. Seems weird. Then I guess here, this is going to be the place where we do something like... oh, this shouldn't be self, this should be... And then this is going to be state.finish, and state.finish is going to be what that does. So now we've divided it into functions, and handle_file is looking a lot nicer now. So that makes me happy. This is still pretty big and ugly, though. I think we probably want to move PerfState into its own file. Maybe. Unclear if it's worth it. I think we probably want to keep this here instead, this down here, for easy reference. These are going to need a bunch of selfs. Oh, also, the symbol stripping worked entirely correctly; this whole business for stripping symbol offsets must have worked correctly, given that the files matched up. So that's kind of... This is self.opt, this is self.opt, and this pushes to the stack, and this no longer needs to continue, because this continues the default operation. Great. We do probably want to split out a couple of things. For example, this business for tidying up a function name is probably the place where... Hmm. Because this business, it would be nice to at least have it in its own function. Right. So I think the way we're going to do that is... I sort of don't want to have to give it captures, though. Although that said, on_stack_line is basically just the tidying; that is all it does. So arguably it's just a function that does that anyway. This one is particularly nasty, though. So let's do a... so first of all, handle_file should be higher up.
Yep. And then it's the regular expressions that we'll probably have to do as lazy_statics. Right, so this business... no, this business. I guess it's basically a string function. So let's do a tidy_generic. It's going to take func as a string and give you back a string. This place is just going to do func is tidy_generic. So that does help a little bit. I guess we could do the same here. Yeah, so that's going to be, I guess, module fallback, which is going to take module, which is a string, rawfunc, which is a string, and pc, which is a string, and give you back... It's going to be with_module_fallback. This is going to be with_module_fallback of module, rawfunc, pc. I still don't know what the skip process names is. Like, it would be nice to have some examples of... Did the stream go away, or are you still here? Because now Twitch also went weird. Seems to still be online. Why is this unreliable now? That's a good question. No, seems to be fine. Yeah, weird. I don't know. So, this skip process names. Okay. So here's another thing, just to backtrack a little bit, back to where we were. There's a test suite for this. So in tests there's a bunch of perf results that we could look at. That basically shows you what the results should look like for a given file, and I wonder whether this has any examples that contain those parentheses that it's skipping specifically. What is it? It skips if rawfunc starts with parentheses. So something like this. That's awkward; everything matches that. Let's do a clone of this, actually; it's going to be handy anyway. So if we cd into here and into test, and then we grep here for... I want to look for anything that is space, open bracket, followed by not an open bracket or a slash. Really? How does this line match that? Space, open bracket, anything that is not a square bracket or a slash. Hmm. Oh, that should be right. I don't want anything that is this or that. Weird. Is this just printing all lines?
It is. What's the Perl "print matching lines"? Could probably do this directly with grep. With dash n. I may have been overly aggressive. So there are none. There are no examples of these that start with a... Yeah, so where does the skip process names come from? I guess we're gonna go to blame again. Skip process names, here. See, that's where it all comes back to, that damn commit. Is this... no. Oh wait, that was not the file I meant to close. So show me before that commit. Where does it skip... skip summary line. Wait, that doesn't change that line. This line is not changed by this diff. So why is it even complaining? Oh, this business. Oh, it's just indented, I see. So really this has been here all along. But that suggests that it's common. When would the function ever... skip process names? I'm just gonna leave it in, because it seems to have been there since the beginning of time. And so I'm gonna do this. Then inlining is the other thing we sort of need to figure out. But I'm not gonna do anything with inlining for now, and instead have inlining be something we deal with after we look at the performance, at why it's currently doing well slash not doing well. We do need to get this to compile, though. Let's add in lazy_static too. And then we should now get this. Wait, can I not make lazy_static global regexes? Because that's really what I want. Let's see if that works. Ooh. Captures not found from this. Oh, I renamed skip_event, didn't I? skip_event to skip_stack. Oh, it doesn't have a default. Sure, it can have a default. It seems fine. State... actually, no, it's not gonna be Default. It is, however, going to have a From, just so we have a good way to create PerfStates in the first place. It's going to be false, skip_stack false, stack is going to be VecDeque default, occurrences is going to be HashMap default, pname is going to be an empty string, and opt is going to be opt.
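A heavily simplified sketch of the state machine described above, to show how the pieces fit together. It leaves out the options, the comm prefix, skip_stack, and all parsing of the stack line (the whole trimmed line is treated as the function name), and it derives Default instead of implementing From, so it is an illustration of the structure rather than the code from the stream. It assumes stack lines arrive innermost frame first.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Default)]
struct PerfState {
    /// True between an event header line and the blank line that ends the event.
    in_event: bool,
    /// Function entries on the stack in this event so far, outermost first.
    stack: VecDeque<String>,
    /// Number of times each folded call stack has been seen.
    occurrences: HashMap<String, usize>,
}

impl PerfState {
    fn on_line(&mut self, line: &str) {
        if line.trim().is_empty() {
            self.after_event();
        } else if !self.in_event {
            self.on_event_line(line);
        } else {
            self.on_stack_line(line);
        }
    }

    fn on_event_line(&mut self, _line: &str) {
        self.in_event = true;
    }

    fn on_stack_line(&mut self, line: &str) {
        // Pushing to the front reverses the order, so the outermost frame ends up first.
        self.stack.push_front(line.trim().to_string());
    }

    fn after_event(&mut self) {
        if !self.stack.is_empty() {
            let folded = self.stack.drain(..).collect::<Vec<_>>().join(";");
            *self.occurrences.entry(folded).or_insert(0) += 1;
        }
        self.in_event = false;
    }
}
```

The reading loop then only has to call `state.on_line(&line)` for each line and print the contents of `occurrences` at the end.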
This has to take... And once we get this to run, then we'll try to do performance debugging of our own thing, which is going to be fun, hopefully. 78: what did I call it? Was it not on_after_event? Oh, of course not, it is after_event. And these continues should just be returns. That has to be continue. That has to be else this. Oh, what is that even going to be? That's going to be return. That's about right. Are there any other continues? No. Great. I guess filters we are still lacking. So the other thing we now have to do is try to figure out why it's slow. I mean, not that it's particularly slow, but I guess we'll try to run it first and see how that turns out. Okay, so about 35 megabytes per second. So the question is, can we do better? Right. In fact, one thing we should check, just for sanity, is if I just cat out the file... Okay, so we can write files a lot faster. So here's what we're going to do. First of all, we have to set up our things such that profile.release has debug info, because without that we won't really be able to debug it much. And then here, I guess, this is going to have to rebuild. And then what we're going to do is basically just run the same pipeline, but perf record the entire run and then look at where all the time was spent. Usually, when you're doing performance debugging like this, what happens is that initially there are some really big wins, and then after a while they just all disappear; after a while there's just no one thing you can optimize anymore. But hopefully there should be something that stands out to us. The other thing to be aware of is that we can't actually use cargo run --release here, because then perf would record cargo instead of the binary. So we're going to do the Cargo target dir, slash release, slash inferno-collapse-perf. We're going to do -g, call graph. I mean, that's the plan.
We're gonna flame graph our flame graph. Now, of course, we haven't ported the actual flame graph tool yet. We've only ported the stack collapse part. In fact, this is going to be a little bit interesting, but let's do a sudo perf... or just perf report -g first. See, this "failed to process sample" business, it's just happening everywhere. Can I run perf script? Is that fine? And what if we run it through our own thing? Oh, actually, that's not what I meant to do. I should have done... Yeah, so keep in mind that we can't really go much faster than whatever perf script can do, right? If perf script can only hand us data so fast, we can't go faster. So here, this was probably not a long enough run, but this was estimating that perf script was emitting about 70 megabytes per second. So we can only really get 2x compared to where we are now, because otherwise we would be faster than perf script, so the bottleneck would be perf script. Although that would be a great place to be in. But what we want, though, is to cat perf data self script through pv, through stackcollapse-perf, to perf data self perl folded. And let's see how fast that is. Okay, so stackcollapse-perf here is about 23 megabytes per second. And in theory we should be... Yeah, so we are faster currently; not by all that much, right? It is interesting, though, that here everything seemed to work. So now we're going to run flame graph on the perf data. We'll run it on the Perl one, just because something is probably broken. Self perl, and then we're going to open that file, which is in streams, inferno, perf data self svg. All right. So where is time spent? Well, handle_file.
Okay, so it's kind of obvious that handle_file is going to be... "For most fun, you might also want to try some config stuff like LTO." Oh, yeah, I mean, there are a lot of config options you could turn on to make the binary even faster. I sort of wanted to start this just as easy as possible, but let's see where we come down to. Where is time spent? Well, after_event takes relatively little time. Most of the time is spent in on_line, which makes sense, because usually an event is not finished. But let's look at what happens there anyway. So there, most of the time is spent looking up in the hash map and constructing the joined string. So that's sort of what we expected. In on_line, there's on_event_line and on_stack_line. on_event_line is taking a surprisingly long time. This happens only once for every event, right? So these are running a lot. There's on_stack_line; it's running a lot more than on_event_line. So here it looks like there can be a pretty decent win. There's some string replace that's taking some time, but on_event_line, captures... on_event_line captures, huh. So that regular expression, the match event line... That's a good question; we don't actually know which one it is. So this is one of the things that's a little bit unfortunate about this setup: if you call this function multiple times, you don't know which call it is. So it could be... because we do two captures here, right?
We do one with this match and one with this match. It does suggest that if we can get rid of some of these, this is going to save us a lot. Either we could try to parse this out ourselves, or... Actually, it seems like all the work happens in captures, which is a little interesting. I wonder whether captures_iter is a lot faster. I wonder whether it has to rematch every time if you use captures directly. Yeah. No, that's for finding multiple. So the work is happening when we call captures, and we call it twice, which is a little sad. We don't need the trim_end anymore, because that's already happening elsewhere. The question is which of these is the bad one. So one way we can find out is by doing this. It's sort of artificial. The only thing I did was change this so that I'll now be able to differentiate between these two, right? To see whether we're spending significantly more time on one than on the other. Specifically, what we wanted was... Actually, what did we want? What I wanted was
A sudo perf script, and now flame graph that one. Okay, so if we now go back to on_line, on_event_line... Huh, so only captures really shows up, whereas this one is basically not showing up at all, which is a little interesting. Maybe it just gets optimized out entirely. Also, that is only being used for filtering. So it might be... actually, here's one thing we could try. Instead of doing this, we could just comment this business out entirely and see whether that gives us any kind of performance improvement. That is faster, right? Before we got 23 and now we're at 37. So it's already a lot faster. So if we look at on_line... okay, so most of the time is still spent in captures, which is not terribly surprising. We did see a speedup when we got rid of this. This suggests that if we can avoid having to do regexes here, it's probably going to save us a bunch of time. So let's go down and look at this one, which we have to run for every event line. Really, what it's doing is looking for the first occurrence of a space followed by a digit, maybe. The question was whether there's a way we could turn the regex into a DFA instead of an NFA. I think there's a way you could set it to only do one but not the other. Regardless, though, I wonder whether it's actually worthwhile to just have this not be a regex, because the thing we have to do is actually pretty straightforward. Also, separately, this regex is wrong. If you have comm names that include numbers, this would not work. Like, if this was not V8, but if this was just 8, and that was a part of the comm, then this regex would be totally broken. So let's see if we can do this in a better way. Here's what we're gonna do... "Wouldn't a state machine for parsing be better than regex?" I mean, regex basically is a state machine for parsing. You're right.
There might be a more efficient one that we could write ourselves, but part of the reason I use regex here is more that, ideally, we shouldn't have to. It's a lot more expressive, right? I don't really want to write manual implementations for parsers. So I think what we're gonna do here is this: for word in line.split_whitespace, if word dot... is there an easy way? "It's almost always slower, but it is a lot more convenient." Isn't there an is_numeric? So, can split_whitespace also give me the index? Because that's really what I want. And same with split, right? split doesn't give me the index, does it? I could just walk it character by character, but that makes me a little sad. No, an enumerate isn't enough here, because I don't want the index of the word; I want the index of the character I'm at. I may just have to walk character by character to get this. The upside is that I think I can do it in one pass. Bear with me here. char... oh, I already have that. Okay, great. So that gives me... Yep, here's what we're gonna do. Notice that I would never have written this code in the first iteration. But now that we know that this is one of the performance bottlenecks, it seems fine. Here we could probably... ooh, we could use memchr here. No, and obviously we're not looking for a specific string. We could search for space; I guess I could just find space. split sadly won't... actually, maybe split will work. And split should be decently efficient, right?
Because split I can give a single character, and I can actually keep track of where I am. Ooh, match_indices. That's excellent. Well, I want split indices. This won't actually... well, no, this will kind of work, but it's kind of stupid, because this will tell me where all the spaces are, which I don't think really helps. I'll stick with this for now. last_was_space, true. c is... you're basically searching for the first all-digit word. Although this is gonna be a slash. It's gonna be annoying. c, slash, all_digits. This is definitely pretty hairy. "What's the point of all_digits &= true? Isn't that the same as leaving it untouched?" No, so if all_digits is false it will stay false; if all_digits is true, it will stay true. Sorry, you're right. I guess actually what I want is just this. So you're totally right, it's entirely unnecessary. They're still all digits if they were all digits. You're right, it's totally unnecessary. So this is going to be: word_start is index plus one. Found an all-digit word. If maybe_slash... Right, wasn't that, no, found tid. That's what we discovered. And so really, what we want to do is, whenever we get to here, we want to terminate. Once we get to this point we're done, because then I guess here we're gonna have something like let comm, let pid, let tid, and comm is going to be line, everything up to word_start minus one. So comm is going to be everything up to where we first started seeing these digits. Right, remember, the lines we're trying to match are these. So we're going to walk up until this point. So word_start is going to be the highlighted character here, right?
And comm is going to be everything up to, but not including, the space before the current word. And then if there's a slash, then we have a pid and a tid; if we don't have a slash, we just have a tid. Right, so if there was a slash... I guess this, instead of being maybe_slash, should be saw_slash or contains, which I guess we could just search for. I guess that's probably not terribly important; we could just search the word for that when we find it. But instead we could also do a contains_slash. Fun with parsing. So if it contains the slash, so I guess if let contains_slash_at, then pid is going to be line from word_start to slash, up to but not including the slash, and tid is going to be slash plus one to index. So everything up to but not including the index of the whitespace we're currently at. Otherwise, pid is going to be question mark and tid is going to be word_start until the space we're currently at, and comm is going to be everything up until then. And now I guess the real problem is that we're not guaranteed to ever get to these. "So the only real benefit of regex is that you write less code." Yeah, you get much more concise code, and very often performance doesn't matter. But you get very nice and concise code, sometimes so concise you can't really understand it. But it is really nice. So that is definitely the advantage, and sometimes they can be implemented really efficiently too. The problem is, given this loop, we can get here and not have initialized comm and pid and tid, if we get to the end. So I think these may have to be mut, which is a little bit sad. I guess fields is what we would do: it is None, and then here we're going to set fields to Some of pid, tid, comm, and break. And then down here, it's really if let Some of comm, pid, tid is fields, do that; we've already dealt with those. Otherwise, do that.
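A self-contained sketch of the single-pass parse described above: walk the line once, remember where the current word starts, and stop at the first word that is all digits, optionally with one slash separating pid and tid. This is a reconstruction under assumptions, not the exact code from the stream: the function name `parse_event_line` and the state variable names are made up, only spaces and tabs count as separators, and it shares the limitation mentioned earlier that a comm containing a purely numeric word would be split too early.

```rust
/// Returns (comm, pid, tid) for a perf script event line, or None if no
/// all-digit word is found. pid is "?" when the line only carries a tid.
fn parse_event_line(line: &str) -> Option<(&str, &str, &str)> {
    let mut word_start = 0; // byte index where the current word begins
    let mut digits_only = true; // current word is digits, optionally digits/digits
    let mut last_digit = false; // previous character was an ASCII digit
    let mut slash_at = None; // byte index of the '/' inside the current word

    for (idx, c) in line.char_indices() {
        match c {
            ' ' | '\t' => {
                if digits_only && last_digit && word_start > 0 {
                    // Everything before the separator that precedes this word.
                    let comm = line[..word_start - 1].trim_end();
                    if !comm.is_empty() {
                        let (pid, tid) = match slash_at {
                            Some(s) => (&line[word_start..s], &line[s + 1..idx]),
                            None => ("?", &line[word_start..idx]),
                        };
                        return Some((comm, pid, tid));
                    }
                }
                // Start a fresh word after this separator.
                word_start = idx + 1;
                digits_only = true;
                last_digit = false;
                slash_at = None;
            }
            '0'..='9' => last_digit = true,
            // A single slash is allowed, but only directly after a digit.
            '/' if slash_at.is_none() && last_digit => {
                slash_at = Some(idx);
                last_digit = false;
            }
            _ => {
                digits_only = false;
                last_digit = false;
            }
        }
    }
    None
}
```

Resetting the per-word state on every separator means repeated spaces are harmless: an empty word never ends in a digit, so it is never mistaken for a tid.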
So I think in theory this does all the parsing that the regex did, but without the regex. Whether that works out in practice, I guess we're about to find out. And let's, just for our own sanity's sake, check that it still produces the right thing. Whoa. Something crashed. I think our parser isn't correct. It's saying we have some weird line. Weird event line. Why is it a weird event line? Well... also, why are they not being given newlines anymore? Oh. "all_digits should be true when you find a space." You're right. Yeah, you're right. It does mean, though, that if you get a double space it would get really confused. I sort of wanted it to be false, because until you've seen a digit, there are no digits. I guess maybe here what we could do is... no, this will be sad if... Actually, though, there is a way we could deal with this, and that is down here. I guess we'll leave it like this for now and say: two spaces in a row would see all_digits equal true. I guess we'll see how this turns out. Right, now run it. What's it gonna say? Okay, well, that seems better. Now I guess we'll have to do the vimdiff to see. So it will seem slow. Okay, so this still seems to have produced the same, right? Yeah, we didn't change any of the outputs. That means that it is still parsing things correctly. So now I guess we'll look at the pv output of that. Yes, we'll see... that's not even all that much faster. No, it is, actually. This is the speed we were getting when we commented out the inner regex. So this actually is faster. But let's check how the flame graph turns out. So on_line, on_event_line. So this is pretty promising. In on_event_line, all of that time is now spent in captures; we've got rid of captures, the one but not the other, right?
So this is saying that the inner one is still taking a bunch of time. Now, remember how we commented that out to see? Let's see if we comment it out now. I think this is going to be pretty cool. So if we now do this... That's a pretty decent improvement, right? So we went from basically 29 or 30 or something to 41. That's pretty good. Let's check the flame graph now. Of course, now I've commented out that captures, right? So the inner captures we'll still have to do something with. But now on_line... Wait, why is on_event_line still doing a captures? Oh wait, what? Am I just missing a captures here? Am I doing something very stupid? Am I running the wrong binary or something? I don't think so. Then why is it showing captures running inside on_event_line? There shouldn't be a captures there anymore. The only captures now are in on_stack_line. So why is that showing up at all? I see why: the flame graph has been wrong all along. This goes to that script, this looks at folded, and this needs to output to folded. So actually, if we bring this back now... Actually, no, bringing that back doesn't matter. So we build this, we run it with that, and that's going to put stuff in perf.data. That is definitely a speed improvement, though.
And remember, this was only matching on the event lines. If we do the same for the stack lines, that should be pretty severe. So this is going to go into the Perl folded... Yeah, so look how on_event_line is basically gone now because of the optimization we made. So I guess let's put this one back in and deal with it. If we look back here, we'll see that in on_stack_line, captures is also by far the biggest offender. So let's look at the regex that matches a stack line. So that's what that does. This one is almost simpler. Yeah, it's pretty severe. It's really just removing leading whitespace, which we can do very efficiently, then looking for anything that does not contain a space, pretty much, then doing a greedy match until it hits some stuff without a space inside parentheses. So this should be very efficient to parse. We might even... actually, we can just split on whitespace; this is trivial. Yeah, this is very straightforward. The address, I guess pc, is going to be line trim_start, split_whitespace. Module is going to be line... is there a reverse split_whitespace, an rsplit_whitespace? Fine, we can just use split, that's fine. So pc is going to be line trim_start, split on space. Module is going to be line rsplit on space, next. And raw_func is going to be trim_start from pc.len()... No, we can do this even better. We'll split this in two. So there's a splitn, which splits only that many times, so it's going to split only into two pieces. I think n is the number of pieces. Yeah, so it's going to return two pieces specifically: it's going to return the pc and then the rest of the line, right? So line is going to be this, and then pc is going to be line.next(). Yeah, I'm using both RLS and racer.
I think. So line is going to be that, pc is going to be line.next(), and then line is going to become line.next(). I sort of want a try around this, because otherwise we're going to have to chain it, which is going to be annoying. Then module is going to be rsplitn, and then it's going to be this, dot unwrap. Please excuse the unwraps for now. rsplitn two: module is going to be line.next().unwrap(), and raw_func is going to be line.next(). So maybe this is kind of weird, but let me see if I can explain it. We get the line in and we remove the leading spaces, because there are tabs at the beginning of each of these lines. If I go up a little here, right, there are some spaces at the beginning of each of the lines. We trim that, then we split by space into two pieces, which is going to be this piece and this piece. pc is next of that, and line is then set to the second piece. That second piece we split from the right into two pieces on space. So it's going to give us this as the first thing and this as the rest. So module is the first and raw_func is the next, and that's going to work even if there are spaces inside of this. Oh, that's so weird: apparently the YouTube live stream is really bad quality. Just checking that I'm recording. Good. Yeah, the recording is going to be full HD anyway, but it's really weird that the YouTube live stream is so bad today. I don't know what that's about. Sorry about that. Is try on stable? It isn't, is it? In the 2018 edition?
Oh, I don't think so. Oh, weird. We don't get try, do we? As in try/catch? I don't think so. Because that would make writing this a lot nicer. So one way to do this is... this is really stupid. Okay, we're going to have an fn, get or maybe just stack, and it's going to take a line, which is going to be a string. That's going to give you back a string, a string, but an Option of that. And then, because of that, I should now be able to do this and say that this is going to be a question mark. This is going to be a question mark. This is going to be a question mark. This is going to be a question mark. And this is going to return Some of pc, raw_func, and module, in string order. And now here we should be able to match on the parts of line, and if that is Some of pc, raw_func, module... And this is going to be an if let instead, because that will save us one indentation. "I had to manually select the 1080p option on YouTube." Maybe that's it, then. unwrap_or_default? No, unwrap_or_default is not what I want, because I specifically don't want them to be empty. I want None from the entire thing if the line doesn't match. Specifically, what we're taking advantage of here is the fact that question mark works for Option. Oh, split... splitn. What do we think? That's a lot faster. You see how much faster that is? Let's just, for sanity, check that they're the same. Not quite the same. Specifically, it looks like there are no unknowns anymore, but that is a lot faster. So the question is, why did unknown not work? It's a good question. Actually, let's look for one of those lines. This is probably what I'm after. Actually, you know, less the perf data script... did I really not copy that? Specifically, I want to search for this business here and look at how those lines are special. What? Oh, right, there aren't any semicolons. So why is there any unknown involved there?
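The stack-line split walked through above, wrapped in a function so that the question mark operator can be used on Option, might look roughly like this. It is a sketch under assumptions: the name stack_line_parts and the exact return order are illustrative, and at this point the module still carries its surrounding parentheses.

```rust
/// Hypothetical helper: split a stack line such as
/// "    7f53 some_function+0x10 (/usr/lib/libfoo.so)"
/// into (pc, raw_func, module), or None if the line does not have that shape.
fn stack_line_parts(line: &str) -> Option<(&str, &str, &str)> {
    // Split once from the left: the address, then everything else.
    let mut parts = line.trim_start().splitn(2, ' ');
    let pc = parts.next()?;
    let rest = parts.next()?.trim();

    // Split once from the right: the module, then the function name.
    // Splitting from the right keeps spaces inside the function name intact.
    let mut parts = rest.rsplitn(2, ' ');
    let module = parts.next()?;
    let raw_func = parts.next()?;

    Some((pc, raw_func, module))
}
```

Each question mark returns None from the whole function as soon as a piece is missing, so the caller can use a single if let instead of chaining unwraps. Note that a module path containing a space would be split in the wrong place by this approach.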
Oh, it's from the context, right. So let's give some context here for this. So unknown, unknown doesn't get parsed right. Yeah, unknown never gets parsed out here. So the question is why. It just gets turned into a single open square bracket, so something here isn't right. Let's see what this gives us. This is going to give us a ton of output, but... oh, yeah, I could use debug. Actually, debug would be a little bit annoying here, but I could. Oh, it's because we're not supposed to include the parentheses around the module. So specifically, next, dot trim, and then... is there a trim_matches? Yeah, trim_matches, trim_start... is there a trim_start one? If module starts with, I guess... You know what? No, that's actually problematic. We do actually need to only trim the one character, because otherwise... or maybe it always contains one. How do I know? I guess module is always wrapped in parentheses, so remove them. So what we're going to do is say module is equal to module from one to module.len() minus one. Yep, that did it. Okay, now let's try to perf that. Remember, the old value was 30 megabytes. We are now currently faster than perf script, so I'd say we're doing pretty well. Flame graph that, and now let's look at the flame graph. See, now it's a lot more diverse. In fact, now a bunch of time is spent in on_event_line. We're seeing some string allocation show up. BufRead read_line is showing up. So this means we're doing pretty well. I think this captures in on_event_line is worth getting rid of. So on_event... I guess, actually, this should also be in its own function. What do we call that? That one is stack_line_parts, so this is going to be event_line_parts, returning an Option. What are the things? comm, pid, tid; it's just basically the same: string, string, string. And then otherwise... That makes me feel a lot better. And now we almost can get rid of our dependency on regex and lazy_static. Go away. We don't need you here. We're in performance land. All right.
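The parenthesis fix mentioned above, removing exactly one character from each end of the module, could be sketched like this. The helper name strip_parens and the guard that leaves unwrapped input alone are assumptions added for safety; the stream simply slices from one to len minus one.

```rust
/// Hypothetical helper: remove exactly one '(' from the front and one ')'
/// from the back of a module name, leaving anything else untouched.
fn strip_parens(module: &str) -> &str {
    if module.len() >= 2 && module.starts_with('(') && module.ends_with(')') {
        // '(' and ')' are one byte each, so these byte offsets are safe.
        &module[1..module.len() - 1]
    } else {
        module
    }
}
```

This differs from trim_matches, which would keep stripping while the pattern matches and so would turn "((a))" into "a" rather than "(a)".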
So now this one should be straightforward. All we're looking for is... we're trimming the stuff at the end, which already happens, and then we're looking for the last word. So this is trivial. This is just line, rsplitn two by space, next. Event is going to be that, and we want to get rid of the colon, is all. So I guess if we wanted to be really strict about this, we could do something like: if captures ends with a colon, captures is captures, everything until captures.len() minus one. Except it's not captures, it is event. If event ends with... like so. Incorrect close delimiter. Oh. How much faster do we think it is now? I mean, we're still going to check that it does the right thing. And also keep in mind, this is while running it through perf. I don't know, this is so much faster. It makes me very happy. So we're going to do a vimdiff just to check that we haven't screwed anything up. Nope, that looks all good. Do this, and then we're going to do flame graph. Now what, flame graph? Oh, so much faster. So now notice on_event_line is there. So after_event is now becoming pretty large, because it is doing hash map operations. We're seeing string extend happen a decent amount, because it has to reserve. Ah, so that we can do even better. So if you remember, in after_event we do this thing where we have to construct the string. We could do a lot better here by pushing all the strings onto a single one, or basically reserving once instead of continuously extending that string, because that's going to do a lot of allocations. So here what we could do is s.reserve. So on String there's a reserve. Really, pretty sure there's a reserve. Wait, this is... no, this is String. Really, isn't there a thing on String to reserve space for it? Ah, there. Yeah. Great. Right. So with this... in fact, we can do reserve_exact if we want. We know how much more we're going to stick on there, right?
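The event-name step from the start of this part, taking the last word of the event line and dropping its trailing colon, might be sketched as below. The function name event_name is an assumption; the approach (rsplitn into two pieces, then an ends_with check) follows what is described above.

```rust
/// Hypothetical helper: the event name is the last space-separated word of
/// an event line, with at most one trailing ':' removed.
fn event_name(line: &str) -> Option<&str> {
    // rsplitn(2, ' ') yields the last word first, then everything before it.
    let event = line.trim_end().rsplitn(2, ' ').next()?;
    if event.is_empty() {
        return None;
    }
    if event.ends_with(':') {
        Some(&event[..event.len() - 1])
    } else {
        Some(event)
    }
}
```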
So stack.iter().map, the string length: for each s, s.len() plus one. Actually, that's going to be a fold. Does reserve work in terms of Unicode characters? I think it's in terms of bytes, but len also reports bytes, if I remember correctly. Yeah, len is in bytes and reserve is also in bytes. So with reserve_exact we basically want to make sure that we have room for all the strings we're going to push, so that way we won't have to continuously reallocate more memory for the string as we go here. If we wanted to be really fancy, we could do that in here too, because we know that this will always be the first string. In fact, we're going to do some performance engineering, so why not do it properly? Allocate a string that is long enough to hold the entire stack: String::with_capacity, and that's going to be self.pname length plus this, and then pname.extend with self.pname, or I guess push_str is fine. Of course, if include_pname was not an option, then we would just always know that we could use that string. Deallocating... What else was showing up in our flame graph? So in after_event, that entire string thing now will hopefully just sort of go away. read_line we can't do all that much about, although it is spending a bunch of time in read. One thing we can do: if you look at Rust's BufReader, when you create a new BufReader, you can say with_capacity. You can ask it to read more bytes at a time than what it normally would do. By default it reads eight kilobytes at a time, but we might as well tell it to read more. So for BufReader, with_capacity. "Why not do this right? Why not only do the fold once?" So the reason we don't do the fold only once is because this code doesn't know that this code will execute. One option would be to get rid of include_pname as an option. Actually, let's just do that.
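The pre-sizing idea described above can be sketched as follows. This is a minimal sketch with assumed names: folded_stack, pname, and the ';' separator are illustrative, and it uses sum where the stream talks about a fold, which computes the same total.

```rust
/// Hypothetical helper: join a process name and its stack frames with ';',
/// allocating the output string once instead of growing it per frame.
/// Frames are assumed to already be in the order they should be written.
fn folded_stack(pname: &str, stack: &[String]) -> String {
    // len() counts bytes, and so does capacity, so the two agree.
    // Each frame needs its own length plus one byte for the separator.
    let extra: usize = stack.iter().map(|s| s.len() + 1).sum();
    let mut out = String::with_capacity(pname.len() + extra);

    out.push_str(pname);
    for frame in stack {
        out.push(';');
        out.push_str(frame);
    }
    out
}
```

Because the capacity is computed up front, none of the push calls should need to reallocate, which is the effect the stream is after when string extend shows up in the flame graph.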
It's always set to true anyway. So stack_str is going to be that, and then this is going to be that, and this is going to be stack_str. So now let's see: add the comm name, the other stack entries if any, and then count it. Let's see what that gives us. So now if we run this: 209 megabytes per second. Remember where we started: we started just around 20. Just putting that out there, just letting you know that this is fast. Now what's going to show up in our flame graph? Okay, so there's still some BufRead, but that's sort of unavoidable. Perf handle file... what's in after_event now? Freeing memory. Ah, so the freeing memory is probably dropping the string, because the entry is normally already in the map. But once your problem is dropping memory, I feel like it's not that problematic. Yeah, so you'll notice here now that push_str is not doing any allocation; it's just pushing. So on_line, I think, is the one we can do the most about, and there there's on_event, which does string replace and splitting, and event_line_parts, which does splitting, so that I don't think we can do too much about. on_stack_line gets executed a lot. "What was the upper limit on the I/O?" So that would be this. The upper limit on the I/O is a lot more. Also, keep in mind this was running with perf. If we run it without perf, it might be faster. Yeah, not by much, actually. It's about the same. Sorry, where were we? Yeah, so the question is what we can do about on_stack_line. rfind I can't do too much about. This formatting, it's kind of sad. stack_line_parts is splitting. tidy_generic has to do some allocation, which is a little sad. So this is allocating strings. This is allocating strings. This is just searching, so we probably won't get rid of that too easily. This is formatting the string. So where does that come in?
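The larger read buffer mentioned earlier in the stream can be set with BufReader::with_capacity, which is part of the standard library. The sketch below is only an illustration: the function count_lines and the idea of reusing one line buffer are assumptions, not the stream's code.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical helper: read an input line by line through a BufReader whose
/// internal buffer has the given capacity, reusing a single String for lines.
fn count_lines<R: Read>(input: R, capacity: usize) -> io::Result<usize> {
    let mut reader = BufReader::with_capacity(capacity, input);
    let mut line = String::new();
    let mut count = 0;
    loop {
        // read_line appends, so clear the buffer before each call.
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break; // end of input
        }
        count += 1;
    }
    Ok(count)
}
```

A call such as count_lines(std::io::stdin(), 128 * 1024) would read in 128 KiB chunks; the specific size is an arbitrary example and would need measuring against the actual workload.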
That's this business. Here we can do a little bit better, because this doesn't need to go through format every time. Notice also that this entire function, this with_module_fallback, is inlined; it doesn't even show up here. So I guess there's no other format, right? Yeah, this has got to be this one. Specifically, we have include_addrs off, so we're hitting this format a bunch, and that's a little unfortunate, although the overhead of write shouldn't be that much. "Might want to try the hashbrown crate in place of HashMap." hashbrown instead of HashMap, yeah, although there just aren't that many hash operations, right? It's only this tiny bit over there that's the only place we're really doing hash operations. Also, the other thing to keep in mind: remember how we benchmarked the performance of perf? Let me see if I can find it here. perf script can only emit things at a very low speed, so the bottleneck is actually perf script now. perf script is actually probably a little bit faster, because now the cache is cold, but the bottleneck is definitely perf script. "LTO true?" Oh, just to see? I mean, we could; it might actually... So LTO is true. See, structopt is bringing in all these. I guess I could turn on thin LTO instead. It didn't matter that much, although at this point our script is too short to really measure anything useful. But it doesn't seem to have made that much of a difference, so I'm going to leave it off. Okay, so I think we're going to end the stream there, because we've been going for about five and a half hours. I'm going to push these out. "Don't use regex." git push. So now you have seen both how to use perf, what flame graphs are, and how to try to write something that's more performance oriented.
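The remark above about not going through format every time can be illustrated with a small sketch. Everything specific here is an assumption: the helper name push_bracketed and the bracketed output shape are invented for illustration and are not taken from the stream's with_module_fallback code. The point is only the technique of appending pieces to an existing String instead of building a temporary one with format!.

```rust
/// Hypothetical helper: append "[name]" to an existing buffer without
/// allocating a temporary String the way format!("[{}]", name) would.
fn push_bracketed(out: &mut String, name: &str) {
    out.reserve(name.len() + 2); // room for the name plus both brackets
    out.push('[');
    out.push_str(name);
    out.push(']');
}
```

An alternative with similar effect is the write! macro from std::fmt::Write targeting the same String, which still runs the formatting machinery but avoids the intermediate allocation.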
Hopefully that was interesting. We will probably do another part of the stream, I'm going to say probably in two weeks. I know there are other things that have been voted on too, but I think it's worthwhile to finish this one up and do the flame graph part as well, especially because we have some things left, like inlining. That said, I'm going to do the same thing with this project as I've done with the other live streams, which is add a bunch of issues to the issue tracker on GitHub so that you can go in. Please do submit changes to this: if you decide that you want to implement the inlining, or you want to make the parsing even faster, or you want to get rid of the string allocation, whatever you think of, feel free to submit PRs and I will look at them, read them, and give the best feedback that I can. And so maybe by the time I get back to this, all of those issues have been solved. That would be really cool, but it's up to you. Let's see, just to answer the final few questions. "Isn't LTO on by default now?" LTO is not, but thin LTO, I believe, is, because it supports incremental compilation whereas full LTO does not. "With optimizing with flame graphs, you generally iteratively take the largest bar, optimize, and continue?" Pretty much, although profiling is really hard, because sometimes the thing that shows up the most is not necessarily the thing that you need to optimize. So sometimes what you want to do is run things like blocking profiles. There are some cool papers on how to do this. One paper that was written a few years ago that was really cool was this tool. There's a talk and a paper about it too, which basically tries to figure out what the actual bottleneck is for your application, rather than just where the CPU is spending the most amount of time.
"Maybe publish the crate to prevent someone stealing the name." I mean, I don't particularly care, but... oh, it's going to yell at me for licenses and stuff, so I'll deal with that later. Also, if someone takes the name, I'll rename it. It's not that important. Great, it looks like it was fun. Hopefully it was instructive. I don't know whether any of you will actually end up using this, in part because it's still missing some features, as we know, having gone through it. But in general this should be pretty close to the real tool, and as we saw, it's significantly faster. And this is one of the things that's nice about Rust: it makes it really easy to write these kinds of things in a high-performance way, whereas with the Perl stuff you could probably do some of the same, but it would be pretty painful. In any case, thanks for joining. If you have other questions after the stream, like if you're looking at the recording later, leave comments on YouTube, send me an email, or tweet at me; I'm also on Mastodon as well. So just reach out and I will do whatever I can to get back to you and discuss. My guess is the next stream will be in about two weeks or so. So thanks for watching. Bye.