Hello, do you hear me? Hello. So, I'm a software developer working for ConSol in Munich, Germany. And today I'm going to talk to you about how to implement tail -f, which was actually part of a hobby project of mine. I wanted to monitor some system, and I needed to parse some log lines. So I sat down one evening and thought, I need some functionality to read new log lines as they are written to a file, and some other functionality to parse the data and extract some meaningful metrics out of it. Initially, when I started, I thought it would take like 15 minutes to have a small tool reading lines written to a log file. It turned out that it is actually much more complex than I initially thought, and so I want to share the experiences I made.

When you start, you need some functionality that reads log lines while they are added to a file. Especially if you don't use Go every day, maybe you start by just googling how to read a file line by line in Go. If you google stuff like that, you will find a piece of code like this: you open a file, create a bufio.NewReader, and then you read it line by line. In this case, you just print each line to standard output. So this is obviously more like a cat functionality: it just dumps the lines to standard out, and then it terminates.

So how do we go from this example to some kind of tail -f functionality? The key part is here, where we exit our loop reading lines. What we need to know is that when we read a line and hit the end of the file, the end of file is actually returned as some kind of error. So when we get an error and handle it, we need to make a distinction: if the error is just the end-of-file error, we probably sleep a second and then try again. Only if it's another error do we terminate the program, because something weird happened. So that's as easy as I thought it would be, but it doesn't really work.
I can show you how. I have here a small log file with two lines in it, and I have the code that I just showed you. If I say go run tail.go, it kind of works: it prints the lines to standard out. If I echo the date and "hello", for example, into this log file, it also works; the new line appears here. Works again. But the problem is that with log files, you usually have a tool called logrotate, because you don't want your log files to grow infinitely. So usually, once per night, logrotate comes, moves your current log file into an archive, and creates a new log file for the next day.

So what happens if I do this here? Move file.log to some file.log.1, for example, and then write my next line to a new log file. So I have a new log file here now with my next line, and this is the archive. And this tool here still reads from the old file that was actually moved. If you are into Linux and file systems, that's probably not too surprising: you open the file, and then you have an open file handle, and the open file handle points to the data on the disk. Even if you rename the file or move it, and even if you remove it, the file handle still points to the data on the disk, and you're still reading the same data. So there's no way for this kind of tool to know that your file was actually moved, or that a new file under the same name was created, right?

So what do we need to do to deal with that? First of all, a quick introduction to logrotate. It's actually pretty simple, but there are two configurations: one, which I just showed you, moves the file away and creates a new one for the next day. And there's an alternative configuration which copies the file to an archive and then truncates the original file, so it sets the size of the original file to 0. The difference is that in the first case the original file is actually deleted, and in the second case it just remains there; it's just truncated.
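For reference, the two logrotate modes correspond to two config variants; a sketch with example paths and an example rotation count (not from the talk):

```
# default mode: rename foo.log to foo.log.1 and create a fresh foo.log
/var/log/myapp/foo.log {
    daily
    rotate 7
    create
}

# alternative mode: copy foo.log to the archive, then truncate it to size 0
/var/log/myapp/foo.log {
    daily
    rotate 7
    copytruncate
}
```

The `copytruncate` mode exists precisely because some programs keep their log file open and would otherwise keep writing to the renamed archive.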
But I wanted to support both configurations, because you never know how somebody configures logrotate, right? If we want to extend our program so that it can deal with these things, what we need to do is, here, where we wait a second and then try again if there are no new lines: we somehow need to check if the file was truncated, and we need to check if it was moved. If it was moved, we close it and open it again, so that we make sure our tool points to the correct file that we actually want to monitor.

How do you do these things? It's actually quite simple. There is the stat system call that gives you the size of the file, and with the current size and your current read position, you can tell whether it was truncated or not. And in order to check if the file was moved, there's a pretty handy function in Go's os package called os.SameFile. It gives you true if it's the same file and false if it's not. On Linux, it probably compares inodes, which identify a file uniquely on disk, and on Windows it does other things, but it works, right?

And if we put all this into our solution, we come up with something that actually works, and that's also how some tools implement it. For example, maybe you know this, there's a tool called Filebeat. It comes from Elasticsearch. Elasticsearch has a bunch of small Go tools called Beats, and these Beats are used to take data and transport it to an Elasticsearch cluster to index it there. And there's a Filebeat that is used to index log data in an Elasticsearch cluster, and it's implemented exactly the way I just showed you. Actually, I mean, it works, it's reasonable, you understand what it's doing, but I thought it's not really satisfying, because you still have this one-second sleep somewhere in the code. And you never know: is one second a good value, or why don't you sleep like 100 milliseconds, or 10 seconds, or something?
And actually Filebeat by Elasticsearch doesn't have a fixed value; they just leave the configuration to the user. So you configure an initial sleep interval between the polls, and then, if nothing happens on the file system, you configure how much the sleep interval increases, and then you have some maximum sleep interval and so on. You need to configure all these things, and I guess most users don't really know what to put there. So I researched a bit and thought: can we somehow get rid of this polling loop?

If you research that, you probably come across this little package here: fsnotify, a file system notification library. It's also used by some similar tools. If you start looking, you find out that there are a lot of file tailers around. There's one from HP, for example, called tail, and it uses fsnotify. There's also one by Google called mtail with similar functionality. What fsnotify does: every operating system has some kind of functionality that lets you subscribe to file system events, so your program is notified when something on the file system happens, and fsnotify is a library that gives you these events in an operating-system-independent way whenever you're subscribed to some directory.

How it works is like this: you initialize an fsnotify watcher, and it gives you an events channel. You read from that channel, and this read just blocks until anything happens. Once something happens, you get an event and you can handle it. And the events are like this: Create, when a file was created; Write, when somebody wrote something to a file; Remove, when somebody removed the file; Chmod, when file attributes have been changed; and so on. This looks exactly like what we need.
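The API described here looks roughly like this; a sketch, assuming the third-party module github.com/fsnotify/fsnotify and a made-up directory path, so it is not a self-contained stdlib program:

```go
package main

import (
	"log"

	"github.com/fsnotify/fsnotify"
)

func main() {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer watcher.Close()

	// Watch the directory containing the log file, not the file itself,
	// so that renames and newly created files are visible too.
	if err := watcher.Add("/var/log/myapp"); err != nil {
		log.Fatal(err)
	}

	for {
		select {
		case event := <-watcher.Events:
			// event.Op is a bit mask: Create, Write, Remove, Rename, Chmod.
			if event.Op&fsnotify.Write == fsnotify.Write {
				log.Println("write:", event.Name) // new data: try to read lines
			}
		case err := <-watcher.Errors:
			log.Println("watcher error:", err)
		}
	}
}
```

The receive from `watcher.Events` is the blocking read the talk mentions: the goroutine just sleeps until the operating system reports an event.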
I mean, we don't have this nasty poll loop, and we don't need to configure anything; we could just sleep forever, and once a log line is written, we get it immediately. However, there are some subtle problems with this fsnotify library. In order to understand these problems, you actually need to know what it's doing underneath: what is the operating system infrastructure that fsnotify uses to implement its functionality? On Linux, it's called inotify. That's the Linux system call that you can use to subscribe to events in a directory on a file system. This table here is just for an overview: the events on the left are what actually comes out of the Linux inotify call, and the events on the right are what they are mapped to in the fsnotify library. Interestingly, if a file is truncated with the other logrotate configuration, the truncation triggers an IN_MODIFY event in inotify, which is mapped to a Write event in fsnotify. And one important thing to note is that fsnotify does not watch a single file; it watches a directory. You subscribe to the directory where the log file is located, and then, even if the file is moved and a new one is created, you get events for everything that happens in this directory.

Another system is BSD, like, for example, this MacBook here. You don't have inotify on BSD; there's something similar called kqueue, and that's what fsnotify uses there. It also generates certain types of events. Interestingly, when the file is truncated, it generates an attribute-change event, which is mapped to Chmod in fsnotify. So you get different fsnotify events when the file is truncated, depending on the operating system. The other interesting thing is that kqueue does not support recursive directory watches, so fsnotify kind of simulates this.
Underneath, fsnotify subscribes to the directory where the log file is located, and when the file is removed and a new file is created, it gets the Create event from the directory, and then it immediately subscribes to the newly created file, so that you get Write events when somebody writes lines to this file. But there's a small time gap in between. So if the log file is created, and immediately after that the first log line is written, you might miss these first Write events, because of the small gap between the file being created and fsnotify having subscribed to the new file. On BSD.

And the problem is that these subtle operating-system-specific differences are not really well documented. fsnotify also doesn't claim to be a complete solution; it just calls itself an experimental package or something. So in order to really use it, understand it, and get rid of all these errors, you need really aggressive testing, because you need a test that produces a situation like the one above just to figure out that something is wrong, and you also need to read about the actual system calls to understand what's going on. And so I thought: the functionality I really need is a lot less than what fsnotify actually provides, because I don't want all kinds of file system events; I just want to know whether a log line was written or not. And I'd need to understand the underlying system calls anyway to figure out what these subtle differences are. So instead of having file system events mapped to fsnotify events and then writing code to work around the differences between operating systems, I ended up just using the underlying operating system calls directly. And it turns out that this is actually much easier than I thought it would be. On both systems, the solution is structured in a similar way. I mean, the whole thing works for Windows as well.
Windows also has corresponding system calls; I just use BSD and Linux here as examples. You have an initialization phase, where you initialize your watcher for file system events, and then you start some kind of producer loop: a goroutine that runs in an infinite loop. On both systems, the initialization gives you a file descriptor where you can read the events. You read them on Linux with the regular read system call, and on BSD with the special kevent system call. Then you figure out whether the event has anything to do with new log lines written to the log file, and if so, you provide the new log line on some kind of output channel. So you have this piece of code as an operating-system-specific producer giving you an output channel where you can read the new log lines, and the rest of your code is an operating-system-independent consumer subscribing to this output channel and doing whatever it wants with the log data.

So this is basically the solution I came up with, and it's generally a lesson learned: if you want to interact with your operating system, it's sometimes much easier to read what the operating system actually does, instead of being afraid and reaching for some abstracting library. There's one more subtle thing that I then added. If you just want to keep your tailer running until your program terminates, you could actually finish here. But if you want a clean shutdown, maybe because you want to make it into a library, and you also want a shutdown function in that library so that it can stop tailing the log file, you need to find a way to stop this producer loop, right?
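The channel side of such a shutdown, a select that both sends to the output channel and watches a done channel, can be sketched in isolation like this; a generic Go pattern with a dummy line producer, not the speaker's exact code:

```go
package main

import "fmt"

// produce sends lines to out until done is closed. The select both
// writes to the output channel and reads from the done channel: once
// done is closed, the receive case returns immediately and the loop ends.
func produce(out chan<- string, done <-chan struct{}) {
	defer close(out)
	for i := 1; ; i++ {
		select {
		case out <- fmt.Sprintf("line %d", i): // a consumer took the line
		case <-done: // closed channel: we are signaled to shut down
			return
		}
	}
}

func main() {
	out := make(chan string)
	done := make(chan struct{})
	go produce(out, done)

	fmt.Println(<-out) // line 1
	fmt.Println(<-out) // line 2

	close(done) // the library's shutdown function would do this
	for range out {
		// drain anything still in flight until produce closes out
	}
	fmt.Println("producer stopped")
}
```

Closing done rather than sending on it means any number of goroutines can observe the shutdown signal, and a producer blocked on its pending send is unblocked immediately.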
And stopping this producer loop: the critical parts are the two blocking calls here. The first one is where you read from the file descriptor, because this read just blocks until an event happens; if you want to shut down while it's blocked there, you need a way to interrupt this read. The second blocking place is when you have log data and want to provide it on the output channel, but you want to shut the thing down and nobody reads from the output channel anymore; you have to find a way to interrupt this as well.

Interrupting the read is actually quite easy; you just have to read the documentation of these file system notification interfaces. On all systems, there is some method to unsubscribe from the directory that you are watching, and once you unsubscribe, it triggers an artificial unsubscribe file system event, and this terminates your read. Then you can get out of your read loop here: you read, and if the read returned this kind of unsubscribe event, you know that you have been interrupted, and you return. That's how you get out of the read system call.

The last thing I want to show you is how to get out of the write to the output channel. This has nothing to do with file systems; it's just a standard Go thing, I think, but still, if you do it for the first time, it's a bit tricky to figure out. What you do is, instead of just writing to the output channel, you have a second channel, which is called done. Within a select statement, you write to the output channel and read from the done channel at the same time. Once the done channel is closed, reading from it returns immediately, and then you know that somebody closed the done channel, which means you are signaled to terminate the producer loop. You find a lot of these examples, but for some reason you never find examples where you write to a channel and read from a channel in the same select statement. You find a lot of examples where you read from multiple channels, but it's also possible to mix writes and reads in a select, and that's what you need for a safe way out of such a producer loop.

So that's basically my final solution. If you're interested in looking at the running code, you can check out this little monitoring tool for log data; there's a package called tailer that contains exactly the code I just showed you. So, lessons learned, I guess: don't be too afraid of using system calls directly; they tend to be much better documented than some experimental libraries, and the operating-system-specific lines of code you need are not so many after all. So, thank you very much for listening. I think I maybe even have two minutes or so for questions. Is there a question? Okay, thanks.

Oh yeah, network file systems. On network file systems: as you say, inotify on Linux may work on network file systems, but I'm not sure either. I didn't try it; I just ran it on a local file system. Yeah, okay, thank you.