Well, welcome, and thanks, everybody, for coming out. I know it's always fun to have the last talk of the day right before the beer bash. So if you decide what I'm saying is boring and you just want to go open a beer, just make sure you've got one ready for me when I get there. I'll be heading there shortly after we get done here. Anyway, hopefully you're here to hear a little bit about fleet monitoring with IoT, subtitled "pets versus cattle." I kind of stole that subtitle, but I think it's appropriate here. Just real briefly, I'm Drew Moseley. I've been doing embedded Linux for about 15 years, with 25-plus years in the embedded space. Currently, I work at Toradex as a solution architect for our Torizon platform, which is an industrial embedded Linux platform with a container runtime, over-the-air updates, and all those fun things that are necessary for building out these systems today. So briefly, what we're going to talk about today: we'll start with the definition of what I mean by fleet monitoring, because it's one of those terms that's way overloaded and means all sorts of different things to different people. So we'll define it a little bit so we're all on the same page. We'll talk a little bit about what I see as the general architecture of fleet monitoring solutions and how it's a little bit different in the Internet of Things. We'll review some of the options, guided primarily by the systems that we investigated for inclusion in our Torizon platform. And then we'll finish up with a proof-of-concept implementation with a Yocto layer that I've got published, and we'll go through some demo stuff. Hopefully, if you're interested, you can make use of that layer, and hopefully we've got enough time to go through all of that. Just real briefly about Toradex, if you're not familiar with us: we make hardware, specifically system on modules, based primarily these days on i.MX 6 and i.MX 8 chips.
We have offices all over the world, and just because I always like pictures of fancy hardware, this is our basic product portfolio. We have three product lines. If you want more information about that, we've got plenty of time tomorrow to discuss it. So, question: how many of you have heard the term pets versus cattle? All right, so most of you. As I was starting to do the research here, I found it was a term originally coined by a guy named Randy Bias. The link there is a blog article he wrote describing it. It really came from the enterprise computing space, when people were starting to look at servers as essentially things you can just throw away and bring back up. That's the whole idea of pets versus cattle. I'm always happy to have a reason to put a picture of my dogs up on the screen. Obviously, you treat your pets very differently than cattle. Pets are each individual: you treat them individually, you feed them what they need based on their medical needs, and that kind of thing. Cattle are pretty much interchangeable. You can't be doing individual maintenance for every single one of these machines when you've got 10 million of them out in the field. It just doesn't make sense. So to modify it a little bit for the IoT space, you can look at the pet designs as your weekend projects: your home automation thing, or my 3D printer where I've got a Raspberry Pi connected to it. When I need to update it, I SSH into the Raspberry Pi, I do the apt-get update, and I put my hands on that one physical machine and very carefully and meticulously keep it up to date. That doesn't scale very well when you're looking at large fleets of identical devices. With large fleets, you want the devices to be identical, and you want the binaries you're installing on them to be identical across all machines, so that if one dies, you throw it away, you get a new one, and you put it in place.
So where does fleet monitoring come in here? Even though the devices are intended to be identical and replaceable and all that, you still need some mechanism to determine when things are going bad, right? So you have to have some kind of system built into your software that allows you to monitor the devices, monitor the fleet, get telemetry, that kind of thing, and start to see when you are having issues with certain devices and whether they can be repaired or just need to be completely replaced. And if you have time, I encourage you to read that blog article. It's not very long, probably a 10-minute read, but it goes into a lot of detail and explains a lot about the history of the whole idea. So, just a little bit more in depth on what fleet monitoring is. When I started looking at this, I did what I always do and went to Wikipedia, because that's the fount of all information. The only problem is that if you type fleet monitoring into Wikipedia, you're gonna get a lot of stuff about maintaining your vehicle fleets, your delivery trucks and that kind of thing, and not so much about software fleet monitoring. So we kind of had to make it up a little bit as we went, but in general, the idea is periodic monitoring of some amount of data from all the devices in your fleet. It could be things like gathering log information, CPU usage, memory usage, things like that, pretty straightforward stuff, or it could be very application-specific stuff depending on your needs or your particular workflow. And then you need some ability to analyze and visualize that data. You hear the term "single pane of glass" a lot; the idea is you've got a web dashboard somewhere that gives you an overall sense of the health of your entire device fleet.
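As a rough illustration of "periodic monitoring of some amount of data," here is a minimal Python sketch of an on-device sampler. It is not what the talk's proof of concept uses (that is Fluent Bit); the field names, the `collect_sample` and `sample_periodically` helpers, and the choice of metrics are assumptions made only for this example.

```python
import json
import platform
import shutil
import socket
import time


def collect_sample(path="/"):
    """Gather one generic health record for this device (hypothetical fields)."""
    disk = shutil.disk_usage(path)
    return {
        "timestamp": time.time(),
        "hostname": socket.gethostname(),
        "kernel": platform.release(),
        "disk_used_pct": round(100.0 * disk.used / disk.total, 1),
    }


def sample_periodically(count, interval_sec, collect=collect_sample):
    """Collect `count` samples, sleeping `interval_sec` between them."""
    samples = []
    for i in range(count):
        samples.append(collect())
        if i < count - 1:
            time.sleep(interval_sec)
    return samples


if __name__ == "__main__":
    # A real agent would ship these records to a server instead of printing them.
    for record in sample_periodically(count=3, interval_sec=1):
        print(json.dumps(record))
```

Every record carries the hostname and kernel version, so a dashboard can later tell devices apart and correlate problems with a particular software version.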
And for purposes of the proof of concept that I'm doing here, things that are out of scope are remote access, remote control, and then very use-case-dependent features like machine learning, AI, and big-data-type stuff. I'm not going into that here, but most of these systems obviously can be expanded into those areas as needed. So, things that you might want to monitor with a fleet monitoring solution: first and foremost, obviously, is device health. Are the devices online, are they offline, how long have they been up? What's the status of the core services? If you're running containers, are the containers up and running? Which systemd services are available? Some thermal information, that's always interesting. Then the usual things, which we'll see in the proof of concept: CPU and memory utilization, how much flash and network usage you've got, and then obviously device configuration, what version of the kernel am I running, that kind of thing. Some other things: device status changes. The system's working and all of a sudden you get a failed health check; you might want to get some history and be able to look back and see if you can figure out what happened. If there was an over-the-air update of some kind and it failed, how are you gonna be able to troubleshoot that? Logging is always something that will eventually make its way into one of these systems. And then I did want to briefly mention some non-functional requirements that you might want to consider if you are looking at some of these solutions. Obviously, at this conference, I'm pretty sure we all want it to be open source. It may or may not be a requirement for you. Is there an on-premise version of the server side versus a hosted version? What kind of performance and resource requirements does it have, both on the client and the server side? And finally, how easy is it to extend it and integrate it with other services?
Because chances are pretty good that any of these systems you work with is not gonna meet all your needs, and you're gonna need some ability, via APIs or whatnot, to enhance the services and integrate with other systems. So this is the general architecture you see here, and I know everybody's very jealous of my diagramming skills, but since most of these systems have come out of the enterprise computing space, you'll see that they're able to take input from a lot of different sources. SNMP is a big one. Local files are just items that are generated on the device itself. With cloud APIs, you'll see a lot of off-board inputs that are pulled into the systems. You'll see on-board inputs where you're just pulling telemetry from the device itself. Then there's generally some concept of filtering in each of these systems. You might be filtering on the device just to limit the amount of data you're sending back. There might be some kind of filtering off the device, say, in an edge node somewhere. And then an important component of most of these systems is the ability to have multiple outputs. In the proof of concept we're doing here, we're using the Kibana data visualization framework, but you could also send the data to other outputs. Our Torizon platform takes in some of this data and we're able to display it there, but it may or may not have everything you need. So the ability to actually get into the system and have multiple output streams is very important. Now, how is the IoT architecture different from this? Generally speaking, at least for the kinds of designs we work with, it's much more homogeneous. All the devices are identical, or there are maybe two or three device personalities. You don't have nearly the flexibility that you would have in, say, a big enterprise deployment. And in most cases, you're gonna see that the inputs are much more limited.
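The input, filter, and multiple-output flow described above can be sketched in a few lines of Python. This is only a conceptual model, not the API of any of the tools discussed; `run_pipeline`, the sample inputs, and the `device-01` hostname are hypothetical names chosen for the example.

```python
def run_pipeline(inputs, filters, outputs):
    """Read records from every input, pass them through the filters in order,
    and fan the survivors out to every output. Returns the delivered count."""
    delivered = 0
    for read in inputs:
        for record in read():
            for apply_filter in filters:
                record = apply_filter(record)
                if record is None:  # a filter may drop a record entirely
                    break
            if record is None:
                continue
            for write in outputs:
                write(dict(record))  # each output gets its own copy
            delivered += 1
    return delivered


def cpu_input():
    return [{"metric": "cpu", "value": 12.5}, {"metric": "cpu", "value": 97.0}]


def mem_input():
    return [{"metric": "mem", "value": 41.0}]


def add_hostname(record):
    # Tag each record so the server can tell devices apart.
    return {**record, "hostname": "device-01"}


def drop_low_cpu(record):
    # On-device filtering limits how much data is sent back.
    if record["metric"] == "cpu" and record["value"] < 50:
        return None
    return record


dashboard, archive = [], []
count = run_pipeline(
    inputs=[cpu_input, mem_input],
    filters=[add_hostname, drop_low_cpu],
    outputs=[dashboard.append, archive.append],
)
print(count, dashboard)
```

The two lists stand in for two independent output streams, such as a visualization backend and a vendor platform, each receiving the same filtered records.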
Generally speaking, you're not gonna be pulling SNMP data from a consumer IoT device that you buy at the local Best Buy and plug into the environment. Normally, with what we see in IoT deployments, all the data that is coming into the system is coming from the devices themselves. They're not really reaching out and interrogating the local environment to send information back. So it's a little bit simpler from that perspective. You still want the ability to have filtering from within the device as well as multiple output streams. It's pretty important to be able to do that. So I just want to briefly mention a couple of the options that we looked at. These are the ones that we looked at and pretty quickly decided were not a great solution for IoT. For all of these, it was primarily due to their on-device footprints. They're all very large systems designed for big-iron systems where you've got plenty of space and memory and that kind of thing. The first one that we looked at was Nagios XI. It's a very full-featured system. It does use a lot of SNMP. There are also custom agents. It is a hybrid between open source and commercial licensing. I've got the link to the demo server there. You can actually log in and get an idea of how much functionality is available in this system. Yocto recipes do exist, so it is feasible. You could actually run this in an IoT system built with Yocto, but it's gonna take quite a bit of RAM and flash to be able to do anything with it. Similarly, Elastic Stack. If you haven't heard of this one, it's pretty common. Elasticsearch, Logstash, and Kibana are the three main components of it. It has a lot of input plugins. You see a couple listed here. It has both on-prem and hosted options. It is dual-licensed under Apache. It does have a relatively large device footprint.
They do have something relatively new called Beats that I didn't spend a whole lot of time researching, but the idea is that it's supposed to reduce the on-device footprint with smaller agents and that kind of thing. And in the proof of concept here, we're actually using the Kibana portion of this. That's running on the server side, so the fact that it takes a decent amount of memory and disk space is not as big a deal. One of the things I wanted to mention about this is that when I started the research, I kind of assumed there would be client-side solutions and server-side solutions, but in most cases, it's really an integrated thing. You'll find that with Elastic Stack, Elasticsearch with Logstash and Kibana, they're kind of a combination. They all have APIs, so you can mix and match as you need, but for the most part these systems are fairly full end-to-end client-server solutions. Another one I'm sure we've all heard of is Datadog. They're the only ones that specifically mention IoT monitoring on their website that I could find. I've got the link there. Exactly what it means, I'm not sure. I didn't spend a whole lot of time with it. Zabbix: the first bullet here is actually their description of their system, obviously enterprise class. It wasn't terribly interesting from our perspective, but it is fully open source. It's another one that there are Yocto recipes for, so take that as you will; you might be able to make use of that in your design. The last one we looked at is Splunk. They're the, quote, "Data-to-Everything Platform, powering security, IT, and DevOps." A lot of words in there. It's a very big, very heavy solution that can do a lot of stuff, but it was way overkill for our needs. So I seem to have reordered my slides here. This was supposed to be the next slide. So, a couple of other options. We looked at Telegraf and InfluxDB.
I know I spoke to some of the Influx folks today. There are both on-prem and hosted options, and it's MIT licensed. It is written in Golang, which is nice because that means it compiles down to a single binary with no external dependencies, which makes it very easy to figure out what you've got to put on your device to actually get this thing working. But 110 megs of flash is quite a bit when some of our system on modules only have 256 megs of flash on board. So it's kind of hard to justify nearly half of your flash for just this fleet monitoring solution. Again, Yocto recipes exist here, so this is actually usable in some of the larger systems. And the final one that we investigated is called Fluent Bit, or Fluentd. It's open source under Apache. It is part of the Cloud Native Computing Foundation, which is good; we all like that. There are really two options here. There's Fluentd, which is the bigger of the two. It's written in C and Ruby. You see it's got a lot of input and output plugins. It does depend on RubyGems, which makes it take up a little bit more space in the system just due to all the transitive dependencies that can get pulled in. It takes up about 40 megs of flash. Then there's also the Fluent Bit client, which was specifically written in C and designed to be small for these kinds of deployments. It has a lot less choice of input and output plugins, but realistically that's not a big deal, since we're not trying to be as flexible as a full enterprise-class system. It doesn't have any external dependencies, which makes it easy. Three megs of flash, 650K of RAM, and obviously, Yocto recipes exist. So since I got my slides all out of order, we'll jump back here. I kind of buried the lede here. We ended up choosing Fluent Bit for our solution, with a custom output plugin that basically sends all the data back to our server side over in-band JSON.
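To put the footprint figures quoted above side by side, the following Python sketch works out what share of a 256 MB flash part each agent would take. The numbers are the approximate ones from the talk; the helper name `flash_share` is made up for this example.

```python
FLASH_TOTAL_MB = 256  # smallest module flash size mentioned in the talk

# Approximate on-device flash footprints quoted in the talk, in megabytes.
FOOTPRINTS_MB = {"Telegraf": 110, "Fluentd": 40, "Fluent Bit": 3}


def flash_share(footprint_mb, total_mb=FLASH_TOTAL_MB):
    """Return the footprint as a percentage of total flash."""
    return 100.0 * footprint_mb / total_mb


shares = {name: flash_share(mb) for name, mb in FOOTPRINTS_MB.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of {FLASH_TOTAL_MB} MB flash")
```

By this arithmetic, Telegraf takes a bit over 40 percent of the flash, Fluentd roughly 15 percent, and Fluent Bit just over 1 percent, which is the gap that drove the choice described here.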
At the moment, we're actually pulling in more data than we're displaying, and eventually we'll have the ability, over the API, to pull the extra data out and do whatever post-processing you need. But since you can choose multiple output plugins for this thing, if the data that we're pulling back to our server side and our solution isn't enough, you could deliver it straight out to whatever visualization solution you have, and that's exactly what we'll be implementing here in just a moment. So what do we have in terms of the proof of concept? We've got a custom distro and a Yocto image in a public Git repo that's just hosted on my GitHub. It's pretty straightforward. It adds Fluent Bit and a basic configuration for it, and delivers data to an Elasticsearch instance, where it can be visualized in Kibana. This proof of concept is specifically designed not to be a Torizon implementation, but it's very similar to what we have implemented in our Torizon solution. And as part of the Git repository, I also have a Docker setup to allow you to easily run the server, which we'll look at here in a second. So here's the image recipe. For those who are familiar with Yocto metadata, you see how simple this is. All we're doing is taking the base core image and adding one recipe to it. So it's pretty straightforward to use. In terms of how we've modified the Fluent Bit recipe, we've got a number of custom config files here. You see the highlighted line there. That's the one configuration point you'll need to set if you want to replicate this environment. You just specify that fleet server URI in your config file. For my setup, I just use the IP address of the machine that I happen to be running on. And it's pretty straightforward. This is the Docker Compose file that I use for the server side. You see that it defines two services. One is Elasticsearch.
That's the actual database, the time series database that the Fluent Bit agent is sending the data to. And then there's the Kibana instance, which is the visualization as well as the web front end. It's communicating with the Elasticsearch back end, pulling the data, and displaying it. And just in terms of how you use it, here are the instructions. This setup assumes that you have a working Yocto config for some piece of hardware. It doesn't really matter what. Then you just add the one extra layer to your system, specify that fleet server URI in your configuration, and build that custom image that I generated. And in terms of the setup for the server, it's pretty straightforward: just run a single Docker Compose command. Any system that is able to run Docker containers can run this thing pretty straightforwardly. So they say, if your material isn't terribly interesting, just beef it up with some colors. So I figured before we get into the demo, we'll get a nice cloudy colored image here. So let's take a look real quick at the demo. We're on our server here. You see that this is actually running on my machine back in my home network. I've got the two services that you would have expected here. I happen to have two QEMU instances running, and we see that each one is periodically collecting the information specified in the configuration file and sending it back up to the server. So real quickly, let's take a look at some of these config files. These are just text files that get added to the configuration and delivered. If we take a look at the CPU input, this is where we're delivering information about CPU usage and that kind of thing. The host address is the server that it's going to. We're adding a little bit in the filter information. We're matching anything about the CPU, and I'm adding the host name in as one of the fields of each of the records.
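The config walk-through above (a CPU input, a filter that adds the host name, and an output pointed at the server address) can be approximated with a small Python helper that renders a classic-mode Fluent Bit configuration. This is a sketch under assumptions: the `cpu`, `record_modifier`, and `es` plugin names are Fluent Bit's stock plugins as I understand them, but the tag, the interval, the port, and the `192.0.2.10` address are placeholders, and the files in the speaker's layer may well be organized differently.

```python
def render_section(kind, settings):
    """Render one classic-mode section such as [INPUT] with indented keys."""
    lines = [f"[{kind}]"]
    lines += [f"    {key} {value}" for key, value in settings.items()]
    return "\n".join(lines)


def render_config(server_host, interval_sec=60):
    """Build a config: CPU input, hostname-tagging filter, Elasticsearch output."""
    sections = [
        # Sample CPU usage periodically (tag and interval are placeholders).
        ("INPUT", {"Name": "cpu", "Tag": "cpu.usage",
                   "Interval_Sec": interval_sec}),
        # Add the device host name as a field on every CPU record.
        ("FILTER", {"Name": "record_modifier", "Match": "cpu.*",
                    "Record": "hostname ${HOSTNAME}"}),
        # Ship everything to the Elasticsearch instance on the fleet server.
        ("OUTPUT", {"Name": "es", "Match": "*",
                    "Host": server_host, "Port": 9200}),
    ]
    return "\n\n".join(render_section(k, s) for k, s in sections) + "\n"


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address standing in for the server.
    print(render_config("192.0.2.10"))
```

Keeping the server address as the single parameter mirrors the one configuration point the speaker highlights, the fleet server URI.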
That just gives us the ability in the Kibana interface to say, okay, which system is producing this particular information. So with that, we've got our two systems here, and let's jump over here. This will go right along with my mad diagramming skills: also my UI/UX expertise, here in this lovely interface. It's pretty straightforward. I've got a single visualization here based on CPU usage. It's pulling the data every minute or so. Let's go ahead and maximize this so we can see. We'll go ahead and refresh, see if it's able to pull in new data, and if the demo gods are with me, we'll see this moving. You see it hasn't changed a whole lot, but just to make it interesting, we can come back over here and run our stress test. So we will run a 30-second stress test, and now we should, with any luck, see the CPU usage spike just as expected. Obviously, this isn't the most interesting display in the world, but we'll let that run for a second, and just so we're aware of how we can add to this, we'll go ahead and create another visualization. In this case, we'll take a look at memory usage. In here, we're selecting anything that matches this board_* regex; that's how Fluent Bit sends the information. So now we come down here and say what fields we're interested in. We'll go ahead and say memory usage. We're gonna just grab this and drop it here. So now we're looking at a stacked vertical bar, and since they're independent devices, I don't really want it stacked. I'm just gonna change it to a regular vertical bar. And then the other thing I wanna do is group it by the host name. This is how we're able to see the devices up against each other. So we'll go ahead and select that, and we save this, and now we have added to our visualization. And similarly, I believe, if I remember the syntax right. What's that?
I'm just learning this new stress test app, but now we're starting to see some more memory usage. So that kind of gives you the idea. I did wanna bring this up about Kibana: there are a lot of very interesting visualizations out there. I found this blog post, which I thought was pretty interesting. These are supposed to be live visualizations, but I clicked through a lot of them and they didn't actually seem to have any data in them. So the screenshots give you a better sense of some of the things you can do. If you're just using Kibana to visualize this data, this is actually a really good site to give you an idea of some of the different things you can look at. I think, yeah, when I looked at this earlier today, this was actually pulling live data from somebody's Google Cloud environment. So there are a lot of different things you can do directly with this, and obviously you can embed these graphs in other systems and that kind of thing. So there's a lot of very useful information in there. And I think with that, we've got some time for questions before beer, if I can figure out why this is not letting me maximize my screen. So, any questions? Kevin, evidently so. I didn't dig too much into the server side of things. With Fluent Bit, the recipe is already there, and then those configuration files send data over in a format that Kibana was able to view directly. Not today; obviously in this proof of concept we're definitely not. But the question is: with Logstash, you can obviously do a lot more than we're doing here, so are we able to pull things like systemd logs and dmesg and things like that? It's not something that we have implemented in Torizon. It's not something I have implemented in here, but that certainly can be done as part of this. Once you've got the flow of data between the systems, adding more data to it is easy to do.
Yeah, yeah, and that's exactly what Logstash is intended to provide, from the research I've done on it. It's less time series data and more just log searching, sorting, filtering, and that kind of thing. There are lots of solutions for that, and that certainly would be the next step for enhancing something like this when you're implementing it in your system. Mender is working on implementing something like that. We're working on implementing something like that, but at the moment it's early days and we don't know exactly what it's gonna end up looking like. Yes. Yeah, so the question is what you can do to respond to situations that are detected by this. There are certainly additional packages that you can integrate, and with API access between the systems, you can always do that. In terms of what we're gonna implement with Torizon, I'm sure there will be some sense of being able to respond to these things. For the purposes of what I was doing here, it was really about the visualization, so I wasn't trying to implement something like that, but yeah, certainly you could do that. Once you have the data, the next question becomes what you're gonna do with it at that point. Right, so the question is, what's the server-side component of Fluent Bit? In this case it's the Elasticsearch database. We're sending the data directly to the Elasticsearch stack running on the server side, and I would assume you can send it to just about any database with an appropriate output plugin and filtering and that kind of thing. So they work together well, but it's not necessarily required that they go together. There are lots of options that you can plug and play with.
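For a sense of what "sending the data directly to Elasticsearch" involves on the wire, here is a minimal Python sketch that builds a request body in the newline-delimited JSON format accepted by Elasticsearch's bulk endpoint. It only constructs the payload and does not contact a server; the `fleet-telemetry` index name and the sample records are hypothetical, and how Fluent Bit's output plugin batches its data internally is not something the talk covers.

```python
import json


def build_bulk_body(index, records):
    """Pair each record with an 'index' action line, one JSON object per line.
    The bulk format requires the body to end with a newline."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


# Hypothetical telemetry records from two devices in the fleet.
records = [
    {"hostname": "device-01", "cpu_p": 12.5},
    {"hostname": "device-02", "cpu_p": 97.0},
]

body = build_bulk_body("fleet-telemetry", records)
print(body, end="")
```

A client would POST this body to the server's `_bulk` endpoint with an `application/x-ndjson` content type. Any database with a comparable ingestion API could sit behind a different output in the same way, which is the plug-and-play point made in the answer above.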
Okay, so for the recording, the comment was that in a lot of cases Fluent Bit is actually running on the server side, and the data's being delivered there, and it can then be delivered on to other containers or other servers or that kind of thing. So I guess that's historically where Fluent Bit came from, but obviously for the IoT space we're more interested in it running as a client-side daemon on the devices in the field. Any other questions? Yeah, so in terms of the way we're setting it up, the metrics that you get are basically anything that you can pull out of the system. If we take a look at these configuration files, we've got information about, let's see what's in there, about networking, and these things are all predefined as part of Fluent Bit, so we're not actually defining anything custom here. We're just pulling stock telemetry that is generic to all systems, but you could write custom input plugins that would pull application-specific information and that kind of thing. And of course, what Kevin was talking about in terms of the logging opens up a lot more flexibility for you as well. Yeah, fair enough, that's a good point. Yeah, the point is, we've got the name here, ES, so instead of Kibana being able to read the Fluent Bit data, we're actually using an output plugin that's specific to Elastic Stack. Yeah, absolutely. All right, very good, any other questions? How are we doing on time? We're a little bit early, so there's time for more beer. All right, guys, thanks for coming out.