All right, let's get started. Hi, my name is Andy LoPresto, I'm from Cloudera, and I'm also on the Apache NiFi Project Management Committee. I'm here to talk to you today about secure IoT command, control, and exfil using the subproject Apache MiNiFi. So we'll go over the agenda here: a quick introduction about myself, then dataflow concepts in general to hopefully get everybody on the same page, since some of the terminology we use might be a little different if you're coming from a different domain. We'll go into very brief coverage of Apache NiFi to give you an understanding of what a core system looks like, and then we'll talk directly about MiNiFi, IoT, and edge control. Unfortunately, I'm not going to do a live demo. I usually bring a Pelican case that has a lot of loose wires that TSA really doesn't like, and I'm flying to four different cities on this trip, so I wanted to give myself a little bit of an easier trip. I do have a lot of videos embedded, though, so we can see that.

So let me get a quick sense of the audience's familiarity with NiFi. This presentation is oriented toward an audience that hasn't used the system before and isn't expert on it. Some of this may be a review if you have already used it, but it should be pretty easy to level set across the entire audience so that we can all talk about the same concepts.

So who am I? I am a committer and a PMC member for Apache NiFi. At Cloudera, I work on our Data in Motion platform, and I manage the engineering security team; so out of all the DiM engineers, I am the dimmest. I travel around presenting about NiFi and about dataflow at conferences, as well as visiting a lot of customers. I am an engineer, but I have the people skills, as evidenced by my friend here. I'm also working on introducing some embedded hardware and IoT into a brewery, so that'll be a really nice case study for the software, used in something a little more interesting than our healthcare verticals and so on.

So what is dataflow? As a very generic description, dataflow is moving content from point A to point B. The producers, the things that create data, could be anything, and the things that consume data could be anything. In a lot of cases the consumer is a storage system or some kind of follow-on system; it could be an individual person, or it could be another dataflow system. The kind of data we're producing and consuming doesn't really matter here; it's arbitrary. It could be arbitrary bytes, logs, telemetry, video, sensor data, things like that.

Moving data effectively is a really difficult challenge. You can look at any particular example and say, I could write a Python script that does that, that's not a problem. But doing it in a repeatable, robust, secure, and manageable way is a challenge that's been around for a long time and is not going anywhere. As always, Randall Munroe sums it up better than I can. But with these bespoke systems, as soon as you get one piece of input that doesn't match your expected requirements, the entire thing falls apart. So I won't go through the entire slide (all of these are available online as well), and I won't call out every single thing, but let's take one from each category. So, data formats: handling JSON versus CSV versus XML versus Avro.
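To make that concrete, here is a minimal sketch of the kind of one-off shim that gets rewritten endlessly; the field names ("device_id", "temp_c") are made up for illustration:

```python
import csv
import io
import json

def json_records_to_csv(json_text: str) -> str:
    """Flatten a JSON array of objects into CSV, tolerating missing fields."""
    records = json.loads(json_text)
    # Take the union of keys so a record missing a field just gets an empty cell
    fields = sorted({key for record in records for key in record})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, restval="")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

print(json_records_to_csv('[{"device_id": 1, "temp_c": 21.5}, {"device_id": 2}]'))
```

Multiply that by every pair of formats and every new system that shows up, and the maintenance burden is obvious.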
How many people signed up for a software engineering career in order to write the same piece of code that converts between those formats, every day, for some new system that comes online? For infrastructure: I am a security person, and one of my biggest concerns is implementing and enforcing security throughout our organization. But at the same time, I recognize there are times when you as an engineer feel you have to overcome the security requirements your company is levying on you; they're getting in the way of your deliverable. And if the only conversation there is "we need you to be at 90% of this checklist, you're at 89, and we're gonna block you for six months," that doesn't benefit anybody. So having tools that take security into consideration already puts you a lot farther down that path.

And then finally, even if we solved all the problems in the first two columns, all the data problems and all the infrastructure problems, we still have people problems. We have consumers that change. Imagine that you are responsible for ingesting all this kind of data from IoT sensors and feeding it into your data science team. Then you get a new CTO who comes in and says, hey, we're gonna reorganize the company: instead of having this one ivory-tower data science team, I'm gonna embed a data scientist with every product team throughout the organization. Okay, great. So now we have to send the data out to 15 teams instead of just one. Six months later there's a new CTO, and they say, why do we have just one data scientist on each team? Let's bring all of them together from across the organization; we'll get cross-cutting learnings and knowledge transfer and all this, and I'm gonna make one data science team that's separate from everybody else. So now you've got to go back and send all the data to them. If you're doing this with bespoke scripts and tools, it's really hard not only to update that all the time, but also to know what's happening at any given point in time. Where am I sending this data? Who is consuming it? Is anybody consuming it? With a tool that builds that into the framework, you have much better visibility and much better control over that process. Has anybody in here ever finished a project with the same requirements they had when they started? If so, I would like to work for your company, because that's something I have not found yet. Being able to react to modified requirements quickly, but also in an intentional and understood manner, is really important.

So let's talk about Apache NiFi. I've talked about a lot of problems; let's talk about a solution. Here's a quick one-minute introduction to the NiFi ecosystem. Apache NiFi is a server- or data-center-class application. This is a big piece of software; it runs on heavy machinery, usually in your data center or a regional data center. It provides a graphical user interface, it provides a REST API, and it interacts with hundreds of different services, and we'll get into a little more of that in a second. I would assume people at this conference, people in this room, are okay with this kind of interface. But a lot of the people who have the domain-specific knowledge about moving data, understanding what the data represents and how it needs to be used, aren't comfortable with this. So we provide this: a graphical user interface.
It allows for real-time command and control, monitoring, understanding what's actually happening, and also building out data flows without having development experience. So what are the benefits for end users? Like I just said, they can build a flow visually. Dragging boxes onto a canvas and connecting them with a line makes a lot more sense than writing two different classes, exposing an API, and having to marshal or serialize data between the two. When the framework handles that automatically, you get a much better experience both in building those flows and in monitoring them. So again, we're reducing the barrier to entry for new users and providing a quote-unquote "no code" solution. I put that in quotes because it is extremely extensible: if you're comfortable writing code, if you wanna do something that's not handled by the framework out of the box, we have a number of ways for you to expand that functionality very, very easily.

Again, I'm not gonna go through the whole slide; this is available online. But one thing I do wanna call out here is back pressure. If you've been at any of the other dataflow or stream processing conversations this week, back pressure is a concept that gets talked about a lot but maybe isn't explained all the time, so I just wanna give our definition. When you have components that pull data through a linear flow, each one has its own performance characteristics. You may be pulling in millions of pieces of data, and the piece that pulls that in is just copying bytes from A to B; that's pretty fast. But then you have some component that's doing fingerprinting or merging or conversion, and that component doesn't run as quickly. What can happen is that you get a backup of data leading into that component, and at some point we all hit hardware limits, we all have a cap on what we can perform, and that data can get dropped: queues fill up, the actual file system fills up, the network can't keep up, whatever it is. Our use of back pressure here is a feature that propagates that information up through your directed graph. So if I have a component at the end of my flow that can't keep up with the ingest, rather than just losing data in the queue between those two things, it sends a signal back up the graph, and the preceding component says, okay, I might want to yield administratively for a little bit to let that other component catch up with the backlog. And it can do that all the way through the entire graph, up to the boundary of your system. So you might have components with plenty of processing capability that can detect, okay, this queue is full, I don't want to send data there right now; I'm gonna tell whoever is sending data to me to pause for a second. Eventually, when you reach the edge of your control system, you might not be able to affect what happens outside of it, but you can at least make informed decisions given real-time information about what's happening in the system.

So we have this really deep ecosystem integration. I like to say that anything that's been on the Apache homepage in the last few years, we have a connector for. That said, this is also very extensible; somebody in this room probably has a new project they wanna see become a top-level project today. We can write a new processor or a new family of processors that interacts with that, and we follow a lot of very standard interaction principles.
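To make that back pressure behavior concrete before moving on, here's a toy sketch using Python's bounded queues. The full queue makes the fast producer block instead of dropping data, which is roughly the signal NiFi propagates up the graph; NiFi itself does this declaratively, per connection, so this is just the shape of the idea:

```python
import queue
import threading
import time

conn = queue.Queue(maxsize=10)  # the connection's back pressure threshold

def fast_producer():
    for i in range(50):
        conn.put(i)  # blocks (yields) when the queue is full, instead of dropping

def slow_consumer():
    for _ in range(50):
        conn.get()
        time.sleep(0.02)  # simulate an expensive transform (merge, conversion, ...)
        conn.task_done()

t = threading.Thread(target=slow_consumer)
t.start()
fast_producer()  # rate-limited by the consumer via the bounded queue
conn.join()
t.join()
print("all data delivered, none dropped")
```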
But we can also tailor that connector model to things that aren't on this slide. I don't think I have Heron on here; Heron is another messaging system, as is Pulsar. Pulsar is a little bit different from Kafka, but we can write something that interacts with Pulsar and drop it in. And now anybody who has access to the system can drag that processor onto their canvas. They don't have to know the different messaging, publishing, and subscription model used underneath; they just get this very nice, uniform user experience.

So, I'm actually not sure if the audio is gonna come through on this. People wanna know about their data. No? No audio, okay. That's a clip from Portlandia where they're at a restaurant and, as most annoying hipsters do, asking far too many questions about the origin of their chicken: how the chicken was raised, how many friends the chicken had, all that kind of stuff. We have the same concerns; we just care about data as opposed to chicken. Understanding the origin, the transformations, the path that your data takes, and all the metadata around that: the time each operation occurred, who performed that operation, who changed the configuration around that operation. All of this is given to you by the data provenance feature of the system, in a format that's easily consumable both graphically and as data you can analyze later.

When we provide that information, it actually enables a lot of new and interesting use cases that people hadn't considered before. A couple of years ago I had an intern write a machine learning model using k-nearest neighbors that could do anomaly detection on just the provenance data. Any data coming through the system is very customized and very unique; a model probably wouldn't translate to a different flow, because I'm using CSV and you're using XML, I'm looking at images and you're looking at video. Those features don't compare cleanly across flows. But the provenance data, what's happening to the data, how long it takes, where it happens, what kind of processor is doing it, is very standardized. And because of that, using k-nearest neighbors, he was able to do anomaly detection and actually detect when flows were starting to fail, which indicated early field failure of hardware. He was able to avoid a server outage caused by a failing hard disk, because you could detect when read times started to increase.
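The real model from that intern project isn't public, but here's a hedged toy of the idea, using scikit-learn's NearestNeighbors over hypothetical per-event features like processing duration and queue time:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical provenance features per event: [duration_ms, queue_time_ms]
rng = np.random.default_rng(42)
normal_events = rng.normal(loc=[5.0, 2.0], scale=[1.0, 0.5], size=(500, 2))

knn = NearestNeighbors(n_neighbors=5).fit(normal_events)

def is_anomalous(event, threshold=3.0):
    """Flag events whose mean distance to known-good neighbors is large."""
    distances, _ = knn.kneighbors([event])
    return float(distances.mean()) > threshold

print(is_anomalous([5.2, 2.1]))    # typical timings -> False
print(is_anomalous([60.0, 30.0]))  # read times creeping up (failing disk?) -> True
```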
We're in San Diego, so I'm gonna use healthcare applications as the example for this particular presentation, but this applies across all the different verticals. In healthcare organizations, especially in hospitals, there is a lot of disparate data. Every medical device outputs data in what is sometimes its own format, and if they have agreed on a standard, it might be one from 25 years ago. So pulling all this data in, in these various formats, and then trying to make sense of it, even when the data is coming from two devices that are side by side, is extremely challenging. And like we talked about earlier, nobody got into this field to write an XML-to-JSON parser 17 times over the course of their career. Having the framework do this automatically, pulling in from legacy sources, adding enrichment around that data, and protecting the data at the same time, makes it feasible for you to write more advanced applications or logic on top of that, doing the things you are actually interested in.

So again, converting those formats: this is the dialog you would see. You have to fill out three pieces of information. The record reader is telling you the source is XML; I want the output in CSV; and if I have a flow file where I've been waiting for a certain amount of time and nothing's come through, go ahead and publish that with no records inside. That last one is a binary switch. This is doing automatic schema inference on the XML data that comes in. XML is a fairly descriptive format; each element has some kind of name. So it's pulling all that out, detecting it on each piece of data that comes in, and then writing a verbose CSV where the header has all of those fields. If the columns or the values change between two flow files, that's okay, it can handle that. It doesn't worry that you're now missing the 37th element the last flow file had and break; it says, okay, it's not here, and because it's CSV I just won't have that column. You still know what each piece of data was.

Pulling from external sources: this is all you need to configure in order to pull from an Azure SQL Server database. You're setting up a JDBC connector, you're providing credentials, and then you're able to run whatever SQL queries you want. But you can do the same thing against an FTP server or an HTTP endpoint. It doesn't matter what the ingest point is; you're getting this standardized representation of the data flowing through the system.

Enriching data comes up quite a bit as well. If you're in a hospital, you have a bedside monitor, and that monitor is pulling out all of your bio information and vitals. That information is very relevant to you, the individual, but it doesn't carry all of your personal information as well; the heart rate monitor doesn't know your admissions history and all of that kind of stuff. It's sending out a continuous stream of data that's very specific and granular, but it doesn't have the enrichment information necessary to make informed decisions about it. Maybe it just has a unique identifier that represents my bed in the hospital or my patient record. If I can enrich that data in the stream, pulling from a database or some other kind of data store, then what I'm actually seeing as a healthcare professional, a doctor, a nurse, is the combination of those pieces of data, so I can make a much better decision.

And then finally, data protection. How many people, when they're writing the Python script or the shell script to move data from this database to this HDFS store, are building encryption into that, doing transparent AES encryption of the data as it flows through the system?
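Almost nobody. For a sense of how little it takes when a library or framework handles it, here's a hedged sketch using the Python "cryptography" package's Fernet recipe (AES with an HMAC under the hood); NiFi itself ships EncryptContent processors, so in the flow you don't even write this much. The payload here is made up:

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in practice, provisioned and managed, not generated inline
cipher = Fernet(key)

record = b'{"patient": "bed-42", "heart_rate": 88}'  # hypothetical payload
token = cipher.encrypt(record)   # what you would persist or transmit
assert cipher.decrypt(token) == record
```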
So we've covered the foundations of dataflow and the concepts around NiFi; let's talk about MiNiFi, IoT, and the edge. Some of the challenges around the Internet of Things: restricted software library and platform availability. I think everybody at ELC understands this is a big concern; this is usually a slide I have to explain to people, but I think this audience understands what's happening here. You can see on the right, the Roomba is having trouble, the security camera is having trouble; the Roomba pulled out the cable that goes to the security camera. The Internet of Things has this promise to make our lives easier, and it's not quite delivering on all of that just yet.

So let's talk about the second column in the NiFi ecosystem: MiNiFi. Where NiFi is that server- and data-center-level application, MiNiFi is designed as an agent, and it runs on minimal hardware. You can certainly run it on a server, but it has only minimal hardware and software requirements, and it allows you to extend your reach out into systems where you wouldn't want to, or it doesn't make sense to, deploy NiFi. The core takeaway is getting the key parts of NiFi out to the edge where the data begins, as close as possible to the origin of that data. We're expanding our reach, but we're also expanding our visibility. NiFi is in the data center; we've established that. Putting MiNiFi as close as possible to where the data starts, which means embedded hardware, connected vehicles, factory systems on a floor, expands a lot of the capabilities around what we can do with that data and how much we trust that data.

So why build it in the first place? Well, NiFi is big. It's over a gig just compressed, to download and then deploy on your systems. You can modify it to run in a restrictive environment, but it takes a lot of manual effort to do that; you're going through saying, I probably don't need to interact with Hive, so I'm gonna drop the Hive bundle, all that kind of stuff. MiNiFi is already built to do that for you. It doesn't provide a UI, it doesn't provide an exposed API, but it is orders of magnitude smaller. I won't call anybody out, but sometimes when I talk about this I say, yeah, a couple versions will fit on a floppy disk, and people say, what's a floppy disk? As I saw on Twitter, a kid shown a floppy disk said somebody had 3D printed the save icon. It's a good guess. We wanted to run on systems that weren't designed to run a dataflow system.

We talk about flavors of MiNiFi because both the C++ and Java versions have continuing development. MiNiFi Java is a pared-down version of the original NiFi. It does not have a UI and it does not expose an API. It uses a YAML configuration, which is much more human-readable, so you don't have to go back to the core system to modify it in place. And it comes with a limited set of processors, but that is expandable: because it's running on the JVM, you can drop any NAR that works in NiFi into the MiNiFi instance and have that run as well. MiNiFi C++ is in parallel development right now, but it's the successor to MiNiFi Java; the goal was always to have C++ be the version that actually runs at the edge. One reason is the size difference: it's another order of magnitude smaller than the Java version, and it doesn't require the JVM to run. It's performant even on very limited systems, and it still provides this bi-directional control plane and data plane back to your core NiFi systems.

This is just a graphical representation of what we did to take NiFi, with the user interface, the heavyweight framework, and the large number of components, and trim that down into a MiNiFi Java instance. But again, C++ is the future; it's the direction the project is moving, and it's gonna be more applicable across the scope of applications. Out of the box, it comes with 14 processors, and that doesn't really sound like a lot. But when you look at what the processors do, they're very focused on the core use cases that MiNiFi agents often perform: getting specific information about the host that's running, getting information from a file or a TCP connection, performing some generic activities around that, and then transmitting back to a core system.
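That agent pattern is simple enough to sketch in a few lines of plain Python. To be clear, this is not MiNiFi code, just an illustration of the job those focused processors do; the host, port, and file path are hypothetical:

```python
import socket
import time

def tail_and_ship(path: str, host: str, port: int) -> None:
    """Follow a log file and forward new lines to an upstream collector."""
    with open(path, "r") as log, socket.create_connection((host, port)) as sock:
        log.seek(0, 2)  # start at the end of the file, like `tail -f`
        while True:
            line = log.readline()
            if not line:
                time.sleep(0.5)  # nothing new yet; back off briefly
                continue
            sock.sendall(line.encode())  # MiNiFi would use site-to-site here

# tail_and_ship("/var/log/syslog", "nifi.example.com", 9999)
```

MiNiFi gives you the same behavior declaratively, plus the buffering, back pressure, provenance, and security discussed elsewhere in this talk, without writing or babysitting the script.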
This is what you want your agents doing: low-latency operations, small, simple event processing. But you can also expand this to run the Java processors, if you so choose, using JNI. With the Java processors you have a lot more integration; you have Kafka and MQTT and things like that, and you even have the ability to run TensorFlow models. I have a colleague who will sit up here with his Raspberry Pis and a Sony webcam pointed at him during the presentation, and that little device is doing image recognition, facial recognition, on him the entire time, putting live video to the screen, all on hardware that costs, if you really build it out, 150 bucks. That's TensorFlow at the edge. The biggest problem with it is that it overheats, and he actually has to step away from it for a second. But it's still performing, still doing all of that on a disconnected system that doesn't need server-class hardware to run. And then finally, if the Java and C++ processors aren't enough for you, you can write custom Python processors and run those in the embedded Python handler.

So how do MiNiFi and NiFi interact with each other? We talked about site-to-site, that bi-directional communication: the control plane sending instructions from a data center or a centralized system out to the edge, and the data plane sending data back from the MiNiFi instance. But the point I really wanna make here is, if you look at the yellow and blue bar at the bottom, it's not a clean delineation. It's not black and white, use MiNiFi here and NiFi there. There are times you could put NiFi at the edge. If you're a large electronics retailer and you have stores all over the world, you probably don't wanna have your credit card readers talking to a data center in California every time they do a swipe; that transaction time is unreasonable. So you can put a regional data center, a computer in the store, have all the credit card readers in that store talk to that computer, and have that computer talk back to a data center. You're distributing to these regional data centers, and running NiFi on a laptop or a desktop is perfectly fine. MiNiFi is built for the edge, running on embedded hardware, connected vehicles, things like that. But you can also run a MiNiFi agent on a data center node. If you have a large Hadoop cluster or a large compute cluster, you might wanna collect pieces of information from each one of those nodes: tailing certain logs, getting certain information about memory usage, really building out instrumentation around that. But you don't wanna install NiFi on every single one of those machines; that's a complete waste of the software and the performance, and you're gonna impact your compute. You can instead put a six-meg C++ application on there that uses about 500K of memory when it's running and say, okay, I just need to tail these logs and send that data back over a connection. So it is a gradual spectrum; it's not a clear delineation of always NiFi for this, always MiNiFi for that.

And then, what does MiNiFi actually provide at the edge? Data tagging and provenance: extending the reach of provenance is really critical. This is a Qualcomm chipset that we used in a connected car implementation, and it has two radios, one LTE and one Wi-Fi. When you're driving this around and you wanna offload data: say I'm sampling the CAN bus a thousand times a second.
I'm getting this information: steering wheel orientation, brake temperature, engine temperature, oil pressure, all this kind of stuff. That might be really important information that I need to send back as soon as I get it. So if I'm in range of a Wi-Fi connection, I'll send it over that; if I'm not, I need to send it over the LTE radio, and I don't care that it's more expensive in both power and cost. But if I'm sampling the radio a thousand times a second: I don't know about you, I don't like every channel, but I don't need to change it more than a couple of times a second. So I don't need all that sampling going back in real time. One, I can batch it up and only send it back over Wi-Fi when I'm connected, so I'm saving time and money. The other thing is, I can drop a lot of it: say I only need a 0.1% sample rate, so I don't need a thousand readings a second, I need one a second. Making those low-latency decisions at the edge saves a lot of money and a lot of power. We have a use case from an oil company with rigs all over the world, and I think they save something like $2 million a year just by putting NiFi on an old Dell on the rig, sampling information and making some decisions there, rather than sending it all over the satellite communications all the time. So it really is impactful quickly when you can do that.

Security and regulatory compliance: if you're collecting GPS data in China, that data cannot leave the control of a computer within the Chinese geopolitical boundary. Say you're working for a large car manufacturer and you're collecting GPS data from all the vehicles. You can either take a flyer and hope that the AWS instance you're sending it to happens to be in China today, or you can have this thing at the edge making a determination: am I currently in China? If yes, either I need to send this to a specific server and make sure it's siloed, or I need to scrub this data from the response before I ever let it go out.

A quote that I really like, from Oleg Šelajev, is that the S in IoT stands for security. With a lot of these platforms, and you can walk around the conference floor and see this, there's a big question mark between "we get the data" and "you have the data somewhere in the cloud." How does that happen? We wave our hands a little bit and it just does. With MiNiFi, we have very strong controls around how that data is transmitted. Site-to-site is not a proprietary protocol; it's open source, a protocol implemented by NiFi and MiNiFi that allows for bi-directional communication, handling back pressure across those boundaries and handling provenance transmission across them too. Originally it was done with raw sockets, but people deploying in AWS didn't like opening additional firewall ports, so there's an HTTPS implementation as well. And it's secured with mutually authenticated TLS. There's a heartbeat as well when you deploy these agents, which can report the flow that's deployed, the version of the flow, the last update, and certain instrumentation data about the hardware running it. And then you can also choose to use any arbitrary protocol you want, because NiFi and MiNiFi both handle those: you can go over HTTP, you can go over MQTT, FTP, JMS, whatever you desire, whatever integrates well with what you're working with.
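As a hedged example of one of those arbitrary protocols, here's roughly what shipping a reading over MQTT with mutual TLS looks like with the paho-mqtt client. The broker, topic, certificate paths, and payload are all hypothetical:

```python
import json

import paho.mqtt.client as mqtt

client = mqtt.Client()  # paho-mqtt 1.x style; 2.x needs a CallbackAPIVersion argument
client.tls_set(
    ca_certs="ca.pem",     # verify the broker's certificate
    certfile="agent.pem",  # present a client certificate -> mutual TLS
    keyfile="agent.key",
)
client.connect("broker.example.com", 8883)  # 8883 is the conventional MQTT-over-TLS port
client.publish("vehicles/telemetry", json.dumps({"oil_pressure_psi": 42.0}))
client.disconnect()
```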
So we're getting close on time; I'm gonna run through some of these applications real quick. Healthcare applications: this one I think is really critical. Recording these sensor readings every second and saving them in a time series database; InfluxDB is what I use for this one. It's great for historical analysis and aggregation. It's great to have these morbidity and mortality reviews and say, we actually noticed over the course of three years that when patients with this condition were given this medication, it raised their heart rate by this much, or something like that. It can give you great insight over a large data set, letting you make determinations you can't make from a sample of one. The problem is, making that decision three years from now doesn't really save me when I'm in the hospital bed right now. So the combination of historical analysis and immediate alerting is really, really critical.

Encrypting content, so, your edge data protection: being able to encrypt the data as soon as it's created and have that be trustable, both in storing it locally and in transmitting it back across some boundary. You can use symmetric encryption, obviously, but you can also use asymmetric encryption and say, every deployed agent has the public key of my server. The agents can encrypt the data immediately, but it can't be recovered off of that device; it can only be decrypted once it's back in the data center. And you can have multiple keys installed; you can do this for multiple different severities or levels of sensitivity of your data and encrypt them in different ways.

So again, no live demo today, but I will do some deep dives. You guys vote: credit card reader or hospital monitor simulator? Hospital monitor, all right, so here we go. This is using a Sense HAT, which is a pretty standard Raspberry Pi add-on, and the Inky pHAT, which is an e-paper display; that's what you see here with the red and black text. It's a three-color display and it's super low power. It takes a little while to refresh the display, but then the display takes no power to stay on; you can even detach it from a power source and it'll keep the image up there for days.

So this is the flow I built in NiFi. It looks very simple right now. The three large boxes are process groups, which are just an abstraction layer around logical combinations of dataflow. Of the three components here, the one on the far left is displaying on the Inky pHAT, the one in the center is doing the Sense HAT data processing, and the one on the right is collecting the data from the Sense HAT. Reading from the Sense HAT is the first step. For anybody who's not familiar, the Sense HAT is an 8 by 8 LED display that also has temperature, pressure, and humidity sensors built in. It produces output, in this case, as simple JSON, because I wrote a small Python script to do that: reading from the sensor over I2C and then putting it out as flat JSON. And then I can actually transmit that to NiFi via site-to-site. So there are four boxes here. Really, the only one that you absolutely need is the one on the top left (sorry, can you see my pointer?), "read from sense hat." That's running a process that just calls a Python script and gets a piece of JSON data as a response. The "log attribute" and "generate test data" boxes are testing processors I put in just for building the flow, and the blue box on the bottom is what sends the data back over the site-to-site protocol to NiFi.
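The small script mentioned there isn't shown on the slide, but it's roughly this shape: read the Sense HAT's sensors and print one JSON object to stdout for the calling processor to pick up (field names here are illustrative):

```python
import json
import time

from sense_hat import SenseHat  # standard Raspberry Pi Sense HAT library

sense = SenseHat()
reading = {
    "timestamp": time.time(),
    "temperature_c": sense.get_temperature(),  # degrees Celsius
    "pressure_mb": sense.get_pressure(),       # millibars
    "humidity_pct": sense.get_humidity(),      # percent relative humidity
}
print(json.dumps(reading))  # stdout becomes the flow file content
```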
For processing this data, I'm ingesting that JSON, extracting certain elements to be flow file attributes, and then performing temperature conversion: Celsius to Fahrenheit and vice versa. And then this is my display dataflow, running on the Raspberry Pi that's connected to the Inky pHAT. When I first started, I was running both of these on a single Raspberry Pi, so it was collecting data and displaying that data at the same time. As I built out the demo to show remote communication and so on, I actually had multiple Raspberry Pis, each with one or more of those sensor and display units. So I can collect data from the back of the room and display it here, or collect data here and also display it here; you mix and match whatever features you want. In this case, it's actually down-throttling, taking a 10% sample of the data that comes through, just because refreshing that e-paper display takes so long that if it refreshed every second, you'd never actually get a display to come through. This is all the code I had to write: a pretty simple Python script that just translates "hey, display this text" into the actual calls to interact with the e-paper display. You can see what that looks like when it's hooked up, and then how I was able to put it into a Grafana dashboard and get all of my insight and executive-level speak from that. I've got just a couple seconds more. This one I decided to run on the Raspberry Pi, run on a MacBook Pro, and then run on the TV in the hotel room. I feel like if you don't want your TV to be used like that, you should lock it down a little bit more.

Finally, Cloudera Edge Flow Manager. I've got a minute left, and they paid for me to be here, so I should probably talk about it a little bit. It is an open source application as well. The command and control API that we've talked about is Apache License v2, and there's a reference implementation of a generic C2 server on our GitHub page. Cloudera also makes an application called Edge Flow Manager, which is also open source and provides a much more focused experience for building and deploying your flows across a large number of agents, splitting those up into agent classes based on the capabilities they have. So when I build this flow and I want it deployed, I don't have to say, put it on machine 172. I can say, push this flow definition out to any machine that has these capabilities, or that's in this unit of organization, this class. Heartbeat monitoring as well, which I mentioned briefly a little earlier: I can get immediate status on all the agents I have deployed, what kind of information they have, what dataflow they're running, and what they're processing at the same time. And then I can even run my machine learning models at the edge. In this case, it's a PMML file, and I'm running it on my Raspberry Pi while still collecting data back to the core systems. So I can have my model trained continuously on heavy machinery, like a large cluster in my data center, and as I get incremental threshold improvements in the performance or correctness of that model, I can deploy the new definition out to the edge and still have that model run at the most remote instances, getting very, very low latency, high quality decision making based on it.
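For a sense of what scoring that edge model involves, here's a hedged sketch using the pypmml scoring library, one option among several and not necessarily what Edge Flow Manager uses under the hood; the model file and its input fields are hypothetical:

```python
from pypmml import Model

# Model trained centrally (e.g., on the data center cluster), then shipped to the agent
model = Model.fromFile("model.pmml")

# Score a single sensor reading locally, with no round trip to the core
result = model.predict({"temperature_c": 21.5, "vibration": 0.02})
print(result)
```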
So, to summarize: what does MiNiFi provide? We know that repeatedly and intelligently routing data is difficult. MiNiFi allows non-developers to build data flows using their domain expertise, move those flows out to the edge, increase their visibility and control over the data, and open up new use cases and opportunities, based on the ease of use and not having to write those tools over and over. So I am happy to take questions. I know we're getting close to the end of our time, and if you have other obligations, I totally understand, but I will stick around outside and take questions. This QR code should take you directly to the slides you just saw, so you can download those and peruse them at your leisure. Thank you very much. No questions? Yeah, please.

We can? We don't have to. It depends on what use case you have. Do you have a specific one? Yeah. Okay. So the question's about being able to, in this case, ephemerally persist data, because you don't necessarily have network connectivity. Yes, out of the box that is completely handled. If your site-to-site connection goes down, it buffers locally. You set a threshold for how much data you want to store, and you also set a strategy for how to age out that data if you reach the threshold: I can say first in, first out; I can say prioritize these attributes; I can set up multiple queues and say kill this one first, kill that one. And that's all done through the UI when you build the flow in the NiFi experience. Yeah, that was a great question. Yes?

Yeah, so that's a great question about compatibility of versioning between multiple agents, and also against the core. NiFi and MiNiFi follow semantic versioning, and we are extremely strict about it: bug-fix releases fix bugs; minor releases do not change backward compatibility, so they are always backward compatible within a major release; and major releases do not necessarily conform to backward compatibility and can change exposed APIs. If you go and look at the code, we are very, very careful about splitting what we consider an API from what we consider an implementation. The implementations are always pluggable; you can write as many as you want and switch out which implementation you're using, but the API will stay standard and consistent. As well, if I go back a couple of slides and you look at the heartbeats, you can see here that this will actually tell you what version of the software is running in each agent instance. And you know that any version of the MiNiFi agent that's been released so far works with NiFi 1.x, so the last five years, basically. The site-to-site protocol is the same across all of those, so you don't have to worry about versioning on that. And then if I want to push a flow out and it can only go to agents that have a certain capability or a certain version, I can basically apply a filter, similar to a SQL query, and say I only want to push this out to agents that are running MiNiFi 0.6.0 and have a USB camera connected. You can see that at a glance, and you can make decisions about what you want to push out based on those features. Yeah, you got another question?

Yes, so you could definitely. The question is about whether I can pick which agents to update based on some kind of signal or flag. Yes, you can absolutely do that. And in your case, say I have this class of hardware, and I have to push a hot fix to it.
I can immediately, from the heartbeating, get back: where do I have those agents installed, and which of those are on that kind of hardware? And I can target those and say, the next time each one heartbeats, inform it that there's a new version of the flow it needs to download and run. So you can push that manually, or you can use a subscription methodology and say, next time you check in, get this new update. Yep. All right, I think we're cutting into the next person's setup time, so I appreciate everybody being here, and I will be outside if you have additional questions. Thank you.