All right, I think I'll keep us on track today. So hello everyone, I'm Michael Maxey. I go by my last name: there are 4.5 million Mikes in America if you believe Wikipedia, probably a couple in the room, so you can call me by my last name. I'm one of the LF Edge board members, so I work as part of the Linux Foundation's edge group, and I also work for ZEDEDA. Today I have the privilege of presenting one of our customer case studies. We'll be talking a little bit about Kubernetes running at the well site: some of the challenges we've seen as we've scaled with our customers, and some suggestions around those challenges. Hopefully you can leave with some ideas on how to scale in your environment. I do plan to leave some time for questions at the end, so hopefully we get some. And with that, I'm going to kick off.

If you'd asked me about a year and a half ago what the life cycle of an oil and gas field looks like, I'd have said it's probably something like this: you're out shooting at some food, oil blows out of the ground, you buy a nice big mansion with a cement pond, and you're off and running. In reality it's something more along these lines. This is a chart from, I believe, Penn State University, and it shows the life cycle of an oil well, or an oil reservoir. At the front end of that cycle is discovery.
This is when you first find the well. There's some testing done, you do some initial drilling, and then you get that first ramp up to the green bar, which is full production. The green section is really gated by the bandwidth of the downstream services. It could be the size of the pipe flowing out of the well, it could be the refinery downstream, but basically you're gated at a point, which is why it sort of flat-lines in that green section, until the well starts to deplete. Then it goes into the yellow section, where you're able to pump less out of the well at any given time.

We actually have customers that operate in both of those environments. In the green area, the reservoir management area, even though you're often gated by the downstream refinery or the pipes themselves, there is real work to be done. The way it works is that as you extract oil out of the ground, you need to replace that volume, and it's often done with methane gas. So we're working with a partner that's using edge compute to do two things in this production cycle. They have computer vision cameras looking at the burn-off on top of an oil well. If you've ever driven by an oil well and seen a flame coming out of the top of it, that's methane gas burning off. They pump methane gas into the ground to force the oil out, but if you pump too much, you have to burn it off, which is bad for the environment.
It's expensive, and so on. If you don't pump enough, the oil doesn't flow fast enough. So getting that ratio correct is really challenging. We've been working with customers that use a combination of sensors, from the well itself as well as the pipes, plus computer vision looking at that flame, to optimize the methane pressure. That's one way you can keep that green line at the top and continue to push oil through the system.

A second example, which I'm going to spend a little more time on today, is in the decline part of the well life cycle, the yellow bar you see on this slide. There are a number of services companies that have really optimized this process: how do you extend that yellow bar, how do you prolong the time you can extract? They do a lot of custom work in the well. In this particular example, the service provider drives a truck out to the well. The well sites are wherever they may be, remote Texas in a lot of cases. They drop a probe down into the well itself, a roughly six-foot metal probe that does seismic, radioactive, and all sorts of other sampling and testing, to pull data out of that particular well so they can look at the shape of the ground underneath and all the details around it. Then they process that and give the operator best practices. It could be drilling at a different spot, it could be increasing pressure; there are a lot of different techniques you can use to extend the life of that yellow bar. But they're doing that in real time on the trucks.

The technology stack looks something like this. They're running an edge server. Think of it as something that looks like a modem in your house, but fundamentally, in this example, it's a pretty high-powered Xeon box with GPUs in it, and on top of that
they're running a couple of software stacks. At the bottom of that stack is EVE; I'll talk a little bit about EVE in a minute, it's an open-source project. On top of that they have two main applications running behind a firewall. The first application is that legacy probe. I talked about the six-foot probe they put down into the well; it operates off a very old version of Windows, and it's not something they're containerizing. That's something we see pretty commonly in edge environments: a lot of legacy, not everything's been containerized. I think they're on a path to containerization, but that virtual machine running Windows, written by some guy in the 90s, that everyone's afraid to touch, is real. It's out there, and you have to run it alongside everything else. This is a pretty good example of that. The probe streams about a terabyte of data per day into a Kubernetes cluster, a k3s cluster, where they're running some custom algorithms and custom data transformations to produce insight for the operators.
So all of that's done locally on these trucks that roll out to the oil wells. Historically they've done this offline: they would bring the data back to the depot, upload it to the cloud or to the private data center, do the analytics there, and then when the truck rolled out again, roughly a month later, they could do the optimizations. Being able to do it on the truck at the well site has brought them a ton of value and really allows them to move faster by running this mixed workload inside the truck. We see a lot of firewalls, commercial firewalls running on edge devices, as well as SD-WAN. People generally want to infer on data, reduce data, something to that effect, and then push it back to a cloud instance or a private data center. Being able to manage all of those things at scale is a pretty common use case for an edge compute platform like this.

I mentioned EVE. EVE stands for Edge Virtualization Engine. It's an open-source project, it lives in the LF Edge foundation, and it's Apache licensed. It is a bare-metal operating system really designed for edge workloads, so it runs on a really light footprint: think about half a gig of memory and one virtual CPU. And it has a bunch of characteristics really designed for these types of workloads.
The first is that it runs a couple of partitions, so when you do upgrades, or when you change anything on the stack, you can always fail back to a last known-good state. That's very important when your device is in the middle of nowhere and you don't want to send an IT person out there. It also includes an embedded hypervisor, so you can run all types of workloads, including Kubernetes clusters. And it has a really good security story for devices outside the data center.

Let's talk a little bit about some of the challenges around scaling Kubernetes in these types of environments. In general, this customer has close to a thousand of these trucks. As I mentioned, they roll out in pairs to different oil wells, so they travel around quite a bit. So they're running that cluster, the stack I showed you two slides ago, at scale across a bunch of different trucks rolling out to remote environments. As we've done projects with this customer, as well as some others, we've found six or more challenges that you need to think about and address as you're scaling out to these types of locations with these types of small clusters. We'll talk a little bit about each of them, and some suggestions around each, as we go through this.

The first one is distributions and features. Often we're seeing one-node deployments, running Kubernetes on a single node. We also support clustered nodes, two and three as well, but this particular customer runs on a single node. And as such, you can't run the entire CNCF ecosystem, right? Auto-scaling means less when you're running on a single server. Fortunately, there's been a lot of work done around smaller distributions. We heard Red Hat this morning talking about MicroShift; there's k3s, minikube, k0s.
I'm sure there are many I'm missing, but there are a number of distributions you can leverage, which will really help with this. The thing I would recommend is more than just choosing the right distribution: you have to think about the features and services that come with it. If your application relies on a particular service, say you're running Istio, as we saw in the previous example, you want to make sure that distribution either has it or has the capability to add it. So more than just selecting the right distribution, you have to think about your workload and your application, and whether it's actually going to run well in these environments. But there's a lot of flexibility and scalability across these mini distributions, and a bunch of flexibility to choose among them.

The second challenge has been mentioned by, I think, every presenter so far, which is that networks suck, especially at the edge. So you need to think about how you administer that cluster, right? How do you deal with updates? And we'll talk in a minute about people costs. Certainly you can send people to these sites, but it's usually a helicopter ride and pretty expensive. So you need to be able to operate in a model that really supports these distributed networks. Beyond being unreliable, they're often not exposed to the public internet. These companies don't want to put their assets on the internet, right? So you have to deal with egress proxies, or VPN connections, or have some way to get into these devices, through either standard networking or a system built for it. The technique we've seen work well is an eventual consistency model: rather than a command-and-control architecture, where you're trying to log into a node that doesn't have a network connection, you have the node be more autonomous. It pulls down configuration and operates on its own. So that's a pattern.
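A minimal sketch of that pull-based, eventually-consistent pattern: the node owns the loop, the controller only publishes desired state, and a dead network just means the node keeps running what it has and retries later. All the names here are illustrative, not EVE's actual API.

```python
import time

def reconcile_once(fetch_desired, current, apply_fn):
    """Pull desired state and apply it only if it differs from what's running.

    fetch_desired may raise (network down); in that case the node simply
    stays autonomous on its current config and tries again next tick.
    """
    try:
        desired = fetch_desired()
    except OSError:
        return current          # offline: keep running, retry on next tick
    if desired != current:
        apply_fn(desired)       # e.g. roll workloads to the new config
        return desired
    return current

def run_agent(fetch_desired, apply_fn, ticks, interval_s=0.0):
    """The node-side loop: periodically converge toward published state."""
    state = None
    for _ in range(ticks):
        state = reconcile_once(fetch_desired, state, apply_fn)
        time.sleep(interval_s)
    return state
```

The point of the shape is that nobody ever has to log in to the node; the controller can be unreachable for hours and the node converges whenever connectivity returns.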
We've seen it implemented as part of EVE, but it's also a pattern you've seen in networking and lots of other verticals with troubled connectivity, and it's a pretty well-proven way to operate your system. The alternatives are, you know, you bake the system and leave it. You plan truck rolls, as they're called in this industry, which is when you send an IT person out once a quarter or once every six months to update your software. That's an approach, it works, and you can do the same with Kubernetes. We've also seen local orchestrators, beating the network by being on the network, but this also requires someone on site who can operate that orchestration and push new workloads, so it sometimes bounces up against headcount and people costs. There are techniques, it is possible, but just keep in mind that you're not necessarily going to be able to kubectl into all of these clusters whenever you want, so you have to think about networking.

Third is security threat vectors. These devices are often in the wild. They're running in retail stores, they're running in car dealerships, at oil wells. They go missing; people plug USB sticks into them. So beyond just the Kubernetes cluster and network access to the cluster, you have to think about the physical device. Are you turning off all the ports?
Are you making sure your OS boots first? If someone boots ahead of you and hits the TPM, bad things can happen. So you want to think about threat vectors beyond the data center. You're taking these out to where you no longer have armed guards, those sorts of things. So you need to establish the device as your device using a hardware root of trust. You need to shut off all the ports, shut off all the connectivity. And then on top of that, you want to do the stuff you do in the data center as well: you're going to want to run an advanced firewall, potentially AI to monitor the workload. Is it changing? Is somebody hacking into your runtime? So there is a bit of a security expansion you need to think about, but you can also bring a bunch of the techniques you're already using in your data center or cloud into that environment. Just some more areas to think about: theft of devices, physical access, turning off ports and connectivity, all those sorts of things are really important when you're out on the edge.

Fourth is interoperability and hardware diversity. There's lots of different stuff out there, everything from Raspberry Pis up to pretty powerful servers running in an explosion-proof box. Diversity also includes networks.
You're going to need to support LTE, maybe fail over to satellite, depending on your connectivity. So having a distribution and operating system that can cover all of that diversity is super important. It also allows the end users to select the right hardware. Cost is a component, and hardware is always a good place to optimize, as long as your operating system and cluster can run on top of it. And then in terms of software diversity: edge continues to grow, it continues to be an exploding market, but it's super bespoke. Everybody has their favorite stack, everybody has their favorite database or data technology, so being able to handle a lot of interoperability is a really important piece of the solution as well. Think about what you want to do today, how that's going to evolve over time, and what new potential applications are going to come on top of that. These devices live roughly ten years in the field. You deploy this piece of hardware, it's going to be there for ten years, so you're going to want to think about your software life cycle beyond twelve months, or beyond the standard two to three years. You really want to think about a long life cycle of diversity, which can also lead to over-provisioning: we've seen customers buy a much bigger system than they probably need today, knowing that ten years from now it's still going to be out there and still going to need to operate.

The fifth challenge is people. People are always the worst part of a system; most of the mistakes happen there. But beyond that, operating Kubernetes and having that expertise, it's a sought-after skill, right? People want to hire these people, they're expensive, and they're especially expensive when you ask them to go to remote Texas. Not everybody wants to live in remote Texas or in the Middle East. So people costs can be huge, and anything you can do to prevent a truck roll, to prevent a human from
having to go to that edge site, can save you a lot of money. The techniques you've seen: things like measured boot, which is very similar to secure boot, except measured boot lets you act on the device if something bad shows up rather than bricking it; canary upgrades; backup and recovery; the things you'd expect. Anything you can do to keep people from having to go to the edge site, because the number one cost in edge computing is really the people side of this. There are a lot of techniques around that, and Kubernetes obviously has a great tool set to help with it as well. As you move up the stack into Kubernetes, these techniques are pretty well known and pretty well managed at this point, and that's why Kubernetes is a good fit.

And then my last challenge is probably the hardest. Kubernetes is pretty well designed to support a single cluster with a thousand nodes or more, but when you have thousands of clusters with one or two nodes each, it's a very different management paradigm. I built this chart to help guide folks on which direction to run. I think to some extent it depends on how deep into Kubernetes you are, how you're using it, and what the skill set is in your shop.
If you have good expertise in your shop and you're using some core features of Kubernetes, then a small k3s instance on the edge probably makes sense, and it scales just fine up to a certain point where the orchestrators start to struggle. That can be as small as less than a thousand sites, where these orchestrators start to have issues. So there is a pure Kubernetes approach that works really well if you have the right team inside, you need the right features, and you have the right scale.

The other approach, the green area on this chart, is leveraging Kubernetes, maybe for the cloud side, but picking up a Helm chart, or some other output that comes out of your Kubernetes workflow, and applying it to a different edge engine. We heard Red Hat this morning talk about using Docker or Podman, for example, as a way to deploy. There are commercial solutions out there, Avassa for example; IBM has an open-source solution in Open Horizon. There are a number of edge-native container orchestration systems that can consume a Helm chart or consume a Kubernetes API, but actually deploy on something other than Kubernetes. So if you're running a handful of containers and you want to go to 10,000 sites or 50,000 sites, k3s might not be the right solution for you.
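To make that second approach concrete: the artifact coming out of a Kubernetes workflow can just be a plain pod manifest (written by hand or rendered with `helm template`), and an edge engine that understands Kubernetes YAML can run it without a cluster. A minimal, hypothetical example; the names and image are illustrative only:

```yaml
# pod.yaml -- a Kubernetes pod spec, deployable without Kubernetes.
apiVersion: v1
kind: Pod
metadata:
  name: well-analytics
spec:
  containers:
    - name: transform
      image: registry.example.com/well/transform:1.0   # placeholder image
      resources:
        limits:
          memory: "512Mi"
```

Podman, for instance, can consume this directly with `podman play kube pod.yaml`, so the output of your Kubernetes tooling remains the unit of deployment even when the edge runtime isn't a cluster.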
You might want to do a hybrid solution. And if you fall into the gray area on the chart, where you are deep into Kubernetes but you have sites in the many thousands, that's an area where there's a lot of work going on, but it's not well solved. That's an area where we just say: let's talk about it, let's see what you really need to do there. I think this is an area where, as a community, Kubernetes is going to start to evolve and really start to bring some value: how you deal with hundreds or even thousands of really small clusters. It's a very different workload. These things flap in and out as networks go up and down, and orchestrating that is an order of magnitude different from what these systems were built for. So it's kind of bleeding edge. We do have customers pushing tens of thousands of single-node clusters. They're sharding their Rancher instances so that they can start to hit that scale, but even that will break at some point. So this is an area where I think, as an industry, we can come together and start to think about how we really solve for true Kubernetes runtime scaling beyond, you know, one to two thousand devices.

Hopefully those helped, some considerations to think through. There's a feedback QR code; I would love some feedback on this presentation. I also have time for questions, about five minutes, so would love some questions if there are any in the audience. Harry, I know you always have questions.
Yes. So I believe the question was: have we seen anyone try to use namespaces across clusters to cover the distributed nature of it? Is that fair? Try to treat all their clusters as one big cluster? No, we haven't, primarily because connectivity is challenging around that. When nodes drop in and out, it's difficult at scale when you're running a couple thousand of them. So we have seen an uber-orchestrator that manages lots of individual clusters, but not one gigantic, say, three-thousand-node cluster across two thousand locations. That's not something we've seen people be successful with. Good question though, thank you.

All right. Well, I appreciate everybody's time. Thank you.