Hi, everyone. I'm David Aronchick, and I'm here to talk about Bacalhau, a new platform to unleash truly universal compute. I'm CEO and co-founder of the company behind the project. The project has been out for about a year and a half, and we've seen tremendous uptake. I'm going to tell you a little bit about our approach and how we got underway.

The first thing that's interesting to talk about is what we mean when we say universal compute. As an initial stab at a definition: a global platform designed to run jobs close to the demand. Today you often have a single zone where you run all of your compute, and you move all the data into that one location. But that's not where the data is. The truth is that we see data, and the demand for it, all over the place. By 2025, IDC projects there will be 175 zettabytes of data in the world, and it will be everywhere, growing rapidly, with more than 57% of that data growing outside of core data centers. Second, it's going to be slow to move: even the fastest, most reliable networks in the world will be much slower than your data growth. And third, even if you could move it, nearly 100% of it is going to be under some form of governance, whether that's regulatory, competitive, confidentiality, or something else. As a result, you see 68% of all data going unused, according to Gartner. That's a lot, probably too much.

So the question is: how do you address it? One thesis is, hey, what if you didn't have to move that data at all? What if you could instead go near the data? Today there are three primary approaches. One, you can use a centralized system. Two, you can build and manage it yourself. And three, you can do nothing.
Let's start with the centralized-system approach. You see centralized systems like this all the time, and they're great; we are not trying to replace any of them. These are incredibly popular, widely adopted platforms. You might deploy one as a central API server, and it will sit on top of its own compute fabric with its own storage. Your data scientist comes along and says, okay, I'd like to run some analysis against this. She hands her job to the central API server; the server goes to the storage it has under its control, gets her her response, and she's happy. Good stuff so far. The problem is that as you go forward, maybe you have things in various locations: not in your on-prem data center, but in a cloud somewhere, or on multiple clouds, or in other places like ships, planes, and cars. Now when she hands her job to that central API server, the server doesn't know what to do, because fundamentally these platforms are not designed to go across network boundaries. The moment you leave that single central zone, the central API and services are not going to perform the way you expect.

Okay, so our team recognizes that this is a problem, and now they're going to build it themselves. This is a very common pattern. People say, well, how hard could it be? It's just an orchestration system; I'll do it with Bash, SSH, and Go. There are many, many platforms out there designed to make it very easy to deploy. In this case, our data scientist says she'd now like to run this outside our core data center, and the system is great: it does exactly what is expected, and it understands how to bridge over to that other platform.
It can bridge that network boundary, as we mentioned before, and it can do it reliably. These networks are not reliable, just due to the nature of networking, but you can build the system so that it retries and so on. Which is awesome; she's very happy. Unfortunately, the first thing she says after that is: I'd really like to add a whole bunch of new features. And these features aren't particularly exotic: scheduling against data locality, scheduling onto green power, scheduling by geography or other label information. They sound straightforward, but they are very complicated to build. And then the developer maintaining this home-grown platform would like to go on vacation, and because you're maintaining the platform yourselves, you're now up a creek.

The third thing people often do with decentralized platforms and distributed data is nothing at all. This is far more common than you would imagine. For example, here's a real use case we ran into. A customer had video cameras set up all over the place for routine monitoring of important situations, and a data center where they wanted to do a variety of processing. If they were to push all that data into a single data center, that would be more than 4.4 petabits a day, roughly 50 gigabits per second sustained, which would completely overwhelm their ingest and make it basically impossible to process. So instead, they set up local people to just watch the cameras on a monitor: someone watching 24 hours a day for each of a variety of clusters, maybe one near a police station, one near a hospital, one near a prison, and so on.
They watched the cameras as the data came in, and then they just threw the data away, because there was no reasonable mechanism to process or store it at an acceptable cost.

Okay, so our solution: build an open source compute-over-data platform, one where you can build these kinds of systems in a scalable way. We're at Open Source Summit, so we're going to talk about doing this in open source. The way to do it is to use an open source platform that everyone can contribute to. It is designed to be open and extensible, so you can extend it with the binaries and applications you'd like. It's built for multi-cloud and for non-data-center situations. And because it is maintained by the community in the open, you can trust that it's going to be there even when your primary developer goes on vacation. Now, instead of doing nothing, you can take that same data, put it into a distributed compute architecture, and move only a subset of the information back, thereby processing it and making it actually useful.

That was our question when we got started: could we build a distributed computation system, built for the distributed world, to take on the challenges you just saw? We were in Portugal when we came up with the idea, and we kept typing the phrase "compute over data", which we abbreviated as COD. Cod is the fish, which is how we came up with the name Bacalhau: Portuguese for cod. The key scenarios we're trying to tackle are the ones you see right here. One, transforming your data before you move it. Two, executing over poor networking. And three, using data in isolation.

So, transforming the data before you move it. Before Bacalhau, you might have a node out there in the world.
Each node is producing a variety of different types of data: logs, video, imagery, text, you name it. You want to do your processing in a central orchestration system, but first you would have to move the data, and in moving it you're likely to run into one of these challenges. These are just the absolutely standard challenges that come from moving data at all. And of course it's not just from one machine; it's often from many, many machines, or a subset of them. Doing any form of computation against that data before you move it often requires quite a bit of work.

This is where Bacalhau comes in. With Bacalhau, you run the agent on every machine in the cluster. You say, okay, I'd like to run against just this subset of machines, and I'd like to do a transformation, whatever that might be: filter, snip, downscale, downsample, whatever you'd like, and then move only the results from those machines into your central cluster. You still use the central API server; this isn't a rip-and-replace for the value those central clusters provide, but an augmentation that makes your overall processing much more efficient. So Bacalhau reduces the challenges of transforming your data before you move it. And nearly all data needs to be transformed somewhere: it could be the wrong schema, the wrong file type, too big, too small, you name it. Any kind of transformation, we can help.

Second is executing over unreliable networking. You saw a little bit of that before, but let's say you have a global deployment: maybe things operating cross-cloud, maybe things operating on the edge, planes, trains, automobiles, you name it. And today you can deploy to those places, of course.
But when those things appear and disappear, as they will, because that is the very nature of network partitions, your system has to be much more complicated in order to queue and process the work. Bacalhau takes care of that. It is built to handle network partitioning: only when those nodes appear will the platform deploy and manage the work, and we take care of the hard bits of ensuring that the job runs and processes exactly as you described it. So Bacalhau improves executing jobs over unreliable networks.

Third is using data in isolation. Before these kinds of isolated systems, you might have a deeply interconnected network like this. But there may be many reasons why these systems can't talk to each other, often regulatory or confidentiality concerns. Maybe you don't want org A and org B to see each other, because that might violate the law in your particular geography. And worse, let's say you have a center of excellence around something like ML: because it sits outside those organizations, it isn't allowed to communicate with either of them. This is where Bacalhau comes in. With Bacalhau, you deploy an independent, isolated network to each location. Your center of excellence produces a model, but all the processing takes place within the organization itself, and you have a committed audit log showing that the data from those organizations never moved, keeping you within compliance. That is to say, Bacalhau processes the jobs in place, without moving the data and without leaking information.

Okay, that's a lot of talking. Let me get to a quick demo. Let me make this a bit bigger. Can everyone see that okay? Good.
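The common thread in those three scenarios is the same move: ship the computation to where the data lives, return only the derived result, and keep a record showing the raw data never left. Here is a minimal conceptual sketch of that pattern in plain Python. This is not Bacalhau's actual API; every function and field name here is invented for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone

def run_in_place(records, transform):
    """Run `transform` where the data lives; only the derived result
    and an audit entry cross the boundary, never the raw records."""
    result = transform(records)
    audit = {
        "ran_at": datetime.now(timezone.utc).isoformat(),
        "input_count": len(records),
        # A hash of the inputs proves which data was processed
        # without revealing or moving the data itself.
        "input_digest": hashlib.sha256(
            json.dumps(records, sort_keys=True).encode()
        ).hexdigest(),
    }
    return result, audit

# Example: only the count of error entries leaves the node.
logs = [{"level": "error"}, {"level": "info"}, {"level": "error"}]
summary, audit = run_in_place(
    logs, lambda rs: {"errors": sum(1 for r in rs if r["level"] == "error")}
)
print(summary)  # {'errors': 2}
```

The point of the sketch is the shape of the result: a small summary plus a verifiable audit record, instead of the data itself.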
Okay, so to start, I have a bunch of machines set up on Google Cloud; let me pull up the compute instances list. In fact, just to raise the stakes, I have one VM set up for each zone that Google Cloud offers. We work cross-cloud, across AWS, across everything; I just chose a single cloud to make it easier. You can see us running here in every single zone, top to bottom.

On each machine, I have a simple process creating a fake log. If I SSH into one of these, I can cat the logs-to-process file. Actually, I don't remember the exact file name, so let me grab it. Okay, here we are. We're creating a bunch of logs; pretty straightforward stuff. Now let's look inside one, just so you see what's in it. There we go. This is an example you've probably run into many, many times: a JSON-formatted log, where each entry has a timestamp, an ID, a version, and some logging information, just to show you what's going on.

In addition to this, I've set up a Bacalhau agent on every single machine. Setting it up is incredibly straightforward. It's a single command, bacalhau serve, and that starts your first node. Then all you need to do is have every other node connect to that first node. That's it; presto. There's nothing complicated. I love Kubernetes to death, but there's no etcd here, no special networking, no anything like that. The nodes just talk to each other, very, very straightforward. And just to show you that I have in fact set it up across every one of these machines: right here (this doesn't look very good because it's all spread out).
But you can see that every single one of the machines has it running. And just to show you, I'm going to say: list, grep region. There we are; it comes back with 105 nodes. So now I have a single cluster of 105 different nodes, each of which is talking to the others.

First, let's just show you how easy it is to run a single job. I'm going to say: bacalhau docker run. (We support Docker, WASM, and arbitrary binaries right out of the box.) I'm going to run this on Ubuntu and just echo hello. What it's doing right now is going out to the cluster, finding a node that matches, and running on top of it. And after it runs, I'll describe the entire job, just so you see what it spits out. Pretty straightforward stuff: here's how it communicated back and forth, it picked one of the nodes, and there you can see it said hello. Now, the truth is, it just picked an arbitrary node at random, because I didn't say which one I wanted to run on. This time, let's say I want it to run on this last one. Oops, that's not the right window. All right, I guess I closed it. That's weird. See, live demo, as promised: it's real. And all this code is out there in open source, so you can go check it out yourself. I just need to make sure I'm pointing at the right cluster (you wouldn't normally need to do this). Now let's make sure we can still list our nodes. There we are, great. So now I'm going to do the exact same thing I did before, except I'm going to use a selector: I'm picking a label name and value to target. And I'm going to do the same thing.
Echo hello, except targeted at this one node. And there you go. It's going through, finding that selector across the entire network, and it did that in 2.9 seconds. Again, just to show you there's nothing up my sleeve: here you can see it ran, and behind the scenes it used a selector that said region equals the region I specified. Pretty straightforward stuff.

All right, that's easy to do; I could do that with a random SSH command. How about I run on every single node simultaneously? Presto. Now I'm targeting every single node in the entire cluster. It's creating the job and running it, and in just a few seconds it's going to go across every single node and run that same command. It takes a few seconds longer, because it's communicating across the network as a whole and seeking nodes out without the central control plane you'd normally have. It's taking a little longer than I expected... there we go, we're done. And like I said, it does it without a central control plane. You can see it produced many, many results, each one from one of those individual nodes. And if I grep for the job ID, nothing up my sleeve, this should come back 106: 105 nodes plus a header line. There you go. That just ran, in 26 seconds, in every single zone across all of Google Cloud.

Okay, those were just little toy examples; let's get to the good stuff. What does it look like when we actually start processing those logs? You don't have to do everything from the command line here; let me show you real quick. You can see that I'm going to use that same selector for a zone right there.
And in this case, I'm now going to run an arbitrary query against that log file you saw earlier. I'm looking for just the security entries, and I'm using DuckDB to do it. So, pretty complicated: I have containerized DuckDB, I'm going to push it down to that node, run it over the data on that node, and finish it out. And all I need to do is this. You don't have to do things on the command line; you can use the create command with a job spec. This one takes a little bit of time, because it does have to download the container, but there you go: 4.9 seconds and off it went. And if I go and look at the output, you'll see it has selected just the security elements out of that file. So where previously you had all sorts of different data in there, now I'm downscaling and downsampling that log file on the fly.

So far so good; that's pretty cool. What would it look like to run that same command against everything? That's what we'll do here: the exact same command, except now we're going to target every single node in the entire cluster (this is called targeting mode) and do the same thing. This one does take a while, because it has to do a fairly big join across all these various nodes and then run from there. Behind the scenes, like I said, it is communicating peer to peer with every single node in the network, and it's downloading a container with DuckDB that I pre-built. (We support Docker, WASM, or any arbitrary binary that's already on the machine.) And everything is allow-listed: the directories it's able to read, the binaries it's able to execute, the type of networking it supports. You set those criteria when you start up the servers, and it's very straightforward to do.
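Conceptually, the per-node filter that the DuckDB job performs (keep only the security-related entries out of a mixed JSON log, so only that subset ever moves) can be sketched in plain stdlib Python. The field names here are assumptions based on the log shown earlier, not the exact demo schema:

```python
import json

# A few JSON-lines log entries of the kind generated on each node.
# The fields (id, timestamp, category, message) are illustrative.
raw_log = """\
{"id": 1, "timestamp": "2023-11-01T10:00:00Z", "category": "security", "message": "login failed"}
{"id": 2, "timestamp": "2023-11-01T10:00:01Z", "category": "app", "message": "request served"}
{"id": 3, "timestamp": "2023-11-01T10:00:02Z", "category": "security", "message": "token expired"}
"""

def security_only(lines):
    """Downsample a log stream to just its security entries, the same
    shape of filter the containerized DuckDB query applies per node."""
    for line in lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry.get("category") == "security":
            yield entry

filtered = list(security_only(raw_log.splitlines()))
print(len(filtered))  # 2: only the security entries survive; the rest never move
```

In the real demo this filter runs as a SQL query inside a DuckDB container on each node; the sketch just shows what crosses the wire afterward.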
And then it runs DuckDB inside the container and produces the output. So now let's look at the results of the job. You can see it has many, many results, and again, nothing up my sleeve: we'll grep for the job ID and see that there are 106. Presto. It's 106 because of that header line; there are only 105 nodes.

So far so good. And behind the scenes, there's more than just the network: each node also has the ability to upload. In this command in particular, each node, as it produces its result, is also uploading it to a specific bucket. You can see that when I just ran it, a file was uploaded into that bucket, and I'm going to read the file we just uploaded. Presto. What you see is all the security data that came out of that file. So again, very straightforward: I was able to reach out across all of these individual nodes and process the data. And one more thing that's possible here: in addition to each node uploading to its own bucket, every single node also uploaded to a single global bucket, so you can archive it all together.

So now we have truly global compute that spans every single zone Google Cloud offers. And again, it could be anywhere: AWS, on-prem, IoT, it doesn't matter. As long as the nodes can see each other, you can run these sorts of processes. Makes sense? Any questions? Yes: when you say it basically doesn't use a control plane, is the idea that there is an agent running on every node?
And when you say without centralized control, who's issuing the commands? Is there some node querying everybody else, like an ad hoc coordinator or execution node?

So the way it works is that the job actually bids out to the entire network. For example, you saw a job say: here's a label, us-west-2. Each node decides for itself, I am a fit for this, or I am not. The first node that wins the bid responds and says, hey everyone, I've got this, and every other node says, oh, that node beat me to it, so I'll back off. If there are multiple nodes that fit, and the job allows it, then the system will let multiple nodes run it. But each node is responsible for itself: instead of a central order book, the decision is made across the entire network as a whole. Now, that said, this is one of the reasons we are explicitly slower than some of those centralized control planes. And that's a good thing, because we are far more resilient to the network. For example, you could submit a job to the network where the node you want isn't even there, and it will sit in the queue according to what you specify. If you say, I want this job out there, and I want it to wait for 10 minutes: if a matching node shows up during those 10 minutes, great, it runs; if not, it reports back that it timed out. With a central control plane (again, not bad, just different), the moment you submit it, if that node isn't there, it's going to tell you there's nothing that matches. Does that make sense? It's just a different way of approaching the problem. Any other questions?

We have been moving extremely fast. We announced 1.1 in August, and we're going to announce 1.2 in December, in just two weeks. And our goal is to really listen to the community. This is an open source project.
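The bidding model described in that answer (the job goes out to the whole network, each node decides for itself whether its labels fit, and matching nodes bid) can be sketched as a toy in a few lines. This is a deliberate simplification, not Bacalhau's actual protocol; the class and function names are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    labels: dict = field(default_factory=dict)

    def bid(self, selector: dict) -> bool:
        # Each node decides for itself whether it matches the
        # selector; there is no central order book consulted.
        return all(self.labels.get(k) == v for k, v in selector.items())

def award(nodes, selector, count=1):
    """Collect bids from the network and award up to `count` of them."""
    bidders = [n for n in nodes if n.bid(selector)]
    return bidders[:count]

cluster = [
    Node("vm-1", {"region": "us-west1"}),
    Node("vm-2", {"region": "europe-west2"}),
    Node("vm-3", {"region": "us-west1"}),
]

# Target a single matching node, the way the selector demo did.
print([n.name for n in award(cluster, {"region": "us-west1"})])     # ['vm-1']
# Or allow multiple matching nodes to win, as in targeting mode.
print([n.name for n in award(cluster, {"region": "us-west1"}, 2)])  # ['vm-1', 'vm-3']
```

The real system does this over the network with timeouts and queuing, which is why a non-matching selector can simply wait in the queue until a matching node appears.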
Bacalhau is open source, and we would love to have you come and contribute. I am the CEO of Expanso, a commercial company backing the open source project, but I also have deep experience in Kubernetes and Kubeflow and so on. We deeply believe in open source, and we want to make it possible for all of you to add and contribute whatever you think, so that it meets your needs. Some of the things we've heard already: operational traces, so you'll be able to trace inside a job as well as the overall job; queuing, like we talked about, where you say, I'd like these jobs to run, but only when the spot price drops to 31 cents, or only when this particular region is green, which happens over the course of the day; Kubernetes CRDs; private data and secrets; external credentials; data catalogs; and so on. Or better: you tell us. We have a community roadmap, bi-weekly community calls and workshops, and monthly newsletters; lots of stuff coming. We have a blog we publish on very regularly, and all of these things are out there, plus a whole bunch of use cases about how to save money on logging, how to run edge ML, and things like that, all through our platform. This is exactly what we're designed for. And with that, I'm all done. Any more questions?

Yeah. I'm wondering about data integrity: what if I did something wrong with the command and you've already distributed the job to different nodes?

You can cancel jobs after you've submitted them. It will be slower, so you need to be thoughtful about it, because the cancellation is effectively a new job. It's not a new container, but you have to distribute the cancellation command in the exact same way you distributed the job in the first place. But yeah, absolutely.
You can absolutely cancel a job once it gets going. Does that answer your question?

Yes. And also: what if I make the command a batch of several Bacalhau commands, maybe five commands at once, and they're already submitted? The job is going to execute sequentially, and I'd like to intervene in the middle to cancel. Can you do that?

So you bring up a really interesting point; let me see if I can restate it. You're asking about chaining jobs together, and workflows and things like that. One thing to be clear about: we don't offer a workflow engine. We have thought about it a lot, and we will have some ways of going down that road, but right now we recommend doing workflows with an external platform. We integrate with platforms like Airflow and Flyte and so on. What you do there is build a series of workflow steps in those external engines, where each step has a unique Bacalhau job. So you say: okay, I'm at step one, go find this data on the network and run on, say, 10 nodes. Off it goes, and once those 10 are done, the engine picks up the next job. And if you say, it's a five-step process, I'm at step three, and I don't want to go on to step four, you tell that to the workflow engine: hey, whatever you're doing, stop and cancel everything you have. That would be the workflow engine's job. Like I said, we are thinking about chaining jobs together natively, so please give us your feedback about exactly what you'd like to do.
We do see that a lot of the time people want to run a series of jobs next to each other on the same node, and in that case you might have a little mini-workflow. But for a centralized workflow engine, that's where we want to plug into other platforms, generally speaking. Any other questions? Okay, well, thank you so much.