So this talk is about Serverless 2.0, using CloudState.io and stateful functions with Python. Yeah, really interesting. Yeah, I'd like to talk about, oh, should I start? Yeah, you can start, but I don't know if you are sharing your screen, that's all. All right, cool. So yeah, I'd like to talk today about CloudState.io. It's an open-source effort around what we like to call Serverless 2.0, which is actually stateful serverless versus the stateless serverless we've been used to with things like AWS Lambda, function as a service. I'm Sean Walsh, I'm Field CTO and Cloud Evangelist at Lightbend, which is behind this effort. So Berkeley recently made a prediction that serverless computing is going to dominate the future of cloud, and we agree. So why Serverless 2.0? Why the next iteration? Function as a service was a great start. It gave us the mechanisms, a way of thinking around creating these components that we can begin to manage, and it took away the operational difficulties on behalf of developers, but it was only the first step and we need to iterate. Function as a service is not equal to serverless. Serverless can be much more. We need to be able to allow coarse-grained, what we call general-purpose applications to exist in serverless. So not exactly what you would call a little fine-grained function, but maybe an entire application might be able to be deployed to a serverless platform. So function as a service, to revisit: great for embarrassingly parallel processing, orchestration, stateless applications, job scheduling, things like that. Especially things that are very low impact on the database, quickly being able to retrieve data, make a decision, and write data back. What it's bad at is being reasoned about as a holistic application, or making any sort of serverless-platform guarantees around two reactive tenets. One is called responsiveness, and the other one's called resilience.
You need to be able to assume that these characteristics exist to be able to have any kind of a serverless platform. And again, general-purpose applications. So function as a service gave us the abstraction of communication, and it works great as long as everything is fast-flowing and smooth and any given function isn't trying to do too much. So a message comes in, the function is hosted somewhere, it does some thinking, and then a message comes out, simple as that. And the operational concerns are handled for us. It's the first step of being ops-less. So here's the beginning of the problem. Message in, the function now is doing something in the middle: it's reading from a database, maybe more than one database, maybe it's doing joins, and then a message goes out. The big problem here is that that database interaction is a really big black box. We have no idea what's going on. There are no guidelines to manage it. That means if you're equating one function to another, you really can't do it, because they do very different things. There's no systematic way to reason about what each one is doing. The function is a black box. What is missing here is state. So far, when we talk about stateless applications, they really are stateful, but that state exists in your database. It's a little bit unnatural, because real-world things like us and our cars and our phones have a current state. They're not separate from their state. I think that's a problematic concept from the beginning, but something we're very used to as developers. So Serverless 2.0: what we propose is that real-time database access has to be removed, to allow this sort of autonomy and reliability of our functions and to be able to reason about them in a way that's uniform.
We can't make these guarantees if we're passing an entire dataset to a function saying, hey, here's the whole dataset, do what you need to do, because we're trying to create an abstraction; nor can we allow unbridled reads from within those functions, as can exist in function as a service. So function as a service again abstracts over communication: the message comes in, the function does some thinking, reads some data, and the message comes out. Stateful serverless, we do the same sort of thing. The message comes in from a user or another system, the function does some thinking, and the message comes out, but also we are sending state in at some point in time, probably at initialization time. We're sending state in, and the user function is then holding its individual state on behalf of whatever domain it's serving. It's able to make the decision without having to talk to a database, and then when it makes a decision, the new state goes out somewhere, because the function will need to be re-instantiated at a future point in time, or it might have to be re-instantiated because it's on an unhealthy node and has to come up somewhere else. So we've really just introduced this concept of state, but that's not quite enough. Again, we can't pass the entire dataset in as part of this flow; we have to figure something out. Enter CloudState. So CloudState is this, and I'm going to read this verbatim: CloudState is a distributed, clustered, and stateful cloud runtime providing a zero-ops experience with polyglot client support. What we'd like to say is, essentially, Serverless 2.0. It's open source, best of breed, harnessing all the power of open-source technologies while removing as much of the complexity as possible from things like Kubernetes and whatever database you're going to be using, be it Spanner, be it SQL or NoSQL. We really just lift it up to make it so developers don't need to think about the ins and outs of all these things. You leave it to the platform.
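That flow — state handed in at initialization, commands answered from memory, new state handed back out — can be sketched in plain Python. All names here are illustrative, not the CloudState API:

```python
from dataclasses import dataclass, field

@dataclass
class CartState:
    items: dict = field(default_factory=dict)  # product_id -> quantity

class StatefulFunction:
    """A function that is handed its state once, then answers
    commands from memory instead of querying a database."""

    def __init__(self, initial_state: CartState):
        self.state = initial_state  # state in, at initialization time

    def handle_add_item(self, product_id: str, qty: int) -> CartState:
        # Decide using in-memory state only -- no database round trip.
        self.state.items[product_id] = self.state.items.get(product_id, 0) + qty
        # New state goes out, so the entity can be re-instantiated later.
        return self.state

fn = StatefulFunction(CartState())
fn.handle_add_item("p1", 2)
new_state = fn.handle_add_item("p1", 1)
print(new_state.items)  # {'p1': 3}
```

The point of the sketch is that `handle_add_item` never touches a database; persistence of the returned state is someone else's job.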
So you wouldn't worry about the complexity of distributed systems, high-scale systems, managing your service meshes, your databases, the state, how the state gets to the function. Those things are all managed for you. Routing, recovery, failover, all those things are inherent. And then operationalizing and running your applications: it's really just a matter of hooking a CLI into your build process, and it automatically will go into whatever environment and be running, and then you'll have all the benefits of a stateful platform that is elastic and scalable and all this. So some of the technical highlights of CloudState. It's polyglot: any programming language that has gRPC capabilities is fair game to be a client for CloudState. So no longer do you need to have a team that is comfortable in a language and then go find a platform for that language. This is a language-agnostic platform. Everyone should be able to play, and I think that's a really important concept. It's got really great state models. Event sourcing, that's really important for us. I alluded to the fact that you can't pass the entire dataset in. There's one useful constraint that we found to make this all possible, and that is event sourcing. I'll talk more about that in a few. Command query responsibility segregation, CQRS, which we're also calling domain projections: your reads are separate from your writes. Your writes are modeled as events, and then any number of interested parties can take those events and paint whatever picture across the system they want, asynchronously. Key-value store; CRUD, create-read-update-delete. And as one of the advanced topics, there are CRDTs, conflict-free replicated data types. If you're not familiar, they're highly available, distributed data structures: multiple replicas that keep themselves in sync. So when you go to read something, it'll be in memory.
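The classic introductory CRDT is a grow-only counter, and it shows the "replicas keep themselves in sync" idea concretely. This is a minimal, framework-free sketch, not CloudState's implementation:

```python
class GCounter:
    """Grow-only counter CRDT: each node increments its own slot,
    and merge takes the per-node maximum, so replicas converge
    to the same value no matter what order merges happen in."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {node_id: 0}

    def increment(self, amount: int = 1) -> None:
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Merging is commutative, associative, and idempotent.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    @property
    def value(self) -> int:
        return sum(self.counts.values())

a, b = GCounter("node-a"), GCounter("node-b")
a.increment(2)   # happens on node A
b.increment(3)   # happens concurrently on node B
a.merge(b)
b.merge(a)
print(a.value, b.value)  # both converge to 5
```

Because merges never lose information, every node can serve reads from its local in-memory copy, which is exactly the availability property the talk is after.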
And if you're talking in a cluster, it'll be very highly available, on almost every single node that you're running. And we're also poly-DB, so whatever database you choose, it'll hook into CloudState seamlessly. So at a high level, the technologies that we're using are Akka, the open-source concurrency toolkit; and gRPC, which is the way we're able to have low-level communication between the CloudState mechanisms and internals and whatever language you're implementing in, as well as a contract with the outside for your users. You could have people call into CloudState services using gRPC or REST or anything else. Then Knative, and GraalVM. GraalVM is important because across these different languages, some of them are JVM languages, and they have a little bit of baggage: there's garbage collection, things like that. We need to be able to compile everything to a native image that will guarantee sub-second startup of pods in Kubernetes, which again gives us the guarantee that we can be elastic and quickly scale up new nodes, with all of this, of course, running on Kubernetes. So I alluded to the fact that we've got a useful constraint, and this comes from a theologian, believe it or not. He said that freedom is not so much the absence of restrictions as finding the right ones, the liberating restrictions. So sometimes a restriction can actually set you free, as we think it does in this case. This constraint for us is event sourcing for CloudState. So some of the benefits of event sourcing: it's a single source of truth with full history. It allows for this memory image, this durable state running inside of some encapsulation, in this case what CloudState calls an entity, be it event-sourced, a CRDT, or a CRUD entity. And it allows building the state from the events over time, because events are a time series. It avoids the object-relational mismatch. A lot of that is also in combination with CQRS, which is the way to separate your reads from your writes.
I don't know how many of us have gone and designed a system where we've laid everything out according to the domain and we're very proud of it, and then the UI people come over and say, hey, I need this and I need that, and we just pollute our domain. The read and write concerns of your system are two completely different things, and they're equally important. One shouldn't affect the other. And it allows subscriptions to the state changes, so you subscribe to events, and an event is useful to different parties for different reasons. I like to use the phrase: state is in the eye of the beholder. Take the state of something, let's say an airline flight. There are all kinds of characteristics of the flight, but ground control cares about very different things than flight control. So it's important to bear that in mind. State is not something to be shared across different processes. It also has mechanical sympathy: you're only ever appending events. So this is how event sourcing works with CloudState. We have our user function; we're also calling that an entity. The entity is the holder of state. Now, when you instantiate it, the event log is replayed. All the events from the past on behalf of this entity are replayed to it, and it builds up its current state. If there are a real lot of events, there's the concept of a snapshot: you start with the state snapshot and then you overlay the events since. So the events come in, build up your state, and now you're ready for business. Your commands are coming in. Somebody says, hey, add a contact to a customer. And you're looking at your state and you say, okay, I can do that. You add the contact and you say, hey, I added the contact, or I'm about to add the contact. And then the event is ContactAdded, and it's now in the system of record.
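The command-to-event flow just described can be boiled down to a few lines of plain Python: current state is a fold over the event log, and a command handler validates against in-memory state, then appends an event. Names are illustrative:

```python
# The system of record: an append-only event log.
events = []

def apply_event(state: dict, event: dict) -> dict:
    """Evolve state by one event (the only way state ever changes)."""
    if event["type"] == "ContactAdded":
        state["contacts"].append(event["contact"])
    return state

def replay(event_log) -> dict:
    """Rebuild current state by replaying every past event, in order."""
    state = {"contacts": []}
    for event in event_log:
        state = apply_event(state, event)
    return state

def handle_add_contact(state: dict, contact: str) -> dict:
    """Command handler: decide using in-memory state, then emit an event."""
    event = {"type": "ContactAdded", "contact": contact}
    events.append(event)              # persist to the system of record
    return apply_event(state, event)  # and update the in-memory state

state = replay(events)
state = handle_add_contact(state, "alice@example.com")
state = handle_add_contact(state, "bob@example.com")
assert replay(events) == state  # replaying the log reproduces current state
```

The final assertion is the whole trick: because events are the source of truth, an entity that dies can always be reconstructed somewhere else just by replaying.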
And when you instantiate again, that event will be played back to you so you can build up that state, and you'll see that contact in memory in the future. The state will also be reflected inside the entity that has just written the event; it updates its state in memory as it does so. So the happy path for one of these functions: the user issues a command to do something on the domain. It goes into a mailbox. All of these entities are fronted by a mailbox, so there are no issues with concurrency, and there's no blocking at all. These functions fully process a message, and emit the resulting event, before even thinking about taking the next message off the mailbox. So the command does some thinking, looks at its state, and issues a new event, which goes to the event log, which may be subscribed to through some event bus. Now let's talk about the unhappy path, the sad path: recovering from failure. So we have our event log, we're replaying our events, it's building up state in the function, and now we're ready for business again. In come our commands, and out go events. And you can also do CRUD. In some cases, event sourcing, CQRS, CRDTs, they're all pretty advanced concepts. You might have just a subsystem which is just a user, maybe a user and a phone number or something like that. How many things do you really need to do on that? Does it really need events? You could use a current-state model and you can use CRUD; we can handle that. In that case, we just use snapshots. Your snapshot comes in to you, you put it in memory, you process messages, and then you send the snapshot back out every time. So what is the architecture for something like this? Again, we're running on Kubernetes and we've got a series of pods that represent your user functions. So you've got replication here; you can have any number of these as required.
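The mailbox discipline — one message fully processed, state updated, event written, before the next is dequeued — is what removes the need for locks. A single-threaded sketch of that discipline, with illustrative names:

```python
from queue import Queue

class Entity:
    """Commands go through a mailbox and are fully processed one at
    a time, so the entity never needs locks around its state."""

    def __init__(self):
        self.state = {"count": 0}
        self.mailbox = Queue()
        self.event_log = []

    def tell(self, command: dict) -> None:
        """Callers never touch state directly; they enqueue commands."""
        self.mailbox.put(command)

    def run(self) -> None:
        # Single consumer: each message is handled to completion
        # (state updated, event appended) before the next is dequeued.
        while not self.mailbox.empty():
            command = self.mailbox.get()
            self.state["count"] += command["amount"]
            self.event_log.append({"type": "Incremented",
                                   "amount": command["amount"]})

e = Entity()
e.tell({"amount": 2})
e.tell({"amount": 3})
e.run()
print(e.state, len(e.event_log))  # {'count': 5} 2
```

In the real system the consumer loop lives inside an Akka actor; the sketch only shows why serialized processing makes concurrent commands safe.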
If you're running one user function, you probably would never need more than a couple of pods, or even one pod. But you can host multiple user functions in one image, and therefore it's useful to have more than one pod, and you can scale it up and down. So your user functions live on these pods, in whatever language you've implemented them, communicating via gRPC. And then we have the CloudState proxy, which is the Akka sidecar on these pods, and which spans these pods. So Akka is actually receiving the messages from the users, and it's communicating with your user code via gRPC. Your user code is doing all the logic, all the thinking; the sidecar is doing all of the traffic control, the writing of the events, and the playing back of your state. User functions do all the business logic. And the Akka sidecar is also communicating with the data store whenever necessary, asynchronously. So that Akka sidecar lives on each pod alongside your user code, and it spans those pods, but it is also a cluster in itself. An Akka cluster exists for your application, which is a series of these functions, and that cluster can be expanded and contracted to do its work. And so for location: you might have a user function on pod three, and all of these entities are really singletons. Your user function may be represented in multiple pods, but in Akka you've got a persistent actor behind each entity, which is a singleton, and it needs to be quickly located within that cluster across these pods. Akka takes care of all of that. So again: gRPC communication, gossip, routing to wherever things are located, talking to your data store, all of that's happening. So if we look at CloudState as a managed service now, you could pay as you go, as you can with function as a service.
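The "quickly located within the cluster" part usually comes down to deterministic routing: every node hashes the entity key to the same owner, so no lookup table is needed. A toy sketch of that idea (the node names and the modulo scheme are illustrative; Akka Cluster Sharding is considerably more sophisticated):

```python
import hashlib

NODES = ["pod-1", "pod-2", "pod-3"]  # illustrative cluster members

def node_for(entity_key: str, nodes=NODES) -> str:
    """Deterministically map an entity key (e.g. a user ID) to the
    node hosting that entity's single in-memory instance."""
    digest = hashlib.sha256(entity_key.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Every caller computes the same answer, so commands for one user
# always route to the same singleton entity.
owner = node_for("user-42")
print(owner, node_for("user-42") == owner)
```

Real sharding also has to rebalance when membership changes, which is where the gossip protocol mentioned above comes in.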
On-demand instance creation, passivation, failover, auto-scaling of pods up and down, only paying for what you're using at any given time, just like function as a service. Zero ops, so really automated everything: all the state handling, failure handling, provisioning, routing, deployment, all the upgrading, canary deployments, things like that would all be part of the platform. A little bit about multi-tenancy. In my opinion, function as a service has inadequate bulkheading. Maybe not in all cases, but I know it has happened in AWS where your neighbor's function can hog resources. That's not the case in CloudState: if you're doing Kubernetes correctly, if your hardware's set up right, you really get this clean separation of things via the pods and really good bulkheading. And even at the data level, you're assigning different databases to different tenants; you don't share big databases across tenants. I think that's probably the wrong way to go. And complete security, to the extent that Kubernetes security gives you these clean separations. So, a quick look at what a three-tier architecture looks like, what we're so used to, what we call a stateless application. You've got the middleware and the back end running in the middle there, in a number of pods. You've got a load balancer in the front, on the left, and you've got a big database on the right. Every single request will have to go through the load balancer to one of the pods, one of the nodes in the middle. It's definitely going to have to hit the database at least once, it's probably going to have some chatter in the middle, it's probably going to hit the database again after, and then it's going to return some data. So it's very noisy, and noise equals risk. Reactive architecture is a lot different in that your database is still there, it's needed as an event log, but it's not needed in real time. The database interaction is never needed for your functions to do their job in real time.
The data is already in the functions. The state's already in the functions. They're doing all the work, and when they've done some work, they say, hey, by the way, database, here you go, for next time. So you really have a much reduced risk and noise factor here. So, just a very high-level view of the architecture, from the bottom up. You've of course got any number of databases that can be part of a CloudState instance: Spanner, NewSQL, NoSQL, SQL, all running on Kubernetes. Knative, I'm going to have to take that box away; we're not actually utilizing that right now. We are utilizing GraalVM for native images, and there's very heavy utilization of Akka, because Akka's clustering capabilities and its persistent actors are the underpinnings of the CloudState entities. Above that are the actual methodologies we're able to build with that: event sourcing; CRUD; domain projections, which are views based upon events happening across your system that are kept in sync and built for you by CloudState, so that when you go to read for display, it's already waiting for you in memory or in a record in a database somewhere; a key-value store, similar to Redis; and these conflict-free replicated data types, CRDTs, if you have the need, and you know what they are. And then all these languages and more: any language that has gRPC can be supported. Istio for your load balancer, and then any mainstream communication protocol, gRPC, HTTP/REST, Kafka, what have you, will work. So let's look at some code. I'm going to make a little admission before I go into this. I did a lot of Python once; it was in health and wellness, and it was for loading a new system with billions and billions of records. That's the extent of my Python expertise, but I did take some sample code from our Python CloudState library, so you can glean something from it. But please do check out the GitHub repo when I'm finished; I'll give you the link.
So to be able to have a CloudState application, regardless of what language you're going to be implementing in, you've got to set up your protobufs for the gRPC protocol. This is going to be all the behavior of your application at a functional level. In this case, we're talking about a shopping cart, and we're going to model the messages for interacting with a cart. The one thing you'd like to do with a shopping cart is add a line item, and that would have a product ID, a name, a quantity, and a user ID. And you'll see that we're also marking the user ID as a CloudState entity key. Any entity in CloudState needs to be sharded with a unique key. So instead of having a database on disk with a unique key, which would be the user ID, you've got these distributed functions in memory in a cluster that are sharded by user ID. It's a similar concept to a database table. If you're not familiar with protobufs: when you say string user_id = 1, you're saying that the data type is string, this is what I'm calling it, and then what its ordinal is: one, two, three, and four, the ordinals of these attributes. RemoveLineItem: again, you tell us the user ID, which is the ID of the function, and then the mandatory attribute for removing a line item, which is a product ID in this case. There's the ability to get the shopping cart if you'd like to view it; in that case, you just need to give the user ID key. You have line items here, which are part of a cart; we model that with LineItem, and then it's used as a repeated collection of items inside of Cart. And so now we can have our service that uses those messages, our gRPC service called ShoppingCart. We can add an item, we can remove an item, and we can get our cart. If you're not very familiar with protobufs, this is a really cool feature.
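Pieced together from the description, the service contract might look roughly like this. This is my reconstruction from the talk, not the actual file from the repo, so field names and the exact annotation syntax may differ; check the GitHub sample for the real thing:

```protobuf
syntax = "proto3";

import "google/protobuf/empty.proto";
import "cloudstate/entity_key.proto";  // provides the entity-key option

package com.example.shoppingcart;

message AddLineItem {
  string user_id = 1 [(.cloudstate.entity_key) = true];  // shard key
  string product_id = 2;
  string name = 3;
  int32 quantity = 4;
}

message RemoveLineItem {
  string user_id = 1 [(.cloudstate.entity_key) = true];
  string product_id = 2;
}

message GetShoppingCart {
  string user_id = 1 [(.cloudstate.entity_key) = true];
}

message LineItem {
  string product_id = 1;
  string name = 2;
  int32 quantity = 3;
}

message Cart {
  repeated LineItem items = 1;  // a cart is a repeated collection of items
}

service ShoppingCart {
  rpc AddItem(AddLineItem) returns (google.protobuf.Empty);
  rpc RemoveItem(RemoveLineItem) returns (google.protobuf.Empty);
  rpc GetCart(GetShoppingCart) returns (Cart);
}
```

The "cool feature" referred to next is gRPC/HTTP transcoding: annotating these rpcs with `google.api.http` options gives you a REST endpoint alongside the gRPC one.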
So out of the box, if you do a CloudState implementation using Python and you know the language, you're going to implement this gRPC backend for this service. What you get for free here is that if you include this optional section, you get REST by default. So you get gRPC and REST at the same time; that's what those annotations mean. And then there's also another file here, which is actually your domain. You can model your domain objects, again in protobuf, which makes them much easier to share and return as data when you have callers. So again: LineItem, and an ItemAdded event. I talked about event sourcing; this is our event for an item being added, and it's got a line item inside of it. Below that is our event for an item being removed. And then we also have our shopping cart state, which we'd like to be able to return to a user, so we just have a message called Cart with repeated line items. So this is what it looks like in Python to actually model one of these. It's not a real lot of code. It's a shopping cart, and you're going to be able to have billions of them in memory, given that you've got enough cloud hardware in place and CloudState installed. It's fairly guaranteed that you could model the world; there's no limitation if you've got enough of these nodes. All you need to do is mark it appropriately as an entity, a data class. There's a little more code in the sample that I think you should take a look at, where it specifies what this underscore shopping cart and the file descriptor are; it's a little bit busy. But this is some of the code for what it looks like to create a shopping cart. You'll see that you'd like to snapshot it: this is a callback function that is going to be called for your snapshot when appropriate. If we look further, we'll see that there's a snapshot handler, handle_snapshot. This is your callback from CloudState saying, hey, here's your state.
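A framework-free sketch of the snapshot and event-handler callbacks being described may help. The real CloudState Python library wires these up with decorators and protobuf types, and its actual API differs from this; the class and method names here are purely illustrative:

```python
# Framework-free sketch of the callbacks the CloudState Python
# annotations wire up for you (names are illustrative, not the
# real library API -- see the GitHub sample for that).
class ShoppingCartEntity:
    def __init__(self):
        self.cart = {}  # product_id -> line-item dict

    def snapshot(self) -> dict:
        """Called when the proxy decides to snapshot this entity."""
        return dict(self.cart)

    def handle_snapshot(self, snapshot: dict) -> None:
        """Callback: 'here's your state' -- start from the snapshot."""
        self.cart = dict(snapshot)

    def item_added(self, event: dict) -> None:
        """Event handler: applied live and replayed on recovery."""
        item = event["item"]
        self.cart[item["product_id"]] = item

    def item_removed(self, event: dict) -> None:
        self.cart.pop(event["product_id"], None)

entity = ShoppingCartEntity()
entity.handle_snapshot({"p1": {"product_id": "p1", "name": "tea", "quantity": 1}})
entity.item_added({"item": {"product_id": "p2", "name": "mug", "quantity": 2}})
entity.item_removed({"product_id": "p1"})
print(sorted(entity.cart))  # ['p2']
```

Note that the event handlers serve double duty: the same `item_added` that runs when a command succeeds is also what replays the log during recovery.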
And you'd set your internal state, your cart, to that. And then you'll have the event handlers: item_added would be called back into you for every item added, and you would add them to your state. So it's all very callback-based. You just implement these annotations and you're in business. And remove_item: if you're interested, that's what remove_item would look like. You take your item out of the cart and you return empty. So on behalf of the CloudState team, we'd like to say thanks. We'd love to see your interest; we are always looking for contributors. Any questions you might have, we'd love to hear them. The full sample, I'm going to leave this up for a few secs: this is where the Python support and the shopping cart sample exist. I encourage you to pull it down from GitHub, take a look, run it, and play with it. So that's all I have. And let me check my volume here. Hey, thank you very much, thank you. Okay, so we have time for questions. The first one is from Andre. He's asking if it only runs on Kubernetes. Yes, yeah. Kubernetes, in our experience, is the way the world has gone. We made a little bit of a bad bet using DC/OS some years ago, and it's very clear that Kubernetes is where the world is going for cloud. Okay, so any other questions? If anyone wants to use the microphone to ask a question, we can also do that. Just raise your hand, click the raise-hand button, and I can enable that. Or just click in the Q&A and ask a question. Let me check on the channel, just in case. Okay, so no takers. All right, thanks everybody. Yeah, so thank you very much. Thank you for presenting. Enjoy the rest of the conference. See you. All right, thank you.