Hello, everyone. Welcome to the Open Source Summit. We're excited to be here to talk about this specific case of serverless data storage. I'm going to start with a few questions. How many of you have heard about serverless workloads? I bet a lot of people. How many of you have heard about stateful workloads? I suppose a lot of people, too; you have the most typical ones, like databases. But how many of you have heard about serverless workloads specifically for stateful applications? Probably not that many. We haven't seen many either, and that's why we think this is a new idea, or at least an approach not many people are taking. That's the whole premise of this talk. A little bit about ourselves: I work at Rakuten and I'm a co-chair of the CNCF SIG Runtime, and Amarjit works at Kioxia America. Next, please. OK, so here's what we're going to talk about today. We'll cover a little bit of the history behind serverless applications and some of the storage technologies. Then we'll talk about what stateful applications are and look at some existing examples. Then we'll dive into some of the newer storage technologies like NVMe over Fabrics and RDMA over Converged Ethernet. Then we'll talk about how you can put it all together in Kubernetes with the Container Storage Interface, or CSI. Then we'll look at some minimal stateful workloads you can run with the serverless paradigm. Then we'll show you a brief demo. And finally, we'll give you some takeaways and a look at what the future holds. So what's in the books for these technologies, for serverless and for storage? How did they come about?
So if you look at serverless, in 2008 Google App Engine was released. You basically took an application, ran a script, and it was automatically pushed over to Google's infrastructure. Google took care of everything, including scaling and the endpoints, everything that application needed to run, and the developer could forget about how it ran underneath. Then in 2014, AWS released AWS Lambda, which took it a little further: now you could run specific functions without having to care how they ran in the back end. Then in 2015, JAWS (JavaScript AWS) was released, later renamed the Serverless Framework. It was a way to run a full-blown application using all these different functions, combining them with different AWS services like AWS API Gateway and Amazon S3, and the framework handled that coordination. That same year, Kubernetes 1.0 was released, and a lot of people started talking about deploying all these different workloads in containers, orchestrating them, and serving them behind load balancers, a full-blown set of infrastructure using containers. Then in 2016, GCP released its own take on functions, Google Cloud Functions, much like AWS Lambda, and other cloud providers like Microsoft Azure followed. Then in 2018, Knative and other open source projects were released as a way to run serverless on top of Kubernetes. What about storage technologies? A long time ago, SCSI was released as an interface for interacting with storage in a faster way. Much later, in 2011, NVMe 1.0 was released as a way to talk to all these different solid state drives. Faster storage had emerged, and NVMe addressed the interface between your host and all these SSD storage devices.
Then in 2014, RoCE v2, RDMA over Converged Ethernet version 2, was released; that's a way to use Ethernet as a medium to access memory directly, like RDMA, remote direct memory access. In 2016, the NVMe over Fabrics standard was released, a way to connect to storage devices using NVMe over different transports such as Ethernet, Fibre Channel, or other fast media. That same year, the Linux kernel added support for NVMe over Fabrics. And just last year, Linux kernel 5.0 added support for NVMe over TCP. TCP is the most widely used standard in networking, so it can work with pretty much any network card or hardware interface. So what can we say about stateful applications? Let's add a little context. We have different kinds. There are the basic relational databases, the MySQLs and Postgreses of the world, and MariaDB, typically in a master-slave configuration. Then you have clustered databases that allow you to run multiple nodes, and if you want to expand your storage, you just add nodes. For example, if you have a 10-node cluster and want more capacity, you add five more nodes and the database automatically rebalances all the data. That's what these clustered databases do; examples are Cassandra and ScyllaDB. Then you have the ACID-type, globally consistent databases like CockroachDB and FoundationDB that give you consistency across regions: you could have a region in US East and one in US West, and these databases keep them consistent. And finally, there are the broker-type systems that let clients publish and subscribe and send streaming data.
They're very popular when you're processing lots of streaming data, and examples of that are Kafka and NATS. So what can we say about Kubernetes and its stateful application support? Before 1.5, Kubernetes had a resource called PetSet. That was a pretty bare-bones resource for bringing up containers in a somewhat coordinated way. After 1.5, the StatefulSet resource was released, and that's when people could coordinate, say, a master and slaves, controlling exactly when a specific container comes up. So you could do the unique kind of workload coordination that stateful applications need. But some challenges remain. You still need to handle the coordination between master and slave, take care of replication across nodes, and make sure you're not corrupting your data. Say your container goes down and comes back up on a different Kubernetes node: it has to do crash recovery, or at least know that it's in crash recovery, so you have to account for all those different conditions. Testing stateful applications is also quite a challenge. There's a nice open source tool called Jepsen for testing stateful systems, but it's still pretty challenging to do that testing. So now Amarjit will talk about some of the storage technologies. So let's talk about some of the storage technologies. To put it in perspective, as Ricardo covered in the history of storage interfaces, when SCSI came into existence and started to get popular, storage needs were in kilobytes, or at most megabytes. Today's storage needs are in terabytes or petabytes. That's the difference. Also, at that time, drives lived on servers, especially large servers.
Today, we are talking about containers that come up in seconds or less and then need storage. That's the difference. But in all those years, there weren't many advancements to the interface itself. One big change was in how data is stored, the underlying media, which originally was spinning magnetic disks. Today, the most popular medium is flash memory; in modern drives, magnetic media has been replaced by solid-state flash. With that shift in market direction, from spinning disks to flash, there was a need for a new interface, and that is NVMe. NVMe came into existence at almost the same time as, or maybe a few years earlier than, containers. So containers, Kubernetes, and NVMe, from a historical perspective, arrived together. What NVMe gives you is tremendous speed, the ability to move data at a very fast rate. To compare with SCSI: SCSI sends storage IO commands in a single queue that can hold up to 64 outstanding commands at any time. That was good when SCSI was introduced, but today we need much more. Hence NVMe came with a new standard, a new interface, and new capabilities: NVMe can handle 64,000 storage IO commands per queue versus only 64 in SCSI, and not only that, NVMe supports up to 64K such queues. So you can have 64K commands in each of 64K queues. That's how fast NVMe can be. Most applications today were written with older storage interfaces like SCSI in mind, so they don't yet demand that much speed and performance from storage. So we have to ask: my storage disk is much faster, but is my application ready to take advantage of it?
Probably not all of them; a few modern applications, but not all. On top of that, if I have many NVMe drives in one server, say a 2U server that can host up to 24 drives, can the server and the applications on it actually consume, or saturate, all those fast drives? No. That's where the need came from for storage that can be centralized in one place and handle that workload, and that is NVMe over Fabrics. With NVMe over Fabrics, you disaggregate storage onto one node, and multiple client nodes on the other side use it as their storage. It offers the same kind of performance a local drive would, because the NVMe protocol is designed for that. It's fast, and the only potential bottleneck would be the underlying network, but it really isn't: today we're talking about 100 or 200 gig network speeds. When we say NVMe over Fabrics can take storage from one central node to multiple client nodes, there are different transports. Ethernet is one; Fibre Channel is another. For this session, we'll focus on Ethernet, which is standard and widely used in most data centers. We can leverage two technologies to send NVMe storage commands over Ethernet: one is RDMA, the other is standard TCP. RDMA is very fast and very efficient; it offers a direct memory-to-memory connection between two servers, but it requires specialized hardware. TCP, on the other hand, is the standard networking protocol used for everything we communicate between servers and nodes. If we send NVMe commands over TCP, we don't need a specialized interface; however, it uses more CPU, and the performance is lower compared to RDMA over Converged Ethernet. It's a design choice that you, as an end user or infrastructure admin, will have to make.
Now let's talk about how the overall stack fits into the operating system, especially Linux. As I said earlier, there are various ways to send NVMe commands over different media: Fibre Channel, Ethernet. At the very bottom, those media have their own technologies to connect with each other; that's outside the scope of this talk. But once network connectivity is established between the client and the storage (target) node, the NVMe commands are bundled into network frames on the target side, sent to the client node, and unbundled at the kernel level by the NVMe over Fabrics module. In fact, there are a couple of options here: you can have a kernel-based NVMe over Fabrics stack or a user-space one; that's also a bit outside the scope. What this stack shows is that once NVMe over Fabrics connectivity is established and the NVMe commands are unbundled on the client side, the volumes or drives are available as usual raw or block volumes to the client. So the drive or volume the client sees is no different from a local drive. This diagram also shows two nodes: a client, which doesn't have local storage for its data store (though it may have an operating system disk), and on the other side a target or storage node which offers storage to multiple client nodes; for simplicity, I'm showing just one drive here. The underlying NVMe drive hardware is used by specialized software on the target node, and there are many options in the market. That software takes the hardware on one side, virtualizes it, offers you multiple volumes and other advantages, and then sends those virtualized storage volumes over fast Ethernet with the transport of your choice, whether RDMA or TCP.
And on the initiator, or client, side, those drives or volumes show up as raw or block volumes which the kernel can start using. So in the world of Kubernetes, how does this fit into the infrastructure? Kubernetes entities such as pods require persistent storage. Pods access it via persistent volume claims, and persistent volume claims are bound to persistent volumes, which represent the actual storage. Persistent volumes can come from various storage technologies: local storage, network storage, and so on. To make the job of attaching persistent storage to pods easier, wherever it comes from, the CSI driver is what helps here. A CSI driver has a controller which interfaces with the storage, central or local, depending on what the CSI driver was written for. It provisions storage and attaches and detaches it to pods, and when pods move from one worker node to another, the CSI driver also takes care of moving that storage. So now Ricardo will talk about serverless computing and how that framework fits into the Kubernetes world. Yeah, so what about the code you can run in functions? If you want to run a stateful kind of application, here's a simple example where you just write to disk, a hello-world type of application. It's as simple as writing into a mounted drive: you instantiate a variable with your string, and then, assuming your NVMe storage is mounted at /mnt1, you write it to a file, say dat1. So that's in essence what a very simple function would do. But obviously, in a serverless paradigm, you would do this many, many times.
And if you want to extend this functionality, to find out when you have events and trigger these functions, there's a project called CloudEvents from the CNCF. It's a way to describe events so that your functions are more portable, more consistent, and shareable across your organization. In this case, it would describe when you want to start reading something and decide what you want a specific function to do. CloudEvents has bindings for many different languages; in this example it's the Go binding, and we have a receive function here. That means when we receive an event, we can write some data to the NVMe storage. You can see at the bottom that after you start the receiver, it listens for these events, and when an event triggers, you can write a string or a larger amount of data. So in essence, that's how it would work for events. Now, what if you want to add more complexity, to connect all the different pieces of your serverless workloads together? There's also a CNCF project called Serverless Workflow, and it lets you define all the functions you're going to use in a specific workflow. For stateful applications, you can define the reads and writes, how you want to read and write data, and identify which events trigger those storage functions. You can also define the different states the workflow can be in: it could be in a waiting state, a running state, a blocking state, whatever the serverless or stateful application you're writing needs.
So now, Amarjit will talk about how the orchestrator works and how you can put this together with serverless frameworks. All right, so far we have talked about storage technologies, Kubernetes, and serverless frameworks. Let's put everything together. Serverless frameworks serve on-demand needs: a function executes, quickly does its job, and terminates. So serverless functions are short-lived. Right now, their popular use cases are stateless applications. If stateful applications are ever going to use serverless functions, which are inherently short-lived, they need fast, quick provisioning of storage. So let's see how it all fits together. There are multiple serverless frameworks available, and many of them, based on our research, don't have the capability to leverage persistent storage. One framework that does is called Fission. For the purposes of this session, and for the demo we'll show a little later, we selected Fission. The way Fission works is that it has a central controller and runs everything on Kubernetes, using Kubernetes-native custom resources and controllers; you can learn more by visiting their website. At a very high level, it has an underlying environment, pre-built for the type of functions you're going to run on it. That pre-built environment is always ready to execute functions. The second part is the function itself; you can have one or more functions leveraging one type of environment.
One way Fission takes advantage of persistent volumes is to store its own state, like the environment and the compiled code and libraries that functions are going to use, so that everything is readily available when functions execute. That's one small use case, but it's not the topic of this discussion. Our discussion is about using persistent volumes in the functions themselves, and that's where the Fission serverless framework is helpful: it allows us to take advantage of underlying persistent volumes. There could be many use cases for a persistent volume in serverless. One that we came up with along those lines: say you have a large stream or a large set of data coming in, and you need to process that data either individually as it arrives, or you buffer it until it's large enough to be processed. It could be data coming from sensors or some other source, or it could be media; for simplicity's sake, say a photo upload site where there's an upload and then background processing, or a video upload. There could be multiple use cases, but this is one of them, and in these use cases you want to keep the short-lived nature of serverless functions while keeping the capability to process the data later. Short-lived functions can just receive the data and exit, and after that you can process the data in other ways: via other functions fired by other triggers, or via ordinary Kubernetes jobs or applications that are already running. The idea is that the underlying persistent volume serves as a cache or buffer shared by the other functions or jobs.
So putting this all together, here's our proposed use case; as I said, it's just one of many. Assume there's a photo or media upload website that also needs to process the media. In that kind of application, one function, shown on the left in this diagram, receives media; it gets triggered by an HTTP request. Once the media is received, it's stored on persistent volumes. Depending on the processing needs, if every received file or data stream needs to be processed right away, another trigger can fire another function or application to process it then; otherwise, it can be processed later, when there's enough to process. So let's show this in a demo: how we set this up in the Fission serverless framework and how we use it. Demo, please. OK. We have a Fission serverless framework with multiple functions. The function that receives media is called upload, and when a user sends media, the upload function gets triggered. In this demo, I'm just doing it manually with curl. We see that the media file now gets stored on the data volume, where it wasn't before. Then, once one or more files are ready to be processed, we can trigger another function, or another job outside the serverless framework, however you want to do it. In our case, it's a process function, and we see that it processed the picture from beautiful to even more beautiful. You can process multiple pictures, and that's one of the use cases, as I discussed earlier, of using a serverless framework with persistent storage. So now Ricardo will talk about some of the other trends in this space. Yeah, so what about other work that's been going on with serverless and stateful applications?
A lot of it happened between 2017 and last year; obviously there may be more this year. One example is Pocket, a storage system for serverless analytics applications. It's a multi-tier storage system, a research project, and it's available on GitHub if you want to take a look. Essentially, you have many servers in multiple tiers, and it gives you a way to attach storage to different serverless functions. Another piece of research is a paper from Berkeley called "Serverless Computing: One Step Forward, Two Steps Back." It talks about how serverless is a trend of the future, and a lot of developers like it because they don't have to worry about the infrastructure behind it, but there are still a lot of challenges when it comes to processing lots of data. Typically, a serverless function does asynchronous work: you pick up an event and tell some other service, sitting in front of a lot of servers, to process some amount of data, for example reading from a queue and sending a large amount of data to Amazon S3. That's a typical use case, but if you want functions that actually need to pass a lot of information into the function itself, say a large amount of data because you're streaming, the paper points out a lot of challenges around that aspect. Still, there's a lot of promise there. Lastly, Aurora Serverless is the service that most resembles what we're talking about here: pay as you go, where you pay for reads and writes. There are a couple of papers available on Amazon Aurora that show a lot of the architecture behind it, but they don't show how the serverless part is specifically implemented.
How do they make it so you're only charged for reads and writes? Do they use NVMe storage, or what do they use in the back end? That isn't shared by Amazon, but this is the closest thing to what we're talking about here. So what do these workloads look like going forward? We talked about the Fission framework and about persistent volumes (PVs) and PVCs. For functions to access these, we had to pre-provision them. So maybe we'll see automatic provisioning from these frameworks: something that automatically picks up the persistent volume and PVC from Kubernetes, attaches it to a specific node, and then runs the function whenever you want, reading or writing depending on the operation. Maybe we'll also see more support from cloud providers in their own serverless offerings, say Amazon EBS mounting for Lambda functions, so you can read and write directly. You don't know whether EBS would use something like NVMe over Fabrics, but possibly you'd get fast storage. Just recently, EFS support for AWS Lambda was announced, a way to mount EFS storage in Lambda functions; that's exactly this kind of attachment. But one downside of EFS is that it's relatively slow, not very high performance. It's fine for slower or batch-type applications, but if you want high-performance applications, it may not be the best fit. And maybe we'll see more integrations with some of the stateful applications we talked about at the beginning, the MySQLs and Postgreses of the world, where they specifically allow you to attach to a function, or have a function do read and write operations.
Obviously, you can do this with client libraries today, but there's no specific built-in integration or friendliness between these two kinds of technologies. And we talked about Fission; PVC and PV attachment, as of today, isn't available in Knative or OpenFaaS, some of the most popular serverless frameworks, so possibly we'll see that support in the future. So now Amarjit will wrap up and talk about some of the takeaways. All right, let's put everything we've discussed into perspective. As we said at the start, this is more of an idea, and we're all discovering use cases, or more than use cases, we're solving problems we otherwise have, by putting various pieces, various technologies, together. Many of these technologies are new; they arrived at almost the same time, and if we put them together, they're ready to solve modern infrastructure problems. The serverless framework lets you run functions, very short-lived pieces of code, in the language you like: keep it simple. A serverless framework could perhaps run outside Kubernetes, but Kubernetes offers objects, custom resources, and other machinery, so it's hard to imagine running one outside Kubernetes; it's better to leverage what Kubernetes offers. Fast storage offers disaggregation, or centralization, of storage while retaining flexibility. Those technologies, NVMe and NVMe over Fabrics and fast Ethernet and the other transport technologies, are all here to help put everything together. Again, this is the beginning of putting all this together; there will be a lot of trial and error going forward. And remember, serverless is pay as you go, an on-demand framework with very short-lived functions.
So I hope that, all of us together, we will come up with more use cases before the next session. Now we'll show you all the references we used for this talk, where you can go for more details. This presentation will be available online, so you can refer to these links later, and if you want to reach us, these are our Twitter handles; we'd be eager to talk to any and all of you. So we'll open up for questions. This is a virtual event, so please type your questions and we're ready to answer them here. All right, we have a question from Frank: are there any projects focused on high availability for NVMe over Fabrics? I'll hand this off to Amarjit because he's most familiar with NVMe over Fabrics; I'm not aware of any projects doing that specifically. Amarjit, are you on? Yeah, thanks, Ricardo. That's a good question. I'm not aware of any specific project being worked on for redundancy or high availability of NVMe over Fabrics, but I'd like to comment on this question in a couple of ways. One: NVMe over Fabrics brings volumes from one centralized, disaggregated storage node to many clients. Those clients, or initiators, or hosts, run applications, and those applications could be modern ones like MongoDB or Cassandra. Such applications offer high availability at the application level themselves; all they need is fast storage. The second way of looking at it: if the application doesn't offer that natively, there are Linux technologies such as MD (software RAID), and maybe a few others. When you bring multiple volumes from multiple storage nodes into one client or initiator, those native, pre-existing Linux technologies can be used to bring high availability into the infrastructure.
That's basically what I see in the market and in the community. I don't see any more questions; I only see two. Frank, I hope that answered your question. If you have any more questions, feel free to type them in the chat. All right, I think we have a couple more minutes, so we'll stay here if anybody has questions. Oh, I have another question here, from Alejandro: what's the name of the serverless framework used in the demo? Yeah, the serverless framework we used in the demo is Fission; the website is fission.io, and all the documentation and setup instructions are available there. Let's see if there are any other questions. Support, yeah, what I was going to say is that Fission is the only framework we found that supports Kubernetes PVCs and PVs right now. We expect some of the other frameworks, like Knative and OpenFaaS, to support that in the future, but we didn't see that support there yet. OK, we have 10 seconds, I think. Thank you very much. Feel free to reach out to us; we're available on Twitter, and we're happy to talk.