Good afternoon, everybody. My name is Siddhartha Mani and I'm a software engineer at MinIO. I'm here today to talk about one of the most exciting features coming to Kubernetes: object storage support.

In the last 10 years, the amount of data in the world has grown faster than ever before. This is due to the new age of data-collecting applications that we all use and have come to love. Take, for instance, Google Maps. Google Maps has mapped 98% of our planet. Google Maps continues to grow this data by collecting traffic information and maintaining an updated list of businesses and foot activity at their locations. Another huge data collector is social media. We spend a third of our day on the internet using social media. We tweet half a million times every minute and we view half a billion Instagram stories every day. And it doesn't stop there. Content creation networks and streaming services like Netflix, Hulu and YouTube take up another eighth of our time on the internet every day. The amount of data today is truly staggering, and it continues to grow at this accelerated pace.

So, how did we manage to scale to such heights? If you've noticed the trend here, almost all of the data described above is either photos, videos or some kind of point-in-time event like location information. This kind of data is called unstructured data. Unstructured data constitutes the majority of data today and also the majority of growth. One of the reasons we were able to scale to such heights is some inherent properties of unstructured data.

Before going into the properties of unstructured data, let's take a moment to look at a parallel trend that accompanied this data growth. Let's talk about the evolution of hardware and software in these 10 years. In those 10 years, the highest-end commercial network switches went from 40 gigabits per second to 100 gigabits per second.
Note that I'm talking about commercially available switches, not just the maximum possible at that time. Networking speeds have gone up 2.5 times in just 10 years.

Just like networking, storage has also shown impressive growth, both in drive size and in throughput. Drive sizes have gone up from a maximum of 600 gigabytes in 2010 to the 16-terabyte drives available today. And then there's drive throughput. Today's drives can sequentially read data at 5 gigabytes per second. That's 40 gigabits per second. The best data I could find about drive speeds in 2010 was about 300 megabytes per second of read throughput. This was again from SSDs back then.

Comparing networking and storage today, even though storage seems to have shown much larger growth, networking throughput is still higher. Over the network, we can transmit data at 100 gigabits per second. To local drives, we can transmit data at a maximum of 40 gigabits per second. Obviously, networking is faster than storage. One thing that's clear from this trend is that high-scale applications should put their data over the network rather than store it locally on drives. This does come at a small cost, which is latency.

Now, let's go back to the properties of unstructured data and see if we can tolerate this latency. One of the great properties of unstructured data is that it is much harder for applications to edit this data. All the edits that do happen are performed on a copy of the data and then saved back as a new object. At first, this might seem like an inefficient use of storage space. However, there are far-reaching advantages to this approach. To start with, the lack of edits makes the data much more durable. It is much harder to corrupt data if we don't make changes to the middle of it. So, the first property of unstructured data is immutability. As you can see, immutability leads to higher durability.
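
As a rough illustration of the edit-on-a-copy idea described above, the sketch below models a write-once store in memory. The `ImmutableStore` class and the object keys are hypothetical names made up for this example; they are not part of any real object storage API.

```python
class ImmutableStore:
    """Hypothetical in-memory store: an object is written once and never changed."""

    def __init__(self):
        self._objects = {}  # object key -> stored bytes

    def put(self, key: str, data: bytes) -> None:
        # Refuse to overwrite: existing objects are immutable.
        if key in self._objects:
            raise KeyError(f"object {key!r} already exists and cannot be modified")
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = ImmutableStore()
store.put("photo-v1", b"original pixels")

# An "edit" works on a copy and is saved back as a new object.
copy = bytearray(store.get("photo-v1"))
copy.extend(b" (cropped)")
store.put("photo-v2", bytes(copy))
```

After this runs, `photo-v1` still holds its original bytes, which is the durability benefit the talk points to: nothing ever writes into the middle of an existing object.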
Now, applications that store data over the network also gain another property: they become stateless. The great thing about stateless applications is that they're easier to manage, easier to deal with when they fail, and easier for the people and processes around them to scale along with the application. And all that needs to be done for applications to become stateless is to use the form of storage we just described, which is called object storage. It's no surprise that object storage is the fastest-growing trend in storage and has been for the last 10 years.

Now, given all this, object storage should be a part of Kubernetes and supported natively out of the box. Today, Kubernetes does not support object storage out of the box. It only supports file and block storage, using a standard called CSI, which stands for Container Storage Interface. CSI became generally available in Kubernetes version 1.13. That was two years ago, and we are currently on version 1.20 of Kubernetes.

What I'm currently working on is adding object storage support to Kubernetes. We're bringing in a new API called the bucket API. In object storage, the unit of abstraction is a bucket. A bucket is a collection of objects held behind a single endpoint. We're modeling this bucket API to be very similar to CSI. The bucket API is going to allow any vendor to integrate their object storage service with Kubernetes. And any application workload that wishes to use a bucket can simply request that bucket through standard Kubernetes objects and be able to use it.

The bucket API supports four different calls. The first one is create bucket. The create bucket call expects the backend of the object storage provider to provision a new bucket for any workload that's requesting it. Once the bucket is created, the workload then needs access to the bucket.
The second call, grant access, creates a new set of credentials and provides it to the workload so that it can access the bucket. Once the application is done using a particular bucket, the application's access to that bucket should be removed. So the third call is revoke access. Finally, once a bucket is no longer in use and will not be used by any application in the future, it needs to be deleted, and that part is also automated by the bucket API. That's the delete bucket call. Any vendor that satisfies these four calls can become bucket API compatible.

The bucket API also introduces six new objects into Kubernetes. These objects are custom resources, namely bucket, bucket access, bucket request, bucket access request, bucket class, and bucket access class. These symmetric pairs of objects, one for buckets and one for access to those buckets, represent the two different types of bucket-related operations we undertake: creating, updating, and deleting buckets, or creating, updating, and deleting access to these buckets.

Our proposal to add bucket API support to Kubernetes was accepted on the 20th of October 2020, so it's still very new. However, our team is very dedicated and we're moving super fast. We're planning to reach alpha status by version 1.21 of Kubernetes. That is about four to six months from now. By the end of 2021, we hope to achieve general availability.

I welcome everyone in the audience to participate in the design and development of the bucket API. You can join us on our weekly calls on Thursdays at 10 a.m. Pacific Standard Time. The Zoom link for these calls can be found in the Kubernetes Slack, in the channel sig-storage-cosi; COSI stands for Container Object Storage Interface. All our code contributions are made in the GitHub organization kubernetes-sigs, and all our repositories have the prefix container-object-storage-interface.
You can reach out to me or any of the bucket API workgroup members to ask questions about how to contribute or anything else related to our efforts. With this, I thank you for attending my talk, and I look forward to seeing you all at the Thursday meeting. Cheers!
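
For readers who want a concrete picture of the four calls described in the talk (create bucket, grant access, revoke access, delete bucket), here is a minimal sketch. It is not the actual bucket API or its wire format: `BucketDriver`, `InMemoryDriver`, the method signatures, and the credential fields are all hypothetical, chosen only to mirror the lifecycle the talk lays out. Refusing to delete a bucket that still has access grants is likewise an assumption of this sketch.

```python
import secrets
from abc import ABC, abstractmethod


class BucketDriver(ABC):
    """Hypothetical vendor-side interface mirroring the four calls from the talk."""

    @abstractmethod
    def create_bucket(self, name: str) -> str: ...

    @abstractmethod
    def grant_access(self, bucket_id: str, workload: str) -> dict: ...

    @abstractmethod
    def revoke_access(self, bucket_id: str, workload: str) -> None: ...

    @abstractmethod
    def delete_bucket(self, bucket_id: str) -> None: ...


class InMemoryDriver(BucketDriver):
    """Toy backend that keeps buckets and credentials in a dictionary."""

    def __init__(self):
        self.buckets = {}  # bucket_id -> {workload: credentials}

    def create_bucket(self, name):
        # Provision the bucket; calling this again for the same name is a no-op.
        self.buckets.setdefault(name, {})
        return name

    def grant_access(self, bucket_id, workload):
        # Mint a fresh set of credentials for this workload.
        creds = {"access_key": workload, "secret_key": secrets.token_hex(16)}
        self.buckets[bucket_id][workload] = creds
        return creds

    def revoke_access(self, bucket_id, workload):
        # Drop the workload's credentials if it has any.
        self.buckets[bucket_id].pop(workload, None)

    def delete_bucket(self, bucket_id):
        # Assumption of this sketch: a bucket with active grants is still in use.
        if self.buckets.get(bucket_id):
            raise RuntimeError("bucket still has active access grants")
        self.buckets.pop(bucket_id, None)


# Lifecycle in the order the talk describes it.
driver = InMemoryDriver()
bucket_id = driver.create_bucket("photos")
creds = driver.grant_access(bucket_id, "web-frontend")
driver.revoke_access(bucket_id, "web-frontend")
driver.delete_bucket(bucket_id)
```

A vendor integration would replace `InMemoryDriver` with calls into its own object storage backend while keeping the same four entry points.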