Morning, everyone. Thanks for coming. This is our subject for this morning.

Hey, I'm Paul Luse from Intel. I'm a software engineer. I have not been contributing to Storlets; I'm one of the Swift core developers, but I think Storlets are way freaking cool, so I'm up here to give you a couple of slides on Swift to make sure we're all level set there.

Hey, I'm Hamdi Roumani from IBM. I really want to thank you all for coming out; I know it's kind of early today after the HP event yesterday. I've been working on Storlets for maybe three or four months, and on Swift for a little bit longer. Like Paul said, I think they're a really neat thing, so I hope you guys enjoy.

And I'm Eran. I'm also from IBM. I'm the IBM technical lead for this stuff, and I really like it, so I hope you like it too. So let's get started.

This is the usual agenda. I'll talk about the concept and the motivating use cases, then Paul will give the Swift overview, and then Hamdi will give the Storlets overview and cover the Storlets OpenStack project. Yes, we have it as an OpenStack project now. And I'll end with a vision.

All right, so the concept. Imagine you have a storage system that can hold petabytes of data, and the system is being used as a service. So what would people put there? Their photos. They might put their IoT data there, archived IoT data in the form of CSV files. They might put their 3D designs; this is a bit futuristic. Think about 3D printing here.
I'll talk about it later. And they might put their digital media files, usually huge files.

Now, what happens if one wishes to do some computation over that data without moving it around, for various reasons that I'll show next? The concept here is to colocate compute in the storage system. The reason I'm drawing a Docker container here as the compute engine is that we're not talking about just any computation. We're talking about computations that are uploaded by the users, so we want them to be well isolated. I could have drawn KVM or anything else, but since we're working with Docker, there's the Docker logo there.

So why would we want that? Let's assume the user put his photos in the storage system. We know that photos have embedded metadata, called EXIF metadata in JPEGs, and that EXIF metadata is quite rich. You can get all the camera settings, where the photo was taken, and so on and so forth. This means the user would probably want to ask something like: how many pictures were taken in Tokyo between those dates where the ISO that was used was 400? Valid question. He would probably want to use his favorite analytics engine, Spark. Here's the problem: Spark knows how to process semi-structured data, whereas the metadata is embedded in binary files, which are the JPEGs. So there's a problem here.

I call this use case data preparation: we can use compute on the storage to extract the EXIF metadata from the pictures, instead of actually downloading the pictures to the Swift cluster.
Sorry, to the Spark cluster, so that Spark only gets the EXIF metadata. So we save on bandwidth, we save on memory in the Spark cluster, and of course we actually make this use case work; otherwise we couldn't do that. This use case was explored by Michael Factor and Gil Vernik from IBM, who gave a talk about it at the Paris OpenStack summit; here is the link.

From the data preparation use case I'll move to the next use case, which I call predicate push-down. Suppose we had all this information about the pictures in a CSV file rather than in the JPEGs, so we have a huge table there, and we want to make the same query. Let's zoom in on the CSV file. It's a huge table containing lots of rows, one for each picture, and for each picture we have all the EXIF metadata, like the location, the f-stop, the ISO, the focal length, and so on and so forth. For our query we're not really interested in all the data, right? We're interested only in the location column, the ISO column, and the date, according to the query. Moreover, we only want the line coming from Paris. Sorry, I see question marks on your faces: the Paris line is the only one that matches the query here, OK?

So why not push this filtering to the storage? Then we don't need to download all the CSV to the Spark cluster. We can just use Storlets, or rather (I didn't say Storlets yet) use the compute near the storage to actually do the filtering inside the storage system. So this is predicate push-down. We have seen a significant reduction in the overall time taken to process a query, I mean from end to end, from the Spark point of view. I should say these are very initial results, but it's promising.

I do want to mention one more thing about this use case. In the Tuesday talk we were talking about another kind of push-down, which was used with metadata search. So what have we done with metadata search?
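
The predicate push-down use case described above (keep only the columns the query needs, and only the rows that match) can be sketched roughly as follows. This is a minimal illustration in Python, not the code used in the talk: the sample table, the column names, and the `push_down_filter` helper are all assumptions made for the example.

```python
import csv
import io


def push_down_filter(csv_text, columns, predicate):
    """Return CSV text holding only `columns` of the rows matching `predicate`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if predicate(row):
            # Project the row down to the requested columns only.
            writer.writerow({name: row[name] for name in columns})
    return out.getvalue()


# Hypothetical table: one row of EXIF-like metadata per picture.
SAMPLE = (
    "file,location,date,iso,fstop\n"
    "a.jpg,Paris,2015-10-01,400,2.8\n"
    "b.jpg,Tokyo,2015-10-27,100,4.0\n"
    "c.jpg,Paris,2015-10-02,200,5.6\n"
)

result = push_down_filter(
    SAMPLE,
    columns=["location", "date", "iso"],
    predicate=lambda r: r["location"] == "Paris" and r["iso"] == "400",
)
print(result)
```

Run near the data, a filter of this shape means only the small `result` table has to cross the network to the analytics cluster rather than the whole CSV.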
We've shown the capability to narrow down the list of objects that we're interested in using metadata search. We have a bunch of objects in the object store that we want to process, right? However, our query is such that only a small part of them are of interest. So we can use metadata search to narrow down to that small group of objects, and do this stuff on them. So these are complementary solutions.

Right, let's talk about data security, and this brings me to the 3D printing era. It is said that in the 3D printing era the ability to manufacture becomes a commodity. People are no longer going to pay for the ability to manufacture something, but rather for the design. By the way, on the right-hand side there's an illustration of a concrete-based 3D printer. They have it working; there's a nice YouTube video showing that. Anyway.

So, assuming people would still use object stores in this era, people would probably put their 3D designs in the object store. They would probably never want them... sorry, that was too early. They would probably never want those designs to leave the object store. However, they would be very happy to sell a printed version of the design. What do I mean by that? Usually when you want to print some 3D model, you need to do some lossy transformation that allows it to be printed, and that lossy transformation depends on the actual printer you're using. So again, using Storlets, one can download the transformed version of the 3D design that fits his or her desktop printer.

But we don't really need to wait for the 3D printing era. Think about this one, where you have medical records inside your storage system and you want to make them available to researchers. Since these are medical records, you probably want to de-identify them before you give them to anyone else, right?
So the personal information is being erased there. This use case was further explored by the ForgetIT EU project. They have done face blurring: in the storage system there was the actual picture, and once downloaded, the faces were blurred. I hope you can see this in the picture. There's a link to a YouTube video that shows this demo. It's quite nice.

I'll skip over this one really fast. This one we talked about at the OpenStack Paris summit. It has to do with digital films. RAI, Radiotelevisione Italiana, the Italian national broadcaster, was using Storlets to apply algorithms once the objects were already in the store. Think about it this way: these are very large files. Suppose you've already put them in the storage, and after putting them there you came up with a new algorithm to extract a feature from them, for example the loudness of the film. Instead of downloading the object, calculating it, and then uploading it again or whatever, you can just do this in place with compute on storage.

The last use case was inspired by a comment that was given by Paul at the time, so I call it the super-user use case. In recent years we're seeing more and more storage systems that are not appliances; rather, they are software on commodity hardware, and there's an operator who installs the software on the hardware, operates the hardware, and sells it as a service. If he could use compute on storage, he could add more value, or generic functionality, to the storage system using this compute. He doesn't need to wait for the software developers to do that for him; he can just add this on his own. Examples are antivirus, compression, encryption, and so on and so forth: generic storage stuff that you can just add to the system. Perhaps Paul will show next how easily it can be done with Swift.
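
As one illustration of the generic operator-added functionality mentioned above, compression can be expressed as a pair of streaming transforms, one applied as data is written and one as it is read. The sketch below is in Python and uses only the standard `zlib` module; the function names and the sample data are assumptions for the example and are not taken from the Storlets code base.

```python
import zlib


def compress_stream(chunks, level=6):
    """Yield compressed bytes for an iterable of byte chunks (write path)."""
    compressor = zlib.compressobj(level)
    for chunk in chunks:
        data = compressor.compress(chunk)
        if data:
            yield data
    # Emit whatever the compressor is still buffering.
    yield compressor.flush()


def decompress_stream(chunks):
    """Yield the original bytes for an iterable of compressed chunks (read path)."""
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    tail = decompressor.flush()
    if tail:
        yield tail


# Hypothetical object body arriving in two chunks.
original = [b"swift object data " * 100, b"more data " * 100]
stored = b"".join(compress_stream(original))
restored = b"".join(decompress_stream([stored]))
print(len(b"".join(original)), len(stored))
```

Because both functions work chunk by chunk, neither needs to hold a whole object in memory, which matters for the very large media files mentioned earlier.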
Oh, this idea was taken further by a project called IOStack. That's a European project that, by the way, is funding the trip here. What they do is add a policy layer on top that says something like: every time there is a request from a user belonging to a certain account or a certain tenant, do compression or encryption. So it's a kind of automation layer for using those computations.

So with that... oh, sure, most importantly: all this stuff is now available as an OpenStack project on GitHub. Hamdi will talk about it. And of course any help is most welcome. So I'll hand the mic to Paul.

You hear me? The clicker. Excellent. All right, so that was awesome. I love to see these use cases. They really do a great job of identifying the motivation for the work these guys have done. It's so much more effective, I think, to see real use cases, especially a variety of them, as opposed to saying "what if" and painting some hypothetical stuff. This is all really cool stuff, and it's got a lot of us really excited about it.

So before Hamdi gets into the details of Storlets, I wanted to make sure we were all level set on some very basics of Swift. I know there are a lot of Swift people in the audience, but if you don't have the basics, it's hard to understand how all the stuff bolts together and where things are making connections.

OK, so a couple of things about the Swift community. As most folks probably know, we're one of the first two projects in OpenStack. We're up to somewhere around 40,000 lines of functional code and another 80,000 lines of test code, so we're generally somewhere between two to one and three to one on test versus functional code, which is pretty good. We've got just a fantastic community. I see there are a lot of core folks in here. We all love to come to each other's talks and support each other.
So thanks for coming. John, our PTL, in one of our sessions yesterday gave some really cool graphs showing the increase in the number of contributors in the Swift community, how often they come back, and how long they've stuck with the program. It really tells just a fantastic story of what a great community this is to work in if you're a developer.

Besides the long list of company names here, I did collect a few stats from the last three summits, and you can see the number of presentations in the main conference, not the design summit, with Swift in their title. And I didn't count it twice when they used Swift twice in the title. In Paris we had 14, then in Vancouver 19, and then over 20 here in Tokyo. So we just continue to see more and more interest, and more people voting to hear more about Swift. It's just a really exciting program to work on and a fantastic community to work in.

I won't go through all of the timeline below. Just to be clear, these are focus areas for these releases; they're not actual features in the release. So, for example, we're not releasing encryption right now, but this is a big focus area for us here at this design summit, with lots of discussions on the encryption work that's going on.

OK, so a couple of high-level bullets on Swift, just so you get a picture of what it is. Obviously, it's an object storage system. We use a container model, very much like S3 buckets, to group things together by like characteristics. The easiest one to grasp, of course, is ACLs, security. Storage policies is also a big one; if you've been following Swift, that's a big feature that we added the year before last. Everything is through a RESTful interface, a stateless RESTful interface.
So it's very easy to use, a very strong API. Lots of other storage systems have layers built on top so they can actually support Swift, so we've sort of become a de facto standard in that area. And then, of course, it's built on standard hardware and is highly scalable and efficient. You don't have to accept any specific vendor lock-in to go out and buy something and build yourself a Swift cluster. You can build it with pretty much anything you've got: stitch it together and make it work.

And it's eventually consistent. If you've been in any Swift talks, you understand the value of eventual consistency. It's funny when you talk to people for the first time and they haven't heard this; they think maybe it's a bug. It's not a bug. It's designed to be that way, and there are really good reasons for it. If anybody wants to talk more about that afterwards, there are plenty of us here who can explain it to you.

So before Hamdi gets into Storlets, I wanted to do a high-level architectural overview of the main modules within Swift, based on how they're tiered, and this is important so you can see where Storlets bolts in. The top box here is the proxy tier. If you're familiar with Swift, it's a two-tier architecture. The proxy tier is where we scale for concurrency. We can scale in two directions: if you need additional concurrency, add more proxy services; if you need additional capacity, add more storage services. You don't have to scale them at the same time. The main blocks on the top are our WSGI server, our proxy application, and then the one that's highlighted there, which is important for this talk, is middleware, right?
It's an extremely cool extensibility feature within Swift. There have been lots of talks on it over the last couple of summits. Christian had a great one in Paris, a fantastic talk, where middleware was written from the ground up right during the session and shown working. It really shows the power of middleware and what Storlets does with middleware. If you haven't already seen one of these talks, it'll blow your mind. It's so freaking cool.

That middleware capability exists in the proxy tier, and if you look down at the bottom, it also exists in our storage tier, our capacity tier. So the high-level software architecture is the same. We've got a WSGI server down there, we've got the middleware framework, and then we've got multiple different applications, I should say, that run on the storage node to handle all the various Swift stuff. We've got more detailed slides on this stuff too, if anybody wants to get into more of the guts of Swift. I don't want to take any more of the Storlets time to do this, but we can certainly do that, because it's some really cool stuff. The key point here is that the middleware capability is where Storlets bolts in.

And then, just to give you a visual of what it looks like to put an object into Swift and what it looks like to get an object out of Swift. After you see the Storlets stuff, you'll see how it intercepts things and does its magic. When we put an object into Swift, it comes in through our load balancer in our access tier, where the load balancer, auth services, and proxy are. That's where that little object is flying in. It's going to get routed. In this case we're using replication; actually, I should have put the EC one in here. And shameless plug:
There's a talk on EC, like four rooms down, right after this, so come down and hear about the progress on that stuff. But there we are, shuffling this thing into three locations within the storage nodes. On the GET side, we actually only get it from one place. It's configurable how we do the GET, but you can see it comes in, hits its three nodes, and on its way back out it comes from one node. So that's the general Swift overview.

I'm going to take it from here. I want to start by saying I feel bad we didn't include Swift in the title of this presentation, because it would have been a plus one for you, I think, in terms of your count. So I apologize. Go change the slide, make it 25. I'll go edit the title after.

So, just to start: Eran gave a really good overview, and so did Paul, of what Storlets are and what Swift is, and I'm going to give a more in-depth, detailed explanation of exactly how Storlets fit into the Swift architecture.

So what exactly is a storlet? Well, it's pretty simple: storlets are compiled Java code. Now, I want to start by saying Java is what we currently support, but it's something we're looking to extend in the future to other programming languages, maybe Python. I thought I would just throw that out there.

Storlets reside as middleware, and the really great thing about that is that it required zero changes at all to core Swift. We needed no changes; we didn't touch any of the core Swift code whatsoever. I really want to point that out, because there are a lot of complex algorithms, but it all resides in middleware, which implies it's completely transparent to your Swift object storage, and you can plug it in extremely easily. So, more specifically, how do storlets work?
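
The claim that functionality can be added as middleware with zero changes to the core application can be illustrated with a generic WSGI filter. The sketch below is in Python and is not the Storlets middleware: `UppercaseMiddleware`, the `X-Example-Transform` header, and `core_app` are hypothetical names invented for the example. Only the `filter_factory` shape follows the paste.deploy convention that Swift pipelines use.

```python
class UppercaseMiddleware:
    """Hypothetical WSGI filter: wraps an app without modifying it."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        body_iter = self.app(environ, start_response)
        # Only act when the (made-up) opt-in header is present.
        if environ.get("HTTP_X_EXAMPLE_TRANSFORM") != "upper":
            return body_iter
        return (chunk.upper() for chunk in body_iter)


def filter_factory(global_conf, **local_conf):
    """paste.deploy-style factory; returns a callable that wraps the next app."""
    def factory(app):
        return UppercaseMiddleware(app)
    return factory


def core_app(environ, start_response):
    # Stand-in for the unmodified application at the end of the pipeline.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello object"]


def call(app, extra_environ=None):
    """Drive a WSGI app once and return (status, body)."""
    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/v1/AUTH_test/c/o"}
    environ.update(extra_environ or {})
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


pipeline = filter_factory({})(core_app)
plain = call(pipeline)
transformed = call(pipeline, {"HTTP_X_EXAMPLE_TRANSFORM": "upper"})
print(plain, transformed)
```

Here `core_app` never learns that a filter exists, which is the property the talk relies on: the behavior is added purely by what is placed in front of the application in the pipeline.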
Storlets utilize Docker, and Docker may reside on the same server as your proxy or object server or on a different one, and that's what's going to drive the compute engine. As a really high-level overview, what happens is that you start by uploading the storlet. All the storlet is is code, and it's simply going to be an actual object sitting in Swift directly. Then you can inform Swift, through a header as you'll see later, that you want to execute the storlet on a specific object. We then do all the magic of taking the input stream and output stream, wiring them together, and then letting that Java code execute on the Swift object. So it's actually really nifty.

The reason we use Docker is that it really does, like Eran mentioned, provide a lot of good security, and multi-tenancy as well. We essentially have a Docker engine per Swift account, and that provides the isolation and multi-tenancy that we have across the system.

Paul went into a really good overview of what middleware is, and what this slide is really pointing out is that that's really all storlets are. They essentially reside as middleware, and the interesting thing about it is that they can live either on the proxy side or on the object side. What that means is you can intercept data coming in at the proxy layer or even potentially at the object server layer, which is what the next slide actually shows.

So how can you actually upload this compute to Swift? Well, it's pretty straightforward. I was trying to think of an example. Eran had a bunch of good ones, but I was trying to think of a silly one on the way in today, and I was thinking: OK, let's say you use Swift as your code repository system. Maybe that's silly, it probably is. And let's say you store Python code in there, and let's say you really love PEP8, and a lot of folks don't, so you want to PEP8-ify stuff on the way in. So what would you do?
So you found a nifty PEP8 library lying around somewhere in the ecosystem. The first step is you write a simple Java program. I know it's silly, you're using Java to PEP8-ify Python code, but I can't think of anything else. You take this library, you write the Java code, hopefully you test it, you compile it, you get it reviewed, and then the very next step is essentially that you create a package, a JAR file, for the Java code. Then you can take this dependency as a separate thing, and you upload both to the Swift object storage system. There's a container you can put them in; there's a default one called the storlet container, but you can create your own and modify all that. We actually have a lot of nifty tools that allow you to automate the entire process. All you have to supply is a JAR file, and we'll upload it automatically to the Swift system.

Another really interesting thing is that, as we mentioned earlier, storlets actually execute in a Docker container and an image, and you can adjust that Docker container to have any dependencies that you need. In this example it's just one simple Java library that does the PEP8 work, but if, say, you wanted to pull in a lot of other packages, your better bet is actually to modify the Docker image that will be used to run your storlet code.

Here we go through the exact steps of the PUT data flow in terms of executing a storlet. The key thing to note is that all you really need to use is this new header, X-Run-Storlet. X-Run-Storlet essentially says: for this object, I want to execute this particular storlet. The storlet will reside in a different container, and this just says this is the exact code, the compute per se, that you want to run on the specific object.
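
To make the header-driven invocation concrete, here is a minimal Python sketch that builds (but does not send) a PUT request carrying `X-Run-Storlet`. The header name comes from the talk; the storage URL, token, container, object, and storlet names are placeholders invented for the example, and `build_storlet_put` is a hypothetical helper rather than part of any Swift client library.

```python
import urllib.request


def build_storlet_put(storage_url, container, obj, data, token, storlet_name):
    """Build a Swift object PUT that asks for a storlet to run on the upload."""
    url = f"{storage_url}/{container}/{obj}"
    headers = {
        "X-Auth-Token": token,          # normal Swift authentication header
        "X-Run-Storlet": storlet_name,  # names the storlet object to execute
    }
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


# All values below are placeholders, not real endpoints or credentials.
req = build_storlet_put(
    storage_url="http://swift.example.com/v1/AUTH_demo",
    container="code",
    obj="app.py",
    data=b"print('hi')\n",
    token="hypothetical-token",
    storlet_name="pep8fixer-1.0.jar",
)
print(req.get_method(), req.full_url)
```

Apart from the one extra header, the request is an ordinary object upload; passing `req` to `urllib.request.urlopen` against a real cluster with Storlets enabled would send it.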
It's exactly like uploading any other Swift object, really. So this just goes through all the steps, and the key thing I want to point out is that all we do in the PUT flow is take the input stream, which is what the user is supplying as data, and wire that to the input of the actual storlet, the Java code in this case. The output stream goes to the object server, and it's going to end up being written to disk. Just as a simple example: if your storlet did nothing at all, maybe just printed some logs and didn't do any compute whatsoever, the result of running that storlet would be as if you had no storlets at all. There would be absolutely no change to the actual object you uploaded to storage.

This goes over the GET data flow, and the one thing you should notice is that it's essentially exactly the same. The only difference is that the input stream is obviously going to be the incoming object from the object server, whereas the output stream is going to go back to the client.

In this slide we go over the more detailed storlet architecture. I'm not going to spend too much time on it, but if anyone's interested, feel free to talk to me after and we can get into more details. The one thing I want to point out is that we essentially have a Docker container per Swift account, and that's how we do the multi-tenancy aspect and the security aspect. As you can imagine, we need to have some kind of communication mechanism between the two, the two being the Swift system and Docker residing on a potentially different system. This is the way we enable that: we have the Storlets middleware itself, and then we have these buses, just domain sockets, that we use to send commands and messages, and the data itself by way of the stream file descriptors, which is what we ended up using, between the Swift proxy or object server and
wherever your Docker container resides.

So how do you actually write a storlet? I just wanted to show a really simple example to show that it's actually very straightforward. If you look at the actual interface of this Java function, it's really simple. There's an input stream and an output stream, and then parameters. The parameters in this case are simply the query string that you supplied; whether you were doing a PUT or a GET, they'll be supplied to your storlet as well, so you can communicate data to the storlet too. As for the actual code, sorry, it's a little bit messy, but the only thing I'm trying to show here is that all you have to do is what you would do if you were doing anything else in Swift. The very first step is that you get all the metadata and set it into the output stream, so you're essentially flowing back the HTTP headers. After that you're free to play with the input and output streams as you wish, and this is where all your compute resides. So all the fun and magic of storlets lives in this space here, regardless of what you're doing. Really, writing storlets couldn't be any simpler than that.

Yeah, sure. No, it was made extensible for things we could change in the future, for example if we had multiple input streams, but the way it works now is simply on one object. It's a really good question, because that's an idea that has come to us in the past: is there any way we can work on multiple objects as opposed to just one? That's a little bit more difficult to do because, as you can imagine, when you're working with Swift itself, generally speaking there is a specific object you're always dealing with in the system, whether it be a GET or a PUT. But very good point, and if you'd like to work on that, we'd love you to.

OK, so say you actually want to start helping with this Storlets project. What exactly can you do? It's actually really simple. The very first step is to get an Ubuntu image, and we're probably going to expand this in the future to be compatible with other operating systems. In terms of the install process, all you need is a passwordless sudoer account that you can use. Clone the code from the GitHub account, which is, yes, an OpenStack project now. We have a nice script, s2aio, which will set up everything for you, both Swift and the Storlets ecosystem. It's an Ansible script that'll put everything together and install Storlets.

But I can imagine many of you already have Swift systems out there and still want to play around with Storlets. In that case it's still just as easy. There's an Ansible script you can run, and there's a configuration file that points to where your proxy resides, where your object servers reside, and a bunch of other details like that. You can use that to allow the script to configure Storlets and install it on your system.

So this is actually my favorite slide. I know we've mentioned this many
times, but yes, Storlets is now an official OpenStack project. That's really great for us, because it makes development a lot easier and gives us all the magic of OpenStack. By that I mean we have the Gerrit review system and Jenkins for testing, and we actually have tests out there now that will execute on any changes that you submit. Better yet, the documentation is a lot clearer now. We have an IRC channel as well, and you can submit bugs and feature requests too. But the real key thing I want to say with this slide is that we're very much looking for help, any contributions, or even operators. So if someone wants to try it out, reach out to us on IRC. There are some emails you can use there as well. I think that's it. Eran, do you want to take over for the vision slides?

Thank you very much. I'm good. All right, so here's my vision for Storlets. Clearly we're currently focused on integrating into Swift, using Docker, and running storlets in Java, but there is no reason why we couldn't add more languages there. I think Scala in particular is interesting. Spark has a nice feature called user-defined functions that can run over the Spark data during analysis, and I think Spark developers are mostly using Scala. So if we could write storlets in Scala and push them down automatically, that could be a huge ecosystem thing. Next, why just use Docker? Why not go with ZeroVM, KVM, or even integrate with OpenStack Nova and get whatever OpenStack Nova works with? And I should also mention Magnum. And then, perhaps more futuristic, is to look at other storage systems, such as the Ceph object gateway or databases that have blobs attached to them. So this is the vision, and I'll end with this thought here.

And before we end, I just want to really thank Doron Chen from our team in Haifa, who did the tremendous work of making this an OpenStack project, all the integration with Jenkins and Gerrit.
It's not trivial. And then Gil Vernik and Yosef Moatti, who are working very hard on the push-down, the Spark push-down of storlets to Swift. And that's it. Thank you very much, and we'll be very happy to answer questions. Thanks.

I'll go for it. Oh, yeah: on the Intel side, if anybody here is participating in the marketing thing called the Intel Passport program, please see Mahadi. Maybe you could stand up, raise your hand. Please see her and she'll take care of you and make sure you get stamped or whatever they're doing. Thanks.

And really quickly from IBM, and again I'm forced to say this, but we're actually hiring, so if anyone's interested at all, feel free to contact me. I'm going to leave it at that. So thank you all so much. Thanks. So, any questions?

Thank you for a good presentation. My question is from the developer side: how does it work for running the Docker container in the COPY case? You explained the PUT and the GET cases, and sometimes we want to make a new object from the original blob data in such a case.

Yeah, absolutely, good point. In Paris we had a POST verb, which I would say is not the right verb to do that. COPY is the right verb to do that, and this is at the top of my feature list. We actually have a use case in IBM that needs that, so anybody who wishes to work on that is very welcome. In other words, that was a really good question; we've thought about that. Thank you. Anybody else? Thanks, Kota.

My question is about resource isolation. For example, if a user executes something that takes a long time, what happens, for example with timeouts? And what if the load on one of the storage nodes gets too high? Sometimes, for example after a rebalance, you have to shuffle a lot of data to another node, and that might interact with the load that you see from the Docker containers.

Yeah, we now know why Paul is very supportive of the future.
Yeah, so basically this is something that is not yet implemented, but we're trusting that Docker has some tunables. They're not as rich as what other container technologies give you or what cgroups give you, but at the end of the day we're going to have to add those tunables as something managed by the operator. So the user can control the Docker image, but the operator would control the tunables of the isolation. Does that answer the question, or did I miss something?

Yes. Basically, what if you want to isolate different accounts and give different accounts a different level of resources that they can use?

Yeah, so this falls into the same category: something that we haven't done. Again, help is welcome.

Just to quickly add, though: that's a really good point, and the way it works today is that you definitely need the storlet to respond within a certain period of time, otherwise you hit timeouts on the proxy or object server side. So it's a really excellent point. We do clearly document that in the code for now, but it's something we're actually looking into. In terms of resources, that's another really good question, because it's very important that you have spare CPU resources. Just an interesting point: if you're using something like Spark, Spark actually ends up hammering the Swift object server with hundreds of range GETs, literally hundreds, even for a simple job. The key is the way storlets work. I didn't mention it in detail, but for example, if you have a single storlet, what we'll do is create a thread pool and run these all concurrently. The important part there is that you need the CPU cycles to accomplish that. So it's a very good point, and an interesting part about it is that it's another trade-off in whether you run
storlets on the proxy side or on the object side, and it all comes down to whether you have spare capacity. OK, thank you very much.

In the proxy pipeline, I would say, you insert the middleware. Because storlets run one container per account, can the actual middleware handle multiple accounts and multiple storlets, or do I have to insert it for every account?

No, absolutely, it can handle multiple accounts. You only need to insert it once, for example in your proxy pipeline, and there's a lot of configuration you can do that determines the details in terms of what the permissions are for which accounts, and which accounts can even run storlets.

So can a single account use multiple storlets as well?

Yes, absolutely. A single account can upload hundreds of storlets, and all of them will execute in that single container for that account. And obviously it's multi-threaded, so we have different threads for each one of these.

Thank you.

OK, thank you all so much. That was really great. Thank you.