Okay, so hello everyone. We're going to break the barriers of technology today and actually do an OpenStack presentation over Skype. My name is Eran, I'm from IBM Research, and with us over Skype is Alberto Messina. Alberto?

Yes, hello everyone. I have to apologize, because I would have very much liked to be with you there in Paris, but unfortunately my son decided to pass me a virus — this happens with small children — so I'll do this via Skype.

Okay, so we're going to talk about "Docker meets Swift: a broadcaster's experience". What we're actually going to talk about — Alberto, do the magic, yeah — is how to move the media workflow near the storage, or near where the data is. We're going to start by saying what that means, then Alberto will elaborate on why we want to do that and give a demo, and then I'll try to look under the hood at what actually happened there.

Moving the media workflow near the data basically means co-locating compute inside storage, and to do that we're going to use what we call a Docker-powered Swift cluster. An important thing to say here is that when you are going to run untrusted code from an external user inside your storage system, you want it sandboxed, right? And in a Docker-powered Swift cluster we're using Linux containers to do that sandboxing. As I'll show next, we also make use of Docker's functionality to distribute and manipulate images and so on. So Alberto, if you can move us to the next slide, please.

Just a few terminology items before I hand it over to Alberto, who will talk about why you would want to do this. Here are some terms we're going to use. A storlet is essentially the computation that runs inside the Swift cluster — think about a piece of code that does filtering, transforming or whatever — and this storlet, as I've mentioned, is going to be executed inside a Docker container, because we want sandboxing. The storlet engine is the software that invokes a storlet, which runs inside a Docker image, and it basically takes care of connecting all the inputs and outputs to the compute. Then in the demo we will also mention metadata search, which is a Swift extension that provides the ability to look for objects according to their Swift metadata. With that said, I'm handing over to Alberto, who will talk about this from the media side.

Yeah, thank you very much, Eran. So what does all this mean in the current media domain? We are actually facing a couple of challenges recently. Maybe not all of you know that RAI is the Italian public broadcaster, but these are some of the things we are currently investigating, not only as research but also from, let's say, the industrial perspective. First of all, what we are seeing more and more is the increase of content quality. This means not only resolution — not only switching from standard definition to high definition to ultra high definition — but also frame rate and dynamic range, which are parameters that increase the quality quite a bit if you have experienced them. The other thing we are facing is the increase of distribution and publication channels: digital terrestrial and satellite, which are, say, commonplace, but internet and mobile are more and more becoming of the same importance for media companies as traditional broadcasting.
Together with this there is another key element, which is the restoration and digitization of archives, and this means millions of hours of media. This of course introduces a lot of problems related to how to store, how to manipulate, how to process all this digital media. In this context, management of media is becoming critical for at least three aspects. The first one is quality assurance and enforcement, so being able to enforce that a certain quality level is met. The second is content accessibility: we are talking about millions of hours, so we have to be able to access this content rapidly. And last but not least, the storage and processing capability has to grow at the same pace.

Let me give an example of this, which is rather simple but at the same time very important: loudness. What is loudness? First of all, it is the perceived aural energy of an audio signal. It is different from several other ways of measuring audio energy, like for example RMS, because it takes psychophysiological aspects into account. On the right side of this slide you see the ISO 226 curves, which illustrate how the human hearing system is sensitive to different frequencies with different magnitudes, and this means that the loudness of an audio signal has to be measured taking this into account. This is a very important quality parameter for broadcasting, because of course you don't want your audience to be shocked by a sudden change of loudness level in either direction.

Around this topic a lot of work has been done from the normative perspective, so a lot of specifications exist in this domain. One of the most prominent standardization bodies here is the ITU, but together with the ITU, the EBU — the European Broadcasting Union — has also provided quite a list of technical provisions to make industry and users aware of how to measure loudness and how to assess whether audio signals are within the admitted range for a certain level of quality.

So loudness is a very important thing — but what is the problem then? It is very simple to understand; it is just measuring the energy of an audio signal in some way. The problem is that programs are not normalized to the admitted loudness level, so you can really get whatever level. This is because of several reasons — simply the fact that programs and media are produced by different parties, and these apply different criteria; loudness measurement and normalization is not yet, let's say, accepted worldwide. This can cause very serious quality problems on the client side: if you are someone watching a television program and suddenly it changes to an advertisement spot, and the loudness is not normalized and aligned, you will experience a, let's say, annoying effect. And this is something we don't want — we want a uniform loudness level across our broadcasting channels.

This of course also implies issues in media production, because you have to check the conformance of these files when they are ingested — when you, for example, purchase new content or digitize your archive — and you also have to check conformance after editing, because if you edit content with, for example, an editing workstation, you don't have 100% assurance that the loudness has been treated appropriately, so you have to check again after editing that the loudness is okay.
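To make the numbers concrete — this is just a small worked sketch, not RAI's actual storlet code — here is roughly the correction a normalizer has to apply, assuming the -23 LUFS target mentioned here; the -20 LUFS figure is the one that shows up later in the demo:

```java
// Illustrative only: computes the gain offset needed to bring a measured
// integrated loudness onto a -23 LUFS target (the EBU-recommended level
// referenced in the talk).
public class LoudnessGainExample {
    public static void main(String[] args) {
        double targetLufs = -23.0;
        double measuredLufs = -20.0;          // e.g. the file shown later in the demo

        // Offset to apply so the integrated loudness lands on the target.
        double gainDb = targetLufs - measuredLufs;          // -3.0 dB
        // Linear factor a normalizer would multiply the samples by.
        double gainLinear = Math.pow(10.0, gainDb / 20.0);  // ~0.708

        System.out.printf("apply %.1f dB (x%.3f) to every sample%n",
                          gainDb, gainLinear);
    }
}
```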
To do this, as is illustrated briefly on the right side of this slide, there is a specific measure called LUFS — loudness units relative to full scale — which is the unit with which we measure loudness, and we have to bring the integrated loudness value to the target expressed in this unit to be sure that the quality is okay.

So what is the typical technical scenario we are in? A lot of file staging. File staging basically means having different sources of material — digitization, trading and purchase, international feeds — and everything goes into a file staging area where files are put to be processed, and this includes quality checks, metadata extraction and transcoding. In the demo that I'm going to show there will be three steps: the first will be about file ingest, the second a file check, and the third one loudness renormalization. The demo is built around this use case, and to develop it we have used the system's core functionalities that Eran introduced a few minutes ago: the storlets and the ability to find objects based on metadata.

So let me switch to the console. What we see here is the console of the system. I log in with my colleague's username and password, and what I see as a first screen is a list of what we call projects. A project for us is the unit of production — a project could be an editing project, a transcoding project if you want to transcode media, or a simple ingestion project. Behind each of these projects a container in the Swift backend exists; we use Swift containers to implement the projects. So let's open one of these.

There are already some files here, and you see that the extension of these files is MXF. MXF is the Material Exchange Format, which is the professional format for media exchange. In this dashboard I can do everything with these files, including, for example, searching by metadata, deleting items and so on. Let's, for example, delete a couple of items — I do this because I want to re-insert the same items during the demo.

Okay, so what I can do now is, for example, upload new files. Let's use this five-second MXF file, which is somewhere on the network. I want to upload this file because, for example, someone has bought this program for my company, and so I have to check whether this program is correct from the loudness point of view. So I upload this file — let's hope that this is working. Okay, yes, after a few seconds it is done. We are talking here about a data rate of 50 megabits per second — the master quality for production is 50 megabits per second — so something which is quite unusual compared with, let's say, normal internet distribution of content. As you have seen, the upload of this file was quite quick, which means the system has a very good network, for example. So let's refresh the situation. Okay, here it is.
So if I go here, what happens is that while I was putting this file into the container, a first couple of storlets have already run. The first one is a storlet which extracted the technical metadata of the file — things like aspect ratio, the number of audio channels, the bit depth of the components, and a lot of other parameters like height, width and so on. These parameters were extracted by a storlet at the same time the content was being uploaded into the container: there was a storlet reading the incoming stream, calculating the metadata and finally writing the file and the metadata into the container.

As you see here, we have a value of integrated loudness of minus 20. The normative value should be minus 23, so this means the file has an extra 3 dB of loudness, which can be, let's say, critical, because if I edit this file together with other content I may get a mismatch — a step in the loudness level — which could be annoying. So I have to normalize this in some way.

Now imagine that you don't have just one file, but maybe 100 or 200 files. The first thing to do is to be able to search for all the files that have that problem, so I search for integrated loudness — a float — greater than, for example, minus 23. Okay, so what happens? I find two objects in this case. Here I used the metadata-based search of the Swift backend, and so I found these two files: this one, which was uploaded before, and this one, which has just been uploaded. So I open this one and look again at the metadata just to check: okay, it is minus 20, the same as before.

So what do I do? I can run an action here to normalize the loudness of this file. I launch it, and in the backend another storlet — the loudness normalization storlet — has downloaded the object from the Swift backend, calculated and renormalized the audio level, and saved a new file with the right loudness parameter. So, if this has worked, I should see in the metadata list that the integrated loudness is now minus 23 — and indeed, the file has been normalized. I also had a nice way to show this; maybe not all of the audience will notice the difference, but if I play the file before normalization —

Well, we can't hear that here, but is there any indicator we should be looking at?

Yeah, the indicator should be these bars here — you see these bars, basically this. So if I open the normalized one, these bars should be lower. Okay — maybe this is not really an exceptional case, because it's just 3 dB, but there are cases in which the difference is much higher, and we can find files with, for example, a minus 10 loudness level, which is really, really loud, so we really have to adjust it, otherwise we have problems. So this was basically the demo. I can switch back to the presentation and give the floor back to Eran to explore the details of what I've been demonstrating.

Okay, thank you Alberto. I'll need your help clicking, so let's move to the next slide. I'll first try to explain what a Docker-powered Swift cluster is — I'll need a click here. A Docker-powered Swift cluster is first of all a Swift cluster: we can see the proxy layer, and we can see the storage layer. Then it is augmented — click — with Docker, and we also have a Docker registry, a private registry.
I'll show next how we use that. Then, with another click — this is about a year of work appearing in one click — we have two pieces of middleware, which together are basically the storlet engine: a middleware on the proxy layer and a different middleware on the object servers. You can tell the difference by the colors. Another click. So, once again, a Docker-powered Swift cluster is a Swift cluster with Docker and our pieces of middleware.

So far we've been talking about running storlets inside the Swift cluster — bringing code from the outside to run inside Docker containers on the storage system. However, a very basic piece of functionality here is to also let the user control the underlying image where the storlet runs; in our case, for example, ffmpeg was used to implement those storlets. So what I'm going to talk about now is the flow of a user updating the image in which his storlets are going to run.

Imagine we have this setup: Docker is deployed with images that currently consist only of Ubuntu 14.04 and our storlet stuff — we don't have ffmpeg there yet. These are the small squares drawn here. This is the basic image that we provide, and we have one image per Swift account. Now the Swift account manager, which is basically the customer of the provider, wants to add ffmpeg to that. The first thing he does is a docker pull from the registry; he gets the default image, which is basically Ubuntu 14.04 plus our storlet stuff — the stuff that needs to run on the Docker container side. Then, using standard Docker tooling — a Dockerfile, for those of you who are familiar with that — he adds ffmpeg on top of it. The next click shows a docker push to the registry, and then the Swift storlet manager on the provider side needs to deploy that image to the Swift nodes. What's nice about Docker here is that if we already have the base images deployed, we only need to deploy the ffmpeg layer — this is one of the nice features we get from Docker. Once ffmpeg is there, we can actually run storlets that use it.

So let's move on to the next slide and talk a little bit about how to write a storlet and how to deploy it. Writing a storlet is extremely easy: you basically need to implement an interface. We currently support Java-based storlets, but it's fairly easy to add other language bindings — say Python; Fortran might be problematic. If we have a quick look at that API, we see that we've got input streams — this is where the data streams into the storlet; we have output streams — this is where the storlet can push or send back the computation results; we have parameters, of course; and we have a logger. Once we've got this written, the developer can pack it into a jar — let's have a click there — and basically upload it as a regular Swift object into a designated container.
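To give a feel for the shape of that API, here is a minimal pass-through sketch written with plain Java streams; the real SDK wraps these in its own stream, logger and metadata classes, so the names and signatures below are illustrative assumptions rather than the actual interface:

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;
import java.util.Map;

// Sketch only: mirrors the four elements described in the talk (input streams,
// output streams, parameters, logger) using plain java.io types. The actual
// storlet SDK uses its own wrapper classes, and the output side also carries
// the object metadata the storlet wants to set.
public class PassThroughStorlet {

    public void invoke(List<InputStream> inputStreams,   // object data flowing in
                       List<OutputStream> outputStreams, // where results are pushed back
                       Map<String, String> parameters,   // per-invocation parameters
                       StringBuilder logger)             // stand-in for the storlet logger
            throws Exception {
        InputStream in = inputStreams.get(0);
        OutputStream out = outputStreams.get(0);
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {   // stream the object through unchanged
            out.write(buf, 0, n);
            total += n;
        }
        logger.append("passed through ").append(total).append(" bytes\n");
        out.close();
    }
}
```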
Here's how the PUT looks. We do an HTTP PUT to a container named storlet under the account — 111 in this case — and this is the name of the storlet: the MXF technical metadata extractor. This is the storlet that was used in the demo during ingest, where, during the upload, it extracted the MXF technical features and placed them as Swift metadata. From there, when the storlet is asked to be executed on a certain node, our middleware will fetch it and execute it inside the Docker container — we'll see that in a minute.

So let's move on to the next part: running and invoking storlets. There are three ways, and the first one I'm going to talk about is invocation during a PUT. Again, in the demo what we've seen is that during the PUT of the MXF files we extracted those metadata features and stored them as Swift metadata. The general case would be that we want to upload some data, but it's not the data itself that we want to save inside the storage system — rather, we want to save a transformation of it, or to enrich its metadata. Think of encryption, compression, transcoding — whatever you want to do before you save the data as an object in Swift. Telling the system that you want the storlet to run is extremely simple: you just add a header to the PUT request — X-Run-Storlet, with the name of the storlet as it was uploaded in the previous slide.

So, if Alberto will click now — okay, we're zooming into the proxy server. We've done a PUT, the PUT hit the proxy server, and now we're inside the proxy server. On the left-hand side we see our middleware; on the right-hand side we see Docker with the image for the account the request was made to. First of all we intercept the request, then — a click — what basically happens is that we establish a connection with the actual Docker container for that account. One thing to say here is that when we run a storlet for the first time on a node, we spawn a daemon inside that Docker container that will wait for additional requests, so we don't need to spawn a new container or new processes when new requests come in — we only do it the first time. All right, so we've invoked the daemon and made a connection, and we now pass the input and output file descriptors. The point to make here is that the Docker container is completely isolated — no block devices, no network devices, nothing; all the data is streamed in and out through those file descriptors that we pass from our middleware to the container. Let's click on. The storlet does whatever it does — in our case it looked at the MXF file and extracted that metadata — and then it sends back the metadata and the object data stream, which is basically exactly what was uploaded, if we're thinking about the demo; we just extracted additional metadata.
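Pulling the client side of this together, here is a hedged sketch of the two requests just described — uploading the storlet jar as a regular object and then doing a PUT that runs it during ingest. The endpoint, token, container, object and storlet names are placeholders made up for illustration; only the X-Run-Storlet header comes from the talk:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

public class StorletPutExample {
    // Hypothetical endpoint, account and token -- placeholders, not the demo system.
    static final String SWIFT = "http://swift.example.com/v1/AUTH_111";
    static final String TOKEN = "<auth-token>";

    static void put(String path, byte[] body, String runStorlet) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(SWIFT + path).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("X-Auth-Token", TOKEN);
        if (runStorlet != null) {
            // Ask the cluster to run the named storlet on the data as it is ingested.
            conn.setRequestProperty("X-Run-Storlet", runStorlet);
        }
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        System.out.println(path + " -> " + conn.getResponseCode());
    }

    public static void main(String[] args) throws Exception {
        // 1. Deploy the storlet jar as a regular object in the designated container.
        put("/storlet/mxfmetadataextractor-1.0.jar",
            Files.readAllBytes(Paths.get("mxfmetadataextractor-1.0.jar")), null);

        // 2. Upload a media file and have the storlet run during the PUT,
        //    so the extracted values end up as Swift metadata on the stored object.
        put("/demo_project/5seconds.mxf",
            Files.readAllBytes(Paths.get("5seconds.mxf")),
            "mxfmetadataextractor-1.0.jar");
    }
}
```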
From there we just complete the proxy PUT flow with no other changes, so this data is going to be replicated, and all the good things that need to happen on a PUT path in Swift will happen.

Right, so let's move to the other way to invoke storlets: invoking a storlet upon a GET. In the demo this was how we actually calculated the loudness feature: we did a GET plus the X-Run-Storlet header naming the loudness calculator storlet, and what we got back was the loudness value. In the general case, this is for when you're not interested in the actual data as saved on the Swift storage system, but rather in a transformation of it. Examples, again, might be decompression, compression, decryption, whatever. An example we've been playing with a lot is anonymization: think about a use case where you have medical records inside those objects and you want to de-identify them before they are handed out for research or something like that. This is again something that can be done with a storlet upon GET — it's a classical case.

Right, so let's see what happens in the GET scenario. The request hits the proxy server; the proxy server, as it does with any GET request, looks up a storage server where there is a replica of the object, and once it finds such a server the request is routed there — nothing new here so far. Let's click, and now we zoom in on the storage node. Once again, the object server middleware is on the left-hand side and the Docker container is on the right-hand side, and we intercept the request. For those of you who have a deeper familiarity with Swift: we basically intercept the request after the object server application has run, so we already have a stream with the data of the object when it's on its way back — that's a side comment. Once again, similarly to the PUT, we invoke the daemon if needed, we pass the file descriptors, the storlet sends back the object metadata and data — in this case, by the way, all that was passed back was the loudness number — and we continue with the GET flow as if it was just a regular GET.

Okay, so let's move on to the third way to invoke storlets, which is on a POST. In the GET scenario and in the PUT scenario we were acting on the object that appeared in the URI: in the PUT scenario we created an object according to what was in the URI, and in the GET scenario we were looking at the object that was in the URI. Here we can act on one or more objects that are already inside the system; we can run a storlet over them, and that storlet can create other objects as a result. A typical body would look like: execute the loudness renormalization storlet over the 15-second MXF which resides in the demo project container, and place the output inside the demo project container. In fact, in that body we can list more than one object, in which case there will be a storlet instance running per object listed, and we can even do this for different storlets.
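Before following that POST through the cluster, here is the same header mechanism on the GET path that was just described — again a hedged sketch with placeholder endpoint, token and names; the exact request body format of the POST variant is not shown in the talk, so it is not sketched here:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class StorletGetExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint, token and object/storlet names -- illustration only.
        URL url = new URL(
            "http://swift.example.com/v1/AUTH_111/demo_project/5seconds.mxf");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("X-Auth-Token", "<auth-token>");
        // Run the loudness-calculator storlet on the object server holding a replica;
        // what streams back is the storlet's output, not the raw object data.
        conn.setRequestProperty("X-Run-Storlet", "loudnesscalculator-1.0.jar");

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```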
Right, so what would happen here? The proxy would take the request, parse it, and understand how many storlet invocations need to be done. In this case there are three invocations — as I've mentioned, there can optionally be more than one — and it forwards three requests to three different object servers. So don't get confused: there is only one object mentioned in the body here, but the arrows are drawn as if I had actually listed three. Okay, next slide please.

I guess a lot of people in the audience are asking: okay, what about ZeroVM? I'm sure many of you are aware of ZeroVM, which is by the way a very cool project that I really like. So what's the difference? I'll talk about two aspects here. One aspect, of course, is the underlying sandboxing technology: the ZeroVM guys are using the Google NaCl-based ZeroVM, which basically means that anything you want to run inside that sandbox needs to be compiled with a specific toolchain, and that probably gives you more security. On the other hand, using Docker containers there is much more flexibility in the ability to use an off-the-shelf software stack — you can put it in the image of the Docker container and then write something in Java or in Python — which I think makes life a lot easier for developers. I'd be out of my league if I tried to compare the security features of the two technologies, so I'll end my first comment about ZeroVM here.

The next comment I have about ZeroVM is that they've done really nice engineering work in actually turning Swift into a compute platform. They can make requests that are translated into a whole workflow between storlets — in their case ZeroVMs — that can communicate and form a pipeline; they can do MapReduce jobs with that. It's really nice. And although this is something we could do — I mean, there is no theoretical reason I can think of why we couldn't do it here — we believe we probably need a slightly different model. I'm not completely sure that we want such a flow to actually run inside a Swift cluster; it changes a lot in the runtime behavior of Swift — starting lots of threads that need to communicate, and so on. It might be that this orchestration actually needs to be done outside of Swift, in a way that also allows using other compute resources, or running analytics work that takes data not only from Swift but perhaps also from, I don't know, traditional databases, whatever. So this is what I had to say about storlets and ZeroVM.

Perhaps we'll jump to the next slide, about future plans. We want to make this available for experimentation — anyone who is interested or wants more information is welcome to contact either Alberto or myself; these are our emails. Surely we want to increase the number of storlets, introducing more processing steps that produce metadata, build an advanced search application on top of them, and continue to enrich the metadata prototype with additional storage-side workflows — Alberto believes we do need workflows there, so I might be wrong on that. Okay, so before I jump to taking questions, I want to thank the technical staff here, who were very helpful with setting up the Skype and setting this all up — thank you very much to the technical team — and I'll be very happy to take your questions. Right.
So the question was how we did the metadata search implementation. It's two pieces of middleware: one of them takes the Swift metadata as it flows in and sends it to — what's the name of the database? I don't remember — to some kind of database, and then there is another middleware that intercepts search requests, goes to that database and fetches the answer. Okay. There was — yep?

Correct, I have it per account. We're not sure that this is the right decision, but on the other hand we have no real-life experience, so we're starting with that: we're going to have a Docker container running for each account, and each account is going to have its own Unix socket through which we do all the magic. That is correct. And the nice thing about Docker is that if you, as a provider, are going to limit the number of distros you support, then you're going to reuse the underlying layers of the Docker images — so if you were concerned about the storage, that's kind of the upside. More questions? Yes.

Yes — actually it's an array of input streams and output streams, but I simplified it here. What do you mean by — oh, you mean random access? Right, so in some cases you can do that and in other cases you cannot. With the current prototype you cannot do it at all, but potentially, with the input stream of the object, you probably could, because under the hood there is actually a file descriptor there — the file that actually holds the object within Swift — which can be accessed randomly. But of course the output cannot be, because it is going to be streamed into the response that goes back to the user on the socket. So I should have said that this is of course tailored for stream processing; if you want to merge-sort the file, that's not the right tool for the job. In fact — and this is kind of an open question between ourselves — we haven't blocked the ability to create local files, but to me that seems to defeat the purpose: if you're going to write a local file and then stream it back, you've basically gained nothing — it would probably take the same latency to just download the file and do this on the client side, so you kind of miss the whole point. Anyway, this is the way I see it; we probably need to give this more thought. Yes?

Yes — and unfortunately, to be honest, you probably won't get a very fast answer saying "okay, go ahead, tomorrow you can start playing with it". It will take us some time; we're having some internal difficulties that we don't want to get into now. The open source question, of course — yeah, I'm ready for that one. So let us first separate the question of open source from the question of making this available for experimentation; they're not the same thing, right? Okay, so the open source question, again, is a very delicate issue. It is an internal conversation within IBM, and unfortunately
I cannot say anything — I don't have any good news at this point, and of course I cannot elaborate. This is the point where I switch to my own personal view and say that personally I would love to open-source it, I would love to build a community around it, I would love to collaborate with ZeroVM on converged APIs — but again, this is out of my hands. Hopefully we will have a positive answer before long, but again, I really do not know. Yes?

Can you talk about erasure coding — where would the compute happen? Right. In fact we discussed this just an hour ago with the erasure code team here, and I think there is more than one answer. The short answer would be: okay, we switch to doing all the processing on the proxy, where the erasure coding code has reconstructed the file for us, and we pay for an extra hop on the internal network — that's the easy answer. I guess the more complex answer is that if you're using a systematic code and you somehow know where the data really resides, you can distribute your storlets to work independently on each chunk, and then you can probably do something. Another observation is — maybe you can start with the... never mind, okay, these are the two answers I have right now. I actually brought this question up at the ZeroVM design summit last time, and the answer was "let's not worry about it now" — but it's going to be real soon, so perhaps we should start worrying about it. Yeah, right — so the same answer goes here: we can either move it to the proxy, or, if you're smart enough to know what your chunks look like, you can figure out where they are without assembling the object, actually go and run the storlets on each of the chunks, and then probably combine the answers.

I'm afraid that even the little I've said about the storlets, I can't say about that part — I wasn't part of the team that actually developed it, so I really don't want to say anything wrong. Okay, but then — right. Alberto, did you get the question?

Well, I heard the question, but I have my own voice coming back — maybe it's the mic — so I heard the terms of the question, but I'm not really sure I understood it.

So I guess I'll try to rephrase it. The question is: why go and complicate Swift by adding all this functionality, when you could probably run this alongside Swift, perhaps inside the data center or even on the client side — or were you referring to inside OpenStack? So basically: why not have a layer on top of Swift that does this?

Yeah, got that. So I think there is a good reason for doing this, because metadata enrichment — which is a category of processing steps; loudness normalization is one, but there are many others — is not something that starts and ends, let's say, when you have the file and then you forget about it. It is something that really comes up again and again during the content life cycle. So actually having these bulky files stored in an object store like Swift and, from time to time, on demand, extracting new kinds of metadata is something that happens day by day in media production. You don't have a one-shot extract-and-forget metadata operation; it is a very lively and continuous process of metadata extraction.
So it is really important to have the content stored there, ideally forever, and just make the code run and extract new metadata as it becomes available — as new algorithms arise, or maybe you simply re-run the same metadata extraction with a new tool because it has better quality or better performance. This is something that really happens day by day, so content is not really dead once it has been processed; it continues to be processed, accessed and enriched.

Maybe I can add on top of that. We had a talk earlier about Swift and — I need help, what was it? Swift and Spark, thank you. The point there was that you can basically filter a lot of the data, as with Spark, actually — okay, I got it.

Okay, thank you. Are we out of time? Any more questions? If there are more questions, I think we can — all right. So, thank you, and thanks again to the technical team. Alberto, thank you very much. You're welcome. Bye-bye.