Thank you very much and welcome, everybody, to the second day of FOSDEM. I hope you are enjoying it. I have to be honest, I had trouble finding all the interesting talks yesterday because there are so many, which is a good sign obviously. Today we'll talk a little bit about RAPIDS, which is an open source initiative by NVIDIA for doing data science. As you are probably aware, NVIDIA has been doing neural-network-based machine learning for quite a while now. With RAPIDS we are now also looking into more traditional machine learning algorithms, and also everything that goes around them: not only the machine learning itself but the pre-processing and post-processing steps. I hope to give you an idea of what RAPIDS is and how you can contribute, since it's open source. And at the end we will have a couple of minutes for questions, I've been told. If you have questions, you can also come to me after the talk and we do it offline, or you can ask then, obviously. So why is this interesting? As I said, NVIDIA is investing a lot of time in machine learning and artificial intelligence. The reason is that it has been shown, at least by example (there's no proof, obviously), that AI is transforming industry across the board. There are really products coming out that do things that were unthinkable a couple of years ago. Take healthcare alone: analyzing x-rays and MRI scans and detecting cancer early, which, if it was possible before, took a lot of effort, as in humans looking at all the MRI scans. Now machines take over the bulk of the scanning, with the doctors in the loop, of course, for the final diagnosis.
In traditional industries there is defect detection, material science, all those things; it is disrupting a lot of industries, making lighter-weight cars, for example, lighter-weight airplanes, finding new composites, and so on. I don't have to go through all of them; you can, I hope, imagine what the other ones would be. And of course there is consumer electronics and automotive: self-driving cars are a very heavily researched area, and car manufacturers are now actually coming out with the first announcements of cars that are not fully self-driving yet but have close to self-driving capabilities. For RAPIDS, the more interesting part is on the right side: retail, the financial industry, ad tech and so on. Those are all industries that have used machine learning for quite a while already, and they tend to use the more traditional machine learning approaches, right? Clustering, DBSCAN, things like that, or just linear regression, those types of algorithms. They are not necessarily jumping on neural networks yet, and that's fine; with RAPIDS we hope to help those industries as well. Now, what makes data science interesting is that it is obviously not a linear process. I'm not sure who in this room does actual data science professionally or as a hobbyist, but you probably know that it's not that you sit down, you write code, you press play. That doesn't work with code anyway, but in data science you have this big loop that you're going through all the time, right? You have data, you have to prepare it. Then you try out your network or machine learning approach.
Then you have to tune the hyperparameters, then you try it again, then maybe you prepare your data a little differently, or you get more data in, or you try it without one piece of data because you have the feeling that maybe it's disrupting your prediction, and so on. So it's not linear; it's a lot of going in a circle. Now, if you look at those green boxes, which are data preparation, model training and visualization: the training of the model, and maybe later the inference, the predicting of outcomes, is of course a big piece of the circle, but it's not everything, right? If you reduce the size of the model training box, that's helpful, but it doesn't get you all the way, because the data preparation can be, as we'll see in a second, very significant. Now, if you became a data scientist and you've done it for a while, I'm not sure how many of you went through this experience, and if yours is different feel free to speak up, but from my experience: you became a data scientist because you want to work with data, you want to discover new things, right? In reality, a day in the life looks like this. You start at nine, let's say; overnight you ran the latest prediction from yesterday, or you downloaded new data sets, or whatever you're doing. You start out with a little bit of analysis of what happened overnight, but then the green parts are where you do what's called ETL, extract, transform, load. That means you do the preparation of the data, you have to join it with other data, and so on. So starting at 9:45 you can go get your first three or four coffees while you're waiting for one round to finish. And then maybe after 12 you realize: oh, dang it, I forgot those five new features.
You forgot to change it in that one file, right? And then you have to start from the beginning, and so on. It can be a very frustrating process, because it's slow. And obviously data scientists are very valued resources that are highly sought after, so it does make sense, I think, to give them the tools they need, deserve and require. That gets more and more important, and you probably see, when you work in data science, how much uptake there is: how many companies that may not even have thought of themselves as doing data science are suddenly looking for data scientists. Most likely they all did data science before, just not at that scale. One of the reasons, on the right side, is that the amount of data we collect continues to grow, depending on whom you believe potentially exponentially, but at least so massively that you usually cannot just sift through all of it. Now, my assumption is that most of this data that is produced, and that includes VoIP and social media and the web and so on, is maybe not the most valuable data that you necessarily need to analyze. The problem is that there is valuable data in there, so you still have to go through all the data to find the valuable parts. Even though the signal-to-noise ratio is getting worse like crazy, that does not make the data scientist's job any easier, because you still have to find the valuable data in the noise. At the same time, of course, the famous Moore's Law is at least slowing down, if not, depending on whom you believe, has come to an end by now. That means things no longer automatically get faster just by waiting another year for the next computer. That's not working anymore.
You see this effect in a traditional data center; you may have seen those, your university probably has one. You see rack after rack after rack of server computers, and it's loud and noisy and so on. This would be essentially the setup that can handle the Fannie Mae mortgage data set: about 300 servers worth roughly 3 million dollars, running at 180 kilowatts. And you can replace this whole machinery with essentially this, a DGX-2 box. The DGX is an NVIDIA appliance; you can buy similar appliances from other vendors that also use NVIDIA GPUs, but this is our model platform. This one box replaces the whole wall of computers in speed; if you look down here, it is even significantly faster. So it's not only a speed-up: you cannot really get to the same speed no matter how many CPUs you add, because the performance just flattens out. The reason is that this DGX-2 box has 16 high-performance, server-grade GPUs in it, so it is a very dense compute unit. You can imagine that at some point data communication just takes over, so you can't scale out arbitrarily; with the DGX-2, because it is such a dense piece of compute machinery, you stay inside this one machine. The GPUs have a very high speed interconnect, called NVLink and NVSwitch, over which they can communicate extremely fast. That's why you get this almost super-linear speed-up. Those are the numbers partners are experiencing; I'm not necessarily saying that all machine learning will suddenly get 200x faster if you put it on a GPU, but we have partners that actually get those speed-ups in production already. Oak Ridge, for example, is using XGBoost.
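As a toy illustration of what gradient boosting does, here is a minimal boosting loop over decision stumps in plain numpy. This is my own sketch of the principle, not XGBoost's implementation; XGBoost itself, in releases of that era, selected the GPU implementation via the `tree_method='gpu_hist'` parameter.

```python
import numpy as np

def fit_stump(x, r):
    # Find the threshold split that minimizes squared error on residuals r.
    best = (np.inf, None, 0.0, 0.0)
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        lm, rm = left.mean(), right.mean()
        err = ((left - lm) ** 2).sum() + ((right - rm) ** 2).sum()
        if err < best[0]:
            best = (err, t, lm, rm)
    return best[1], best[2], best[3]  # threshold, left value, right value

def predict_stump(stump, x):
    t, lm, rm = stump
    return np.where(x <= t, lm, rm)

def gradient_boost(x, y, n_rounds=100, lr=0.1):
    # Start from the mean, then repeatedly fit a stump to the residuals
    # (the negative gradient of squared error) and add a damped copy.
    pred = np.full_like(y, y.mean(), dtype=float)
    for _ in range(n_rounds):
        resid = y - pred
        pred += lr * predict_stump(fit_stump(x, resid), x)
    return pred

x = np.linspace(0.0, 1.0, 64)
y = np.sin(2 * np.pi * x)
pred = gradient_boost(x, y)
print("mse:", np.mean((y - pred) ** 2))
```

Each round fits a stump to the current residuals; XGBoost does the same at scale with full trees, regularization, and GPU kernels.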
XGBoost is an open source framework for gradient boosted trees, and we contribute heavily to it to make sure that it runs well on GPUs. They experience a 250x speed-up. There's a global retailer, Walmart, that was not only able to improve its prediction accuracy, so reduce the prediction error, but can also run the predictions more often and faster. And every little bit counts: the better they can predict how much people will buy, the less produce they have to throw away, and the more they save, obviously. So how does a data science workflow look with RAPIDS? Hopefully no different from a traditional CPU-based workflow; that's the whole point. In the ideal case, as a data scientist, you wouldn't even know that you're using RAPIDS, with the exception that it's significantly faster, hopefully. You have the three steps. Data preparation: you can imagine you have data coming from various sources. You may have your company database; let's take the supermarket as an example: how many bananas did I sell over what period of time, and so on. Now you may have a suspicion: maybe that partially has to do with the weather. If it's nicer out, maybe people buy more bananas because they go on picnics; if it's colder outside, maybe they buy fewer bananas. And it can have to do with events; there might be a big holiday coming up, any of those. This other data, like the weather, is most likely not stored in your supermarket database, right? So you would maybe use a web service to get all the weather data and the predictions over the next two months, let's say. So you have different data sources that you first have to combine, right?
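As a CPU-side sketch of combining such sources, here is what this looks like with the pandas API that cuDF mirrors; the column names and values are made up for illustration.

```python
import pandas as pd

# Banana sales from the store database.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2019-02-01", "2019-02-02", "2019-02-03"]),
    "bananas_sold": [120, 95, 140],
})

# Forecast pulled from a weather web service: a different date
# format and a missing reading, as real data usually has.
weather = pd.DataFrame({
    "day": ["01.02.2019", "02.02.2019", "03.02.2019"],
    "temp_c": [18.0, None, 21.0],
})
weather["date"] = pd.to_datetime(weather["day"], format="%d.%m.%Y")

# Join on the normalized date, then fill in the missing value.
combined = sales.merge(weather[["date", "temp_c"]], on="date")
combined["temp_c"] = combined["temp_c"].fillna(combined["temp_c"].mean())
print(combined)
```

With cuDF the same `merge`/`fillna` calls run on the GPU; exact API coverage varies by release, so treat this as the shape of the workflow rather than a guarantee.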
You have to make sure that you match the date format that your weather prediction service uses to the date format that your database uses; and maybe data is incomplete and noisy, which it usually is, so you have to fill in missing data and handle all those things. That's all part of the data preparation. I think I have a slide about this, but I can say it now: if you're using Python, you have an abstraction that is called a data frame. A data frame is almost like a database table, you can imagine it that way, and there are different operations that you can use on those data frames. That means you have one table of your products and one table of your weather, for example, and then you may want to join those two tables into a new one, so it's very database-y, as you see. Then you may want to group individual entries by certain categories, and then you may want to sort it and fill in the blanks. That is a typical data preparation pipeline. The goal of the pipeline is to get all the data into a numeric format, usually some form of a matrix of numbers, right? All the strings have to be replaced with some number representation so that your machine learning algorithm can work on them. Once you have that, you start whatever machine learning algorithm you choose: XGBoost, principal component analysis, k-means, k-NN, DBSCAN, tSVD, any of those. You could also try a traditional neural-network-based architecture if you wanted to. And then, of course, you have to explore and visualize the data, because unless you're in the Matrix you probably won't be able to just look at the numbers and know what's going on.
So you may have diagrams and presentations, the development of the error or loss over time, for example, and so on. All those visualization features are important: you, the data scientist, will estimate and judge and get a feel for where your data science algorithm may be improved. One of the big pieces of RAPIDS is that we not only look at the model training block with neural networks, which is definitely what NVIDIA focused on before, but now also at the data preparation pipeline; we call that end-to-end acceleration. In the ideal case, data comes out of the database, or the data source, let's call it, goes immediately onto the GPUs, distributed, and then you stay on the GPU and do everything there, from string processing to database operations to the actual machine learning. If you look at the effects this has: at the very top you have the machine learning and training part, which is a significant block, but if you reduce that, let's say by 4x or 5x, this whole upper thing gets a little smaller. That's good, of course, but you still have all the querying, you have to write to a distributed file system and so on, you have to do the ETL, again the pre-processing of the data, and all those blocks remain if you don't move them onto the GPU too. Because we run distributed and stay on the GPU front to end, we can get rid of a lot of those blocks and get an overall significant speed-up from that. Now, I'm not sure if anybody here has tried to program GPUs yet; it's not hard. If you do parallel programming in one way or another, I would even say it's often easier than a lot of other parallelization strategies to just write CUDA right away.
However, you don't want your data scientist to have to be a parallel programming expert, obviously; you want to make it as easy as possible, and there are certain challenges. Especially, you have to deal with data movement. Data movement is always bad because it doesn't do anything useful for you, right? You're just moving data, and the slowest data movement operation is always the weakest link, especially if you go distributed across a whole data center and one node over there has to communicate with another node over multiple hops. You can imagine that that just crushes your overall speed-up. Another problem is that the data science world has too many makeshift data formats that are very close to each other but just not binary compatible. That means at every step you have to transform from one framework's nd-array format to another framework's nd-array format, which does almost the same thing, but they manage the fields a little differently, so you have to transform your data all the time. That is again a performance problem, because transforming data is essentially a memcpy operation with a little bit of switching things around, a little bit of metadata, and it is essentially useless work. If you don't need to do it, you would prefer not to. And then, of course, you wouldn't want your data scientists to actually write CUDA, right? That means you have to have all those operations and algorithms already implemented. And, an area where NVIDIA does not have as much experience as we probably should, you of course want a Python API to all of that, right? There are open source projects out there that make the GPU available in the Python world, but none of them is NVIDIA-backed.
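To make the format problem concrete, here is a tiny CPU-side numpy sketch (my own illustration, not RAPIDS code): converting between two almost-identical formats costs a full copy of the buffer, while agreeing on one in-memory layout lets the consumer simply view the same bytes with zero copies, which is exactly the idea Apache Arrow applies.

```python
import numpy as np

# One framework's data: a million float32 values.
a = np.arange(1_000_000, dtype=np.float32)

# "Conversion" between two similar-but-incompatible formats:
# every element is copied even though the bytes barely change.
converted = a.astype(np.float32, copy=True)
assert not np.shares_memory(a, converted)

# A shared in-memory layout: the second consumer just
# reinterprets the same buffer, no copy at all.
shared_view = a.view(np.float32)
assert np.shares_memory(a, shared_view)
```

The values are identical either way; the only difference is whether a memcpy-sized amount of useless work happened in between.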
For the data movement, you can imagine: you have a CPU, you have some applications A and B, and you need to get data across. If you now move things onto the GPU, and you are in a multi-node environment, or even a single node with multiple GPUs, and you don't make the GPUs talk to each other directly in the application, you do a straightforward port: okay, this piece I move from the CPU onto the GPU and I copy all the data over, then I copy all the data back, then the CPU does a little bit, then I move everything over again and back again. You can imagine that this goes over the PCI Express bus a lot, right? And that is not the fastest option, obviously. What you would like to do, ideally, is keep the CPU out of the loop for as long as possible. The CPU is still the main driver: it runs the main program and tells the GPUs what they are supposed to do. But you want to keep the bulk of the data on the GPUs, in GPU main memory, and as I said, the GPUs can talk to each other over NVLink and NVSwitch at super high speed, way faster than the PCI Express bus. So the GPUs can communicate with each other, sync with each other, move data back and forth, and the CPU is the coordinator telling the GPUs what to do, while you avoid the actual data copies over this narrow link. The second part on my list was too many makeshift data formats, if you remember. So here's an example for that, right?
This is how Apache Arrow addresses that, and we are piggybacking on Apache Arrow. On the left side you see a lot of different applications or frameworks that want to talk to each other; each of them has to copy data back and forth with each of the others, and everybody has to understand everybody else's format, right? As I said, the formats are often very similar, but they're just not binary compatible, so there is some work you need to do. With Apache Arrow, the idea is that Arrow defines an in-memory layout of your data, a columnar layout, and all applications that adopt Apache Arrow can then seamlessly talk to each other because they go through this common format. You don't need to use Apache Arrow as your internal format while you're processing, but you want it at least at the boundaries, right? Where you accept data and where you give data out: if you use Apache Arrow at those connections, the other applications can just attach to you. The third point was that the data scientist shouldn't need to, or shouldn't be required to, write CUDA, and you want it accessible from Python. So all of RAPIDS has an exposed Python API, and we try to be, I have a little bit about that later I think, compatible with, or at least inspired by, if not even a drop-in replacement for, popular data science APIs in Python. For the ETL part we have a data frame approach similar to Spark or Pandas. Data frames, if you remember, are those tables; on the left side you see the different operators you can run on those tables, to select, to filter, to mutate, SQL-like operators, so it's almost like writing interactive SQL in your IPython notebook. On the right side you see how that looks in code: you take some data frame, you select a column, you do some little operation on it, you multiply the age by the fare, and then you show it and see it in your IPython notebook. That's how you work with it, and with our data frames the data is actually on the GPUs: you're typing in your IPython notebook and things happen on the GPU. That's why I wrote "GPUs at your fingertips". For the machine learning part, we are very much inspired by the scikit-learn API; in the ideal case, most of the time, it's really a drop-in replacement: instead of importing scikit-learn you import RAPIDS and then you should be fine. So what does the RAPIDS software package contain? We decided to go open source with RAPIDS, so you can go to rapids.ai, I have a link later, and get all the code. On the left side you see it's a whole software stack: you have machine learning algorithms and implementations of all those database operators and so on, which ultimately run on top of CUDA one way or another, because if it runs on GPUs it needs to go through CUDA. As for the components: the three things on top, data preparation, model training and visualization, are the same blocks I started out with, if you remember, the cycle a data scientist usually goes through, and for each of them we have software packages, the green pieces, that implement that step. Let's start with cuML and cuGraph. Those are the libraries we provide for machine learning: cuML contains implementations of the different machine learning algorithms, and cuGraph provides implementations of graph analytics algorithms. For cuML you have a whole list of algorithms in the second column: decision trees, random forests, linear regression, logistic regression, k-means, k-nearest neighbors, DBSCAN, Kalman filtering, PCA, singular value decomposition. Some of them are under development at the moment, so if you keep up to date you'll eventually get them; some are out there already in one version or another. It's a wide field, obviously; some of them, like Kalman filtering for example, have so many variants that we focus on the one, two, three important ones for now. It is a ginormous field, but we try to pick the interesting pieces, and if you have a need for a specific algorithm you can always contact us and collaborate. For graph analytics we have PageRank and breadth-first search and so on, which is also in the rapids.ai repository. And just as an example with XGBoost: we try to collaborate with open source frameworks when they exist, and we implement things ourselves when they don't. For random forests, decision trees and gradient boosted trees we are collaborating mostly with XGBoost and contributing code back to that code base to make sure it runs as efficiently as possible, and we're having good success with that. But it's not only XGBoost; we are also working with CatBoost, for example, and are in contact with their developers. We are not singling out one framework; we try to pick the good pieces and combine them into a software package. This again is a very similar graph to what I showed before, so in the interest of time I won't go into too much detail, but here you see again that even if you scale up to hundreds of CPU nodes, you still don't get to the point where you can beat a single DGX-2 in the examples we are showing. The reason, again, is that you just add overhead when you add more CPU nodes; at some point you are not getting faster anymore. And these are a couple of benchmarks for initial algorithms, where you see the speed-up in the y direction and
the size of your problem in the x direction, for tSVD and PCA, and it looks very promising for a lot of those. And again you have the synergy effect of staying on the GPU, because you do the ETL pipeline there too, so the overall speed-up is then potentially even bigger than that. I'll probably also skip this, but as I said, we are actively working on a lot of those; if you follow the RAPIDS AI repository on GitHub you'll see when new things come in, and you can also contribute to the bug tracker and feature tracking and so on. For cuGraph it's similar; we have a whole bunch there, and this is for PageRank specifically, with a couple of benchmarks. We are obviously always looking for real-world data sets, because it's more interesting to solve real-world problems than benchmarks; if we don't have anything else, we go for benchmarks. So if you have a need, or good ideas of what you could do and try with this, then feel free to get involved. Those were the model training parts; now let's look a little bit at cuDF, the CUDA data frame implementation, which is the API you would use in an IPython notebook interactive session, the thing you actually talk to. For cuDF we adopted Apache Arrow as the in-memory data format, as I said, and we provide a Pandas-like API, because there are also many data frame implementations: Spark has one, Pandas has one, there are a couple of others, and there is no single standard. That's why we take a slightly more lenient approach and say we don't have to be a drop-in replacement where you just switch Pandas to RAPIDS and it all works; you may have to adapt a couple of syntactic things here and there, but it's essentially Pandas-like. If you know Pandas, you should be able to use CUDA data frames. The important operations we're working on are unary and binary operations, of course, joins, merges, group-bys, filters and so on. That means, for example, if you have two data frames and you want to join them, or apply a lambda to a column, you can write it in Python and it will then run on the GPU, on your data. cuDF today is essentially an API on top of CUDA, written in C/C++, because that's where CUDA lives, but we provide all the Python bindings; potentially, if the need arises, we could also look at other languages like Julia or Lua, whatever, but at the moment we are focusing on CUDA with Python bindings. All the low-level pieces are written in highly optimized CUDA C/C++, we use the Apache Arrow format to import and export the data, and when the GPUs talk to each other we use CUDA IPC, the inter-process communication mechanism, so that the GPUs can exchange data directly. On the Python side, the interesting line is probably the second to last: it means you can just say, take this numpy array and make it into a cuDF data frame, and then all the management of the available GPUs and who gets what data happens in the background. You can pass lambdas, for example, if you want to do a point-wise operation on each of the data elements, and then we use JIT compilation through Numba. Another big piece of the data preparation is obviously string processing. As I said in the beginning, data is usually noisy and incomplete. Imagine you have a database that contains some form of company name, because you sell stuff to people and you write in what company you sold to; some of those entries may be very noisy. You may have BMW in all uppercase, the next person writes bmw in lowercase, the next one writes BMW Inc, and the next one BMW AG, many different spellings, and somebody may spell it out completely
and you sort of have to combine them all so that the machine learning algorithm knows that's all the same bmw you're talking about so that means you need to parse strings you need to make strings to uppercase lowercase you may combine them you may do substring matches and those things and we are providing a library now that's called QString with NVString that can do all this so that means it's a string processing library of course there are so many things you can do with strings that we are not there yet we're still building up the library but for a lot of those operations actually we see a significant and promising speedup the reason being that it's often very memory bound but the GPU has very high bandwidth main memory access so the GPU DRAM and the speed that the GPU processing units can talk to this DRAM is usually very high higher than a CPU would have and that's why if you can leverage this you get actually also very good string processing numbers so the last I want to talk about on this thing I'm not going to talk about visualization that's usually making some graphs and things like that but the last interesting box on here would be DASC which is sort of the the process library we use to scale out to multiple nodes and we are collaborating very closely with the DASC developers so DASC you can I'm not sure who is familiar with MPI and does not know DASC because it will be a little imprecise but if you are familiar with MPI and don't know what DASC is then you can imagine DASC to be the MPI run or MPI exit executable that sets up processes across multiple machines that's where the analogy ends so the rest is very different but it's essentially a way to start many processes across a cluster for example of machines and then make the machines fight the nice thing is the scheduler is like written in Python and it's very modular so it means it fits very well in our whole ecosystem that we are working with so it's essentially you're doing sort of a MapReduce style 
computation when you use DASC and so QTF our QTF implementation is designed to work well in this setting and you can especially easily once you go from the ETL pipeline on to the ML pipeline when you do the machine learning you may like your communication library has sort of different requirements that you need to do so on the ETL side you say I want to be failure redundant I want to be resilient those guys once in ML it's much more traditional HPC application where you say I need fast you know reduce some all operation that's all where MPI comes in or it's very helpful or MPI is similar to MPI library which we call NICO and CCL which is sort of an MPI inspired library also open source that has high performance GPU collaborative GPU operations like all reduce and things like that and so we have sort of bridge when you want to go out of DASC into sort of the ML style communication which is not DASC anymore that you hand over or you instantiate from within DASC instantiate sort of the communication between all the participating GPUs and then you can switch over to sort of a compiler MPI style communication so if you look into the future for the next few months as that like it's a continuous process so if you follow the open source project you'll see sort of when new commits come in obviously but our goals are so a lot of those things are still very single GPU not all of them but especially in the data frame side and we are working hard on sort of getting it into a single node multi GPU so a DJX boxes is actually a single node that has in a DJX1 case 8 GPU and a DJX2 16 GPU but it's still one node so there's one CPU and it can talk to all the GPUs to an actual multi node so a cluster based approach and then string support of course is getting gradually better and better and we have accelerated data loading in the pipeline so parsing file formats can often like in some of our benchmarks by now 90% is actually reading the comma separated value file that the benchmark 
The remaining 10% is everything else. So we are going to look at accelerating those file formats, CSV parsing, Parquet and so on, to ingest the data faster. We have a couple of preliminary numbers and they look very promising. As I said, one of the biggest effects is not only the speedup of the parsing itself, which hopefully will be very significant, but that in a big pipeline this is the leftover piece: if you are familiar with Amdahl's law, you know that this is what you need to attack to get a good overall speedup.

There is also a very interesting development now, a Python CUDA array interface, for n-dimensional arrays. We talked before about Apache Arrow, which is a columnar layout; in this case it's dense ND arrays, where a couple of projects decided to collaborate on a common specification for a CUDA array interface, and the first projects are already implementing it.

That brings me to my concluding remarks. In the beginning I had this "why did I become a data scientist" slide: you think back, then you do your actual data science, and you see that in a lot of those green parts you are essentially sitting around, possibly drinking coffee. With RAPIDS we hope to get closer to the point where you have more of the red pieces, which is actual work: you can try your ideas and iterate much faster, because you don't have to wait so long. Development cycles that used to take a day or two you can now do within a day.

The interesting thing about GPUs, before I conclude for real, is that we are, as you may know, good at graphics obviously, but we also have a pretty long history in HPC by now, more and more in neural networks, and with RAPIDS now also data analytics. My understanding, my prediction and my hope is that those three things will grow together into one bigger area.
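The CUDA array interface mentioned above is a small Python protocol: a producer exposes a `__cuda_array_interface__` dict describing its device memory, and consumers such as Numba or CuPy can adopt that memory without copying. Below is a minimal sketch of the dict's shape; the null pointer is a placeholder, since a real producer would expose the address of actual GPU memory, and the keys shown are the required ones from the published specification as I understand it.

```python
class DeviceArraySketch:
    """Illustrative producer side of the CUDA array interface protocol.

    The data pointer is a placeholder (0); a real implementation would
    expose the address of memory allocated on the GPU.
    """

    def __init__(self, shape, typestr="<f4", ptr=0):
        self.shape = shape      # e.g. (1024, 1024)
        self.typestr = typestr  # NumPy-style dtype string: little-endian float32
        self._ptr = ptr

    @property
    def __cuda_array_interface__(self):
        return {
            "shape": self.shape,
            "typestr": self.typestr,
            "data": (self._ptr, False),  # (device pointer, read_only flag)
            "version": 2,                # protocol version current at the time
        }

arr = DeviceArraySketch(shape=(4, 4))
iface = arr.__cuda_array_interface__
```

A consumer simply reads this dict and wraps the pointer; that is what lets cuDF columns, Numba device arrays and CuPy arrays flow between libraries zero-copy.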
There are now HPC applications that run a simulation and then run data analytics on the simulated data, for example for scientific discovery, and then of course they have to visualize it in order to tell the scientists what's going on. That is the new area we are working in, and in all the individual components GPUs are very helpful and provide a lot of benefit. If you want to get involved, or want the software: just go to rapids.ai, and that should lead you to all the other places. Apache Arrow, the GPU Open Analytics Initiative at GOAI, and rapids.ai are the pieces you should have a look at. Get involved, get in touch, and thank you for your attention.

First question, partly inaudible, about CuPy, and whether I know what it is. I only got half of it myself, I'll come a little closer, there was somebody standing outside. So again: it's some library, and whether RAPIDS plans on integrating with CuPy? I think so too; I'm not familiar with the details, but whatever makes sense and is helpful, we are very happy to work with.

Next question, about deployment and Kubernetes. There is work going on, but I would be lying if I said more than that; it's definitely interesting. On the neural network side we are working very much with Kubernetes now, and with Docker containers as our container format of choice, because like every other container runtime it is able to read Docker formats. That is our deployment strategy, Kubernetes and then Docker for the containers, and it will definitely become very important once people start rolling this out more.

Next question: if you integrate this system into a larger pipeline, for example you start with Spark and then you need this system for parallelization, is it then also easy to get results out, write them somewhere else, or use it as an intermediate step? How easy is it to integrate?
So the question was how easy it is to integrate RAPIDS into existing pipelines, for example if you are using Spark already and then want to go parallel with RAPIDS for a certain part, and maybe write the result back in a way that Spark can read. We are collaborating very closely with Spark too, so there will be announcements sooner or later on how the actual integration looks. Our goal is definitely that it's all seamless: if you are a data scientist writing Python in your notebook, you should be able to switch back and forth, maybe with a line here and there where you have to do some conversion, but essentially completely seamless. Most likely it is not seamless yet; we are currently at version 0.5, I think, so it's not a production-quality thing yet where I could say switch over and everything will run faster. We're not there yet. That said, our collaborators are very happy with the speedups they see, so it seems to be good enough for now to be integrated into existing pipelines, but since it's new, realistically there is always a little bit of friction involved.

Thanks, we'll have to figure it out next time. We'll figure it out.