Live from Las Vegas, it's theCUBE, covering AWS re:Invent 2017, presented by AWS, Intel, and our ecosystem of partners.

And we're back here on the show floor in the exhibit hall of the Sands Expo, live at re:Invent for AWS, along with Justin Warren. I'm John Walls, and we're joined by a couple of executives now from WekaIO. To my immediate right is Liran Zvibel, who is the co-founder and CEO, and then Maor Ben Dayan, who is the chief architect at WekaIO. Gentlemen, thanks for being with us.

Thanks for having us.

Appreciate you being here on theCUBE. First off, tell the viewers a little bit about your company, and what about the unusual origin of the name? You were sharing that with me as well. So let's start with that, and then tell us a little bit more about what you do.

All right, so the name is WekaIO. Weka is actually a unit prefix, like mega and tera and peta, so it's actually a trillion exabytes, 10 to the power of 30. It's a huge capacity, so it works well for a storage company. Hopefully we will end up storing wekabytes. It might take a little bit of time to get there, but anyway, we're working on it. One customer at a time.

Tell us a little more about what you do and about your relationship with AWS.

All right, so at WekaIO we create the highest-performance file system, whether on-prem or in the cloud. We have a parallel file system over NVMe, the way previous-generation file systems did parallel work over hard drives, but those are 20-year-old technologies. We're the first file system to bring new parallel algorithms to NVMe, so we get you the lowest latency and highest throughput, either on-prem or in the cloud. We are perfect for machine learning and life sciences applications, and also media and entertainment, which you mentioned earlier. We can run on your hardware on-prem, we can run on I3 instances in AWS, and we can also take snapshots at native performance, so they don't take away performance. We also have the ability to take these snapshots and push them to S3-based object storage. This allows you to have DR or backup functionality, if you will, on-prem; but if your object storage is actually AWS S3, it also lets you do cloud bursting. So you can take your on-prem cluster, connect it to AWS S3, take a snapshot, and push it to S3. Now, if you have a huge amount of computation to do and your local GPU servers don't have enough capacity, or you just want to get the results faster, you can build a big enough cluster on AWS, get the results, and bring them back.

You were explaining before that it's a big challenge to build something that can do both low latency on millions and millions of small files and high throughput on some large files. Media and entertainment tends to be very few but very, very large files, while something like genomics research has millions and millions of files that are all quite tiny. That's quite hard. But you were saying that it's actually easier to do the high throughput than the low latency. Could you maybe explain some of that?

Do you want to take it?

Sure. So on the one hand, streaming lots of data is easy when you distribute the data over many servers or instances in AWS, like Lustre does, or other solutions, but then doing small files becomes really hard.
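To put illustrative numbers behind that trade-off, here is a minimal back-of-the-envelope model. The server count, per-server throughput, and metadata round-trip time are all assumptions chosen for the sketch, not figures from WekaIO or Lustre.

```python
# Back-of-the-envelope model of why striping helps streaming but not
# small files. All numbers are illustrative assumptions.

STRIPE_SERVERS = 32      # servers a file is striped across (assumed)
PER_SERVER_MBPS = 500    # streaming throughput per server (assumed)
METADATA_RTT_MS = 1.0    # metadata round trip per file open (assumed)

def time_to_read(file_mb: float, n_files: int) -> float:
    """Seconds to read n_files of file_mb each from the striped store."""
    # Large files: every server streams its stripe in parallel.
    stream_s = (file_mb / (STRIPE_SERVERS * PER_SERVER_MBPS)) * n_files
    # Small files: each one still pays a metadata round trip.
    metadata_s = n_files * METADATA_RTT_MS / 1000
    return stream_s + metadata_s

# One 100 GB media file: streaming dominates, striping pays off.
print(time_to_read(100_000, 1))        # ~6.25 s, almost all streaming
# Ten million 100 KB genomics files: metadata dominates completely.
print(time_to_read(0.1, 10_000_000))   # ~10,062 s, almost all metadata
```

Under this toy model, adding servers keeps raising streaming throughput, but the per-file metadata cost is untouched, which is the bottleneck the next answer addresses.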
Now this is where WekaIO innovated and really solved this bottleneck, so it frees you to do whatever you want with the storage system without hitting any bottlenecks. This is the secret sauce of WekaIO.

Right, and you were mentioning before, it's a file system, so there's NFS and SMB access to this data, but you're also saying that you can export to S3?

So actually we have NFS, we have SMB, but we also have native POSIX. Any application that up until now you could only run on a local file system such as ext4 or ZFS, you can actually run in a shared manner. Anything that's written in the man pages, we do, so it just works: locking, everything. For one thing, we're showing for life sciences genomics workflows that we can scale their workflows without losing any performance. So if one server doing one kind of transformation takes time X, then with 10 servers it still takes time X to get 10 times the results, and with 100 servers it still takes time X to get 100 times the results. What customers see with other storage solutions, either on-prem or in the cloud, is that they're adding servers but getting far less in the way of results. So we're giving customers five to 20 times more results than they got on what they thought were high-performance file systems prior to our solution.

Can you give me a real-life example of this? When we talk about life sciences, when we talk about genomic research, we're talking about millions of itty-bitty files, millions of samples, and whatever else. Translate it for me: when it comes down to a real job, a real task, what exactly are you bringing to the table that will enable whatever research or examination is being done?

So I'll give you a general example, not specifically out of life sciences. We were doing a POC at a very large customer last week, and we were compared head-to-head with a best-of-breed all-flash file system. They did a simple test. They created a large file system on both storage solutions, filled with many, many millions of small files, maybe even billions of small files, and they wanted to go through all the files, so they just ran the find command. The leading competitor finished the work in six and a half hours. We finished the same work in just under two hours. That's more than a 3x time difference compared to a solution that is currently considered probably the fastest.

The gold standard, allegedly, right?

Yes, so it's a big difference. In the same comparison, that customer did an ls of a directory with a million files. That other leading solution took 55 seconds, and it took just under 10 seconds for us. We just get you the results faster, meaning your compute remains occupied and working. If you're working with, let's say, GPU servers, they are costly, but usually they're just idling around waiting for the data to come to them. We unstarve those GPU servers and let you get what you paid for.

And particularly with something like the elasticity of AWS, if it takes me only two hours instead of six, that's going to save me a lot of money, because I don't have to pay for those extra hours.

It does. And if you look at the price of the P3 instances, there's a reason: those Volta GPUs aren't inexpensive. Any second they're not idling around is a second you saved, and you're actually saving a lot of money. So we're showing customers that by deploying Weka on AWS and on-premises, they're actually saving a lot of money. Yeah.
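To make the cost point concrete, here is a rough sketch of the arithmetic. The hourly price is an assumed ballpark for a p3.16xlarge on-demand instance circa 2017, and the node count is invented for the example; neither figure comes from the interview.

```python
# Rough cost arithmetic for the elasticity point above.
# P3_HOURLY_USD is an assumed ballpark on-demand price for a
# p3.16xlarge (8x V100 "Volta" GPUs) circa 2017; check real pricing.

P3_HOURLY_USD = 24.48   # assumed price per node-hour
NODES = 16              # hypothetical cluster size

def job_cost(hours: float, nodes: int = NODES) -> float:
    """Total on-demand cost of running the cluster for the job."""
    return hours * nodes * P3_HOURLY_USD

slow = job_cost(6.5)   # storage starves the GPUs
fast = job_cost(2.0)   # GPUs stay fed with data
print(f"slow storage: ${slow:,.0f}")          # $2,546
print(f"fast storage: ${fast:,.0f}")          # $783
print(f"saved:        ${slow - fast:,.0f}")   # $1,763 per run
```

The point generalizes: with elastic pricing, any hour the GPUs spend waiting on storage is an hour billed for nothing.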
So explain some more about how you're able to bridge between both on-premises and cloud workloads. Because I think you mentioned before that you would actually snapshot and then you could send the data as a kind of cloud-bursting capability. So is that the primary use case you see customers using, or is it another way of getting your data from your site into the cloud?

So actually we have a slightly more complex feature. It's called tiering to the object storage. Customers now have humongous namespaces, hundreds of petabytes some of them, and it doesn't make sense to keep it all on NVMe flash; it's too expensive. So a big feature that we have is that we let you tier between your flash and the object storage, and that lets you manage the economics. And we're actually chopping large files down into many objects, similarly to how traditional file systems treat hard drives. So we treat NVMe drives in a parallel fashion, which is a world first, but we also do all the tricks that the traditional parallel file systems do to get good performance out of hard drives, applied to the object storage. Now we take the tiering functionality and we couple it with our high-performance snapshot ability, so you can take a snapshot and push it completely into the object storage in a way that no longer requires the original cluster.

That's pretty cool.

Yes.

So you've mentioned a few of the areas of your expertise now, and certainly where you're working. What are some other verticals that you're looking at? What are some other areas where you think you can bring what you're doing, as you have in the life sciences space, and provide equal, if not superior, value? Where are you going?

So currently we focus on GPU-based execution, because that's where we save the most money for the customers; we give the biggest bang for the buck. Also genomics, because they have severe performance problems. And around builds: we worked with a huge semiconductor company that was trying to build Android. They were forced to build on a local file system, which took them 35 minutes. Their fastest shared option was actually a battery-backed, RAM-based shared file system using NFSv4, and it took them four hours. That was too long; you only get two compiles a day, and it doesn't make sense. We showed them that they can actually compile in 38 minutes. So a shared file system that is fully coherent, consistent, and protected took only 10% more time. But effectively it didn't even cost them 10% more, because what we enabled them to do is share the build cache, so the next build coming in took only 10 minutes. A full build took slightly longer, but if you take the average, their builds came in at 13 or 14 minutes; for instance, one 38-minute full build followed by six 10-minute cached builds averages out to about 14 minutes. So we've actually shown that a shared file system can save time, right?

Yeah.

Other use cases are media and entertainment. Rendering use cases parallelize amazingly well, so you can have tons of render nodes rendering your scenes, and the more render nodes you have, the quicker you can come out with your videos and your movies, or the nicer they look. So we enable our customers to scale their clusters to sizes they couldn't even imagine before us.

It's impressive. Really impressive. Great work, and thanks for sharing it with us here on theCUBE. First time for each, right? So you're now CUBE alumni, congratulations.

Hey, thanks for having us.

Thank you for being with us here.
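As a rough illustration of the tiering idea described above, chopping a large file into many objects, here is a minimal sketch using boto3. The chunk size, bucket, and key scheme are assumptions made up for the example; this shows the generic pattern, not WekaIO's actual on-the-wire format.

```python
# Minimal sketch: tier a large file to S3 by chopping it into
# fixed-size chunks, each stored as its own object. Generic pattern
# only; chunk size and naming are assumptions, not WekaIO's format.

import boto3

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per object (assumed chunk size)

def tier_file_to_s3(path: str, bucket: str, prefix: str) -> int:
    """Upload `path` as a series of numbered chunk objects; return the count."""
    s3 = boto3.client("s3")
    n = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            s3.put_object(Bucket=bucket, Key=f"{prefix}/chunk-{n:08d}", Body=chunk)
            n += 1
    return n

# Hypothetical usage (path and bucket are invented for the example):
# tier_file_to_s3("/mnt/wekafs/scene42.exr", "my-tier-bucket", "snapshots/scene42")
```

Splitting one big file into many mid-sized objects is what lets reads and writes against the object tier proceed in parallel, the same trick parallel file systems have long played against arrays of hard drives.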
Again, we're live here at re:Invent, and we'll be back with more live coverage here on theCUBE right after this time-out.