Live from the Julia Morgan Ballroom in San Francisco: extracting the signal from the noise, it's theCUBE, covering Structure 2015. Now, your host, George Gilbert.

This is George Gilbert. We're at the Julia Morgan Ballroom in downtown San Francisco at Structure 2015, and we are joined by Adam Wray, CEO of Basho Technologies. Basho has some incredible use cases to bank its reputation on, and one that theCUBE got to talk about a couple of weeks ago at the IBM Insight conference is The Weather Company, one of the world's largest Internet of Things deployments: not just the infrastructure, but the decision engine behind it. Why don't you tell us a little about how you got started with them and how that project grew?

Sure, no problem. The Weather Company started off with the goal of taking in all the weather data in the world, so they went after all the consumer weather data they could find; ultimately, a fifth of the world's business is impacted by weather every day. They turned that into, effectively, a large ingestion engine for all IoT: any type of device, anywhere you could grab information from, whether a light pole, a weather station, or government data they might collect on marine biology. At the core of that, they were collecting all this data and starting to manage it across multiple data centers in a distributed environment, and it became very operationally intensive to make that data available in real time. That's where we came in. They had originally started with relational databases, then began using Cassandra, which is another well-known distributed database. But as they went from two data centers to three, to eight and more, there was no way they could operationally make that data available without a truly highly available distributed database like Riak KV, which is at the core of what we do.

Okay, so when they went all the way up to eight data centers, were they trying to get closer to the origin of the data, or were they just trying to scale out their hardware capacity?

A little bit of hardware-capacity scaling, but also a matter of data coming from so many places all across the world. At a certain point, if you want to make that data live and usable in a short amount of time, you have to move your logistics around to cover more surface area, especially if you're collecting data from devices everywhere from North America to Europe and beyond. So it's a little of both.
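The ingestion tier being described is, at bottom, a plain key-value put and get. A minimal sketch of storing and fetching one sensor reading, assuming the official Basho Python client (the riak package), a local node with default settings, and a hypothetical sensor_readings bucket and key scheme:

```python
import riak

# Assumes a local Riak node with default ports; the bucket name and key
# format are hypothetical, chosen only to illustrate the put/get pattern.
client = riak.RiakClient()
readings = client.bucket('sensor_readings')

# One IoT reading, keyed by device id plus timestamp.
key = 'weather-station-0042:2015-11-18T21:05:00Z'
obj = readings.new(key, data={'temp_c': 11.2, 'pressure_hpa': 1013.4, 'wind_kph': 19.0})
obj.store()

# Any data center holding a replica can serve the read back.
print(readings.get(key).data)
```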
So tell us how the data volume grew over time, and how you dealt with latency, because the speed of light isn't one of those knobs you can turn. Physics is physics, except for Einstein, who figured a way around it. How did you get around the latency of gathering data that might be in Europe or farther afield but was critical for calculations being done, perhaps, in the US?

In our particular database it's all about high availability, so we automatically make three copies across multiple locations so someone can always get access to the data. If you've got an IoT workload like the travel industry, the airline industry, where they're using information from The Weather Company's IoT platform to make real-time decisions about whether to change the flight path of a plane, then that data has a shelf life and it has to be highly available regardless of who's trying to access it or how much access is going on. Last I checked, they're bringing in something like 20 terabytes of log, sensor, and device data a day, and I don't know how many petabytes or how many billions of ingest feeds. All of that comes into us, and we have to be able to catch it, sort it, and make it available immediately when it's asked for. It could cost millions of dollars in flight information if something goes wrong at the end of that workload.

So are the analytics duplicated at each of the three sites, so that whoever is consuming them, say pilots over Europe, gets low latency on their data, with the analytics being calculated in a European data center, and the same thing happening in an American data center for North American pilots? Is that how it's working?

Typically what happens is that the data itself comes from all these devices, and right now that data comes directly into the database. If you talk to companies we're working with, like Cisco or IBM, they're thinking about ultimately consolidating at the sensory device doing the collecting, because right now the amount of IoT data being sent across the world is not overwhelming us, but over time it could. So you might even have filtering at the true edge one day. Right now the device just says, here is all the timestamped data you want on this weather information, or all the barometric data you want, and so on. There may be a day when you say, look, don't send the data until it changes, so it goes: no change, no change, no change, delete, delete, delete, change, send. That's where they're headed. But today they bring it all into the database, which is spread all over the place, and then it's consolidated again for analytics at key points that make it accessible. So the analytics isn't spread everywhere; the analytics is actually done in very tight, consistent areas and then made available to whatever service they're feeding, like the airline industry.
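That send-only-on-change idea at the edge is easy to picture in a few lines. Here is a minimal sketch in plain Python, with made-up sample values, of a device-side filter that drops repeated readings and forwards only changes:

```python
def send_on_change(readings, tolerance=0.0):
    """Forward a reading only when it differs from the last one sent."""
    last_sent = None
    for timestamp, value in readings:
        if last_sent is None or abs(value - last_sent) > tolerance:
            last_sent = value
            yield timestamp, value        # change -> send
        # otherwise: no change -> the reading is dropped at the edge

# Hypothetical barometric samples from one device.
samples = [('12:00', 1013.4), ('12:01', 1013.4), ('12:02', 1013.4),
           ('12:03', 1012.9), ('12:04', 1012.9), ('12:05', 1011.8)]

for ts, hpa in send_on_change(samples, tolerance=0.1):
    print(ts, hpa)                        # forwards 12:00, 12:03, 12:05
```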
So how do you deal with the fact that it's going to take longer for, say, data you might be collecting in Asia, for pilots who are trying to figure out how to avoid turbulence there? The data capture might be happening there, but the data processing might be happening in North America, whereas data captured in North America has low latency in that process.

Yeah, speed of light, exactly. Without going into a lot of detail, they can partition it by geography, so you can at least talk about Asia versus North America and so on. In the North America example, though, they don't need multiple instances for North America; they can do it with one and make it available based on the data services they're trying to feed. It would be worthwhile to talk with IBM, who recently acquired the digital assets of The Weather Company. Their intent is to democratize that level of service and make it even more easily available, so that people can run it in their own facilities, run it in software, and access that underpinning asset and all the algorithms for any type of data, not just weather.

It sounds like they're building a platform. Is that a Basho-based platform, or does it have a bunch of data management components and an analytic pipeline on top of that?

Our value proposition is the distributed data tier, and for that tier we're at the core of what they do for the vast majority of everything. How they use Spark, how they use other algorithmic or machine-learning engines on top, we just feed into; all of that is their platform. We're the underpinning that catches all that data and makes it highly accessible and available at scale.

Okay, so what are some of your other use cases? I assume this has got to be one of your largest, and one that really taxes how you scale out, because with enterprise software you're always pushing boundaries and finding critical sections you didn't know were there.

Oh yeah, yeah.

So what are some of the other ones? Now that you have a showcase that's the envy of all your competitors, what are some of the others that have fallen into your lap?

There are quite a few. Our flagship product traditionally has been Riak KV, which handles a lot of profile and session data. One use case there is the NHS, the health service for all of England: the Kingdom's 80 million patient profile records all run on our database. The key thing is that the data needs to be available for prescriptions and doctor recommendations, so you literally have life-and-death scenarios playing out if it's not available, hence why it runs on us. They came from a scenario where they were previously on Oracle, and one of the big things they said is that they felt they really saved their big money on the integration side, because of all the extra work they had to do to make Oracle work. They went on record saying they saved 20 million dollars in the first year alone by using our database instead of Oracle, and they break that down between the licensing cost of the database and everything else. All they tell us, and we're not allowed to comment on the actual figures, is that the vast majority of the savings was actually in integration, maintenance, and setup, rather than the license cost itself.
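Profile data like that maps naturally onto a key-value model: each record is an opaque document under one key, so individual records can gain fields without a table migration. A minimal sketch, again assuming the riak Python client, with entirely hypothetical identifiers and field names:

```python
import riak

client = riak.RiakClient()                      # assumes a local Riak node
profiles = client.bucket('patient_profiles')    # hypothetical bucket name

# Store one profile document under its patient identifier.
rec = profiles.new('nhs-485-777-3456', data={
    'name': 'A. Patient',
    'gp_practice': 'E85124',
})
rec.store()

# Later, enrich the same record with fields nobody planned for up front;
# no schema change is needed, only this document grows.
fetched = profiles.get('nhs-485-777-3456')
fetched.data['allergies'] = ['penicillin']
fetched.data['repeat_prescriptions'] = ['salbutamol inhaler']
fetched.store()
```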
Did that integration and setup cost have to do with siloed systems, and that it was easier, because the database is more flexible, to take, at the simplest level, electronic health records from different institutions and integrate them into one repository?

Yeah, exactly. A relational database, which is a great ACID, transactional database, is great as long as you can keep it in one location, for a certain type of structured data that you're going to go after consistently. As soon as you get into an unstructured environment, or, for example, take a profile record where at first you say you're only collecting half a dozen data points but suddenly you want to collect six more, well, guess what, you have to start from scratch on the maintenance, because a relational database isn't designed to arbitrarily expand all over the place and take on different data.

So it's the ability to evolve, which is the nature of unstructured data by default, that was the major cost, or the major impediment, to doing it cost-effectively on Oracle.

And actually, for a distributed environment, relational databases are terrible. If you put them in two or three different data centers and you're trying to keep them synced up, it's a nightmare.

Well, you can't. Even in two DCs you can't, really.

You can, but it's a nightmare. You're like Evel Knievel jumping the Snake River Canyon: don't try this at home.

Yeah, exactly.

But that's exciting to me, because we have probably one of the most operationally easy, scalable, distributed databases in the world. Really one of two: Cassandra and Riak are both known for a high level of scale, but in our case we're known for operational ease at scale. That's why The Weather Company, that's why the NHS.

When you say operational ease, do you mean a low number of DBAs per instance?

Yeah, any type of DevOps individual who actually has to manage those nodes. We fail gracefully: if a node fails, then because there are redundant copies the cluster just continues to run, and when the node comes back up it's automatically repopulated, versus you having to rebuild it. Whereas others, like Cassandra, use a pairing architecture where one node goes down, you lose a little bit of that data in the middle, and you have to start from scratch. It's a real nightmare at scale.

Okay. I mean, if you're Netflix, a great company with 800 engineers, you can engineer around that, right?

Right, but if you're a traditional enterprise client, you're saying, look, I don't have distributed-systems experts and I don't want to have to hire distributed-systems experts; I just want my data to be available and accurate. We deal not only with availability and accuracy, but also with operational ease. This is why Uber is a large client: every time you use Uber dispatch, you're pulling up a ride based on our database being available on the back end.

Okay.
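The failure behavior described here, where redundant copies keep serving reads while a downed node is later repopulated, can be illustrated with a toy sketch. This is plain Python standing in for the idea, not Riak's actual implementation; the node names, the key, and the read quorum of two out of three replicas are all assumptions for illustration:

```python
# Toy sketch: three copies of each key, a read quorum of two keeps answering
# while one node is down, and the node is backfilled when it returns.
replicas = {'node_a': {}, 'node_b': {}, 'node_c': {}}
down = set()

def write(key, value):
    for node, store in replicas.items():
        if node not in down:
            store[key] = value          # only live replicas take the write

def quorum_read(key, r=2):
    values = [store[key] for node, store in replicas.items()
              if node not in down and key in store]
    if len(values) < r:
        raise RuntimeError('quorum not met')
    # A real system reconciles stale copies (Riak uses vector clocks);
    # this toy just returns the first live value it finds.
    return values[0]

write('flight:BA117', 'on-time')
down.add('node_b')                      # one node fails
write('flight:BA117', 'delayed')        # writes keep landing on the other two
print(quorum_read('flight:BA117'))      # -> 'delayed', despite the failure

down.discard('node_b')                  # the node comes back up...
replicas['node_b']['flight:BA117'] = quorum_read('flight:BA117')  # ...and is repaired
```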
We only have a couple of minutes, but I just want to get your take on a transition we see: from Hadoop 2.0, meaning YARN and HDFS and the 27 different projects that sit on top of that, to something we're calling big data 3.0, with converged analytics, maybe API-compatible data stores, performance and capacity tiering, and more unified development on top. Do you see something like that evolving to enable this platform to move into the mainstream, just the way you've simplified things?

Yeah, totally. Whether you want to call it 2.0 or 3.0, the way I think of it is real-time, actionable data that you can use in your workloads, versus post-hoc analysis or archival. Hadoop is very much designed with a data-warehouse mindset: I'm going to collect it all, then beat it up and figure out what I'm going to do with it. That's why they built HBase and other things on top; they're trying to figure out a way to make it a little more actionable. That's also why Spark is so popular versus MapReduce: Spark is a way to just ingest the data, index it, and use it. Where we focus, where Riak focuses, and where Basho focuses as a company with our time series offering and our key-value offering, is actual real-time data. You have to collect massive amounts of unstructured data, but you want to be able to use it at the time you collect it, potentially. What you don't want is to be throwing it all someplace for analysis at a later date, because the current average is that, of all the data people collect, at most you're using 15 percent, for example, in an IoT use case. So what's the use of paying for all that storage if you're not going to actually get value from it? That's how I look at the world.

I think we have to leave it there. This is George Gilbert. We're at the Julia Morgan Ballroom in downtown San Francisco at Structure 2015, and we'll be back in a few moments.