It's theCUBE. Here is your host, Jeff Frick.

Hi, Jeff Frick here with theCUBE. We're on the ground at the Westin St. Francis in San Francisco, California, at HBaseCon 2015. We were here a few years ago, but we wanted to come and get an update on what's been going on. Obviously, pretty exciting times in this space. And we're joined in this segment by our newest co-host, George Gilbert from Wikibon. Welcome, George.

Good to be here. Thanks, Jeff.

Absolutely. And our guest is John Carter from?

Carter Page.

Carter Page. John Carter was a movie that I don't think did very well. He had not been to Mars. Carter Page from Google, welcome.

Thank you.

And I guess you had a great keynote to kick off today, so tell us a little bit about what you covered in the keynote for those that missed it.

Sure. So what I talked about was how Google's mission statement, which is to organize the world's information, plays into our involvement with the open source community, starting with us publishing the OSDI paper in 2006 that allowed HBase to be built, basically as a clone of Bigtable. And I also talked about how, both with Bigtable, now released as Google Cloud Bigtable, and with HBase, the ability to have a database that stores the world's information is another way that we're extending or implementing the mission statement for the company.

So tell us a little bit about this. Many of our mainstream customers, obviously, are familiar with traditional SQL databases. There's a class of applications that's exploding that doesn't lend itself well to that, which you guys wanted to have as sort of references to start with. What are the characteristics, and what are some of those apps?

Well, in terms of what Bigtable and HBase provide as NoSQL databases, you get a trade-off: you get scalability to mind-blowing sizes, potentially into petabytes.
And the trade-off is that you are unable to efficiently do the kind of joins that you would do with a SQL system. With a SQL system, you think about how you want to organize the data, and then you come along and you write applications on top of that. With NoSQL, the bargain you have to make to get this kind of crazy scalability is that you have to decide ahead of time how you're going to use this data and how you're going to query it. And you organize the data in your database schema to fit how you're going to access it. The advantage of this is that it allows you to do things at crazy scale, to store, like I said, really large amounts of data. It also allows you to do really high-throughput types of operations. So Internet of Things is a very popular space right now. This includes gas meters that are spread around the country. There are medical device companies that are recording data. There are phone companies that are constantly taking temperature readings and battery readings from your phone. And when you call in and say, I have a problem, they want to be able to immediately pull that up and see what's going on. And you need a database of this sort that can handle the kind of millions of writes per second that allow this to be possible.

There's another phenomenon, too. We hear about it at a lot of shows: schema on write versus schema on read, because you don't necessarily know the schema that you need when you're getting all this data. I'm sure you guys have had a ton of experience with that at Google, as you've continued to expand these applications over and over and over again.

Yeah, it is a tricky thing. And a lot of times, you can find yourself painted into a corner if you don't get the schema right initially. So it's something we really think about a lot.
In fact, internally in the company, we have schema reviews where members of the Bigtable team go through with internal teams and help them design the schema. And it's not something to be taken lightly. You really have to figure out what you want to do with this: what happens if I want to query in this way or that way, how am I going to get to that data? A little bit of forethought into that, into getting a good design, pays enormous dividends on the tail end as you're growing and scaling, and you find you actually have a lot of room to grow.

So data modeling isn't dead?

No.

But another question, which is: you're running Bigtable at a scale that no one else can match. I mean, no other cloud provider and, of course, no enterprise vendor is going to get anywhere close. How much of an advantage can someone who's considering deploying HBase on-prem get from running it in Google Cloud Platform?

Well, there are numerous trade-offs that people look at. Some people are initially concerned about placement of the data, for instance. That's something we actually solve with Google Cloud Platform: when you decide to provision a cluster, we will tell you exactly where it's going to be. This cluster is going to be right in the middle of the United States in the Midwest, or this cluster is going to be in Taiwan, or this cluster is going to be in the EU. And we'll let you know, so you know your data is actually going to be there. So that's one thing that I think people are concerned about in general that we hope we have alleviated. The big downside to running an HBase server yourself is that there's a lot of operational overhead to it. And particularly if you're a large company, you have a lot of clusters to run, and that can be a real headache.
If you're a small company, putting together a large enough operations team to be able to have a round-the-clock pager service is complicated, and it's expensive, frankly, to hire enough people to be able to do that, and it can exhaust them too. And so by using Google Cloud Platform for Bigtable, customers are able to get Bigtable SREs as their own support. And that's a really powerful thing. We've been doing this for 10 years, so we know how to run these things and react to issues that may come up as quickly as possible.

Microsoft uses an example with Azure SQL DB: they have 1.3 million database instances under management, and if someone tried to do that with 30 to 50 databases per admin, you'd have 30,000 admins in Azure. Is there a similar metric for the productivity of Google running this versus someone on-prem doing it?

I mean, I don't have a good number for how many administrators you need for how many HBase clusters. There is certainly a significant operational overhead. There was an entire panel, or a series of panels, here at the conference about operations because it's such a headache to get right. And sometimes people have very specific needs where they want to run their own clusters. But frankly, as I said, we've been doing this for a long time. We've figured out how to automate these things to an enormous scale. So rather than operators who are sitting there watching and pressing buttons, we have engineers whose entire job is to watch what might go wrong and build tooling around it. Without the immense amount of automation, tooling, and reporting that we have built, there's no way we could manage the kind of data sets we have at Google. Pretty much anything that is stored or persisted at Google, any application that has any kind of persistent data, is using Bigtable in some way. And so it's incredibly mission critical.
And that's the kind of reliability that customers who want to use Google Cloud Bigtable can expect for their own applications.

One other question: now that you have an HBase personality on Bigtable, do you see Bigtable becoming an integral part of the Hadoop ecosystem hosted by Google? I mean, will you be part of that ecosystem in the cloud, like Azure has its own flavor?

So yes. I think there are two reasons that we were very interested in using the HBase API as our way of connecting with Java developers, with Java programs that are looking to use Bigtable. The first is that we needed to have an API, and we could either invent our own standard and throw it out there in a mix of a million other standards and drive people nuts, or we could pick something that we know is tested. I mean, the number of large tech companies in the world that use HBase is pretty phenomenal. They've put it through its paces. We know this API is solid, and we know that if we expose Bigtable with this API, it's going to be something that fits a lot of use cases. Also, by using open source as our way of defining our API, rather than doing some kind of industry consortium or something like that, we feel this is a much more nimble way for us to react to things that come up in the market, whether that's new ingestion pipelines coming in like Spark or things like this. The HBase community is going to react very quickly around these things, and we, as part of that community helping with the API, want to be able to take advantage of that. Now, the second part is that with the HBase API, we get integration points with everything that integrates with HBase. We have a managed Hadoop cluster you can spin up on Google Cloud Platform: click a button and you have a few hundred workers that are now talking to Google Cloud Bigtable, or to HBase if you want.
In the same way, we are going to be integrating with Spark and Storm and Kafka, and we're going to continue to look at things on an opportunistic basis as customers tell us what their most important integrations are.

It's great, it's a great story, and it's interesting that you tied it directly back to the mission of first collecting all the information and then making all that information available. That's a core piece. And we keep hearing, George, about the incredible innovation that comes with open source with an engaged community. I'm sure you guys don't have as hard a time getting developers engaged as maybe some of the other big, classical old-school companies do, which I think is a real challenge.

Bloomberg said something really interesting to us. We went in asking, do you see a new set of applications growing up alongside the traditional applications based on the SQL DBMS? And they said, no, we don't see it that way. We see this infrastructure converging with the traditional SQL databases, only at a faster pace because of the innovation cycle, and at a lower cost as well. Do you see it unfolding that way?

That's a very interesting question. There is certainly some convergence happening with things like Phoenix, and with people looking to combine things in a way that is more queryable, to be able to throw a SQL interface on top. Inside of Google, we use Dremel, or our external cloud product called BigQuery. You'll be able to use BigQuery to query Bigtable, which is very much a relational experience. That being said, you still get different performance characteristics, right? I mean, if you're doing relational stuff with a lot of joins, a database that is well suited for that is going to perform differently than throwing it on top of an HBase database.
Likewise, if you're trying to ingest a lot of data at these types of scales into an RDBMS, you're going to have a hard time as well. So I think the market is definitely trying to push these things together. The architectures are fairly different, though, and so it's going to take, I think, a few more years for it to play out how that's going to work.

So we're getting the hook, but I want to give you the last word before we go. What are you excited about? You guys are doing a lot of fun stuff, I'm sure. What are you working on today that's getting you up in the morning and down to work?

Well, Google Cloud Bigtable is honestly the thing that gets me the most excited. When I first pitched the idea internally and we started building this, the thing that really got me up every morning was thinking about what would happen if the rest of the world had the power of this database, which allowed Google to scale so many applications fairly effortlessly. Instead of having to try to scale them individually, you just threw them on Bigtable and you knew it could scale to a billion users. If everyone else had this same kind of power and scalability, what types of things could you build on it? So now that we've launched this, the next thing that is really getting me excited is to see what people are actually going to build on it.

Awesome. Well, Carter, thanks for stopping by. And George, always good. Jeff Frick here, we're at HBaseCon 2015. You're watching theCUBE, thanks for watching.
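
[Editor's note: The interview's point about deciding ahead of time how data will be queried can be made concrete with a small sketch. The code below is not the Bigtable or HBase API. It is a toy, in-memory sorted key-value store, and the class name, the key layout (device ID, a "#" delimiter, then a reversed timestamp), and the sample readings are all assumptions made for illustration. It shows one common way to lay out row keys so that "pull up this phone's latest readings" becomes a single prefix scan rather than a join.]

```python
import bisect

class SortedKeyStore:
    """Toy stand-in for a table whose rows are kept sorted by key.
    Hypothetical helper for illustration; not a real client library."""

    def __init__(self):
        self._keys = []     # row keys, kept in sorted order
        self._values = {}   # row key -> stored record

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._values[key]

MAX_TS = 10**10  # assumed upper bound on epoch seconds, used to reverse timestamps

def row_key(device_id, epoch_seconds):
    # Reversing the timestamp makes the newest reading sort first
    # within each device's block of rows.
    return f"{device_id}#{MAX_TS - epoch_seconds:010d}"

store = SortedKeyStore()
store.put(row_key("phone-42", 1431000000), {"battery": 81, "temp_c": 30.5})
store.put(row_key("phone-42", 1431000060), {"battery": 80, "temp_c": 31.0})
store.put(row_key("phone-7", 1431000030), {"battery": 55, "temp_c": 28.0})

# The access pattern chosen up front: all readings for one device, newest first.
latest = list(store.scan_prefix("phone-42#"))
for key, reading in latest:
    print(key, reading)
```

[The cost of this layout is the one described in the interview: a question the key was not designed for, such as "all devices above 30 degrees in the last minute", would require scanning every row or maintaining a second table keyed differently.]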