 live from San Jose, it's theCUBE, presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. Hi, I'm Peter Burris, and welcome back to Big Data SV, theCUBE's annual broadcast of what's happening in the Big Data marketplace here at, or adjacent to, Strata here in San Jose. We've been broadcasting all day, and we're going to be here tomorrow as well at the Forager, an eatery and place to come meander. So come on over, spend some time with us. Now, we've had a number of great guests, many of the thought leaders that are visiting here in San Jose today around the Big Data marketplace, but I don't think any has traveled as far as our next guest. Yuanhao Sun is the CEO of Transwarp, come all the way from Shanghai. Yuanhao, it's once again great to see you on theCUBE. Thank you very much for being here. Glad to see you again. So, Yuanhao, Transwarp is a company that has become extremely well known for great technology. There's a lot of reasons why that's the case, but you have some interesting updates on how the technology is being applied. Why don't you tell us what's going on? Okay, so recently we announced the first audited TPC-DS benchmark result. Our product called Inceptor, which is a SQL engine on top of Hadoop, already has quite a lot of features, like distributed transactions and full SQL support, so that it can mimic Oracle or traditional DB2 and those traditional database features, and we can pass the whole test. And this SQL engine is also scalable, because it's distributed. So a large benchmark like TPC-DS, which starts from 10 terabytes, our SQL engine can pass without much trouble. So I know that there have been other firms that have claimed to pass TPC-DS, but they haven't been audited. What does it mean to say you're audited? I presume that as a result, you've gone through some extremely stringent and specific tests to demonstrate that you could actually pass the entire suite. 
Yes, actually, there is a third-party auditor. They audited our test process and results over the past five months, so it is fully audited. And there are two major reasons why we can pass the test. Traditional databases are not scalable enough to process such a large dataset, so they cannot pass the test. And for the Hadoop vendors, the features of their SQL engines are not rich enough to pass all the tests. There are several steps in the benchmark, and of the 99 SQL queries, the syntax is not yet fully supported by the other Hadoop vendors. The benchmark also requires you to update the data after the queries and then rerun the queries for multiple concurrent users. That means you have to support distributed transactions; you have to keep the updated data consistent. The other SQL-on-Hadoop engines haven't implemented distributed transaction capabilities, so that's why they fail to pass the benchmark. So I had the honor of traveling to Shanghai last year, going and speaking at your user conference, and was quite impressed with the energy that was in the room as you announced a large number of new products. You've been very focused on taking what open source has to offer, but adding significant value to it. As you said, you've done a lot with the SQL interfaces and various capabilities of SQL on top of Hadoop. Where is Transwarp going with its products today? How is it expanding? How is it organizing? How is it being used? We group our products into three categories: Big Data, Cloud, and AI and machine learning. In Big Data, we upgraded the SQL engine and the streaming engine, and we have a set of tools called Transwarp Studio to help people streamline their Big Data operations. The second product line is Data Cloud; we call it Transwarp Data Cloud. This product is going to be released in early May this year. 
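The benchmark requirement described above, that data is updated after the queries and the queries are then rerun under concurrency, is what forces transaction support. A toy sketch of that query-refresh-requery cycle, using Python's built-in sqlite3 (the table and query here are invented stand-ins, not actual TPC-DS schema or queries):

```python
import sqlite3

# Toy illustration of a TPC-DS-style "query, refresh, re-query" cycle.
# The real benchmark runs 99 standardized queries at 10 TB+ scale; this
# table and aggregate are hypothetical stand-ins.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_sales (item_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO store_sales VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (1, 5.0)])
conn.commit()

def total_for_item(item_id):
    # Query phase: aggregate sales for one item.
    row = conn.execute(
        "SELECT SUM(amount) FROM store_sales WHERE item_id = ?",
        (item_id,)).fetchone()
    return row[0]

before = total_for_item(1)

# Data-maintenance phase: the refresh runs inside a transaction, so
# concurrent query streams see a consistent view -- the capability the
# interview says other SQL-on-Hadoop engines lacked.
with conn:
    conn.execute("INSERT INTO store_sales VALUES (1, 25.0)")

after = total_for_item(1)
print(before, after)  # 15.0 40.0
```

A single-node sqlite database trivially satisfies this; the hard part Inceptor addresses is doing the same with distributed transactions across a cluster.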
We build the Data Cloud product on top of Kubernetes. We provide Hadoop as a service, data science as a service, and AI as a service to customers. And we allow people to create multiple tenants, where each tenant is isolated in terms of network, storage, and CPU. They are free to create clusters, spin them up, and turn them off, and each can scale to hundreds of cores. I think this is the first implementation of network isolation and storage persistence in Kubernetes that can support HDFS and all the Hadoop components. And because it is elastic, just like cloud computing, but running on bare metal, people can control the data and the applications in one place. Because all the applications and Hadoop components are containerized, meaning they are Docker images, we can spin them up very quickly and scale to a larger cluster. This Data Cloud product is very interesting for a large company, because they usually have a small IT team, but they have to provide big data and machine learning capabilities to much larger groups, like 1,000 people. So they need a convenient way to manage all these big data clusters, they have to allocate the resources, and they even need a billing system. For this product we already have a few big names in China, like China Post, PetroChina, and China Southern Power Grid. They are already deploying this Data Cloud for their internal customers. And China has a few people. So I presume that China Post, for example, is probably a pretty big implementation. Yes, their IT team is less than 100 people, but they have to support thousands of users. Usually we deploy one cluster for each application, right? But today a large organization has lots of applications that hope to leverage big data capabilities, and a small IT team cannot support so many applications. So they need a convenient way, just like deploying Hadoop on a public cloud. 
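The per-tenant isolation of network, storage, and CPU described above maps naturally onto standard Kubernetes primitives. A generic sketch, not Transwarp's actual (unpublished) manifests, with invented names:

```yaml
# Hypothetical sketch of per-tenant isolation in Kubernetes.
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
---
# Cap the CPU, memory, and storage a tenant's clusters can claim.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "200"        # scales to hundreds of cores per tenant
    requests.memory: 512Gi
    requests.storage: 10Ti
---
# Restrict ingress to pods within the same tenant's namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-a-isolation
  namespace: tenant-a
spec:
  podSelector: {}
  ingress:
    - from:
        - podSelector: {}
```

The harder pieces the interview alludes to, stateful HDFS storage persistence and running the full Hadoop stack as containers, go beyond what these stock primitives provide.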
So we provide a product that allows you to offer Hadoop as a service on a private cloud, on bare metal machines. So this is the second product category. And the third is machine learning and artificial intelligence. We provide a data science platform, a machine learning tool with interactive tools that allow people to create machine learning pipelines and models. We even implemented some automatic modeling capability that does the feature engineering automatically or semi-automatically and selects the best algorithms for you, so that everyone can be a data scientist; they can use our tool to quickly create models. And we also have some pre-built models for different industries, like financial services, banks, securities companies, even IoT. They just need to modify the template and then apply the machine learning models to their applications very quickly. For example, a bank customer can use it to deploy a model in one week, which is very quick for them. In the past, they would hire a company to build that application and develop the machine learning models, and that usually took several months. Today it is much faster. So today we have three categories: Big Data, Cloud, and machine learning and AI, three product lines. And you've got some very, very big implementations. So you were talking about a couple of banks, but we were talking before we came on about some of the smart cities kinds of things that you guys are doing at enormous scale. Yes, we deploy our streaming product in more than 300 cities in China. These cities are connected together, and we use the streaming capability to monitor the traffic and send the information from each city to the central government, to a sort of central repository. 
So whenever illegal behavior on the road is detected, that information will be sent to the police or the central repository within two seconds. Whenever you are seen by a camera anywhere in China, alerts will be sent out within two seconds. So the bad behavior is detected, its location is identified, the system also knows where the nearest police officer is, and it sends a message saying this car has done something bad. Yeah, and you should stop that car at the next station or the next crossroads. So today there are tens of thousands of police officers that depend on this system for their daily work. Interesting. So just a question: it sounds like one of your nearest competitors, in terms of, let's take the open source community, at least the APIs, and in their case open source, is Huawei. Have there been customers that tried to do a POC with you and with Huawei and said, well, it took four months using the pure open source stuff and it took, say, two weeks with your stack, it being much broader and deeper? Any examples like that? There are quite a lot. We have more market share in financial services; we have about 100 bank users. If we take into account all the banks that already use Hadoop, our market share is above 60%. 60? Yeah, in financial services. So we usually do a POC and run benchmarks, run their real workloads, and usually within three days or one week they can find we speed up their workloads very significantly. For example, Bank of China migrated their Oracle workload to our platform, and they tested our platform and Huawei's platform too. The first thing is, they cannot migrate the whole Oracle workload to open source Hadoop because of the missing features. We are able to support all these workloads with very minor modifications, and the modifications take only several hours. 
And we can finish the whole workload within two hours, where originally Oracle took more than 10 hours, sometimes more than one day, to finish the workload. So it is very easy to see the benefits quickly. Now that you have the streaming product, also with that same SQL interface. Yes. Are you going to see a migration of applications that used to be batch to more near real time or continuous? Or will you see a whole new set of applications that weren't done before because the latency wasn't appropriate? For streaming applications, those real-time applications are mostly new applications. But if you are using the Storm API or the Spark Streaming API, it is not so easy to develop your applications. And another issue is that once you define a new rule, you have to add that rule dynamically to your cluster. The IT operators do not have much knowledge of writing Scala code; they only know how to configure things, but they are probably familiar with SQL. They just need to add one SQL statement to add a new rule, and they can do that dynamically. In your system. Yeah, in our system. So it is much easier for them to program streaming applications. And for those customers that don't have real-time applications, they hope to do something like real-time data warehousing. They collect all this data from websites and from sensors, like PetroChina, the large oil company. They collect all the latest information directly into our streaming product. In the past, they just collected it into Oracle and ran the dashboard, so it took hours to see the results. But today their application can be moved to our streaming product with only a few modifications, because they are all SQL statements, and the application becomes real-time. They can see the real-time dashboard results in several seconds. So Yuanhao, you're number one in China, and you're moving more aggressively to participate in the U.S. market. 
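The dynamic-rule idea described above, where an operator adds a detection rule as one SQL statement instead of redeploying Storm or Spark code, can be sketched with a toy in-memory engine. This is a generic illustration in Python with sqlite3; the rule API and event schema are invented, not Transwarp's actual product syntax:

```python
import sqlite3

# Toy sketch: detection rules expressed as SQL predicates, registered
# at runtime. Rule API and schema are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (plate TEXT, speed REAL, city TEXT)")

rules = {}  # rule name -> SQL WHERE clause evaluated per event

def add_rule(name, where_clause):
    # An operator who only knows SQL can register a new rule without
    # writing Scala code or restarting the cluster.
    rules[name] = where_clause

def process(event):
    """Insert one event and return the names of the rules it triggers."""
    conn.execute("INSERT INTO events VALUES (?, ?, ?)", event)
    fired = []
    for name, where in rules.items():
        hit = conn.execute(
            f"SELECT 1 FROM events WHERE rowid = last_insert_rowid() "
            f"AND {where}").fetchone()
        if hit:
            fired.append(name)
    return fired

add_rule("speeding", "speed > 120")
print(process(("A123", 135.0, "Shanghai")))  # ['speeding']
print(process(("B456", 80.0, "Beijing")))    # []
```

A production streaming engine evaluates such predicates continuously over distributed event streams rather than row by row in one process, but the operator-facing contract, new rule equals one SQL statement, is the point being illustrated.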
One last question: what's the biggest difference between being number one in China, or the way big data is being done in China, versus the way you're encountering big data being done here in the U.S., for example? Is there a difference? I think there are some differences. U.S. customers usually request a POC, but in China, I think they focus more on the results, on what benefit they can gain from a product. So we have to prove it to them; we have to help them build a great application so they see the benefits. I think in the U.S. they focus more on the technology than Chinese customers do. Interesting. So more on the technology here in the U.S., more on the outcome in China? Yeah, that's interesting. Once again, Yuanhao Sun, CEO of Transwarp, thank you very much for being on theCUBE. Thank you. And I'm Peter Burris, with George Gilbert, my co-host, and we'll be back with more from Big Data SV in San Jose. Come on over to the Forager and spend some time with us, and we'll be back in a second.