We're live here at HP Discover in Frankfurt, Germany. This is theCUBE, SiliconANGLE.com's flagship program. We go out to the events and extract the signal from the noise. We're here to cover HP and find out the latest and greatest of what's going on with the company. Obviously, HP is a huge technology company announcing a lot of different news. I'm John Furrier, founder of SiliconANGLE.com, joined by my co-host.

I'm Dave Vellante of Wikibon.org. Go there, check out all the research from our peers. We're here with Frederick VanHaren, the senior director of the R&D labs at Nuance Communications, a Massachusetts-based public company, a very interesting firm. Frederick, welcome.

Thank you.

Appreciate you coming on theCUBE. So Nuance is a company that I've known a little bit over the years. I've followed Dragon Systems, which I know is one of your products. So I'd love to hear the story of Nuance Communications. Why don't you start there?

Sure. Nuance is a worldwide technology provider for imaging and speech technology. When you think about imaging, think about the technology that's embedded in a scanner. When you scan, you end up with an image, and the products we provide on the imaging side allow you to convert that image into an editable document, like a PDF or a Word document. The other side of the company, which you mentioned, is speech. There are actually three components to speech. One is the text-to-speech component. Text-to-speech is basically the computer talking to you. Imagine a GPS; if you have a GPS, a TomTom or Garmin talks back to you. That's our technology. It does that in different languages and allows you to get street-by-street guidance, left and right.

Really? Okay, so much of the turn-by-turn voices are you.

That's correct.

And T2, and Mr. T saying, "take a right, fool." That's you guys.

There you go. So that's the text-to-speech component.
So you also have the opposite, where you talk to the machine; that's the other product you brought up. We have a desktop product called Dragon NaturallySpeaking, available in multiple languages. A very, very popular product nowadays. Another area where our technology is embedded is mobility. A good example there: we have apps for the iPhone and Android. With Dragon Search, you can make a request, basically saying, please find the nearest restaurant or Chinese restaurant. Your phone has built-in GPS, so it knows where it is. It sends that information to us. We handle the query and give you back a list of restaurants or movies or whatever your question was about.

Now, you're on the R&D labs side of the business, right?

That's correct.

Talk about your role there. What is the R&D lab all about?

It's more specifically about speech recognition. If you look at a speech recognizer, there are really three components to it. The first is what the user is saying. The second component is the recognizer. The recognizer is a piece of software that compares whatever you are saying with something we call a language model. A language model is a subset of all the data we own, and that's what we ship with the product. So eventually your voice is going to be compared to the language model. Now, you have to imagine that we have petabytes and petabytes of storage. There's no way, when we sell Dragon, that I can put petabytes of storage on your laptop. So we have to make decisions about what goes and what doesn't go. Once you go through a recognition, the engine produces hypotheses. For example, the question could be yes or no, and you say, okay, yes. What the system will do is say: I think with 80% confidence the person said yes; with 10%, the individual said no; and for the other 10%, I have no idea what the individual said.
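The yes/no example above is essentially a list of hypotheses with confidence scores. As a minimal sketch, not Nuance's actual engine, here is one way raw engine scores could be normalized into the percentages Frederick describes (all names and numbers here are illustrative assumptions):

```python
def score_hypotheses(acoustic_scores):
    """Normalize raw engine scores into confidence percentages."""
    total = sum(acoustic_scores.values())
    return {word: round(100 * s / total) for word, s in acoustic_scores.items()}

# Hypothetical raw scores for a yes/no prompt, plus an "unknown" bucket:
raw = {"yes": 8.0, "no": 1.0, "<unknown>": 1.0}
print(score_hypotheses(raw))  # {'yes': 80, 'no': 10, '<unknown>': 10}
```

The "unknown" bucket matters: it is what drives the collection of more user data, since those are the utterances the current language model cannot explain.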
Now, in order to improve the recognition results for everybody: as a native speaker of American English, it might be a lot easier for you; you will get better accuracy than I will. But from a usage perspective, we want everybody, even people with accents or dialects, to be as successful as possible. The only way for us to do that is to collect more user data. Once you have collected that user data, it has to get into the language model in some way or form. So you need a high-performance computing environment that continuously processes that data, such that your language model can be updated on a regular basis and the accuracy and the success rate go up all the time.

An interesting component is mobility. You have a cell phone. Typically, we provide a language model that is generic for everybody. But when you call with your cell phone, we pretty much know your number, and that means we can attach a personalized language model to you. So one of the things we're doing today is improving the accuracy every time you call or make a request. We basically recognize that it's you and improve the language model, and if you say the same thing day after day after day, the accuracy should go up over time. So even with your dialect, or if you have difficulties pronouncing certain words, eventually the system will take care of you individually.

Okay, so how does that work, though? I've got a cell phone; AT&T happens to be my provider. And I give some kind of opt-in to allow you to monitor, and in exchange you're going to improve it?

So let's go through the flow. Let's assume you have an iPhone: you go to iTunes, you download the Dragon Search app. The Dragon Search app indeed has a legal footnote asking whether we are allowed to process your data, yes or no. Let's assume you say yes. So now, every time you use the app, that data actually goes to our production site.
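One common way to implement the per-caller personalization described above is to interpolate a generic language model with counts gathered from that one caller. This is a hedged sketch of the idea, not Nuance's method; the interpolation weight, the word probabilities, and the function names are all assumptions for illustration:

```python
def adapt(generic_probs, user_counts, weight=0.3):
    """Blend generic word probabilities with one caller's observed usage."""
    total = sum(user_counts.values()) or 1  # avoid division by zero
    adapted = {}
    for word, p in generic_probs.items():
        user_p = user_counts.get(word, 0) / total
        adapted[word] = (1 - weight) * p + weight * user_p
    return adapted

# A caller who consistently says a non-standard variant:
generic = {"restaurant": 0.02, "restorant": 0.0001}
user = {"restorant": 5}
model = adapt(generic, user)
# The caller's own variant now outranks its generic probability.
```

Repeated day after day, the user counts grow and the blended model drifts toward that individual's speech, which is the "accuracy goes up over time" effect Frederick describes.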
The production site is responsible for just doing the recognition. That means you say something, and the production recognizer gives you the results. Now, it doesn't stop there. What really happens in the background is that whatever you said, along with your ID, is passed to the research environment, which is where I come in. I get all that incoming data, I get the IDs, and over time we process new language models, which we then push back to the production site. So the next time you use the app, the production site says, oh, I know this individual, and we have a special language model for them. That's where the whole story goes: the more you use the system, the better it will work for you.

But it's massive, massive amounts of data, right?

That's right.

Now, you're moving that data around, are you not, in a way?

Well, yeah, so let's define moving around. We have a large HPC environment with about 15,000 hard drives. It's over 10 petabytes of storage. The amount of data we move around locally is about 160 gigabits a second.

Yeah, so locally you're moving a lot of data, but it's fast.

That's right, within the HPC environment.

Even just getting the data in: you've got multiple access points pounding on your system, right? Streaming into your system. So that's an architectural challenge as well.

It is, yeah. We have to design what we call our own platform. We buy traditional hardware, but we have to build our own platform that allows us to deal with 15,000 hard drives, and we double every year. The capacity doubles, the performance doubles. You just need to keep up.

You're scaling. That's amazing. So how would you describe your biggest storage challenges? What are they? Is it just being able to ingest that, or process that, or all of the above?

I think the biggest challenge is capacity and performance combined. We need more capacity in order to be able to absorb more data.
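The production-to-research loop just described can be sketched as a toy round trip: production serves results with whatever model it has, logs each utterance plus the caller's ID for research, and research periodically rebuilds per-caller models and pushes them back. Every structure and name below is a hypothetical stand-in, assuming string labels in place of real models:

```python
from collections import defaultdict

research_log = []                        # research side: incoming (caller_id, utterance)
models = defaultdict(lambda: "generic")  # production side: model served per caller

def recognize(caller_id, utterance):
    """Production: serve a result with the caller's current model, log for research."""
    result = f"[{models[caller_id]}] {utterance}"
    research_log.append((caller_id, utterance))  # data + ID passed to research
    return result

def retrain_and_push():
    """Research: rebuild per-caller models from logged data, push back to production."""
    for caller_id, _ in research_log:
        models[caller_id] = f"personalized-{caller_id}"

recognize("555-0100", "find the nearest restaurant")
retrain_and_push()
print(models["555-0100"])  # personalized-555-0100
```

The next call from that number now hits a personalized model, which is the "oh, I know this individual" behavior on the production side.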
However, at the same time, we also need a decent amount of CPU power to process it. The more CPUs you add hitting your storage, the more performance you have to add. So the challenge is to add storage devices that help you in that area. Typically, you can choose capacity or performance. The moment you say, I want capacity and performance, you're basically asking for the impossible.

Yeah, they're pulling in different directions.

That's right. And because of that, we always try to work with vendors, and we have an extremely good relationship with HP, where we expose our problem and they find ways to introduce features in the product that over time will help us. So it's not just buying what's on the market. It's also providing HP, through the engineering organization, enough information so that they really can help you.

Frederick, I want to ask you a question, just changing topics, about Nuance's back end when you think about data ingestion. Forget about HP for a second. Just from your company's perspective, how do you deal with all this new data that you want to store in the back end? Is there a data explosion that you're experiencing, and how are you handling it? For example, are you looking at things like Hadoop? What is your storage strategy relative to Hadoop? And is there a requirement to get that information out faster than the old data warehouse model, where you park it out in the hinterlands of the data center on tape and disk and get it back a couple of days later to run some reports? Now the requirement is real time; there's been an emphasis on real time. Can you share any input on what you're experiencing in the market?

Yeah, that's exactly right. The data explosion is forcing us to find innovative ways of processing that data. You mentioned Hadoop. Hadoop is a bunch of products combined.
I think the real component that's useful to us is MapReduce. What we're trying to do is, first of all, be able to absorb all that data, which is more of a physical problem; you just want to store it. The second problem is the processing, and that's where we really are.

Are you using HBase at all? Are you looking at HBase? Is there a database you've settled on?

Not really. Right now we're trying to figure out what we would like to use over the coming four to five years. There's no doubt that today there are products out there that will help us. But with the data explosion, the last thing we would like to do is get stuck with a product that works today but doesn't work tomorrow.

What do you think, just as an opinion, about the general consensus on databases? You've seen that cyclical movement again, where you have a unique purpose-built database and then it gets more general-purpose. You've seen OLAP, you've seen cubes, things like that, and then it gets back to SQL. So, structured versus unstructured: what's your take on the whole database situation?

Yeah, it's a tough question. The traditional database that exists today is a huge problem because it really doesn't scale. The only way I think a traditional database will survive is if storage becomes more SSD-based and more RAM-based. NoSQL definitely has a future, but for us there is no clear winner, right? We really don't know if using a NoSQL database is the answer.

But you do have, for example: Facebook, we talk to those guys, and they use a lot of unstructured data for their chat messages, homegrown solutions they had to build because there was no other solution. Do you have a similar kind of experience?

Yes, that's what I was referring to earlier with the platform.
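The MapReduce step Frederick singles out has the same shape whether it runs on a Hadoop cluster or in plain Python: map each utterance to word counts, then reduce by merging the partial tables. As a small sketch of that shape (the utterances and function names are made up for illustration), a job that builds word counts for a language model could look like:

```python
from collections import Counter
from functools import reduce

def map_utterance(text):
    """Map step: one utterance -> a table of word counts."""
    return Counter(text.lower().split())

def reduce_counts(a, b):
    """Reduce step: merge two partial count tables."""
    return a + b

utterances = ["find the nearest restaurant", "find the nearest movie"]
counts = reduce(reduce_counts, map(map_utterance, utterances))
print(counts["find"], counts["restaurant"])  # 2 1
```

Because the reduce is a simple associative merge, the same job can be spread over thousands of machines, which is what makes the approach attractive at petabyte scale.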
The platform is basically our software layer, where we pick and choose software components that are available and assemble our own workable environment.

So for the startups out there that are trying to create products, what's your advice to them, given that you would be a potential customer as they mature? What will you look for? You mentioned stability; is it obviously performance?

Yes, I think there are three main components: scalability, stability, and cost. Scalability really means the market is growing so fast that if you have a product but it doesn't scale, you're really not helping yourself. You always have to think at least six to twelve months ahead and try to envision where it's going to go if it's successful. Stability means that while you're scaling, you want to make sure the product doesn't fail. There are always issues you're going to have while you're growing, but you just need to be able to address them.

Well, for the folks watching right here on this live stream, Frederick was on a panel last night that Dave Vellante moderated during HP's big announcement. I asked a question from the audience, and I really thought your answer was one of the best, when I asked about the business models that are changing in the data center and what some of the challenges are. Your answer, which was essentially what you just said, was really right on: scalability is a really big deal, but so is the proverbial uptime and availability, which gets thrown around like a punch line by vendors, but ultimately you cannot be down.

That's right.

Can you elaborate on how critical that is?

It's very critical. The traditional terminology is that you have production and then you have research. We are nominally a research environment, but our uptime is actually better than our own production teams'. It's really a challenge.
But it all comes down to engineering and the flexibility you have to resolve problems. People always talk about redundancy. It's like having a plan A, which is, you know, the perfect world where everything works. Plan B is where you design for the typical issues. And plan C is, what do you do if a complete storage stack basically disappears out of your environment? Those are the really important components you have to think about.

Well, my final question, as we're getting the hook here and winding down, is this: what is on your agenda for this next year? You don't have to tell us any secret sauce or family jewels within Nuance from a company standpoint. But what are you watching, as someone who's in the business, who has to do the research and manage what you're doing? What trends are you watching closely that are really going to be high-impact for your business?

Well, from a business perspective, mobility and healthcare are the two big boomers at this point. From a technology perspective, I think storage will evolve really fast in the next 12 months, where a lot of people will come up with ways of, so to speak, getting rid of the mechanical device, which is really one of the components we need in a fast-growing environment. I'm also hoping that the Hadoop and equivalent market matures. What I mean by mature is that there are a lot of flavors out there. There's no doubt you can put something together, but really mature means you can have millions and millions of records going in per hour without having issues. There's no doubt you can use Hadoop; you do need the infrastructure and the support around it.

Well, thanks so much for coming on theCUBE. We heard from someone who's in the trenches dealing with it every day.
Frederick, sharing his stories and critical-path items like scale. Storage is obviously all the rage, and with SSD, Dave Donatelli was sharing the same thing. Obviously Hadoop maturity, totally agree. Great stuff. This is theCUBE. We go out and extract the signal from the noise. This is SiliconANGLE.com's exclusive coverage of HP Discover in Frankfurt, Germany. We'll be right back with our next guest after this short break.