Hi, welcome. Jeff Frick here in theCUBE's offices in Palo Alto. We're here for a CUBE Conversation, an ongoing series where we have some of the practitioners and technology folks here in Palo Alto and the Bay Area come by the office and give us an update. So we're excited to welcome John Blumenthal, the VP of Product Management at CloudPhysics, and I'm joined by our esteemed and often on-air David Floyer, co-founder and CTO of Wikibon. So welcome to theCUBE.

Thank you, Jeff.

So to get us going, why don't you give us a little update? I'll set it up a little. Before we came on air, we were talking about cloud and hybrid cloud, and how we throw these terms around like nothing, when really there's actual infrastructure behind the cloud, and actual people who have to manage that infrastructure: not only manage it, but put it in and, as you said, make a lot of decisions about the how. So that's what we're going to jump into, but first let's get the quick update on the company.

Yep. So, John Blumenthal, VP of Product Management at CloudPhysics and one of the co-founders. We recently GA'd our platform around VMworld, going into Q4 of last year. We subsequently raised a large round of funding to grow the company as we started to see onboarding success with a group of customers, as well as partners, in the first part of this year. And then, on our data platform, starting at the end of June we announced a very focused set of solutions, analytics and simulations around storage, that take on a lot of what I'll call classical and traditional storage management use cases with the new approach that our platform offers.

Okay, and how much was the round? Can you tell?

We did 15 million dollars.

15 million, and who were some of the investors?

Jafco came in as a new investor, and the Mayfield Fund continued, as well as Kleiner Perkins.

Okay, awesome. And how many people in the company now, approximately?

We're growing quickly. We're at 30 and growing fast. I actually recruited my former boss from my Veritas days, Jeff Hausmann, to come in and assume the CEO role to grow the company. Jeff has exceptional skills for this stage in a company's growth, and I get to focus on the product that I'm here to talk to you about today.

Excellent. Let's jump in. Dave, I'll let you dive in here.

Yes. So you sit at a very interesting nexus, as data centers grow more complex and move toward hybrid cloud, et cetera. What are the fundamental problems you've set out to address, and why is your approach different from how it's done today?

Yeah, so I think the nature of the problems in a data center has really started to change. The preceding client-server paradigm, I think, suffered from a lot of dramatic events associated with waste and risk, mainly because it was new; it was the first step off the mainframe, effectively. Now, in the cloud timeframe in which we operate, the drama associated with those events has gone away because of the fundamental advances that things like virtualization have brought to consolidating resources and supplying them to workloads in a way that is safer and more economically effective.
But I think what that's left us with is what you're saying: complexity, and death by a thousand cuts, because you can't deal with that complexity and see through it in a way that allows you to make really good decisions. The data center operations person, as well as the owner of the entire infrastructure, really suffers from a dearth of usable data. They are making a succession of decisions without understanding the relationships among those decisions and their consequences. As a result, you build something that's very efficient and effective, but as you get into the second and third generation of what you're building out and growing, those characteristics of efficiency, cost savings, and safety start to crumble, and the problems become apparent in operations, usually manifesting in a surprising way in the infrastructure you run. So we're here to solve these issues one by one, using data.

So obviously most environments are now much more virtualized, whether VMware or Hyper-V or whatever, or Docker in the future. That obviously means good things, because you're sharing the cost of the infrastructure, but at the same time the interaction among all those resources in the environment becomes much more complex. It's true of networking, though networking is not nearly so virtualized; it's obviously true of CPUs, which are heavily virtualized; and it's now becoming increasingly true of the storage layer. Storage has usually borne the brunt of that; storage usually has to prove it's not the villain. So what are you doing to help there, particularly for the storage administrator?

So storage is in the middle of a revolution, as your previous shows have highlighted, with the introduction of fundamental advances in media: flash and SSD. But the introduction of these new layers into the infrastructure contributes to the complexity, and to another set of decisions: how much do I purchase? How do I supply it to the contending workloads on this platform, and where do I place it? These are very complex infrastructure decisions that can no longer be resolved with the traditional approach of just throwing hardware at the problem. Budgets don't allow that anymore. There's, I think, a new discipline in the house that requires you, as a decision maker, to explain yourself quantitatively: to prove out why you want to introduce flash or SSD to resolve the historical contention you see in your environment. And this is where CloudPhysics steps in, because understanding your infrastructure in the face of what the storage industry can now offer you is a very complex set of decisions. Even at the start, at the procurement level, it's a complex question, and we've built services around helping you put your data to work to figure it out. It then extends to how you operate the infrastructure once you've introduced something new, and to how you interact with your vendors on an ongoing basis. We believe all of these things require data science to drive them. But we don't want to impose a requirement that you become a data scientist in order to reap the benefits of what your systems actually offer you.
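To make that "explain yourself quantitatively" idea concrete: a common first step in justifying a flash or SSD purchase is a working-set estimate, counting the distinct blocks a workload touches per interval to bound how much cache could possibly help it. Here is a minimal Python sketch under simplified assumptions; the (timestamp, block address) record layout and the 4 KB block size are illustrative, not CloudPhysics' actual trace format.

```python
from collections import namedtuple

IORecord = namedtuple("IORecord", "timestamp_s block_addr")

BLOCK_SIZE_BYTES = 4096  # illustrative block granularity, not a CloudPhysics constant

def working_set_bytes(trace, window_s):
    """Estimate working-set size per time window: the number of distinct
    blocks touched in each window, converted to bytes. This bounds how
    much flash/SSD cache the workload could usefully consume."""
    windows = {}
    for rec in trace:
        bucket = int(rec.timestamp_s // window_s)
        windows.setdefault(bucket, set()).add(rec.block_addr)
    return {b: len(blocks) * BLOCK_SIZE_BYTES
            for b, blocks in sorted(windows.items())}

# Example: a workload that re-reads a small hot set, then touches cold blocks.
trace = [IORecord(t, addr) for t, addr in
         [(0.1, 10), (0.5, 11), (0.9, 10), (1.2, 10), (1.8, 500), (2.4, 501)]]
for window, size in working_set_bytes(trace, window_s=1.0).items():
    print(f"window {window}: working set ~{size} bytes")
```

A workload whose working set flattens out at a modest size is a strong cache candidate; one that keeps touching new blocks is not, no matter how much flash you buy.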
So if I look at today's environments, there's an awful lot of management data coming from the specific storage arrays or the specific CPUs, and there are a lot of good tools from the specific storage vendors, a lot of management frameworks. How is your approach different? Are you taking advantage of that data? How are you linking to it? I presume you're taking a heterogeneous approach, so how are you dealing with that?

Because it also sounds counterintuitive when you say there's not enough good data, when, as you just mentioned, a lot of the stuff in the data center is spinning off all types of data. So I wonder if you can speak to that: there's a lot of data there, but it's not the right data. And then your approach of potentially going outside the building and getting a whole different set of data to help make some of these decisions.

You guys are hitting on what I think is one of the greatest ironies in our industry: the whole internet of things is really being evolved by the IT industry itself, and yet a data center, if you think of it as an internet of things, is one of the most heavily instrumented objects in the world. The problem is that the data is not processed and presented in a simple, timely, and coherent fashion to the people who are responsible for it. CloudPhysics arose to solve that problem: to harness this data in very unique ways, not that dissimilar from what the largest infrastructures in the world do with their data, and to put it to work for you in a way that doesn't impose superhuman requirements to understand what to do with it, how to apply it, and what it means. That's the challenge we're meeting right now in the market, specifically on storage.

But to your question, David, there are really two dimensions of uniqueness. First, we collect data that no one else really does today, in the form of workload tracing: getting you insight into the storage characteristics, the storage shape, of a workload, and of all the associated workloads running in your environment, to understand how they contend, how they overlap, how they interfere with each other. And we capture that literally at the top of the hypervisor, before all of this data hits the array, because it's too late, you're in reactive mode, once you start gathering data that has already hit the storage subsystem. The data we collect is all vSphere-driven, so our solution today is very much focused on VMware environments; our roadmap includes extending that to Hyper-V soon. But for today, the collection involves performance, configuration, tasks, events, everything that spins out of ESX via vSphere. In addition, we have these specific workload traces, which are very unique in displaying and visualizing the actual workload characteristics that the storage system has to support. We do that across thousands of data centers today.
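For readers who want a feel for the raw vSphere telemetry John describes, here is a minimal sketch using the open-source pyVmomi SDK to pull vSphere's 20-second performance samples for one VM. This is not CloudPhysics' collector, which does far more; it only illustrates the kind of data vSphere exposes. The host, credentials, VM selection, and choice of counter are placeholder assumptions.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder credentials; supply your own vCenter details.
ctx = ssl._create_unverified_context()  # lab use only; verify certs in production
si = SmartConnect(host="vcenter.example.com", user="readonly@vsphere.local",
                  pwd="secret", sslContext=ctx)
content = si.RetrieveContent()
perf = content.perfManager

# Map counter names (group.name.rollup) to their numeric IDs.
counters = {f"{c.groupInfo.key}.{c.nameInfo.key}.{c.rollupType}": c.key
            for c in perf.perfCounter}

# Grab the first VM in inventory (illustrative; pick yours deliberately).
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = view.view[0]

# Query real-time (20-second) samples of total disk latency for that VM.
spec = vim.PerformanceManager.QuerySpec(
    entity=vm,
    metricId=[vim.PerformanceManager.MetricId(
        counterId=counters["disk.maxTotalLatency.latest"], instance="")],
    intervalId=20,   # vSphere's real-time sampling interval
    maxSample=15)    # roughly the last five minutes
for result in perf.QueryPerf(querySpec=[spec]):
    for series in result.value:
        print(series.id.counterId, series.value)  # latency samples in ms
Disconnect(si)
```

Note the 20-second granularity: this is exactly the rollup interval John contrasts with millisecond-level workload traces later in the conversation.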
So the second unique value proposition arises from the SaaS orientation of our solution. These aggregated collections result in an anonymized data set representing, on a continuous basis, how systems are configured and how they perform against those configurations, so that what's interesting in your environment can be related, in a cohort, to what's happening in other systems in the wild: to discover new levels of efficiency, to discover optimized, real best practices that are now quantifiable, and to find risks and operational hazards that our algorithms discover and then publish to you, giving you direct relevance about the level of safety with which you might be operating, including issues that exist in the wild that you should be aware of but aren't.

So I want to explore those value propositions and understand them a little more. But just before we do, could you explain a bit more what you mean by a workload? Is that VMs, an application? Are you taking a top-down view rather than just a pure storage view?

So today it's a top-down view; I can also describe it as a middle view. If you think of your infrastructure as a kingdom, we start at the top: in our workload-tracing offerings we're collecting, effectively, the SCSI sessions happening between the guest operating system and the virtual disk it's interacting with. These are the SCSI client-server interactions, so we're collecting effectively the entire workload trace of what and how that particular application sitting in a virtual machine is consuming and interacting with the storage beneath it. That's what I characterize as the top of the kingdom. The middle kingdom is the hypervisor, because there is a lot that happens between the moment that IO hits the top of the virtual disk and when it is finally emitted out of the hypervisor and sent to the underlying storage subsystem. In between, there's a lot of stuff going on that various folks have characterized as a blender effect. So that transformation.

The IO tax.

There's a tax there, for sure. And a lot of it has to do with how you've configured your systems, the nature of your workloads, and the capabilities of the underlying storage system. This is beyond human comprehension, right? The combinatorics here are wild, and CloudPhysics exists to take that complexity, crunch it, and reduce it in a way that is meaningful to you, without requiring the exercise of understanding all the relationships, the physics, if you will, of how all this stuff actually works, and to present it to you for specific use cases that can be solved by applying data.

So it seems you've got two main buckets. One is operational, solving a specific operational problem, so maybe you could give a couple of examples of that. The other is more planning, which, if I understand it correctly, uses this large number of data sources coming in from a whole number of different companies, which you can then use to compare. Am I getting that right?

You really are, yeah.

So let's start with the operational one first. What would I expect that's different from what I have today?

Yeah, or what's the low-hanging fruit? What's the first ROI that people start to see, or the really easy thing that demonstrates the value of your service?
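As an aside, the "storage shape" of a workload from the tracing discussion above can be illustrated with a short sketch that reduces a raw IO trace to the summary numbers an architect cares about: read/write mix, average IO size, and latency percentiles. The record layout here is a stand-in, not the actual SCSI trace format.

```python
import statistics
from dataclasses import dataclass

@dataclass
class IO:
    is_read: bool
    size_bytes: int
    latency_ms: float

def characterize(trace):
    """Summarize an IO trace into its 'storage shape': read/write mix,
    average IO size, and latency percentiles."""
    reads = [io for io in trace if io.is_read]
    lat = sorted(io.latency_ms for io in trace)
    pct = lambda p: lat[min(len(lat) - 1, int(p / 100 * len(lat)))]
    return {
        "read_pct": 100 * len(reads) / len(trace),
        "avg_io_kb": statistics.mean(io.size_bytes for io in trace) / 1024,
        "p50_ms": pct(50),
        "p99_ms": pct(99),
    }

# Tiny made-up trace: small fast reads plus large writes, one slow outlier.
trace = [IO(True, 4096, 0.4), IO(True, 4096, 0.5), IO(False, 65536, 2.1),
         IO(True, 8192, 0.6), IO(False, 65536, 30.0)]
print(characterize(trace))
```

Even this toy summary shows why averages alone mislead: the p99 latency is dominated by the single outlier that an average would smooth away.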
So in storage in particular, one of the most onerous tasks thrown at you as a frontline operations admin is dealing with that incoming call that says the app is slow, and it's your fault because you virtualized the damn thing. So there's a time to innocence you're trying to achieve here. And to do that today is really ugly. You don't even know if storage is the culprit; literally, you don't know that. You're being accused of latencies that are producing a bad user experience or breaking a process, and the first thing you have to do is dig around to figure out which direction this is heading, where the root cause of this thing is. Most people, because of past experience and the nature of storage, go straight after storage. Our statistics, drawn from our data set, indicate that most of the time the root cause of application performance problems is in fact storage.

But it usually is the root cause.

It usually is.

So people default to that, right? It's not a bad reaction, then.

It isn't, but sometimes storage is involved only obliquely. I'll give you an example. Someone writes a really bad SQL query. The query spins on the CPU, and another application trying to access an underlying storage device is actually locked out. The real culprit is the CPU and the bad query, not the underlying storage system. So storage isn't always at fault, and how you go about absolving or condemning storage with one click is the essence of what we provide.

For example, today when you want to troubleshoot storage, you basically have to find out which virtual machine is being impacted, then what the configuration of that virtual machine to the underlying storage is, and then who else is sharing that storage, to figure out where the contention is and who is suffering from what that resource hog is committing. To do that today, you have to log into about four or five different tools: element managers, vCenter itself, your application performance management app. And each of these windows, typically tiled on your monitor, has a different time reference. In the APM tool you might be looking at milliseconds, but in vCenter you're looking at something like 20-second intervals. So you can have a demon in the millisecond range that is a ghost: it never shows up in the coarse view, because the latency spike lasts only a second or two and is never captured in that other window. You don't even have a normalized view across time; you have to stitch all this together in your mind. Really sophisticated users who have the time will export these data sets into a spreadsheet to go through a whack-a-mole exercise trying to figure out where the problem is.

So we consolidate all this, and we normalize all this. With our performance troubleshooting card (a card is the visual metaphor for our service), we present the ability to immediately see where latency is occurring and what its duration is; and right next to where that latency is depicted, for example on a particular datastore, you see who the culprit is and a listing of all the victims. And then, as a VMware admin, you can take really effective action right away.
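That millisecond "ghost" is easy to reproduce: take a latency series containing a real spike, average it into 20-second buckets the way a coarse monitoring chart rolls data up, and the spike disappears. A toy sketch with made-up numbers:

```python
def bucket_average(samples, bucket_s):
    """Average (timestamp_s, latency_ms) samples into fixed-width buckets,
    the way coarse monitoring views roll data up."""
    buckets = {}
    for t, lat in samples:
        buckets.setdefault(int(t // bucket_s), []).append(lat)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}

# 20 seconds of ~1 ms latency sampled every 100 ms, with one 400 ms spike at t=7s.
samples = [(t * 0.1, 1.0) for t in range(200)]
samples[70] = (7.0, 400.0)

print(max(lat for _, lat in samples))          # 400.0 -- the spike is real
print(bucket_average(samples, bucket_s=20.0))  # {0: ~3.0} -- the spike vanishes
```

Averaged over a 20-second window, a 400 ms spike leaves a trace of roughly 3 ms, which is why two tools looking at the same system can tell opposite stories.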
You can isolate that virtual machine, you can spread them out, you can change your DRS settings, you can change your Storage I/O Control settings. You don't have to reflexively buy more storage to try to make this problem go away. You can act with data as your guide to drive you to a better configuration, right? And that's an example of making a good decision, I think, that leads to better decisions, as you avoid purchasing more hardware, more storage, as a knee-jerk reaction to a problem that could be resolved if you simply had data supplied to you in a more cogent fashion.

So one of the areas being increasingly virtualized is database systems, and database systems are expensive; the database itself is by far and away the most expensive component. One core of that costs you $45,000.

Yep, plus.

So how could your system help in monitoring complex databases? Because this is where the locks really happen, isn't it?

Yep, yep.

How could your services help in optimizing the overall infrastructure?

So one of the advances in improving database performance in particular is the deployment of SSD. Say you're the infrastructure designer responsible for a large number of databases, workloads that already exist whose performance you want to improve. One of the key things you can do to begin your architecture exercise is to baseline your environment using our application, where the workloads are characterized by their read/write mix and the other dimensions of throughput and latency you're already seeing on these systems. Once you have that report of how your virtual machines are laid out and how their IO profile is represented, you can run our caching analytics service, which looks at those profiles and, in the course of tracing those workloads, runs them against our simulator to determine what the latency benefit would be if you introduced a form of acceleration. So this is a planning exercise based on your current environment: what if I add Fusion-io cards or an SSD, or introduce Pernix or some of the other newer approaches to storage acceleration, especially for high-end applications like a database? We contribute to understanding where to introduce those technologies and then how to configure them, because our simulator is very accurate in depicting not only what the latency benefit would be, but how that benefit changes as you grow or shrink the cache. You can do this for your initial planning exercise, and then, as you start to provision new virtual machines with new workloads, you can draw on this analytics service again to figure out where to best place a particular workload, because there may be cache available for it, or whether to allocate more cache for those types of workloads to improve the latencies you see across the system.

And then lastly, from the planning point of view, how are people using it to compare themselves with other people? That's always been a big problem. As you move toward the cloud, you can compare yourself very accurately on price with clouds, but can you compare yourself with clouds, for example, or with other people, in terms of the effectiveness and efficiency of your operation?

It's a great question.
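The cache-sizing simulation John describes can be sketched in miniature: replay a trace of block addresses through an LRU cache at several sizes and watch the hit ratio, a rough proxy for latency benefit, change with size. This is a generic LRU replay under simplified assumptions, not CloudPhysics' simulator, which models much more (write policy, real device latencies, and so on).

```python
from collections import OrderedDict

def lru_hit_ratio(block_trace, cache_blocks):
    """Replay a trace of block addresses through an LRU cache of the given
    size; the hit ratio approximates how much of the IO stream a flash
    tier that big could absorb."""
    cache, hits = OrderedDict(), 0
    for block in block_trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(block_trace)

# A hot set of 100 blocks re-read often, plus a 1000-block cold scan.
trace = [i % 100 for i in range(5000)] + list(range(1000, 2000))
for size in (50, 100, 500, 1000):
    print(f"{size:>5} blocks -> hit ratio {lru_hit_ratio(trace, size):.2f}")
```

Even this toy shows the knee in the curve: a cache just below the hot-set size yields almost nothing under LRU, one at the hot-set size captures most of the IO, and anything beyond it buys little. Finding that knee per workload is the point of the exercise.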
I mean, we have some forms of cost analysis and benchmarking today in some of our services. Those are growing, because our data set has become more meaningful because of its size. So you should expect to see from us very soon certain forms of comparison that allow you to look at how your resources are deployed and how effective those resources are relative to what other folks are actually doing. So it's not just simple configurations; it's actual resource consumption and the actual yield you're getting from your environment.

But what we have introduced recently, and what's gaining traction, are really two dimensions of this. One is cost calculations that support a real quantitative decision that I think CIOs are pushing the owners of infrastructure to figure out: am I economically better off if I move workloads that are running locally up into an infrastructure-as-a-service provider like VMware's vCHS or Microsoft Azure? How do you make that decision economically? We've produced calculators that allow you to analyze your local environment and workloads, and then run estimations and simulations against the service catalogs and costs offered through the provider's environment. So you can look at a particular workload, a group of workloads, or your entire data center with our service, and rapidly determine what it would cost to move some portion of your infrastructure, in hybrid cloud form, to those environments. We see that as a key part of the decision-making process that infrastructure engineers and architects face today, and we feel that decision has to be made quantitatively, and presented quantitatively to the people responsible for the financial structure.

Very interesting indeed. And then, just finally, where are you going in the future? What's happening?

Yeah, so our platform continues to become more robust. We're putting more forms of data on the platform, so you're going to see from us incorporation of the data you mentioned coming off storage platforms; you'll see it coming from other hypervisors; you'll see it coming from hybrid cloud services, through partnerships we seek to form with those entities. And so as we bring more data...

Are you looking at network? Are you looking at those resources as well?

Those are resources too, yeah. Our platform supports more than just the analysis of storage; we have just focused very heavily on that since our June release, and it's also the genetic makeup of the company. But we have other analytics and simulators that we're producing around network, CPU, and memory as well. Most of the focus you'll see from us in the coming few quarters, though, will be very much storage focused.

John, thanks for coming back. This is a huge topic, and we're getting the hook from the guys; we're going over time. But there's a lot going on, obviously a lot about the how for the people who have to make decisions and implement things like cloud and hybrid cloud and virtualization, because it's not as simple as the words we use. There are actual people in rooms making decisions and implementing real technology. And I think it's a really interesting way to gather data and profiles from the broader community. So there's a lot going on, a lot of development, and they've got all that money they can put to work. Go to CloudPhysics to learn more.
John, thanks again for stopping by for the conversation. David, always great to have you here in the office. Jeff Frick, at the Palo Alto offices of theCUBE. Thanks for joining us.