Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager of DATAVERSITY. We'd like to thank you for joining today's DM Radio deep dive, "Planets Aligned: The Irresistible Forces Pulling Big Data into the Cloud," sponsored today by Unravel. It is a deep dive continuing a conversation from a DM Radio broadcast a few weeks ago, which, if you missed it, you can listen to on demand at dmradio.biz under Podcasts. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. If you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the bottom right-hand corner for that feature. For questions, we will be collecting them via the Q&A section in the bottom right-hand corner of your screen, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag #DMRadio. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar. Now let me turn the webinar over to Eric Kavanagh, the host of DM Radio, to introduce today's webinar and speaker. Eric, hello and welcome.

Hello and welcome back. If you hand me the ball... there may be a latency issue. All right folks, here we are once again: a DM Radio deep dive. Yes indeed, this is your host, Eric Kavanagh. "Planets Aligned: The Irresistible Forces Pulling Big Data to the Cloud." It's going to be an astronomical show today folks, so let's go ahead and dive right in. So, the great migration. I've been tracking this for quite a long time now; I actually have my own data that I've been watching very closely. I do all of the marketing, or at least it goes through me, for the Bloor Group, which is of course a partner with DATAVERSITY in producing DM Radio these days and these deep dives, and I've been doing email marketing, tracked email marketing, for, gosh, almost 20 years now, since 1999. That's right, the last millennium is when I started using this stuff. And I expected cloud to be a hot topic for the enterprise by 2010 or 2012 at the latest, quite frankly, and it just wasn't. I could tell from the raw numbers I would get on open rates for email blasts: anything that said "cloud" in 2012, 2013, 2014, even into 2015, was just a snoozer. Nobody was interested. Everything, as far as they were concerned, was still on premises. It was around 2016 that things changed, and all of a sudden cloud became a pretty hot topic. I'm now referring to this as the great migration, and what I'm talking about is that it's real now: enterprises understand and appreciate that cloud is real, and you can look all over the place and see the major vendors jumping on board. But of course, things don't happen overnight, right? There's going to be a very, very long tail to on-premise applications and to on-premises data. Those data centers are not going to go away. In fact, I kind of predict there's going to be a bit of a backlash, I think it's already happening, and you're going to see certain aspects of the enterprise really resist this movement into the cloud. CFOs, for example, chief financial officers, they don't want their data in the cloud. But we're going to talk about big data today, and we all know that data has gravity. We're going to hear in a few minutes from George Demarest of Unravel Data, which is a really, really interesting company
that I've been tracking for a couple of years now at least, doing some very cool stuff to enable the leveraging of big data. But like I say, there's a long tail to legacy systems, and in fact I used, just yesterday, a quote I remember hearing about five years ago on DM Radio, when someone gave me a definition of legacy systems: a legacy system, defined as any system that's in production. I think that's kind of funny. So in the world of technology we often think the cool things are new; in reality they're often not very new. So I put together this timeline for enterprise cloud adoption, and it goes all the way back to '72, believe it or not. In 1972 IBM rolled out its first virtualization technology. It wasn't until 1997 that Yahoo email launched. Who remembers that? Remember Excite email? Yahoo and a bunch of folks all jumped on that bandwagon. Yahoo of course has been sticking around ever since; Excite is still out there, there are people on Excite, though I don't run across too many of those emails, quite frankly. It was 2006 that Rackspace released its cloud offering, the same year as Amazon Web Services. Well, goodness gracious, they made a tremendous foray into the cloud. I remember taking a briefing from them around 2011 and was amazed at how broad and deep their portfolio of applications was at that point. It's now 2019; obviously Amazon is the leader, they're absolutely a juggernaut, they are just kicking butt and taking names across the board. So in 2007 there was almost zero big data business in the cloud; it just wasn't there yet, the enterprise was not yet ready. And you can see in '08, Project Red Dog; '08 is also when Google Cloud Platform was launched. Now, you didn't hear about it a lot until the last couple of years, but for those of you who are in the industry, I can tell you one bellwether that shows that cloud adoption for the enterprise is real for Google: they're going out and grabbing some of the best enterprise software senior executives. Folks from Informatica, folks from Teradata, folks from some of the bigger vendors are going over to Google, and they're also going over to Amazon, so both of those companies are very, very serious about the cloud these days, as is another big company we'll talk about in just one second. 2009, Google Docs was released, and oh my goodness, how did that revolutionize collaboration? I love Google Docs. Of course Microsoft also has now gone into the cloud. 2013 is when Docker was launched; 2014, Kubernetes launched, and most of the people I've talked to say Kubernetes has won. And 2016, I would suggest, is the year that the great migration began in earnest. So there's Satya Nadella, Microsoft CEO. Why is he smiling? I credit Satya with really changing the vector of enterprise interest in cloud, because he put his company whole hog toward cloud with Microsoft Azure, and they have had a huge impact. The fact that they went all in on the Microsoft Azure cloud has really changed minds in the enterprise; I think that was the determining factor, quite frankly, and I do think it's kind of funny that we can thank Microsoft for saving us from the monopoly of Amazon Web Services. Well, cloud is now number one. That's a classic line from Jean-Luc Picard, remember when he gets taken over by the Borg: resistance is futile. I don't think there's any doubt about it, and there are lots and lots of reasons for that. Think of companies like Salesforce, for example; 20 years those guys have been around. Isn't the tallest building in San Francisco the Salesforce building? Of course Marc Benioff came out of Oracle, so he cut his teeth there.
Well, what does this all mean? Speed and performance is a huge factor. If you have these sort of on-premise environments and you're trying to leverage cloud computing, you're trying to leverage Salesforce, you're trying to leverage some of the marketing technologies that are out there... it's now the MarTech 7000, by the way, if you want to look that up; it's a very fun thing to explore. MarTech as in marketing technology. I think about 10 or 11 years ago it was the MarTech 150, then the MarTech 250, then the MarTech 350, then 500, then 1,000, then 2,000, then 3,000, then 5,000, and now it's the MarTech 7000, meaning there are 7,000 companies doing sales and marketing automation as software as a service. That is a staggering number. There is a tremendous amount of data out there that you can leverage for your business: to get that complete customer view, for example, to understand what the market opportunities are, to get your messaging out there, to sell stuff. There's also this interesting dynamic of CapEx to OpEx, right? It used to be that we would get data warehouses built with capital expenditures: millions of dollars would be set aside for an 18-month or two-year project to build an enterprise data warehouse. Well, companies like Snowflake have come along and just dramatically transformed that reality, and of course Amazon Web Services with Redshift. I actually remember the company whose technology became the kernel of Redshift, a company called ParAccel. Barry Zane, Rick Glick, what, 10 years ago, were working on this technology. They struck a deal with Amazon; ParAccel actually went away, Redshift is here to stay, and that's a whole movement right now, data warehousing in the cloud. So there's another force moving us toward cloud computing, and I would suggest that the days of those CapEx projects really are numbered. So this movement to the cloud, I'm suggesting, really should be a front-and-center, all-hands-on-deck type of activity, to make sure you get the right data into the cloud in the right fashion and manage that process accordingly. Think about all the different major vendors that are in the cloud now: Salesforce, Amazon, Oracle, Microsoft Azure, SAP with its cloud platform, IBM of course, Google with the Google Cloud Platform, we mentioned that, Rackspace, and there are more coming every day. The cloud is the new center of gravity; we're going to hear that in just a moment from George Demarest of Unravel Data. So I was thinking about how you manage these environments. This is actually a fun little slide, just an image of a bee with a transponder on it. IoT is such a huge space these days, the Internet of Things, being able to track all the different objects that are out there, whether it's cars or mobile phones, whatever the case may be. Think about manufacturing, think about oil rigs, think about all of these different use cases for the Internet of Things. It's a huge space right now, and there's a ton of data flying around. If you're trying to keep track of 10,000 vehicles, or 150,000 mobile phones, or five million mobile phones, for example, you've got a mountain of telemetry data flying around. Well, that's going to live in the cloud. Let's just be honest: that data, going forward, is going to start in the cloud and it's going to stay in the cloud. Being able to manage all that stuff is going to be a real challenge, and I can tell you, we'll talk about this for just a couple quick seconds here, systems management has always been hard.
If you go back even 20 years or so, one of the main drivers for dealing with systems management was, wait for it, Christmas, because the holiday season would have such huge spikes in traffic on these e-commerce sites. That's where a lot of the technology we get these days came from, even if you talk about containers, for example, or workload balancing: all these major developments in e-commerce and in web-based scalable applications and infrastructure really spun out of the holiday season, because companies needed some way to handle these massive spikes in traffic without choking off their customers and their ability to sell their wares. So I just think it's fun that Christmas and the holiday season is really what helped us along. But the point is that 20 years ago systems management was difficult, 10 years ago systems management was difficult, and these days, when you start dealing with all of the major scale-out applications we'll hear about in a minute, Kafka, for example, which of course spun out of LinkedIn, Kubernetes, all these different scalable big data technologies, Hadoop, for example, even though we don't hear the word as much, Hadoop is still rocking and rolling out there, these are incredibly complex environments. Being able to do systems management, troubleshooting, workload balancing, being able to estimate the right number of servers you're going to need, the right number of nodes to handle all this stuff, it's really difficult stuff. Typically you're just looking at histograms, and it actually requires a whole lot of knowledge on the part of the end user to piece that stuff together: when CPU usage goes up, when there's a network bottleneck, for example, knowing why that happens and doing that kind of troubleshooting has always been very, very challenging. That actually got me thinking. I don't know if you all remember from calculus class, but in calculus you have to recognize the nature of the problem you're trying to solve and then apply the appropriate tool or the appropriate formula to kind of unravel it, there's a little pun for you. And it got me thinking that's kind of like systems management, the way it's been over all these years. Well, something is changing, something big. It's a combination of artificial intelligence and just some really clever architecture and really clever thinking on the part of some innovators, and I would suggest to you that systems management going forward is going to be a different game. We'll talk about this on DM Radio all throughout the year in various capacities. AI is changing everything, big data is a huge driver, and I would argue that the days of the old-fashioned way of troubleshooting and of managing complex information environments are changing, and a new day is dawning. And with that I want to hand it over to George Demarest of Unravel Data, a very, very interesting company doing some cool stuff. George, I'm going to try to hand you the ball right there, and you now have the ball. Show us what you've got, George.

Thank you, Eric, thank you everybody, thank you for joining us. Eric, can you confirm you see my slide? Yep, looks good. Okay. There's me; actually, if you move around you'll see that my eyes follow you in that photo. So yes, I'm from Unravel. We are an AIOps company for big data; we do performance management and troubleshooting and all sorts of things for big data environments. So thank you for the opportunity to talk with you today. The topic is big data moving to the cloud.
There are a lot of reasons why, generally, people are moving to the cloud, but there are some particular forces at work that make big data especially suitable to running in the cloud, and that's what I'm going to talk about today, along with a little bit about some of that next-generation intelligence that Eric just spoke about. So that alignment, as Eric has said, is already underway. The most recent Cisco cloud index shows that by 2021, 95 percent of all data center traffic will come from the cloud: huge growth in data center traffic, and also a larger and larger proportion of companies that have either cloud-first or cloud-only strategies. You have newer companies that start in the cloud and stay there. So this is already happening, and for all the reasons that Eric mentioned: systems management is nicely taken care of in the cloud environment, and it is simple to set up, or fairly simple to set up. Cisco's projection is that cloud computing will replace traditional data centers within three years; that seems a bit aggressive to me, but you get the idea. This is the gravity that is pulling really all companies, but in the big data world gravity is especially important. You've probably heard people talk about the fact that, especially in big data, data has mass, it has gravity, and there's also a certain amount of organizational and technological inertia about moving to the cloud. People are spending a lot of time analyzing this; they have come up with formulae about application mass based on data volumes, data density, CPU utilization, memory, and disk usage, such that data gravity is a function of the mass of data and applications and the number of requests per second; latency and locality of reference are always part of the equation, along with just sheer data volumes. And then finally, much of our data right now, most of our data, and in the future a vast majority of data, will be originated in the cloud. The sensitive data you spoke about, from CFOs and whatnot, will most likely, for many, remain behind the firewall in data centers, but big data is going to originate from the cloud. So a lot of people are asking themselves, why am I moving massive amounts of data inside my firewall, especially data that's not particularly sensitive? I mean, if you have 10 terabytes of telemetry data, how sensitive is that? You need to write applications to make any sense of it. People are doing a lot of calculations about how much it costs to get data in and out of the cloud, and of course the cloud vendors make it as inexpensive as possible to move data into the cloud, but you need to pay to get it out. So there are also those types of forces at work that make moving big data operations and applications to the cloud attractive. Now, big data itself is still fairly young. It has been around, and we know tons more than we used to know, but it really has to be remembered that only six or seven years ago did Hadoop come out of the shadows from Yahoo and companies like that, Google and so forth; these types of technologies were used by just those web and cloud giants.
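To make the data-gravity idea George sketches here a little more tangible, here is a tiny, purely illustrative scoring function. The shape of the formula (mass of data and applications amplified by request rate, weakened by latency) follows the factors he lists, but the exact weighting is an assumption for illustration only, not a published standard or anyone's product logic.

```python
# Illustrative "data gravity" score built from the factors mentioned above:
# data mass, application mass, requests per second, and latency.
# The formula's shape and weights are assumptions for illustration only.

def data_gravity_score(data_gb: float,
                       app_memory_gb: float,
                       requests_per_sec: float,
                       avg_latency_ms: float) -> float:
    """Higher score = stronger pull to keep the compute near the data."""
    data_mass = data_gb            # proxy for data volume/density
    app_mass = app_memory_gb       # proxy for application footprint
    # Request rate amplifies gravity; latency (distance) weakens it,
    # in an inverse-square style falloff.
    return (data_mass * app_mass * requests_per_sec) / (avg_latency_ms ** 2)

# Example: 10 TB of cloud-originated telemetry, processed near vs. far.
near = data_gravity_score(10_000, 256, 5_000, avg_latency_ms=2)
far = data_gravity_score(10_000, 256, 5_000, avg_latency_ms=40)
print(f"gravity near the data: {near:,.0f}  vs. far away: {far:,.0f}")
```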
The problem is that when you get data volumes of terabytes or petabytes or exabytes, it is basically data chaos. So what we try to do in IT, of course, is create some order out of that chaos, and the first entries into big data were fairly narrow: they offered one level of abstraction and they served very specific use cases, search result indexing and counting; IoT is a later example, but it was lots of data just being organized and counted. So, in the world of astronomy especially, before telescopes and before science, the constellations were a way to look at the stars and see some order in that vast chaos. The big data world has grown up especially around open source projects, many that originate out of Google and Yahoo and so forth, and you can see them on your screen now: Apache Spark, Kafka, also the cloud providers themselves, plus Impala and Hadoop and HBase and so forth, a whole set of constellations around the big data environment. And the problem is that in doing management, performance tuning, and troubleshooting, there has not been a way, until very recently, to make the connections between the different components and understand how those connections affect performance. So this is an example of an architecture of distributed systems that new data applications are built upon: data ingest, data collection, batch processing, ETL, data prep, and then finally analytics, and more lately AI and machine learning and IoT and so forth. There are a million permutations of this, but a lot of people are looking at a big data stack like this. In order to get order out of that chaos... the image here is actually a 3D graphic of the known universe, but it doesn't really work unless you have the metadata that brings some order to the chaos of the universe, and something similar is happening in the big data world. The challenge is that when you have multiple distributed systems, Spark, Impala, Kafka, Hadoop, all distributed systems, all trying to work together, clusters of clusters and so forth, it becomes very hard to identify a root cause, let's say when my application has failed, and it's very difficult to define realistic SLAs because of this complexity in managing and tuning distributed environments. That in turn leads to a lot of finger pointing between developers, architects, ops people, cloud providers, and so forth. It is very hard to identify a root cause because there are so many contributing factors beyond just the multiple distributed clusters: it could be container issues, it could be data and file formats, it could be some infrastructure or hardware problem, it could be scheduling, it could be wrong network settings, you might have the data laid out incorrectly, there might be bugs in the code. That is the really confounding nature of big data, and it also makes big data a big area of frustration for CIOs, because it's very hard to predict and very hard to know if you're getting the optimum performance out of your big data environment. CEOs and so forth love the idea of big data and machine learning and AI and everything it brings, but practically speaking it is a very complex problem; it is a hornet's nest to navigate for IT ops people, for application architects, and so forth.
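To make that "so many contributing factors" point concrete, here is a deliberately tiny sketch of the kind of cross-component correlation an expert system has to do: gather signals from several layers (containers, shuffle, data layout) and rank which ones co-occur with failed runs. The signal names and the failure log below are hypothetical, purely illustrative of the idea rather than any vendor's actual logic.

```python
# Toy root-cause ranking across layers of a distributed stack.
# The signal names and the failed-run log below are hypothetical.
from collections import Counter

failed_runs = [
    {"container_oom", "high_gc_time"},
    {"skewed_partition", "high_shuffle_spill"},
    {"container_oom", "small_container_size"},
    {"container_oom", "high_gc_time", "small_container_size"},
]

def rank_suspects(runs):
    """Count how often each signal co-occurs with a failure and rank them."""
    counts = Counter(sig for run in runs for sig in run)
    return counts.most_common()

for signal, hits in rank_suspects(failed_runs):
    print(f"{signal:22s} seen in {hits}/{len(failed_runs)} failed runs")
```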
The other reason that people move to the cloud, of course, is just performance. Once you've run up against that computing boundary on your largest computer and need more computing power, you're kind of stuck. That is why clusters have always mattered; since I started my career with VAX clusters back in, well, I don't even want to say the year, we've all known that cluster computing is the ultimate goal of computing: tens, hundreds, thousands of computers working in harmony, working on compute problems in a coordinated fashion. It has been done on the macro level with SMP systems, scaling within a single machine, then in small clusters, and now in the cloud. The impulse, of course, is to amass compute power so that you can exceed the speed of any individual computer, and that is what you get in the cloud environment, in order to get acceptable results, or real-time AI, for instance, which is one of the hardest compute problems to solve. Also, scaling AI and machine learning to millions or hundreds of millions of users can really only be addressed by distributed computing clusters, and that is the essence of big data: it is a distributed computing problem with distributed data. So I think people in the big data world all sense that we are heading in the inevitable direction of computing that we've been working toward for 20 years or longer. We now have the cloud, which offers these clusters, and I know this slide will hurt your eyes for a bit, but just give me a second: cloud vendors are providing a lot of tools and a lot of capabilities to make these computers run in an automated fashion and to spin up large clusters very quickly. But just automating systems management is not enough, especially in the big data world; it is increasingly clear that you need context along with automation, and that means AI and ML. Some key areas of big data that require automation informed by AI: first, application auto-tuning. Application tuning, as I mentioned, is hugely difficult, and if you have a hundred applications running on a thousand-node cluster, trying to tune individual applications manually is more and more becoming a fool's errand. There has to be better automation, and it has to be informed, so that silly human errors, one tunable parameter set incorrectly, or the wrong Spark shuffle configuration, or what have you, don't sink you; if that is not informed by better intelligence, or an expert system like Unravel, then you're really in trouble. Same thing for root-causing problems, take the example of a failed application: if you have a Spark application running on 20 or 200 or 2,000 nodes, you can imagine the complexity of trying to track down code failures, improperly set memory boundaries, container issues, and so forth. I spoke before about SLAs in the big data world. For a long while, really even only three or four years ago, there was kind of a prevailing opinion in IT circles that big data just isn't a production technology, that it's either experimental or that people will tolerate its failures because the potential gain, the potential new revenue streams, the potential of big data is so great. But nowadays, with big data becoming more proven, more relied upon, SLAs become a much bigger deal, and of course your ops teams live in an SLA world.
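To ground the tuning knobs George mentions a moment earlier (a wrong shuffle configuration, improperly set memory boundaries), here is a minimal PySpark sketch of the handful of parameters he is talking about. The settings are real Spark configuration keys, but the values chosen are illustrative assumptions, not recommendations; the point is only that these few knobs are exactly what an informed auto-tuner reasons about.

```python
# A minimal PySpark session showing classic "tunable parameters".
# The values are illustrative assumptions; an auto-tuner would derive
# them from observed runs rather than hard-coding them like this.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tuning-knobs-example")
    # Shuffle parallelism: the default (200) is often wrong for a given
    # data volume; too low means huge partitions, too high means overhead.
    .config("spark.sql.shuffle.partitions", "400")
    # Executor sizing: heap, cores, and off-heap overhead per executor.
    .config("spark.executor.memory", "8g")
    .config("spark.executor.cores", "4")
    .config("spark.executor.memoryOverhead", "1g")
    .getOrCreate()
)

df = spark.range(0, 10_000_000)
# A shuffle-heavy aggregation whose runtime is sensitive to the knobs above.
print(df.groupBy((df.id % 1000).alias("bucket")).count().count())
spark.stop()
```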
So SLA management, especially for applications with data in motion, streaming technologies, and so forth, is also very difficult to get serious about if it is not guided by machine intelligence, and these are increasingly important big data, or modern data, applications. Or how about optimizing the cluster itself? You get some basic usage information about your cluster, but what if you were able to monitor your workloads, have your cluster learn about those workloads and adjust itself, or have it make recommendations for you to reconfigure container sizes or cloud instance types or what have you? There are a lot of different moving parts in the cluster, and trying to get the most out of the compute power, the memory, and the data throughput really requires machine intelligence. It's an example of how running big data and AI and ML applications, and trying to tune them, itself becomes a big data problem, which is why companies like Unravel are using data science, machine learning, AI, and really a raft of other analytics approaches. AI is the easy buzzword, but there are many different algorithmic and analytic approaches to, for instance, auto-tuning applications. Applying AI to root cause analysis requires collecting the data, creating the models, training the models, and then running against your production environment, creating predictive models in order to be able to tune the cluster, or the application, or a stage of the application. In a modern data pipeline it may be, say, five stages of Spark with some callouts to other processing engines and so forth. So creating a data model is becoming really critical to understanding the running state of a cluster and of big data applications, and that, in the end, is what we do. So, a little bit of a commercial for Unravel, I hope you'll pardon me. Unravel is five years in, with DNA from Cloudera; our founders include PhDs from Duke University, very smart people; we have input from Hortonworks people, from Cloudera and AppDynamics; we are well funded; we have some great customers; and we have more than 50,000 nodes being supported now, with our investors noted below. Our solution, in a nutshell, is performance management for modern data applications, or big data, and unlike individual AIOps technologies that focus on a particular component or problem, we monitor, tune, and automatically troubleshoot the full big data stack that I showed you before. We use AI and other analytic technologies to monitor and optimize resources and costs. One thing especially relevant to today's discussion is migration to the cloud from data centers: we have some specialized intelligence that can help you move to the cloud, and then, as I mentioned, automatic tuning and remediation. So that's Unravel in a nutshell. Here's our product architecture, which looks a lot like a big data application architecture, but you see on the left side that the white box is our data source. Our data sources are the applications themselves, the platforms and technologies: Spark, Kafka, Cloudera, Impala, Hadoop, NoSQL, SQL, etc. And even though our topic today is cloud, we support cloud, on-premise, hybrid, and multi-cloud environments, the same product across all of them. What we do is collect, in a very efficient way, data from all of these environments.
In our data collection process, we then build a correlated data model. It is a dynamic data model that is constantly being updated depending on the state of the cluster and the state of running applications, and then we have a number of intelligence and automation engines that we apply to that data. I've spoken about the analytics we run; we also have an automation engine, a tuning engine, and an inference engine, and I'll show you a couple of screenshots: we actually make detailed recommendations for Spark, for MapReduce, for Hadoop, and so forth. Then, finally, the output of all that automation and intelligence is displayed in dashboards, or we can take automatic action, to auto-tune applications or to kill a job if it's running amok in the cluster. We can do smart alerting over Slack or PagerDuty or email, we do reporting, and as I mentioned, we give very specific recommendations on running applications and the cluster operating environment. So that's our architecture. Like I said, it looks like a classic big data pipeline in a sense, and that really is what it is: we are a big data company tackling big data with AI and ML and other analytics. Just to show you the problems we solve: we spoke about APM for big data, and we also fall into this AIOps category that Gartner talks about, but specifically for the big data environment, the full stack; and I mentioned cloud migration, optimization, and troubleshooting. I think I've covered this a little bit, but let me show you specifically what that looks like. From an application context, the screen grab on the left is the auto-tune pop-out from the product. You can see the tuning recommendations on the top right, MapReduce parameters, Spark parameters; we give you recommendations for the optimum value, not just based on a set of rules, not just a rules engine, but also based upon the running characteristics of the application. For instance, Unravel has a feature called sessions, which enables you to run your big data applications a number of times while we progressively collect more and more data, so that we can make more and more targeted and accurate recommendations and auto-tuning for that application. We provide error views and all sorts of analytics that I just don't have time to talk about. But then we also look at the operational level of the cluster, so we do cluster optimization; you can see here a recommendation for Hive queries. We also provide predictive capabilities, capacity planning, and forecasting. That is especially important for the cloud, as I'm going to show you in one second, but it also helps you understand how much compute power you need and how much you're spending, either on-prem or in the cloud. And here are the cloud capabilities in particular; this is only one angle on Unravel, but it's very specific to cloud, which is why I have included it. So, from the top left: in order to really do the best job of migrating your applications, Unravel helps you understand your current on-premises workloads. You can run the Unravel cluster discovery report and get a readout of how many cores, how much memory, and how much data is running in your data center.
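The sessions feature George describes, running the same application repeatedly so each run sharpens the next recommendation, can be pictured as a simple loop. The sketch below is a toy version of that general pattern; the function names (run_job, suggest_config) and the runtime model are hypothetical and are not Unravel's API.

```python
# Toy sketch of session-style progressive tuning: each run of the job adds
# an observation, and the next configuration is derived from all runs so far.
# run_job() is a stand-in for a real job submission; it is entirely simulated.
import random

def run_job(shuffle_partitions: int) -> float:
    """Pretend runtime (seconds) for a setting; noisy, with a sweet spot."""
    return abs(shuffle_partitions - 400) / 10 + random.uniform(0, 5) + 30

def suggest_config(history):
    """Very naive recommender: move toward the best setting seen so far."""
    best_setting, _ = min(history, key=lambda h: h[1])
    return int(best_setting * random.uniform(0.8, 1.2))

history = [(200, run_job(200))]        # session run 1 with a default setting
for session_run in range(2, 6):        # runs 2..5 refine the guess
    candidate = suggest_config(history)
    history.append((candidate, run_job(candidate)))
    print(f"run {session_run}: partitions={candidate}, "
          f"runtime={history[-1][1]:.1f}s")
```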
From that readout, we can then provide some analytics to clue you in on which applications are the most likely, or the most profitable, if you will, to move to the cloud. For instance, apps that are bursty, or apps that fail all the time, maybe because they're contending with other applications in your cluster; the cloud gives you the ability to isolate applications, and that's a good reason why bursty applications or resource-hog applications are good candidates. We can also show you chargeback reporting, that's the third panel, in the center, you can see the colorful circles there, by application; you can tag your applications, so it can be by department, by users, and so forth, very configurable. And then we provide intelligence to map your on-premises cluster to a deployment in the cloud, and that includes AWS, Azure, and GCP. We can give you recommendations on which cloud instance types and how many nodes you need; it calculates the number of vCores and the amount of memory available to you, and we give you a number of different looks. If you want to just do a lift and shift to the cloud, we provide guidance; if you want to optimize for cost, or if you want to use our advanced analytics and AI to right-size for your particular workloads, we provide that capability as well. And then finally, some of the classic APM capabilities we offer enable you to track the migration and its success, how much you're saving, and so forth, and also, of course, application usage of cloud resources, so you always know which users, which applications, which data pipelines are using the most resources. So that is a look at what Unravel looks like.
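The mapping step George just walked through, translating on-prem cores and memory into cloud instance counts, is at heart a sizing calculation. Here is a back-of-the-envelope sketch of that arithmetic; the instance names, specs, and prices are made-up placeholders for illustration, not quotes from any cloud provider, and a real recommendation would also weigh workload shape, not just totals.

```python
# Back-of-the-envelope mapping of an on-prem cluster onto cloud instances.
# Instance specs and prices below are made-up placeholders for illustration.
import math

onprem = {"vcores": 960, "memory_gb": 7_680}   # e.g. 40 nodes x 24 cores x 192 GB

instance_types = {
    "general.xlarge": {"vcores": 16, "memory_gb": 64,  "usd_per_hour": 0.80},
    "memory.2xlarge": {"vcores": 16, "memory_gb": 128, "usd_per_hour": 1.10},
}

def nodes_needed(cluster, inst):
    """Enough instances to cover both the vCore and the memory requirement."""
    by_cpu = math.ceil(cluster["vcores"] / inst["vcores"])
    by_mem = math.ceil(cluster["memory_gb"] / inst["memory_gb"])
    return max(by_cpu, by_mem)

for name, spec in instance_types.items():
    n = nodes_needed(onprem, spec)
    print(f"{name}: {n} nodes, ~${n * spec['usd_per_hour'] * 730:,.0f}/month")
```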
Just to sum it up, the benefit of having AI applied to big data is that it removes the blind spots, removes the unknown unknowns in the big data ecosystem. That means that troubleshooting, which in the big data world it is not unusual to see take weeks or even longer, gets cut down drastically. We have customers that have cut their troubleshooting time literally from weeks, to troubleshoot what might turn out to be a simple configuration problem with Kafka or Spark, down to instant results that show you, oh, that user is allocating way too much memory, or your container sizes are way out of whack; things you could eventually figure out, but isn't it better if you have an AI tool like Unravel to help you? So, a 98 percent reduction in troubleshooting time. You can set SLAs and meet SLAs, and Unravel will even give you recommendations: okay, I need to hit 100 percent of my SLA, or 95, or 80, or whatnot; sometimes you can tolerate a little bit of SLA violation, and Unravel enables you to be very custom about your SLAs. And then, eventually, that means that if you're getting the most out of your cloud resources, getting optimum usage of your compute power, memory, data, and so forth, your cloud costs are going to be sensibly reduced, in many cases by orders of magnitude with our customers, but easily by 60 percent across the board. So that is Unravel, and that really brings us to the end of the prepared section of this presentation. The planets are aligning for big data in the cloud. The irresistible forces are, of course, the gravity, the mass, the fact that it's expensive, difficult, and slow to move data around. Because data is being originated in the cloud, a lot of people are going to elect to leave the data in the cloud and have the processing done there. And finally, clustered systems are great: there are probably close to 10,000 huge big data clusters around the world, and in the next five or so years that is going to multiply. Many more customers, yourselves probably included, are going to be running much larger clusters, because the automation and the intelligence will be there to go faster than the speed of light, so to speak. That's really where we fit in, and those are the irresistible forces pulling big data to the cloud. And with that, Eric, I've kind of ended my section of the presentation. What have you got?

Sure, sure thing. So, it seems to me that the obvious selling point here is the remarkable complexity of these environments, right? You're just not going to be able to use traditional systems management technology to track across these clusters, because (a) there's so much data, but (b) there are so many potential variations, and that's why you want algorithms running in the background all the time, tracking all these developments and finding the signatures of problems, right? That's really what it boils down to: find the profile of a particular issue, save it, and then recognize it the next time it comes around, right?

Yeah, exactly. The problem is just becoming too big for manual intervention, and I think most people know that the end game of automation is so you can do more. With big data, when you're talking regularly now about hundreds or thousands of cluster nodes, it's just not practical anymore to manually intervene; you need high levels of automation that are guided by machine intelligence.

Yeah, and back to that calculus metaphor: it seems to me that this is the answer for moving us beyond that phase, because unless you're a math genius, calculus is actually pretty difficult, and I would argue that systems management, even with some of the best technologies a few years ago, was very difficult, because again you had to have an end user who could really understand what a spike in CPU usage means relative to this bottleneck in network traffic, relative to this disk fragmentation issue or whatever. You would just be looking at histograms and trying to piece together in your mind what all that stuff means, whereas by leveraging AI under the covers you're able to scan, and you talk about sessions, we should maybe go into that for a little bit, you're able to scan these environments hundreds and thousands of times and look for patterns and variation and spikes and valleys and so forth, and it's always in those strange-seeming anomalies that you're going to find the trouble spots. But again, a human being trying to do that manually? It's just not going to happen, right? Can you talk about the importance of sessions and the importance of gathering so much data and being able to analyze so much data at scale?

Yeah, yeah. So, I'm also looking at the question that came in about the cloud being appropriate for applications with data spikes and so forth. The challenge is that CFOs and CEOs, and CIOs to a degree, are more and more being directed to be either cloud-first or cloud-preferred or cloud-only, so that really all points to the same thing: people
need to make cloud as fundamental to their IT strategy as any other IT technology.

And we talked about the MarTech 7000 just briefly; for those who aren't familiar, again, that's 7,000 different applications for sales and marketing optimization. There's a whole bunch of new stuff these days, like influencer marketing, for example. But just think about tapping into the Twitter API or a Facebook API or the LinkedIn API, and for those who don't know, Kafka came out of LinkedIn, so Apache Kafka was the engine that actually drove LinkedIn; it's a giant messaging bus, basically. All of these big data environments have their own unique architectures and their own signatures, so to speak. So what Unravel is able to do is basically tap into lots of these different environments and give the end user, the company that's leveraging this technology, a view that's sort of a single pane of glass across incredibly diverse and topographically disparate environments, right?

Yep, for sure. So I see another question. Eric, do you want to read the question, or do you want me to? Sure, I'll read it out: with much of the management being done by the cloud provider, is Unravel as useful in the cloud as it is for on-prem data centers? That's a good question. So I want to re-emphasize that we are not a systems management company. We do collect intelligence around the infrastructure, but the cloud providers do provide some great systems management tools and monitoring of that environment. We're also not just a monitoring technology; we are focused on the automation, and also on collecting intelligence about running applications, and they may have many different components: it might be Kafka, might be HDFS, might be MapReduce, might be Spark, might be Tez, lots of different big data components. None of the cloud providers are providing any kind of real intelligence about that many components, much less trying to correlate their behaviors and their performance, and one thing we know for sure is that it is a very difficult problem; it takes a lot of engineering and a lot of intelligence to do that. So yes, the cloud providers do systems management and they do some monitoring, but they're not going to auto-tune your data pipeline that may have many different distributed components. And even the individual management components from the big data vendors themselves, Cloudera and MapR and Hortonworks, have some nice tools to monitor Hadoop and the cluster environment, but they are not doing the cross-environment kinds of intelligence gathering and automation and auto-tuning that we do. So we are pretty unique in the industry. We started out as basically AI for Hadoop technology, but we have branched out now to cover the entire big data stack, and we are adding new engines, so you'll see some new stuff coming out of Unravel that monitors AWS technologies, that monitors Azure big data technologies; I can't pre-announce them or my product manager will kill me. The nice thing about our approach is that because it's really kind of a big data approach, we can continue to add engines, and we can continue to add finer-grained intelligence about Spark and about Kafka and about HDFS and so forth, so we can go very deep and provide a very high-definition picture of your environment that you just can't get other ways.
And the debugging of those types of environments means pulling down system or application logs and other operational metadata from the cluster itself, garbage collection and JVM stuff, and all of that is analyzed by Unravel; as far as we know, no one else is doing it quite the same way.

Yeah, and I'm reminded of a couple of things. One, you mentioned Spark a few times, and Spark of course really took the industry by storm a number of years ago and provided a tremendous framework for leveraging big data applications. But Spark has its own challenges, right? Memory usage is a challenge, and when Spark jobs fail, typically they're failing in really serious, critical environments, right? So being able to understand the nuances of when Spark will fail, of when these jobs will not complete, is really important for keeping everything running. You talked about troubleshooting going down by 98 percent; that's just a staggering number. But the key is that because you're collecting data from all of these different data points, and because you're able to then correlate them, and, in the background, use machine learning to scan over and over and over again, with a lot of these technologies you'll be able to identify the profile of very complex problems, and then over time you're building up a repository of signatures. Is that right?

That's right. That's really what's behind the data model I spoke of, so that as we learn more and more about the Spark environment, Spark as a platform component as well as running Spark applications, we fill in more and more of the picture. And given that Spark example, anyone who's been dealing with Spark knows that a lot of failures really come from kind of mundane things, like improper memory management or just shuffle boundaries, things that, if you check beforehand, you might be able to catch, but oftentimes, especially if you have a portfolio of dozens of applications you need to be running and managing, a lot of these things just won't get caught. So that's the removing-human-error part of AI: there are simple things, and some not so simple, but things that are very commonplace, and Spark failures, as I mentioned, and Spark slowdowns are often the result of very mundane things, or just not-great behavior from app developers, or what have you. Unfortunately, for a long time it has been a trial-and-error situation for people to size containers and to allocate memory for Spark processes and so forth. That is the type of thing that just needs to be automated so that people don't have to worry about it, and that is our goal.
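One concrete instance of the mundane failures George just described, an executor that doesn't fit the container it is scheduled into, can be caught with a simple pre-flight check. The sketch below uses real Spark and YARN setting names, but the container ceiling and the helper itself are illustrative assumptions, not part of any product.

```python
# Pre-flight sanity check: will the requested Spark executor fit inside the
# YARN container it will be scheduled into? The 16 GB container ceiling is
# an assumed cluster setting (yarn.scheduler.maximum-allocation-mb / 1024).
def parse_gb(value: str) -> float:
    """Accepts strings like '8g' or '512m' and returns gigabytes."""
    unit = value[-1].lower()
    amount = float(value[:-1])
    return amount if unit == "g" else amount / 1024

spark_conf = {
    "spark.executor.memory": "8g",
    "spark.executor.memoryOverhead": "1g",   # off-heap overhead per executor
}
yarn_max_container_gb = 16.0                  # assumed cluster-wide ceiling

requested = parse_gb(spark_conf["spark.executor.memory"]) + \
            parse_gb(spark_conf["spark.executor.memoryOverhead"])

if requested > yarn_max_container_gb:
    print(f"Executor needs {requested:.1f} GB but containers top out at "
          f"{yarn_max_container_gb:.1f} GB: the job will fail to launch.")
else:
    print(f"OK: {requested:.1f} GB requested of "
          f"{yarn_max_container_gb:.1f} GB available.")
```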
Yeah, right. And I'm reminded that I wanted to include a slide with a picture of the Eagles album Hotel California, just as a metaphor for getting data into the cloud, right? You can check out any time you like, but you can never leave. Well, you can pull your data out of these cloud providers, but that's when they tax you, right? It's on the egress. So here again, it's really important to understand, in terms of your cost projections and your total cost of ownership, where you're going to be moving data, and it's really important to have that cross-landscape view. That's where I see Unravel having a really nice advantage, because you can peer into all these different environments; I listed, I think, 10 or 11 major cloud environments these days, and to your point, when you stitch together four, five, six, seven of those different environments, the complexity goes right through the roof, and there's just absolutely no way a human being is going to be able to handle all that. So a technology like this, that can again leverage the power of machine learning to scan these systems, scan and re-scan and re-scan, and notice the disparities, notice the spikes, notice the anomalies, that's going to be key for making sure your big data applications continue to run. Right?

Yes, and add a level of complexity on top of that: we're seeing more and more people wanting to run hybrid cloud and multi-cloud environments. It's not like every customer is running just on AWS; they're running something in the data center, something on AWS, something on Azure, and something on Google, and that becomes a logistical nightmare. We are the same product on all those environments, and we can collect intelligence, make recommendations, and auto-tune on all of them with the same code, the same product, the same UI. So that is an additional level of complexity, along with the complexity of a modern data pipeline.

And I was impressed to hear that number you threw out, something like 95 percent of data center traffic in the future is going to be in the cloud. It's interesting: we went from talking about hybrid cloud, and I think about four weeks later it switched to multi-cloud, because that's the reality. If you look at, for example, SAP going all in on S/4HANA, which is ERP in the cloud, in memory, a very, very intense environment, can you guys see into SAP's S/4HANA as well?

Not currently, but as I said, the architecture of Unravel means that we can really analyze any environment. If we wanted to do MySQL or some mundane technology, the architecture supports it, so who knows in the future. We're still fairly young, but we have done a ton of work; we've done a lot of research on who's using what out there today in big data, and we think we have great coverage. It does evolve, and there are interesting technologies coming out of Amazon, Google, and Microsoft, so we are partnering with all of those companies. Just watch this space, because you'll see some more interesting cloud news coming from Unravel in the future.
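Circling back to Eric's Hotel California point about egress: ingress is typically free or near-free, while pulling data back out is billed per gigabyte, and the asymmetry shows up quickly in a TCO projection. The per-GB rate used below is a hypothetical placeholder, not any provider's actual price list; the sketch only illustrates the arithmetic.

```python
# Rough egress-cost projection for a repatriation or multi-cloud scenario.
# The $/GB figures are hypothetical placeholders, not quoted provider rates.
EGRESS_USD_PER_GB = 0.09     # assumed blended internet-egress rate
INGRESS_USD_PER_GB = 0.0     # ingress is commonly free

def transfer_cost(gb: float, outbound: bool) -> float:
    rate = EGRESS_USD_PER_GB if outbound else INGRESS_USD_PER_GB
    return gb * rate

telemetry_tb_per_month = 10
gb = telemetry_tb_per_month * 1024
print(f"Loading {telemetry_tb_per_month} TB/month in:  "
      f"${transfer_cost(gb, outbound=False):>10,.2f}")
print(f"Pulling {telemetry_tb_per_month} TB/month out: "
      f"${transfer_cost(gb, outbound=True):>10,.2f}")
print(f"Annual egress exposure:          "
      f"${transfer_cost(gb, outbound=True) * 12:>10,.2f}")
```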
Yeah, and I'd like to throw one last, bigger question at you. Through the history of this industry, and certainly for the DATAVERSITY crowd, we have technologies like the data warehouse that I mentioned early in the webcast, business intelligence, for example, and all of these disciplines revolved around a single premise: that we can gather our data, get a better idea of what's happening in the world, and fuel the insights we need to make better decisions. There is a whole new approach to tackling that; I refer to it sometimes as real-world data at scale, and some folks are talking about it as alternative data. There are lots of examples of this. You saw it early in the Hadoop movement, with companies using satellite imagery, for example, of traffic patterns going to shopping malls, and using that to determine roughly what the footfall is going to be in the mall that day and roughly what the projections are going to be. We're going to see more and more of that kind of activity, where big data at scale is leveraged, and that's a whole different way of viewing the world, right? But again, it's going to require tapping into all these big data environments and being able to unravel them, so to speak, and understand what's going on. Do you see that as a pretty significant market trend going forward, this alternative data movement?

Certainly. I think people in the big data world understand that. You touched on data warehousing, for instance, and data lakes and data warehousing are often the very first big data use cases, because data warehouses are expensive, so if you can offload, say, the ETL portion of your data warehousing process, that could save you millions of dollars, and data lakes basically become a macro data warehouse, if you will, this huge collection of data that you then apply structure to. That is really the work of big data developers and architects: taking all that data and providing structure to it. The tools until recently just haven't been there, and the understanding of machine learning and AI has come along in leaps and bounds in the last five years, same with data science. So yes, there's so much untapped potential in big data, and now the tools are becoming available to really become very, very imaginative. As I said earlier in the webcast, I think people in computing have always held clustered, coordinated computers that are intelligent and self-tuning and self-maintaining, lights-out data centers and so forth, as the goal, and now it actually looks like we're on the horizon of being able to tackle these much more difficult problems. In the end, it's going to need those new advanced technologies in order to make those advanced technologies successful.

Yeah, and with that I'm going to hand it back to Shannon Kemp. Thanks so much for your time today, folks, great presentation, thanks to Unravel Data. Shannon, take us out.

Eric, thank you so much, George, thank you so much for this great presentation, and thanks to Unravel for sponsoring today. Just a reminder to everybody: I will send a follow-up email by end of day Friday for this webinar, with links to the slides and to the recording of this session. And thanks to all our attendees for being so engaged in everything we do; we just love the questions that have come in. Thanks, everybody.
George, Eric, thank you so much. Bye-bye, see you guys.