Hello and welcome, my name is Shannon Kemp and I'm the Chief Digital Manager of Data Diversity. We'd like to thank you for joining today's DM Radio Deep Dive, High Velocity: the What, Why and How of Streaming Analytics, sponsored today by Zoom Data. This is a deep dive and continuing conversation from a live DM Radio broadcast a few weeks ago, which, if you missed it, you can listen to on demand at dmradio.biz under podcasts. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. If you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the upper right-hand corner for that feature. For questions, we will be collecting them via the Q&A panel in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag DMRadio. As always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and additional information requested throughout the webinar. Now, let me turn the webinar over to Eric Kavanagh, the host of DM Radio, to introduce today's webinar and speaker. Eric? Hello and welcome. Hello indeed, ladies and gentlemen. Thank you so much, Shannon. It's time for another Deep Dive with DM Radio. Yes, indeed. We're going to have some fun today, folks. This is one of my favorite topics for all kinds of reasons: High Velocity, the what, why, and how of streaming analytics. Let's go ahead and dive right in. We're going to hear from Mike Allen today, a very animated and knowledgeable person in our business. He is the VP of product over at Zoom Data. As many of you know, who have attended these past webcasts with us and the other webcasts that I do, Zoom Data is a really fascinating company. They've taken a whole new approach to analyzing and visualizing data, including streaming data, real-time data.
So let's dive right in. Streaming analytics. Who cares about streaming analytics? Not this kid, apparently. But really, it depends heavily on your business model. I've found this to be true now for the last 15 to 20 years, at least, in this business, the business of data management. In fact, many years ago, I wrote the first article I ever wrote for DM Review, and that's actually where the DM came from for DM Radio: data management, DM Review. Way back, I think in 2002, I wrote an article about real-time data warehousing. Well, real-time data warehousing back then was nothing like streaming analytics today. But even way back then, I recall talking to Stephen Brobst, who is still, I believe, one of the senior people over at Teradata, and he said, if you want to know where you're going to find these really innovative technologies, it's always the most competitive markets. He talked about grocery stores, for example: because there's such competition in that business, you need any edge that you can get. Well, think about what's happening these days in this market and the information economy. We have tremendous amounts of competition, and it's coming from everywhere. We'll talk about that later in this webinar. But the point is, any time you have a highly competitive market, that's where you're going to see a likelihood of value from streaming analytics. Of course, anything, or almost anything, that's web-based. Think about all these different web engines these days and the tremendous amounts of data flying around. You need to be able to access that data and analyze it, and you can't wait a day or a week or a month to analyze some of these data sets. You need to know right now what's going on, because if you don't, you're either going to miss opportunities or you're not going to recognize threats, and both of those are really bad news. So: any data-intensive business. Think about financial services.
For example, think about healthcare. Think about all the IoT devices in healthcare environments. Being able to track in real time or near real time, across multiple geographies, hundreds, thousands, tens of thousands, even these days millions of devices. Just think about the raw tonnage, if you will, the virtual tonnage of that data. Data does have gravity. It does have momentum. We talk about the velocity side of the equation; we'll hear about that today as well. You need some kind of capable solution to be able to handle all that stuff, and the bottom line is that traditional data management systems are just not going to cut the mustard. So: any high-speed industry, obviously, and cybersecurity, cybercrime. Think about the headlines in the last year or two. For those who aren't in that particular space, I can tell you that it's a very challenging environment. And one of the misconceptions about analytics in general, and even predictive analytics, machine learning, artificial intelligence, all this really cool cutting-edge stuff, is that the best algorithms are never going to be right all the time. All they're ever going to do is give you a probability that something is the case. Especially in the world of cybercrime, since many of the people who get in and do bad things do so by proxy, by misrepresenting themselves so they look like a normal user. Because of that, you really need a very robust system that can tackle tremendous amounts of data and allow you to slice and dice that data in real time or near real time. It's just absolutely mandatory. So timing is everything, right? That's what they say. Well, it's especially everything today. So I brought up this metaphor to drive home, sorry about the pun there, I didn't mean to do that, the message that in this kind of environment, in Formula One racing, for example, every second is critical.
When those drivers have to make a decision, they have sub-second response times that they really have to adhere to, or they're going to be in big trouble. Think about the amount of data that's streaming through their brains as they're cruising around the track. Now, a lot of that is more or less automatic by now, but nonetheless, you can rest assured they are paying keen attention to their entire environment, and any small change in how they handle that vehicle can have a huge impact. So I bring this up as a metaphor just to point out how serious this stuff is, how important it is that we pay attention, how important it is that we really keep our focus intently on the road, on where we're going, and on our environment, right? Context is so important in any environment, especially in a place like that. So this is one of the coolest stories I've come across in a while, and what I'm trying to do here is offer some groundwork for you to understand why streaming analytics is important, why it needs to be addressed, or at least investigated. So yes, the Knights Templar invented banking, at least according to this one theory. Think about how much banking has changed in the last, let's say, 20 years, or even the last 200 years. As a sort of side note and interesting context, if you will: at an earlier time, it was really mostly countries that borrowed money, and the only time they would borrow money would be to prepare to go to war. Well, my, how times have changed. Now we all borrow from banks to buy houses, or many people do, and cars and other things too. So banks are still incredibly important in our economy, but they are in the crosshairs. We kind of saw that in the meltdown in 2008. Back in the summer of that year, oil prices spiked up to $147 per barrel. The highways were empty. I don't know if anybody remembers that time, but it was a very strange time in American history, quite frankly.
But that was a very early warning sign, the classic Black Swan event, and it's the kind of thing that you really don't want to have happen. So you want to be aware enough of your environment, aware of the dependencies that you have in your organization, aware of the interactions you have at scale, and if you don't have that, you're in trouble. So think about banks: cybersecurity, fraud detection, market awareness, and what's the most important one these days? Customer experience, right? Banks need to be analyzing all their customer data, customer behavior, for example, to understand when someone might leave, or to understand who might be a good prospect for that next financial product. And again, traditional systems, batch-oriented, where it takes a day or a week or a month to see some report, are going to be less and less relevant in these highly competitive industries. And that brings me to a really interesting and compelling example. If anyone doesn't know about Zelle, it's a quick-pay service. I came across it on my phone. I'm a Chase Bank customer, and I saw it one day, and I was like, what's this? So I looked into it: you can send money instantly to any number of people, either by mobile phone or by email. Now, they do have to sign up, and not all banks are participating in this, but there are several at least, I know. And the point is, you can send the money for nothing. No fee whatsoever. It's just a value-added service that's now part of what you get with your banking. So like I say, Chase Bank has this on my phone. They have it online. I can send money to people very easily. And for someone like me, that is such a wonderful thing, because the post office is my nemesis. I don't know why, I just have a hard time writing out checks, putting them in envelopes, getting a stamp, walking out to the mailbox. For some reason, that's a difficult thing for me.
But doing 17 or 12 or 3 or 5 taps on my phone is not that big of a deal. So we now pay people by phone via Zelle quick pay. Well, think about Western Union: money transfer, that's their business model. So imagine if one day, all of a sudden, someone comes along and does what you were doing for money, and they do it for free, and they do it at scale. Well, that right there is digital transformation. That is tremendous disruption. So what are the incumbents going to do? Like I say, the smart ones are looking into technologies like streaming analytics. They just are. They're focused on this stuff. They see the train coming down the tracks, and they're focused on solving that problem and finding new ways to innovate. We'll talk about some companies that have done an excellent job of that. So think about telco. I'm aging myself here, because I remember early in my business days, well, actually before I was in business, when they broke up Ma Bell. What a big deal that was. I can tell you as someone who does online radio and now terrestrial radio, usually by phone, someone who has done 2,000 or more webcasts over the years, that audio quality is really pretty important. And I can also tell you that back in the day, those old copper wires were great for audio quality. Then what happened? We had the RBOCs, remember those regional Bell operating companies? We had mobile come along, which disrupted the RBOCs tremendously, in fact to the point where they started buying them up, right? And now all these mobile companies are running everything. Well, mobile is nowhere near as good as the old-fashioned landlines. So here we have a degradation in quality as we move forward through time. That doesn't seem to make sense, does it? You'd think all this progress would result in better quality. Well, now, of course, mobile networks are getting better. They're getting better at this kind of stuff.
But still, if you are a mobile phone provider and you want to keep happy customers, what do you need to do? You need to focus on making sure that the audio quality is strong. You want to make sure that you can communicate via tower to all these different cell phones everywhere. Think about the amount of data, not just the telemetry data, but all the other metadata around these environments. Think about the criticality of understanding what's happening. Being able to see into these systems and understand what's happening and where things are happening, that is tremendously valuable. Why? Because you can identify opportunities and problems. What happened? Why did we just lose or drop 5,000 calls in the middle of New Orleans? Well, if you can quickly explore lots of different disparate data sets from operational systems, you can figure it out pretty quickly: oh, what we see is that this tower just got struck by lightning, for example. Or you could even see that, wait, there's a new Samsung phone that came out, and for some reason things aren't working with that particular form factor. In other words, if you have these technologies, if you have this analytical capability, it allows you to identify what's happening and address it very quickly, and thus keep some customers, because nobody wants to lose customers. And then there's VoIP. I don't know how many people have had the unpleasant experience of talking on VoIP, but one of the imitations I like to do is what happens when you have a VoIP problem and someone's trying to tell you something very important. Let's say your boss is trying to give you an assignment, and all you hear is, "I really want you to go... down... right..." That's just not good. You can't understand what they're saying. So when people are on the phone on these radio shows, I always try to check their audio quality and ask, what kind of phone are you on? Is it a landline? Okay, it's a cell phone.
Okay, what's funny is a lot of people do get offended, as if they take it personally that their phone quality is not that great. Well, it's not their fault, it's the phone, right? And they think, well, I sound good. But you're on your end of the phone line: you're hearing your voice through the ambient air and through the bones in your head, right? You're not hearing your sound go through the phone line and back, unless you're smart enough to log in on a web seminar somewhere and listen to yourself, and then you hear just how strange things can get. But the point is that with VoIP systems too, one of the challenges is that network activity will affect the quality of the sound. Especially if someone starts downloading a huge file, for example, or there's a tremendous amount of activity on the network, that will disrupt VoIP and give you audio problems. So once again, this is why you'll see companies in that space focus very intently on figuring out what exactly is happening and finding solutions for it. All right. So the other area where we see a tremendous amount of data, once again, is the social realm, where you get what you pay for, right? Facebook, Twitter, LinkedIn. One of the challenges with these organizations, from their perspective, of course, is just the raw amount of data that's flying around. It's just overwhelming how much information is out there. Being able to keep up with all of that used to be a real problem; I haven't seen those outages in quite some time, so they managed to solve some problems there. But once again, you have this real-time issue. Think about the issue with Facebook, all that data being compromised. Being able to identify when that's happening is of critical importance. It's not good enough to wait until, oh, I don't know, after an election to figure out what happened. No, you have to see what's happening and address it in real time.
And I can tell you too, from a consumer perspective and an advertiser's perspective, these engines really threw some curveballs at the world of traditional advertising. Because Facebook is technically free. Twitter is technically free. LinkedIn is technically free. They're all coming up with value-added services on top of the free service to make some money from you. But what keeps happening? Well, the rules keep changing, don't they? And I can tell you that the numbers are all over the place. Take some links, shorten them with bit.ly, and run them through LinkedIn, through Twitter, through Facebook. The numbers are all over the map. Bit.ly will tell you there were 1,000 clicks; LinkedIn will tell you there were 57 clicks. Well, which is it? There are differences in how they count clicks that come into play here, but the point is that the numbers are all over the place. So even from an advertiser's perspective, you really want to be able to understand what's happening, as it's happening. And that brings me to the Amazon in the jungle, if you will. Amazon is just a huge, massive company. Who doesn't compete with Amazon these days? They're all over the place. They're co-opting all sorts of technologies. They're a very, very smart company, a very durable company, a very innovative company, and I promise you, they are using streaming analytics right now. If you think about trying to compete with Google, with Amazon, what's happening these days in a lot of cases? You have smaller companies trying to compete with these Goliaths. Well, it's going to have to be a case of the mammals versus the dinosaurs. Oddly enough, these giants will be the dinosaurs at some point in time. They're really not now, but they will be; today the dinosaurs are arguably the big brick-and-mortar operations. But my point here is that we all need to be aware that we're now not just competing with people in our geography.
We're competing with people all over the world, so technologies like streaming analytics are going to position us for success in the future. And that brings me to my final slide. So, yes, streaming analytics can supercharge your business. There's no doubt about that. You can identify threats and opportunities in real time. And here's one last significant point. Think about any midsize-to-large organization, especially: the data management teams are typically pretty robust. You have enterprise data warehouses that usually cost millions of dollars to implement, many of which are still doing a great job, delivering the right information to the right person at the right time. There's a tremendous amount of engineering that went into these solutions to get people what they need. Well, streaming analytics should not only offer you new opportunities and new benefits that were not to be had with traditional data warehousing environments; realistically, it should also help you offload some of the stress on those environments. So what does that mean? It doesn't mean that data warehouses are going to go away. I think that's one of the big misconceptions in the market today, especially with these discussions around big data analytics. No, they're not going to go away. But what should happen, in my opinion, is that we can offload some of the stress on those environments, maybe wind down some of the processes, all of that ETL, extract-transform-load, or whichever variant you use in the data warehousing world. We can start to wind some of that stuff down and extend the life of the data warehouse. And really what you want is a combination of technologies in place.
You want that data warehouse for your deterministic work, for doing your reporting, especially if you're a public company, and you want the analytical side, the sort of big data, IoT, fun stuff that we've been talking about, log-based analytics. And then, as you find examples where that's very good for more traditional reporting, well, you either bake that into your data warehousing environment or you formalize your processes around big data analytics. And with that, I'm going to hand it off to a guy who knows all about streaming analytics, a very intelligent man who is just a fantastic guy to work with, Mike Allen of Zoom Data. Take it away. Thanks very much, Eric. So let me just switch so that I can present. And while I do that, can you tell me how? Top left-hand corner of your screen. Thank you so much. There we go. Thanks very much. So as Eric mentioned, I have a background in streaming analytics that goes back quite a way. One of Eric's slides reminded me of my past, which was the slide of Fleet Street in London. Where I actually started working in the streaming analytics business was in trading floor software. This was when we were building software before you could go out and use Kafka for free, before you could leverage some of the capabilities that are available in open source today. This was very custom. This was very high velocity. Back in the day, that meant hundreds of updates per second, possibly thousands. When Eric put up the slide of LinkedIn, Twitter and Facebook, I believe they're processing about 12 million events per second right now between them. So high velocity has changed, but it's always about stretching the limits of what is possible. So my background started with trading room systems, and it's very interesting to me that what I learned through doing that has become relevant across all industries over the last couple of years.
So without further ado, what I'm going to do is, through a few slides, talk about high velocity: the what, the why and the how of streaming analytics. My agenda is really simple. I'm going to talk about the what. I'm going to talk about the why. I'm going to talk about the how. I have a slightly different perspective than some other people you might encounter when they talk about streaming analytics, so that should be interesting for you. With a bit of luck, if the demo gods are with me, I'll give a demo, and then we want to leave plenty of time for Q&A. Lots of people have really interesting questions in this space, so I want to leave some time for that. Now, I'm actually not going to stick to my agenda: I'm going to talk about the why first, because it's very hard to talk about the what without setting the context of why we are talking about this now. What is significant about streaming analytics that is relevant for you today? To do that, I'm going to point to a survey. I'm a product guy, so rather than just having an opinion, and I always have an opinion, I like to base what I say on data. Last year we worked together with O'Reilly, the same people who bring you Strata and all these books and conferences, to do a really thorough survey of the state of data analytics. If you're interested in this survey, you can download a copy; you just have to register and request a copy on the Zoom Data website. The survey covered, I think, 870-odd practitioners across all different industries, doing analytics, managing analytics, managing data platforms, and we asked them a lot of questions about the state of play and what they were doing. There are lots of interesting insights in there, some of them kind of obvious. The relational database is declining. Everybody's doing big data, but a lot of people are still in pre-production with it: 72% of people are doing it, but only 35% are doing it in production.
A lot of analytics now are what we would call embedded, so people aren't relying on the classic BI tool as a standalone tool pointed at their data to do analytics. They're actually embedding insights into their applications. But the bottom left is the one I'd like to highlight and talk about, which is that streaming analytics is becoming the new normal. So it's not just the high-frequency trading shops on Wall Street and companies like Facebook and LinkedIn who are doing millions of events per second and care about the latest streaming data. This is the broad base of people in the analytics and data management business. We see that 15% are actually leveraging data that is one hour old or less. So these aren't things that are being processed through ETL jobs and held overnight and used for analytics the next day. These are things that are coming in and being used for analytics right away. 10% are looking at data that's less than a minute old, and five or six percent are looking at data that's less than a second old. That's a really big departure from the traditional analytics world, the traditional BI world. And so I'm going to talk about how that is relevant. I'll just throw out a couple of little things that we found in the details. Kafka and Spark came up a lot. StreamSets and NiFi came up. One very popular answer for streaming analytics source was "not sure." I think that's an indicator that there are people out there whose companies are leveraging streaming analytics, but they don't know how it works. This is a new domain for many people. So my goal today is to try and share some insights about the what, the why and the how. Still on the why: I think the why is that it's all about value and time sensitivity. If you can leverage fresh data, you can have a competitive edge.
The companies that will succeed in the future are the companies who are data-driven and empower people to leverage that data as it becomes available. The more recent the data, the higher the potential value of being able to leverage a competitive advantage. Older, historical data is the traditional realm of BI: that's very much batch processing and then analytics on ETL'd, cubed data. The more modern area is being able to react to data that is hours to minutes to seconds old, and this is the domain of streaming analytics. It's a continuous line; I think that's one thing that's very interesting here. We shouldn't think of these as two different domains. We should think of this as a continuous space. Data always has some age, and the relevance depends on the value you can extract, not the technology you're using. So that's why this matters. Let's talk about what we mean by streaming analytics. Again, rather than making my own definition up, I thought I'd present one. This is from Forrester; you can look at their Waves, the Forrester equivalent of the Magic Quadrant, for streaming analytics. This is how they define it: software that provides analytical operators to orchestrate data flow, calculate analytics, and detect patterns on event data from multiple, disparate live data sources, allowing developers to build applications that sense, think and act in real time. Now honestly, I think this is a pretty good definition. It really speaks to the kinds of systems that are well established in that classic streaming analytics space. In that Wave, I think I've worked at two of the three companies in the leaders' segment. TIBCO is one of them; TIBCO acquired StreamBase, and StreamBase is on version 10 right now.
Software AG is another; Software AG acquired Apama, and that's a core piece of their streaming analytics portfolio. That's on version 10 too. Now these are pretty well-established products. These aren't new technologies; this has been around. And my observation about this is that it's a very niche view of streaming analytics. If you look, it says "allows developers," right? How many of you are developers? I suspect not all. I think a lot of people who work in data management and analytics are business users who are trying to leverage technology, not write code to process information. Yes, there's definitely a set of use cases where event streams are collected, processed by machine learning, processed by systems, and then acted upon. But there's still a lot of data which requires human analysis. So it isn't just computer-based processing, nor is it just live data sources. Being able to leverage streaming and real-time data is about being able to combine it with historical and other data sources. So I think, from my perspective, we can go a bit broader than this to really talk about what streaming analytics is. Here's another perspective: it's really a fundamental requirement for modern business intelligence. If you're trying to gain intelligence from your business data, you need to leverage streaming. That's my perspective. Gartner defines modern BI as enabling non-technical users to autonomously execute full-spectrum analytic workflows, from data access, ingestion, and preparation to interactive analysis and the collaborative sharing of insights. I'm going to translate that into English. Modern BI lets people ask questions of their data and share the answers without getting IT to do all sorts of work to make that possible. Right? I think that's where everybody's headed. I think that's grand. But it also needs the ability to analyze the freshest data and consume it as it changes.
And this means that streaming analytics is a mainstream requirement of any modern BI platform. To be able to do modern business intelligence, you need to be able to deal with data streams. So this leads to the concept of real-time BI. But before I get to that, I wanted to give a couple of examples of the kinds of businesses, customers and markets where people are leveraging this next order of magnitude of volume and velocity. One example is an investment bank that's monitoring real-time trades looking for anomalies: tens of thousands of transactions per minute. Federal systems integrators: these are people working with government agencies who have huge amounts of data, many thousands of transactions a minute. They're looking for insights, and they require people to be able to look at the data, not just machine processing. Telcos monitoring subscriber data across multiple continents: billions of data observations. Cybercrime monitoring: again, billions of devices, billions of activities, a lot of machine processing, but also the need for human investigation and insight on that data. Cable set-top box data is similar. Or ad tech: huge amounts of data in ad tech, required to provide insight into which ad to place and how to value that ad placement. So those are a lot of examples which really demonstrate that this is happening across all industries. So that was the why and the what. Let me talk a little bit now about the how. How do you do streaming analytics in the context of modern BI? So let's just talk about doing modern BI. We talked about volume, velocity, variety, the big data world. This is all about the velocity: be ready for all of that kind of data. Don't force users to define the questions they want to ask in advance.
If you have to arrange your data and pre-process it into cubes so that it's ready to answer the questions that you know you're going to ask, then you're missing the opportunity to ask questions that you don't know about yet. You need to avoid data friction at all costs. If you're moving data, pre-cubing, processing, it slows things down and makes it very hard to access the freshest data. So the goal is always to make the freshest data available to users and provide a dynamic experience, so that if there are updates and it makes sense to show them, you show them. Don't make it user-driven: "oh, I need to refresh this dashboard to see if anything's changed." Things are always changing. The modern world of BI expects continuous change. So this is where I wanted to introduce the idea of real-time BI. Real-time BI is just thinking about BI differently. It's not simply real-time activity monitoring, like the sort of streaming analytics that Forrester is possibly talking about, nor is it the sort of streaming visualization that you see on trading floors. It's the ability to do two things: respond to user inquiries and actions, and refresh data as it becomes available. So even without user interaction, you want to be able to provide data to users as they need it and want to see it. So how do you not do this? If I look at this little chart, this is a traditional BI data pipeline, a data warehousing pipeline with streaming added. You see the real-time data streams coming in, they're getting munged, they're getting pushed into a data warehouse, they're getting cubed, they're having a traditional BI server pointed at them, and the traditional BI tools are going to be used to ask the questions that are known about in advance. Stream processing is kind of glommed on the side.
There are some specialist tools which let people see the streams, but otherwise the streams have to be wrangled back into the data warehouse through a landing process such that they can be cubed, such that they can be served through the traditional BI server to the traditional BI user. So really you're gluing a lot of things together and you're providing different tools that see the world differently. You're not bridging the historical and the present through continuous tooling; you're using different tools. So how to do this differently? Well, our perspective is you need to rethink BI. Think of all data as continuous streams. Rather than thinking of data as a snapshot or a collection of historical data, think of it as a stream. Everything can be represented as a stream. For example, sales transactions are made at points in time. With real-time BI, you can visualize those over time, filter the data, and basically use time as a continuum to engage with your data. It may be real-time, it may be historical data, but you still may want to stream it. So what does that look like in that clean and modern approach where you think of all data as streams? One of the nice things about all data as streams is that all data comes in the same way: messaging management, stream processing, then the data gets landed. A modern BI platform is smart enough to stitch together the landed data and the update requests, so it's able to do delta processing. It doesn't require cubing. It's able to interactively query data and receive live incremental updates. It relies on modern technologies: you see streaming systems like Kafka; you see stream processing things like Spark Streaming or NiFi or StreamSets and so on.
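The "everything can be represented as a stream" idea, with historical and live data flowing through the same continuum, can be illustrated with a small sketch. This is our own toy code, not any vendor's API: `replay` and `live` are hypothetical names, and the sample records stand in for landed sales transactions and a live feed (which in practice might be a Kafka consumer):

```python
import time
from itertools import chain

def replay(historical, pace_seconds=0.0):
    """Re-emit landed historical records as a stream, optionally paced."""
    for record in historical:
        if pace_seconds:
            time.sleep(pace_seconds)
        yield record

def live(feed):
    """Wrap a live feed as the same kind of stream, so downstream
    code never needs to know whether data is historical or fresh."""
    for record in feed:
        yield record

# Hypothetical data: landed history plus fresh arrivals, one continuum.
history = [("09:00", "apple", 50), ("09:05", "banana", 100)]
fresh = [("09:10", "apple", 25)]

unified = chain(replay(history), live(fresh))
print(list(unified))
```

Because both sources present the same interface, the consumer can filter, aggregate, or visualize over time without caring where on the timeline the data came from, which is the bridge between the historical and the present described above.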
You see fast modern data landing platforms, whether this is things like Kudu, or Impala, or things like MemSQL; there are many, many fast modern data platforms that are designed to support both ingest and analytical query, and things like Solr where people are actually supporting search-type semantics on top as well. So there are a couple of facets which illustrate that. These are things that we do in Zoom Data. One is that you want to be able to connect live to any data source. Another is that as updated data is landed, the user can have that data refresh to their screen, so you're able to implement an updating visual experience. You represent, as I mentioned, all data as streams. So you need to think about: well, if a data query takes a very long time, maybe I can break it up. Zoom Data has this concept of micro queries. If you have a large data set and a query might take a minute, we break that query up, we parallelize it across a parallelizable architecture where possible, and start streaming back the data as it becomes available, so that we can estimate values and provide an interactive experience before the data is final. So you get this concept of data sharpening. Just as when you watch a video on YouTube, it starts playing immediately and then you see the picture getting crisper as data buffers, we try to do the same thing with data: you get some results you can interact with, you get a sample, and then as more data comes in we can sharpen that data set. So again, it's the whole idea of using a set of patented techniques to treat everything as a stream and to provide a fully interactive experience with micro queries and sharpening.
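The micro-query and sharpening concept can be sketched very simply. This is a minimal illustration of the general technique as described, not Zoom Data's patented implementation (which parallelizes across its architecture); the function name and chunking scheme are our own. One big aggregate is split into chunks, and a progressively sharper estimate streams back, the way a buffering video gets crisper:

```python
def micro_queries(rows, chunk_size=1000):
    """Split one large aggregate query into micro-queries over chunks
    and yield a progressively 'sharpened' estimate of the overall mean,
    so the user can interact with a result before the data is final."""
    total, count = 0.0, 0
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]   # one micro-query's partition
        total += sum(chunk)
        count += len(chunk)
        yield total / count                      # interim, sharpening estimate

values = [float(i % 200) for i in range(10_000)]
estimates = list(micro_queries(values))
# The user sees estimates[0] almost immediately; estimates[-1] is exact.
```

In a real system the chunks would run in parallel against the back-end store; the sequential loop here just makes the sharpening behavior easy to see.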
We use time as a business context, so everything in Zoom Data can be viewed over a rolling time period. If you want to wind back data, you can; if you want to see previous periods, you can. Time is just a facet of the data, when it was landed, and this gives us the ability to do something that we like to call the data DVR. A bit like with, again, the video analogy: you can rewind, replay, and fast-forward the data as it is arriving, so you can see behavior over time, which in a lot of the applications that I talked about makes a huge difference. So I'm going to switch now and, hopefully, if the demo gods are supportive, do a little demo to illustrate what this kind of solution looks like with Zoom Data. Let me just call this website up and log in. So this is Zoom Data. Zoom Data is a modern BI platform that has a lot of powerful capabilities for dealing with big data, fast-moving data, and data of a lot of different varieties. The front end is web based; the back end is microservices and smart technology. As I log in, I see the opportunity to click on lots of different dashboards. Here are some examples: looking at tens of billions of rows of data in Impala on Hadoop, looking at search data sources, looking at NoSQL data sources, looking at cloud stores like Amazon S3 and Google BigQuery, fusing data from multiple sources, working with fast stores that are able to deal with ingestion and analytic query at the same time, like MemSQL, and then lots of different streaming-type use cases working with NiFi, or Internet of Things-type use cases, or Kudu, and so on. The one I'm going to pick on here is actually a cute little illustration using an analogy. You could think of this as a trading system, where rather than trading stocks and bonds or options, we're actually trading fruit in this demo. With this demo, what we have, if you look at the architecture down in the bottom left, is a producer, which is where the trades come from. For example, the trades are
introduced into the system via SMS. They are then passed through Kafka as our stream transport, they're passed through stream processing, an ETL-type enrichment, using Spark Streaming, they're landed in Solr as our data sink, and then they're connected to from the Zoom Data server to provide this visual interactive experience. So if you want to start sending orders to my system, you can text; anybody can play in this game. You can text a fruit: for example, you can text "apple 50", which would send an order for 50 apples, or you could send "banana 100", and so on. The way it works is, if you send "banana 100", that will send 100 banana orders to the system, and you will see that as those orders go through, the screen is updating in real time, which is very interesting. The Zoom Data interface is actually polling to get refreshed updates, but the thing is, this isn't a pure CEP window-analytics type thing; this is true business intelligence on the full data set. So I can, for example, switch any of these charts and say, well, I don't want to see a bar chart of the fruit orders, I want to see a pie chart, and this is the pie chart across all orders that have happened. It's not a pie chart of the last orders being processed in the application; this is actually being queried against the back end, so this is the full data set. And so this is a very interesting use case, because what I'm doing is blending the real-time data as it comes in, but I'm also able to do historical analytics on that data. And as I mentioned, I have the ability to manipulate time. As you see, this is updating nicely in real time as you're placing orders. I notice that people are putting in things like "chair 100"; well, we've got a smart system here, and it won't let you put in things that aren't fruit, or swear words; it filters those out. That's part of our Spark Streaming. But so I
can stop time if I like. I hit the pause button, and no longer will you see updates in real time, and I can look at my historical window. I can say, well, actually I'd like to see different time ranges: the current hour, or a rolling window of the last minute, or whatever time preset I'd like. Or I can simply use this device on the time bar to scroll back in time. If you're familiar with video editing, I can scrub; so I'm basically scrubbing back in time, and I can see here, this is back at the beginning of our demo, where I just had one order for apples. Now I can say, well, I'd like to actually start replay from that point in time with my play button, but maybe instead of playing in real time I'd like to accelerate that, and I'd like to see one minute per second rather than real time. So we can scrub rapidly through, and I hit play, and you'll see that as the time plays, all of a sudden all of these messages come in and you're seeing a sort of fast replay. If this, for example, were a cybersecurity-type example, you could see what was happening during an event, and you could scrub and pan through the data set as the data was arriving and being landed. So this is a very powerful environment. And again, because I have this modern BI environment that's designed not only for streaming data and historical big data, but also for modern data sources, I can, for example, do a text search, because my data is being landed in Solr. I can just filter out all the berry-type records and see only berries in my... oh, no berries. I am going to filter on banana instead; let's do that. And so I can see only the banana-type orders, or things that match banana. So I can do a full text search and see each of these orders as they happen. I could also, for example, because it's designed to be a fully interactive system, drill in and look at the details, so I can actually see the
individual records in the system. So again, drill to detail. It's a very powerful, fully featured BI platform that not only supports streaming and historical data but also supports drill to detail and the facets of modern BI platforms. This is taking a while; there are probably a lot of banana orders in here, but you get the idea. So I can drill through and show... there we go, those are all my records of the banana orders and all of the details in the system. That's really all I wanted to show in terms of the demos, so let me just go back. Thank you all for texting and playing along with the demo there. That was really all that I wanted to say; let me just summarize before we switch to Q&A. If you want to learn about Zoom Data, or if you want to get access to that report on who's doing what in modern analytics today, you can go to zoomdata.com. You can play with our product; you can actually play with that demo live as well, I think. We talked about the what, the why, and the how of streaming analytics, and I've talked about how it's not just the analytics that people think of as machine based and pure high volume, but also the blend of bringing streaming data, fresh data, into your traditional BI-type user-driven, investigative, explorative data discovery use cases. Streaming is a core property of modern BI, and it's all possible with the latest generation of modern BI, of which of course Zoom Data is a great example. So that's all I wanted to say; I'm going to hand it back to you. Yeah, and if you would, Mike, go ahead and... okay, never mind, I was going to ask you to push one of those slides forward. We might load your slides into the console here, just because I love that sine cosine you had going. But would you like me to share that one again? Yeah, go ahead and share it, and while you're sharing it I'll throw out a couple of comments from the audience. You've got some impressed audience members out there; one said "wow," and another said "this is cool," referring to that
real-time demo. And I like the way you talk about rethinking BI, right? Because we have a whole culture around business intelligence that, frankly, got us where we are today, and the last thing you want to do is throw the baby out with the bathwater, as they say. But the idea here is to rethink BI and make it much more of a real-time environment, because these days, to the point I was trying to make, the competition is so strong in so many areas throughout the modern world that if you're not paying attention to everything around you, something is going to blindside you and you're going to go under. Think about Uber, think about Airbnb, think about a lot of these new business models that are leveraging cloud infrastructure at web scale and just turning industries upside down. I'd like you to talk a bit more about how people respond when you explain this concept to them, because I'm guessing you get a lot of nodding heads and opening eyes, right? It's very interesting; it's a journey of rethinking very traditional approaches which have been around for a long time. BI hasn't changed much in the last 20 years, and, like the five stages of grief, there are stages of transition from a traditional to a truly modern perspective on BI. I think streaming is a really interesting one, because this is one of the things that really gets people there; it facilitates thinking differently. You can always try to solve big data by preprocessing, so the bigger the data gets, the more clever approaches people have to making it look like not-quite-so-big data and using the same techniques in BI. But when you bring streaming into the mix, you can't do that, so you have to think differently; you have to think of data as continuous streams. And so that light bulb normally goes on somewhere during the first or second demo, and then maybe people start to leverage the technology, even though they're not, you know, landing the freshest data, but they're enabling a platform
for the future. So it's a journey, and people are very excited about it when we show it to them. And if you would, go back one slide to the slide just before this one, the clunky retrofit approach. If you look at this and then compare it to that next slide again, but leave it here for a second, you can see the difference between a disjointed and, as you say, clunky environment versus a much more smooth and seamless environment. One of the points I'd like to make here is that every arrow you see in between these environments tends to represent an opportunity for something to go wrong, right? This is one of the main challenges of traditional ETL-based, batch-based business intelligence: you're not seeing a real-time view of your environment. What you're seeing is the view as it was last week, or last year, and sometimes, of course, when the transformations are not correct or a batch load didn't go properly, you're actually seeing the wrong view. So I think the key is to really take a much stronger approach toward understanding the big picture at any given moment in time, and I think that's what you mean by modern BI, right? Yep, you got it. This flavor looks simple, doesn't it? Yeah. Well, it also kind of reminds me of a lambda architecture. When I first saw that I thought, oh, is that a lambda architecture? And while technically, no, it isn't, one of the challenges with lambda is that it looked good on paper but in reality it wound up being very painful. Is that something you've come across? I think as soon as you start getting into those Greek letters of the different architectures, you lose a lot of people; that's all I'll say on that one. Beware of Greeks bearing gifts. That's pretty funny. Now, let's talk a little bit about micro queries, because this is, I think, one of the most interesting concepts you guys have come up with, and it really, to me, represents an awareness about human behavior, right? For all the talk
of robots taking over and artificial intelligence supplanting people in jobs, et cetera, I think a lot of that is remarkably overblown. The fact is that human beings still need to be very heavily involved in driving all these solutions, and if someone isn't using a tool effectively, then that tool isn't providing any value. And let's face it, attention spans are dreadfully short these days, which means people have to get responses quickly in order to stay engaged in the actual analytical process itself. I'm guessing that was a big part of the calculus behind going down this micro-query path: to keep the user engaged and to keep the mind spinning, right? You've got it, absolutely. My view of the future is that the majority of decisions will be made by computers, everything will be automated, and it will be wonderful, but the most important decisions, and the most important validation of those computer-generated insights, will be done by people, by people's brains. The most complex and impressive piece of technology in our brain is the visual cortex. We want to process information visually; we can process, I don't know, 30 frames a second. We get bored looking at the same number; we need to see it moving. So that ability to deliver a dynamic visual interpretation of data is essential. Hard questions, you know, some of these queries against a 10-billion-row data set, will take minutes to resolve. So what do you do to engage the brain, to let the brain see what's happening and start to interpolate, which is what our brain does well? You break that query down so that you can start delivering results quicker, and you animate those results as they happen. And you do that whether it's one big query or a query with updates, streaming data, and so on; you just treat everything as streams. Yeah, that's exactly right. We have an excellent couple of questions coming in here from attendees; let me throw one out. One attendee asks: does
Zoom Data provide a mechanism whereby governance is applied to different data sets? For example, how does Zoom Data restrict users' access to a given dashboard and the data within that dashboard? Can you talk about granularity of governance and control? That's an awesome question, so I'm going to answer it in a couple of ways. Firstly, one of our core tenets, and one of the core tenets I think is really important for modern BI, is not moving the data. You have to push the workload down to where the data is, and that means you have to push the security down. In fact, you want to push the security down, because as soon as you're trying to do data security and governance in multiple places, you have a nightmare on your hands. Whereas if you authenticate the user and you pass those credentials down to the data store and associate them with a request, you know who's accessing the data and whether they have the rights to do it. So it's very much the philosophy, and the philosophy of doing modern BI right: you don't move the data, and you keep the primary governance where it belongs.
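The push-down security pattern described here can be sketched in a few lines. This is our own toy illustration, not Zoom Data's API: `run_query`, `DemoStore`, and the token names are all hypothetical. The BI tier authenticates the user once and forwards the caller's credentials with every query, so the data store enforces its own access rules in place:

```python
def run_query(sql, user_token, datastore):
    """Push-down governance sketch: the BI tier never enforces data
    security itself; it forwards the caller's credentials with the
    query so the data store applies its own access rules."""
    return datastore.execute(sql, credentials=user_token)

class DemoStore:
    """Toy stand-in for a secured data store."""
    ACL = {"analyst-token": {"orders"}}   # tables each credential may read

    def execute(self, sql, credentials):
        table = sql.split()[-1]           # naive parse: "SELECT ... FROM table"
        if table not in self.ACL.get(credentials, set()):
            raise PermissionError(f"{credentials!r} may not read {table!r}")
        return f"rows from {table}"

store = DemoStore()
print(run_query("SELECT * FROM orders", "analyst-token", store))
# A query against a table the token may not read raises PermissionError.
```

Because the access decision lives in one place, the data store, there is no second copy of the security policy to drift out of sync, which is the nightmare scenario mentioned above.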
Now, of course, Zoom Data is a powerful tool, and not only do we support dozens of different kinds of authentication and authorization approaches, you know, SAML and LDAP and the ilk, whether it's things like Kerberos or Sentry or any of these different security systems, we also have the ability to overlay additional security at the Zoom Data level if you so desire. So there's a lot of flexibility there, and we're seeing more and more people... you know, the end goal is to keep security in one place. I really do love this, well, here's another pun, sorry folks, this movement toward not moving data around, right? What you want to do is access the data where it is, see it, mix and match it with other data sets, and do your analysis in that virtual playground that kind of sits on top of the data, right? Because every time you move data around, it takes time, it costs money, and it opens the door to problems, right? Yeah, absolutely. Good. Let me throw one other question at you; there's a question from an attendee asking, does this platform support unstructured data? You guys support all kinds of different data, right? That's another great question. It goes back to what the tenets of doing modern BI are: don't make any assumptions. Don't assume that you have relational data stored in a traditional data warehouse. You may have large, high-volume data; you might have high-velocity data; you may have high-variety data of different types, whether that's structured, unstructured, NoSQL, log files, hierarchical, or graph data. We've worked with pretty much all flavors of structured and unstructured data, and we've architected Zoom Data to be really agnostic, designed to be more than just another SQL-based traditional BI tool, but really a flexible modern BI platform. No doubt about it. And you know, there's one more thing I'd like to talk to you about before we wrap up here; we're very punctual for DM Radio Deep Dives. The world these days has so much
information from so many sources, and any analyst knows that an additional layer of context can almost always provide some value, and often it can provide tremendous value. The key is really to figure out quickly what level of value it's going to provide. That's one of the key value propositions that you folks have really focused on: enabling the analyst to test this data set, to test that data set, and to do so very quickly, such that you don't waste time, you don't go down wormholes, and you can very quickly ascertain just what level of value you're going to get from some new data set. That keeps the analysts busy, keeps the analysts productive, right? Absolutely, yeah. Good stuff, folks. Well, I have to say I really appreciate the demo that you did there; you got a lot of good kudos for that. And folks, I have to thank all of you out there for your time and attention, and for throwing some great questions at us. This is part of our ongoing series of DM Radio Deep Dives; of course, DM Radio itself is going again tomorrow, Thursdays at three Eastern. Feel free to send an email to yours truly if you have some suggestions for a topic: info@dmradio.biz. And with that, let me hand it back to Shannon Kemp to take us out. Shannon? Eric and Mike, thank you so much for this great presentation; it definitely did not disappoint. This was just amazing, as you can tell from the audience comments, as Eric mentioned, and thanks to our attendees for being so engaged in everything we do; we just love it, and we love to answer the most commonly asked questions. Just a reminder, I will send a follow-up email by end of day Friday with links to the slides and to the recording, so you can dive even deeper into the content. Thanks, you guys, so much. Thanks, everybody, have a great day. Thanks, everyone. Bye-bye. Thank you. Bye-bye.