Hello and welcome. My name is Shannon Kemp and I'm the Executive Editor of DATAVERSITY. We'd like to thank you for joining the first installment of the new DATAVERSITY webinar series, Data Insights and Analytics, brought to you in partnership with First San Francisco Partners. To kick off the series, John and Kelly will discuss the series' namesake and talk about data insights and analytics frameworks. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. We will collect questions via the Q&A panel in the bottom right-hand corner of your screen. Or, if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag DI-Analytics. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested during the webinar. Now let me introduce our speakers for today. Well-known industry analyst John Ladley is a business technology thought leader and recognized authority in all aspects of Enterprise Information Management, with 30 years of experience in planning, project management, improving IT organizations, and the successful implementation of information systems. He is the President and Chief Delivery Officer at First San Francisco Partners. Also joining us is Kelly O'Neal. Kelly is the founder and CEO of First San Francisco Partners. Having worked with the software and systems providers key to the formation of Enterprise Information Management, Kelly has played an important role in many of the groundbreaking initiatives that confirmed the value of EIM to the enterprise. Recognizing an unmet need for clear guidance and advice on the intricacies of implementing EIM solutions, she founded First San Francisco Partners in early 2007.
And with that, I will turn it over to John and Kelly to get today's webinar started. Hello and welcome — good morning, afternoon, or evening to everyone listening. This is John, and the other voice you're about to hear is Kelly, or it had better be. Good morning and happy new year — or hello and happy new year; it might not be morning where you are. And it might be warm or cold. It's minus 8 Celsius where I'm sitting right now, and hopefully it's warmer for some of you. But I'm not outside, I'm inside, so let's get going with our topic today. First, Kelly and I want to welcome you to the new series. Data insights and analytics is a hot topic, I guess, is one way to put it. Our prior series on CDOs actually inspired this one, because most chief data officer work is in this area, and it's an important topic right now. So we're looking to grow your understanding of it — and that means not just big data and analytics, but also some of the more mundane stuff like BI, reporting, and the data warehouse. The key is that you get value from this discipline, and it is not always an easy process. We want to get past all of the hype and techno-speak and, again, help you get results. That's really what this series is about. If you hear anything that you want changed, or that you'd like to see in the future, please check the abstracts that are out there with DATAVERSITY, or use the Q&A facility here to send us ideas along with the questions we're going to answer. Kelly, add some stuff to that. Yeah — you know, we found that the engagement last year, especially the last webinar at the end of the year, was so strong, with so many questions, that we're going to try very hard to leave enough time within each of these webinars to address your questions and answers. All of that is very valuable. We appreciate it, and we will do our best to accommodate it. Very good.
So without further ado, let's get started. Today we're going to talk about what a framework is and what an architecture is. There's a difference, and we feel it's important for you to understand it: with one you can speak pretty clearly to one audience, and with the other you can speak clearly to other audiences. Then, by way of example, we're going to dive into a few representations of those architectures: the classic big data stack; the sandbox, or lab, which is extremely popular these days; and real-time analytics — not a new concept, sorry if someone thinks it is; it's been around a while, but we'll talk about it. And then we're going to take a look at some of the old-fashioned stuff, just by way of balance. Hopefully you'll have some things to take away from the conversation, and as Kelly said, we really do want to answer your questions and hear what's on your mind. Anything, Kelly? Nope. I saw you were muted — I didn't know whether you were talking into the mute or not. Yeah, exactly. Good question, right? Well, we're going to talk together on these next few slides. Few slides — first of the year, give me a break. It's really about how you deliver and get value out of something. What does the architecture do for you? What are its pieces and parts? That might even include an organization model. We want to show holistic thinking with frameworks and architectures. But there's an awful lot of talk about maturity, right, Kelly — out there in the market, and where do you fit? There's this continuum from delivering data to operate your business up to the super-sophisticated predictive analytics. And I suppose next year we'll add a bubble for artificial intelligence or machine learning or something out there. But that's not necessarily a hard and fast continuum, is it? No — absolutely not. I think the business requirement cuts across all of the levels of maturity. These are just different types of analytics and consumption of data.
Some are more sophisticated in their approach and require more effort, more cost, more — I guess — future and bleeding-edge thinking. But operational reporting is still quite important. Yeah. We very often run into a client or a prospect, or we just talk to somebody at a conference, and they'll have the predictive analytics group off in one part of the organization doing fabulous things — a thousand points of light coming out of the data scientists — while at the other end of the organization, they can't get an operational report accurate or get anyone to agree on its results. Well, does that mean they've failed because they haven't sequenced up the continuum? No. But their framework is going to have to allow for that — it has to allow for some operational things as well as getting value. The key here, right, Kelly, is to get value out of your data assets, and this is where it happens: this is the consumption end of your architecture. Absolutely. And that's also why we titled this data insights and analytics, because it's not just about analytics — or at least, in the context of this webinar, the topics we think about are not just analytics. Data insights can occur across this continuum and across the data preparation process. There are certain efficiencies and economies of scale in data preparation, even for something very forward-thinking around predictive analytics and taking advantage of NLP and all of that. But if the data is not prepared in a consistent way, then you're not going to get the business value you want out of these different levels of maturity. So it is important that we're still considering the data components of this, not just the analytics. Yeah, absolutely.
So if you're sitting there with your head in your hands going, "How can we possibly be spending money on big data when we don't know how many widgets are out in the warehouse?" — that's okay. You're among friends. You just need to start to address both ends at the same time. So here's a common, typical framework — one we use often. You have data insights, and of course you have to wrap governance around this. You have to have alignment with your organization, you have to be culturally ready, and you have to have oversight of the data. This is a common theme you heard from us last year; we're not going to change it, and we're not going to back off either, because it is really, really important. There is an operating model somewhere along that spectrum we just talked about: you have to do business, and that act of doing business has to interact with how data is used in the business. Then there are all these components we like to kick around and talk about. Sadly, we all go to the middle of this first: What does the architecture look like? What's the data quality? Who gets it? When do they get it? Is it push or is it pull? And what does it look like — we need sexy presentation, we need visualization, I've got to have this tool. A lot of time is spent on data wrangling, which is, I think, my favorite term of the year so far — and it's only the fifth, but it's my favorite term of the year so far. It's just all the movement and the heavy and light lifting of data. And then of course there's our old friend metadata, sitting on a big pile of technology of various shapes and forms. But this is something you can take to anyone in the organization, right? Okay. So then — go ahead. No, I was just going to say — and I was just talking into my mute button, by the way — that, as you said, John, a lot of times we jump straight to the capabilities in those yellow boxes.
We will talk quite a bit about those different capabilities across the year, but let's not forget that this needs to be in the context of a business strategy. Your business strategy and your data insights and analytics strategy need to be aligned. That's why we've got this concept of organizational alignment, and it takes into consideration things like organizational change management. Because as the analytics world continues to progress at an absolutely dynamic and rapid pace, the data world needs to progress at a rapid pace as well, which means you are impacting the way people behave in their jobs across the organization: within IT, within an administrative and data-maintenance organization, within your traditional data management organization, and in the way you interact with the analytics and data science organizations. Maybe there's a big change to your organization just by the nature of having new titles and new roles — things like data scientists. So there's a big component around recognizing the need for organizational alignment and organizational change. If we think about the concept of the framework: the framework guides decision-making, as John said on the previous slide. So what are the types of decisions that need to be made in each of these categories, and how does that impact my organization, my plan, or my strategy? That's how we think about using the framework. And then the next thing, of course, is the architecture — everyone's familiar with the "cans" slide, as I call it. We've done a very simple one to hold our thoughts together for this conversation today, and it's probably not all that atypical. We have data sources, we have some data movement, and for part of the organization it could go into your data warehouse or ODS-type aspect.
And in another part — or even as part of a planned architecture — there's the ingestion into the big data environment, with the lake and the sandbox and that kind of thing. And of course we surround it with our governance, operations management, data quality, controls, and all those things. This is where you start to get into flow and direction — where things live, where they sit, and all of that for the layers, if you're in a service-oriented-type world or a multiple-layer-type world; this is where it goes. We have an asterisk there: we want to talk about real time today, and that little asterisk means the real-time stuff can be everywhere. Low-latency data usage is just as important as high-latency data usage, so we're putting that everywhere, and that's the perspective on this architecture we'll use for the rest of the conversation. So without further ado, Kelly, if you have anything to add, we'll just dive in here. We probably have some very deep technologists on the call. We have decidedly kept this at a high level as our first webinar of the year, and we will continue to drill down into the cans, if you will, both within this presentation and as we go on through the year. So just recognize that we're trying to drill down over time, both within this webinar and within the webinar series. This will evolve — we've got the real hairy-looking one in the file; you'll see it — but just for the sake of today, let's start with this. So let's talk about these four areas. There's a lot more to the data insights and analytics conversation than just the big data stack. That's one thing to bear in mind: this entire architecture we showed you has relevance. I know there's someone out there who says everything can go in the lake and our problems are solved. That is not correct, nor is it economical. So that's one thing: there's more to it than just this stack.
The other thing that has changed is that at one time we were very linear with this stuff. The old picture that goes from left to right — I call it the bow tie. Kelly, we saw this for years and years: it's a bunch of disk symbols on the left, then some magic happens in the middle, and all the consumers are on the right. It gets narrow in the middle at the warehouse and explodes out to the right again — it looks like a bow tie. That's just the solution to the spaghetti diagram. Yes, that's the solution to the spaghetti diagram. It's linear — left to right, or top to bottom, or bottom to top. What we're doing now is a lot less linear. There's a lot more going on, especially if you add services to it instead of a flow along a service bus or something like that; the fact is that we are really changing a lot in that area. That is a big, big difference, but it doesn't mean some of these old structures go away; it just means you may supply them differently. So we're going to cover some of them. The bottom line is that all these things are arranged by a series of characteristics: latency, access, what the value is, who they're exposed to, the capacity and volume — the capacity to just move things through the pipe — and that's really what's going to set the tone for you here. So we're going to look into standard big data and some of those characteristics, the sandbox, the real-time thing, and then what we would call the heritage-type thing — because "heritage" sounds really classy, whereas "old" sounds old, so we're not going to talk like that. I'm going to move forward here unless you have anything to add, Kelly. I'm good. Alrighty. So, big data. We have a very elementary architecture, or framework, there on the right for big data. You've got your sources, what is called ingestion, and a structuring layer — because when you ingest, you're essentially putting things into a type of file system.
They're not connected together in the big data world the way they might be in a more relational world. Then of course we have everyone using this stuff, and we have our old friend metadata on the bottom there to monitor — the technical and the semantic, or definitional, type stuff. Those are your basic components. Now, those are kind of new components compared to history, but obviously this is well-established technology — we can't say it's the next big thing; it is here, it's rocking and rolling. But we've noticed a couple of high-priority areas and some real gaps in organizations, and we'll call those to your attention today; over the rest of the year we're going to delve into them. Metadata: there is the technical part and the business part, but that's the tip of the iceberg — there's also lineage, meaning, and interpretation. The real issue here is that only recently have we begun to see solid tools to help address the total lack of metadata in, say, a traditional Hadoop MapReduce-type environment. A lot of folks think schema on read is perfectly fine, but that only works to a point. So metadata is a high concern here. Also security and privacy: we're kind of keeping a tally sheet, Kelly and I and our folks, and we've got a good handful of cases where personal information, and other information that needed to be secured, has found its way into a sandbox or a big data environment. So you still need to deal with that. You might also have contractual agreements covering the data you're pulling in from the outside that you don't know about, and you really need to consider those. Data governance is always there, but here it's oversight of the lineage, oversight of the quality, and oversight of semantics. Now, it's maybe not the same data quality we would have in MDM — it might have a much broader tolerance for variance.
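John's caveat that schema on read "only works to a point" can be sketched in a few lines. This is an illustrative example, not anything shown in the webinar; the record layouts and field names are invented. Raw records land with no enforced structure, and a schema is applied only at query time, so non-conforming records surface only when somebody reads them:

```python
import json

# Hedged sketch of "schema on read": raw records land as loose JSON
# lines with no enforced structure; a schema is applied only at read
# time. All names here are illustrative, not from the webinar.
RAW_LANDING_ZONE = [
    '{"cust_id": "C1", "mins_used": "412", "region": "east"}',
    '{"cust_id": "C2", "mins_used": null}',            # missing values happen
    '{"cust": "C3", "mins": "77", "region": "west"}',  # drifted field names
]

QUERY_SCHEMA = {"cust_id": str, "mins_used": float}

def read_with_schema(raw_lines, schema):
    """Apply a schema at read time, skipping records that don't conform."""
    for line in raw_lines:
        rec = json.loads(line)
        try:
            yield {field: cast(rec[field]) for field, cast in schema.items()}
        except (KeyError, TypeError, ValueError):
            continue  # non-conforming record: discovered only at read time

rows = list(read_with_schema(RAW_LANDING_ZONE, QUERY_SCHEMA))
print(rows)  # only the first record survives this particular schema
```

Two of the three records silently disappear, which is exactly why the metadata and lineage gap John describes becomes a high concern: without it, nobody knows what was dropped or why.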
I think, Kelly, what we're saying here is that the data scientists cannot squeeze all of the quality issues out of big data because of the sheer volume — that is only relevant in a few kinds of models. The rest of the time, you're probably going to have to think about whether this stuff is really usable the way you're finding it. And then lastly, latency — the access to use, and some physical-type things: how are you going to get to it, how long is it going to be sitting out there, and what type of structure do you want to set it in? The key thing to hear about big data — and we'll hit the difference when we get to the sandbox — is that this is persistent data. It's going to sit out there, and it's got all the risks that come with data just hanging around. Yeah, the only other thing I would add is that as technology continues to advance, so do our compliance and regulatory environment, our beliefs around appropriate use of data, and companies' own internal culture and policy around what is and isn't appropriate. Just because the technology can do something doesn't mean it's necessarily correct — legally correct, or even appropriate — to do it. So when we talk about these high-priority concerns, it's about keeping in mind what boundaries you want to put around that data to make sure it is fit for use. That's an old term we've used for quality — for decades now, right? But is it fit for use, and are we structuring the access and the consumption of the data appropriately? Sometimes it becomes a chicken-and-egg process: if we don't understand the data, we can't actually ensure that it adheres to our security and privacy standards around permissions, policy, contractual arrangements, et cetera.
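The "broader tolerance for variance" point can be made concrete with a small sketch. This is an invented illustration, not the speakers' method: the same fit-for-use check is run everywhere, but the bar differs by consumption context, so a column that is fine for exploratory work can still fail the MDM bar. The threshold values are assumptions chosen for the example.

```python
# Sketch of "broader tolerance for variance": one completeness check,
# different thresholds per consumption context. Thresholds are invented
# for illustration, not taken from the webinar.
TOLERANCES = {
    "mdm": 0.99,        # master data: near-zero tolerance for gaps
    "warehouse": 0.95,
    "sandbox": 0.70,    # exploratory work can absorb far more variance
}

def fit_for_use(values, context):
    """Return True if the column's completeness meets the context's bar."""
    completeness = sum(v is not None for v in values) / len(values)
    return completeness >= TOLERANCES[context]

column = [1, 2, None, 4, 5, None, 7, 8, 9, 10]  # 80% complete
print(fit_for_use(column, "sandbox"))  # True
print(fit_for_use(column, "mdm"))      # False
```

The design point is that "fit for use" is a function of both the data and the intended use, which is why a single enterprise-wide quality bar tends to be either too strict for the lab or too loose for master data.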
But I think that's something to keep in mind as we look at the trends and capabilities from a technology perspective, and how we make sure we're still getting insights from the appropriate use of that technology. In future webinars this year we're going to dive deeper into where some of these attitudes manifest, and talk about data scientists and data quality and how differences in usage dictate these things — there are some organizations that aren't recognizing that. We're going to dive deeper into some of those finer points. Here's a use case for anybody on the phone who's new to this — we've got a few examples in here for you. We have a telecommunications company with the typical telecommunications thousands upon thousands of terabytes of data, cross-referencing across data elements. So you're just thinking Hadoop, right? Yeah, exactly. And they identify churn, and they've had really good results. This is what's cool about big data: back in the old days with the data warehouse, it took us a long time to find those little points of light. Now we see those points of light a lot sooner, and that is a wonderful, wonderful thing — though there is a dark side to it, which we'll get to in another webinar. But we are seeing that the power of data is becoming recognized, and that's helping it spread to the other parts of the organization. Let's move on to the next one, because I obviously can't read that one. Let's see here. So, the sandbox. The components are similar, but the mental impression with a sandbox, or a lab, is that you can experiment and play around. It's standalone — but a sandbox does not have to be big data.
We have the big data part represented on the top with our ingestion and analytics pieces, but you can also just have a standalone bunch of stuff and good old SAS and just bang on it, and that's perfectly fine. Sandboxes tend to be very batch-oriented, because you're going to load stuff and play with it, and the people using it are going to be analysts or data scientists. This is not something you put out for broad access or productionize — I think the key there is, if you start to use it all the time, then you need to productionize it. The high-priority concerns here: as the data goes in, do you understand what it really is? You go out and find stuff — and data scientists love this exercise — but regardless of where you're getting it, you still have to make it usable and standardized in some sort of way, and then you can go ahead and play around. So it's not just grab everything and throw it in — close, but not quite. With security and privacy: because you're putting in raw data, that implies no controls. And by the way, when I say raw, I don't necessarily mean granular — raw data can have some level of non-granularity to it — but it's coming in pretty loosey-goosey, and you don't want to over-control the environment. You want people to have fun experimenting with it, but you do want to, for example, avoid exposing personal information if it's there and you think there's a risk of doing so — you have to be careful about that. And with metadata — another word for it here is lineage, and maybe a more sophisticated view beyond lineage is provenance and pedigree. Do you know where you're getting it from and what you're working from? Do you know what the possible uses of it are? That actually allows you to build some tolerances into your work. And from our performance characteristics: again, non-persistent, non-production. If it's production, it's not a sandbox — you've got to be careful with that.
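The "don't expose personal information in the sandbox" point can be sketched as a small pre-load step. This is a hedged illustration, not the speakers' implementation: the field names, the redaction marker, and the choice to pseudonymize the join key are all assumptions made for the example.

```python
import hashlib

# Illustrative sketch: redact or pseudonymize personal information
# before raw data lands in a sandbox. Field names are invented.
PII_FIELDS = {"ssn", "email"}          # assumed: drop the value entirely
PSEUDONYMIZE_FIELDS = {"customer_id"}  # assumed: keep joinable, hide real key

def sandbox_safe(record):
    out = {}
    for field, value in record.items():
        if field in PII_FIELDS:
            out[field] = "[REDACTED]"
        elif field in PSEUDONYMIZE_FIELDS:
            # One-way hash keeps the key usable for joins in the sandbox
            out[field] = hashlib.sha256(str(value).encode()).hexdigest()[:12]
        else:
            out[field] = value
    return out

raw = {"customer_id": 1042, "email": "a@example.com", "churned": True}
safe = sandbox_safe(raw)
print(safe["email"])    # [REDACTED]
print(safe["churned"])  # True: analytical fields pass through untouched
```

The hashing choice matters: redaction destroys the field, while a consistent one-way hash lets analysts still join and count by customer without ever seeing the real identifier — which fits the "have fun experimenting, but don't expose personal information" balance described above.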
It is self-service — just, you know, there it is, folks, have at it; there's not going to be some production environment wrapped around it. And the other thing we've found — and this is a funny one; Kelly and I and Malcolm have chatted about this recently — is that these things get dirty really fast. I mean, like my room when I was a teenager: you can't find anything, and then it's not useful anymore. So there is a certain amount of housekeeping that goes with these. Any additional insights from you, Kelly, on this one? Yeah — I think one of the things that might seem a bit contradictory in this slide is that we talk about the consumption components as being data scientists and data analysts only, but at the same time we just talked about self-service. I think that over time that's going to change slightly, in the sense that self-service right now applies to data scientists and data analysts, but — as most of you have heard, with the concept of the citizen data scientist and the drive to free the information — we've got several clients now, believe it or not, even within financial services, that are talking about data freedom within their organization, where it's self-service all over the place. I brought this up here, rather than on the previous slide, because sandboxes are the first places where people really want to be able to access and play around at any time. But that trend will become broader and deeper, and people are going to want more access to the data, which means we need to consider those concerns around data discovery, understanding, standardization, et cetera. I don't need to reread those bullet points, but we do need to consider that the idea of a sandbox and the idea of self-service may start to come into conflict — even over the course of this year, as a matter of fact.
The use case we have here is predictive maintenance, which is near and dear to my heart — that's where I cut my teeth in BI many years ago. Someone has collected a bunch of sensor data from machinery — machinery with some very rigorous maintenance requirements — and they've saved themselves an awful lot of time and money by being able to predict parts failures using a sandbox. Using modern approaches, they've taken a grinding exercise that might have required six months — analyzing time between part failures and so on — down to only a few weeks, which obviously saves an awful lot of money in uptime and things like that. So again, another example, and there are many, many good examples of the value of this type of technology. Before we go to real time — oh, by the way, I see some questions coming in. I see one right now that I'm going to jump on as soon as we're done here — jump on in a good way; don't misinterpret that. It's a very good question and it's very relevant, and I'm going to hit it. And don't forget, please ask questions if you have them. The components for real time are very similar, except we've got to stream the data in — we've got lower latency on the inbound side, and we've really got to stream it. But we can do real time in a data warehouse, and we can do real time in an operational data store too; real time in its different forms doesn't necessarily require the big data technology environment. A lot of the concerns are the same: how fast can you get it in, can the pipe handle it, once it's in there do you need to tell anybody the results, do you have some messaging pieces, do you need the in-memory type thing for the low latency, and things like that.
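The in-memory, low-latency pattern described here — and the "closing the loop" idea that comes up next, where the structure itself generates messages or events — can be sketched without committing to any particular streaming product. This is a hedged, product-neutral illustration; the window size, threshold, and class name are invented for the example.

```python
from collections import deque

# Minimal sketch of low-latency stream handling, not tied to any
# specific product: keep a small bounded in-memory window over
# incoming events and emit an alert the moment a threshold is crossed.
class StreamMonitor:
    def __init__(self, window_size=5, threshold=100.0):
        self.window = deque(maxlen=window_size)  # in-memory, bounded
        self.threshold = threshold
        self.alerts = []

    def ingest(self, value):
        self.window.append(value)
        avg = sum(self.window) / len(self.window)
        if avg > self.threshold:
            # "Closing the loop": the structure itself raises the event,
            # rather than waiting for someone to run a report
            self.alerts.append(f"avg {avg:.1f} over threshold")
        return avg

monitor = StreamMonitor(window_size=3, threshold=50.0)
for reading in [10, 40, 70, 90, 120]:
    monitor.ingest(reading)
print(len(monitor.alerts))
```

The bounded deque is the whole trick: memory use stays constant no matter how long the stream runs, which is what makes the in-memory approach viable for low-latency work.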
Security and privacy: same thing you have with anything real time — if it's going to be a problem to do something in real time and have someone looking at it right away, address it. Governance: again, we're going to look at metadata and compliance to make sure all this fast-moving stuff is being handled in compliance. And from the standpoint of latency and real time, getting it out is really, really important, so you're going to need good analytical facilities. These types of structures will probably generate messages or agents or their own series of events — I call that closing the loop — and if you're not used to that, you're going to have to handle the process aspect of it. These are very flexible structures: very low latency, very high performance. Kelly, anything on your side? No — I'm wondering, though — sorry, I know you were looking at some of the questions coming in that might be appropriate to answer at this juncture, so maybe I'll read one out to you. And by the way, thank you for putting in the questions. "I'm not sure that I agree with your depiction of how information flows between ETL, EDW, and big data. Looks like you're still looking at the information in the old, traditional way. Can you please comment?" Yeah, that's the one I wanted to answer; I'm happy to tackle it now. What we showed you is a sample architecture, but that begs another question. It's actually a reduction of an architecture that currently works for a company I've worked with, and the question it begs is: is there a right way or a wrong way? The answer is that the traditional way data moves might be okay for a company. Architectures aren't right or wrong, so I'm going to disagree with the implication in the question that there is a right way or a wrong way.
The right way is the one that works for your organization. If the data flows traditionally for some reason — because that's what the pipes can do, or because it simply works — then fine; but you need to be open to doing it a different way. There are a lot of people entering the business now who don't remember the period when we first started to really build larger data sets, back in the late 80s and early 90s. There was a war then between the dimensional model and the normalized and denormalized data models, and everyone would ask me, "Which one should I have?" And the answer was, "I don't know — what does your business need?" So there is no right way or wrong way. If you have a context in which you don't agree with how this flows, then your context is that of a different organization; that doesn't invalidate this particular example — and by the way, it's a very high-level sample. So to the question it begs — is there a right way or a wrong way? — my answer is no, although we weren't trying to show any one way in this picture. Thank you again for that question, because it's something we see all the time: "Am I doing this the right way? Is there a reference architecture for my industry?" There may be one you can start with, but you're going to have to put some science and some engineering into it. Kelly, do you want to add to that, or shall we move on? I think we'll talk about it again in a couple of slides. So, real-time analytics. This is probably a more well-known example: social media, news reports — everything feeding into a group that was monitoring it. During the recent outbreak of Ebola, the right people to deal with it were notified a lot sooner than they typically would have been, and it really helped contain the outbreak. This is a classic example of real time, because real-time media feeds, news reports, and social networks are all being gathered by organizations that monitor diseases around the world, and this is being done
in real time, and actions were taken. But look at where the benefits came from: they knew how to act. It wasn't just putting the stuff in and having some result come out — it's what you do with the result, what follows after the result. This is a really, really good example of the benefit of real time and analytics. Let me just check the questions here real quick and see if we have another one — Kelly, do we? A couple more, but we can catch them toward the end. So, legacy structures. They still have relevance. We still do reports, right? We still have business intelligence. We still have hundreds and thousands of departmental business analysts doing things that affect the ebb and flow of business just as much as the result of a sophisticated model does. The components have very familiar names — ETL, data warehouse, operational data store — and they're still relevant, because there are a lot of aspects of big data that are not relevant to them: the old volume, velocity, veracity-type things. If those aren't there, then just because you can doesn't mean you should; you need to be open-minded about these things. We've come across several folks who feel that everything must go into a data lake, and that somehow it will become a broad-spectrum structure meeting all needs. Maybe it can work, and maybe it can't, but you need to put some thought into it. The concerns we still have these days with ETL: is the pipeline big enough? More and more data is getting moved, and your old heavy-lifting batch ETL sometimes just can't carry the mail, so you might need another high-speed path, or parallelized ETL. Also, a lot of organizations are adopting service-oriented architectures, which means they want to move their data, or use services, as much as they can. But services have a limit — they won't stop working, but the
performance won't be adequate at a certain point. But if you do have services, build a data layer. Don't just build another business process service; build data services, so everybody can share in the delight of service-oriented architectures. You have to understand that, you know, Kelly, we run into this. We did a few things last year where this was really important to clients: departmental use of things, historical reporting, and, I didn't put it up there, regulatory reporting, operational reporting, ad hoc. These are still really important things, so don't shove this stuff off into the corner as old stuff; it's still really important. The reason these are relevant is you've still got variant latency, variant historical requirements, variant operational requirements, lots of moving parts, lots of different characteristics. That means you need to be a little more flexible with things other than just one type of structure, so we need to consider that. Kelly, anything you would want to add to that one before we get to the question timer? Yeah, one thing I wanted to add, and I can't remember if we had put in a use case for this one or not; I was trying to go back to our draft of slides. John, do we have a use case?
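John's point above about building data services, rather than yet another one-off business process service, can be sketched roughly like this. This is a minimal, hypothetical illustration in Python; the record store, fields, and function names are all made up for the example, not something from the webinar:

```python
# A tiny illustration of a shared "data service" layer: instead of each
# business process service querying the store in its own way, all consumers
# go through one service that owns access, filtering, and shaping of the data.

RECORDS = [  # stand-in for an operational data store
    {"id": 1, "region": "east", "revenue": 1200},
    {"id": 2, "region": "west", "revenue": 800},
    {"id": 3, "region": "east", "revenue": 450},
]

def data_service(region=None, min_revenue=0):
    """Single shared entry point: every consumer uses the same access path,
    so rules (filters, security, shaping) live in one place."""
    return [r for r in RECORDS
            if (region is None or r["region"] == region)
            and r["revenue"] >= min_revenue]

# Two different "business process" consumers sharing the same data service:
east_accounts = data_service(region="east")
big_accounts = data_service(min_revenue=1000)
```

The design point is only that the filtering logic lives in one shared layer; each consuming service calls it rather than reimplementing its own data access.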
I'll tell you what, let's go through the use case, and then I'm going to comment; the floor is yours, so why don't you raise your hand and we're good to go. So: healthcare. A big hospital, and they built what they call a data warehouse, and it is a data warehouse in terms of the way it's loaded and the velocity of the data; it's not big data. But they're still using predictive models and artificial intelligence against that data, loading in their electronic health records to do it, and they're generating anticipated improvements. And when I say anticipated, that doesn't mean they haven't realized them; their target metric is improved patient outcomes. In other words, create algorithms that say: patient A is getting this kind of treatment and has these types of physical characteristics, so you probably should try some other protocol with that patient, before a doctor even figures it out. And of course that reduces cost, because, and we do a good bit of healthcare work, if you're not in the industry, when you get to the decision about care sooner, you eliminate a lot of testing, a lot of cycles, a lot of patient discomfort, and you don't occupy beds that shouldn't be occupied, et cetera. It really works out to an enormous set of benefits. So here's another example where they're targeting really significant improvements in patient care, and they are achieving them for sure, but they're looking for really big numbers. So, Kelly, I'll turn that over to you and let you weigh in here.
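As a rough illustration of the kind of algorithm John describes, patient A is getting this treatment and has these characteristics, so consider another protocol sooner, here is a toy risk-scoring sketch in Python. The features, weights, and threshold are entirely invented for illustration; a real model would be trained on the hospital's electronic health records, not hand-coded like this:

```python
# Toy predictive rule: score a patient from a few record characteristics and
# flag those who might benefit from an alternate care protocol sooner.

WEIGHTS = {"age_over_65": 2.0, "readmission": 3.0, "comorbidities": 1.5}
THRESHOLD = 4.0  # illustrative cutoff, not a clinical value

def risk_score(patient):
    """Weighted sum of simple features pulled from a health record."""
    return (WEIGHTS["age_over_65"] * (patient["age"] > 65)
            + WEIGHTS["readmission"] * patient["readmitted_last_90d"]
            + WEIGHTS["comorbidities"] * patient["num_comorbidities"])

def flag_for_review(patients):
    """Return patients whose score crosses the threshold, i.e. the ones
    a clinician might look at before the usual workflow catches them."""
    return [p["id"] for p in patients if risk_score(p) >= THRESHOLD]

patients = [
    {"id": "A", "age": 72, "readmitted_last_90d": True, "num_comorbidities": 1},
    {"id": "B", "age": 40, "readmitted_last_90d": False, "num_comorbidities": 1},
]
```

The point of the use case is exactly this shape of output: a short list of patients surfaced for earlier attention, which is where the cost and patient-outcome benefits come from.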
Yeah, I think this is a great example of taking a look at the available architecture and the available data capability and blending it with some new capabilities, like artificial intelligence, to create a path and a plan from legacy to ultimately something that is more cutting edge. That's one of the things you'll probably hear us talk about quite a bit in the lessons-learned section. Realistically, companies can't start at a sophisticated data science level, because if they've been in business for more than 20 years they have something that is now considered legacy, and it's virtually impossible to jump into that new capability without figuring out how to transition from a previous architecture, with certain capacities and limitations, to that new architecture and its demands. One of the things we see work quite well is when an organization such as this one starts to blend that legacy capability with some of the new components that are available as ways to consume and view that data. So a lot of work is done to make sure that the patient records are correct, optimized, and available, and then leveraging that, using some of the new technologies via artificial intelligence and other sorts of front-end analytical tools, can help blend a legacy architecture with a new architecture. The other thing I wanted to say is we have another client that's a very exaggerated example of this. The company is over 100 years old; as you can imagine, their systems were cutting edge in the '70s, and they still use some of them. It's virtually impossible to completely sunset some of them, but at the same time they have a head of strategy, they have a data science organization, and they have a group of people who are on that cutting edge, that bleeding edge. So what has ended up happening is kind of a mess between the legacy data and the desire to use that legacy data in a forward-thinking way,
and so what's ending up happening is the re-creation and spawning of multiple new data stores, in this instance in almost an exaggerated way, because with the new technologies you can create a QlikView data store in a day, in a nanosecond. So you end up with this proliferation of data stores that doesn't necessarily help you get to the concept of data insight. What it ends up doing is giving you a great sandbox, but then when you go to production and try to really institutionalize that learning, it becomes very difficult. So I wanted to provide another use case of blending, or rather of the challenge that can come up when legacy systems are not incorporated into the plan, when looking at how do I take advantage of artificial intelligence, natural language processing, machine learning, et cetera, which is what we all want to get to from an analytics perspective. That brings us to, well, the takeaway slide, I'm sorry, and then it's time for Q&A. One more quick reminder here: some questions are coming in. I see one that just popped in, and there are a couple more here that I have looked at and we can address, but let's just talk about what we would like you to take away from this.
This is our initial look at this material; we're going to springboard from here and drive a lot deeper as the year goes on. If you're a technical person and you were hoping for real depth, we can't start there; we have to build a foundation first. We will get there, don't worry, and heck, I'm a propeller-head at heart, so we will certainly get down to those types of things. But from this particular presentation, here's what we want to hit. Yeah, there's the question right there: reference architectures are just that. You may not need a lake. You may not need a data warehouse or a sandbox. You might need an Acme Incorporated data store, whatever your company is. If you work for Cogswell Cogs, you might need a Cogswell Cogs data store; of course, anyone who recognizes that reference is as old as me. But reference architectures are just that: you need to have a framework that you can talk to people about, and then you need to draw that architecture so the lines flow the way you think they should flow and reflect those things. Just please remember, there's not a right one or a wrong one; there's the one that's right for you. Because of that, we would like you all to not just cobble stuff together out there when you start to buy, and most of you probably have started already at some level. Kelly, we have clients who have tried this stuff; they've done a proof of concept of big data and they've handed it back politely to the vendor and said, no, maybe next year. Just don't start to slap stuff together and then hope you're going to make it all interface somehow. A lot of people discover that their pipes aren't big enough, or that what they want to do with the data they can't even physically do the way their applications are built. So step back: every exercise in this area is an engineering exercise, not a slap-on-a-new-thing exercise. It'll really pay off in the long run. Manage that architecture against your needs and your usage, versus, oh, we have a license for
X, Y, and Z, so let's just use it. It's a hammer, so let's make everything look like a nail. That may or may not work in this predictive analytics world. And then lastly, I kind of dove into this earlier: web services. They're a really important tool. A lot of organizations are agile; you build a lot of services, but you don't build data layers. We'll talk about this architectural characteristic later in the year, as to some of the things to consider. Don't take those off the list just because you're manhandling data and creating data lakes and things like that. So we have a few questions here. Kelly, anything to add to our takeaways, and then I'll start to grind through the questions? Yeah, I think just to reinforce the point that there is value in looking at a plan and understanding what the business usage is, what business goals you're trying to accomplish, and what it will take to get to those goals and that usage. Regardless of whether you are doing operational reporting, or predictive analytics, or you have a sophisticated data science group, scoping that data science project, scoping that capacity and those business goals, and understanding the constraints within your technology, within your network, as you just said, John, and within your business understanding, is important to the success of the project. And there are lots of different stories around when people were first experimenting with big data where it failed. I remember, maybe not so much last year, but in 2015: big data is failing, big data is failing. Well, it's because they thought it was a silver bullet that solved a lot of the problems that came up with, quote unquote, emerging technologies. And I think now that we're understanding it's not a silver bullet, we're applying some of those previous capabilities, around understanding requirements, addressing and understanding constraints, and evaluating new
technical capabilities to solve problems. And the good news is that technology is also innovating to a point where there are so many different solutions out there enabling organizations to link legacy systems through to these new analytic capabilities, and that's helping to bridge the gap. So I think that would be a key takeaway: remember those capabilities you have within your organization, around planning and scoping, that apply regardless of the technology you're using, and take advantage of them. Okay, Kelly, stay on, don't mute; we've got a few questions to grind through here. The first one: is there any place to find out more specifics about what was done in the healthcare space? Yes. If you reach out through Shannon, she'll get it to me, and we will make a connection. We have to, obviously, be careful of confidential things and all of that, but if all the ducks line up, we will put the right people in contact with you and take care of it. So, Kelly, do you want to start this one and I can finish up? From a definitional point of view, how is a lake different from a warehouse? And, as the question continues, I saw another term, information dams; how is a dam different from a lake? Kelly, do you want to start that one, or me? You know, I've been talking a lot. Is the dam the expletive that's the response to the lake? I'm teasing you; I'm looking at the question right now. So, without giving away the next session, this is a great lead-in to our next webinar, which is about data lakes versus data warehouses. I think the big difference is that there was the feeling that data warehouses were very structured, and we need to be able to put an unconstrained amount of data into a repository that will enable us to do all kinds of fun analysis. I think the constraint of the structure of a data warehouse is what the data lake responded to. So fundamentally, I think a lake is meant to be less structured, and I'm not talking about
the data type being structured versus unstructured; I'm talking about the repository that holds the information, whether it is structured in some sort of schema, like a warehouse, or whether it is freely structured in kind of a file-system sort of lake. So that, I think, is one of the big differences. John, do you want to comment on that? Again, we'll get into this; our next session, a month from now, is grinding into those differences, and they have different audiences. Now, you could say that philosophically they are kind of the same: put a bunch of stuff somewhere and everyone can enjoy it. But there are some usage differences, and obviously some structural differences, and there are also some distinct similarities in success factors and challenges that you need to deal with. It sounds like I'm teasing a little for the next one, but for example, take the lake as a metaphor: a lake has water in it, and if you want everyone to enjoy the lake, you're not going to fill it with polluted water, which we see a lot of. As for an information dam, a dam bottles stuff up and holds it back, so yes, you can have a structure where you hold data and then trickle it downstream to users. Sometimes I think people invent these terms because they sound like they've invented something. Again, I think this is more about the distribution and the storage and the engineering behind it, and, hey, a dam can be different from a lake or not different from a lake, right? I have a home on a lake that was created by a dam, so I kind of think they're one and the same, to me anyway. We can get into that. Kelly? For sure. I do think there is a subtle difference, but it's almost a philosophical difference, depending on your point of view. There's also the concept of data ponds; so you have lakes and you have ponds, just as you have warehouses and you had marts, just as you have staging areas or caches and you have dams. There's all of that going on. So there is a subtle
difference, but I think it's from a distribution standpoint and a management standpoint more than anything. Kelly, anything to weigh in on? I have two more short ones here that we can address. Nope, that'd be great; let's go to the other two. I mentioned the big differences in technology, and that there are a lot of cool things happening now, you talk about AI and all of that. It's kind of a two-part question, really. With all these big differences in technology, are we seeing potential value from getting our arms around the new stuff? Is it that different? Is it worth considering just because it's that different? That's the first part of the question; do you want to take that one, Kelly? Absolutely. I would absolutely consider it, if your business case demands that you look at your data in a different way. Technology is out there that enables us to do amazing things with data now. When we used to think about the way data was prepared, in the old days, it was cleansed and standardized and all of that. Now, when we look at data preparation in more of a big data world, that preparation is taking advantage of things like natural language processing and machine learning; it's doing all of this stuff for us, right? And that, I think, is a huge value: if you can match your business goals to that sort of automated preparation, which leads to more automated analysis, then your usage of that information can be more sophisticated and more real time. So this is the value of all of this, and I think it's absolutely worthwhile taking a look, again, at what your priority is from an organizational perspective and how you implement it and consume it in a way that is meaningful. And maybe you have your own little sandbox of new tools and technologies; many organizations are really taking advantage of an
innovation team, where they have their organizational sandbox in which they're trying out a lot of these new ideas, and then they're moving, organizationally, into a production environment. And I'm not talking just technology; I'm talking organizationally, going from innovation out. So my viewpoint is, absolutely take advantage of it, but don't overspend when you don't really know what the goal is. Right, yeah, I think that's the key. Unless you are a really massive organization and can afford to just dabble, you're going to have to put some thought into how deeply you dive into this, and most companies have started already. But that answers the question. The second part, and I'll start it and then we can move toward the wrap-up here, so quick answers on this one: are traditional vendors adapting? I think they are, and I think we're seeing that across the board. Even data governance tools, and you might ask what those have to do with big data, are actually doing big-data-specific things to help support analytics. And your heavy-lift ETL vendors and data quality vendors, they're all weighing in and adapting to the new ideas and the new things. Look, data is very valuable, the light bulb is going on, and everyone is adapting; if you don't adapt, you're not going to be around very long. Kelly, over to you, and then we can wrap up.
Yeah, absolutely. Any of the larger companies that provide either the data stores, the data movement, the data processing, or the data consumption, if they are not organically developing it, they will be acquiring it, right? And that is the purpose of our startup environment, and when I say here, I'm not thinking just physically here in Silicon Valley, but all over. Our startup environment is what pushes the envelope and pulls along those legacy companies that may not be able to innovate quite as fast as new companies. So think about it as your innovation environment, and then your Oracles and your IBMs acquire those companies. If they're not innovating and they're not acquiring, then they will be obsolete. Alrighty, I think that wraps us up; we're almost at the top of the hour. Shannon, we can turn it back over to you, and thank you, everybody, for your time; we had a nice crowd today. Thank you. And we're looking forward to next month and the rest of the year. And thank you to you both for kicking off our brand-new webinar series, Data Insights and Analytics. What a great start to the series; we're really excited about it. And thanks to our attendees for being so engaged in everything we do. Just a reminder: I will be sending a follow-up email by end of day Monday with links to the slides and the recording of this presentation for everybody. I hope everyone has a great day. Thank you.