Hello and welcome. My name is Shannon Kemp, and I'm the Chief Digital Manager of DataVersity. We'd like to thank you for joining the latest installment of the DataVersity webinar series, Data Insights and Analytics, brought to you in partnership with First San Francisco Partners. To kick us off for today, John and Kelly will be discussing building a flexible and scalable analytics architecture. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen. Or, if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag DIAnalytics. As always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and additional information requested throughout the webinar. Now, let me introduce our speakers for today. Well-known industry analyst John Manley is a business technology thought leader and recognized authority in all aspects of enterprise information management. With 30 years' experience in planning, project management, improving IT organizations, and successful implementation of information systems, he is the president and chief delivery officer at First San Francisco Partners. Also joining us is Kelly O'Neill. Kelly is the founder and CEO of First San Francisco Partners. Having worked with the software and systems providers key to the formulation of enterprise information management, Kelly has played important roles in many of the groundbreaking initiatives that confirmed the value of EIM to the enterprise. Recognizing an unmet need for clear guidance and advice on the intricacies of implementing EIM solutions, she founded First San Francisco Partners in early 2007. And with that, I will turn it over to John and Kelly to get today's webinar started.
Hello and welcome. Thank you very much. Good morning, evening, or afternoon, everybody, wherever you may be. Hello, Kelly. We are glad we made it home. Yes, we are. Yes. We're happy to be here. All right. We're going to start with a polling question, right, Kelly? Yes, let's get everybody's feedback. We know that there's a variety of experience levels and organizational experience levels, meaning you personally may be very experienced in analytics while your organization might not quite be ready for it. So: where is your organization in its readiness to develop a formal big data and analytics architecture? You've got four choices here. A, we have no plans or architecture for analytics, but want to have one. B, we have a strategy and are planning an architecture. C, we have started to implement a planned architecture. Or D, none of the above. All right, we'll wait for the poll. That's right. Everybody, please answer the question, because as soon as everyone answers, we get back to the good stuff here. Absolutely. It helps us target the conversation and understand our audience, so everybody should fill it out. Even if it's none of the above, that's still helpful. Yes, it is. For those listening, while we're waiting for the results to pop back in here: because of the nature of these things, they get specified as much as six months in advance on an editorial calendar, and most of the events were defined back in November of '16. But we have taken a look at your responses and your feedback, and we've actually made adjustments to this one. And here we go — there are the results. So it looks like the largest group of you, at 21%, are starting to implement a planned architecture for analytics. That's great.
A variety of you are still either thinking about it and considering it or haven't gotten to that stage yet. So I think we'll address a variety of maturity levels in this discussion. Some of you will see some simple, flexible architectures that may resonate with you, and those of you who want to get down into the details will see some very detailed service-oriented architectures as well. So, John, you want to get started? Yeah, I do, actually. Let's see. Let's start with some typical problems and why we thought this particular topic would be a good idea. There are a couple of scenarios here. One, we have a typical starting point for analytics, and that is an analytics team and some data scientists being set up outside of a formal set of guardrails — outside of data management or the existing business intelligence infrastructure, outside of IT. It could be in a business area or something like that. And invariably, they do good work. No one is disparaging the activity of doing this. But after a period of time, what we have noticed is that there is a sense of, okay, that's awesome. Now, how do we institutionalize this? How do we get it going? And the answer to that is, well, we hadn't planned on that. We hadn't thought about it. We were just going to keep going the way we're going. Well, that's not how to make something sustainable. You will run out of brilliant little insights and little special things and have to go mainstream at some point in time. The other scenario, which again we see more often than not, is that the CIO decides that, yes, now it's time for the big data and analytics things, sets up a data lab, and invites a few of the initial sponsors to use it. And within a period of time, after some little brilliant insights, people go, is that really all there is? Why did we do this?
Because the mechanisms for advertising the successes, verifying the successes, and making sure you actually did something that moved the business — those things are lacking. And you very rapidly get into a "well, what have you done for me lately?" type of scenario. And what causes this is the lack of an architecture. Now, some of you are going to say, well, architecture is just some boxes and arrows put together. No — we're going to cover why it's a little bit more than that today. So those are some typical things that helped shape our talk here today. Our topics: what is an architecture? We're going to talk about that first because we see a lot of architectures, right, Kelly? I mean, "here's our architecture, what do you think?" We get that a lot. Absolutely. And we go, oh, that's nice, and then maybe suggest some adjustments or something like that. And when should you do this? When do you call it a big data and analytics architecture, when isn't it, and all that? We're going to look over some of the key components today and some best practices. And we're going to try really hard, again, to leave time for your questions and answers. So let's go into the definition right away. Kelly will be popping in here, maybe tossing questions at me or helping clarify things as we go along. And I'm going to talk about just the definition. We went to the DMBOK for this because the DMBOK, whether you agree with everything in it or not, is still a stake in the ground, and it's a good place for us to start. So you have the general definition of an architecture: the art and discipline of designing buildings and structures. And you can go from the macro level to the micro level. But how things work together, that's really key. How do all the parts work together? Another way to look at it is the design of a complex system. And there are three concepts involved here. One is the abstract, because sometimes the architecture, as we all know, is rather abstract.
Anyone who's done a whiteboard session with arrows and boxes has done an abstract architecture. There's the apparent aspect, which is how it fits in with the environment. And then there's the explicit planning aspect, which shows all the moving parts and how they fit together in detail. Those are all architectures. But we all have them and we all use them differently, and it's really important to understand that there are different applications for the different kinds. So our essential definition is: the organized arrangement of component elements that optimizes function, performance, feasibility, cost, and aesthetics. Aesthetics, you know — Kelly, have you ever heard anyone say "that is a beautiful architecture" or "that's an ugly architecture"? I don't know if aesthetics are that important. But the cost and the feasibility and the functionality, yeah, they're really important. I have seen a few that were kind of pretty, though, really. Eye of the beholder. The definition for a big data and analytics architecture is the arrangement of those elements that manage and leverage the enormous amounts of data that we need to perform those fancy analytics. That's it. That's simple. But, you know, maybe it's not that simple, or else this would be a really short presentation. We do have to avoid going overboard, right, Kelly? Yes. And, you know, being from California, the Winchester Mystery House is one of my favorite analogies for what happens when you don't have a good architecture. And I think this might be a great place, John, to talk about the purpose of an architecture and how an architecture is used to create a level of understanding, standardization, and awareness. So, anyway, did you want to go through this? Oh, I'm sorry — you can go ahead and do that. You want me to keep talking? Okay. I'm happy to keep talking.
So for those of you that are familiar with it, it's called the Winchester Mystery House because the way it was constructed is quite a mystery. What happened was — John, you know the story actually just as well as I do — the widow of the founder of the Winchester rifle company was told that if she ever stopped building her house, she would also pass away, just like her husband. And so she kept building and building, and she created this kind of nonsensical home that didn't have a strategy around why different things were being built in different ways, because her purpose was really just to keep building. So what we want to do here is create this analogy: what you don't want to do is build out a Winchester house. You don't want to just continue to build technology infrastructure. There needs to be some understanding of why you're building a stairway. Why are you connecting one system to another? Why are you building an additional wing of the house? What's the purpose of that wing, and therefore, what should it contain and what should it look like? We do grow our architectures over time, but we don't want to grow them in a way that ends up with a stairway leading to nowhere or a wing of the house that is not fit for the purpose it was originally built for. Absolutely. In modern money terms, if you tried to reproduce the Winchester house, it would cost an astonishing amount of money because of the craftsmanship that went into these nonsensical solutions. But we have to be careful of this bolt-on approach. Again, as Kelly was saying, the function and the performance and the feasibility and the cost — those are all factors. This is why we do the architecture. It's not just so you understand things — that is one role — but you have to show how it works. The Winchester house does not work as a house. There are stairwells that literally go into the ceiling. And we want to avoid that.
And I think a lot of us, if we're very honest and look over our enterprise architectures, can see that there are little hints of the Winchester house pretty much everywhere we look. So we think that's a really good analogy. In the Winchester house, there are fireplaces and doors and stairways and all of that kind of stuff. What we have here is a pretty comprehensive list, but certainly not, I think, everything we could put into a big data and analytics architecture. Now, remember — this was on the prior slide, and I forgot to say it — you can do analytics without the big data stuff. Don't forget that. We always like to remind people of that, because there's lovely value in your data assets, and you don't necessarily have to go whole hog on technology until it really, really is the time. We'll visit that a little bit later today. So the way we look at all these pieces and parts, we put them in a functional column, a technology column, or an organizational column. And then there's the supply chain and logistics — and we use the word logistics with great deliberation, because that's the movement of data and the management of that movement. Then there's the management layer, which is all the elements that help you manage the asset. And then there's the consumption side, which is where you get your value. If the data just sits somewhere, there's no value; it's just cost. So you have to have a layer that generates the value, and that's the consumption part. And you can see, without reading every single one of these, all of the big architectural chunks you hear about in big data, such as landing, pedigree, preparation, models, and the analytics — the three kinds of analytics we covered, I think two talks ago: descriptive, predictive, prescriptive.
You see machine learning and AI there in the technology column, and data lake management, HDFS, columnar and graph databases — those are all physical structures — along with streaming. Now, there are a few things in the organization column that you don't always see, which we have put in there because we've seen that they are challenges. Security and privacy gets a lot of play. One thing we've noticed doesn't get its due is business continuity. A lot of you out there are starting to want to stream analytics at low latency, or use your prescriptive or predictive conclusions to change the way the business works — that's what the machine learning and AI concepts do. The problem with that, as a few of our clients discovered, is that when the environment gets compromised and all these wonderful in-the-cloud servers let you down, restoring that environment is either impossible — it cannot be done; remember, Amazon had a major outage a year or two ago and it could not be recovered — or there is so much data that traditional recovery methods take months. So when we talk to folks, we ask them: have you thought about business continuity in the context of your big data and analytics architecture? Then we have all the people and those kinds of things up there. We're going to take a look at these in pieces and slices a little bit more along the way. Any comments or questions or insights required there, Kelly? No. I think that as we start to see how this is used, my questions will come up. I think it's a great structure. Yes. We wanted to look at the whole big picture now because of our topic here: when you talk about flexibility, flexibility doesn't mean everything is flexible. It means some things can bend and some things are rigid. When you work towards flexibility, you don't want to be totally malleable, because then there's no predictability, but you can't be totally rigid either.
So any system has to have a planned amount of flexibility. Now we're going to go through deriving architectures, and to do that, there are two lenses for architectures. We have extracted these from our practice of architecture. There is the form: what does the architecture look like, how does it present? That's the part that honors the audience. There are all kinds of different stakeholders, and they all need to have something presented to them that they understand. The second part is the progression. Architectures have to be best fit — that's a term we use all the time. That is, they have to fit the purpose; they have to be effective. If it's a simple architecture and that is the best fit for an organization, and it doesn't have the exotic, latest, greatest thing going for it, that's fine, as long as it is the best fit. If the business challenges are complex and you can't understand the analytics architecture without understanding the entirety of the rest of the enterprise architecture, then sometimes we need to show the entire data architecture of an organization just to get a sense of the big data and analytics architecture. So the first thing to keep in mind as we're talking about this is that there are two lenses people will look through: understanding it — and you have a lot of different audiences — and then, how do we get to the future state? The future state is never implemented on day one, and we have to understand what is; and along the way, whatever we do still has to fit and meet the needs that we have at that time. And we consider these two lenses all the time in conjunction with each other, right? Absolutely, all the time. Is it a best fit, and are all the audiences able to use this architecture? Kelly has a phrase she uses all the time: does it stand on its own? Right, Kelly? You look at that picture, and it should stand on its own.
All right, so the forms we break things down into come from the definition, because that just makes it easy, right? There's the abstract, which conveys the idea and the insight, so people can go, yes, I think we should go that way. There's the apparent — most architectures you see are what we would call apparent architectures. You can see the pieces and parts, you can see how they hang together, and you can see what they interface to. They're not the last one, which is the explicit: the kind that shows not just the arrows and the stacks and such, but that you can actually manage and sustain against. These types of architectures are almost at blueprint status. And whatever you do with big data and analytics, it isn't just throwing up the boxes and putting Hadoop and Spark and all these names in them. That's going to be meaningless to part of your audience. You have to also address the functionality and the interconnections and the interfaces as well. So those are the three forms. And then there's the progression. Our progression is based on what we see, and we talked about this on our first slide, our story. Usually you start out isolated. For example, we have a client now that has two distinctly separate data science areas, spun up at the same time. We would call that an isolated start. There's value being generated, but the audiences are limited and very, very contextual. Then there's what we call the recognized stage: people are going, hey, this works, this works pretty well, we're getting insight here, and the audience starts to expand. And because you have a bigger audience, you have to have the architectural elements that accommodate the bigger audience and the higher exposure of the data. So now you start to add some structural things. And then we have what we would call an embedded, data-driven architecture.
And that is the one that can support tactical operations. It can support monetization of data. Those are the ones that you're going to hear about going forward. But that is what we would call the last step in the progression — currently; there might be a fourth one someday, but right now that's the most advanced we're seeing. The key here is that your architecture is not static. Just putting up the future state in a room full of people is not going to do you a lot of good. I mean, everyone's going to appreciate it, but they're going to want to know what's going to happen next week and all that. Sometimes it's just not practical to throw up that future-state big picture and assume everyone's going to get it, because they're not going to get it. And is this a one-and-done progression, or can your architecture flow through this progression at different points in time as the sophistication of your requirements changes? Absolutely — remember, it's best fit. Most organizations embarking upon a big data and a predictive or prescriptive analytics environment are going to start with isolated, because — right, we've mentioned this before — they don't understand what it is to react to some of the output from this process, and they have to learn how to do it. And sometimes that takes a while. Sometimes they stay there for a long time because they're getting good return on it, and their best fit is to leave it like that. But then there's the "well, this is awesome, I think everyone should get on board," right? And then you might move rapidly. What we see most often is isolated for a while and then a quick move — we're seeing an awful lot of organizations, because we get the calls and the questions on data governance and big data, data quality and big data, those kinds of things — into this recognized area, where a lot of firms are now.
But then the embedded stage, you know, that's where you're starting to get really, really serious, and that can take longer. I would say also, Kelly, to answer that question: you might see multiple isolated areas before they start to come together into a recognized one. And you might see a portion of your architecture in the more recognized type of use, and you might start to see an isolated example of embedded pop up somewhere. So this is a general progression, but you could actually see these things split out into separate work streams based on the business needs. Again, best fit, right? Yeah, absolutely. All right, are we going to see some architectures? Yeah, yeah, yeah. Okay, here we go. All right, so we're going to look at a couple of examples. Well, you know what? It comes down to the Vs, really. Most of the factors are based on your Vs. I start with veracity. Now, according to the genesis of the Vs in big data, Doug Laney started with the first three, and then someone added veracity. But veracity is actually the first one. In your isolated places, you're coming up with something that people trust, all right? You're reacting to a problem, but you're coming up with something with veracity that establishes some trust. Then you get into the variety and the volume and the velocity, and we don't need to repeat those. Those are the big drivers of whether you're big data or not big data, but you need to consider them anyway, whether you're big data or not, in terms of just meeting the demands on your architecture — when you have different content, different types of velocities, and things like that. Another factor we really encourage you to consider is having that blank-sheet-of-paper mentality, what we call net new. A lot of folks say, well, what's the next step for our data warehouse and BI environment to get us towards big data someday?
Actually, it's a really relevant consideration to forget stepping through the generations and strongly consider maybe blowing up the data warehouse and replacing it with a data lake, and/or using the data warehouse as an adjunct to the data lake. Sometimes crawl, walk, run isn't really a good metaphor for architectures. It might be crawl and then sprint, and that's actually okay. We're going to see an example of that. So those are the things that we consider, in a nutshell. That's a start. All right, so we're getting to the architectures here. Any comments on that one, Kelly? We'll jump ahead. Let's keep moving. All right, so here's our first progression. Here's an example. Now, this is the abstract example. You know, what's this thing doing? Someone says, well, we have to have some functions. All right: the people that are going to do it — the data scientists or whoever, or some analysts. A place where you land the data. Some way to get to it, and some facilities to work with it — some tools. A place to keep the data, and maybe something to move it around. Operations is unsophisticated. You might be using your data analysts at this point in time; you might not even have a data scientist yet. And if this is what you need at this point in time, this is the abstract version of your architecture. All right, it looks pretty simple. Someone might say, well, that's not the architecture yet. No, it's not your ideal target state, but it is a legitimate, best-fit set of components if you're just starting and you want to make sure this works for your organization. And why would you choose those elements as a first progression? Well, because they're the bare minimum, right? The landing — you need a place for the data to stop, and you need to store it, because you're taking the data off the stream or off of the main system.
This is still something you're never going to do against an operational production source of data. You still need a way to get to it — that access component. By definition, you are doing something more sophisticated than your BI tools can do, so you do have the analytics aspect. You might use your access tools; I mean, a lot of organizations will use one tool to strip out the data and put it into another tool so they can do some analytics downstream. You also need storage that is high performance. It doesn't necessarily have to be HDFS or Hadoop. It can be a data warehouse appliance, a columnar-type database, or a graph-type database if you're dealing with some unstructured content. It could be a combination of them. So you still need those basics, right? You need to get the data from somewhere, put it somewhere, and have some people use it there — these are the bare bones. Got it. Okay. So here's an example of what you might see. All right? And this is an apparent-type architecture, because it shows who's working with it and all that. I wouldn't call it explicit yet, but it is apparent. So here on the one side, you have an organization with their regular claims and customer data — I think we'd call this an insurance company, right? They have claims and customer data, and you do your good old ETL into your enterprise data warehouse, with BI reporting going to an analyst. Ta-da. Okay. Dotted line — that means they're separate environments. Then you have some client data coming from a whole bunch of different spots. You ingest it and you put it into Hadoop. You're using Spark and Hadoop, which is the most popular combination here — MapReduce is getting replaced a little bit. Predictive analytics done by the data scientists. So this is an isolated environment.
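To make those bare-bones components concrete — land the data, store it, give someone a way to access it, and let the analytics be whatever the isolated team writes — here is a minimal sketch in plain Python. This is not from the webinar; the file names, directory layout, and functions are all illustrative assumptions, and a real first progression would use whatever landing zone and storage engine fit the organization.

```python
import csv
import io
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def land(raw_text: str, landing_dir: Path, name: str) -> Path:
    """Landing: the raw extract stops here, untouched, off the production source."""
    dest = landing_dir / name
    dest.write_text(raw_text)
    return dest

def store(landed: Path, storage_dir: Path) -> Path:
    """Storage: parse the landed file into a queryable form (JSON lines here)."""
    rows = list(csv.DictReader(io.StringIO(landed.read_text())))
    dest = storage_dir / (landed.stem + ".jsonl")
    dest.write_text("\n".join(json.dumps(r) for r in rows))
    return dest

def access(stored: Path):
    """Access: the analyst's way in; downstream tooling is up to the team."""
    return [json.loads(line) for line in stored.read_text().splitlines()]

with TemporaryDirectory() as tmp:
    landing, storage = Path(tmp, "landing"), Path(tmp, "storage")
    landing.mkdir()
    storage.mkdir()
    raw = "claim_id,amount\nC1,100\nC2,250\n"
    rows = access(store(land(raw, landing, "claims.csv"), storage))
    # The "analytics" step is whatever the isolated team needs that day:
    high_value = [r["claim_id"] for r in rows if int(r["amount"]) > 200]
    print(high_value)  # prints ['C2']
```

The point of the sketch is the shape, not the technology: swap the directories for HDFS, the JSON lines for a columnar store, and the list comprehension for a model, and the same four roles remain.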
You could have hundreds of people using the enterprise data warehouse and the resulting data marts and such, and maybe a team of three or four data scientists just using this standalone, isolated analytics architecture. But this is the architecture that is the best fit for this organization at that point in time. And I think we see this a lot. You could replace the terms, right? Maybe people aren't necessarily using Hadoop; maybe they're using something as simple as Qlik or something like that. But the idea here is that these are isolated workflows that were created for a reason. We're not necessarily saying that this is good or bad. There's a reason it was created, and we want to recognize how it works, but also some of the limitations that come up as a result. So, hopefully, some of you on the line are looking at this and saying, yep, I've got something that kind of looks like that. Right? We did this. Yeah, exactly. And this also — again, a word for best fit — is flexible. We had an organization we worked with where they had the scientists over yonder and the data warehouse department on the other end of the campus. And it's like, oh, how are we going to bring these together? They're so far apart, they're not the same department, and so on. You know what? That's getting into the apparent and the explicit architecture, which is: what are all the people and process aspects of this? And everyone would have thrown up their hands because they started off separate. Well, no — all you have to do is evolve those other aspects. Let's start to talk about the people. Let's talk about the processes. Just because they're separate areas doesn't mean there can't be some collaboration defined, which would also be part of the architecture. So this is a great first step for evolution. Absolutely. The next one, the second progression, would be the recognized stage.
And that's where the value is recognized. And again, here's our abstract view. Now notice some more things pop in, because they have to. All right? We had some lively discussions on this slide amongst ourselves here at First San Francisco Partners — the main campus, of course, not one of the satellite campuses. You have your staging and landing, and you now have the ingestion aspect of big data, which is not quite your batch ETL load, right? There's a lot more going on there. We have also added a glossary and reference or master data. Why? Well, because, like Kelly said, more people are using it. And now they're going to say, well, what does that one mean? Especially if it's in a data lake, and that sucker shows up three or four times in two different contexts. Now I'm going to have to get a handle on that. Now, this is where people will say, John, you guys are nuts, because I have to have this future state and go for that, because the isolated one is the one that doesn't work. The reason it doesn't work is that you started to use the isolated one at a second-progression level. You tried to take this isolated thing, push it forward, and put a lot of people on it, but you didn't add the reference data. You didn't add the glossary. You had the wrong progression for your audience. And this is why people say, well, this architecture is bad because it's primitive. No, it's because you pushed that form of architecture, or that progression of architecture, too hard. And that's why it let you down. Again: best fit, not ideal state. Okay, go ahead, Kelly. And I think the business requirements will start to push you through this progression, right?
So, being a master data person at heart, I look at what we went through on the previous slide and think, oh my gosh, their customer data is going to be all over the place, and they're not going to be able to come out with consistent results from their analysts and their scientists, right? And so when that becomes a problem, it actually pushes you into the second progression, where you take another, more sophisticated step, because you've identified the cost justification to do so. Yep. And organizationally, we've added our scientists, because now you're starting to get serious. You might have what we generally call the competency centers, which would be an ACE or an ICE or a BIC — they've had a lot of names over the years, and people declare "I've invented the ACE or the BIC or the ICE," and that's nice. The fact is that you're helping everyone by creating economies of scale, which is awesome. But now the economies of scale need to show up, because you're scaling, right? Some formal policies and processes — our old friend data governance — now have to pop up. You know, we hear a lot about how the data lake without governance is a data swamp. This is the point where, if you don't do the best fit, if you don't have the right progression and the correct forms for it, your lake will turn into a swamp. So you have to have these things move along. We also bring in good old security and privacy, and the logistics get a bit more sophisticated, because more people are touching the data and we have to be better at touching it. Now, the really cool thing about this, Kelly, is not what we've come from, but what we're going to. And that is the one with a lot more goodies on it. Everyone can pretty much guess that the next one we're going to talk about has all the goodies back on it, right?
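The glossary and reference data that the recognized progression adds can be sketched very simply. This is a hypothetical illustration, not anything shown in the webinar: a tiny business glossary plus one governed reference table, the kind of thing that keeps two analysts from interpreting the same lake field two different ways.

```python
# Hypothetical sketch: a business glossary plus conformed reference data.
# Terms, codes, and definitions below are invented for illustration.
GLOSSARY = {
    "churn": "Customer closed all policies within the reporting period.",
    "status": "Customer lifecycle state; values come from REF_STATUS only.",
}
REF_STATUS = {"A": "active", "L": "lapsed", "C": "cancelled"}  # master reference

def conform(record: dict) -> dict:
    """Replace a source-system status code with the governed reference value."""
    code = record.get("status")
    if code not in REF_STATUS:
        raise ValueError(f"ungoverned status code: {code!r}")
    return {**record, "status": REF_STATUS[code]}

def define(term: str) -> str:
    """What an analyst asks when the same field shows up in two contexts."""
    return GLOSSARY.get(term, "not yet governed; raise with data governance")

print(conform({"customer": "123", "status": "L"}))  # status becomes 'lapsed'
print(define("churn"))
```

The design choice worth noting is that `conform` fails loudly on an ungoverned code rather than passing it through — at this progression, silent pass-through is exactly how a lake becomes a swamp.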
But there are some things missing from this one that you would think should be on there, and we'll talk about those. Anything else on that one, Kelly? No, let's do the example. So here we go. We still have kind of a dotted line, but notice it's not a real big dotted line. We're insight driven. So this organization wants to monetize data. So they built the Hadoop lake. They're ingesting. They're working on the pedigree. They still have their ETL. They still have their data warehouse. But notice what's happening on the back side. On the monetization front, they're coming up with products built from their data. They're going to take some data, make a product, maybe sell it back to their customers, something like that. And the EDW becomes another data source for the scientists. Of course, the data scientists would like data from everywhere. I was telling my story over at EDW this week, Kelly, about the data scientists who dug into COBOL code to find the legitimate reference tables appropriate for the organization, and were able to strip the reference data out of those COBOL tables. They'll get data from anywhere. So the EDW becomes another source, and they're going to blend it and do something cool there. But at the same time, hey, guess what? We're still supporting our BI and our reporting. So here we have a second progression. Things are expanding. It's becoming real enough that the organization is seeing this and starting to build it into other aspects of the organization. So are you ready for the third one? Yes. Okay. Data-driven organization. All the goodies are back. Now, the first person's going to say, wait, models and metrics management. Why weren't models and metrics management in the other one? Why wasn't metadata bold in the other one? Or taxonomies, or things like that? Well, it's because, again, we need just enough to get it to work, move it forward, and not fall backwards. Okay.
So we need this minimum viable state of technology, one that mixes and matches elements to be best fit but doesn't tax the organization with things it's not quite ready for. But now, if you're going to be data-driven, and you're going to take prescriptive analytics and machine learning and start to run your organization and be much more proactive with your data, now you have to get really serious about this stuff being right. All right? And so now the glossary becomes really important. Managing your models becomes really important. With a couple of one-offs from the data scientists, they can get their heads around a small handful of models. Great thing. But then you start to have models in multiple departments. You even start to have, as Gartner likes to call it, the rise of the data citizen, right? Or the citizen analyst. And those people are going to have access to vast amounts of data and do their own analytics, more of a self-service thing. Now I've got to have some tighter controls on that. And the same thing goes for your metrics, meaning all your measurements or your KPIs or your indicators or your scorecards. They need to be treated as another source of metadata. So there's a metrics catalog. There's a data catalog. There's a model catalog. And in fact, we have web services in here somewhere, so yes, there's a web service catalog. Because you've taken your asset now, and you are exploiting your data asset to its absolute max, and all these pieces and parts are there to keep it running. They're required. Again, Kelly had a great word: they're requirements. They're not objectives. We still see too many people saying our objective is to build a really good way to access all of our data. That's a requirement. That's not an objective. So your requirements have to match what the business needs are. So there it is, everything together. Kelly, insight on that, or question or comment?
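The metrics, data, model, and web service catalogs described above all follow one pattern: a registry where each asset has a single findable, versioned definition. Here's a minimal Python sketch of that pattern; the asset names, kinds, and fields are illustrative assumptions, not from the webinar.

```python
# Minimal sketch of the catalog idea: one registry pattern reused for
# metrics, models, and datasets, so each asset has a single authoritative
# definition. Names and fields below are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class CatalogEntry:
    name: str
    kind: str           # e.g. "metric", "model", "dataset", "service"
    owner: str
    definition: str
    version: int = 1

class AssetCatalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Re-registering the same (kind, name) bumps the version, keeping one
        # current definition instead of scattered, conflicting copies.
        key = (entry.kind, entry.name)
        if key in self._entries:
            entry.version = self._entries[key].version + 1
        self._entries[key] = entry

    def find(self, kind, name):
        return self._entries.get((kind, name))

catalog = AssetCatalog()
catalog.register(CatalogEntry("customer_ltv", "metric", "finance",
                              "Discounted 3-year revenue per customer"))
catalog.register(CatalogEntry("churn_model", "model", "data-science",
                              "Gradient-boosted churn classifier"))
# The finance team revises the metric; the catalog tracks the change.
catalog.register(CatalogEntry("customer_ltv", "metric", "finance",
                              "Discounted 5-year revenue per customer"))

print(catalog.find("metric", "customer_ltv").version)  # 2
```

This is the tighter control the talk calls for: when citizen analysts self-serve, they pull the cataloged definition rather than inventing their own.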
This seems to, you know, what we're doing here is adding little boxes. Is it that easy to go through progressions? Oh, no. I mean, this is the abstract version of this. This is just telling people what it does. When we go on to the really sophisticated type of thing, you're going to see that you have to put some architecture to this. For example, suppose we're going to build a house, and we decide that our first progression is a one-room shack, and then, you know, Little House on the Prairie. We build one room and then a big room. And then by the fifth season, it's a really big, beautiful home. And I'm giving away my age by remembering that, right? So it's the same thing here. What we have here is that we haven't built a house yet. There's a big pile of wood on the ground, and we don't know how it goes together. That's the abstract thing. You know it's a house by all the pieces and parts that you see, but you owe it to the audience to show how that fits together, and that would be the next part. Did that answer the question? Yeah, I think that it is, you know, complicated, in the sense that what you're trying to do is drive to self-service and create this truly data-driven environment of data citizens, as you said. Yeah, go ahead. Cool, let's go through this. No, this is great. Let's see the example. Now, this one is an example, and I will tell everybody, I know there's someone out there in our listening audience going, well, these aren't what I was expecting. I wanted to see full-blown complicated. We have a couple of those coming up, right? But we have to have something that you can actually see on your monitor, okay? And honestly, if you Google big data architecture or analytics architecture, you can see thousands of diagrams that people have put out there.
And some of them are top down, some are bottom up, some are left to right, some are inside out or outside in. There are all kinds of ways to see these things. What's really important, from the standpoint of our talk here today, is that you understand how to get to the one that's best for you, all right? And we can't show everything in an hour. So here we have something pretty sophisticated. Look at the sources: unstructured, structured, streaming, Internet of Things. By the way, we have derived this from real examples of organizations that we have worked with. This is an amalgam of work we've done in the last year. This is not made up, all right? It also has the net-new aspect to it. So you have ingesting, but with streaming we can go right through Storm. Everything's ending up in the data lake. The data lake feeds the existing data warehouse. The data warehouse, you know, the whole ETL thing, was just not cutting it for this group of people, and the latencies are going down, down, down. Why have a separate stream for a low-latency EDW and a low-latency data lake? Let's just do the hard stuff, get it into the lake, and then pull it forward. We're using Storm to do the high-speed streaming, and other facilities to just get the rest in. Of course, with any data lake, there could be 20 arrows going into it. That's just how they are. We have a Hadoop connector, so people can actually look at Hadoop as though it were a relational database. There are a lot of tools out there that do that. I was talking to one vendor at the conference here last week, I think it was Podium, one of many. It's a way to organize, look at, and navigate the lake and make it look relational to a user. And we have our citizen data scientists out there, and we're doing exploration and analytics and data products and applications. We are feeding apps here because we have a lot of good control.
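The "one path into the lake" idea above can be sketched in a few lines: batch extracts and streaming events are normalized into the same landing zone, tagged with their arrival mode, and downstream consumers (including the warehouse) pull forward from that one place. Storm and Hadoop are stood in for by plain Python here, and all record shapes and source names are illustrative assumptions.

```python
# Minimal sketch: batch and streaming ingestion land in one zone, tagged by
# mode, so the lake is the single source downstream. Storm/Hadoop are stood
# in for by plain Python; field names are illustrative.

import time

landing_zone = []  # stands in for the data lake's raw/landing area

def land(record, source, mode):
    landing_zone.append({
        "payload": record,
        "source": source,
        "mode": mode,              # "stream" or "batch"
        "ingested_at": time.time(),
    })

# High-speed stream: events land one at a time as they arrive.
for event in [{"sensor": "A1", "temp": 21.4}, {"sensor": "A2", "temp": 19.9}]:
    land(event, source="iot-feed", mode="stream")

# Nightly batch: a whole extract lands at once.
for row in [{"order_id": 1, "total": 99.0}, {"order_id": 2, "total": 45.5}]:
    land(row, source="orders-extract", mode="batch")

# A low-latency consumer (or the warehouse) pulls forward from the same zone.
streamed = [r for r in landing_zone if r["mode"] == "stream"]
print(len(landing_zone), len(streamed))  # 4 2
```

The design choice matches the talk: do the hard ingestion work once, at the lake, instead of maintaining separate low-latency pipelines for the EDW and the lake.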
And you can see this is pretty sophisticated, even at what I would call an apparent form of the architecture. This is something you can step through to show people how complicated it is and where you really want to take this thing. Great. Good? Yes, let's keep going. Oh, okay, yeah, I've got time. Here are a couple more. Just another way to show it: the technology stack. Again, you want to get a little more explicit. This one is an apparent form, I'd say bordering on explicit. So we have intake from all kinds of sources, preparation, permissions, dictionaries, indexing, pedigree. We have transformation, and we have reduction. That's a phenomenon we're dealing with here. Back in the old days, we didn't reduce data until it hit the data mart, which was by definition a summarized thing. Now, there's so much stuff coming in that it has to be reduced before anyone can consume it downstream, or else it's like drinking out of a fire hose. So we've got some things going on there. We have our analytical assets, BI and reporting in their own places, lots of tools on top, and a model server, because a model can now be accessed in a common spot and reused by people. There are tools out there that support that. And then there's your general, everyday data access, with all kinds of people using it. So you're going from source to insight, and this shows somebody all the moving parts, where various entities are touching the architecture. Additional insights, Kelly, or onward? No, let's keep going. We've got a great analogy at the end that I think is really valuable for people, so I don't want to keep interrupting. Oh, no, that's quite all right. So for this one here, sometimes you have to show where the analytics and data use fit into everything else.
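The reduction step described above, collapsing raw volume into something consumable before it goes downstream, can be sketched simply: roll high-volume events up into per-key summaries. The event shape (page views with latencies) is an illustrative assumption.

```python
# Minimal sketch of reduction before consumption: raw events arrive faster
# than anyone downstream can read them, so they are rolled up into per-key
# summaries before landing in the consumable zone. Event shape is illustrative.

from collections import defaultdict

raw_events = [
    {"page": "/home", "ms": 120},
    {"page": "/home", "ms": 95},
    {"page": "/buy",  "ms": 310},
    {"page": "/home", "ms": 101},
    {"page": "/buy",  "ms": 290},
]

def reduce_events(events):
    """Collapse raw events into per-page hit counts and average latency."""
    acc = defaultdict(lambda: {"hits": 0, "total_ms": 0})
    for e in events:
        acc[e["page"]]["hits"] += 1
        acc[e["page"]]["total_ms"] += e["ms"]
    return {
        page: {"hits": v["hits"], "avg_ms": v["total_ms"] / v["hits"]}
        for page, v in acc.items()
    }

summary = reduce_events(raw_events)
print(summary["/home"]["hits"])  # 3 hits on /home, summarized from raw volume
```

Five raw events become two summary rows; that's the fire-hose-to-glass step that used to happen only at the data mart and now has to happen much earlier.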
So here we have taken the liberty of producing a data architecture at an enterprise level, to show that you have data integration, data warehouse, data marts, analytics, very sophisticated metadata, and careful consideration of archiving, which is something you don't see very much. Apologies for the font issue there. Sorry about that. We show our transactions, because some of our stakeholders might say, well, where are our transactions? Where's our data? We have to show that. And this organization is service intensive, and we wanted to show that we're anticipating a lot of different kinds of services here. It's not going to be your basic app service. It's going to be a lot of different services, and we're going to put that sophisticated service layer over the whole thing. So this is a really good big picture. This is very close to a full data architecture, versus just an analytics and big data architecture, right? Yes, absolutely. Okay. So, key components here. Oh, my gosh. Yes, we've still got to get through these. Here's our analogy. When you're building these out, you're going through the various forms. We want you to think like an architect. An architect has to look at a building, or whatever they're building, and make sure that it doesn't come falling down when it's built. All of us have seen really exotic buildings and wondered, how does that stand up? Well, architects know how to do that stuff, and we are going to learn to as well. You've got the heavy weight of the business strategy of the organization at the top level of the I-beam. So imagine your house. Your house is sitting on some sort of foundation, and there's an I-beam running underneath it, holding the weight of the house and distributing it to the foundation. Well, the business strategy is your house. That's the weight of the business on top of it. We need to distribute it.
So the touch point between the business and your data I-beam is the data management layer. You have to translate that load through and then into the earth, so your house stands up. And that's our data wrangling; all the data logistics go in there. And then we have to actually use this, use our house. We live in it. It's a stable thing, so we have to get good use out of it. And that's the data use layer. We kicked this around for a long time. Two years ago we came up with this I-beam concept, and it's really held up well for us. So let's evolve this. I have my business strategy. I have my management layer, and here are some of its parts: metadata, lineage, workflow, models, reference, et cetera. Now we have to consider the fact that our modern systems are broken into two areas. We have vintage systems, or legacy, or heritage, whatever you want to call them. And we've got to be careful how much money we put into them, because they're pretty expensive to mess with, right? So I like the sign on the side of a police car: to serve and protect. We just keep them going and give good service. But what we're really going to go nuts on is the contemporary side. That's where the new apps are, the new data structures; agile methods are deployed there because we have to be flexible and responsive. In between, the wrangling layer is going to have abstraction, mapping, virtualization, services, ETL, all that kind of stuff, so we can do BI, reporting, analytics, and mobile. So now I'm going to take this to the apparent level and show you how I would place my various components of data insight for my organization and the things I'm going to do. I even have something in strategy here, which is aligning the business with our architecture, all right? And we have some examples of the elements that would be in there to manage the data.
And in the vintage area, we're going to have legacy BI, our warehouse, our marts, our ETL, pretty traditional stuff, right? You can all see that over there. On the right, the contemporary side, is where the lakes, the unstructured data, and the NoSQL stuff are going to go, and where data monetization is going to happen. Now, I might get some really good stuff on the right side that the left side could use, but we can handle that with our beam in the middle, the wrangling part. And it starts to look like this. So I have my vintage applications and databases and my ETL and data warehouses and data marts and views, relational views, right? That's what our cross-generational layer, the wrangling layer, will have. On the right side, I'm going to have my agile-developed applications, external data being brought in, unstructured data being brought in, Internet of Things-type streaming being brought in, going through whatever it needs to go through to get into a NoSQL, lake-type structure. I might also have relational views there, right? And I might have something that I'm going to monetize. Notice that I put the monetizing piece in green, to show that we're taking stuff from the left and stuff from the right and doing really cool things with it. And you can see this one can evolve. If this picture is allowed to get bigger and bigger, I can make it explicit. I'm going to put my alignment in here. I'm going to put my governance in here. I'm going to make this the full, comprehensive picture that honors it. And that's where we wanted you to think about this progression and the different types of audiences that you have. Lastly, because we ran out of room on the other page, you can break your access layer down into the major components and types of structures that we see in that area as well. Kelly, insights, questions, comments on that one? No, I just wanted you to get through this, because I think this I-beam thought process is really valuable.
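The cross-generational wrangling layer described above can be sketched as one function that blends both sides of the beam: a vintage relational source (sqlite3 standing in for the warehouse) and a contemporary "lake" of semi-structured records, combined into the kind of green monetization product the slide shows. The schemas and names are illustrative assumptions.

```python
# Minimal sketch of the cross-generational wrangling layer: a vintage
# relational source (sqlite3 stands in for the warehouse) and a contemporary
# "lake" of semi-structured records, blended behind one function so a data
# product can draw from both sides. Schemas are illustrative assumptions.

import sqlite3

# Vintage side: relational customer master.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [(1, "Acme Corp"), (2, "Globex")])

# Contemporary side: semi-structured clickstream records in the lake.
lake = [
    {"customer_id": 1, "clicks": 42},
    {"customer_id": 2, "clicks": 7},
    {"customer_id": 1, "clicks": 13},
]

def blended_engagement():
    """Join warehouse names with lake activity -- the monetizable blend."""
    clicks = {}
    for rec in lake:
        cid = rec["customer_id"]
        clicks[cid] = clicks.get(cid, 0) + rec["clicks"]
    rows = db.execute("SELECT id, name FROM customers").fetchall()
    return {name: clicks.get(cid, 0) for cid, name in rows}

print(blended_engagement())  # {'Acme Corp': 55, 'Globex': 7}
```

Neither side had to change for this: the vintage warehouse keeps serving, the lake keeps ingesting, and the wrangling layer is where the "stuff from the left and stuff from the right" actually meet.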
And the idea is that an architecture is what supports your entire environment, and it needs to be supportive. It needs to be able to scale and change and progress. So I think a key takeaway of this conversation is: what's the I-beam within your own organization that enables you to bridge vintage and contemporary? I would imagine very few folks on the call have nothing in the vintage category. That's just not reality. No, that is such a good point. Now, Gartner has terms for that. We were reading it here, Mode 1 and Mode 2, right? Exactly, yes. And several other organizations have taken this up. You can't ignore this heritage. An awful lot of your data still sits there, right? Exactly, exactly. So recognize that there's still value there. And maybe over time you slowly change things from vintage to contemporary and obsolete some of the vintage areas. But what's currently contemporary, if you continue as an ongoing entity, is going to be vintage at some point, right? Back to flexible. Yeah, at one point I used to be contemporary. Now I'm vintage. Oh, John, like a fine wine. So now, here's something really cool. We don't have time to show this today, but I think later in the year we have a topic where this picture is going to come back, and we're going to show that exact transition: how do you start to move from the one side to the other? We actually have some material on that, and I'm hoping one of our topics will allow us to dive into that level of detail. Let's just visit the best practices here really quickly, and then we have a few questions out there, not a ton. I think some of you noticed we have a little different format here, based on what we're learning about our great listeners. Kelly kind of played the role of a person clarifying and asking the questions that we were finding typical from our audience. And we hope that really helped you today.
If not, just let us know and we'll fix it, right? Best practices. Apply a different lens. There are different lenses here; we had form and progression. Then reconcile your old and new. Identify what's old, what's new, what's the old current state, what's the new current state, what's the new future state. That all has to hold the business up, right? That's part of the thing. So now I've got to hold the business up. Apply the I-beam. All right? What goes on the left? What goes on the right? How do they talk to each other? And then have a plan to get there. First of all, prioritization is really important. You have to know where you're starting with this thing, because you will not go to the ideal state to begin with. You will have a progression. So don't do that as an afterthought, which is what most folks do. They do the ideal state and then say, well, here's phase one and what it looks like. Instead, build phase one to meet your needs now and then evolve from that. It's a slight difference, but it's actually very, very meaningful. And lastly, have a methodology. Establish your initial, your isolated, but start to look at that vision. Get aligned with the business. Use the V's, which we talked about. We don't ignore the long term, but we keep it visionary and requirements driven. Define how you're going to use it, because how you use it will create your progressions; they're driven entirely by how you're using that data and its role in the organization. Then design the architecture that's appropriate for the operating model and the requirements. Of course, there's always a roadmap to get there, and there's a transition to something that makes it sustainable. We don't want to fall back. So it's always good to architect so you can't fall back, right? You have to always move forward. Anything to add on those last two, Kelly?
No, I think we've just got a few minutes left, so let's dive into some questions. All right, let's do that. Let's see what we have. Okay. There are a lot of questions about the data sources. What do you mean by the client data? Can you elaborate on the client data? Those are abstract. That data could be everything I know about my customer. It could be 100 files. You could be feeding in data directly from your clients, and you could also be combining that with your own transactions and dealings with those clients. It wasn't meant to indicate anything other than a category. Another question here, and this is a good one, Kelly: is the first progression only applicable when your data is stable, streamlined, and properly managed, meaning duplicates and invalid redundancy and all that have been addressed? I have my answer to that. What's your answer? Well, I think part of the first progression recognizes that, because it's isolated, you don't necessarily see the duplicates at that stage. Those duplicates come up later, which is why we end up with a mastering approach. But that first progression is isolated use cases. Yeah. Honestly, with that first progression, usually at the end of the day, when you step back and look at it, the stuff was a mess. Because you're grabbing everything you can, and you're leaving it to the scientists or the analysts to make sense of it. And they're going to spend 80% of their time making sense of what they're looking at, and 20% of the time actually gaining insight from it. So no, in the first progression you're actually more likely to have a lot of redundancy and a lot of duplication, and then you'll squeeze that out as you evolve. Which is kind of cool, because you can do cool things early on without having to fret about a lot of that kind of stuff. Right. And it kind of proves it out, right?
Absolutely. So think about it maybe as sandboxes. Absolutely. Let's just see here. There are some comments on our travel problems, but I don't think we need to go into those. What's the ideal team composition and structure for the first progression? This is something you've seen, Kelly, and I've seen. I'll let you answer what you've seen first, and then I'll answer what I've seen. So I'm not sure which ideal team it means, but because it's isolated, it many times ends up being business-unit driven, right? So we see marketing organizations going down that Hadoop, big-data path, and they've got their data scientists, and that's isolated from maybe their finance organization, or their customer-facing customer service organization, which still potentially leverages data out of the enterprise data warehouse. So I don't know if that's necessarily an ideal team composition, but because the use is isolated, those teams are also organizationally isolated. And that brings up a lot of issues, right? If at some point you want to combine the two isolations, if you will, that's where you end up with organizational issues to address, because you have data issues to address. Yep, yep. All right. It is the top of the hour. As usual, we certainly hope we've helped you, and thank you for your time. I will turn it back over to Kelly and Shannon for a wrap-up. Thank you, Kelly, and we'll see you all next month. Absolutely. Thank you, Shannon. John and Kelly, thank you so much for another fantastic presentation, especially after just rushing home from our conference this week. We really appreciate everything, and thanks to all our attendees for being part of the community and being so engaged in everything we do. It was so much fun to meet so many of you at EDW this week, and we hope to meet more of you at future events.
And we hope you'll all join us for next month's May 4th webinar, The Role of the Data Scientist, where Kelly and John will be interviewing a data scientist to talk about what's going on in that role. And just a reminder, I always send a follow-up email within two business days, so by end of day Monday, with links to the slides, the recording of this session, and anything else requested throughout the webinar. So I'll get that to you by late Monday. I hope you all have a great day. Thanks so much. Thank you all. Thank you all. Bye-bye.