Hello and welcome. My name is Shannon Kemp. I'm the Chief Digital Manager of DataVersity. We'd like to thank you for attending Database Now Online, the first occurrence of this online conference produced by DataVersity. We're very excited about today's event; we have already heard some great presentations and continue to have a great lineup of sessions for you. And of course, a special thanks to all of our sponsors today for helping make it all happen. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the event. For questions, we will have a short Q&A at the end of each presentation today, and we'll be collecting questions via the Q&A panel in the bottom right-hand corner of your screen. Or, if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag DB Now. If you'd like to chat with us and with each other, we certainly encourage you to do so; just click the chat icon in the top right-hand corner for that feature. For this event, we will send a follow-up email next Monday to all registrants containing your unique login to access the recordings and slides from today's presentations. Now let me introduce to you our keynote speaker for today, William McKnight, who will be discussing selecting a data platform in 2017. Just to give you a brief background, William is the President of McKnight Consulting Group. He is an internationally recognized authority in information management. His teams have won several best practice competitions for their implementations, and many of his clients have gone public with their success stories. His strategies form the information management plan for leading companies in various industries. William is a mentor in information for a startup accelerator and has taught at Santa Clara University, UC Berkeley, and UC Santa Cruz. And with that, I will give the floor to William to get today's keynote started. Hello and welcome, William.
Hello Shannon, and everybody, thank you for that, and congratulations to everybody for the wisdom of taking time out of your day to get some great education here. The speaker list is impressive and I'm proud to be among them. Without further ado, I'll launch right into my subject, because this is a lot of what I do: selecting data platforms for our clients. Not outright selecting them; maybe it's recommending, maybe it's creating the process by which a selection will occur. That process has to be mobilized, and every client is a little bit different, just as every one of you is a little bit different in how you approach this subject, but I want to give you some knowledge for that process. I think all of you have probably heard of all of the platforms I'm going to talk about, but seeing them here in one place may help spur some of the proper activity around platforming your data today. So that's a little bit about me; I think we got my bio down there. Top performers realize they need data. I think this is a foundational principle that everyone in data must adopt.
If you don't believe this, then maybe this field is not right for you, because we need to be pushing data out to our clients internally as much as they're asking for it. I know that's hard when you're underwater and the requests are more than you're able to deliver right now, but that's a good thing: you have more demand than you have supply, and I want to keep you in that situation, maybe not to an overwhelming degree. We have to be leaders. We are sitting on the gold asset of our organization. We're sitting on how our organization is going to compete, gain competitive advantage, and sustain it (or not) over the next decade, and that has to do with data. All strategic objectives of your company can be supported, if not outright solved, with data. Your users don't always know all the possibilities, but you do, or you should, and we're going to expand that envelope a little bit today. Like I said, the top performers realize they need data. I've done some maturity studies, and I have noted that the companies that are highly mature with data are the highly successful ones in the marketplace. There's a strong correlation: the industries that are doing more with data are doing better than the ones that are not, and the companies within those industries that are doing more with their data are doing much better. Take big data, for example. You've been looking at this little study here for a bit, and it's one of many out there: the average performers are thinking about big data, but the top performers are expanding their big data implementations. But we've got to platform this data correctly, and we have to have the mindset of giving them all data, fast and effectively. I've lived through all three of the pillars you see here, and many of you have as well, but hopefully we've moved our mindset past the first pillar, give me some data and give it to me fast, and the second, give me good data but do it efficiently, to where it is now: give
me all data fast and effectively. And again, if your user community or your applications community is not asking for data, that's a problem right there. They need to be asking for data, and we need to be helping them ask, if you will. That's leadership, and I will always put that out there; I think leadership goes a long way in this business of data. But we're here to talk about platforms. There are three major decisions when it comes to platforming today; there used to be fewer than this, I'll put it that way. Those of you doing the same old same old with your platforming decisions, the same things you've been doing for the past decade: it's time to think outside the box and do something differently. I can almost assure anybody on a call like this that you have problems in your organization that can be fixed with something other than the one, two, or even three platforms you've been rotating through for the past decade. There are other possibilities out there. I'm not trying to overload you with technology, as in "take on these 12 platforms," not necessarily, but in today's world we have to get the platforming correct for the workload. And, part B to that, we have to make it all work together with data integration and data virtualization; I'll come back to that a little later. Those are the two big things. It's not about the data warehouse being the center of the universe, where everything goes in it and all processing happens there. We didn't quite achieve that, and we've really backed off strongly from it, especially since our warehouse is typically on a relational database and we have non-relational platform possibilities today that offer a lot of value. So the first decision is the data store type: do we want a scale-out system, meaning a file-based scale-out system, or do we want a database? It used to be a database for everything; that was the default, that was clearly what we're going
to do for everything. We can loosely call these scale-out systems databases if you want, but technically, down at the bit and byte level, they're not databases, because they don't have the same framework around the data; they don't have the same data page formatting and so on. I'll come back to this a little later, but this is a solid decision you need to make in terms of what's best, and I will give you some pointers as we go along. How about data store placement? Emerging over the past five years or so, we have a real, viable possibility to not necessarily put the data store into our own data center, the one you can walk over to and touch and feel. It's the cloud option, and it's very viable today. I'll come back to this too, because there are different clouds, so that is definitely a major decision. The third major decision, and we're starting to lose sight of this but we should keep it in mind, is whether it's an operational workload or an analytical workload. Now, you may be thinking that these are blending together and the lines are getting blurred. Maybe a tad, in my view, but there are still distinct workloads, and that means distinct platforming solutions for operational and analytical workloads; they are architected differently to support those different workloads. So those are the three major decisions, and three is a nice number for a slide, but heck, a fourth decision I would put up here is the split between HDD, SSD, and in-memory, because we are starting to be able to exploit a lot more in-memory these days, so that's becoming much more prominent in the decision-making process. So it's not about "what have we been doing for the past few years? Let's just do that again for this new workload." Let's think about it. At some point you've got to stop, think about it, and bring some process into it. Maybe you get one or the next
one or two wrong, but let's start building toward getting them right, because it has a lot to do with the success of the platform, of the application, and of your business. Now, this slide is a bit of an eyesore, especially when I put it up all at once as opposed to building it, but my point is not to go through all the details on here. It's to show you in context that a lot of different platforms make sense inside your organization. If you're a mid-sized organization or above, a lot of these platforms are going to make sense. So you need a good data architecture, and you need a good data architect: one who is looking out for this, who knows what I'm talking about here today in detail across the board, who understands the cloud dimension to these things, and who understands the value proposition of a multidimensional database versus a data warehouse (which would be a relational database) versus Hadoop versus a columnar database (which might be your data warehouse, maybe not), and then there are in-memory databases. So having a tiered architecture, especially on the analytical side, is very important, and knowing which platforms sit in the same tier is pretty important as well. So here's a little exercise for you. I like to do this with my new clients: come to a meeting, give everybody 10 minutes and a blank sheet of paper, and say, let's sketch out our data architecture. If you haven't paid attention to data architecture, you can see some pretty weird and pretty disparate stuff coming back. Even if it's a bad architecture, I like to see everybody on the same page, because then we can start moving forward together. So that's a pretty important exercise: sketch it out. And obviously we've got to start building toward what our future architecture is going to look like. Now, on the operational side of things over here, where you see NoSQL,
some relational databases, legacy sources, operational applications, and so on, we have master data management. That's pretty important for your so-called enterprise dimensional data, and it's pretty important that it be on the operational side of things, not over here in the data warehouse. I'm not talking about master data management a lot today, because we don't put large amounts of data in there, but it is great for mastering (I shouldn't use that word), for organizing the data that is so important throughout your enterprise: your customer list, your product list, and so on. I'll move on, because I'm going to zoom in on the data warehouse and help you make some of the decisions around that ecosystem as we go along. But further on this theme of data: our information is exploding, and our business is real-time, all the time. If it's not, or you say it's not, you need to work on that, because of your competitors and other businesses out there. The possibility exists for you to be real-time all the time as well, and we've got to get there. Our information differentiates us, and our information quality impacts our clients. I'm not spending a lot of time on data quality here; I know some other speakers have. It is very important throughout everything I say, throughout all the platforms I'm talking about: all the data you're putting in there and pushing out has got to be of high quality. And I'm sure the other speakers talked about this, but the rules you measure that quality by need to come from the business. We call that a data governance program. So one of the first things I want to do, no matter which client I'm working with, is lay down at least a mini data governance program, something we can build on over time, but something where I'm going to get business input to what I do, because I do want the requirements coming in, although that's not the only criterion I'm
going to use to platform a workload, and we'll get into that. So we're in the business of data, and as data professionals our success is measured on a finite set of things. Obviously, user satisfaction is very important: user satisfaction with our work, giving them what they asked for. That's all part of it, but many of us stop right there, as if that's it: we've done our job, check the box, we're done. Today it's important for a data professional to also spur business ROI and growth, maybe through data projects, maybe through ideas about applications in place today, maybe by bringing more analytics, or machine learning and artificial intelligence, into some of the things we do as a company. Bring ideas to the table, and take them further than just the idea: start laying out plans to get to higher planes of data maturity. And finally, we are measured on data maturity. What is data maturity? Is it some ethereal thing that we just talk about in academic circles? No. It's about long-term user satisfaction and business ROI. It's about being efficient, about creating an environment that we can add on to without starting all over again every time. Obviously, many, many companies are starting over with their data; they don't get any leverage out of what they built before, and you can't do that for very long. I have a whole presentation on data maturity, and this is not that, but it is a thing, and it's well worth measuring: knowing where you are, acknowledging it (acceptance being the first step to change), and starting on a path to grow it, because that will be highly correlated to your success and to your business success. There are of course other things we're measured on that I won't get into here, but don't forget about our obligation to our business to bring ROI and maturity.
Too often we focus on the BI; we focus on the tip of the iceberg that's above the waterline, and we're not focused enough on the data platforms. The way to get real leverage is to focus on the data platforms. You should be able to put any BI tool on top of a great data platform, with data that has been put into the proper platforms, and get value out of it immediately. The data, if well done, should be screaming out, "Here's what to do with me"; you shouldn't have to go in and do 10 steps. Obviously we'll get more sophisticated as time goes on and figure out what else we need to bake into our data infrastructure, but today many of us are upside down in terms of where our priorities should be. Let's bring it back to the data platform. When I'm platforming, I want to know the data profile and I want to know the usage profile, and I can get a lot out of the data profile alone. If you can tell me the size of the data; its profile, in terms of whether it's structured or unstructured; what some sample records look like; how frequently the data comes in and where it comes from; how frequently it needs to be accessed; what its quality is; and so on, then I can pretty much tell you where it belongs in an organization without knowing its usage pattern. That matters because, over time, organizations grow their usage capabilities and their desires for what they want to do with the data. So to platform data correctly, you've got to look at its profile. Too many of us look only at the usage profile, and I'm telling you that the data profile says a lot about where to put that data. If it's unstructured big data, and we're into Hadoop in this organization, and you believe as I do that Hadoop, or whatever that ecosystem matures into, is going to be around for a while, that's where
it belongs. That's where it belongs, and the data science of your organization will grow over time as we make more data available. One thing I do with my clients is look at all the data in the organization and ask whether it is under management or not, and how well it is under management. If it's not under management, the data is happening but we're not storing it; it just happens and it goes away. It's very operational, and maybe too much data is going through that process. But if, let's say, you're highly mature with a piece of data, it's not only under management, it's in the best platform for it to succeed. And by the way, even if you do this across the board with all your data, under management or not, you still wouldn't have done the full job, because there's all this third-party data you can utilize as well, and it needs to be platformed too. So there aren't a lot of boundaries around what data you can use; the only boundaries are your imagination and your data science, because there's a lot of third-party data available. As we make our data platforming decisions, we are obviously trying to get success out of the workload. So what determines the success of the workload? Performance is number one. Then provisioning: how quickly can you get it up and running? That's the world today; how agile is it? Then scale: can I start small and grow without having to monkey with that process too much? And cost, of course: we don't want to overdo the cost; it's an equal part of the equation. But performance can override a lot of things. If we can give our users better performance through our platforming decisions, they can grow in their capabilities with the data. They're not going to be limited by thinking, "This query is going to take five minutes and I've only got half an hour, so I'll run a few of these and that'll be it." They'll get to deeper levels if those queries are
popping, and that's not going to happen if you do the same old thing, if you haven't thought about it for a while. So I'm encouraging you to think about your platforming so that you can give your users performance, quick provisioning, high scale, and relatively low cost. I know it's a lot. There is an increasing probability that platform selection leads to success. Again, I keep saying it: with the same old platform, you might as well be throwing a dart at the wall in terms of whether you're going to have success today. When I look at our portfolio of deliveries over the past few years, the requirements have gone up tremendously in terms of the number of users, the performance expectations, the amount of data, the complexity of the analytics, and so on. Isn't that true for you too? Well, we need to think a little differently. We need to get into the best category. For example, if the data profile says "put me in Hadoop," then the category is Hadoop. Now, there are obviously different distributions of Hadoop; if you get into the right one for you, and there is a right one for you, you have maximized your probability of success. If you get that wrong, you've still got pretty good odds of success, but there is an increasing probability that platform selection leads to success as I've defined it. Now, what about cost? I looked at the POs we've solicited for our clients over the past two years, looked at cost per gigabyte or terabyte, or what have you, at the storage level, and cross-referenced that with functionality and value, which is obviously much more subjective. Now, don't take this to mean the speaker said to put everything in a graph database because it has great functionality at low cost; that's not what I'm saying. There's a place for all of this in your organization, but there are some things that are more specialized:
these are the things that these platform categories really tout. Take in-memory, for example: super-fast performance, unparalleled versus everything else on here. Is that important? Sure; I just said it was important. But is it important at all costs? Are you going to get exponential value out of it? Maybe, maybe not. For selective workloads you will, so it has high specialized functionality that you're not going to get anywhere else. Master data management, mastering that organizational metadata, if you will: it's specialized for that, so that's high functionality. Then you get to things that don't give you a whole lot extra, like Hadoop, but it is very cost-effective, and obviously it's going to grow in its capabilities. Keep all these possibilities in mind, and we'll drill in a little bit. Now let's look at the data warehouse ecosystem, something that's been near and dear to me for a couple of decades now. We've got the source data over there, which is obviously a complex ecosystem in and of itself, but we want to bring that data over into analytics. So there's the good old data warehouse, and I don't mean this in a disparaging way: we need a data warehouse. Pretty much every client needs to put more energy, and more dollars, into their data warehouse, and they'll get more back out of it than just about any place else in the company. That's still where we are with data warehousing. However, it is sort of the lowest common denominator in many organizations. What I mean by that is you already have 10 or 20 workloads in there, however you want to count them; you've got a lot going on in your data warehouses out there, I know that well. It gets hard to add another one, really hard, because you've basically got a committee that you have to impact, and that slows things down a bit. Who's going to want to
step up and say, "Okay, I'll jump in and pay for that in-memory database," meaning making our data warehouse in-memory? Or, "If you want to take a weekend and convert the data warehouse to columnar, or move it off SQL Server into an appliance, I'm good with that"? No, that doesn't happen. So the data warehouse becomes kind of a bottleneck in some cases. That being said, it's probably best for a good 80% of your good old reporting requirements, but I want you to get away from just good old reporting and get into some other things. So at the same level as the data warehouse, we've got Hadoop. It looks pretty big, and it can be big. It's not necessarily bigger in importance, but for many of you Hadoop is going to be much larger than your warehouse in due time, if it's not already. Then you've got other appliances for specialized workloads. Like I said, I've got some very specialized requirements now that I haven't had to deal with for quite a while: an exponential uptick in performance expectations, concurrency, etc. You may or may not get that from whatever you put your data warehouse on five or ten years ago, and that may require a separate appliance, if you will. And obviously we have our data mart layer. Some of you are looking at this and going, "But my data warehouse is on an appliance." That's cool; as a matter of fact, no two shops are the same when it comes to this. I am only saying that the interplay between warehouses and marts, and maybe some marts at the same level as the data warehouse and Hadoop, should look roughly like this. We usually try to bring our clients' architectures, which are more complicated because of sprawl, into some sort of architected fashion, and it would look like this. Moving on: you want to consolidate to right-fitting platforms. Within the data warehouse ecosystem you've got the data warehouse, and you've got these oldies but goodies that are still relevant, and for younger
folks who haven't been exposed to some of this theory behind data warehousing: it's still relevant. The data mart, which I've talked about; an operational data store; a staging area; an analytical application with specialized needs. The point is, if you have sprawl going on with your data, then even as you try to platform data correctly going forward, you might want to rein in some of that sprawl and put the databases into their right context and right utility within your organization. If it's really a staging area and not a data warehouse, let's quit calling it a data warehouse; let's call it a staging area, and then let's build a data warehouse, because we all need one, and data warehouses belong on relational databases. Okay, to be sure, but what about some specialized workloads? In-memory databases. The benefit is speed, speed, speed, all day long, and that speed opens up the possibility of taking whatever you're trying to do on a limited set of data and doing it on all data, which hopefully opens up many more possibilities for return on investment from this platforming decision. The considerations are really: what data to put in RAM, what processing to do in RAM, how the solution handles fault tolerance, and how it will integrate with other systems. Without going into the architecture of these in-memory databases, I'm just saying that you probably have a workload or two, say a single-digit number of terabytes, with performance expectations off the charts, and in-memory databases could be a nice solution for that. You've got to make it all work together, of course. Here's some hardware perspective on that, for SSD: some rules of thumb to walk around with, though your mileage may vary. A lot of people out there are still hooked on HDD. This is the fourth decision I was talking about earlier: SSD and memory, yeah, that's sort of the way to go, without reading all the numbers.
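The first consideration above, what data to put in RAM, together with the HDD/SSD/memory split, can be illustrated with a simple heuristic. The sketch below is not from the talk; it assumes you can estimate each data set's size and daily read count, ranks data sets by access density (reads per GB), fills a RAM budget first, then an SSD budget, and leaves the rest on HDD. The class, function, data set names, and capacities are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    size_gb: float        # assumed > 0
    reads_per_day: float  # hypothetical access-frequency metric

    @property
    def heat(self) -> float:
        """Access density: reads per GB. Higher means hotter data."""
        return self.reads_per_day / self.size_gb


def assign_tiers(datasets: list[DataSet], ram_gb: float, ssd_gb: float) -> dict[str, str]:
    """Greedy first-fit in order of heat: RAM first, then SSD, everything else on HDD."""
    placement: dict[str, str] = {}
    ram_left, ssd_left = ram_gb, ssd_gb
    for ds in sorted(datasets, key=lambda d: d.heat, reverse=True):
        if ds.size_gb <= ram_left:
            placement[ds.name] = "RAM"
            ram_left -= ds.size_gb
        elif ds.size_gb <= ssd_left:
            placement[ds.name] = "SSD"
            ssd_left -= ds.size_gb
        else:
            placement[ds.name] = "HDD"  # capacity tier: cheapest per GB, slowest
    return placement


if __name__ == "__main__":
    # Hypothetical data sets and budgets, for illustration only.
    catalog = [
        DataSet("orders_current", 200, 50_000),
        DataSet("customer_master", 50, 20_000),
        DataSet("orders_history", 800, 8_000),
        DataSet("clickstream_archive", 5_000, 1_000),
    ]
    for name, tier in assign_tiers(catalog, ram_gb=300, ssd_gb=1_000).items():
        print(f"{name:20} -> {tier}")
```

With these made-up numbers, customer_master and orders_current land in RAM, orders_history on SSD, and clickstream_archive on HDD. Ranking by reads per GB rather than raw reads keeps one large, moderately busy data set from crowding several small, very hot ones out of memory; a real decision would also weigh the processing, fault-tolerance, and integration questions listed above.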
That's sort of the way to go. So really open up your mind, or open up the possibilities here, to what more memory can do. I'm not necessarily talking about an in-memory database here, just about using more memory than ever before, because of the lower cost it now has for us and the demand for performance. When I'm making these platform decisions, I always want the wind at my sails. I'm not wasting money, but if there's better performance to be gained and it's not ridiculous to think about, I want to do it, because that puts the wind at my sails and gives me a little more room for error as we go through our design process. Maybe we can go a little faster, which is pretty important as well. Not that we don't need a great design, but I just like that wind at my sails. Now, when it comes to cloud, that's the other dimension of the selection, and this looks a little messy, but cloud architectures can be that way. Just honing in on the data warehouse side of things, you might put your data warehouse in the cloud. Well, what about your BI? Might you put that in the cloud? What about source systems? What about data integration; could that be in the cloud? What about MDM; could that be in the cloud? Yes to all of the above. Now, is that right for you? I don't know; let's break that down over the next few slides. But I do know that the possibility exists for all of it to be in the cloud and to work together in the cloud, and many are deciding today that their data gravity is in the cloud. Maybe it's third-party apps, maybe it's software-as-a-service apps that they are circling the wagons around as key parts of their company. Well, that's in the cloud, so maybe other things in the cloud wouldn't be so bad as well. Where do we start, though? I say start with the data. Every client is different, obviously: we have on-prem data warehouses with cloud BI, with cloud data integration; we've got it all
going on, all of the above. But when a client gives me a greenfield, we think hard about the data: start with the data, put the data into the cloud, and the other things will follow. I think a mature architecture today not only has some cloud but has a lot of cloud in it; those are the leading organizations, and they've figured out a way to make it work. Now, there are different cloud models, and it's pretty important to get into the right one for you. Take private cloud, and I should really put quotes around that, because the term is thrown around pretty loosely, though I know what you mean. I think we all need the true benefits of the cloud, and if you're getting them from your private cloud, that is great. If you're getting elastic scalability, rapid provisioning, good chargeback, and access to a wide variety of resources, then good for you; keep on doing that. But so many so-called private clouds don't give you all that; they used to be the system administration group, and now they're exercising more control and giving themselves a different label, not to be too cynical about it. If it's done well, a private cloud gives you control, because it's yours. It gives you the ability to customize; you don't have to worry about every other client on the cloud. It gives you data residency that you can localize as well as possible, and it gives you trust in your IT organization, perhaps false trust, but trust nonetheless. The cons are the capex part, which is obviously pretty important. You've got to bring your own DevOps to the private cloud, and that's a big con in my view: bring your own backup and recovery, etc. And I think it can be a con to place a lot of trust, maybe misplaced trust, in your own IT versus that of a cloud provider. So that is an equation to really look at, and many are looking at it very intensely when they start to make their
decision about private cloud versus public cloud. Let's talk about the public cloud. We know who the big three, four, or five are. The pros: true cloud benefits, all the things I mentioned before. You've got scalability in spades, very quick provisioning, and operational expensing. If you have software-as-a-service applications, you're already there, so you've got cloud gravity going on. You can shard across the world into all the resources of the public cloud if you need to. And a big one to me is the DevOps: all the non-functional requirements, the availability, the backup and recovery, the performance, etc. Now, the cons of the public cloud. Sprawl: it's so easy to get into that anybody with a departmental budget can get into it and create sprawl. And long-term cost: studies have shown that the cost of the public cloud, if it carries on linearly from today into the future, would become prohibitive at some point, I'd say in the upper single-digit number of years. I don't know anybody thinking too much about that, because some think it will change and there will actually be a long-term cost benefit to the public cloud. But don't stop at that point, because if you're just thinking about storage cost, that's not enough; you've got to think about all the pros of the public cloud. So are we leaning that way? Yeah, we're leaning that way for a lot of things, if the workload dictates it. Now, there's also hybrid, and its pros and cons really depend on where you are in exploiting the benefits and the cons of the private cloud and the public cloud, because hybrid can be all over the place. But one use for a hybrid cloud that I find pretty interesting is cloud bursting: using cloud resources as a failover, if you will, or as peak-workload resources that you can bring to bear, say late in the month when you have the mass of processing. It's tricky to set it up that way, but many are
exploiting that benefit of being hybrid cloud. So, cloud models: it depends, but think about public; don't write it off. Distributed file systems: now, this is what NoSQL so-called databases and Hadoop clusters are. They are distributed file systems. Some of us have been around long enough to remember the older file systems, pre-database. Okay, yeah, it's kind of like that. You've got your data blocks spread out across your nodes, which are low-cost commodity computers with everything built in, and the value of the distributed file system is that it makes it all work together seamlessly. Now, there's no RAID and stuff like that; the blocks are spread around, and if a node goes down, which does happen, one of the secret sauces of these things is that they'll pick up a new node, which might be in the same IP range as these nodes, and, as you can see, restore all the data. So there is a separate piece of understanding to have about how this platform works. If you are contemplating platforming a new workload, ask whether you can live without the benefits of a database, which has an ID map on the page and references to where the records begin, and so is much better at random access and things like that. So distributed file systems are definitely lower cost, but maybe not as good on some of the DevOps and some of the other things today. Hadoop is a distributed file system; it is the quintessential distributed file system. There are many patterns where it makes sense, some before and some after the data warehouse. We particularly like the data lake, and we are architecting data lakes for our clients and finding great utility in them when the client has great data scientists, or an emerging data science program, I'll put it that way. And if you don't, you probably should work on that. So it makes sense in that data science lab pattern for you. But some of us are using it as a data refinery, which is offloading some of the ETL workload, which is great; we use it for that as well.
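The talk describes node recovery only at the level of a slide, so as a rough illustration of why losing a commodity node needn't lose data, here is a minimal Python sketch of replicated block placement and re-replication. Everything in it (the `ToyDFS` class, its methods, random replica placement) is hypothetical and much simpler than a real system such as HDFS, which uses rack-aware placement and a NameNode to track block metadata; only the replication factor of 3 matches the HDFS default.

```python
import random
from collections import defaultdict

REPLICATION = 3  # assumed replication factor (HDFS defaults to 3)


class ToyDFS:
    """Hypothetical, simplified model of a distributed file system's block layer.

    Real systems use rack-aware placement and a dedicated metadata service;
    this sketch only shows the core idea of replication instead of RAID.
    """

    def __init__(self, nodes, seed=0):
        self.nodes = set(nodes)
        self.block_map = defaultdict(set)  # block id -> nodes holding a replica
        self.rng = random.Random(seed)

    def put(self, block_id):
        # Spread replicas of one block across REPLICATION distinct nodes.
        targets = self.rng.sample(sorted(self.nodes), REPLICATION)
        self.block_map[block_id].update(targets)

    def add_node(self, node):
        # A new commodity node joins and becomes a target for re-replication.
        self.nodes.add(node)

    def fail_node(self, node):
        # Remove the failed node, then copy each under-replicated block from a
        # surviving replica onto live nodes that do not yet hold it.
        # Returns the ids of blocks with no surviving replica (truly lost).
        self.nodes.discard(node)
        lost = []
        for block_id, holders in self.block_map.items():
            holders.discard(node)
            if not holders:
                lost.append(block_id)  # nothing left to copy from
                continue
            candidates = sorted(self.nodes - holders)
            self.rng.shuffle(candidates)
            while len(holders) < REPLICATION and candidates:
                holders.add(candidates.pop())
        return lost


if __name__ == "__main__":
    dfs = ToyDFS(["node1", "node2", "node3", "node4"])
    for i in range(10):
        dfs.put(f"block{i}")
    dfs.add_node("node5")
    lost = dfs.fail_node("node1")
    print("lost blocks:", lost)  # lost blocks: [] (two replicas always survive)
    for block_id, holders in sorted(dfs.block_map.items()):
        print(block_id, sorted(holders))
```

The design choice mirrors the point in the talk: there is no RAID on each box, so durability comes from keeping several copies on different nodes and letting the cluster restore the replica count after a failure. A block is only lost if every node holding it fails before re-replication catches up.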
Some of us are using it at the back end of a data warehouse or other systems to archive off some of the colder data, getting it onto lower-cost platforms where it's still available. It's much more available than tape. Okay, so Hadoop is better than tape in this context, and it's so much more than that, but it is meeting that pattern quite a bit. As a specialized application store: one that needs all your unstructured big data, your sensor data, your clickstream data, your social data, your server logs, your smart grid data, your electronic medical records, videos, pictures, geolocation data, etc. All this data belongs in Hadoop. Where most of my clients are, and where most of you are, I think, is that you've got one or two applications going on for that data, and so it's not exactly a data warehouse; it's a specialized application store. We'll learn more to do with that data, and we might be summarizing some of it and bringing it back into our relational database data warehouse, which brings me to my final point: Hadoop as the data warehouse. I'm just going to quickly say no to that idea and move on to the data lake. Now, I'm not going to go through the architecture of a data lake here. I talked a little bit about it, and one of the nice things about the data lake is that it can serve a dual purpose: it can also be the staging area for your data warehouse. But it can be so much more than that. Once you get your data scientists involved, it actually becomes the place where, of course, you've got big data, but you have all the data that the data scientists can utilize, because this is their playground. Now, I like to stay one step ahead of my users. I like to understand exactly what they're going to do with their BI, etc. I like to offer them data on a silver platter. I have not quite figured out yet how to do that for the data scientists, and I don't think they want me to. They want unabated access to all data, and this would be the place for that: the data lake. So we're having fun
architecting the data lake and putting it in that ecosystem. Now, there are also NoSQL data stores. Some of you may have been waiting for this and thinking, well, when is he going to talk about that? Isn't that pretty viable? Yeah, it really is. You've got key-value stores, you've got columnar stores, you've got document stores. Okay, they're all twists on the same idea. These are scale-out file systems for unstructured operational big data, and here are some of the use cases for that: personalization, profile management, catalog management (that's a big one), getting that customer 360 view in play in real time (that's huge), etc., a lot of the things that you see on there. Now, I can't forget about graph databases. If you would describe the workload you're thinking about platforming with the words network, relationship, object, even properties, these are sort of key words to me that say, oh, we might want to think about a graph database here. We think of the quintessential one of these as something like Twitter, with everybody connected, etc. Well, the so-called nodes in the graph don't have to be homogeneous; they can be heterogeneous, like you see here. We've got people, we've got cars, and we've probably got other things. Graph is great for that as well. As a matter of fact, if you need super high performance, even if it's not a complex web of a billion nodes, you may consider a graph database. And finally, here's your saving grace, if you will, your catch-all in case you otherwise made a mistake and put data into the wrong platform, or spread it around a little bit too much: you've got data virtualization. If you accept what I'm saying here today, that many platforms are viable within your organization, you need the capability to do data virtualization, because you're going to have those cases where you've got some data over there on Hadoop, some data in MDM, maybe some in your warehouse, and you've got to bring it all together. Are you going to physically move
that data? Maybe for the long term, but for the short term I know many are turning to data virtualization to get those reports out, etc. You need this capability. Now, some of you are going to work some queries into data virtualization as just sort of normal operation; that's okay too. You don't want to overdo it, but it is for those edge queries, for some things that you architect into this pattern, and for that catch-all. Because let's face it, we can't put all data everywhere just in case. We put data in the best place, or two places sometimes, to succeed, and that's it, so sometimes that's not enough. So you've got all your traditional selection vectors for selecting a data platform; here are some new ones. These are not your table stakes. Table stakes, yes, do those: what kind of indexes do you have, what's your SQL, etc. But robustness of SQL means more than just which SQL version you have. There are some newfound capabilities within SQL that make a lot of sense for you, which I don't have time to go into, but SQL is still very important. Built-in optimization: across the cloud, across data virtualization, the optimizers have to do a lot more work today. We've been keeping them busy, and we're going to keep them busy. On-the-fly elasticity: we talk about elasticity, but do you really have it? You need that dynamic environment adaptation. Are you going to be able to take on concurrent usage, different patterns of usage at the same time? Eventually a lot of data platforms will go there, so that is something to look at. Separation of compute from storage: very important for the cloud, so you can scale those two things independently. And support for diverse data, because obviously we have it today: JSON, XML, Avro, and other forms of unstructured data coming right down the pike. So, my conclusions for you: many data platforms are viable today, so get the platforming right. Start with data store type placement and workload architecture. That narrows it down right there; you're going to
have a handful instead of a whole army of possibilities at that point. Use the data profile as a strong determinant of the correct platform: Is it unstructured? How big is it? What's the frequency of that data? Etc. Make sure it will perform now and for unspecified future requirements, so you might want to get some help looking ahead at where we're probably going to go with this. I don't want to undo what I'm doing right now in three years; that's too soon. So let's make sure we get into something scalable. Analytic platforms should be either staging, ODS, data warehouse, data mart, or Hadoop. What I mean there is that when you look across the spectrum of your analytic platforms, you should be able to label each one of them with one of these labels, and if you can't, it's not playing a proper role, because those are the proper roles for platforms in the analytic ecosystem. And finally, the cloud now offers attractive options with better economics, but there are different flavors of it, so you want to get into the right one. Information is the next natural resource. Our economy is entirely dependent upon it. It's just like sunshine, it's just like water: we need it, it's not going anywhere, it just keeps replenishing. We're not going to run out of it, but we may be overwhelmed by it if we don't get our data into the right platforms. Which brings me to Shannon, and I think it's time for Q&A.
William, thank you so much for this great keynote; appreciated as always. If you have questions, submit them in the Q&A section in the bottom right-hand corner of your screen. And to answer the most commonly asked questions, we will be sending a follow-up email on Monday to all registrants with a unique login containing links to the slides and the recordings of these sessions. So, William: why is OPEX a pro? It is easier to have a CAPEX project approved than an OPEX project.
Did you say, is it easier to approve one versus the other?
They are saying that it seems to be easier to approve
CAPEX versus OPEX, so why is OPEX a pro?
I guess it would depend on the environment, in terms of which one is easier to approve. As I get into the financing of a lot of these options, I find that many companies don't want to deal with capitalizing the expenses; they would much rather operationalize them. And this is sort of the cloud model, right? You're going to pay as you go and expense it as you go, as opposed to large capital outlays at the beginning that, being over certain thresholds, make you carry them on your financials over the course of years, sometimes seven, sometimes nine, depending on a lot of different factors. But I know all my clients are pushing me to operationalize expenses as much as possible, and I guess it's just sort of the mindset we have going forward, trying to bring things in line with the quarterly idea. So yeah, it just makes a lot of sense that way. And if you look at software beyond platforms (I know we talked about platforms here), software is going the same way in terms of its new pricing models. So I think it's just something we need to get used to.
And William, this presentation talks a lot about medium and big businesses, and not necessarily micro and small businesses. For micro and small businesses, data storage is usually decided by the application. Do you have comments on that?
I think that's a good observation. A lot of what I had to say may be overkill for a small company, but the principles probably still hold true. They just may not ultimately have as many platforms as a mid-size to larger company. They still compete on data, and they still need to platform data appropriately within their budget, etc. But yeah, that's a good observation. I mean, they're not all going to be deploying in-memory databases, or even deploying
Hadoop, at least not today. This presentation is about making a selection today; this isn't a futuristic kind of presentation. Ten years from now, we really don't know what the platforming possibilities are going to be. We can come back and have this presentation, and it will probably look quite different, but nonetheless we're making decisions for today, and small businesses need to do the same thing and have all the same criteria in place, whether they know it or not.
Great. And what about trust in the public cloud, the control of your data?
Yeah, I mean, that's the number one or two consideration as people think about the public cloud. There's the trust factor: will my data be secure? We need data not only to be secure; we need it to be compliant. And one of the things that you get with a private cloud is control over that data. You know that it's not going to be, on a whim, "let's store that data over here in a different country," for example, which would obviously be a big problem today. But I think trust in the public cloud is growing in terms of the security aspect. I think occasionally the availability of the public cloud takes a black eye, but you've got to consider the cloud in context with your in-house capabilities. Are they truly any better, even when it comes to trust? Those are some of the things you have to look at: is it truly better performing, are you truly more compliant, in-house versus in the public cloud? Sometimes, when you really look at that equation, you find the answer is no.
William, thank you so much; that's perfect. It brings us right to the end of this session, and I'm afraid that is all we have time for. Thanks so much to you and for sponsoring, and thanks to all of our other sponsors who have made today possible. We now have a 10-minute break where we encourage you
to network with each other as we get the next speaker set up. The next session will begin at 3:30 p.m. Eastern time, where we will hear Tim Bergman talk about graph theory you need to know. William, thank you again so much, and thanks to our attendees for joining us. It's been a great event so far. Thank you, William.
Thank you.