Hello and welcome. My name is Shannon Kemp and I'm the Executive Editor for DATAVERSITY. We'd like to thank you for joining this month's installment of the DATAVERSITY Webinar Series, The Heart of Data Modeling, moderated by Karen Lopez. Today Karen will be discussing a Survey of NoSQL Support in ERwin, ER/Studio, and PowerDesigner. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A section. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #heartdata. As always, we will send a follow-up email within two business days containing links to the recording of this session and additional information requested throughout the webinar. Now let me introduce our speaker for today, Karen Lopez. Karen is the Senior Project Manager and Architect at InfoAdvisors. She has 20-plus years of experience in project and data management on large multi-project programs. Karen specializes in the practical application of data management principles. She's a frequent speaker, blogger, and panelist. Karen is known for her fun and sometimes snarky observations on data and data management. Mostly she just wants everyone to love their data. You can follow her at @datachick on Twitter. And with that, I will turn it over to Karen to get us started. Hello and welcome. Hi, Shannon. Thanks so much for that. Are you having a great day today? It's a fabulous day today. Excellent. And you? Thank you. Yeah, running to the airport later. But yeah, thanks. So I want to welcome all of you who joined us live today. That's always a really exciting part of doing these webinars. I wanted to point out that I do welcome Twitter discussions and discussion in the chat. You guys get to talk to each other, not just to me.
And then if you have some formal questions, put them in the Q&A section of the webinar software. So I'll try to keep looking at both of those, but the Q&A is the one that I will actually see, and I have a couple of Q&A breaks scheduled just to remind me to take a look at those things. So normally we would do a poll here. But if you guys want to just throw it in the chat, I did put in an opening question about what brings you here today to hear about NoSQL and how those things are supported in data modeling tools. I'd love to find that out. So get your Q&A questions in now. The chat is how we make a great community. And yes, there will be slides and recordings distributed early next week. I know people will join in late and ask that question in the Q&A. I think the Q&A intro just ought to say that. That would be great. So what are we going to talk about today? Why this topic? As part of this webinar series and its previous life as Big Challenges in Data Modeling, there have been a lot of overview talks about what the basic types of NoSQL databases are, why they're used, how they're used. So all of those resources are out there. I'm going to include a tiny, meaning a one-slide, introduction to each of these. But I expect coming into this that you either have a browser in front of you so you can go look up some things, or that you've attended one of those webinars. I'm going to do a little bit of demo, demo-light, mostly just showing how the native features in some of these tools support these things, but nothing really heavy. We only have an hour, so there's not enough time to go through all the NoSQL things, and I have some resources for you. But a disclaimer: these are not product reviews at all. It's to discuss the current state of feature support in the top three data modeling tools. And this should be considered a baseline, potentially with future webinars doing updates. So right now, I'm recording these.
I'll be blogging about feature sets and how some of these things are supported, specifically a specific version of a NoSQL technology in a specific data modeling tool. But I wanted to set the baseline of where we are today. That means if you're watching a recording of this later, please know, like anything else in the cloud and on the Internet, most likely it was out of date before it even got to this webinar. So some disclosure things I do need to make. I'm experienced; I've been doing this for a while. That means at some point I've done business with these vendors, either the NoSQL ones or the data modeling tool vendors, and my company InfoAdvisors has in the past participated in partner programs with the data modeling tool vendors. We're not currently a partner with any of them. And I have formally and informally participated in product advisory groups for these vendors. Having said all that, I do do business with them and still have that ongoing, but I have no business that's tied to sales of these products. So I think that modern data architectures will have hybrid technologies. And by hybrid, I mean both relational and non-relational technologies, much like they do right now already. So one of the problems of talking about NoSQL is it really doesn't mean anything. It means not relational, even though people now interpret it to mean not only SQL. That's only because of its history of starting out as an anti-relational movement. Clearly people have progressed beyond that. And so now we have relational and non-relational technologies. The non-relational technologies that I would guess every enterprise-class shop on the planet has are at least XML, comma-delimited files, spreadsheets, Microsoft Access, and, what have I left off? Anything else. But in today's topic, as I'm talking about NoSQL, I'm really talking about the newer technologies that most people fit into a classification system that I'm going to go through.
And of course some existing, some legacy systems are NoSQL; think IMS, and I know that still exists in places. Think comma-delimited files. Think of all these sorts of non-relational things. I even think of the AS/400 as kind of a weird outlier: you can think of it as relational, but it's not really, it's some hybrid thing. So if we look at a classic data warehouse architecture, it is typically made up of transactional databases, some ETL and staging and processing that happens, then an enterprise data warehouse that kicked out some data marts or contributed to the data warehouse, and we did lots of reports. A modern data architecture is one where we're going to take certain data stories, certain use cases, and we're going to use something other than the primary relational databases, whether that's Hadoop or a graph database like Neo4j or a key-value store or a columnar database. All of these things have a place. And the reason they have a place is that a modern data architect or enterprise architect is going to choose the right tool for the job, because we want to weigh cost, benefit, and risk as we look at each of these things. So typically people divvy up the database world into these types of databases, data stores, or stores. Now today I'm going to use the word database, but I know when I speak to a lot of the communities where people are really in-depth working on these things, they find the term database confusing, because most people assume database means relational. Data stores is another way of saying it, or just stores. I'm going to treat all of those equally. I'll probably say databases a lot just because I'm using those terms generically. So we have relational databases, things like Oracle, SQL Server, DB2. We have graph databases, and columnar and column family. Now those last two are in the same block on this graphic. They are not the same thing, but the confusion over the use of these terms means that I often talk about them together.
There's a big difference between how they're physically implemented and what they mean, but to a data modeling tool, the distinction isn't that big, as we'll talk about. Key-value pair databases: those are things like Riak, or features in hybrid systems, and we're going to give you some examples of those. And document databases: things like MongoDB and anything that's JSON or JSON-based. And then all the others. So one of the things I think is funny is Hadoop; it doesn't really classify into any of these, likely because it's not a thing, it's a framework of things, and therefore some parts of the Hadoop infrastructure are relational and some parts are columnar and some parts are, well, we'll just get into that. So some other terminology I want to talk about: while I was looking at data modeling tools and how they supported these technologies, I want to talk about how they have native support. Native support is, you know, the holy grail of having a data modeling tool support something that you need to build things on. By native support, I mean I use a database client, driver, or connection, directly connecting to a database, and that connection and your data modeling tool are feature and version aware of all the differences. So if I use an example of SQL Server, I want a data modeling tool that can not only just connect to my SQL Server databases, but connect and know that SQL Server 2012 is when they introduced sequences, and if I connect to a SQL Server 2008 database, I don't want my data modeling tool on the physical side to allow me to specify a sequence. So that sort of native, direct, feature- and version-aware connection is ideally what a data modeler who does model-driven development wants to have. And we want to have that so that we can do round-trip engineering. And by that, I mean I want to be able to forward engineer and generate those databases or the structures in them.
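The feature-and-version awareness described above can be sketched in a few lines. This is a hedged illustration, not any vendor's actual implementation; the function name and lookup table are made up, and only the SQL Server 2012 sequences fact comes from the talk.

```python
# Map (DBMS, feature) to the version that introduced the feature.
FEATURE_INTRODUCED = {
    # Sequences arrived in SQL Server 2012, as mentioned above.
    ("SQL Server", "sequence"): 2012,
}

def feature_allowed(dbms, version, feature):
    """True only if the connected DBMS version already has the feature."""
    introduced = FEATURE_INTRODUCED.get((dbms, feature))
    if introduced is None:
        return False  # unknown feature: be conservative and disallow it
    return version >= introduced
```

A tool working this way, pointed at a SQL Server 2008 database, would get `False` for sequences and simply not offer them on the physical side.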
I want to be able to reverse engineer them. I want to be able to point my data modeling tool at, say, MongoDB or Neo4j and say reverse engineer this, show me what's in there. Bring back all the properties that are in there. Show me what the security settings are, what the attributes are, what the properties are, everything. And then I want to be able to compare one version of a data model or a database to another one and have the tool be aware that I'm resolving those differences. There's also always, and we've had it for a long time in all these tools, generic ODBC connectivity. And for some of these technologies, you're only able to do that because there's some sort of ODBC-JDBC bridge or something like that. ODBC is sort of the last stop on connecting your data modeling tool to a database and getting something useful out of it. It's wonderful for people who need to support a DBMS that their tool doesn't directly support, but it has lots of limitations, in that ODBC is really focused on SQL-like structures, as well as the fact that you often can't generate an alter script or you can't do the types of compares that you would like to do. So it's really where round-trip engineering starts to fall apart. And then there are the various import-export ways of getting data about databases into your data modeling tool. The number one way is through some sort of Meta Integration bridge, and that's a product, a modular feature that is sometimes embedded in the data modeling tool and that comes from a third party. It allows you to bring in other modeling tools' models. It allows you to bring in some table structures and connect to some databases for which there is no native support. That is definitely a point-to-point integration, and you are just importing. There's no round-trip engineering there.
You can force some compares by importing two different models and then comparing the models, but there's not a way then to take those deltas and generate an alter script or something like that. You would just be able to generate a full-blown new schema. There are various XML hacks. So if you have some way of dumping the structures out of your database into an XML or a JSON format, in theory you could bring those back into the tools. And then there is the lowest level of things, some individual hacks, I'll call them. Back when we couldn't support certain types of objects in our database because a data modeling tool didn't support them, there used to be this Excel trick where you would export your data model into Excel and then manipulate your tables and columns in there, just as a list of terms, or add some new features to them, using Excel formulas as sort of a generation engine to generate the proper syntax for DDL. And then you could reverse engineer that text file, that DDL, that SQL file. And there are probably ways of doing that with macros or pre- and post-DDL features in a tool. But once we get down to this level, if you're using import and export, that probably is just to do some investigation, understanding, and documentation, and not truly doing round-trip engineering. Because I want this model-driven development. I want to have this round-trip engineering of requirements in a data model and a database or multiple databases, and then more requirements and changes and tuning to happen, and to just keep going around and around there. The great thing about model-driven development is you don't have to start at the data model. You can start at the database if you have legacy databases or databases that were done by a bunch of developers and now you want to figure out why it's not working. We do all those things.
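That Excel-as-a-generation-engine trick is easy to picture in code. Here's a minimal sketch of the same idea in Python instead of Excel formulas: a plain list of column metadata stitched into CREATE TABLE syntax that a tool could then reverse engineer. The table and column names are invented for illustration.

```python
def generate_ddl(table, columns):
    """columns: list of (name, sql_type, nullable) tuples."""
    lines = []
    for name, sql_type, nullable in columns:
        null_clause = "NULL" if nullable else "NOT NULL"
        lines.append(f"    {name} {sql_type} {null_clause}")
    # Join the column definitions into one CREATE TABLE statement.
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"

ddl = generate_ddl("Product", [
    ("ProductID", "INT", False),
    ("ProductName", "VARCHAR(100)", False),
    ("Price", "DECIMAL(10,2)", True),
])
```

The resulting SQL text file is exactly the kind of artifact you could then feed back through a tool's reverse engineering, as described above.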
And in order to truly do this engineering-like, data model-driven development, you have to be able to connect to the databases, do those compares, and do those alters. So before we start here, I'm just going to check the Q&A. Where do you consider RDF-based data modeling tools in your diagram? I have not. That would probably be a great discussion for a whole other presentation on supporting all of those types of semantic-based technologies. And I hope to learn more about them soon. Someone else said, I've written data hacks with Python to find out the data models of a NoSQL DB; I'd like to find a better way. That's a good reason to come to this. So we're going to start with graph databases. So graph databases: pretty much Neo4j is the one that a lot of organizations use. And here's some language for creating nodes and relationships in Neo. And then on the right is sort of one of their visualizations of this same data. And if you look at the link at the bottom of this slide, that's where you can actually go hands-on and play with it right in your browser. And you can run those things, and you can insert your own data into the nodes. Now one of the things you'll notice about this data is that even though Matrix is a movie, Keanu is an actor, Keanu acts in... the actor and movie part look a lot like entities we would have in a relational model. But they're really sort of like roles that these occurrences play that have been categorized. Those are not really any of the structure in a graph database or in Neo. And that's one of the things that makes it hard for a data modeling tool to support a graph database. We hear all the time that the data model is the database, and the corollary to that is the database is the data model. So while people talk a lot about NoSQL technologies being schemaless, it's not that there's no schema, no understanding, no structure. Those tags and those properties are kind of a structure. What they aren't are constraints. They don't constrain the data.
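To see why those labels and properties aren't constraints, here's a toy sketch of the Keanu/Matrix example as plain Python dicts. This isn't Neo4j code, just an illustration of the shape of the data; the property values are made up.

```python
# Nodes carry labels (categorization tags) and free-form properties.
nodes = {
    1: {"labels": ["Person", "Actor"],
        "props": {"name": "Keanu", "hairstyle": "short"}},  # only this node tracks hairstyle
    2: {"labels": ["Movie"],
        "props": {"title": "The Matrix"}},
}
# Relationships are just typed edges between node ids.
relationships = [(1, "ACTED_IN", 2)]

# Nothing stops a node from having any mix of labels or properties --
# that flexibility is the data modeling tool's headache.
actors = [n for n in nodes.values() if "Actor" in n["labels"]]
```

Note that "Actor" and "Movie" here are tags on occurrences, not table-like structures, which is exactly the point made above.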
I could put a movie in as a role, and I could put an actor in as a movie. Of course we could do that in the relational world, but the difference is that all the properties I'm going to have on these nodes could be highly variable for each node. So if I just wanted to keep track of what Keanu's hairstyle is this week, I could do that without having to keep track of that for all the other nodes. And there are all kinds of differences between graph and relational, but I think this one is the most difficult one. There's a reason that there's no native support in any of these tools for graph databases. Now I'll throw in that, like a lot of these NoSQL things, there are some ODBC and JDBC connectivity bridges, drivers that allow people to do some sort of queries against this data, but I wouldn't consider that anything that your data modeling tool is going to be able to do much with. So then that brings us to key-value pairs. A key-value pair store, just like it sounds, means that to these database systems, you have a key and it has a value. And there's a little bit more: there are some row-identifier-type things and partitioning keys that go into this. But a lot of the examples we see for key-value pairs are very simple. And even this one is very simple, so I have a slightly more complex one coming up in a second. But what you can see here is that there's a whole lot of information embedded in that key. So I know that Facebook user 1-2-3-4-5's favorite color, I think, is red, and that Twitter user 6-7-8-9-0's favorite color is brownish, and that Google Plus user 2-4-3-5-6's favorite libation is a dry martini with a twist, and that this LinkedIn user's hero is the top sales performer. Now this sort of thing drives relational thinkers crazy, because we want to have libation being some type of entity and favorite color at least being a column, and hero at least being, you know, something.
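Those information-rich keys are easy to mock up. A hedged sketch in Python follows; the colon-delimited key format is invented for illustration, not any particular product's convention.

```python
# All of the "modeling" lives inside the key string itself.
kv = {
    "facebook:12345:favorite_color": "red",
    "twitter:67890:favorite_color": "brownish",
    "gplus:24356:favorite_libation": "dry martini with a twist",
}

def parse_key(key):
    """Recover the structure a relational modeler wishes were columns."""
    source, user_id, attribute = key.split(":")
    return source, user_id, attribute
```

A lookup by exact key is fast; asking "everyone whose favorite color is red" means scanning and parsing every key, which is not what these stores are built for.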
The reason key-value pairs are architected this way is that they are for getting data out and looking up things by a specific key. I'm oversimplifying that again, but that's what they're built for. So here's another example, a key-value pair with the concepts of database and table, even though it's not the same as a relational table. What we really have is a partitioning key and a row key, and then the properties can all be stored as a big group of things. In this case, it's a camera store, and for this product, the seller is this, the price is that, there's the date of the price, and the seller type is online. That would be, in the relational world, one giant comma-delimited column, which is something we would never do. The reason we would never do that is that it would cause all kinds of query problems and aggregation problems if you had to find the average price of online products in that product class. Oh, my gosh. But that's not what these databases are designed for. They are, again, primarily for doing great things via a key lookup, and in this case, if we were doing a product catalog, this would perform like crazy, because we would just pull it out by that row key and we'd be able to publish all those things on a webpage. And because of that, you'll see that these really don't have native support in the data modeling tools. And the reason I think this is is that our data models, meaning in the traditional data modeling world, the ERD world, might fit best in these processes as being great documentation and a great wealth of information for telling our data stories, for explaining (there's a typo there) policies and metadata around them. So a typical thing we might have in an enterprise data model, from our previous examples, is that products with a price point over $250 should be kept in secured storage. Now, that's something that's not in the data, but we would want to keep track of some of those business rules that way.
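The camera-store shape can be sketched the same way, with a (partition key, row key) pair locating one bag of properties. The values below are made up for illustration.

```python
store = {
    ("cameras", "product-123"): {   # (partition key, row key)
        "seller": "ACME Photo",
        "price": "299.99",
        "price_date": "2015-06-01",
        "seller_type": "online",
    },
}

def get(partition_key, row_key):
    # One key lookup -- the access path these stores are optimized for,
    # which is why a product-catalog page render is so fast.
    return store[(partition_key, row_key)]
```

Computing something like the average price of online products in a class would mean walking and parsing every value, which is the aggregation pain described above.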
And we could contribute that to anywhere this data gets published. Another example might be that in a transactional system about these products, we might have some constraints that we'd put on them at creation, like let's say that every product must have a price, even if that price is zero. That sort of constraint really doesn't apply in these stores, or would be applied in the application and not in the database. That would be a business rule that we would need to give to an application that is using this data. The next type: columnar data storage. And remember, I'm going to be talking about both column stores, so true columnar data, as well as column family ones during this section, even though they're slightly different. Columnar data storage basically means that if you look on the left, we have rows and columns, the traditional way we think of tables. And what we end up doing is taking all the columns and storing them together, while encoding and compressing them, and I have an example coming up of that, so that we can query them very fast. So if we had a table that had just first name, last name, some phone information, and some address information, which is typically how we do it in the relational world, we'd split those into columns, so now all the first names would be stored together, all the last names, the area codes, all the states. And you start to see that, for instance with states, we're going to have a lot of repeating information, even cities and street types and names. And that allows us to encode and compress. We get high degrees of compression when we use a columnar approach to these things. And we get it both through doing dictionary substitutions for whole terms and through compressing repeating values.
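Here's a small sketch of those two compression tricks, dictionary substitution and run-length compression of repeating values, applied to a column of states stored together. This is toy data, not any engine's actual storage format.

```python
states = ["NY", "NY", "NY", "CA", "CA", "TX"]

# Dictionary encoding: store each distinct value once, keep small codes.
dictionary = {v: i for i, v in enumerate(dict.fromkeys(states))}
encoded = [dictionary[s] for s in states]   # NY->0, CA->1, TX->2

# Run-length encoding: collapse repeats into (code, count) pairs.
def rle(codes):
    runs = []
    for c in codes:
        if runs and runs[-1][0] == c:
            runs[-1][1] += 1
        else:
            runs.append([c, 1])
    return [tuple(r) for r in runs]
```

Six state values shrink to a three-entry dictionary plus three run pairs; on real columns with millions of repeating values, that's where the high compression ratios come from.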
That also means that one of the features of a columnar data store is that you end up not just doing this, you end up making chunks of columns and storing them together because they are queried together. So one of the features of doing design for a columnar database is to look at the type of query workloads you're going to have against that data and physically persist only the columns that are used in a query, as well as persisting them together as a group because they're in the same query. That's a whole other presentation I could give. But what you need to know is that instead of storing things by row, we're going to store them by column, and that's going to give us lightning-fast query retrievals. Now, one of the great things about columnar and column family stores is that, even though they're not row oriented, they're column oriented, they're aware of columns being used together, and because they have columns, they have SQL-like query tools. So for instance, even though ERwin and ER/Studio have no native support, that direct connection and being able to compare to the database, I was able to reverse engineer an HP Vertica columnar data store using their SQL-like query language. And I did that via the ODBC connections in the data modeling tools. Now, when I first did this about a year ago, ER/Studio failed on that. It would do the full reverse engineering, but then it would not bring up a model, because what came back wasn't valid in ER/Studio's understanding of the SQL that was in it. So it basically disregarded anything that didn't meet its validation test, but ERwin was able to do it. Well, since then, since I gave a presentation on this just a few months ago, ER/Studio is now able to connect to HP Vertica via ODBC drivers and reverse engineer those, I'm going to call them tables, but column structures that are in there.
Now, PowerDesigner, because it's made by SAP, shockingly enough supports SAP HANA, which is an in-memory columnar data store, and that should not surprise you at all. Also, the Meta Integration bridge is going to help with import and export features even if ODBC doesn't work. That brings us to document databases. So document databases typically are based today on JSON or BSON. JSON is what I call the hipster XML, totally unfair, but it's definitely text based with tags and brackets, and BSON is just a binary form of JSON. So here's an example of two documents. They're based on families, and one of the things you'll see is that the Anderson family has a last name and parents who have first names, and children, which is a repeating group, they can have many of them, and the children have pets who have names, and they have an address, and they have some registration thing. The Wakefield family, same thing: they have parents, they have children, and they have pets. But you'll notice that some of the names of the tags can be different. So unlike in a row-oriented database in the relational world, where every instance of an entity must have exactly the same properties as every other instance, in a document database that can vary: the structure varies, the many-to-many-ness of things can vary, and the names of what we call the data, the tags, can also vary. And you can imagine, with arrays and nesting and recursion, and a way of linking like foreign keys in a document database, all of that together, it must be really difficult to support. Well, right now, ERwin has no native support for document DBs. ER/Studio has support in its most recent version for MongoDB, and PowerDesigner doesn't have any support for those things. I think we'll see this changing, not just for the support of MongoDB, but because of just how prevalent JSON documents are.
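The work a modeling tool has to do against documents like these can be sketched as taking the union of the tags seen across instances, since no single document is guaranteed to carry them all. The field names below are stand-ins for the slide's family data, and the "..." values are just placeholders.

```python
docs = [
    {"familyName": "Anderson", "parents": ["..."], "children": ["..."],
     "address": "...", "isRegistered": True},
    {"familyName": "Wakefield", "parents": ["..."], "children": ["..."],
     "pets": ["..."]},
]

def inferred_fields(documents):
    """Take the cumulative view: every tag seen in any instance."""
    fields = set()
    for doc in documents:
        fields.update(doc.keys())
    return sorted(fields)
```

Neither family carries every field, so the inferred "schema" describes the data that happens to be there, not a constraint on what may be stored.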
So even if modeling tools aren't going to support, you know, a full-blown MongoDB document database, I think we'll start seeing them support JSON documents, just like it took data modeling tools some time to start supporting bringing in and producing XML documents, which are really kind of the same thing, not exactly, but from the standpoint of trying to support them in a modeling tool, XML and JSON are definitely much the same. So with ER/Studio, I'm able to make a native connection to a MongoDB. On the right here we have, in that post-it-note-looking graphic, a sample restaurant JSON data file that I imported into a MongoDB database, and ER/Studio came up with this data model. Now it's able to recurse through and look at everything, but it's important that everyone be clear on what this model represents, because remember, in the XML world we have a schema document that says here's what our XML structure is; there's nothing like that for this restaurant sample data. So it has derived this data model by looking at all the instances of the data, which may or may not have the same properties in every single one. So you would use this model to understand what the document store is telling you about the data it has in it, and it's having to take the cumulative view of all those things. So then there's Hive. Hive is a SQL-like query language that sits on top of Hadoop features, and in this case it started as an abstraction on top of MapReduce, so MapReduce sits on top of the Hadoop distributed file store, and Hive is a meta structure on top of that. It looks a lot like SQL, and actually it was invented to allow people who had strong SQL query skills, actual skills in the SQL language, to do queries on top of Hadoop structures. Well, since Hive is very SQL-ish, it should be no surprise to you that ERwin, ER/Studio, and PowerDesigner support Hive.
So ER/Studio takes the approach I'll show you in a bit, supporting the various flavors, which means distributions, of Hive, and it uses a generic Hive 1 standard, as does PowerDesigner. So it just makes sense that these tools support Hive. Here's an example: on the left you can see a snippet of a Hive table that I created, an external table, and this is just the SQL for it, and it's based on some IRS data. I was able, with PowerDesigner and all the tools, to either import that in or reverse engineer that into the tools, but they also have that native connectivity to do it. I was also able to compare it. I was able to make a change to it and compare it to the original Hive table, and I was able to import changes that happened in the external Hive table into my model as well. I'm going to look at questions again. I have heard from informed sources that ERwin is moving to support NoSQL databases in the near future. I don't have a lot of inside information, but from the discussions I've had, informal ones, as well as some presentations I've been to, I think that all the modeling tool vendors are doing what they can to support what they can. One of the amazing things about the NoSQL world, and also the biggest pain point for modelers and modeling tool vendors, is that unlike the world 20 years ago, where you pretty much had to be part of some very large global technology company, or be people who split off from a very large technology company and formed your own company, because that's what it took to build an enterprise-class database system, now we're entering this perfect storm: the open source business model actually working, more formal structures for venture capital investment, the advent of online collaboration tools and communities, and sort of a generational shift toward building things for the common good.
All of those things have come together, and now there are literally thousands of database systems that one could look at and one could support. That's wonderful from an innovation point of view. I think if someone had wanted to build commercial products out of these NoSQL underlying data models 20 years ago, we would not be sitting here looking at thousands of them. We'd probably be looking at five. The reason is that some of these NoSQL ideas go back well beyond the advent of the web and the Internet; it just took all of that coming together for the business model to work for all of these. I think what the data modeling tool vendors are working on now is trying to figure out which ones to spend the time to support, which ones make the most sense for the current underlying architecture of their tools, as well as which ones are going to be the fastest to do. I think the ones that either already have relational features, or where someone has built a relational query language on top, will be first. What about data warehouse appliance tools like Netezza? Yes. So there are all these flavors of not-core-relational tools that have been around; Netezza's one. You know, they might be quasi-relational: Teradata, and I mentioned the AS/400. There are all of these flavors of things that I really would have liked to have done the research on and investigated today. But as it stands, putting this high-level survey together, and it's going to be ongoing work for me, is just trying to do this. The other issue is that it's hard for your average data chick to get hands-on with some of these appliance technologies to do this sort of testing. So a lot of the testing and research I do is either because I've downloaded and installed something on a VM on my machine, or because I've been able to spin something up in Microsoft Azure to play with it and connect with it. But all of that takes, you know, a lot of time and sometimes a lot of dollars.
So I'd love to go do some more investigation into the appliance products, even the ones from Oracle and from Microsoft, but that's a very difficult thing to get access to. But great question. So let's look at some tools. So I did some talking about these things and, did I just get dumped out? I think I did. No, you're here. Okay, good. So let's see. I have the sample model open in ERwin just because it's nice to show that. So these are the native connections in ERwin: DB2, Teradata, MySQL. Generic ODBC in ERwin, that's going to be your friend. We're not really seeing a lot of diversity there right now. The key for ERwin is going to be the Meta Integration bridge. And you can see the first one at the top is Hadoop Hive. And you'll see all kinds of modeling tool things here. There's more Hive with Cloudera Impala, all through Hive; DataStax through Hive. You know, these are sort of the edges of NoSQL. The one that stands out there is Google BigQuery, which is a sort of column-based NoSQL option in the cloud. There's more Hadoop stuff, more data modeling tools, SQL Server and Oracle stuff, some SAP stuff, but not HANA or its modeling tools. So basically what you're getting out of this extension to ERwin is that import-export ability I said was probably going to help you a bit, but not going to support a lot of the round-trip engineering. So if we go over to PowerDesigner, these are the DBMSs that I can choose. So dBase, raise your hand if you're that experienced. There's some AS/400 and Greenplum and Hadoop Hive and some regular stuff. There's your Netezza stuff; ODBC, going to be your friend; and HANA. And then some of these other things: there's also Teradata, and others are NoSQL-ish. If I go up to, yeah, so that's the same thing.
So one of the features PowerDesigner has that the other tools don't is the ability, through its extensions and its support for adding properties and features, to basically build your own underlying metamodels. You can build your own or enhance an existing one. That gives you a lot of extensibility, perhaps, for supporting some of the more SQL-like NoSQL features, so you might be ahead of the game there. I know ERwin has the ability to change the forward engineering templates to do something similar, and there are those features as well.

And then if we go to ER/Studio, there's my ER/Studio Mongo database that I got that graphic from. Basically, it's going to have almost the same things, because it's also using the Meta Integration bridge. Now, individual products get to choose which pieces of the full-blown Meta Integration product are going to show in this dropdown. But we're basically seeing the same sorts of things: Hives and some modeling tool entries. More Hives again. And also Google BigQuery; it's almost exactly the same, which would make sense. Lots of SQL Server things, and Oracle, and SAP.

So while the import-export features of these tools are not going to give you that round-trip engineering easily, I think it's a good start: at least a good data modeler can point their existing tools at some of the databases being used on the projects that people think modelers can't be part of, because "we only do ERDs." You can see right here, I have a JSON document, basically through a MongoDB, that I can report on, that I can do complete compares on, that I can generate. All of that, to me, is very exciting stuff. So we've looked at the three tools.
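To make that reverse-engineering idea concrete, here is a minimal sketch in Python of what pulling a JSON document out of a MongoDB collection and turning it into entity-like metadata might look like. This is a hypothetical illustration, not how ERwin, ER/Studio, or the Meta Integration bridge actually work: it just flattens one sample document into dotted field paths with inferred types, the raw material a modeler could then enrich.

```python
def infer_fields(doc, prefix=""):
    """Flatten a JSON-style document into dotted field paths with inferred types."""
    fields = {}
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            # Nested document: recurse, building a dotted path
            fields.update(infer_fields(value, path + "."))
        elif isinstance(value, list):
            # Arrays are candidates for child entities in a relational model
            fields[path] = "array"
        else:
            fields[path] = type(value).__name__
    return fields

# A sample MongoDB-style order document (hypothetical data)
order = {
    "_id": "ord-1001",
    "customer": {"name": "Acme", "state": "OH"},
    "lines": [{"sku": "A1", "qty": 2}],
    "total": 41.5,
}

print(infer_fields(order))
```

Running this prints a field-to-type map such as `{'_id': 'str', 'customer.name': 'str', ...}`, which is roughly the kind of structure an import bridge hands to a modeling tool for diagramming and comparison.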
I did talk about doing hacks through macros, through the APIs, through extending whatever is available in your tool sets, whether that's the forward engineering templates or being able to actually customize your own metamodel underneath. All of those are wonderful things if you're trying to support a technology that your tool doesn't natively support, but what's wonderful about where we're going in the modern data architecture world is that eventually the vendors will start adding native support for these.

Now, the other option for a professional data modeler is to find out if there's a modeling tool from the open source community for that product that will let you do modeling. That's nice. A lot of the issue I run into with those is that the definition of data modeling there is really reverse-engineered diagrams. Still very valuable; it's very valuable to have a diagram of something in your database. The key is that that, of course, is not what we think of as data modeling. You can't add metadata. You can't add "here are the privacy requirements about this," or "here are the restrictions on this," or "here is why we won't find customers from this state before 1984: it's because we bought the rights to those customers when we bought a company in that state." All of that rich metadata and understanding of the data stories can't get documented or written or shared in something that's just a diagramming tool.

I'm going to check for more questions. "Please comment on storing metadata for NoSQL." I think I just did; that was good timing. "You're correct, there are so many data store choices today. It's overwhelming." Yeah. So the story I told (I gave a related talk in the Midwest last week) is this great thing about having 1,000 databases, what I call 100,000 vendors, to work with now, because of course Hadoop isn't one thing. It's a framework of things.
But each of those frameworks of things has many vendors distributing versions of them. This is what's so exciting. Back when I had to support Oracle, DB2, and SQL Server, I was so frustrated because, for those three things (three and a half with Sybase), I couldn't keep track of what the name of a feature was in each of the tools, which version supported which features, and what the name lengths could be. That was so hard, and that was only a handful of tools. Now we've got these thousands of tools with 100,000 vendors, because every open source contributor is actually another vendor if you think about it, and that just boggles my mind. So now it's not just that I can't keep track of which features are in which tool; I can't even keep track of what the differences are between the tools. In my NoSQL naming rant, which is recorded and available on dataversity.net, I make fun of all the names. The reason is we've got so many products and so many frameworks that we have to come up with increasingly odd names. I cannot remember what the difference is between Spark and Shark and snark; well, I kind of do. But we've run out of names, and therefore I don't even know how to keep track of what features are in each of these.

Next question: what about ETL tools as modeling tools, to generate and convert database structures? You know, that's kind of next on my list. I want to see how tools support things like data lineage, and the underlying models in, say, the Microsoft stack: the tabular models and the cubes and all of the data models that are buried in those tools. "But, you know, Safyr is preparing to modify their tool for HANA, because Safyr provides the bridge between modeling tools and SAP and other ERPs." I didn't know that. That's a great question, and maybe someday we can talk about it. I find Safyr a really unique tool set that sits in a unique space. It would be interesting to do a webinar about that as well.
And from the chat: I'm not familiar with that tool, but thanks for sharing the link. And someone says, "I've been told that developers in my company will soon need to learn how to model for NoSQL: MongoDB being one, Cassandra, and possibly VoltDB. Any recommendation on how to?" Well, that's going to be our next section, so that might be a good segue.

So, to summarize what we've talked about so far: the more SQL-like features that are available in a NoSQL database, the more likely a modeling tool is to support it. That just makes sense; it's a great fit. For a data modeling tool vendor, it's almost just like supporting another relational database. Those vendors are going to support the features, I used to say, that users ask for; really, they're going to support the features that win them deals, but that's not a bad thing. Like I said, there are 100,000 vendors to work with and 1,000 different databases to pursue. They chase the enterprise market, because let's face it: three people sitting down to build the next big Facebook probably aren't buying enterprise-class data modeling tools. Even the most popular NoSQL databases that are free and open source aren't all supported by all the data modeling tool vendors, because people who don't want to pay the fees for enterprise-class databases generally don't want to pay the licensing costs for enterprise-class data modeling tools either. So you need to ask your tool vendors: hey, we've got MongoDB, do you support that? Hey, we're starting a Cassandra project, how does your tool support that? Those are questions you should be asking, not just of tech support, but of the salesperson from your tool vendor. All in all, they're going to support the features that help them win sales deals. This is how the commercial enterprise software market works, and we need to be part of that. The serious NoSQL vendors are going to understand that hybrid is the enterprise data storage story.
By hybrid, remember, I mean SQL and NoSQL databases together. They want to help us find a way to continue to love all the data in our enterprise, not just the data they support now. Notice I put a little star on "serious NoSQL vendors." There are still some crazy vendors out there in the NoSQL world, and remember I'm using the term vendors to include even the open source people: developers, and not just technical developers, the whole thing. There are still some people preaching that relational is dead, that it needs to go away, that it has always failed. They probably will not be selling to the enterprise-class market, and therefore probably won't be an issue for enterprise-class data modeling tools.

Our data models continue to have value, even the physical ones. Even your DB2 model, your SQL Server model, still has value, even if the NoSQL solution you're deploying doesn't require a lot of constraints. One of the reasons the NoSQL tools can give up consistency and integrity, and not worry about all those constraining constraints, is that they are solutions for the consumption of data. They're on the analytics side, they're on the reporting side; they're optimized for reading data, not writing data. That's perfect. That's the same story we have in our data warehouse world: optimized for reading data. We still use transactional systems and relational systems; relational still has a wonderful place in the enterprise. Our models have value because the people consuming that data still need to understand some of the tougher questions about the data they're hosting. Maybe not the people managing that NoSQL database, but the people who have to make use of it. Just because the data has come out of our transactional systems, gone through a process, and is now sitting in Hadoop on some file store, our users and our reporting and analytical systems are still going to need to understand what that data is, what its provenance is, and how it can be used.
So, some places you can go to start having these discussions. Embarcadero has community.embarcadero.com. CA has erwin.com for asking these questions and having these discussions, and there's also a knowledge base. SAP has a community for PowerDesigner. There are also still the old newsgroups around, if you're very experienced like me and like to play there.

I recommend Dan and Ann's book, Making Sense of NoSQL, because it focuses on all these differences and it's very approachable and consumable. If you want to know about graph databases, the folks from Neo4j have written the O'Reilly book Graph Databases; you can download the ebook version for free, the second edition just came out, and I find it really helpful. Steve Hoberman, who is very active in the traditional data modeling world as an author, speaker, and publisher, has written Data Modeling for MongoDB, so I recommend you approach that. And my favorite way to get hands-on experience with a variety of these is Seven Databases in Seven Weeks, which covers both NoSQL and SQL DBMSs. I've even based a presentation on it that I sometimes do with a co-presenter, Lara Rubbelke, called Seven Databases in 70 Minutes, where we cover some of them and try to talk about all of those things in 70 minutes.

Also coming up at NoSQL Now: Dataversity, who hosts this webinar, also hosts NoSQL Now. I'll be presenting on graph and relational databases along with my friend Joey, and we're also presenting on DevOps in a polyglot world. For some reason my picture is not part of this slide; eventually it will be consistent.
The reason I want you to go learn these things is that, now that we have those thousand databases to choose from, I think data modelers and data architects need to be involved even earlier in the project. We've always been asking to be involved early, but now it's very important, because these technologies are sometimes being chosen without an understanding of why we might want some constraints, or why our data workflow may or may not fit. So it's good to have a data professional be part of those architectural and infrastructure decisions, because every design decision should include cost, benefit, and risk.

So, seven reasons to go explore. I think it's fun. As much as I love my ERDs and my modeling, I've modeled enough invoice line items over my life to never really want to do that again, so seeing how people have optimized new technology for really making things perform fast is fun. And these aren't yes/no decisions; we need to know them all. Unlike before, where if I wanted to learn DB2 20 years ago I had to have a mainframe and access to it, now I can spin up a VM in about 15 minutes in Azure or Amazon and be hands-on playing with it, with all kinds of tutorials out there. I think a data professional needs to know more than one tool, and I don't just mean data modeling tool; I mean database. Using the right tool for the job is important, and it's also a lot of fun. Where I go is Azure, and if your shop has MSDN subscription benefits, there are trials. The same thing's true for Google and Amazon: there are inexpensive or sometimes free trials. I'm just most familiar with Microsoft Azure.

And that's all I had for here. Let's see the last little bit of questions. Yeah: "NoSQL solutions still require some schema structure that needs to be documented." Absolutely. There's no such thing as schemaless.
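That "no such thing as schemaless" point can be sketched in a few lines of Python. This is a hypothetical illustration with made-up records and field names: the store happily accepts documents of any shape, and it's each consumer that imposes a schema when it reads, which is exactly schema on read.

```python
# A raw store accepts any shape of record ("poly-schematic", hypothetical data)
raw = [
    {"id": 1, "name": "Ada", "state": "OH"},
    {"id": 2, "name": "Grace"},                  # missing state
    {"id": 3, "full_name": "Alan", "st": "NY"},  # different field names
]

def read_with_schema(records, mapping, default=None):
    """Schema on read: a consumer maps raw fields onto its own column names."""
    rows = []
    for rec in records:
        # For each output column, take the first candidate field present
        row = {col: next((rec[f] for f in fields if f in rec), default)
               for col, fields in mapping.items()}
        rows.append(row)
    return rows

# Two consumers apply two different schemas to the same raw data
names = read_with_schema(raw, {"name": ["name", "full_name"]})
states = read_with_schema(raw, {"state": ["state", "st"]}, default="UNKNOWN")
print(names)
print(states)
```

The schema didn't disappear; it moved from the database into every reader, and somebody still has to write down what those fields mean, which is why the models still matter.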
What we have a lot of the time is multiple schemas, so "poly-schematic" is the invented word I like to use. "Schemaless" is just a meaningless term, to go along with "NoSQL," which really has no meaning either. It's either schema on read, or it's the fact that someone still needs to write down what that data was. "Excellent book recommendations." Yes, the NoSQL book is by Dan McCreary and Ann Kelley. My picture isn't there because it's compressed; that's great. Yes, learning all these things is fun. So I think we're right at time. Oh yeah, one minute over. Anyway, what we do is turn off the recording now, but I can stick around for a few more minutes before I head to the airport. Let me turn it off for you here. I'm going to turn it off for you here and...