All righty, well, let's get started here. Hello and welcome. My name is Shannon Kemp, and I'm the executive editor for DATAVERSITY, and we're broadcasting live today from the NoSQL Now Conference and Expo. We would like to thank you for joining today's DATAVERSITY webinar, Big Challenges in Data Modeling: Modeling Unstructured Data and Schema Design, sponsored today by Couchbase, CA Technologies, makers of ERwin, and Sandhill Consultants. And the series is moderated by our esteemed moderator, Karen Lopez. So just a couple of points to get us started. There's a large number of people attending these sessions, so you will be muted during the webinar. If you have questions, we will be collecting them via the Q&A section in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag BCDModeling, Big Challenges in Data Modeling. As always, we will send a follow-up email within two business days containing links to the recording of this session and any additional information requested throughout the webinar. And now I'm pleased to introduce the moderator for this webinar series, Karen Lopez. Karen is a senior project manager and architect at InfoAdvisors. She specializes in the practical application of data management principles. Karen is a frequent speaker and blogger on professional data issues. She is a Microsoft SQL Server MVP specializing in data modeling and database design. She's an advisor to the DAMA International board and a member of the advisory board of Zachman International. And Karen wants you to love your data. Joining Karen this morning are our three expert panelists. First is Dipti Borkar, the Director of Product Management at Couchbase, where she is responsible for the company's flagship product, Couchbase Server, and works with customers and users to understand emerging requirements for low-latency, scalable data stores.
Dipti has deep technical experience in the database industry, having worked at IBM as a software engineer and development manager for the DB2 server team and then at MarkLogic as a senior product manager. Next is Alex Peek. Alex is passionate about extracting value from data and has been for over 20 years. In the early days, it was with startups, several of them that he has owned. He has also worked with larger companies like Iron Mountain, PayPal, and Intuit. He has designed and implemented systems for managing data and analyzing it. In the relational world, he has worked with OLTP systems like Oracle and VoltDB, and with data warehouse systems like Vertica. In the non-relational world, he has worked with Hadoop ecosystems, NoSQL databases like Cassandra, HBase, and MongoDB, and streaming solutions. He has built systems for marketing segmentation and used machine learning methodologies with tools like R and MATLAB. And last but certainly not least, I'm pleased to welcome Hamaslyn Hayes. Ham is a senior consultant for Sandhill Consultants. Over time, Ham has led much of the evolution of the CA ERwin product suite and its supporting education courses. He has provided his extensive expertise in information, process, and enterprise modeling to numerous major North American corporations and government agencies. Ham has authored articles and delivered presentations to industry groups on enterprise modeling and its role in improving performance. The focus of his consulting and teaching has helped enterprises bridge the space between technical modeling and business success. He has also researched modeling using ERwin products to model nonlinear social interactions. So please welcome Karen and our panelists. And with that, I will turn it over to Karen to get us started. Hello, everyone. Welcome. Thanks, Shannon. It's so great to be sitting across from you here in the room for once.
And we still have two more panelists on the phone. And I wanted to warn my panelists on the phone that we might just sit here in the room and totally forget about you, so you might have to edge in to get your comments in. I also wanted to thank Couchbase as a new sponsor for this panel. That makes me really happy. And both CA Technologies and Sandhill Consultants as previous and existing sponsors. It's really important that we have sponsorship from the vendor community for our community of data professionals. One of the things it says to me is that these vendors are interested in making sure that you have access to people who have great opinions and expertise in these areas. And the way they show that they care about you is by sponsoring these things. We couldn't do it without you. And of course, Tony and Shannon, our fine friends at DATAVERSITY, because they make this happen. But I also wanted to thank you, the audience, because I consider you panelists on this as well: you have a chat available to you where you can also give your insights and opinions, and we'll try to watch those and multitask, I'm sorry, multi-thread, and do all this. And if you have a formal question for the panel, please do post it in the Q&A, and don't wait for the end. We're going to try to grab some of those questions as we go, and also your comments. As Shannon mentioned, we are on Twitter, and we'll try to watch Twitter if you use that BCDModeling hashtag to see what's going on there. That's also how you, the audience, help be a panelist, by sharing the insights you're hearing here with the rest of the Twitterverse. Today's topic is NoSQL, schema-less design, modeling for unstructured data. When does it happen? Does it happen? And one of the reasons why we have this topic this week is because we're here in San Jose at the NoSQL Now conference.
And this is the last day of the conference, so I've been absorbing a lot of stuff and updating my questions as I go. In fact, there's a session on modeling going on right now that I'd like to be in. So there's a lot of modeling chat going on this year as opposed to previous years. But what I think is important is that most of you in the audience, I'm guessing, are data architects or data professionals. But are you a DB2 data architect or a SQL Server database designer? Not really. If you're a data professional and those platforms change, I think it's important that we as data people have a good understanding of all the places where data goes to sleep at night. The technical term is persistence, but I like to think that data is data no matter where it is, and that we have a professional responsibility to understand not just how to persist data in traditional relational technologies, but to understand how it's persisted in these non-relational ones. Also, we have a great set of domain knowledge that we can lend to data as it's being consumed. In most of the schema-less or NoSQL design platforms, those modes of persisting data, there's not a lot of modeling done up front, although we're going to talk about that. But understanding what data is, what makes for a valid instance of that data, what makes for good data, what makes for probably wrong data, what makes for valid ways of using data together, that's something that I think data architects have a lot to contribute to. It's just that we're used to providing that expertise as the database is being built instead of as the data is being used, consumed, shared, and analyzed. I also think that if we continue to ignore NoSQL or non-relational platforms, we become less and less relevant to our companies and to our companies' customers.
I also think there are a lot of myths and mysteries and questions about the roles of data architects and database designers in NoSQL projects, so I want to make sure we talk about those. So having said that, let me jump over to the questions. Which of my panelists would like to, in a sound bite, explain what we mean by NoSQL? I'll take a shot at that, Karen. Coming from the relational space, moving from DB2 to NoSQL, I've learned a lot along the way. The way we look at NoSQL, it is the next generation of databases that allows you to store data not in rows and columns, as I would put it, something that does not have to live in a relational format, but that can be a document. It could be stored in a slightly different format, maybe a column family, or maybe even in a graph structure. And so NoSQL, in general, means many things to many people, but the way we really think of it is that it is the next-generation database for semi-structured, unstructured, poly-structured data that really doesn't fit well in rows and columns. Alex, did you want to say something as well? Yeah, I was going to give a slightly different viewpoint. So I agree with what Dipti has said, but I'd also say that the emergence of a lot of these NoSQL databases has been around simplifying the model of the relational database in order to gain different optimizations. So in many of the NoSQL databases, there are design decisions taken that make certain aspects of them more optimal for particular use cases, but in doing so, they give up the generality of the traditional relational database. So that's not in contrast to what Dipti said, but in addition: it's really about taking a different set of trade-offs, giving up the generality for concepts like availability, performance, reliability, and so on. I'm going to contribute, too. This is Ham.
I see it almost at a philosophical level as well. We've been dealing with well-structured data design using relational principles for several decades now, and it's quite a mature area. But we've also seen, with the advent of the internet, tremendous growth in both the quantity of data as well as its accessibility, as Alex pointed out. One of the realities we're now dealing with is that we really have to take a fresh look at how we design and manage our data systems. NoSQL to me doesn't mean no structure. It means that we have to account for new requirements in our overall data management. And it's evolving. I think we're on the path of breaking through to some new thought forms and some new approaches. It's quite exciting. I'll jump right in and say that one of the contentious issues is just the term NoSQL in general, in that it originally came out as really sort of an attack on relational databases. It was no SQL: we don't use SQL anymore. And there are some really valid reasons why a solution should be done in a relational database, and very valid reasons why it just makes sense to store data in other platforms beyond relational. And for someone like me who's been building relational systems for decades now, and that just means I'm experienced, not old, it's hard to think outside of the SQL box, because that's where I live every day. But I definitely see all these valid uses. Still, I'm hearing, not just at this conference but at other conferences, this SQL versus NoSQL thing tossed around as if it's an either-or and SQL is broken. I'll hear things, not really from the speakers or the experts, but as sort of a myth around the development community or the engineering community, that SQL doesn't work.
So one of the things I've been tweeting at this conference is these throwaway "facts" that, you know, once a SQL database hits 50 gigs, it just falls down and can't do anything. And we know that's not true. We have lots of valid implementations of very large multi-terabyte systems, even petabyte systems, where the data is persisted in traditional RDBMSs. But what's the best way for a data architect to participate in the conversation about what's the right tool to be used, which is the new definition of NoSQL, not only SQL? What's the best way for a solutions architect, an enterprise architect, and even a data architect to participate in that discussion? Well, the underlying element is just to understand the strengths of each and to be able to explain the strengths of each. You know, the relational database still has a good place in the world. I worked at PayPal and eBay for a while, and that's one of the largest Teradata installations in the world. Many, many petabytes, you know, double-digit petabytes. So there's nothing about relational that says it can't scale. Size is not the issue. There are a number of issues which do differentiate them, and as long as the data architect can explain those differences, then you see that it's quite a harmonious world: we've just added more tools to our toolkit. I mean, as I mentioned earlier, just to pick an example like Cassandra, because we use that here at Intuit fairly extensively: using a database like Cassandra, you get great economics for performing certain kinds of tasks. So we can get very large data, very low latency, high availability, great performance, as long as the task fits what Cassandra is good at. And if you step outside of that, then suddenly it's not magical anymore.
So in explaining this, you know, all the worlds coexist, and all we've done is add more tools to the toolkit. It's just about understanding the strengths of each. Yeah, I'd validate that with understanding what the requirements are for your particular environment. It's just not an either-or in our minds. As Alex is saying, it's a set of tools to tackle an expanding set of requirements with regard to data design and data management. So it is something that I think we do embrace as: let's look at the technology, let's look at the architecture, maybe that's an oxymoron, the architecture of NoSQL. I agree, I think there is architecture to NoSQL. So let's look at that and embrace it and weigh it against the requirements we're dealing with as architects in our particular environment. Go ahead. As I mentioned, the same companies that have these large installations of relational databases, particularly on the data warehouse side, are the same companies that are looking at NoSQL databases. They're the same companies that we are talking to about next-generation applications. And so the one thing we see is, we see NoSQL as something that fits alongside: it probably sits next to a relational database on the OLTP side. We see a lot more interactive applications or OLTP-like applications that are using NoSQL databases like Couchbase and MongoDB. There's obviously Hadoop on the analytics side, but there are obviously relational databases that will continue to be used for certain use cases. And so I think it's extremely use-case-driven. We see some migrations in cases of interactive web-scale applications, where if you want to reach out to a user base of 10 million users or 100 million users, perhaps the performance and the scale that you need may not be something that your relational database can provide. And that's one kind of use case. The second use case is completely different applications.
Applications where the data that's actually being managed is semi-structured. It's object-oriented. And so when you are building a data aggregation platform, for example for social data, let's say you're collecting Twitter feeds, Facebook feeds, LinkedIn information, all this is so varied and diverse in terms of the data model. You need a system that is flexible and that can handle it. And that's actually where data modeling fits into NoSQL. Some people might think that if it's schema-less, if it's semi-structured, you don't really need to worry about data modeling. But data modeling is actually a very important aspect of NoSQL. And the first step that we advise people and developers to think about is how you design your documents: what are the kinds of objects that you would create and represent these documents with? And then where do you go from there? What's the next step from document modeling to staging and then deployment? So I'm glad that you made that segue there, because that's also one of our questions. For most of us, when we think about data modeling, logical and physical data modeling, we're thinking ERDs, traditional modeling tools. And one of the questions that comes up is whether there are going to be sufficient notations and tools for doing this. The data modeling I'm talking about is not the physical file structure on disk, which is what traditional physical data modeling gets closer to by designing tables and columns and rows and indexes and all that stuff. So where does modeling fit in a NoSQL project? I'd like to take that one if I could, because I've given a few talks on the topic of modeling in the NoSQL world, and I use examples like Cassandra, the column family idea, and document databases like Mongo and Couchbase.
And I think the fundamental difference between modeling in the relational world and modeling in the NoSQL world, at least the subset I mentioned, is that in the relational world you are modeling the data, with the idea that I don't a priori know all the uses that I need to make of that data. I'm modeling in a way that's application-agnostic. I can use it for new applications at a later time. In the NoSQL world, the modeling is basically very application-specific, and what you're really doing, to all intents and purposes, is modeling the answer to a query. So for example, as you're modeling in the Cassandra world, what you have to start with is what questions are going to be asked, and arrange your data such that it answers the question, which is rather antithetical to the way that it's done in the relational world. So one of the fundamental differences is that NoSQL databases are typically application-specific, while the whole concept of the relational database, going back to Codd's great ideas, was all about being agnostic: I can use the data in ways I hadn't previously anticipated. It's a slightly different perspective on that. I think it is application-specific in that you need to think about what are the objects, the entities you're modeling. So Karen, you mentioned it's further away from the database structure on disk. I completely agree with that. We look at it as more of a logical modeling exercise, where it's closer to the entities or the objects you're trying to represent. And then based on that, you create structures or rough schemas for those documents, what they might look like. And of course, this doesn't have to be fixed, because every document could look different. You can have different attributes. That's obviously the power of NoSQL. But that's where you would start off.
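As an editor's aside, the "model the answer to a query" idea Alex describes can be sketched in a few lines of Python. Everything here is invented for illustration (the customer and order data, the key format); it is not code from any panelist's system, just a contrast between the two modeling habits discussed.

```python
# Relational habit: normalize first, join at query time.
customers = {1: {"name": "Ada"}}
orders = [{"order_id": 10, "customer_id": 1, "total": 25.0},
          {"order_id": 11, "customer_id": 1, "total": 40.0}]

# Query-first habit: decide the question up front ("show a customer with
# their recent orders") and store the pre-joined answer as one document.
customer_doc = {
    "_id": "customer::1",
    "name": "Ada",
    "recent_orders": [
        {"order_id": 10, "total": 25.0},
        {"order_id": 11, "total": 40.0},
    ],
}

db = {customer_doc["_id"]: customer_doc}  # stand-in for a document store

def get_customer_with_orders(store, key):
    # The application reads the stored answer in one key lookup; no join.
    return store[key]

print(get_customer_with_orders(db, "customer::1")["recent_orders"])
```

The shape of `customer_doc` is driven entirely by the question being asked, which is the reversal Alex is pointing at: a new question would mean a new document shape, not a new join.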
And so, in some sense, we think of it as closer to an object-mapping kind of exercise, where it is about completely abstracting the underlying database or persistence tier from what the application developer or the data architect is building, and just thinking about it logically in terms of entities, representing those in terms of documents, and then, as data changes over time, those documents might evolve. That's where we need more tools, and where the existing tools that we have need to be either extended or new tools need to be built, to be able to see how these schemas are evolving over time. The big issue, though, is that when you design schemas in the NoSQL world, you have to design them around your access patterns, because you don't have the concept of joins. So you have to pre-join. And so some new join requirement means creating some new table. And there's a similar issue with aggregations. You typically have to pre-aggregate, because you don't have effective aggregation mechanisms in NoSQL databases either. So really what you have to do is start from what are my queries, and model the answers to those queries. It's not an arbitrary model of what data I want to put in. You've got to think about how I am going to ask for the data out, and that will drive the shape of the model far more than what I want to store. And in fact, one of the things that you notice with NoSQL databases is that, because of the different patterns by which you query the data, you're going to have to have duplicate copies of it. And it's typically up to you, the user, to do that duplication. The database is not going to do that for you. It's ironic that we really have a multiverse here in terms of NoSQL: which architecture do we really have? There are multiple architectures involved. We're also dealing with a finding problem.
You know, it's easy to take data and to file it away, and we can file it away in any way, with a variety of schema structures. But what Alex is really pointing out is how do we find the relevant information that we need when it comes time to actually use what we've stored away. And I think that's a very, very large challenge for any kind of schema. This is more architectural than it is implementation. I think NoSQL derived from the need to handle larger quantities and volumes of information and the unstructured aspects of information. Now it's coming back to: we still have to be able to define that structure, that architecture, and gee, we just have queries now built into our architecture, rather than coming about as a result of a relational design. So I agree with what everyone has said here. And this reminds me, as a modeler, of really more how we data model for data warehouse and business intelligence systems, where we've also made similar changes to how we persist data. In a data warehouse design, we aggregate data, we optimize the design of the relational database to maximize performance, and we don't know all the questions that are going to be asked, but we try to guess them, or at least support the ones we know we're going to get. So it reminds me more of that type of modeling. So I'm wondering if a data warehouse dimensional modeler might be happier modeling for these NoSQL databases than, say, someone like myself who tends to live in transactional processing and data integrity constraints: I don't want any flexibility. Not entirely, but I think it's somewhat true. When you look at a star schema and you see your fact tables and your dimensions on the sides, you would think that, and that kind of schema was designed for a data warehouse.
In a similar sense, the document modeling or the column family modeling that you do for different types of NoSQL databases is similar: you think about what are the objects, what are the pieces of data that really fit together. So you can think of it as being denormalized, and data warehouses are denormalized in some sense. And I think that a data warehouse architect might find it easier to conceptualize this compared with an OLTP architect, where you're really just looking at tables, rows, columns, orders, customers, users, and you normalize as much as you can. So in that I would agree. I think it's a better or closer way to think of it. And in some sense, given the size of data that NoSQL databases store, you could think of them as data warehouses, because many of the use cases that customers are building are aggregation platforms. In some sense, it's a lighter-weight version of a data warehouse. Of course, you have Hadoop and the big data systems on the side that do a lot more number crunching and batch processing. On the OLTP versus OLAP side, we actually kind of support both of these use cases, and we're starting to see a lot larger data sets than your typical OLTP system would have. I think the differences are greater than perhaps we might first think. If you think about a typical star schema, you're still in the realms of second normal form, and you're still in the realms of joining data. You're joining the dimensions to the fact table, and you're still in the realm of, when I update a dimension, I have relatively few rows to update. Whereas if you try and do that in, say, a column family or document database, you have to pre-join, because you don't have the concept of joins, and you have to duplicate all the data. So every row has to contain the related value of every dimension. So there's huge duplication.
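The maintenance cost of that duplication can be made concrete with a small, purely hypothetical Python sketch: every "fact" document embeds a full copy of its dimension values, so a change to one dimension value must be written to every copy by the application. The names (sales, regions) are invented for illustration.

```python
# Five pre-joined fact documents, each embedding the "West" region dimension.
facts = [
    {"sale_id": i, "amount": 10.0, "region": {"code": "W", "name": "West"}}
    for i in range(5)
]

def rename_region(docs, code, new_name):
    # In a star schema, renaming a region is one dimension-row update.
    # Here the application must rewrite every document that embeds it.
    touched = 0
    for doc in docs:
        if doc["region"]["code"] == code:
            doc["region"]["name"] = new_name
            touched += 1
    return touched

print(rename_region(facts, "W", "Western US"))  # touches all 5 documents
```

With five documents this is trivial; with billions of pre-joined rows it becomes the integrity burden the panel describes, pushed from the database onto the programmer.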
So I don't really think that NoSQL databases are very good at all at data warehousing, and you can even go and look at all the companies that have huge data, and you'll find that what they use for data warehousing are more traditional relational star schema kinds of data warehouses. I think there's a real sweet spot for NoSQL databases, but I do not think that data warehousing is it at all, and I don't think their models are similar at all to star schemas. The other aspect of NoSQL that's an issue in my mind has to do with the maintainability of the data. With the extensive amount of duplication, there really is the challenge of maintaining any necessary changes in the data consistently across all of those duplications. In the data warehouse, that's an issue, but it's still a manageable issue because of things like dimensional modeling. Yeah. That's one of the things about NoSQL databases: because we denormalize, because we've duplicated, maintaining integrity is an exercise left to the programmer, and we know how good they are. And I think one of the big differences, and correct me if I'm wrong, is that I see the use cases for NoSQL, just like in data warehousing, as not being about updating data. Even though some of the NoSQL databases are moving towards ACID or other ways of being more transactional, I just think that the discussion point, the decision point, is based on why you are building this data store, and I'm just going to be generic that way. It really is more about optimizing for the use, and mining is probably the wrong word because it means something to a lot of people, but just the use and consumption of data, and less about the transactional data. So if I have a billion-row table, it's probably equally likely that I'm going to have updates to some fact in a transactional system as to increase the data in that billion-row table. So it's both, and I might need to model it in a way that's more flexible or generalized than I have in the past.
All things that might have to do with me moving towards NoSQL. But all of these decisions are cost-benefit and risk decisions that have to do with finding the right tool for the right job. So when I say it's like data warehousing, I don't mean that NoSQL is the only way to do data warehousing, or that data warehousing shouldn't be done for the transactional things. I'm just thinking the thought process and the architecture process is closer to that. I think there is a real sweet spot for NoSQL. And that's apart from the issues that we talked about of availability, so they've done some wonderful things for availability, linear scalability at reasonable cost, but where they've really seemed to shine is a very high ingest rate. And they do that rather cleverly by not trying to update: what they actually do is an append-only model, and that is a good architectural model for very high ingest rates. And then similarly, because the data is already preformed to answer questions, they can have very low-latency, very high-throughput ways to answer those questions. So just to give you an example, one of the ways we use Cassandra here at Intuit is for customer profiles. As a user arrives at an application, we want to quickly look up their profile in terms of preferences, and a database like Cassandra is just wonderful for that, because we can handle an absolutely huge number of queries of the form: given this user, look up their profile by key, and here's a complex document which describes their profile. It can do that very, very fast, very reliably, which makes it a perfect use for a NoSQL database. So that's a real sweet spot, but I don't think analytics is a sweet spot. Yes, and what we see with most of the customers that I talk to, Concur expense reporting, for example, or Orbitz, is that a lot of their use cases are, I call them OLTP-ish, because there is lightweight transactionality, it's massive scale, and there is consistency.
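The profile-lookup pattern Alex describes reduces to a single key read on the request path. Here is a minimal, hypothetical Python sketch of that shape; the user IDs, key format, and preference fields are all invented, and a plain dict stands in for the key-value store.

```python
profiles = {}  # stand-in for a key-value / document store

def save_profile(user_id, profile):
    # The whole profile is one complex document under one key.
    profiles[f"user::{user_id}"] = profile

def load_profile(user_id):
    # One key lookup on the hot path: no joins, no scans, so latency
    # stays flat as the number of users grows.
    return profiles.get(f"user::{user_id}")

save_profile(42, {"name": "Ada", "prefs": {"lang": "en", "theme": "dark"}})
print(load_profile(42)["prefs"]["lang"])
```

The design choice is the same one the panel keeps circling: the document is shaped for exactly one question ("give me this user's profile"), which is what buys the low latency.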
I think a lot of these use cases need strong consistency. There are different databases that support different consistency models, but user profile stores like Alex mentioned, session stores, metadata stores for large pieces of content, these are some of the sweet spots for NoSQL databases, particularly where key-value access is part of the access pattern. And in some cases, like Couchbase, you have a built-in cache that allows you to consolidate the caching tier along with the database, and that's what gives you the low latency as well as the high performance. Some of the systems that customers deploy are running hundreds of thousands of operations per second. It would be very difficult to tune a relational database for that use case, particularly for write scaling. Write scaling is a hard problem in a relational database. Oracle has systems like RAC, for example, where you could scale, but that, again, is based on shared storage. True shared-nothing architectures really give you some of the scalability and the availability benefits that are harder to achieve with relational databases. But they are possible, aren't they? Let's consider VoltDB, for example. That is a true relational database: shared nothing, scale out, 3 million transactions per second, and how many of us need more than that? So let's not say relational databases cannot do this. Let's say that there are very good reasons to use NoSQL databases, and economics is one of those reasons. So having heard all of this, how do you feel data modelers and data architects should participate in projects where these types of contentious discussions happen and try to define where's the right place to let data sleep at night? Well, the first aspect of this is really understanding the business requirements. Ultimately, at the end of the day, every customer that I talk to has some direct tie with what the business needs are.
So if you're looking at an organization that has lots of transactional requirements, lots of security requirements, data integrity, things of that nature, a data architect dealing with change is going to need to understand those requirements and also what the capabilities are of the various systems. We've got one customer who's doing agile development for their applications, and they've got a traditional relational database architecture. And right now they're having an interesting discussion between the need for rapid response to changing business requirements, which is represented by frequent additions, modifications, or subtractions of columns in their data architecture, and the need to maintain the integrity and security of their system so they do no harm to their customers or to the risk equation. So the approach has got to be one of skepticism, knowledge, and, fundamentally, working through the nitty-gritty of what is it that we really need to accomplish here and how do we fit the capabilities of the various systems together. I see no single solution to this kind of question. It's really going to be that kind of a process. That's an interesting point that you made. So I just wanted to throw in a thing there for Ham; it's just a little bit of snark here. Having attended a few demos here this week, I'm wondering if one of the best ways we can help out some of these teams is helping them identify and form meaningful names for these different items and nodes and everything.
I'm seeing an awful lot of stuff like I used to see with database designs in the early '80s, where people just picked short and non-unique names for things. As well as, you know, helping people write down, and I won't use the word document, what they decided that thing was going to mean and what it should be called, and then all the other metadata that right now I can't find, or I'm not sure where it would go or whether it would go into these databases. All the metadata that we would normally keep about any data that's persisted: whether it's personally identifiable information, what are its security requirements, what it can be used with, what it should never be used with, what it should never be used for. I don't see a lot of those discussions in the NoSQL design world, and I won't even use the word model. Is that some place where we could contribute?

You know, it definitely is. At the core, at the end of the day, when you get down to names of objects and definitions or properties of those objects, whether they're relational or non-relational, without paying attention to some of those core basics in terms of taxonomy, naming standards, all things of that nature, any system is going to succeed or fail based upon how well the atomic level is addressed.

Thank you. Yeah, a couple of issues. Picking up on the first part of what Ham was talking about: that flexibility of data model is a very interesting topic, and I think it's sometimes misunderstood in the relational world. I'll just give two examples which operate at opposite ends of the scale. When I was working at PayPal, we had one table in the system that is absolutely hammered constantly, all day, every day. The idea of adding a column to that table was almost impossible, because the only way to do it was to lock the table for some reasonable period of time, and that means you can't take payments.
So flexibility of data model in a situation like that is just a wonderful thing. On the other hand, I think we often forget that you can have flexible models in relational databases. You can actually store name-value pairs, guess what, in a relational database. So I think we ought to understand both ends of the issue of flexible data models. To comment on the second part of what Ham was talking about: I think metadata is absolutely crucial regardless of what kind of data store you have. If you don't know what data you have and what it means and where it is, then your business is not going to go very far.

Amen. And I see a lot of the developers who are embracing these technologies very excited about making data move fast. That's just sexy: hot, fast. The performance demos are always great to see, but the old-school data architect says love your data, respect your data, take care of it. I always kind of cringe, and I know these are features of the tools, and I embrace those. But just a phrase like eventual consistency, you know, makes traditional data people get itchy, is what I say.

Yeah. And why do we do it? We don't need to have perfect data for all use cases, and we trade that off for something else, right? But what if my bank started using eventual consistency? That's the example I always use. I'm happy with my bank continuing to use strong ACID for their transactions, right? But there are a lot of other applications: a lot of different kinds of content applications, and in particular collaborative applications, where there may be a need for eventual consistency, or, as I was just posting here in the chat, eventual durability. What we see is that user requirements for these applications vary quite a bit. I've started seeing it as a spectrum. So Couchbase is strongly consistent, but Cassandra, for example, is eventually consistent with the ability to have strong consistency.
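The name-value pairs Alex mentions above are the classic entity-attribute-value pattern: a flexible schema inside an ordinary relational database. Here is a minimal sketch using SQLite; the table, column, and attribute names are invented for illustration.

```python
# Entity-attribute-value sketch: flexible "columns" in a relational table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_attributes (
        user_id    INTEGER,
        attr_name  TEXT,
        attr_value TEXT,
        PRIMARY KEY (user_id, attr_name)
    )
""")
# Adding a new "column" is just inserting a new attribute name: no ALTER
# TABLE, and therefore no long lock on a heavily hammered table.
conn.executemany("INSERT INTO user_attributes VALUES (?, ?, ?)", [
    (1, "email", "pat@example.com"),
    (1, "theme", "dark"),
    (2, "email", "sam@example.com"),
])

# Reassemble one user's flexible "row" as a dict.
profile = dict(conn.execute(
    "SELECT attr_name, attr_value FROM user_attributes WHERE user_id = 1"))
```

The trade-off, as with document stores, is that integrity constraints and type checking on the individual attributes become the application's job.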
And you can do that, right? It's tunable. And I think that's where we're actually headed: a tunable model based on your requirements. Again, it goes back to the application requirements.

One thing you'll find, though, is that many systems that you think are strongly consistent are actually eventually consistent. So, you know, you talked about banking. Banking is actually an eventually consistent model. Credit card processing is actually an eventually consistent model. They are not ACID-consistent models.

Part of it, and I totally get that, is why we still think in banking of batch overnight updates and posting dates and all of that. That stuff is the type of eventual consistency where different users, accessing what appears to them to be the same data set, get drastically different answers. You don't want two tellers sitting next to each other with different account balances for me.

So there's a lot of tunability as well. Yeah, so if you think about the two-party transaction, I send money to you. Yes. So it's important to record that I have the money available to send, that I did send it, and that I sent it to you. All those pieces of information are important to keep in an atomic transaction. But when the money actually gets to you, you don't even know I sent it. So that part doesn't have to be in an ACID transaction; that can be eventual consistency. But if you design the data to be consistent, then when that eventual validation or completion of the transaction happens, you've got confidence that the result is correct. If you have an inconsistent design, that bad design will never reach consistency. There will have to be some post-transactional reconciliation, which is very expensive. So, the importance of the metadata design. I think, Alex, you touched on this earlier: we need good metadata design no matter what the architecture of our system is.

Yeah, I think so. Going back to the tunability, right?
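Alex's two-party transfer above, an atomic debit plus eventual delivery, can be sketched in a few lines. The in-memory balances and outbox below are illustrative stand-ins for durable stores; in a real system both the debit and the outbox record would be written in one database transaction.

```python
# Sketch of the two-party transfer: the sender's debit and the "payment
# sent" record are one atomic step; crediting the receiver happens later
# (the eventually consistent half). Names are illustrative.

balances = {"alice": 100, "bob": 50}
outbox = []   # durable record of sends not yet delivered

def send_money(sender, receiver, amount):
    # Atomic on the sender's side: check funds, debit, record the send.
    if balances[sender] < amount:
        raise ValueError("insufficient funds")
    balances[sender] -= amount
    outbox.append({"to": receiver, "amount": amount})

def deliver_pending():
    # Runs later, e.g. an overnight batch job: completes the transfers.
    while outbox:
        msg = outbox.pop(0)
        balances[msg["to"]] += msg["amount"]

send_money("alice", "bob", 30)
# Before delivery, money is conserved: the 30 lives in the outbox record.
deliver_pending()
```

Because the design is consistent (nothing is debited without a matching outbox record), the eventual completion is guaranteed to reconcile, which is exactly Alex's point about avoiding expensive post-transactional reconciliation.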
We see, and it's not just consistency, right? It's ACID. It's all these aspects. We've actually added knobs for each of these, because some customers are not okay with eventual durability, for example. But you have availability in different scenarios. For example, if a node goes down, things that are in memory may not have been persisted to disk, so you lose them. But you have a replica sitting on the other side. And so people are thinking about durability in a different way, where they might be okay with having a replica for persistence. They're okay with isolation where you need to see your own changes, but it's fine if another agent sees a different view or part of the transaction. But strong consistency and strong durability are the two requirements that come up very commonly with our users, particularly with user profiles, for example, the one that we talked about. If I update my preference and I go back to view that change and I don't see it there, I'm confused. So we see that if it's the same user who made the change, he usually needs to be able to see his own change. That's one of the requirements that's coming up. But over time, you'll see a spectrum of knobs for ACID.

So we have just 15 minutes left, and we've got some great questions in the Q&A that I want to make sure we get to. I'd like to get to a lot of them, so if we can keep our responses as efficient and high-performance as possible, that would be great. Here's a really good observation: "I'd really like to know the answer to how do I model NoSQL data. The panel seems to be dealing with platforms and projects, not modeling. As an architect, I'm interested in exposing metadata about NoSQL databases." So, Ham, I'd like you to answer that one really quickly. How are we going to model this data with our current modeling tools and environments?
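One concrete form of the "spectrum of knobs" Dipti describes above is quorum tuning: with N replicas, choosing a write quorum W and a read quorum R so that R + W > N guarantees that a read overlaps the latest write, while smaller quorums trade that guarantee for latency and availability. This is a toy sketch of the idea, not any specific product's API; the replica layout and versioning scheme are invented for illustration.

```python
# Tunable quorum sketch: N replicas, write quorum W, read quorum R.
# R + W > N means the read set must intersect the write set.

N = 3
replicas = [{} for _ in range(N)]   # each replica: key -> (version, value)

def write(key, value, version, W=2):
    # Acknowledge once W replicas hold the write (the durability knob).
    # Writing to the first W replicas models a partial replica set.
    for replica in replicas[:W]:
        replica[key] = (version, value)

def read(key, R=2):
    # Ask R replicas and return the newest version seen (consistency knob).
    # Reading the last R replicas forces overlap only when R + W > N.
    seen = [r[key] for r in replicas[-R:] if key in r]
    return max(seen)[1] if seen else None

write("theme", "dark", version=1, W=2)
write("theme", "light", version=2, W=2)
latest = read("theme", R=2)   # R + W = 4 > N = 3: guaranteed to see "light"
stale = read("theme", R=1)    # R + W = 3 = N: the read can miss the write
```

With R=1 the read lands on a replica the write never reached, which is the weak end of the knob; turning R or W up buys back consistency at the cost of extra round trips.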
Well, as far as what we need: basically, we need to have data about the data. We need a metadata design that says, what are the objects in the system? What are the properties of those objects? No matter whether it's a NoSQL model or a relational model. Fundamentally, without going beyond that point, that's the first thing we need to have.

Do you have any tools for that now, do you think?

Certainly, you can use existing relational tools just to build a data dictionary. I mean, really, these concepts have been around for a long time: an accurate depiction of what is it we're trying to do, and what is the meaning of the patterns or the symbols that we have in our system? Does a customer mean the same thing to everybody within the span or scope of the system of interest, or do we have multiple definitions? The kind of thing we see consistently not happening properly in a lot of customer models is this: you will get zip code, and I had one model that had four different data types and 11 different definitions of zip code. Those kinds of inconsistencies in a system are going to lead to a lot of problems. So some of the methods are: go back to the fundamentals, do the basics, use definitions. Then for your schema, whether it's relational or non-relational, you can build your architecture from that.

That's a good point. We need to keep our answers shorter. Sorry about that. I know, we need two hours for these webinars. One of the first questions is: if you're modeling for a specific application, do you actually end up crippling your data for other uses? I think the answer to this is yes, and that's on purpose. This is how a traditional architect approaches data modeling: we're thinking about a lot of reusability. But is that necessarily one of the goals?

So with reusability, I think you're talking about multiple applications accessing the same data.
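Ham's suggestion of using existing relational tools to build a data dictionary, and his zip-code anecdote, can be sketched with SQLite. The table layout and sample rows are invented for illustration; a real dictionary would also carry definitions, security classifications, and usage restrictions, as Ham describes.

```python
# Minimal data dictionary sketch: data about the data, with a query that
# flags elements defined inconsistently across models (the zip-code case).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE data_dictionary (
        model_name TEXT,
        element    TEXT,
        data_type  TEXT,
        definition TEXT
    )
""")
conn.executemany("INSERT INTO data_dictionary VALUES (?, ?, ?, ?)", [
    ("orders",    "zip_code",    "CHAR(5)",     "US 5-digit postal code"),
    ("shipping",  "zip_code",    "VARCHAR(10)", "ZIP+4 postal code"),
    ("marketing", "zip_code",    "INTEGER",     "Numeric postal code"),
    ("orders",    "customer_id", "INTEGER",     "Unique customer key"),
])

# Elements with more than one data type across models are inconsistencies.
conflicts = conn.execute("""
    SELECT element, COUNT(DISTINCT data_type)
    FROM data_dictionary
    GROUP BY element
    HAVING COUNT(DISTINCT data_type) > 1
""").fetchall()
```

The same dictionary works whether the elements it describes live in tables, column families, or documents, which is Ham's point that the fundamentals carry over.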
And for multiple reasons. Correct. So I think that, at the moment, NoSQL databases do not have joins, but that's only at the moment. These are early products in the lifecycle, so we will have some notion of joins across objects in the future. But that doesn't mean you have 100 tables to represent one object. It means that you would have a couple of related objects, and that's the way we think of it. You would have a few key objects, where the data about a specific object belongs to one document. And so you still have the ability to use that data across multiple applications, because hopefully the kinds of objects you're representing are in what we call business normal form, which is actual business objects that can be used across multiple applications.

Excellent. Alex, one of the questions we have in here is whether the shift to NoSQL is due to the advent of big data. We've seen that people, especially traditional data architects, tend to lump all these new technologies and platforms and the whole big data growth together as being very similar, mostly because they're not traditional data systems. So where do you draw the line, where do those things fit, and how should data architects think about NoSQL and big data?

Interesting. So I think we should separate the evolution from what you have available in your toolkit. From an evolution perspective, how it all started, I think the idea of very large amounts of data, very high ingest rates, and the unstructured nature of data all drove the movement towards NoSQL. The huge cost of really large commercial relational databases was also a motivator. I think that's what kicked off the whole thing. But today, what you look at is: you have a toolkit, and you can do big data in relational or non-relational. You can do fast in relational or non-relational.
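The "business normal form" Dipti describes, where everything about one business object lives in a single document, might look like the sketch below. The customer schema is invented for illustration, not a Couchbase requirement; the point is that what would be several child tables relationally become embedded sub-objects.

```python
# One business object, one document: a customer with embedded addresses
# and preferences instead of joined child tables. Illustrative schema.
import json

customer_doc = {
    "type": "customer",
    "id": "cust::1001",
    "name": "Pat Rivera",
    # Relationally these would be address and preference tables:
    "addresses": [
        {"kind": "billing", "city": "Toronto"},
        {"kind": "shipping", "city": "Ottawa"},
    ],
    "preferences": {"language": "en", "newsletter": True},
}

# Any application fetches the whole business object with one key lookup,
# no joins required; the document round-trips cleanly through JSON.
restored = json.loads(json.dumps(customer_doc))
```

Because the document represents a whole business object rather than one application's view, multiple applications can share it, which is the reusability point raised in the question.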
But really what it's about is understanding the trade-offs and picking the solution that's most appropriate for your use case.

Well, you sound like an architect there. Sorry about that. Cost, benefit, and risk, you know. That's the whole thing. So the next question asks: after we do these definitions, now what? Do we do an ERD model? Should the data management profession, and I'm being inclusive here of SQL, NoSQL, big data, appliances, whatever, be developing some different notations or different ways of modeling? Because one of the key factors in modeling is, I personally don't do modeling just for the nice pictures in the documentation. I want to be able to do round-trip modeling, where I can reverse engineer things, forward engineer them, compare one version to another. Is that where this is going, or should we be using our current ERD-based tools and handing things over in nice documents?

Well, I think it comes down to the shape of the data. The shape of the data is either somewhat tabular, sort of like a Cassandra column family, which is basically a big flat table, or it might be a hierarchical document shape, so think XML or JSON. If you've got those two shapes, I think our current tools already deal with them, and so I don't think there's anything special about modeling that requires new tools. Mostly, what these new databases are about are physical implementations; we usually talk about physical trade-offs. It's the physical side. But for the logical modeling, these are familiar shapes: tabular or hierarchical document.

But I want tools that I can point at Cassandra and compare against my logical or my physical ERD-based model, and be able to say what's different and what's the same. I want a data modeling tool that's a real engineering tool, not just a diagramming representation of some business model. I want to do both, definitely.
But we're not there yet.

Oh, well, we've really had good tools in place now for, you know, not just many years but certainly decades. And the round-trip engineering is not so much a limitation of the notation as of the technology, and also of the procedures we use. The way we actually conduct data management, I think, is at the core: how we use things like standards in our processing, as well as oversight, governance, quality control, and things of that nature. This has been well documented for years in courses and books and so forth.

Yeah, that's a good point. Going back to your point about running a tool against Cassandra or Couchbase: that does not exist right now. For one, there isn't a common language or a common framework that can be used across databases. Everyone has their own notion of how to do an update, which looks very different, and everyone has their own notion of metadata. Couchbase does not have much metadata beyond the databases themselves. There are no tables; that's pretty much it. So what I'm getting to is, it's early. It is early. I think that modeling tools might exist for your logical definitions and modeling, but when you want to relate them directly to the underlying database, that's something that will evolve over time. And we will actually need to be able to do reverse engineering to find what the common schema across documents is, given data that already exists. So you might actually have to do it backwards. Yep, derive the model from the data itself. That's a really good point. And, I joke, that's going to be my next startup.

Well, time. So we're coming to the end of all this, and I really would like to talk more about it.
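Deriving the model from the data itself, the reverse engineering the panel just described, could start as simply as scanning existing documents and recording each field's observed types and frequency. This sketch is illustrative, not any vendor's tool; the sample documents and the output format are invented.

```python
# Reverse-engineer a common schema from existing JSON-like documents:
# for each field, collect the observed types and whether it appears in
# every document (effectively "required").
from collections import defaultdict

def infer_schema(docs):
    fields = defaultdict(lambda: {"types": set(), "count": 0})
    for doc in docs:
        for name, value in doc.items():
            fields[name]["types"].add(type(value).__name__)
            fields[name]["count"] += 1
    total = len(docs)
    return {name: {"types": sorted(info["types"]),
                   "required": info["count"] == total}
            for name, info in fields.items()}

docs = [
    {"id": 1, "name": "Pat", "age": 34},
    {"id": 2, "name": "Sam"},
    {"id": 3, "name": "Lee", "age": "41"},   # inconsistent type, like Ham's zip code
]
schema = infer_schema(docs)
```

Even this toy version surfaces the problems Ham raised earlier: optional fields and elements stored with conflicting types across documents.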
So maybe we'll have another, more focused webinar later in the next year or so, so that we can continue this discussion, because we want our data architects and data modelers to be part of those types of discussions. So I'm going to ask each of my panelists: do you have any short, ten-second, one-sentence takeaway that you want everyone to hear? Ham, I'll start with you.

Thank you for the opportunity. I think we have new opportunities here to look at changes in our problem set, and I think that if we start looking at the merger of relational methodologies and NoSQL methodologies, we will wind up with some exciting new changes.

Good. Remember that the broad range of NoSQL solutions is just a set of new additions to your toolbox. You still need to model, so metadata and modeling are still important.

Excellent. I would agree: data modeling is important no matter what backend persistence mechanism you use. And I think relational data architects have a huge opportunity to actually reuse and reinvent themselves. The concepts remain the same; it's just that the backends are different.

Excellent. And I didn't pay any of you to say all that, did I? Except with the champagne here. And Shannon. So we're coming right up to the end. I really want to thank Couchbase for sponsoring this webinar and for being here. It's so nice to have people from all the communities and all the wide-ranging, exciting new technologies that are happening. I also want to thank CA again for being a long-term sponsor, and Sandhill Consultants as well. We're all in this together, trying to figure out where we're all going to fit. I thank Shannon again for being an excellent cat herder for all of us and getting us here and getting the audience here. I wish we could have gotten to all the audience questions, because I still consider all of you in the audience part of the panel.
I want to thank Christina for lending us her room so that we could do this. And to my panelists, Dipti and Alex: thank you so much for providing your insight. I have attended a few of your talks, and I really appreciate your sharing this type of knowledge. Any data architects out there who get an opportunity to hear Alex talk about NoSQL implementations and these design considerations, it's definitely worth doing; at EDW his room was packed, and I got one of the last actual seats, so I was very happy about that. And Ham, we've known each other for a long time and have come up through the traditional data architecture world, so I'm really glad you could join us.

And Shannon, off the top of your head, do you remember what our topic is for next month? You can look that up right away. While she's doing that, I should be better prepared here. While she's doing that, I wanted you to know that really soon we'll be turning off the recording for the session, and we're going to stay on, probably about another 10 or 15 minutes, so that we can continue some offline discussion about these things.

And Karen, I just want to reiterate what you said: thanks to Couchbase, and thanks to CA, and thanks to Sandhill for sponsoring. We just can't do it without you guys, and we so appreciate it. And next month's topic is data model patterns. October's is data model patterns.

Okay, in October we're talking about data model patterns. What's wrong with teams? Data modelers and project managers. Oh, excellent. That should be good. Thank you, guys. You know some project managers that want to talk data modeling. Thank you, everybody, for attending today. I will turn off the recording so that we can have our little open discussion off the books. And again, everyone, thank you so much. I love how much you guys interact and participate in these sessions. It's awesome.