My name is Shannon Kempe, the executive editor of DATAVERSITY. We'd like to thank you for joining the second installment of the monthly DATAVERSITY webinar series, NoSQL Now, with Dan McCreary. Today we'll be discussing migrating security policies from SQL to NoSQL with two guest panelists. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen. If you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #NoSQLNow. As always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and additional information requested throughout the webinar. Now for our two esteemed guests joining us for the panel today: Adam Retter, founding partner of eXist Solutions and one of the core developers of the eXist database open source project, and Michael Allen, the security architect for Sqrrl. Moderating the panel is DATAVERSITY's partner and friend, Dan McCreary. Dan is the principal of Kelly-McCreary & Associates. He is an enterprise architect and author specializing in emerging database technologies. He and his wife recently published the book Making Sense of NoSQL: A Guide for Managers and the Rest of Us, which you can find in the DATAVERSITY bookstore under Featured Books. And with that, I will turn the floor over to Dan to get the discussion started. Hello and welcome, everyone.
Thank you, Shannon. You always do great intros for us. I just wanted to, first of all, acknowledge the fact that I have two guys here who are way smarter than I am and are really experts not just on security policies, but on actually implementing security in NoSQL databases.
Adam I've known for a long time, and Michael I've just started to get to know through Sqrrl and the Apache Accumulo project; he works for Sqrrl Data. Before I do the intros, I just want to set the stage for people with a quick description of what we're doing. We're going to cover four areas of security in general, but then we're going to really focus on the hardest thing to do within the database, which is authorizing users. This is to give people a background. Security policies, when we talk about migrating policies, are really English-language statements about how security should be enforced, and they might be policies about authenticating users, or authorization, or audit, or encryption. And the idea is how we're going to migrate the policies from something you know well, which is the relational model, to the other models. And then, to make sure that everybody has a good background on what we mean by the models: what we call the relational model is in the upper left-hand corner here, and many of you are also familiar with the analytical or OLAP models. But NoSQL stores bring us four new patterns: key-value stores, column-family stores, graph stores, and document stores. Michael is an expert on Apache Accumulo, which is the column-family store in the lower left, and Adam is an expert on security within document stores. So with that, what I'd like to do is have the speakers introduce themselves and tell us a little bit about their background and how they got into the NoSQL space. Adam, would you like to start off?
So my role, if I had to describe it, is quite difficult to pin down, I suppose. I'm somewhere between an open source hacker and a consultant. And I predominantly work in the area of databases, which is probably as much a surprise to me as to anybody else. Having left university, oh, I don't know, 15 years ago, I thought I'd never touch databases again. I am surprised to find myself where I am now.
However, in about 2005, I was doing a lot of work with XML and I was looking for some sort of way to store and query it, and after a bit of trial and error, I settled on eXist. The community was very good. We asked lots of questions. They helped us out. They were very, very responsive, which is one of the reasons that my employer at the time chose it. However, it is open source, and we did hit bugs and problems. Eventually, I exhausted the goodwill and it was really up to me to start fixing bugs and sending in patches.
You complained a lot, and they asked you to fix the source, right?
Yeah, you complain a bit and they help you, and then you complain too much and they get fed up with you complaining and go and help someone else, so you end up helping yourself and, obviously, everyone else. I started contributing patches. The reward of doing that in an open source manner was quite appealing. It was a very slippery slope, and within the year I was a full-time committer on eXist, and I have been ever since. So that's how I got involved in the internals of databases. On the security side, I had a bit of an interest in security, both from looking at systems from the outside and, obviously, from contributing to a large database project, where security is a pretty key thing. I guess that's my fairly informal background of how I ended up here.
Michael, tell us about yourself and how you got into the NoSQL security area.
Sure. Thank you so much for having me. Let's see. I think, exactly unlike Adam, I have always sort of liked databases, but I had never worked on one since graduate school. About 15 years ago as well, I went to work on Microsoft's systems.
Then I spent nine years at PGP, which you may know as a fairly well-known cryptography company, doing a lot of the development work on their server-side products; PGP was acquired by Symantec. After that I came to Sqrrl, primarily because I was very interested in gaining some experience in the big data world, and Sqrrl just happened to be looking for a security architect at that time. So I think our backgrounds and our interests meshed very, very well. It's interesting to come into this group as a security architect because, and I'll get into this more, one of Accumulo's big claims to fame is that it was built with security semantics woven into the warp and weft of the database itself. But as you can see on the top of the board, one of the hardest problems we have is the authorization question of who will be able to see what in the database. Accumulo and Sqrrl have some interesting takes on the area, and I think it's definitely an area of good research and interest within the community. That's me in a nutshell.
That's great. That's a great introduction from both of you guys. So let me start off with some very general questions, just for people that are not familiar with the traditional security policies in relational databases. You frequently implement these with something called grant statements, where you grant maybe read, write, update, and delete access to some things in the database. Often they're tables, but sometimes they're views on tables. And you can also grant access to a stored procedure. But the key thing about relational databases that people have got used to is the fact that they have what we call fine-grained access control. So you can effectively protect any column. If a column contains, for example, social security numbers, you can protect that from most users, even if they have a reporting tool, because the reporting tool is only granted access to the views, not the underlying tables.
One of the things I know is that almost all the databases have very good ways to authenticate users, and that can also be done at the application level. And all these systems have audit and encryption tools. But what I'd like to now ask both Adam and Michael is, let's talk a little bit more about the security models in your respective systems. Adam can talk about security models in document stores, and after that Michael can talk about security models in Apache Accumulo. Adam, do you want to give us a start here?
So obviously the document store that I'm really familiar with is eXist. I've done a bit of research about Hadoop as well, to see how that compares. Now, in eXist, originally what we had is that we tried to follow, at a very basic level, the UNIX kind of philosophy. Each resource has an owner and a group, and is assigned a mode of permissions. So for each collection of documents, which is much like a directory, or for each document, you can set a mode on it which determines whether the owner of the resource can read, write, or open that collection, for example, whether the group of the resource can do that, or whether anybody else who isn't the owner or in the group can do that. And that's a nice solution in many ways because you've got a very, very simple set of permissions. If you implement it correctly, you can do millions and millions of these security checks every second. If a query touches millions of documents and you only want the user to see the documents they are permitted to see, it's important that you can do these security comparisons quickly, because otherwise security becomes the performance bottleneck, and really the problem is not the query necessarily, or the indexing, or anything else. It's the security implementation. So we had this UNIX-style security model. But whilst in UNIX you have read, write, and execute, where execute controls whether you can open a directory (or in our case, a collection) or whether you can execute a compiled application, we didn't have execute. We had update.
And this was done before I got involved in eXist. I always felt that this was kind of strange, because we had write and we had update. The update flag was really related to our platform insofar as we have an extension to our query language, which is called XQuery. In the case of XQuery Update, we were able to have separate permissions, so that you could write to a document if you wanted to replace it, but if you wanted to update a node in the document, you needed this update flag. And to me, it never really made any sense. So this had to change, because really an update and a write are pretty much the same thing. It's just nonsense from my perspective. I replaced it with the standard UNIX execute flag; that was the first step of modernizing our security architecture. But then we wanted to go further, because what we realized is that there's quite a lot of pain in having UNIX-style permissions when you start building sophisticated applications. You can only have one owner and one group that owns a directory or a file, or in our case a collection of documents or a document itself. And you want really fine-grained security, so your business says, you know, these users can access this and these users can access that, but these ones can write this and these ones can read that. If you want to implement that and you've only got owners and groups, what you end up with is an explosion of groups. Basically, you're creating hundreds and hundreds of groups and putting the right people into the right groups that have these little sets of permissions. And it doesn't scale. It's not manageable.
It doesn't scale for a lot of users, right?
Yeah, it's really not manageable. So you create this thing and then nobody who uses or administers your system will know how the security works at all.
At that point, you've completely lost the plot, because the security model may be great, but if nobody understands how it works, it's effectively useless: somehow someone will assign the wrong security to something and it breaks your whole intent. So we had to go beyond the Unix-style permissions of read, write, and execute and implement access control lists. We took a lot of time to think about this, because I was warned by a few people I discussed security with, who aren't necessarily security experts, that their biggest bugbear when given advanced security controls is understanding how to apply them. Some said the role-based access control systems that we've seen in Solaris and things like that are far too complicated. And other people said, well, access control lists are too complicated; I can never understand which access control entry is being applied. So we thought about it a bit, but in the end, we felt that access control lists are the best way to go. What we did is the same approach that you get in ZFS, the Zettabyte File System that you get in Solaris and on Linux, in which the access control list is applied before the Unix-style permissions. So this means you keep your Unix-style permissions on all your resources as your default, and we can add additional rules that are evaluated first to say whether a user has access to a resource or doesn't have access to a resource. With a little bit of clever implementation, we actually made it faster than the existing implementation. So we were able to add further security that uses less memory and disk space and also runs faster. So it was a bit of a win all round, really. And we're quite happy with that in terms of resource-based permissions. It's certainly not the last thing that we'll do, but it's a good step at the moment.
Very good.
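The evaluation order Adam describes (access control entries checked first, Unix-style owner/group/other mode bits as the default) can be sketched roughly as follows. This is a minimal illustration, not eXist's implementation: the `Ace` and `Resource` classes, the `can_access` function, and the rule that the first matching entry covering all requested bits decides the outcome are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Permission bits, as in a Unix mode triple (rwx).
READ, WRITE, EXECUTE = 4, 2, 1

@dataclass
class Ace:
    """One hypothetical access control entry."""
    target_type: str  # "user" or "group"
    target: str       # user name or group name
    allow: bool       # True = allow, False = deny
    mode: int         # bits this entry covers, e.g. READ | WRITE

@dataclass
class Resource:
    """A document or collection with Unix-style permissions plus an ACL."""
    owner: str
    group: str
    mode: int                                 # e.g. 0o640
    acl: list = field(default_factory=list)   # evaluated before `mode`

def can_access(res, user, groups, wanted):
    # Step 1: the ACL is consulted first. Assumption: the first entry that
    # matches the user (or one of their groups) and covers every requested
    # bit decides the outcome.
    for ace in res.acl:
        matches = (
            (ace.target_type == "user" and ace.target == user)
            or (ace.target_type == "group" and ace.target in groups)
        )
        if matches and (ace.mode & wanted) == wanted:
            return ace.allow
    # Step 2: no entry applied, so fall back to the owner/group/other bits.
    if user == res.owner:
        bits = (res.mode >> 6) & 7
    elif res.group in groups:
        bits = (res.mode >> 3) & 7
    else:
        bits = res.mode & 7
    return (bits & wanted) == wanted
```

With a document owned by `adam:dba` at mode `0o640`, an allow entry granting `READ` to a hypothetical `auditors` group lets an auditor read the document without creating yet another owning group, which is the group explosion the ACL is meant to avoid.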
I have to say that I've used the Unix-style permissions in eXist for many years, and that definitely was a lesson. So I think the new access control is going to be a very nice feature to have, especially for people who are sharing complex rules on many projects that involve document management and content management. So I'm looking forward to trying that out.
Yeah. Sorry for that. The thing that finally triggered the implementation of access control was basically that a lot of our users were building their own authorization systems
Yeah, mine too.
on top of Unix-style permissions. So they were doing stuff at the application level that then mapped down to a few database users.
I did it. Yeah.
For one particularly complicated customer, we integrated with LDAP for them. We were trying to map LDAP users onto a few database users. After a few weeks of tearing my hair out, I just bit the bullet, added LDAP, and implemented access control.
Yeah, that's good. Necessity is the mother of invention, as they say. All right. So, Michael, let's take a turn away from the document model and talk about column-family stores and Apache Accumulo.
Sure. So, let me show you.
I'm going to turn the presenter role over to you, Michael, if you want to flip the slides.
Sure. Right. We've talked about Sqrrl and Accumulo. Accumulo is actually an Apache project. It's an open source key-value store project, one of these highly scalable databases, in the same vein as HBase and other things. It's built on top of some parts of the Hadoop ecosystem. Accumulo grew out of a project that was started inside of the National Security Agency. So it was built from the ground up with security semantics in mind. The idea is that you're going to have a lot of different people accessing the same pool of data at the same time, and they may not all be allowed to see all of the data all of the time. But you may also need the database to scale out to a massive amount of data.
We're talking about terabytes up into the petabytes, on thousands of machines, all working in concert to collect, and be able to query over in more or less real time, a large set of data. Sqrrl was founded in the middle of last year to commercialize the Apache Accumulo technology and to put some functionality on top of it that makes it a little bit easier to use. So, when you buy the Sqrrl Enterprise product, what you get is an interface that allows you to treat Accumulo like a document store. You can put JSON documents inside of it. Or you can treat it like a graph database: you can have links between those JSON documents that are traversed and queried exactly like a graph. All of those different views on the data basically get decomposed by the software into the key-value pairs that are stored inside of Accumulo. One of Accumulo's biggest claims to fame is what's called data-centric security, or cell-level security. You can think of a key-value pair as like an atom of data; in a traditional RDBMS, the intersection of one row and one column could be considered a key-value pair. Every one of those little key-value pairs inside of Accumulo can be labeled with any sort of arbitrary strings that are the security assertions on your data. For example, that this data should only be seen by people with secret access, or that it's personally identifiable information, or PII, if you're familiar with that kind of terminology from the financial or medical world. Each cell can be tagged slightly differently.
These security labels can be ANDed or ORed together to create some fairly complicated and expressive Boolean expressions. And the upshot of doing all of that is that it really allows you to have lots and lots of different kinds of people accessing the same pool of data safely and securely, and to have a security model enforced at all times, not by the applications accessing a piece of data but by the database itself. I'm going to put up a couple of slides here.
I think the summary, Michael, is that because Apache Accumulo was really designed from the ground up to put a very small 64-bit visibility field in every single cell, that 64-bit field can be used to implement pretty much whatever security policy you want on access control. Is that a good summary?
Well, it's not 64-bit limited, but yes, it's a pretty good summary. The kind of assertions you can put on there are completely up to the security designer of the system and the kind of data that you're putting into it. If we think about a doctor's office, or a major hospital, for instance, that may want to use Sqrrl Enterprise for their security and for doing their database work, you would want the person who makes appointments to be able to see some amount of personally identifiable information, like someone's name and date, or something like that. You want a billing department person to be able to see what kind of procedures were done so that they can be billed correctly, but you may not want them to be able to see the diagnoses that came out of those procedures. That might be reserved for only nurses within a particular practice. And then there are some diagnoses that are extremely sensitive, like if someone has been diagnosed with HIV. Those diagnoses should then only be seen by a particular person's primary care physician. All of these semantics, all of these rules, can be broken down into this set of security labels that can be attached to any piece of data inside of Accumulo.
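To make the idea of ANDing and ORing labels concrete, here is a small sketch of how a cell's label expression might be checked against the set of labels a user holds. It is an illustration only and not Accumulo's code: the function names, the label names, the rule that an empty expression is visible to everyone, and giving `&` higher precedence than `|` when no parentheses are used are all assumptions of this example.

```python
import re

def tokenize(expr):
    # Labels are runs of letters, digits, or underscores; operators are & | ( ).
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expr)
    if "".join(tokens) != expr.replace(" ", ""):
        raise ValueError(f"unexpected character in {expr!r}")
    return tokens

def is_visible(expr, authorizations):
    """Return True if a user holding `authorizations` may see a cell labeled `expr`."""
    if not expr.strip():
        return True  # assumption: an unlabeled cell is visible to everyone
    tokens = tokenize(expr)
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if tok in "&|)":
            raise ValueError(f"unexpected {tok!r}")
        return tok in authorizations  # a bare label: does the user hold it?

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            right = parse_term()  # always parse, then combine
            result = result and right
        return result

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            right = parse_and()
            result = result or right
        return result

    visible = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return visible
```

In the hospital example, a diagnosis cell might carry a hypothetical expression such as `physician|(nurse&practice_a)`: a nurse in that practice and a physician both see it, while a billing clerk holding only `billing` does not.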
And those security assertions are evaluated by the very core of the database as it's running through and operating on your queries. So it doesn't have to reach out to exterior systems in order to understand who should be seeing what at a given moment. All of those decisions are made beforehand, before your query runs. The database itself is able to do those evaluations many, many times a second, such that the scale at which you can apply this set of security labels is massive. And it continues to scale out very, very well as the size of your data grows and as the size of your user base grows. Those are some of the key tenets that went into the very founding of the Accumulo project and continue through to this day. At Sqrrl, we extend that data model to make it a little bit more user friendly, a little bit more approachable by people who don't have familiarity with key-value stores or columnar databases as a concept. You can treat it exactly like a document database or a graph database if you're more familiar with those ideas. And we take care of the work of organizing and implementing those kinds of schemas in terms of the Apache Accumulo key-value universe.
Okay. So I just want to make sure that we're clear for the audience. There are a lot of databases out there in the NoSQL field. And with a lot of them, when they're first created, it seems like they haven't really treated security as a primary concern. And so security seems like, in some cases, it has to be retrofitted in. Is that a fair statement?
I would say it's really fair. I mean, most of the RDBMSs that were developed in the 70s and 90s, their goal was to index and query data. They weren't really thinking that much about security. Security was applied, if it needed to be, by the application.
And then slowly, as the number of applications against the database grew to an extraordinary number, people realized that all of them were applying slightly different variants on whatever security policy was supposed to exist. It becomes this tangled mass of security that needs to be pushed back down a level, into the database, somehow. But by then the tenets of how the data is organized, of how it is accessed, and of the kinds of guarantees that a particular database engine could give were already set in stone, if you will. And it was hard for them to reapply those security notions back into it at the same scale and the same operational level that they had in the past.
Yes, that totally agrees with the experiences that I've had: people went out and got an early NoSQL database in, tried to protect it at a coarse-grained level, and then couldn't get the same control that they had in relational databases. We had one person send a question on the Twitter account to DATAVERSITY, and he asked: if you have a policy that certain columns in a table are private, can you use your software to effectively protect columnar data, as you would with a view? Michael, why don't you take it first, and then Adam, you can jump in.
Sure. Yes, absolutely. In fact, one of the selling points of the Sqrrl software on top of Apache Accumulo is a lot of help with consistently applying labels to data as it comes in. Accumulo basically leaves it up to each application to insert the correct labels into whatever data it's writing into the database. We have what we call our labeling engine, which allows you to define human-readable sets of rules that say, hey, if my data happens to look like this, if it happens to be a particular field inside of a JSON document, then it ought to be labeled as PII-type data, and so it needs to get the label PII before it gets placed into the database.
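The write-path idea described here (rules that inspect incoming data and attach labels before it is stored) can be sketched as below. This is not Sqrrl's labeling engine or its rule language, which the speakers describe as a SQL-like DSL; the rules here are plain Python predicates, and the field names, label names, and `label_document` function are hypothetical.

```python
# Each rule pairs a predicate over (path, value) with the label to attach.
# The field and label names are made up for illustration.
RULES = [
    (lambda path, value: path[-1] in {"name", "ssn", "address"}, "pii"),
    (lambda path, value: "diagnosis" in path, "medical"),
]

def label_document(doc, rules, path=()):
    """Flatten a nested JSON-like dict into (key, value, label_expression) cells."""
    cells = []
    for key, value in doc.items():
        here = path + (key,)
        if isinstance(value, dict):
            # Recurse so that rules can match on any level of the document.
            cells.extend(label_document(value, rules, here))
        else:
            labels = sorted({label for pred, label in rules if pred(here, value)})
            # Require every matching label (AND); an empty string means unlabeled.
            cells.append((".".join(here), value, "&".join(labels)))
    return cells
```

Applying the rules once, at ingest, is what gives the consistency mentioned above: every application writing through the same rule set produces the same labels for the same kind of field.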
Absolutely, you can translate those sorts of policies directly into what you would be using Sqrrl Enterprise for.
Okay. Adam, are you with us?
Yeah.
Okay. Do you want to tell us a little bit about this: if I had an XML document that had some elements, or had, you know, product numbers in it, how might I protect those within the eXist database?
That's an interesting question. So obviously, if you'd brought the entire column in as a single document, you would just set the security on the document. However, in terms of protecting what's in the document, where you've got something in the document that needs a different security level to something else, you would have to create something like a view. We would probably do that using a stored XQuery or something like the REST interface. You would call a URL, and the URL would effectively be executing a stored query. And it's that stored query that really enforces the permissions on the data that you're trying to access in the document. You effectively have a view, a dynamic view. There are a number of ways you could do that. One is through the permissions that you place on the query, to determine who can execute that query and whether the query executes setUID or setGID, so it's got elevated privileges. Or, in the query itself, you could enforce the security: you can interrogate the document, the database, the principals in the database, and perhaps some sort of policy XML that you've created, to determine whether that query should permit access to the document. And effectively, while it's a little bit application level, all of the properties that you're using are really baked into the database. It's, I guess, somewhere between application-level security and database security when you start doing that.
Yeah.
We had a comment in the Q&A there that the UNIX file system model is really for file security, but using views in eXist or the cell-level labeling offers much finer-grained control. I think that makes sense. So I think we've answered that question. And another question for Michael here is whether the Accumulo cell-level security features are similar to Oracle's Label Security mechanism. Is that correct?
I'm not familiar with that feature of Oracle's, so I can't say definitively one way or another. Oracle is very good as a company at identifying really good ideas, putting them into the Oracle products, and selling them at a high price.
Right, yeah. I should also say that the W3C does have an attribute called PII that's associated with, I think it's a micro standard, but with XML files, if you have an element and you give it the attribute PII equals true, you can apply certain filters that will pull that out in standard views. And you can do that at a certain level within a lot of your queries. So I know that's a feature that I've seen in some of the queries that I've worked with at federal agencies too. All right, so one thing I wanted to mention is that we're at about the halfway point. We have people online here, and if anybody has specific questions, please feel free to open the Q&A text box on the right-hand side of your console and go ahead and enter any questions. All right, so the next thing I'd like to ask, and I'll let you guys pick this up, is: do you have any sample stories of an organization that has had concerns about security, and how some of our NoSQL security engines have been able to meet their requirements? Either one, just jump in here. Do you have any stories you'd like to share?
I don't know that I can specifically talk about any of our particular customers by naming them, but I'll say this much in terms of broad adoptability kinds of questions. We've talked to many financial entities that many of you have heard of, who are rolling out these kinds of large systems so that they can do fraud detection or other kinds of analysis on them. The primary blocker to deploying those kinds of things into these sorts of organizations is: how can I comply with my security and privacy requirements? Accumulo at one level, and Sqrrl as another example, gives you an answer to those kinds of questions, in terms of the ability to translate policies like SOX compliance and other things into the set of primitives that you need to fulfill in order to be able to see a piece of data. For example, you need to be in the auditing department if you want to see, say, someone's name or address. You can use these kinds of degrees of freedom on individuals and on pieces of data. When you're doing it with Sqrrl, you can do that within a JSON document, so you can have someone's identity be protected by one label, but then their trades be protected by another set of labels that are available to the auditing department. So you need to go through that first label in order to identify a particular individual, but you can do analyses without violating any privacy, and have that semantically guaranteed not by any particular application or any particular individual, but by the database itself, by the core running technology.
There's one question related to that. In the slide that you were showing, you have this policy box in the lower left-hand corner, and the policy then feeds both the labeling engine and the policy engine. How do you describe these policies? Are they JSON documents? Are they text that the average business analyst could read, or is it mostly program code?
Well, I think the first thing is: who are the people that you're thinking about accessing the system at any given time? What kind of people are they, and so on and so forth? You then have to get past that level into trying to figure out how am I going to model my data in order to enforce these decisions that start from this English document. Our approach is that there are two kinds of policies: one that applies to data that's coming in and one that applies to data as it's being read. So the write path and the read path have separate policies because they're trying to answer separate questions. When you're writing data, you're trying to figure out what kind of data this is and, due to Accumulo's semantics and how it works, you need to figure out what labels to apply to any particular piece of data. Those rules are written in a specialized language that we developed in-house that looks like a SQL query, where it's readable, as SQL tends to be.
Okay, a high-level DSL, domain specific.
Exactly, it's a domain-specific language, and the rules say, you know, apply this set of labels to this part of the JSON document if it meets these requirements, like if it has this kind of attribute or if it's at this level of the document, and so on and so forth. For the policy engine at read time, the question it's trying to answer is: Adam is in the auditing department of my bank, so what set of labels does that give Adam at this time? It's a question of what our options are and what their security properties are. Our policy engine is probably the least fleshed-out part of the picture, but it's definitely an area of active development for doing authorizations. Right now your options are things like, well, tie it back to the member-of-security-group notions inside of Active Directory. Those can become your labels, if that's what you want to do with your particular data at a given time.
You can also just give Sqrrl and Accumulo a list saying, well, Adam's labels are this, and I compute whatever this is by running a SQL query, generating a text file, and loading that into Accumulo itself. There are a lot of options here, and it's definitely an area where we are actively developing more flexible answers. Exactly, XACML is something we're looking at as one potential piece of this puzzle in terms of how you define policy for a policy engine that determines whether a particular person can see a particular piece of data. XACML is definitely an interesting, mature standard. It also, in its intent, answers a slightly different question, in that XACML policies tend to answer whether someone can perform a given action, whereas the question that this policy engine box is really trying to answer is what the set of labels is. But we're working with customers and outside partners on trying to figure out how you serve security officers, business analysts, these kinds of folks who shouldn't have to program in Java, heaven forbid, but who also need more expressivity. Those kinds of options are an area we're continuing to look at.
Well, Adam, some questions here. I know that you've been in the security domain for a while, and I know that in the past the eXist system also had some support for XACML. For people that aren't familiar with XACML, it's an XML standard for expressing security assertions and policies. Adam, do you want to talk a little bit about that and some of the work in there?
Sure. I can certainly talk a little bit about XACML in eXist. I can't particularly talk about my work with XACML, simply because XACML was implemented in eXist before I even started contributing to eXist. It's really been in eXist for some time, although at the moment we actually mark it as a deprecated feature.
The main thing to understand is that the XACML implementation that we have doesn't actually control access to the data. What it does is control access to the executable code, insofar as you can use XACML at a very fine-grained level to control who, or under what circumstances, can execute various functions in the database. So even if you have a module of functions written in XQuery or Java, you can actually turn individual functions on and off based on user criteria at that time. It's very, very powerful for controlling what you can do with the data, as opposed to access to the data itself. Now, the reason that it's deprecated is that it's an implementation, I think, of XACML 1, possibly 2. And whilst it works, I think the implementation isn't really up to scratch. Performance isn't great. So I wouldn't recommend using XACML in eXist at the moment. I think XACML is quite mature, XACML 3 is out there, and we're not really up to date with it at the moment. What might be good is basically a re-implementation of XACML in eXist. The key thing would be that you can control access to function execution in eXist, but also try to apply some of those concepts to accessing nodes inside documents, taking security down below the document level without any sort of dynamic views.
Okay. We had a question just come across about views and labeling. The question is, do they go hand in hand, or do you use one or the other? Either of you can comment on that.
I think the way I think about views is very tied into the RDBMS model of the world, where you have, you know, tables that have rows and columns, and then you are basically running a select query on one particular set, joining all those together, creating another virtual table that's derived from data within the underlying tables.
NoSQL tends to take an attitude of, like, joins are out, if you'll let me simplify for a minute, in that they're very expensive to do. It's one thing if you're just running within the auspices of one node, but joins become very expensive when you start talking about doing them over a network of nodes in a distributed system. I think we sort of take an attitude of, hey, let's simplify a little bit and avoid joins as much as possible, and if you need them, you can do joins up at the application level. As for the notion of a view on the data, we consider labels as one way of thinking about a very security-oriented view: if labels are attached to your data and a user comes in, they're only going to see a particular slice, or set of slices, of any given set of documents. It's not like a view in the relational sense of doing some kind of transformation, combining columns or doing aggregations over columns or things like that. We have features that are kind of like that, in terms of being able to do those transformational things and aggregations, but we don't call them views per se. Right, right. Sorry. I've been using the term view. It's obviously not really what I call it, but I'm trying to put it a bit more into relational database parlance. You could really think of it as a materialized set of the results of a query, but that probably doesn't mean much to anybody who hasn't used eXist. Yeah. I think of views as a subset or a superset of a single physical table, sometimes joining in other areas.
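The label-based, security-oriented view described above can be sketched in a few lines of Python. This is a minimal illustration, not Accumulo's or Sqrrl's API: the `Cell` class, the `visible_cells` function, and the label names are hypothetical, and the rule is deliberately simplified so that a cell is visible only when the user holds every label attached to it (real visibility expressions can be richer than a plain AND of labels).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One labeled piece of data (hypothetical structure for illustration)."""
    row: str
    column: str
    value: str
    labels: frozenset  # simplified rule: the reader must hold ALL of these


def visible_cells(cells, user_labels):
    """Return only the cells whose required labels are all held by the user."""
    held = frozenset(user_labels)
    return [c for c in cells if c.labels <= held]


cells = [
    Cell("patient1", "name", "Alice", frozenset({"pii"})),
    Cell("patient1", "visit_count", "3", frozenset({"marketing"})),
    Cell("patient1", "diagnosis", "flu", frozenset({"pii", "medical"})),
]

# Two users querying the same data see different slices of it.
marketing_view = visible_cells(cells, {"marketing"})
clinician_view = visible_cells(cells, {"pii", "medical"})

print([c.column for c in marketing_view])
print([c.column for c in clinician_view])
```

Nothing is joined, transformed, or aggregated here; the same stored cells are simply filtered per user, which is why this is closer to a security filter than to a relational view.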
I think the one thing that many people in the distributed processing area have is that when you talk about joins, they in general think of moving data between nodes in a cluster, which kills performance. The NoSQL mantra seems to be much more about moving queries to each node, having that node process them and send maybe a small subset of a large table back, which is the typical thing you'd see in a MapReduce transform too. So that makes sense. All right, so let me ask either of you guys. We're at about quarter to, so we've got about 15 minutes left. Are there topics that you guys think are really critical for people who are new to this area? I think one thing our audience really has is a strong familiarity with relational databases and relational security. But coming to this area, what are the things that they should think about? And allow me to make some assumptions: I'm assuming that almost all these databases can authenticate users and support things like OpenID, some of them are starting to put in Kerberos authentication, and a lot of them have much more mature audit tools. The one thing that is relatively new for a lot of databases is encryption-level tools within the database. Any big thoughts that you'd like to share, any cautionary notes about selecting these systems? If I could give any advice on security, and I think Michael's really the expert here, so feel free to contradict me, I would say start with the simplest thing possible that works for you. I've seen many customers who think that they need the most complicated security thing. A lot of it is because they're worried, because it's their job or something like that. It's not because they actually need this security: a lot of my customers aren't banks and things like that. Some of them are, but most of them are all kinds of other organizations. Yet a lot of people worry about security, so they have kind of over-prescribed this idea.
So that's not to say that there aren't people who need that. But when you sit down and talk to them, you realize pretty soon that they don't need everything that they think they need, and you can still achieve the same level of security. You just really need to talk through the requirements. So the only piece of advice I'd give is start simple. I've seen a lot of people build these incredibly complicated security setups before they've even really decided what their system needs to do, and it never works out well. I'll echo some of Adam's sentiment there. When I look at these kinds of systems, and it's not really just database systems but these new NoSQL systems that are highly distributed, hold tons of data, and may or may not even cross geographic boundaries, I try to orient people towards this: what is really the threat you're trying to protect against? What's the human-level thing that's going to happen that you're worried about? That will help you figure out the right level of stuff to do about it. If people start in with, well, let me do X, Y, and Z, I've got to do X, Y, and Z, then they very quickly lose sight of, wait a minute, what was the bigger thing? What was the thing I was really worried about? What's the thing that's not going to let me sleep at night? One of my jobs is letting people sleep at night. So think about that. On the thing you brought up, Dan, encryption: there is basically an encryption module for data at rest. It's coming out very soon in the next release of Accumulo. One of the interesting parts about it is that implementing the encryption itself isn't the hard part. That's mainly a paint-by-numbers engineering task: just make sure you're using your primitives correctly, which takes some knowledge but isn't a huge amount of rocket science. The hard part of implementing encryption is working with all of the keys that you've just created and brought into consideration.
Essentially, you've got all of your data inside a very secure safe, and you have locked that safe with a crazy long, twisted combination that takes three or four men to dial in, and then you've written it on a Post-it note and pasted it to the side of the safe. So you've really got to think about, okay, I've just created this big security apparatus along with the keys for that security apparatus. Who can get those keys? What actors? Some of these actors aren't people; most of the time, for these big NoSQL systems, they are servers. And servers certainly aren't always on: they crash sometimes, they need to reboot, because that's just something that happens. So when they reboot or come back up, they're going to need to get that key again. So how do they prove themselves to whatever system you're using to manage those keys? Those are really, really important questions in order to make sure that you've implemented a security feature like encryption correctly. And that's one of the things that I think Sqrrl and Accumulo really bring strongly to the table: a lot of expertise in that area. I'd just add that if you can get it right, it's a beautiful thing. You know, it's a centralized key server that the systems have access to. I was recently working with a client who hadn't got it right. They had a centralized key management system, and users were effectively getting keys from there onto a USB disk, walking into the server room, plugging it into the server, and uploading the key into the application to unlock the data they needed. The key that they were putting on this USB disk was completely unencrypted. There was no security around it at all. It completely defeated the entire point. One of our listeners, Ellen List, just asked a question: how can the solutions presented prevent a marketing firm from combining data sets with demographic information to work out a unique person identifier?
Well, that's kind of a general question. You guys want to take a crack at that? No, I'm just on the question. I'll take a crack at it. So a label-oriented answer to that question of, can you separate someone's private data, their personally identifiable data, from what they did in a particular web application, the marketing kind of information about where they did those actions, is to set different labels on those different pieces of data. People inside the marketing department cannot obtain labels that they should not have access to. Then there's a very interesting question about a particular piece of data and what constitutes a personally identifiable piece of data. One very interesting fact that I picked up is that your date of birth and your zip code identify you with about 37% accuracy as a unique person. Combine that with another thing and it pretty much lists out who you are. Yeah, that's pretty good. So there's a lot of context to what we're talking about in terms of not being able to combine these things, because data from one side of that fence or the other can lead someone who's got some skills to identify a particular person based on what would seem to be a limited set of information. Another random fact is that HIPAA has a lot of rules about what constitutes anonymized data. One example is that ages within a data set, if they are above, say, about 80 or 85, have to kind of stop there. You can't actually put someone's age in past a certain point, because people can then be identified from the various other pieces of data within a particular data set. So I think the answer to the question is that you really have to limit the amount of data that you provide to marketing if they really have the intent of actually isolating individual people for marketing campaigns.
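The re-identification risk discussed above can be made concrete with a small sketch. It assumes made-up records and two hypothetical helpers: `unique_fraction` measures what share of records are singled out by a combination of fields, and `top_code_age` caps ages at a cutoff. The cutoff of 85 is only an illustrative assumption taken from the spoken range, not a statement of the actual HIPAA rule.

```python
from collections import Counter

AGE_CAP = 85  # assumed cutoff for illustration only, not the regulatory value


def top_code_age(age, cap=AGE_CAP):
    """Report ages at or above the cap as the cap itself."""
    return age if age < cap else cap


def unique_fraction(records, keys):
    """Fraction of records whose combination of `keys` values is unique."""
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    return sum(1 for c in combos if counts[c] == 1) / len(records)


# Made-up records; none of this is real data.
records = [
    {"zip": "55401", "dob": "1970-01-01", "age": 43},
    {"zip": "55401", "dob": "1970-01-01", "age": 43},
    {"zip": "55401", "dob": "1981-06-15", "age": 32},
    {"zip": "55402", "dob": "1922-03-09", "age": 91},
]

print(unique_fraction(records, ["zip"]))         # zip code alone
print(unique_fraction(records, ["zip", "dob"]))  # zip code plus date of birth
print([top_code_age(r["age"]) for r in records])
```

Running a check like `unique_fraction` before handing a data set to another department shows how quickly each extra field raises the share of people who can be singled out.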
The more data that you give them, the easier it's going to be to narrow down who that person might be from publicly accessible data. So we're at about five minutes to the hour here. I just want to make sure that either of you guys, if you have any upcoming projects that you're working on, can mention things that the audience should know about. Adam, you want to tell us a little bit about the book coming up? So myself and Erik Siegel, another pretty experienced eXist user, have been co-authoring a book called eXist for O'Reilly. It will be available in early access in mid to late January, and we hope to have it in print in the first couple of weeks of February. You have a lot of very detailed chapters on security, and I was very impressed by the technical content of that. Yes, the security chapter is probably one of the heavier ones in the book. I know how that feels. Good. So that's pretty cool. And if anybody's interested, there's a great conference in Prague in February called XML Prague. All the developers will be there, including myself. And if you're really desperate, hopefully we'll be giving away a few copies of the book there as well. That's a fantastic conference. My wife and I went a couple of years ago. We had a great time, and there are some really smart people there. So we love that. Michael, what are you up to in the next couple of months here? I don't have anything that I can personally plug, but I'll plug someone else's efforts, in that there is an Accumulo O'Reilly book under development right now. It's in early access. So if you want to learn more about Accumulo, how it works, and what you can and can't do with it, I highly encourage you to go to the O'Reilly website and get that early access edition. It was the e-book of the week, or something like e-book of the day, a little while back, and that might come back around. So keep your eyes on that if you're looking for a deal.
I think the full book isn't due out until next year, but the early chapters are there. If anyone has any follow-up questions, wants more information, or wants to bounce ideas off me, I'm available at Michael at sqrrl.com or on Twitter. All right. Thank you very much. You guys have been fantastic. I do want to thank you both for helping out and sharing your information. We will be posting show notes for this. I think, Mike, we have some good presentations of yours out there on the web, and Adam has some there too. We'll make sure to get those in there. With that, I'll turn it over to Shannon, and thank you guys very much. That was a great discussion, and thanks to our attendees for your fantastic questions. I just love it when everyone gets involved. Again, as Dan said, I will get that information out to you: the recording of the session, along with the slides and links to everything that has been mentioned throughout the session. Just so you know, there is a U.S. holiday on Thursday and technically Friday, so the email may not go out until end of day Monday. It goes out within two business days, but I'll try to get it out as early as possible for you. I hope everyone has happy holidays, and thanks so much for attending. Thanks to Adam and Michael and Dan for this great discussion. Thanks a lot, everybody. Take care, everybody. Bye-bye.