years, and I joined the Guardium team. So Guardium was a product that IBM acquired about three years ago, a really great company, market-leading data activity monitoring and protection software. And so I joined that about a year ago. And one of my very first assignments upon joining the team was to help the architect write requirements to support MongoDB. Now I'd heard a little bit about MongoDB, because I knew the DB2 guys were looking at it. Wasn't sure exactly what it was. Got involved in the project. Sundari, who's on my team, was already working with me on the Hadoop support. We are actually first to market with data activity monitoring for Hadoop. And she's like, I want to be on that project too. So I said, OK, great. So she and I started working on the MongoDB support. At that point, our big boss said, we have a client. We have a customer. He said, we need to get validated. And we need to get validated with 10gen so that they can start, it's a big financial client, you're going to hear more about it, so they can start moving MongoDB throughout other parts of the enterprise. So that's how the three of us came together. Sundari and Matt worked very closely on the phone for many hours, making sure that our solution would work with 10gen. So we're now validated. And then the three of us put together an article. And so we decided, what the heck? We're going to come here and talk about some best practices, some things you can think about doing. I'm going to try not to make it too product focused. You're going to hear a little product stuff, because that's how you'll see, when we get through here, how it all works. So for the agenda for today: I already talked a little bit about why we started working together. And then Matt's going to go into a little bit more of the use case of this particular client and why they chose MongoDB.
I'm going to go a little bit into the business drivers for data protection and a little bit of the architecture of our activity monitoring solution. And then we're going to go through six steps to kind of get you on the way to thinking about security and protection. And then hopefully there'll be time at the end. Sundari is going to do a live demo. If we do run out of time, there's a break after this, so people can gather around and we can do it for anybody who's interested. OK, so with that, I'm going to turn it over to Matt Kalan from 10gen. OK, there it goes. So yeah, this is only just in case people aren't familiar with MongoDB. It's not obviously the focus of the session, but just so you understand, because it'll be useful when it comes up later: MongoDB is a document database and gives you the ability to store document data structures, like on the right side here. Is there a mouse? Anyone see a mouse? Oh, there we go. Yeah, can I point? Oh, there we go. Point. Ah, great. So yeah, so you can see an example of a document data structure. And I'm sure many people here are familiar with it, because it's a NoSQL conference. There's not a predefined schema. There are things that people are accustomed to in a relational world that are introduced differently, that you don't get in really most NoSQL, or maybe all NoSQL, databases. And so that represents some challenges. And of course, in our work with Guardium, they do have some solutions and help for this. Also, MongoDB you access from a native API, so it's a little different. There's not a SQL layer there. And then, so you're aware, MongoDB has kind of primary-secondary replication. So there can be up to 11 nodes spread all over the world, so it really is a distributed database. And horizontal scalability as well. So within a cluster, you just keep adding more and more nodes. And it auto-partitions, or auto-shards. So many of these things are kind of different, and a cluster can get very large.
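To make the document-database point concrete, here is a minimal sketch of the kind of document structure being described. The field names and values are made up for illustration; note how the second document adds a field the first one lacks, since there is no predefined schema.

```python
import json

# A hypothetical customer document: nested structure and arrays live in
# one document instead of being spread across several joined tables.
customer = {
    "_id": "cust-1001",
    "name": {"first": "Ada", "last": "Lovelace"},
    "accounts": [
        {"type": "checking", "balance": 2500},
        {"type": "savings", "balance": 91000},
    ],
    "address": {"city": "New York", "country": "US"},
}

# Documents in the same collection need not share a schema: this one
# carries a field ("vip") that the first document does not have.
customer2 = {"_id": "cust-1002", "name": {"first": "Alan"}, "vip": True}

print(json.dumps(customer, indent=2))
```

This flexibility is exactly what makes catalog-based auditing harder later on: there is no schema to consult for "what fields exist."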
All of that maybe makes monitoring it a little harder. But just to give you a sense of kind of a snapshot of Mongo. Now here was the real case study that brought us together, just to see that NoSQL really is driving real, huge business value in many firms. The problem here was that this large investment bank, top three in size, took up to 36 hours in distributing their reference data globally. And what happened is they would get charged multiple times, because reference data would maybe start in New York when they loaded it from a Bloomberg or a Reuters of the world. And this is in capital markets, as it says, their investment bank. And Australia wouldn't get that data for so long, so they would ask Bloomberg or Reuters for the same data. So you have this large bank getting charged quadruple, maybe 10 times, for the same data. So that was a huge cost there. There were regulatory penalties for missing SLAs because this data took so long to spread. And there were some 20 distributed systems with the same data. And if you look at it graphically, this is really what they were doing: loading the data into New York. And they had batch processing. They were using ETL, but not doing the T in ETL. They were kind of just moving data along and not really transforming it much. And of course, each of these systems represents people dollars, hardware and license dollars. I talked about the penalties and other downstream issues of doing this, right? So their solution with Mongo, and this is the main one that got a large deal of some 1,000 servers sold, and this was maybe 50 of them or so, what happened is they decided, let's make a primary in New York, and let's just use real-time replication to send that data all over the world. Because actually, we're not doing the T, the transform. We're just replicating data. Why not just do it this way?
So if you look at the results and the benefits, they're planning on saving $40 million in costs and penalties over five years. So a really great business case. They only get charged once for the data. You get all the operational benefits of all the data in sync globally. And they have extra capacity to put more and more reference data onto the same platform, which is nice. So it's kind of a one-stop shop instead of 50 different reference data systems or so. And why Mongo was a great fit is the dynamic schema, as you see in many NoSQL databases, the fact that you could bring all this data together into one place and not have to manage the schema in yet another location. It's already managed upstream. Why do I want to manage it in another place when it's just a pass-through to getting reference data to all the applications built downstream? And other benefits, certainly, in that it's kind of a cache and a database. So you can read from those local replicas at in-memory speeds and such. So just so you know the context: this large customer, large to both of us, said we needed to work together, and it actually did help in general. Because now we can talk about really architecting a secure, compliant system. And it helped get the deal done, too, which is always nice. So back to Kathy, and I'll come back later. So I got this. I was reading, I started cruising the web about system administrators and the different system administrator personality types, et cetera. And I found this in one of Bob Cringely's columns, and I thought it was really appropriate. So that's there for you guys. But seriously, system administrators, hacking, I mean, there's all kinds of threats. We know about this. This is probably why most of you are in this room, because you're either in an industry that's regulated, that you have to comply with, or maybe you're just a good citizen.
But the point really is there are many more external threats, a lot of organized crime getting into it, national security issues. There are internal threats, just plain dumb things that people do, or giving too many people admin privileges, as we witnessed with the NSA. Compliance. And that's really probably the major reason why most people buy our solution: because there are a lot of mandates around an industry, like the Payment Card Industry Data Security Standard, which is actually an excellent standard in terms of actually protecting your data. Data privacy: I was thinking about MongoDB, and the fact that so much information is in a single document. You don't have, like with relational, where you could have a driver's license here and an address there, and it might be a little tougher to piece it together to actually violate a standard. But with data privacy, with everything in one document, maybe there's even more of an issue. I don't know, maybe. So these are just use cases of various breaches. This was hackers. You think, well, all these NoSQL systems are very new. Hacking isn't necessarily going to be the biggest issue. People aren't really targeting them yet. But the thing is, it's going to happen. As they become more mainstream, there are going to be a lot more opportunities for hackers to exploit them. SQL injection, well, we've heard about JSON injection, JavaScript injection. So that is something that could happen as well with the NoSQL databases. Unprotected test data. This is a case where even if you're working with test data, you need to de-identify it, or mask it, or whatever. A lot of the data privacy standards, and PCI, are very strict about that. So these are all things that you need to think about. OK, so this is very simple. Why do people need to comply? One, prevent data breaches. Because a data breach is not only expensive, if you have to pay fines or you have to mitigate the problem.
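As an aside on the JSON/JavaScript injection risk mentioned above, here is a minimal sketch of how a NoSQL "operator injection" can happen when untrusted input is dropped straight into a MongoDB-style query filter. The function and field names are hypothetical; the point is the shape of the attack and a simple type-check guard, not a real API.

```python
def build_login_filter(username, password):
    # Unsafe: if "password" arrives as a parsed JSON object such as
    # {"$gt": ""}, the filter matches any password, bypassing the check.
    return {"user": username, "password": password}

def build_login_filter_safe(username, password):
    # Safer: insist both values are plain strings, so smuggled query
    # operators like {"$gt": ...} can never become part of the filter.
    if not isinstance(username, str) or not isinstance(password, str):
        raise TypeError("credentials must be plain strings")
    return {"user": username, "password": password}

attack = {"$gt": ""}  # attacker-controlled value parsed from a JSON body
print(build_login_filter("admin", attack))  # the operator lands in the filter
```

The same idea applies to server-side JavaScript: anything attacker-controlled that ends up in a `$where` clause is effectively code injection.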
A breach is also very embarrassing, and it's very bad for your reputation as a business. Two, assure data governance. You want to make sure that internal users are not making inappropriate updates, or changing data to in some way invalidate or misrepresent your business. Finally, reduce the cost of doing all this, because it's not necessarily everybody's favorite topic. Compliance can be quite cumbersome. There's a lot of overhead. You need to work with auditors. You need to validate and prove that you're doing the right things. So you want a solution that's hopefully automated and well-integrated into your business process, and therefore reduces the overhead. You also probably don't want to get into the business of writing your own tools to scrape logs, read logs. And even if you do take on that additional overhead, you have the issue of separation of duties. If you have the same people who are managing the database also managing the audit data, then you have a real problem there, with the possibility of a privileged user modifying or hiding their tracks. So here's my exciting animated graphic. This is a little bit of the architecture. And the reason I'm going into this now is because it'll help you understand a little bit of the visibility that the system brings you. So the way it works is that the clients are issuing their Mongo calls. And in a sharded environment, they all go through the routing server, mongos. And that gets routed out to all these data shards. What we're doing is we have these little lightweight agents, called software taps, that are sitting on the data server. And they're basically doing a quick copy of the network message. The overhead on these servers is extremely low, because it's a very low-cost operation. It's sent over to this hardened appliance. There's no root access for anybody to this appliance. It could be a hardware or a software appliance.
And that's where all the heavy lifting takes place. So this is where the message is parsed. It is broken down into our repository there. And from there, the only way to access this is through reports, graphical or tabular reports. There's a quick search facility. Also, while it's processing this, you can be getting real-time alerts. So if you have a security policy, for example, that says a privileged user is not able to read certain sensitive data, you can generate an alert. And those alerts, of course, can be sent off to a SIEM system or other enterprise-wide monitoring tool. And as I mentioned, there is separation of duties, because this is all controlled by a separate administrator. Even that person doesn't have access to the root data, to the system where all that other data is stored. And finally, the solution has been validated by our friend Matt over here on behalf of 10gen. So here are the six steps that we're going to go through, and I'm going to go through these pretty quickly. Know where your sensitive data is. You don't necessarily want or need to monitor and protect absolutely everything. Evaluate the risk factors. Restrict access to need-to-know. So as much as you can, use the controls to limit the data to whoever really needs to know it. Encrypt sensitive data, so this is encrypting data at rest or on the wire. We're going to talk a little bit about that. Implement database activity monitoring. These are all things that are very common in many of the security standards. This is what gives you visibility and auditability. And you're going to see a little bit more about the visibility here and the benefits it can bring you. Because not only are you doing this for security, but you're going to see that you can also see what's going on. What new fields are being added? Is somebody using JavaScript? We now have visibility into what's going into the system that you may not have had before.
And then finally, you need to centralize this management. You can pass alerts or audit data to a centralized system such as QRadar or Tivoli or something like that for centralized management. So you don't necessarily want or have to monitor and audit everything. I saw earlier where somebody said, if you have sensitive data, put it into a separate collection. I think that's really good advice. One issue, of course, is that you may not have control. So you're going to have to set up some kind of control around what goes into your database, and where, and into which collections. I was thinking about this on the light rail the other day. And I thought to myself, how are people going to know? We don't right now have an automated discovery tool to go through and crawl like we do on some of the relational systems, where we can actually do discovery of patterns, like a credit card pattern or an email address, driver's license numbers, that kind of sensitive data. But what you could do is, with the monitoring solution, you can say, well, you know what? I know what all my fields are. This is my application that's using these fields. With monitoring, I can have that be a known set. Because it's not a fixed schema, right? I don't have a catalog I can go to and say, oh, MongoDB has these fields. Well, if somebody does insert a new kind of data, a new field, or extends a field, you could be made aware of that through these reports. So that might be a way to kind of at least get some level of control or understanding of what's going into the system. And again, try to think about your methodology for how to control sensitive data before you start building it. OK, step two, evaluate your risks and vulnerabilities. This is where you need to kind of prioritize the risk factors.
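The "known set of fields" idea from step one could be sketched like this: since MongoDB has no fixed schema or catalog to consult, you keep your own list of expected fields and flag anything new that shows up. The field names here are hypothetical.

```python
# Fields the application is known to write; anything else is schema drift.
KNOWN_FIELDS = {"_id", "name", "address", "card_number", "balance"}

def unexpected_fields(document):
    """Return any fields in a document that the application never defined."""
    return set(document) - KNOWN_FIELDS

incoming = {"_id": 7, "name": "Joe", "card_number": "4111-0000",
            "notes": "vip"}
print(unexpected_fields(incoming))  # flags the new "notes" field
```

In practice a monitoring product would surface this in a report rather than in code, but the comparison against a known set is the same.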
So obviously, what you're going to do is always make sure you keep in touch with the vendor and what their best practices for security are. And the 10gen guys have a really nice wiki. I think it's really excellent. And they have some really good advice on there about managing security. Are you using the default ports? Well, they say changing the default port to a non-default one isn't really going to buy you much, and I think they're right. With the port scanner tools out there, it might buy you five seconds. Make sure that you're using authentication, because it's not the default. With the new release of MongoDB, maybe take a look at their new roles, and Matt's going to talk about that. That'll help you control granularity of access. Make sure your IP bindings are correct, and you're not exposing anything to the outside world. And they also have security notifications. So make sure you're subscribed to whatever the security notifications are. And the next thing I want to talk about in terms of risk is a way to identify connections. So this is where you can determine: is this a risky connection, because it's something I don't know? Is this IP address known to me? Should I be letting this through? So I'm going to talk a little bit about that, and about a way to automate the process for reviewing those unknown connections. OK, so there are many types of risks. This is kind of a banking example. Who are the unauthorized users? For example, I may have an authorized user: the application user is allowed to access this database server. I mean, this is what it does. But is Joe, my privileged user, supposed to be accessing the private customer data? Probably not. IP addresses. So I know my application server IP. It's supposed to be coming in here, and this is correct, but this other one is not an IP that is allowed to connect to my system. Programs. So I know my application is using online banking. I know that's an authorized program.
Joe's using an export utility or something. He's not supposed to be using that here. So these are all the various risks and connections that can occur. So this is what we're going to call an unauthorized connection. We have user, IP, program name, et cetera. So with all of these together, you can sort of create this tuple. So if you have a tuple of source IP, client IP, username, and so on, you can actually start understanding things. If you know what your authorized connections are, you can put those into what we call a connection profiling list. Anything else is not a known connection. So here, for example, these are all connections that I don't know about. I can actually add those to a report and automatically forward those for review to the appropriate reviewers. So this is a way to kind of get a handle on who's connecting into your database, and the fact that we can automate that process for review and then add them back into the group. So the next day, I can update my security policy. These guys are going to be known at that point if I've approved them. So I approve them. They get added to the group. Everything's cool. The next day, I never have to think about it. I don't have to worry about it again. But if there's another unknown connection, I will be alerted. OK, step three, restrict access. So with the role methodology, one of the things you want to think about is who can access what data and what level of authority they need to have. And one of the things you're going to learn with Guardium is that if you have this plan in place, so-and-so can access this information, so-and-so can access that information, that will actually feed into how you build your security policies. So anytime I have a privileged user, that person cannot access this collection using this type of command. Those are the kinds of things that you can build into a security policy. So evaluate the current entitlements and then monitor new ones going forward.
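The connection-profiling idea above can be sketched in a few lines: model each connection as a tuple of (username, client IP, source program), keep a set of approved tuples, and queue anything unknown for review. All of the names and addresses here are hypothetical.

```python
# Connections that reviewers have already approved.
APPROVED_CONNECTIONS = {
    ("app_user", "10.1.2.3", "online_banking"),
    ("etl_user", "10.1.2.9", "nightly_load"),
}

def connections_needing_review(observed):
    """Return observed connections that are not in the approved profile."""
    return [conn for conn in observed if conn not in APPROVED_CONNECTIONS]

observed = [
    ("app_user", "10.1.2.3", "online_banking"),  # known: passes silently
    ("joe", "192.168.0.44", "export_utility"),   # unknown: gets reviewed
]
print(connections_needing_review(observed))
```

Once a reviewer approves a tuple, it goes into the approved set and stops appearing, which matches the "next day I never have to think about it" workflow described.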
So we're going to show you a little later how you can actually see whenever anybody adds a new user to your MongoDB system. You can get a report, and you'll see that as often as you like. And you can make sure that nobody's adding inappropriate users with inappropriate levels of authority or control. And if you're really desperate, you also have the ability to block. So you can have a security policy that says, if I have some unknown connection accessing, maybe, some highly sensitive data, and you don't know where these people are coming from, you can actually cut off the connection before the data is actually returned to those users. These are all capabilities that you can use to help restrict access. And with that, I'm going to turn it over to Matt, who's going to talk a little bit more about authentication. Perfect. Yeah, just so you are aware of what's specifically in MongoDB. And I think most people understand that in NoSQL, a lot of the security features are fewer than in relational, because relational is a 30-year-old technology and almost all the features you can imagine have been built into it. So I'm going to review what's in MongoDB. And it's definitely simpler. So your two options really, currently in version 2.4.x, are basic username and password. There's a hash stored in the database, so at least it's not the password stored in the database. But then, certainly in our enterprise version, most large enterprises want to centrally manage their passwords. So instead of the passwords being in Mongo, we can integrate with a key distribution center with Kerberos over SASL. And so, most importantly, enterprises can still manage their usernames and passwords centrally in their permissioning system. You can enforce whatever special characters are required, or three failed tries for a password locks the person out. So that has been a nice win for enterprises, to centrally manage things. And in 2.6, which is currently planned for Q4, we'll have some LDAP integration as well for authentication.
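As a rough illustration, not official guidance, enabling authentication (which, as noted above, is off by default) and Kerberos on a 2.4-era MongoDB Enterprise mongod might look something like this in a flag-style config file. The bind address is a made-up example, and option names may differ by version and build:

```
# Authentication is off by default; turn it on.
auth = true
# Bind only to an internal interface rather than all interfaces.
bind_ip = 10.1.2.3
# Enterprise Kerberos (GSSAPI over SASL) instead of local passwords.
setParameter = authenticationMechanisms=GSSAPI
```

In the 2.4-era Enterprise setup, the Kerberos keytab was typically supplied through the KRB5_KTNAME environment variable when starting mongod; check the documentation for your exact version.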
And in terms of authorization, as Kathy alluded to, there are roles that are fairly coarse. I'll go through them in a second. There's not a super-user in MongoDB. It's really a segregation of roles, which is kind of a better practice, I think, in that you don't have root access to everything. In the database, you add roles together if you want to grant that to a certain user. And I'll discuss that. And then custom roles are planned also for later in the year. And so you know what the roles are that are currently available, and it's pretty coarse: we don't enable authorization on a document or record level. It's really at a database level. So per database, you can give a user or an application read-only or read-write access. And then you have a database administrator and a user administrator. A DBA can't read and write. So that's where the segregation of roles comes in. And that's why you can combine these roles together. And then, so this is just for within that database, if you want to deal with things at the node or the server level, like adding to a replica set or sharding and partitioning across nodes, you set that permission in the admin database. You can also do roles that go across all databases, like giving a user DBA access for all the databases on that server, or user admin for all databases on that server. And the same with read and read-write. So here's how that looks for a larger enterprise. It's simpler for smaller ones, right? There might be two people that manage that. But in an enterprise, let's say you have a central permissioning group, you'd give those people user-admin-any-database permissions. But that's all they can do. So again, they're only given the access they need. You might give one DBA access to be able to configure the cluster. And maybe he can create indexes and collections and things on each individual database. But you can also delegate access for each application.
You might have one database within a larger server. That's the kind of nomenclature in Mongo: you can have a server with multiple databases, and the collections, or tables, are within each database. So for each application, you might give it its own database, with read-write access from the application in production. For developer users, only in Dev and Test, you might give them read-write. And then for a DBA just for that application, you might give them DBA access just for that database. So that's just to kind of put it in context and give an example of the fairly coarse authorization in Mongo. And to that end, if you want more single-record authorization, that really gets built into a data access layer. It's kind of up to the customer or the user to build a data access layer. I would not have multiple applications go to MongoDB directly if you needed to put a lot of authorization in there. I'd recommend a well-encapsulated data access layer. And this is kind of a common paradigm with Mongo in general, in fact. Put an API in front. So the applications all go through one common access layer. And there, you can have governance and real processes around giving access. You probably don't want to expose all of those applications straight to the database, OK? Go ahead. Right now, yeah. We are aiming, I think in 2.6, oh, sorry. The question was, do we give authorization at a collection level? I believe in 2.6, towards the end of the year, we will. But yeah, not right now. It's just at the database level. OK, I'll hand it back over. OK, so here's what I was leading to earlier. Here's an example of a report you might want to automate. It's kind of nice. So who created the user? In this case, for example, we see Sundari as the username. And so she created these users. And also, if you get a more detailed message, you can actually see the roles that they were assigned, that Matt was just referring to.
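To make the per-database role layout concrete, here is a sketch of the kinds of assignments just described. The role names (read, readWrite, and the any-database administrative roles) are MongoDB's 2.4-era names; the user and database names are invented for illustration, and a real deployment would create these with MongoDB's user-management commands rather than a Python list.

```python
users = [
    # Central permissioning team: manages users on any database, nothing else.
    {"user": "perm_team", "db": "admin", "roles": ["userAdminAnyDatabase"]},
    # Cluster DBA: database administration everywhere, but no data access.
    {"user": "cluster_dba", "db": "admin", "roles": ["dbAdminAnyDatabase"]},
    # The application gets read-write on its own database only.
    {"user": "trading_app", "db": "trading", "roles": ["readWrite"]},
    # An auditor gets read-only access to that same database.
    {"user": "auditor", "db": "trading", "roles": ["read"]},
]

def writers_for(database):
    """List users holding read-write access on a given database."""
    return [u["user"] for u in users
            if u["db"] == database and "readWrite" in u["roles"]]

print(writers_for("trading"))
```

Note how no single entry combines data access with user administration; that separation is the "segregation of roles" point made above.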
And again, that can be completely automated using the scheduling in what we call an audit process builder. You schedule this process to run periodically. You can add who the reviewers are. Do they get a PDF, CSV, or whatever, or a direct link to the report? So these people can review, and whatever the ask is: do they have to review? Do they have to sign off? And here is the actual report that you can run. And so here's just an example. I set up this audit process all by myself, which is, of course, a miracle. Anyway, user privileges are on here. I just got a message in my Gmail account: here's my report. And I can just click on it. And then I can go in and review it and approve it or whatnot. So this is a good way to kind of keep track of what's happening in the system in terms of automating any of these reports, not just roles. So obviously, encrypting data is sort of a basic, fundamental thing. Most of us know what encryption is. What we're going to talk a little bit about here is encrypting data at rest, and we're using file-level encryption for that. And on the wire: MongoDB does now support SSL between the client and the servers, and actually inter-server as well. Here is one of our offerings. I know that they had mentioned that they partner with Gazzang for encryption. This is the Guardium offering as well. And it's a file system agent, again, a very similar architecture, in a way, to our Guardium DAM, the Data Activity Monitoring. It's got a lightweight agent that's sitting on the server. We have a separate hardened kind of console here where the key management and security policies take place. And this is really completely heterogeneous. I mean, you can use it on any databases and applications, et cetera, because it's sitting at the file system level. So anyway, this is a recommended process. It guards against root access, the tape falling off the truck, et cetera. And it's very low overhead on the data servers as well.
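For the on-the-wire piece, a 2.4-era mongod built with SSL support could be pointed at a certificate roughly like this (flag-file style; the path is an example, and option names changed in later versions):

```
# Require SSL on the regular listening port.
sslOnNormalPorts = true
# Combined certificate and private key in PEM format (example path).
sslPEMKeyFile = /etc/ssl/mongodb.pem
```

Clients and drivers then need to connect with SSL enabled as well; encryption at rest, as described above, is handled separately at the file-system level.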
OK, so a little bit about Data Activity Monitoring, which is maybe the core of what I came here to talk about. So this is actually a step that is required in many of the standards. So you need to have a detailed, verifiable audit trail of the database activities. Some examples of the things most people need to monitor at some point are user activities, most specifically privileged users. That's something that a lot of people need to do. You can look at user creation and object creation and manipulation. So you can see who's inserting, deleting, who's dropping collections, who's adding collections, who's changing data. And the thing that I think is of particular interest is we had read that arbitrary server-side JavaScript can cause security risks. So you will actually be able to see the use of that JavaScript in our reports. So that might be a handy little thing if you're doing code reviews or anything like that, or you're trying to QA a new application. Just make sure that nobody's using this practice, or at least make sure that they're using it safely. Again, gaining visibility into all database activity involving sensitive data. So again, you're not necessarily going to monitor everything. You're going to really want to focus on the high-priority sensitive data. So who, and what, and when, and how they're doing it. So you're going to get all this activity. And in real time, you can generate real-time alerts for anything that's outside the normal. Any anomalous activity, any unknown connections, anybody who's doing something they're not supposed to do, you can generate real-time alerts. And again, I mentioned briefly that we do have a pretty nice workflow process so that you can disseminate these results. Each level of review in the process can be signed off on, and that's part of the audit trail. So you're able to keep that, store it.
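The server-side JavaScript risk mentioned above can be checked for mechanically: walk a captured query filter and report any `$where` clause (MongoDB's server-side JavaScript operator). This is a minimal sketch; the sample queries are hypothetical, and a monitoring product would do this against parsed wire traffic rather than Python dicts.

```python
def uses_server_side_js(query):
    """Recursively detect $where (server-side JavaScript) in a filter."""
    if isinstance(query, dict):
        return "$where" in query or any(uses_server_side_js(v)
                                        for v in query.values())
    if isinstance(query, list):
        return any(uses_server_side_js(v) for v in query)
    return False

queries = [
    {"account": "12345", "status": "open"},
    {"$or": [{"a": 1}, {"$where": "this.credits > this.debits"}]},
]
print([uses_server_side_js(q) for q in queries])  # [False, True]
```

The same kind of scan is handy in code review: flag `$where` (and, by extension, map-reduce with untrusted input) and confirm each use is deliberate and safe.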
Once you get the report and all the audit trail and all the signatures, saying that everybody's reviewed it and done what they need to do, you can offload that report and keep it for as long as required by your organization for meeting your compliance requirements. Here are just a couple of examples, just to see the kind of visibility that you get. So here in MongoDB is a simple credit card find. And so you can again see here we have the time, we have the client IP, the server IP, the username of who issued it. This column is called SQL, but these reports are completely modifiable, so you could call it detailed message or whatever you want to call it if you don't like that. And the object: basically, what this is saying, since we actually parsed this message out, is that the collection is the object name and the verb is find, which is the command. So you can actually do things like create security policies that are built around a particular object or action, say, what to do if I get a find or if I get a delete or whatever. So this is all kind of based on what is actually flowing across the wire using the MongoDB wire protocol. Here's an example of an update. Here's something that we just added. This is basically a kind of ad hoc quick search capability, so you can see over on the left we have the histogram, or the faceted search. So I can say my DB type is Mongo, and I want to see who dropped something. So somebody calls me up and says somebody just dropped one of our collections, what the heck's going on? So you can get in here and say, OK, select the DB type Mongo, select the verb drop, and then I can see the filtered result, and I can see here that somebody dropped my credit card object, and when they did it, and who did it. So all that would be in the result of my search. OK, so that's like the super high level of the activity monitoring.
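The drill-down just described boils down to this: once the wire protocol is parsed, every event carries an object (the collection) and a verb (the command), so "who dropped my credit card collection?" becomes a simple filter. The audit records below are invented examples of what such parsed events might look like.

```python
audit_log = [
    {"user": "app_user", "client_ip": "10.1.2.3",
     "object": "creditcard", "verb": "find"},
    {"user": "joe", "client_ip": "192.168.0.44",
     "object": "creditcard", "verb": "drop"},
    {"user": "app_user", "client_ip": "10.1.2.3",
     "object": "accounts", "verb": "update"},
]

def facet(records, **criteria):
    """Filter records by exact field values, like the faceted search."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

print(facet(audit_log, verb="drop"))  # who dropped something, and from where
```

Security policies work on the same parsed fields: a rule keyed on object "creditcard" and verb "find" for a privileged user is just another criteria match.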
Again, I didn't want to make this a pitch about activity monitoring, so if anybody wants more details on that, I'll be happy to talk to you more about it. And again, I mentioned the ability to send alerts. So we have the ability that all those alert messages, like privileged user access to credit card data, that alert, you don't have to sit there and monitor the Guardium incident management thing. You don't even have to have it sent to your own email. If you want to, you can have it sent to your enterprise monitoring system that you have. And so we support basically four formats, so this can go to any enterprise monitoring system or SIEM system that supports those message formats; we can send it to that. And the nice thing about that is you get this sort of enterprise view. So any of the alerts for any of the databases that we support, NoSQL, data warehouses, Hadoop, any alert from any of those, because our security policies, all of this, is cross-database. It's not just Mongo. It's not just whatever. So all of this is cross-database. You get an alert from any of those systems that gets generated from here, and it gets sent to your centralized management system. And also the nice thing about that is something like QRadar, with the security intelligence, can correlate it with other things that are happening in the system, like port attacks or anything like that, and see if there's something bigger going on. And one of the things about breaches is most people don't know they're happening, and they'll go months before it's even discovered. And when it is discovered, it's usually discovered by somebody, a customer, or somebody that you really don't want to have discover it. So it's nice to have this sort of heads-up before things get out of control. And just a brief thing about the scalability. You're dealing with a scalable platform in Mongo. Guardium is also scalable.
So this is just an individual collector. So you may have a certain number of nodes that can actually send data to one collector, but you can use an aggregator to kind of federate the view across all those different collectors. And again, many of our customers are using Guardium across international boundaries as well. And I wanted to mention, as I did mention earlier, that you want to be able to use common security policies across systems. You want to be business driven. You don't want to have to do this database versus that database versus that database. Security policies are usually driven by the security organization and what your business requires. And so we do have a common platform that works across many, many different systems: NoSQL, many of the Hadoop distributions, traditional warehouses. Here's a little bit of that, the international kind of look, with MongoDB, which we're really proud of, doing this work with the 10gen guys. It's been really a great experience. And so that's that. So I wanted to mention that the three of us, as I mentioned, did a pretty detailed article on how to set all this stuff up and some use cases, how to do alerts, how to do the search. So that's on developerWorks. We did a webcast together, I think it was in May, end of May, and that's on there as well. And an ebook, and I oxymoronically have here several copies of the ebook, so it's on paper. If anybody's really interested in taking a copy, I only have a few. But for those of you who are really interested, you're welcome to it. It's about considerations for NoSQL security. We do support more than just Mongo. And I hope you all will think about staying. We do have a couple extra minutes. Sundari is going to do a demo. Are you going to come up here to do that? So we can take questions while she's getting set up. Questions for me or Matt? Matt?