Hello everybody, and thank you for joining us today for the virtual Vertica BDC 2020. Today's breakout session is entitled End-to-End Security in Vertica. I'm Paige Roberts, Open Source Relations Manager at Vertica, and I'll be your host for this session. Joining me are Vertica software engineers Fennec Fox and Chris Morris. Before we begin, I encourage you to submit your questions or comments during the virtual session. You don't have to wait until the end. Just type your question or comment in the question box below the slides as it occurs to you and click submit. There will be a Q&A session at the end of the presentation, and we'll answer as many questions as we're able to during that time. Any questions that we don't address, we'll do our best to answer offline. You can also visit the Vertica forums to post your questions there after the session. Our team is planning to join the forums to keep the conversation going, so it'll be just like being at a conference and talking to the engineers after the presentation. Also, a reminder that you can maximize your screen by clicking the double-arrow button in the lower right corner of the slides. And before you ask: yes, this whole session is being recorded, and it will be available to view on demand this week. We'll send you a notification as soon as it's ready. I think we're ready to get started. Over to you, Fennec. Hi, welcome everyone. My name is Fennec. My pronouns are fae/faer, and Chris, who will be presenting the second half, uses he/him. So to get started, let's go over the goals of this presentation. First off, no two deployments are the same, so we can't give you one exact right way to secure Vertica. How easy a given measure is to set up in your deployment is one factor, but the biggest one is your threat model. If you don't know what a threat model is, let's take an example: we're all working from home because of the coronavirus, and that introduces certain new risks.
Our source code is on our laptops at home, that kind of thing. But really, our threat model isn't that people will read our code over our shoulders and copy it; it's more that a laptop could be lost or stolen. So we've encrypted our hard disks and that kind of thing to make sure that no one can get in. So basically, what we're going to give you are building blocks, and you can pick and choose the pieces that you need to secure your Vertica deployment. We hope this gives you a good foundation for how to secure Vertica. Now, what we're going to talk about: we're going to start off by going over encryption, how to secure your data from attackers. Then authentication, which is how you log in; identity, which is who you are; authorization, which is, now that we know who you are, what can you do; delegation, which is about how Vertica talks to other systems; and then auditing and monitoring. So how do you protect your data in transit? Vertica makes a lot of network connections, and here are the important ones. Clients talk to the Vertica cluster. The Vertica cluster talks to itself, it can also talk to other Vertica clusters, and it can make connections to a bunch of external services. So first off, let's talk about client-server TLS. This is how you secure data between Vertica and its clients. It prevents an attacker from sniffing network traffic and, say, picking out sensitive data. Clients have a way to configure how strictly they authenticate the server; it's called the client SSL mode. We'll talk about this more in a bit, but authentication methods can also disable non-TLS connections, which is a pretty cool feature. Vertica also makes a lot of network connections within itself. So if Vertica is running behind a strict firewall and you have really good network security, both physical and software, then it's probably not super important that you encrypt all traffic between nodes. But if you're on a public cloud, you can set up AWS's firewall features to prevent connections.
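As an aside, the "reject non-TLS connections" idea just mentioned can be expressed with Vertica authentication records. This is only a sketch; the record names are made up, and you should check the docs for your version's exact syntax:

```sql
-- Refuse any client that arrives without TLS (record names are illustrative):
CREATE AUTHENTICATION reject_no_tls METHOD 'reject' HOST NO TLS '0.0.0.0/0';
GRANT AUTHENTICATION reject_no_tls TO PUBLIC;

-- Password logins are still fine, but only over TLS:
CREATE AUTHENTICATION pwd_over_tls METHOD 'hash' HOST TLS '0.0.0.0/0';
GRANT AUTHENTICATION pwd_over_tls TO PUBLIC;
```

The HOST TLS versus HOST NO TLS distinction is what lets you treat encrypted and unencrypted connections differently.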
But if there's a vulnerability in that firewall setup, then your data is totally vulnerable. So it's a good idea to set up internode encryption in less secure situations. Next, import-export is a good way to move data between clusters. So for instance, say you have an on-premises cluster and you're looking to move to AWS. Import-export is a great way to move your data from your on-prem cluster to AWS, but that means that the data is going over the open Internet. And that is another case where an attacker could try to sniff network traffic and pull out credit card numbers or whatever sensitive data you have stored in Vertica. So it's a good idea to secure data in that case. We also connect to a lot of external services: Kafka, Hadoop, and S3 are three of them, and Voltage SecureData, which we'll talk about more in a sec, is another. Because of how each service deals with authentication, how you configure your authentication to them differs, so see our docs. Then I'd like to talk a little bit about where we're going next. Our main goal at this point is making Vertica easier to use. Our first objective with security was to make sure everything could be secured, so we built relatively low-level building blocks. Now that we've done that, we can identify common use cases and automate them, and that's where our attention is going. So we've talked about how to secure your data over the network, but what about when it's on disk? There are several different encryption approaches, and each depends on what your use case is. RAID controllers and disk encryption are mostly for on-prem clusters, and they protect against media theft. They're invisible to Vertica. S3 and GCP encryption are the equivalent in the cloud; they're also invisible to Vertica. And then there's field-level encryption, which we accomplish using Voltage SecureData, which is format-preserving encryption. So how does Voltage work? Well, it encrypts values to things that look like they're in the same format.
So for instance, you can see a date of birth encrypted to something that looks like a date of birth, but is not in fact the same thing. You can do cool stuff, like with a credit card number: you can encrypt only the first 12 digits, allowing users to validate the last four. The benefits of format-preserving encryption are that it doesn't increase database size and you don't need to alter your schema or anything. And because referential integrity is preserved, it means that you can do analytics without unencrypting the data. So again, here's a little diagram of how you could work Voltage into your use case. You could even combine it with Vertica's row and column access policies, which Chris will talk about a bit later, for even more customized access control, depending on your use case and your Voltage configuration. We are enhancing our Voltage integration in several ways in 10.0, and if you're interested in Voltage, you can go see their virtual BDC talk. And then again, talking about the roadmap a little, we're working on in-database encryption at rest. What this means is a Vertica solution to encryption at rest that doesn't depend on the platform that you're running on. Encryption at rest is hard. Encrypting, say, 10 petabytes of data is a lot of work. And once again, the theme of this talk is that everyone has a different key management strategy and a different threat model, so we're working on designing a solution that fits everyone. If you're interested, we'd love to hear from you; contact us on the Vertica forums. All right, next up we're going to talk a little bit about access control. So first off: how do I prove who I am? How do I log in? Vertica has several authentication methods, and which one is best depends on your deployment size and use case. Again, the theme of this talk is that what you should use depends on your use case. You can order authentication methods by priority and origin. So for instance, you can allow connections only from within your internal network.
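Priority-and-origin ordering might be sketched with authentication records along these lines. The names, network ranges, and LDAP settings are all illustrative, and the exact LDAP parameters vary by setup, so treat this as a sketch and check the docs:

```sql
-- The internal network may use Vertica's built-in password authentication:
CREATE AUTHENTICATION internal_pwd METHOD 'hash' HOST '10.0.0.0/8';
ALTER AUTHENTICATION internal_pwd PRIORITY 10;
GRANT AUTHENTICATION internal_pwd TO PUBLIC;

-- Everyone else must come in over TLS and authenticate against LDAP:
CREATE AUTHENTICATION external_ldap METHOD 'ldap' HOST TLS '0.0.0.0/0';
ALTER AUTHENTICATION external_ldap SET
    host = 'ldap://ldap.example.com',
    basedn = 'dc=example,dc=com';
ALTER AUTHENTICATION external_ldap PRIORITY 5;
GRANT AUTHENTICATION external_ldap TO PUBLIC;
```

Higher priority wins when multiple records match a connection, which is what lets you relax rules for internal origins while keeping external ones strict.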
You can also enforce TLS on connections from external networks but relax that for connections from your internal network, that kind of thing. So we have a bunch of built-in authentication methods, and they're all password-based. User profiles allow you to set complexity requirements for passwords, and you can even reject non-TLS connections, say, or reject certain kinds of connections. Built-in authentication should really only be used by small deployments, because if you're a larger deployment, you probably have an LDAP server where you manage users. Rather than duplicating the passwords and users that are all in LDAP, you should use LDAP authentication, where Vertica still has to keep track of users, but each user then authenticates through LDAP, so Vertica doesn't store the password at all. The client gives Vertica a username and password, and Vertica then asks the LDAP server: is this a correct username and password? The benefits of this are many, but for one, if you delete a user from LDAP, you don't need to remember to also delete their Vertica credentials; they simply won't be able to log in anymore, because they're not in LDAP anymore. If you like LDAP but you want something a little bit more secure, Kerberos is a good idea. Similar to LDAP, Vertica doesn't keep track of who's allowed to log in; it just keeps track of the Kerberos credentials, and Vertica never even touches the user's password. Users log in to Kerberos and then pass Vertica a ticket that says they can log in. It is more complex to set up, so if you're just getting started with security, LDAP is probably a better option, but Kerberos is, again, a little bit more secure. If you're looking for something that works well for applications, certificate authentication is probably what you want. Rather than hard-coding a password or storing a password in a script that you use to run an application, you can instead use a certificate.
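To make two of those ideas concrete, here is a sketch of a password profile and of a certificate-based authentication record. The profile limits, the record names, and the app_service account are all illustrative, not from the slides:

```sql
-- Password complexity via a user profile (limit values are illustrative):
CREATE PROFILE strict_pwd LIMIT
    PASSWORD_MIN_LENGTH 12
    PASSWORD_MIN_SYMBOLS 1
    PASSWORD_MIN_DIGITS 1
    FAILED_LOGIN_ATTEMPTS 5;
ALTER USER alice PROFILE strict_pwd;

-- Certificate authentication for an application account: the client
-- presents a TLS certificate instead of sending a password.
CREATE AUTHENTICATION app_cert METHOD 'tls' HOST TLS '0.0.0.0/0';
GRANT AUTHENTICATION app_cert TO app_service;
```

With a setup like this, the application never stores a password at all; it only needs access to its certificate and key files.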
If you ever need to change the certificate, you can just replace it on disk, and the next time the application starts, it picks that up and logs in. And then multi-factor authentication is a feature request we've gotten in the past. It's not built into Vertica, but you can do it using Kerberos. Security is a whole-application concern, and fitting MFA into your workflow is all about fitting it in at the right layer, and we believe that layer is above Vertica. If you're interested in more about how MFA works and how to set it up, we wrote a blog on how to do it. And now over to Chris for more on identity and authorization. Thanks, Fennec. Hi everyone, I'm Chris. So we're a Vertica user and we've connected to Vertica, but once we're in the database, who are we? What are we? In Vertica, the answer to that question is principals: users, and roles, which are like groups in other systems. Since roles can be enabled and disabled at will, and multiple roles can be active, they're a flexible way to use only the privileges you need in the moment. For example, here you've got Alice, who has dbadmin as a role, and those are some elevated privileges. She probably doesn't want them active all the time, so she can set the role when she needs it, adding those privileges to her identity set. All this information is stored in the catalog, which is basically Vertica's metadata storage. How do we manage these principals? Well, it depends on your use case, right? If you're a small organization, or maybe only some people or services need Vertica access, the solution is just to manage it with Vertica. You can see some commands here that will let you do that. But what if we're a big organization and we want Vertica to reflect what's in our centralized user management system? It's a similar motivating use case to LDAP authentication, right? We want to avoid duplication hassles; we just want to centralize our management. In that case, we can use Vertica's LDAP Link feature.
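The Alice example might look like this in SQL, a minimal sketch using standard Vertica role statements:

```sql
-- Granted once by an administrator:
GRANT dbadmin TO alice;

-- In Alice's session, only when she needs the elevated privileges:
SET ROLE dbadmin;   -- adds dbadmin's privileges to her identity set
-- ...administrative work...
SET ROLE NONE;      -- back to ordinary privileges
```

Because roles are enabled per session, Alice carries the elevated privileges only for the moments she actually needs them.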
So with LDAP Link, principals are mirrored from LDAP. They're synced in a configurable fashion from LDAP into Vertica's catalog. What this does is manage creating and dropping users and roles for you, and then mapping the users to the roles. Once that's done, you can do any Vertica-specific configuration on the Vertica side. It's important to note that principals created in Vertica this way support multiple forms of authentication, not just LDAP. This is a separate feature from LDAP authentication, and if you created a user via LDAP Link, you could have them use a different form of authentication, Kerberos for example; it's up to you. Now of course, this kind of system is pretty mission-critical, right? You want to make sure you get the right roles and the right users and the right mappings in Vertica. So you probably want to test it, and for that we've got new and improved dry-run functionality in 9.3.1. What this feature offers you is new meta-functions that let you test various parameters without breaking your real LDAP Link configuration. So you can mess around with parameters in the configuration as much as you want, and you can be sure that all of that is strictly isolated from the live system. Everything is kept separate, and when you use this, you get some really nice output through a data collector table. You can see some example output here. It runs the same logic as the real LDAP Link and provides detailed information about what would happen. Check the documentation for specifics. All right, so we've connected to the database and we know who we are, but now what can we do? For any given action, you want to control who can do it, right? So what's the question you have to ask? Sometimes the question is just: who are you? It's a simple yes-or-no question. For example, if I want to create a user, the question I have to ask is: am I the superuser? If I'm the superuser, I can do it. If I'm not, I can't.
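Circling back to the dry-run functionality for a second, a sketch of that workflow might look like the following. The server, bind DN, and password here are placeholders, and the function and table names are taken from the 9.3.1-era documentation, so verify them against your version:

```sql
-- Test connecting, searching/scanning, and the full sync, all in isolation
-- from the live LDAP Link configuration:
SELECT LDAP_LINK_DRYRUN_CONNECT(
    'ldap://ldap.example.com', 'cn=admin,dc=example,dc=com', 'secret');
SELECT LDAP_LINK_DRYRUN_SEARCH_AND_SCAN(
    'ldap://ldap.example.com', 'cn=admin,dc=example,dc=com', 'secret');
SELECT LDAP_LINK_DRYRUN_SYNC(
    'ldap://ldap.example.com', 'cn=admin,dc=example,dc=com', 'secret');

-- The detailed "what would happen" output lands in a data collector table:
SELECT * FROM v_monitor.ldap_link_dryrun_events;
```

Nothing here touches the live configuration; the dry-run functions take their parameters as arguments instead of reading the real settings.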
But sometimes the actions are more complex, and the question you have to ask is more complex: does the principal have the required privileges? If you're familiar with SQL privileges, there are things like SELECT and INSERT, and Vertica has a few of its own. The key thing here is that an action can require specific, and maybe even multiple, privileges on multiple objects. For example, when selecting from a table, you need USAGE on the schema and SELECT on the table, and there are some other examples here. So where do these privileges come from? Well, if the action requires a privilege, these are the only places privileges can come from. The first source is implicit privileges, which can come from owning the object or from special roles, which we'll talk about in a sec. Explicit privileges are basically the SQL-standard grant system: you can grant privileges to users or roles, and optionally those users and roles can grant them further downstream; that's discretionary access control. Explicit privileges come from the user and the active roles, so the whole identity set. And then we've got Vertica-specific inherited privileges, which come from the schema, and we'll talk about that in a sec as well. So these are the special roles in Vertica. The first role is dbadmin. This isn't the dbadmin user; it's a role, and it has specific elevated privileges. You can check the documentation for the exact privileges, but it's less than the superuser. The pseudosuperuser role can do anything the real superuser can do, and you can grant this role to whomever you like. The dbduser role can run Database Designer functions. The sysmonitor role gives you some elevated auditing permissions, and we'll talk about that later as well. And finally, public is a role that everyone has all the time, so anything you want to allow for everyone, attach to public. Now imagine this scenario: I've got a really big schema with lots of relations.
Those relations might be changing all the time, but for each principal that uses this schema, I want the privileges for all the tables and views there to be roughly the same. Even though the tables and views come and go, an analyst, for example, might need full access to all of them, no matter how many there are or what they are at any given time. So to manage this, the first approach I could use is to remember to run grants every time a new table or view is created. And not just me, but everyone using the schema. So not only is it a pain, it's hard to enforce. The second approach is to use schema-inherited privileges. In Vertica, schema grants can include relation privileges, for example SELECT or INSERT, which normally don't mean anything for a schema, but do for a table. If a relation is marked as inheriting, then the schema's grants to a principal, for example salespeople, also apply to the relation. You can see in the diagram here how USAGE, and technically SELECT, apply to the schema, and on the sales.food table, SELECT then also applies. Now, instead of lots of grant statements from multiple object owners, we only have to run one ALTER SCHEMA statement and three grant statements. And from then on, anytime you grant or revoke privileges on the schema to or from a principal, all your new tables and views will get them automatically; it's dynamically calculated. Now, of course, part of setting this up securely is knowing what's happening and what's going on. So to monitor these privileges, there are three system tables you'll want to look at. The first is grants, which will show you privileges that are active for you, that is, your user and active roles, and theirs, and so on down the chain. grants will show you the explicit privileges, and inherited_privileges will show you the inherited ones. And then there's one more, inheriting_objects, which will show all tables and views that inherit privileges.
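A rough sketch of that setup, using the schema and role from the slide plus the three monitoring tables (the exact statements may vary by version, so check the docs):

```sql
-- New relations created in the schema will inherit schema privileges:
ALTER SCHEMA sales DEFAULT INCLUDE SCHEMA PRIVILEGES;

-- Relation-level privileges granted once, at the schema level:
GRANT USAGE ON SCHEMA sales TO salespeople;
GRANT SELECT ON SCHEMA sales TO salespeople;

-- An existing table can be marked as inheriting explicitly:
ALTER TABLE sales.food INCLUDE SCHEMA PRIVILEGES;

-- And the three system tables for monitoring the result:
SELECT * FROM grants WHERE grantee = 'salespeople';
SELECT * FROM inherited_privileges;
SELECT * FROM inheriting_objects;
```

From here on, granting or revoking on the schema is enough; inheriting tables and views pick up the change automatically.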
So that's useful not so much for seeing the privileges themselves, but for managing inherited privileges in general. And finally, how do you see all privileges from all these sources in one go? You want to see them together? Well, there's a meta-function, added in 9.3.1, called get_privileges_description, which, given an object, will sum up all the privileges the current user has on that object. I'll refer you to the documentation for usage and supported types. Now, the problem with SELECT is that it lets you see everything or nothing: you can either read the table or you can't. But what if you want some principals to see a subset, or a transformed version, of the data? So, for example, I have a table with personnel data, and different principals, as you can see here, need different access levels to sensitive information, social security numbers in this case. Well, one thing I could do is make a view for each principal, but I could also use access policies, and access policies can do this without introducing any new objects or dependencies. They centralize your restriction logic and make it easier to manage. So, what do access policies do? Well, we've got row and column access policies. Row access policies will hide rows, and column access policies will transform column data, depending on who's doing the selecting. So, the data is transformed, as we saw on the previous slide, to look as requested. Now, you can still modify the data only if the access policies let you see the raw data. The implication of this is that when you're crafting access policies, you should only use them to refine access for principals that need read-only access. That is, if you want a principal to be able to modify the data, the access policies you craft should let through the raw data for that principal. So, in our previous example, the loader service should be able to see every row and should be able to see untransformed data in every column. And as long as that's true, it can continue to load into this table.
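To make that concrete, here is a sketch of the meta-function call and of column and row access policies for a personnel-style table. The table, column, and role names are all illustrative:

```sql
-- All of the current user's privileges on one object, in one go:
SELECT GET_PRIVILEGES_DESCRIPTION('table', 'hr.personnel');

-- Column policy: the loader and HR see real SSNs, everyone else a mask.
CREATE ACCESS POLICY ON hr.personnel FOR COLUMN ssn
CASE WHEN ENABLED_ROLE('loader')   THEN ssn   -- raw data so loads still work
     WHEN ENABLED_ROLE('hr_admin') THEN ssn
     ELSE '***-**-' || RIGHT(ssn, 4)          -- masked, last four visible
END
ENABLE;

-- Row policy: loader and managers see every row, others only their own.
CREATE ACCESS POLICY ON hr.personnel FOR ROWS
WHERE ENABLED_ROLE('loader')
   OR ENABLED_ROLE('manager')
   OR employee_name = CURRENT_USER
ENABLE;
```

Note how the loader role is deliberately passed the raw data in both policies, matching the read-only guidance above.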
All of this is, of course, monitorable through a system table, in this case access_policy. Check the docs for more information on how to implement these. All right, that's it for access control. Now, on to delegation and impersonation. So, what's the question here? Well, the question is: who is Vertica? That might seem like a silly question, but here's what I mean by it. When Vertica is connecting to a downstream service, for example cloud storage, how should Vertica identify itself? Most of the time, we do the permissions check ourselves and then connect as Vertica, like in this diagram here. But sometimes we can do better, and instead of connecting as Vertica, we connect with some kind of upstream user identity. When we do that, we let the service decide who can do what, so Vertica isn't the only line of defense. And in addition to the defense-in-depth benefit, there are also benefits for auditing, because the external system can see who is really doing something. It's no longer just Vertica showing up in that external service's logs; it's somebody like Alice or Bob trying to do something. One system where this comes into play is Voltage SecureData. So, let's look at a couple of use cases. In the first one, I'm just encrypting for compliance or anti-theft reasons. In this case, I'll just use one global identity to encrypt or decrypt with Voltage. But imagine another use case: I want to control which users can decrypt which data. Now I'm using Voltage for access control, so in this case, we want to delegate. The solution here is, on the Voltage side, to give Voltage users access to the appropriate identities. These identities control encryption for sets of data, and a Voltage user can access multiple identities, like groups. Then on the Vertica side, a Vertica user can set their Voltage username and password in a session, and Vertica will talk to Voltage as that Voltage user.
So in the diagram here, you can see an example of how this is leveraged so that Alice can decrypt something, but Bob cannot. Another place the delegation paradigm shows up is with storage. Vertica can store and interact with data on non-local file systems, for example HDFS or S3. Sometimes Vertica is storing Vertica-managed data there; for example, in Eon mode, you might store your projections in communal storage on S3. But sometimes Vertica is interacting with external data. This usually maps to a user storage location on the Vertica side, and on the external storage side it might be something like Parquet files on Hadoop. In that case, it's not really Vertica's data, and we don't want to give Vertica more power than it needs. So let's request the data on behalf of whoever needs it. Say I'm an analyst and I want to copy from, or export to, Parquet using my own bucket. It's not Vertica's bucket; it's my data, but I want Vertica to manipulate data in it. The first option I have is to give Vertica as a whole access to the bucket. That's problematic, because in that case Vertica becomes kind of an AWS god: it can see any bucket any Vertica user might want to push or pull data to or from, anytime Vertica wants. So it's not good for the principles of least access and zero trust, and we can do better than that. In the second option, we use an ID and secret key pair for an AWS IAM (if you're familiar) principal that does have access to the bucket. So I might use my analyst credentials, or I might use credentials for an AWS role that has even fewer privileges than I do, sort of a restricted subset of my privileges. Then I set that in Vertica at the session level, and Vertica will use those credentials for the copy and export commands. It gives more isolation. Something that's in the works is support for keyless delegation, using assumable IAM roles.
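Option two might look like this in a session. The key values, bucket, and table names are placeholders, and the exact session parameters may differ by version, so treat this as a sketch:

```sql
-- Session-scoped credentials: Vertica uses these, not its own identity,
-- for S3 access during this session (values are placeholders).
ALTER SESSION SET AWSAuth = 'AKIAEXAMPLEKEY:example-secret-key';
ALTER SESSION SET AWSRegion = 'us-east-1';

-- Copy from, and export to, the analyst's own bucket with those credentials:
COPY analyst.clicks FROM 's3://my-analyst-bucket/clicks/*' PARQUET;
EXPORT TO PARQUET (directory = 's3://my-analyst-bucket/out')
    AS SELECT * FROM analyst.clicks;
```

Because the credentials live only in the session, different users can work against different buckets without Vertica itself holding blanket access to all of them.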
So that has similar benefits to option two here, but without having to manage keys at the user level. We can do basically the same thing with Hadoop and HDFS, with three different methods. The first option is Kerberos delegation, and I think it's the most secure. If access control is your primary concern here, this will definitely give you the tightest access control. The downside is that it requires the most configuration outside of Vertica, with Kerberos and HDFS, but with this, you can really determine which Vertica users can talk to which HDFS locations. Then you've got secure impersonation. If you've got a highly trusted Vertica user base, or at least some subset of it is highly trusted, and you're not worried about them doing things wrong, but auditing on the HDFS side is your primary concern, you can use this option. This diagram here gives you a visual overview of how that works, but I'll refer you to the docs for details. And then finally, option three is bring-your-own delegation token. It's similar to what we do with AWS: we set something at the session level, so it's very flexible and the user can do it on an ad hoc basis, but it is manual. So that's the third option. Now on to auditing and monitoring. Of course, we want to know what's happening in our database. That's important in general, and important for incident response, of course. So your first stop to answer this question should be system tables. They're a collection of information about events, system state, performance, etc. They're select-only tables, but they work in queries as usual; the data is just loaded differently. There are two types, generally. There are metadata tables, which reflect persistent information stored in the catalog, for example users or schemata. Then there are monitoring tables, which reflect more transient information, like events and system resources.
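As a quick illustration, both types are queried the same way; the column picks here are illustrative:

```sql
-- A metadata table: reflects persistent catalog state.
SELECT user_name, is_super_user FROM v_catalog.users;

-- A monitoring table: reflects transient system state.
SELECT node_name, user_name, login_timestamp FROM v_monitor.sessions;
```

The difference is only where the data comes from; to the query writer, they're ordinary select-only tables.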
You can see an example of output from the resource_pools table here. Despite looking like system statistics, these are actually configurable parameters. If you're not familiar with resource pools, they're a way to handle resource allocation for users and other principals; again, check that out in the docs. Then of course there's the follow-up question: who can see all of this? Well, some system information is sensitive, and we should only show it to those who need it. So of course the superuser can see everything, but what about non-superusers? How do we give access to people that might need additional information about the system, without giving them too much power? One option is sysmonitor. As I mentioned before, it's a special role, and this role can always read system tables, but not change things like a superuser would be able to. Just reading. Another option is the restrict and release meta-functions, which revoke and grant access to a certain set of system tables from and to the public role. But the downside of those approaches is that they're inflexible: they're all-or-nothing for a specific preset of tables, and you can't really configure them per table. So if you're willing to do a little more setup, then I'd recommend using your own grants and roles. System tables support grant and revoke statements just like regular relations, and in that case, I wouldn't even bother with sysmonitor or the meta-functions. To do this, just grant whatever privileges you see fit to the roles that you create, then go ahead and grant those roles to the users that you want, and revoke access to the system tables of your choice from public. If you need even finer-grained access than this, you can create views on top of system tables.
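A sketch of the roll-your-own approach; the role, user, and choice of tables are all illustrative:

```sql
-- A role that may read a couple of monitoring tables:
CREATE ROLE ops_readers;
GRANT SELECT ON v_monitor.sessions TO ops_readers;
GRANT SELECT ON v_monitor.query_requests TO ops_readers;
GRANT ops_readers TO bob;

-- Hide those same tables from everyone else:
REVOKE SELECT ON v_monitor.sessions FROM PUBLIC;
REVOKE SELECT ON v_monitor.query_requests FROM PUBLIC;
```

Since system tables take ordinary grant and revoke statements, you can tailor this per table instead of accepting the all-or-nothing presets.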
For example, you can create a view on top of the users system table which only shows the current user's information, using a built-in function as part of the view definition. Then you could actually grant this view to public, so that each user in Vertica can see their own information, and never give access to the users system table as a whole, just that view. Now, if you're a superuser, or if you have direct access to the nodes in the cluster, the file system, the OS, etc., then you have more ways to see events. Vertica supports various methods of logging, and you can see a few of them here. These are generally outside of a running Vertica, so you'd interact with them in a different way, with the exception of active_events, which is a system table. We've also got the data collector, which sorts events by subject. What the data collector does is extend the logging and system-table functionality by component, as it's called in the documentation, and it logs these events and information to rotating files. For example, analyze_statistics is a function that could be abused by users, and as a database administrator, you might want to monitor that, so you can use the data collector component for analyze_statistics. And the files this generates can be exported into a monitoring database. One example of that is with the Management Console Extended Monitoring, so check out the virtual BDC talk on the Management Console. And that's it for the key points of security in Vertica. Many of these slides could spawn a talk on their own, so we encourage you to check out our blog, the documentation, and the forum for further investigation and collaboration. Hopefully the information we've provided today will inform your choices in securing your deployment of Vertica. Thanks for your time today. That concludes our presentation. Now, we're ready for Q&A.