...and I'll be talking about LDAP today. I admit that I gave it a slightly inflammatory subtitle, but I have honest intentions. So to give you a brief overview: I'm going to give a brief introduction to the NoSQL idea and why I think LDAP fits it. Then I'll give another introduction to LDAP for those who maybe haven't used it in anger before. And I'll show a bit of each of the four ways to access LDAP from Ruby, and show off the one we're using, called Treequel, and how we're using it at my day job.
So why LDAP? Well, maybe I need to back up a little bit. Maybe I should first ask, why NoSQL? What is NoSQL, apart from not using MySQL or Postgres or whatever? The definition of the term NoSQL is still the subject of some debate, but I'm using the definition from the NoSQL Database site, which has a fairly comprehensive list of the NoSQL technologies by category. And these are apparently some of the features. Some features are pretty much in debate right now, but these seem to be the four that everyone can mostly agree on. I'm not really sure what the last one has to do with NoSQL, but it's a good thing, so I'm going to leave it on the list. Some of the other features they list are, they claim, a little bit more controversial, but I find them pretty interesting too.
So I think those features are pretty generic, and the NoSQL technologies that they list on their site cover a wide gamut of uses and problem domains, but I think that LDAP fits what they call a wide column store. It's useful for storing structured data that doesn't conform to tables and relational algebra easily. And I think LDAP fits that because the other ones that they list, like HBase, Cassandra, and Hypertable, all have this concept of column families, and some of them store data in a hierarchy, which LDAP does as well.
So I'll give a brief introduction to LDAP, and I'll try to keep it brief for those of you who may have used it. First a bit of history.
LDAP originated in the telecommunications industry, which is probably why it's still considered by a lot of people to be a glorified phone directory or address book. It started out in 1988, when the International Telecommunication Union, an agency of the United Nations, approved a networking standard called X.500. It included a protocol called DAP, for Directory Access Protocol, for accessing directory services, which sat on top of the OSI protocol stack. LDAP was invented as an alternative that used TCP/IP instead of OSI. OSI was incredibly difficult to implement. I wasn't around then, but so I've heard.
LDAP is a pretty simple protocol. It has 11 operations. The operations are generally independent of one another. They're processed as atomic actions and each leaves the directory in a consistent state. Most operations don't need to wait for a response, so it plays really nicely if you want to use it asynchronously, and responses can arrive in any order. So when you're writing a client, you have to be aware of that.
The directory is structured as a set of entries that are organized into a hierarchy. Superior and subordinate relationships between entries are used to express a relationship between them. In the RFC, an entry is described as information about an object which is identifiable, or can be named. It's composed of a set of attributes about the subject and is identified by a distinguished name, or DN. The distinguished name is made up of one or more relative distinguished names, or RDNs, that describe the path from the entry back up to the root of the hierarchy. Each RDN is unique amongst its siblings, as it must be distinguishable from them. This means that an entry's parent can be found by removing the first RDN of its DN.
So I said before that entries are made up of one or more attributes, and this is a trimmed-down dump of my user record in LDAP at my day job that I'll use as an example.
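The rule that a parent is found by removing the first RDN can be sketched in a few lines of plain Ruby. This is only an illustration, not code from the talk: the DN is a made-up example, the helper names `rdns` and `parent_dn` are hypothetical, and the sketch assumes values contain no escaped commas (which real DNs are allowed to have).

```ruby
# Split a DN into its RDNs, leftmost (most specific) first.
# Assumption: no escaped commas inside values.
def rdns(dn)
  dn.split(',').map(&:strip)
end

# The parent's DN is the entry's DN minus its first (leftmost) RDN.
# Returns nil when there is nothing left to strip.
def parent_dn(dn)
  parts = rdns(dn)
  parts.length > 1 ? parts[1..-1].join(',') : nil
end

dn = 'uid=jdoe,ou=People,dc=example,dc=com' # hypothetical entry
puts rdns(dn).inspect
puts parent_dn(dn)
```

Calling `parent_dn` repeatedly walks from the entry back up toward the root of the hierarchy, one RDN at a time.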
So an attribute is an attribute description, which consists of a type and optional options, plus one or more associated values. They're basically key-value pairs with a few additions. The keys aren't free-form: they're values that are defined in the directory schema, referred to as the attribute description, and they determine the type of values it can hold, what validations apply to it, and how it can be searched for, not dissimilar to a column in a SQL table. Values are one or more binary chunks of data that conform to the type dictated by the description. They're all text here, for obvious reasons, but you can, for example, store an employee photo, a voicemail greeting, or other binary data in the same entry.
Which attributes can be contained by an entry is dictated by its object classes. An entry's object classes are themselves an attribute, and the only mandatory one for every entry. It's required by the top object class, which is the abstract base object class of all entries. Object classes, like attribute types, are defined in the directory schema, similar to Bigtable or Cassandra's column families. The entry's object classes govern the attributes that are required and allowed to be contained in an entry, as well as where it can live and a few other things. The RFC describes an object class as an identified family of objects, or conceivable objects, that share certain characteristics.
There are three kinds of them. There are abstract ones, which are pretty much the same thing as abstract classes in OO; they can be added only through inheritance. Structural object classes are somewhat the same as concrete classes; there can be only one per entry. And auxiliary object classes are like mixins and can be added to any entry.
Here's my user record from before, but sorted by which object class the attributes belong to. An attribute can, and often does, belong to more than one object class.
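How object classes govern required and allowed attributes can be sketched as a small check in plain Ruby. The schema table below is a heavily simplified, hypothetical stand-in for a real directory schema, and the `check_entry` helper is invented for illustration. Attribute names are compared case-sensitively here, which a real server would not do.

```ruby
# Hypothetical, simplified schema: each object class lists the attributes
# it requires (must) and the ones it merely allows (may).
SCHEMA = {
  'top'          => { must: ['objectClass'], may: [] },
  'person'       => { must: ['cn', 'sn'], may: ['telephoneNumber', 'description'] },
  'posixAccount' => { must: ['cn', 'uid', 'uidNumber', 'gidNumber', 'homeDirectory'],
                      may: ['loginShell'] }
}.freeze

# Report required attributes with no value, and attributes that none of
# the entry's object classes allow.
def check_entry(entry)
  defs    = entry.fetch('objectClass', []).map { |oc| SCHEMA.fetch(oc) }
  must    = defs.flat_map { |d| d[:must] }.uniq
  allowed = (must + defs.flat_map { |d| d[:may] }).uniq

  missing    = must.reject { |attr| Array(entry[attr]).any? }
  disallowed = entry.keys - allowed
  { missing: missing, disallowed: disallowed }
end

entry = {
  'objectClass'   => ['top', 'person', 'posixAccount'],
  'cn'            => ['Jane Doe'],
  'sn'            => ['Doe'],
  'uid'           => ['jdoe'],
  'uidNumber'     => ['1001'],
  'homeDirectory' => ['/home/jdoe'],
  'jpegPhoto'     => ['(binary data)'] # not allowed by any class listed above
}

puts check_entry(entry).inspect
```

In this sketch `cn` is required by both `person` and `posixAccount`, which mirrors the point that one attribute can belong to more than one object class.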
There are a bunch of other cool things that you can do with LDAP that I won't cover here today, like introspection into the directory schema, referrals, attribute type options, and other stuff like that. If you want to geek out about LDAP afterwards, let me know. I'm still learning, but I've been using it for about four years and I still find it pretty exciting.
Now, how can you access LDAP from Ruby? There are a few ways to do it, and here are four. I'll cover two low-level libraries and then two high-level libraries that use one or both of the low-level ones to provide a more convenient abstraction.
First up is Ruby/LDAP. It's a C extension that's been around since about 2000, and it's based on the API described in RFC 1823, which basically describes, if you're writing an application that uses LDAP, the things you might want to do and how to do them in a consistent way. It's the most complete LDAP library I know of. It's missing support for some of the lesser-used operations, but in daily use you'll probably never run across that. So here's what accessing my user record looks like. It connects using TLS, sets a few search parameters, and then executes the search and prints out the results. The base that it's setting there is the DN of the highest part of the hierarchy you want to search from. And the scope says how far down the hierarchy you go. In this case, as it says, LDAP_SCOPE_SUBTREE, we're going to search the whole subtree. And then that filter is how you form a query in LDAP, which looks a lot like Lisp: prefix notation, and everything's in parentheses.
Another alternative is Net::LDAP. It's been around since 2003 and it's excellent for most purposes. It's missing a few things that Ruby/LDAP provides, but again, in normal everyday use you probably won't even notice. Another advantage it has is that it's more portable than Ruby/LDAP, because it's pure Ruby instead of being a C extension. Here's what using it looks like.
It's basically the same thing, but using a hash for searching instead of positional parameters, which is nice, I think. And it provides an abstraction for building filters out of just key-value pairs like that, so you don't have to form the Lisp-looking filters yourself.
So those are the two low-level libraries, and now on to the high-level abstractions. The first one is ActiveLDAP. ActiveLDAP's authors describe it as an object-oriented interface to LDAP, and it uses some of the Active Record idioms to present a similar interface to your LDAP data. It's especially good when your domain classes are mixed between an RDBMS and LDAP, but it is a bit more difficult to use for the more complex LDAP directories. If you're already using Active Record and your directory is fairly small and contiguous, ActiveLDAP is easier to get started with. I used it for about three years along with Active Record at my day job. I ran into trouble when I tried to access data that was scattered across the directory, or sibling entries of different object classes. That was two years ago, though, so I'd still recommend giving it a try if you like Active Record.
Here's what it looks like doing the same task as before, but it defines an account class to pull the records out instead of using a raw search. It uses that ldap_mapping declaration, or declarative I guess, to set up the prefix that all the classes will start their search at, and the DN attribute is the attribute that it will use when you do a find by ID. It of course provides accessors for attributes just like Active Record, as you can see there. It's fetching the CN of the entry just like before.
Treequel is another high-level LDAP abstraction, which was written by myself and a few of my coworkers. We wanted to make basic LDAP interaction easy without compromising its natural flexibility. It's based on similar ideas from Sequel, which we use to replace Active Record in our web applications.
Since ActiveLDAP depends on Active Record, and we were running into limitations with it anyway, we decided to implement something similar for LDAP. So this is using Treequel to do the same task as the previous examples, fetching my user record using only the basic interface. It connects to an LDAP server by URL or hash arguments, but it also knows how to read the system LDAP config and use that if you want to do that.
The way Treequel works is that it maps method calls to RDNs, so this ou method is returning what we call a branchset that's based at ou=people. The branchset is a concept borrowed from Sequel's datasets. It's a search in suspension that you can execute or further modify at will. So we add a filter to the branchset that looks for entries with uid set to mgranger, and then fetch and extract the cn attribute from each result.
Treequel also comes with an optional ORM-like layer that's based on the idea of using mixins to enhance entry objects based on their object classes. It's loosely based on Sequel::Model. So this is the same task, but with a mixin defined for entries that have the account and person object classes and that live under the ou=people entry. Now any entry that you fetch from LDAP that matches its criteria will be extended with this mixin by the configured model class. So you don't even have to know that the ACME account mixin is around. It will just be automatically added to anything that you fetch from LDAP that meets the necessary criteria.
The mixin also sets up a search method that is based on its object classes and base. You can then append further filters to it just as you would with the raw branchset, like Sequel does with Sequel::Model. It then executes the search, but extends each result with the mixin before it's returned or yielded, which adds any methods or functionality specific to matching entries.
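The idea of extending fetched entries with a mixin based on their object classes and location can be sketched in plain Ruby. This is not Treequel's actual API: the `Entry` class, the `register_mixin` helper, the `display_name` method, and all of the data are hypothetical, and the suffix check is a simple string comparison rather than real DN matching.

```ruby
# Hypothetical registry: a mixin applies to entries that carry all of the
# listed object classes and live under the given base DN.
MIXINS = []

def register_mixin(mod, object_classes:, base:)
  MIXINS << { mod: mod, object_classes: object_classes, base: base }
end

# A stand-in for an entry object returned from a directory search.
class Entry
  attr_reader :dn, :attrs

  def initialize(dn, attrs)
    @dn = dn
    @attrs = attrs
    MIXINS.each do |m|
      next unless dn.downcase.end_with?(',' + m[:base].downcase)
      next unless (m[:object_classes] - Array(attrs['objectClass'])).empty?
      extend(m[:mod]) # only this one object gains the mixin's methods
    end
  end

  def [](name)
    attrs[name]
  end
end

# A mixin with one illustrative method.
module AcmeAccount
  def display_name
    "#{self['cn'].first} <#{self['mail'].first}>"
  end
end

register_mixin(AcmeAccount,
               object_classes: %w[posixAccount person],
               base: 'ou=People,dc=acme,dc=com')

user = Entry.new('uid=jdoe,ou=People,dc=acme,dc=com',
                 'objectClass' => %w[top person posixAccount],
                 'cn' => ['Jane Doe'], 'mail' => ['jdoe@acme.com'])
host = Entry.new('cn=web1,ou=Hosts,dc=acme,dc=com',
                 'objectClass' => %w[top device], 'cn' => ['web1'])

puts user.display_name
puts host.respond_to?(:display_name)
```

Because `extend` works per object, code that fetches entries never has to name the mixin; matching entries simply arrive with the extra methods.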
Now, on the slide I'm not adding any additional methods in that module, but you can imagine what you might want to add. If you have a jpegPhoto attribute, for example, you might want to return that as an ImageMagick object or something and do things with it. You can add that based on the attributes that the object classes in that mixin have. The base ORM class, Treequel::Model, supplies the generic attribute accessors and the like.
So now we'll go over a little bit of how we use LDAP at the day job. I work for Laika, which is a stop-motion and computer animation company in Portland, Oregon. We make feature films like Coraline, and we also make TV ads for a bunch of different people. So my day job is uniquely challenging, I think, or at least it's unique in my experience, because animators are a bit like programmers. They go wherever there's cool stuff to work on. It's not uncommon for us to bring on 400 or more people during the ramp-up for a feature film, and then have them leave after the film is over for another couple of years, and then come back if there's something new we're working on that they're interested in, or they have some specialty that we need.
Also, animation and photography technology is continually progressing. They're always coming out with new hardware, HD video, 3D cameras, all kinds of new stuff. So we need to stay competitive and give our artists systems that will run the latest and greatest animation tools. We're constantly upgrading and moving people's machines around. We also replace network storage and the render farm about every five years or so, which requires lots of reconfiguration and updating of both workstations and servers.
Despite those challenges, our IT department is pretty small. We have eight IT people, including the director, with four user support people and four infrastructure people.
They're responsible for about 2,500 individual hosts in four physical locations, running four different OSs, and anywhere from 200 to 600 users at any given time, all of which fluctuate rapidly over the years. So we'd pretty much be dead without automation of some sort. And key to that is a unified place to store all the organizational data you need to track about people and the computer systems that they use.
So here's a brief idea of how all that is set up. We're running two FreeBSD servers with OpenLDAP per physical location, with one primary and one for failover. We have a single master that's in downtown Portland, and all the other servers are replicating from it. We have Ruby installed on nearly every machine, and a suite of tools written in Ruby that's installable via a private gem server. This is just a screenshot from the list of gems that we have currently.
Of course, the first place to start when you're using LDAP is the company directory. It provides a unified logon for workstations, servers, and web applications. It gives you POSIX groups, netgroups for treating groups of hosts as a single unit, and all the other NIS-ish stuff. It also drives our web-based company directory, which is currently a mod_perl Mason component, but it's slated for replacement with a newer Ruby-based application real soon now.
We use a command-line tool written in Ruby to manage all user data. This is the bit we use to find an unused phone extension for new users. The relevant LDAP bits are here. This is using Treequel again. It looks for entries that are either user accounts or resource accounts like conference rooms. That's what the first filter clause is doing. Then it eliminates ones that don't have a phone, so the second filter clause says that the phone extension attribute must be present. And then it eliminates ones that aren't in the requested location. We have the l attribute on each of the phones, which has the physical location of the phone, currently just the four physical locations that we're in.
And then it selects just the phone extension attribute from all of the results, turns them all into an array, and flattens and compacts them. Each location has one or more extension ranges, which are defined as Ruby ranges and are currently hard-coded, but we'll eventually fetch those from LDAP too. So what the top is doing is looking up that range object for whatever physical location you're in. And then it collects over the ranges down below and turns all of the extensions that are in that range into four-digit extension numbers. And then it uses array math to find the available ones by subtracting the used ones that it found in LDAP from the valid ones that get extracted from the range.
So speaking of phones, we run an Asterisk voice-over-IP system with one box per physical location, all trunked together with IAX. All the phones live as physical asset records in LDAP, and assigning a phone to a person automatically associates their phone number with that phone. The dial plan and several other Asterisk config files are generated from LDAP. We started this before I'd heard of Adhearsion, but Adhearsion is definitely on our list of things to check out in the future.
In the meantime, this is a greatly stripped-down version of the bit that generates the extensions config. The LDAP bits are here. It fetches an array of all the phones that are in the location being requested and that have an owner. The reason it checks for an owner is that when someone leaves, we just remove the owner association from the phone record, which automatically disables their extension. That way, if they come back, we just re-associate them with any other phone, and if their extension hasn't already been reclaimed, they'll have the same exact extension and everything, as if they'd never left. Then it builds a hash of phone entries keyed by their owner's extension, dropping anyone that doesn't have an extension. And the phone owner attribute here is a DN.
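Stepping back to the free-extension lookup described earlier, the range and array arithmetic can be sketched in plain Ruby. The location name, the ranges, the `free_extensions` helper, and the sample values are all hypothetical; the talk does not give the real ones, and the directory query is replaced here by a literal array.

```ruby
# Hypothetical extension ranges per location. The talk says these were
# hard-coded Ruby ranges; the actual values are not given.
EXTENSION_RANGES = {
  'portland' => [100..104, 200..202]
}.freeze

# Expand a location's ranges into four-digit strings, then subtract the
# extensions that are already in use.
def free_extensions(location, used)
  valid = EXTENSION_RANGES.fetch(location).flat_map do |range|
    range.map { |n| format('%04d', n) }
  end
  valid - used
end

# Values as they might come back from the directory: one array of
# extensions per entry, with nil for an entry that has none.
raw  = [['0100'], nil, ['0102', '0200'], ['0999']]
used = raw.flatten.compact

puts free_extensions('portland', used).inspect
```

Array subtraction ignores used extensions that fall outside the valid ranges, so stray values in the directory do not affect the result.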
Treequel is actually smart about DN-valued attributes like that phone owner: it'll do a second lookup if it notices that you're fetching something that's a DN, and give you another Treequel object back. This obviously is doing one LDAP query per phone record, which could probably be optimized a bit better, but it works as is. Then the extensions are appended to the config that's returned and written out to the Asterisk config directory, and then we tell Asterisk to reload everything. This currently happens by cron, but we're working on a way of using syncrepl replication in LDAP to react to people making related changes in LDAP and automatically rebuild the Asterisk config on demand.
So another thing we store in LDAP is all the computer information. We store all of our hosts in LDAP, which in turn is used to generate DNS and DHCP. It does that on a per-location basis, so we're running an individual DNS and DHCP server for every physical location. The DHCP information includes netboot information, so we can actually provision and set up a machine just by dropping it onto the network, tuning some bits in LDAP, and netbooting it, and it auto-images to whatever role we tell it it should be. We have a command-line tool for doing this as well.
We're also midway through converting our physical asset inventory system from one backed by a Postgres database into one backed by LDAP. This will let us store purchasing information, physical location, and what software is being run as additional attributes or child entries right alongside the host information. It'll use DHCP events again, published over [unclear], to also set a host's last-known-location attribute. That'll make our physical inventory so much quicker. Every time someone plugs a machine into a network, it'll modify that record in LDAP and say what location it was last plugged in at. That way, if you ever need to go find the host, you have a pretty good idea of where it is.
So this is the code that generates the DHCP entries. Really the only complicated bit is the function that turns a DN into an FQDN. The DN is the distinguished name of the entry for the host, and minus the OU part of it, you can directly transform that into the TCP/IP FQDN. So it just concatenates the DC attributes with the CN, throws away the OU, and that's the hostname. Everything else is pretty much string concatenation of various attributes. ISC's dhcpd does support LDAP through its own schema as well, but because of the way they structure it, you really can't use the DHCP data for anything else; you can't attach DHCP information to already existing host records, for example. So we gave up on that and implemented our own.
For DNS we use Dan Bernstein's tinydns, and it's compiled from a fairly simple text file. Auto-generating it is also pretty simple. This selects the hosts that have an assigned IP and generates an entry for each one. Here we filter so that any host must have an ipHostNumber, which is what the schema calls the IP address. And then that function that makes the tinydns entry basically just rips out all of the attributes necessary to build the DNS record, appends them to the string, and returns it. That config is then written out and compiled to a CDB with tinydns-data, plus it flushes the DNS cache, and it's off and running again.
So another thing we use LDAP for is monitoring. Computer systems fail all the time, and with this many hosts and such huge dollar figures associated with artist downtime, a monitoring system is pretty critical. To do the actual monitoring we use Mon, which is a very simple but battle-tested monitoring system written in Perl, with a Ruby- and Rake-based configuration generation system that pulls netgroups out of LDAP, splits off monitoring zones by network subnet, and then writes out the config files with automatically derived dependencies, to avoid being crushed by alerts when the core router blips.
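Going back to the DN-to-FQDN conversion described above for the DHCP and DNS generators, a minimal sketch in plain Ruby might look like the following. The `dn_to_fqdn` and `tinydns_line` helpers and the sample host are hypothetical. The sketch assumes the host's own RDN uses `cn`, that values contain no escaped commas, and that each host becomes a tinydns-data `=` line (an A record plus its PTR record); the talk does not say which record types were actually generated.

```ruby
# Turn a host entry's DN into a fully qualified domain name: keep the cn
# value, drop any ou components, and join the dc values in order.
def dn_to_fqdn(dn)
  pairs  = dn.split(',').map { |rdn| rdn.strip.split('=', 2) }
  host   = pairs.find { |type, _| type.downcase == 'cn' }
  domain = pairs.select { |type, _| type.downcase == 'dc' }.map(&:last)
  return nil if host.nil? || domain.empty?

  ([host.last] + domain).join('.')
end

# Assumed output format: a tinydns-data "=" line, which defines both the
# A record and the matching PTR record for one host.
def tinydns_line(fqdn, ip)
  "=#{fqdn}:#{ip}"
end

dn   = 'cn=web1,ou=Hosts,dc=acme,dc=com' # hypothetical host entry
fqdn = dn_to_fqdn(dn)
puts fqdn
puts tinydns_line(fqdn, '10.0.0.5')
```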
So this next slide is the code that makes an array of hosts to monitor, given a netgroup. Using netgroups makes it easy to add hosts to the monitoring system. As you can see there, it starts at the OU. OU is organizationalUnit, basically just a generic group class. We have an OU called netgroups where the netgroups are stored, and then it just filters them by the CN, or the common name, of the netgroup. Netgroups are stored as nisNetgroupTriple values, which is a weird-looking string that is a leftover artifact of NIS: it's basically a host, user, and domain in a comma-separated string inside parens. We pretty much only ever use the host part of it, so it just extracts that in the collection and then recurses on itself for memberNisNetgroup, which is how you can embed one netgroup inside another.
Another thing we're working on for monitoring is a system that'll let us store the services to be monitored. There's an object class called ipService which is, as far as I can tell, intended for something like this. So we can store ipService entries under the actual host record, and then finding out what should be monitored will just be fetching every host that has an ipService from LDAP and auto-generating the config from that.
So this is a dump from LDAP using a shell that is included with Treequel that lets you navigate an LDAP directory like a command line, basically. Here I'm just catting the CN of one of the hosts that we have. It's our CFEngine daemon server. And then I just cat the CFEngine server's ipService entry underneath that, and it says what it is, what protocol, and what port it's listening on. So you basically just monitor that port and that IP, and that's it.
So to wrap up, if you find yourself needing a non-relational database, I hope you'll consider giving LDAP a try, especially if you're already using it for other things.
And it can be a little weird with its OIDs and strange syntax, but it's a good, solid, flexible technology that I think has gotten overlooked as being just for address books. Any questions?
Is it OpenLDAP or something else?
Yes, OpenLDAP.
[Question partly unclear.] I've had a lot of pain around the replication engine and trying to keep things in sync. Are you doing any monitoring of that to make sure you're staying in sync? The sync that you were talking about, I can't remember if that's the replacement technology for that old one.
Yeah, that's the newer one. That's the one that actually works really well.
Right. [unclear] How was your experience with that?
So we monitor that by checking, there's a... sorry, I'll repeat the question. He was saying that earlier implementations of OpenLDAP used an older version of replication that basically catted the LDIF from the directory out and then piped that to another process which was reading it on another host. They've since replaced that with something called syncrepl, which is an event-driven way, using an LDAP control, where the master server will actually notify any child servers. You can replicate just a part of the directory or the whole directory, and it'll notify them with an event any time anything changes. And he was asking how we monitor that and how we make sure that it's consistent. There's actually an attribute in the extended attributes of the base OU that is a CSN. It's the serial number for the last transaction. So we just fetch that from the master server and then poll all the slaves and make sure their CSNs are all the same. If one of them has drifted, we wait for five seconds and check it again, and if it's still drifted then we alert.
Do object classes kind of work like mixins or traits?
Yes.
So object classes carry a set of attributes with them. An entry can be thought of as a set that's composed of all of the attributes of all of its object classes. Object classes have must and may attributes. The must ones are ones where, if you include that object class in the entry, all of the attributes in its must list have to have at least one value. And then the may ones are just optional ones that you can use by virtue of including that object class. The way Treequel works is that it actually creates a mixin for an object class. So it's making Ruby treat object classes pretty much the same way LDAP does.
Anybody else? I guess I was talking much slower in my hotel room, so we have a bunch of time. Yeah. I'm sorry, say that again?
What are the largest volumes of data you store in LDAP?
Like as far as data size? To be honest, I'm not sure. Let me check real quick. I have one running on my local machine. I assume that you're asking because one of the NoSQL virtues is that you can store incredibly large amounts of data. I have not investigated its suitability for those massive data sets. I do know that LDAP is used to store incredibly large directories, much larger than ours, but as to specific numbers, I don't know. I would guess ours is maybe three gig or so. Probably nothing to write home about. But I guess my point is that if you're running LDAP, and a lot of places already are, and you want to use something that stores structured data and you don't necessarily need the huge, massive scalability, then LDAP makes a pretty good fit.
What database are you using to store your LDAP in? Is it BDB?
It's a variant of BDB called HDB, the hierarchical database. They actually just switched to that. OpenLDAP had a bunch of problems with corruption when it was using straight BDB, but we've had no problems at all since we switched to HDB.
Are there alternatives to that that people use, or do they stick to that?
So what he's asking is whether there are other alternatives to storing your data in BDB. OpenLDAP does have a bunch of different backends. One of them is an SQL backend, and there's a shell backend. There's a bunch of different things that you can use to map your LDAP data into something else. We've not used any of those. We experimented with the SQL one, which was pretty much disastrous, so we abandoned that. Anything else? Yeah?
Do you have migrations? Do you change your mind about what's going to be in a schema and migrate things across? Do you have a strategy for that?
We do. What he was asking is, do we have migrations if we decide something needs to change in a schema? We don't have migrations in the same sense as Active Record does. Up until recently anyway, if you changed the schema, it required a server restart. So we had to basically just keep all of the schemas in version control, sync all of the servers to the same revision, for which we have a tag, and then restart all the servers. You have to restart the master first while the slaves are down, and then restart the slaves, so they sync with the same schema. We actually just switched to OpenLDAP 2.4, which has a dynamic config too. It stores its configuration in LDAP itself. Supposedly that allows you to change schemas on the fly, but we haven't tried that yet.
What about the data? Did you migrate the data across? Do you have a script?
So far, we haven't had any conflicting changes to the schema. We haven't changed a column name or anything. That's an interesting question. I'm not sure how you would manage that if you changed an attribute name. One of the things that I found out that's really interesting is that a lot of the schemas are really old and fairly stable. We have added our own, but so far we've only done additions.
So if you changed the name of an attribute or removed one, I'm not exactly sure how you would do that.
So you're scaling the reads by just adding slaves; how do you scale writes?
OpenLDAP does support multi-master, but they still recommend against using it. LDAP is considered a read-many, write-seldom store. So basically, well, you can federate LDAP directories together. You can separate the writes for different parts of the hierarchy onto different servers, and then have them all federated together by referral. So if you go to one part of the directory and try to write to a slave, for example, it will tell you where you need to go to write. So you just split those off into different branches of your hierarchy.
A shard or a partition.
Right. A lot of universities, for example, use that. They have different departments. The write master is per department, so each department manages its own little writable section of the tree, but then it's all federated into one big data store at the top.
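The per-branch write masters described in that last answer can be modelled as a longest-suffix lookup on the DN. This plain-Ruby sketch is only an illustration of the partitioning idea: in a real deployment the server hands back a referral and the client follows it, and the `WRITE_MASTERS` table, server URLs, and `write_master_for` helper here are all hypothetical.

```ruby
# Hypothetical map from subtree suffix to the server that accepts writes
# for that branch of the hierarchy.
WRITE_MASTERS = {
  'dc=example,dc=edu'            => 'ldap://ldap.example.edu',
  'ou=physics,dc=example,dc=edu' => 'ldap://ldap.physics.example.edu'
}.freeze

# Pick the most specific (longest) suffix that contains the DN.
# Assumes DNs are written without spaces after the commas.
def write_master_for(dn)
  norm = dn.downcase
  suffix = WRITE_MASTERS.keys
                        .select { |s| norm == s || norm.end_with?(',' + s) }
                        .max_by(&:length)
  suffix && WRITE_MASTERS[suffix]
end

puts write_master_for('uid=alice,ou=Physics,dc=example,dc=edu')
puts write_master_for('uid=bob,ou=Staff,dc=example,dc=edu')
```

Entries under the department's branch resolve to the department's own master, and everything else falls through to the server for the top of the tree.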