Good afternoon, Cleveland. As I said, my name is Russell Keith-Magee. My day job is as CTO and co-founder of TradesCloud; we're an Australian software service for tradespeople: plumbers, electricians, carpenters, people like that. But the reason I'm here today is Django. I've been a core team member for the last 10 years. I've been president of the DSF since 2011. I served on the technical review board for the 1.7 and 1.8 releases, and I'm also on the security team.

But one of the other roles that I've assumed over the years is as a mentor in the Google Summer of Code. For those who aren't familiar with the program, Google Summer of Code is effectively an internship for college students. Open source projects apply to take part; students then apply to work with a particular project. Google very kindly picks up the tab, paying them a stipend in exchange for 12 weeks of contribution to an open source project. The only catch: it needs to be open source, and it needs to be done as solo work, mentored by an expert from that project. Now, this year is actually the first year in about six that I haven't been actively mentoring a student. But over the years, the Google Summer of Code has resulted in a number of very large features being added to Django: multiple database support, the ORM aggregation API, the system check framework in 1.7, and, in 1.8, a formalized meta model.

Okay, so what is a meta model, and what is metaprogramming? In short, metaprogramming is when you're writing code that can reason about the code that is running. It's especially useful when you're writing generic frameworks, because instead of encoding specific situations, you're encoding how to respond to a generic situation. An easy warm-up example: metaprogramming can be found inside Python itself. Let's say we're going to do some graphics. So we're going to need a point class to represent the points on our graphics canvas.
We're going to do both 2D and 3D representations, so we'll use some subclassing. Okay, nothing too challenging here. We've got some initialization methods; we take some arguments; we instantiate some objects. We want to be able to output those points in some way, so you write some output methods: an output method for a two-dimensional point, and an output method for a three-dimensional point. Nothing too challenging there, and when you run them, you get exactly what you'd expect.

But what you really want, rather than having a separate method for each type of point, is a single method that will take a point of any type and output it appropriately. And that's easy enough to write as well. If you're coming from a language like C++, which doesn't have metaprogramming tools, or Java, which has metaprogramming tools but doesn't encourage you to use them much, you might be inclined to do a class check. We've got one method called output_point. We print the things that we know are common, the X and the Y coordinates, and then, if the point we've been given is an instance of a 3D point, we print the Z coordinate as well. And that does work; there's nothing wrong with that.

But a more Pythonic approach is to use metaprogramming. If you check for a class, you're only ever checking for 3D points. That will work, but if you check for properties of the instance instead, you can adapt to any type, as long as it has the right attributes. And Python provides a built-in function for this, called hasattr. Rather than asking "is this object of this class?", you're asking "does this object have the attribute z?", and if so, you print it. But what if there are even more attributes? We're going to start modeling Doctor Who now, so we need to include time in our XYZ coordinates.
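A sketch of the classes and the hasattr-based check just described (the class and function names here are my own, not the exact slide code):

```python
class Point2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Point3D(Point2D):
    def __init__(self, x, y, z):
        super().__init__(x, y)
        self.z = z


def output_point(p):
    # Duck-typed: ask about the instance's attributes, not its class,
    # so any object with the right attributes works.
    text = f"x = {p.x}, y = {p.y}"
    if hasattr(p, "z"):
        text += f", z = {p.z}"
    print(text)
    return text


output_point(Point2D(1, 2))     # x = 1, y = 2
output_point(Point3D(1, 2, 3))  # x = 1, y = 2, z = 3
```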
So that means we've got to do another hasattr check, right? Well, no, because again, we can use metaprogramming. Every Python object has a special attribute, dunder dict (`__dict__`), that contains all the attributes of that instance. So on the second line there, `p.__dict__` is all the attributes of that instance. We sort those and iterate over them, and for each of them, we print the name of the attribute and get the value of that attribute off the object p.

Now, the interesting thing is that that definition will work with any object, not just points. You can pass any Python object to that function, and as long as its attributes can be output as strings, it will successfully output a representation of that object. Now, that is admittedly a very contrived example. You probably wouldn't actually build it that way; you'd use the `__repr__` or `__str__` method on the class itself. But it's enough to demonstrate the point: by leveraging metaprogramming, you can do some very powerful things with very little code, and, in Python at least, it's fairly easy-to-understand code.

Okay, so what does metaprogramming mean when we move to Django? Well, it means being able to do the same sort of thing we just did with Python objects, but with database models. One obvious place this is really useful is model forms. I have a Django model; it has a bunch of fields; I want to be able to display a form to edit an instance of that model. Now, I could define a form class for each of my models, and then add the fields that I want to edit on each of those forms, or I can use metaprogramming. Using metaprogramming, I can iterate over all the fields on the model and, for each of those fields, add an input that's appropriate for that form. Where's that particularly helpful? In Django's admin. Django's admin allows you to just register a model, and you get an entire admin interface for that model automatically.
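Before we get into Django: the `__dict__`-based output function described a moment ago might look like this (again, a sketch with invented names; the `Event` class stands in for the Doctor Who point):

```python
def output_object(obj):
    # __dict__ holds all of the instance's attributes, so this works for
    # any object, not just points.
    lines = [f"{name} = {getattr(obj, name)}" for name in sorted(obj.__dict__)]
    text = "\n".join(lines)
    print(text)
    return text


class Event:
    """A 4D point: modeling Doctor Who requires a time axis."""

    def __init__(self, x, y, z, t):
        self.x, self.y, self.z, self.t = x, y, z, t


output_object(Event(1, 2, 3, 4))
# t = 4
# x = 1
# y = 2
# z = 3
```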
How's that possible? Because Django's admin uses metaprogramming to introspect the models that make up a Django project, and then generate forms for those objects. Now, of course, you can get a much better, much more customized interface if you spend a whole bunch of time configuring your admin views, but as a first-pass administration interface, the metaprogramming approach lets you score some very big wins very, very easily. Django's admin does require you to register all the models you want to display in the admin, but strictly, you could even avoid that step if you wanted to. Django's app cache is a form of metaprogramming which allows you to introspect the applications that exist in a project, and then the models that make up each app. So if you were so inclined, it would be possible to write a completely zero-configuration admin application for Django.

Now, although the Meta API was billed as one of the new features of 1.8, that doesn't mean it's a new feature; it's just a newly stable feature. Django has had a Meta API since the very beginning. There were some very big changes in the early years, but by the time Django 1.0 was released, the API had pretty much settled down. What did that API look like? Well, if you had a model called MyModel, it had an attribute called `_meta`, and that object had a bunch of methods on it for introspecting the fields on that model: get_field, and get_field_by_name, and get_fields_with_model, and get_concrete_fields_with_model, and get_all_related_many_to_many_objects_with_model. Looking at that API, you can probably see why we weren't too keen on making it stable. It worked, but it wasn't pretty. Now, what you can't see from just that description are the warts. Some of those methods include fields from parent models; some don't. Can you tell which ones? As a result, the Meta API was an unofficially stable API. It was an API that formally wasn't stable, and officially we reserved the right to change it.
But in practice, we knew that enough people were relying on it that we wouldn't change it unless there was a really good reason. And the best good reason there is, is to formalize it, so that we're actually publishing it and making it official, stable API. Last year, as part of the Google Summer of Code, Daniel Pyrathon took on the job of clearing out all the cruft that had accumulated in the API, so that we could formally document and publish Django's Meta API. This is Daniel, graduating from university last year. As you can tell, he's a very serious, very somber, quietly spoken individual.

So what was the result of Daniel's work? Well, a new stable Meta API. And it looks a little something like this. The Meta object itself is unchanged: you still say `MyModel._meta`. But now there are just two methods on it: get_field, to return a field with a specific name, and get_fields, to return a list of all fields on that model. You can optionally include the fields from parent models, and hidden fields, if you're trying to learn something about your inheritance tree. The parent fields are the ones coming from your superclasses; hidden fields are fields that back another field's functionality (for example, the `_id` field behind a foreign key) or fields that have been explicitly hidden, like a foreign key with a related name starting with "+". What you get from those two calls is one or more field objects, which have attributes that tell you the properties of that field. Is it in a relationship with another model? What is the cardinality of that relationship? What is the other model it's related to? Was it automatically generated, or is it explicitly defined in a model file? Does it have a direct manifestation as a database column; that is, is it concrete?
Internally, this meant a whole bunch of code churn, so we could replace get_all_related_many_to_many_objects_with_model with a much, much simpler call. And that simpler call is effectively just a list comprehension over all fields, filtering out the ones that don't have the field properties we want, or keeping the ones that do. It also meant we gained some new functionality. Previously, generic foreign keys weren't represented in the meta model. They couldn't be, because they live in contrib; we'd have had to special-case that particular class to get them into core's meta representation. Now, we've removed the need for that special case. Any third-party field, including things in contrib, like all the GIS fields and all the generic key fields, can get the same capabilities as a core Django field and be included in that metadata representation.

Okay, so why is this a useful thing? Well, to be completely honest, for most day-to-day build-a-blog use cases, it probably isn't. But it is extremely useful when you start looking at larger projects, in particular anything that starts approaching framework-level stuff. A stable Meta API means you can now write functions that take a model, rather than a model instance, as an argument. This means you can write tools that provide high-level functionality that responds to the characteristics of a model, and generate functionality based on those characteristics.

So here's a practical example where a Meta API can be very, very helpful. My commercial day-to-day Django application allows tradespeople, plumbers, electricians, to keep track of the work that they have to complete. Part of that means keeping track of customers. So I've got a customer model. Users can create new customers. And if you raise a piece of work, it's linked to that customer. When you issue an invoice, it's linked to the customer. When you record a payment, it's linked to the customer.
So if you ever want to see the full history of a customer with your company, you can easily retrieve all the work, all the invoices, all the payments that relate to that customer. However, this data is being entered by humans, so sometimes you end up with two records for Mr. Smith. What you need is a way to merge two records together. Now, this is easy enough to do. All you need to do is nominate which record you're going to retain and which is going to be deleted, and then update all the foreign key and many-to-many references to the duplicate instance to point at the original instance instead. Hey, that's easy.

So what does this look like? Well, we've got some models. We've got a customer model with a name field and a bunch of other data. We have a work order model that has a foreign key to the customer, plus an order ID and a bunch of other interesting details. We also have an invoice and a payment model, both also owned by that customer. And so we can easily define a merge operation for those customers. An existing customer could have work orders; it could have invoices; it could have payments. So, given a duplicate customer, we find all those related objects, update the foreign key references, and then delete the duplicate. Magic. So now you can merge customers to your heart's content.

Okay, so time goes by. You decide to add some new functionality to your system, say, the ability to raise quotes. So you define a quote model and some views, and you roll out that code. And then someone merges a customer record, and all the quotes related to the dead instance disappear. What happened? Well, the merge mechanism doesn't account for quotes. You updated all the work orders; you updated all the invoices and payments. But the quotes weren't updated. So when you deleted the duplicate customer, there were still quotes related to that customer, and so all the duplicate customer's quotes were also deleted. Oops. So how do you fix that?
Well, the first obvious solution is to update your merge mechanism: add the extra line you need for quotes. And that will work fine, until you add an appointments model to keep track of each appointment you have with your customer, and so on, and so on. What you really need is something that will adapt to any new model as you add it. And how do you do that? You metaprogram. Rather than encoding the models that need to be updated, you use the meta model to discover all the related models and then update them. That way, when a new model comes along with a foreign key to customer, it will automatically be included in the merge operation.

So what do we do here? We iterate through all the fields. If the field is a one-to-many, meaning it's on the remote end of a foreign key relation, and it's auto-created, so it's the far side, the `_set` side, then we get the accessor name, get that attribute of the object, update the customer to be the original, and off you go. Interestingly, though, this doesn't depend on customer at all. You can make it a completely arbitrary merge operation with just one change: abstracting out the explicit reference to customer. If you notice, just at the end there, `customer=original`: replace that with a metaprogramming lookup of the name of the field you're being related through, and all of a sudden you've got a completely generic operation for merging two models into one that will work on any Django model. Now, a caveat here: this only handles foreign keys. You also need to reproduce this logic for many-to-manys and so on, but the principle is much the same for those types of fields.

So that's a simple, practical example of somewhere you might use Meta in your own code. But the real killer app for the Meta API, the reason I got interested in the project in the first place, is the potential for exposing new data stores to Django.
For almost as long as Django has existed, people have been asking the question: how can I use X with Django, where X is some non-relational data store? MongoDB, Google App Engine, Riak, Cassandra, CouchDB, whatever this year's favorite flavor happens to be. At one time, the Django core team harbored the dream that the ORM was completely data store agnostic. It was an object-relational mapper, which is why you don't do joins, you filter; you don't group by, you aggregate. Now, that dream didn't quite pan out. The year after Alex Gaynor did the multi-database project for the Google Summer of Code, he did a second project looking at building a non-relational backend for Django's database API. And it did work, but there were a bunch of problems, so the idea of a database backend for NoSQL was kind of shelved. But the question didn't go away.

So, okay, what's the actual use case here? Well, many parts of Django don't care what data store you're actually using: views, URL routing, the caching layer. Just go ahead, use MongoDB, use whatever. As long as you're just accessing data, you might as well use the native API provided by your data store; there's no reason to force the square peg of MongoDB into the round hole of a query set. But there are two big pieces of Django that are dependent on the data store: model forms and the admin. So I put it to you that the question "can I use MongoDB with Django?" really means: can I get a Django form for my MongoDB model, and can I query and edit my MongoDB data in Django's admin? And if that's the bar you want to set, it's a whole lot easier, and a lot more plausible, to clear that bar. The first question, can I get a Django form for my MongoDB model, is right in the sweet spot of Django's Meta API. I have an object; what attributes does that object have? What is an appropriate form representation for each of those attributes? These are questions that aren't dependent on having a relational data store.
They can be asked of any model; even a basic Python object could answer those questions for you. The second question, can I query and edit my MongoDB data in Django's admin, only requires a little more information than that. You need to know what models exist in a data store, and you need to know how they're organized. You need to be able to do some basic CRUD operations on them. These query primitives are a much smaller API than the entire ORM. For the sake of historical consistency, let's call those operations filter, get, and save. If you can provide a duck that quacks the right way, Django's forms and admin don't actually care that it's not actually a Django model under the hood. It just needs to adhere to that basic API contract. So no, you won't be able to take an arbitrary Django application from PyPI and push it onto an arbitrary data store. But let's be honest: that was folly, and it was never going to work in practice. What you will be able to do is get the benefits of much of Django's tooling while using a non-relational data store. The best part of all this is that, because it's publicly supported API now, you can do all these experiments without actually making modifications to core.

To prove the point, let's look at a case study. Django Mailer is a proof of concept that Daniel put together during his Summer of Code to demonstrate that the Meta API actually worked. What does it do? It uses Google's Gmail REST API to expose your Gmail mailbox in Django's admin. Now, as a quick aside: no, it's not the same Django Mailer that James Tauber released as part of Pinax. Naming things is hard, y'all. The code is available on Daniel's GitHub repository. The other caveat is that this example does depend on a couple of small changes to Django itself, things that are in patches that we're looking at getting into core. It's actually breaking the last of the hard-coded exceptions in Django that say, this is a model;
just use this out of the box. But it shows what can be done with relatively little effort. How little? It's about 500 lines of code. Now, I haven't got time to tear down 500 lines of code in the remaining 10 minutes I've got here, but I can give you a whirlwind tour of what you need to do. The following code is heavily edited for clarity; it will not even begin to work as described, but it will give you a flavor of where we're headed.

So we start with the manager. When you say `Author.objects.all()`, `objects` is the manager. We need a generic one that will point at Gmail and store your Gmail credentials. Then we subclass that base manager to provide a mechanism for issuing queries on specific object types: we need one to look at message threads, and we need one to look at specific messages, because they're going to hit different endpoints in the Gmail API. They're going to return query sets. What about those query sets? Again, we have a base class: a base query that abstracts the idea of calling a REST API, and a base query set to generate those queries. Now, you'll notice here that the query set is a subclass of list, because all it's really doing is representing a list of results. You could use your own container if that's useful, but a basic Python list will do. And again, we subclass that base class for specific data types. Get-thread-by-ID is the actual Gmail API call there; so when you call get on a thread query set, you're invoking get-thread-by-ID. And we do a similar thing for the message query set. Meanwhile, filter is a method that returns a cloned version of the query set, so you can chain those filter operations; every time you clone, every time you filter, you modify it to add additional query parameters.

Next, we've got the meta model. Here, we subclass Django's own Options class, the class that implements `_meta`, because that gives us a bunch of functionality for free.
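The clone-on-filter query set idea is easier to see stripped of the Gmail details. Here's a toy version in plain Python; all names are invented for illustration, and this is not Daniel's actual code:

```python
from types import SimpleNamespace


class BaseQuerySet(list):
    """A query set that is literally just a list of results.

    In the real proof of concept, the accumulated filters would be turned
    into REST query parameters when the query executes; here we just
    collect them to show the clone-on-filter chaining.
    """

    def __init__(self, results=None, filters=None):
        super().__init__(results or [])
        self.filters = dict(filters or {})

    def filter(self, **kwargs):
        # Each filter() returns a clone carrying extra query parameters,
        # so calls can be chained, just like a Django QuerySet.
        merged = {**self.filters, **kwargs}
        return type(self)(results=self, filters=merged)

    def get(self, **kwargs):
        # The real thing would hit a "get by ID" API endpoint; this toy
        # version just scans its own results.
        matches = [obj for obj in self
                   if all(getattr(obj, k) == v for k, v in kwargs.items())]
        if len(matches) != 1:
            raise ValueError("get() expected exactly one result")
        return matches[0]


messages = BaseQuerySet([
    SimpleNamespace(id=1, sender="alice"),
    SimpleNamespace(id=2, sender="bob"),
])

qs = messages.filter(sender="alice").filter(unread=True)
print(qs.filters)                  # {'sender': 'alice', 'unread': True}
print(messages.get(id=2).sender)   # bob
```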
And all we really need to implement on that subclass is get_field and get_fields, and the rest all works. Now, we pull it all together. We define a base Gmail model that retains state, and we define a Python metaclass that will control the process of object construction. Then we subclass, set a metaclass for each of our models and a default manager, and we're done. Now you can say `Thread.objects.get()` and `Message.objects.filter()` and retrieve data from your Gmail account, assuming you've got your credentials set up correctly in your settings file. All that's left then is to register those models in the admin. You have a message inline that works on messages; you have a thread admin that uses a bunch of inlines and displays the number of messages in each thread. Register that with the admin, with exactly the same registration process you'd use for any other Django model, and you're done.

So, that's a lot to absorb in a short amount of time, and it probably seems a little overwhelming. But keep in mind, the whole thing is 500 lines of code, and I'm rushing through it to show what is actually possible. It is just 500 lines of code, and what you end up with is a limited reproduction of the API that should be familiar to you as Django users: models, managers, query sets, queries, and so on.

So here's a call to action. Who's coming to the sprints? Show of hands. There we go, a goodly number. All right, who's looking for something to work on? Yeah, a goodly number of those too. All right: have I got the project for you. How about one of these? Have you got expertise in a non-relational data store? MongoDB, Google App Engine, Cassandra, Riak, CouchDB, or anything else that happens to be out there? Why not build a Django meta duck for that data store? You'll be able to see your data store in Django's admin, generate forms for those objects, and away you go. What about more traditional data stores? You could wrap an LDAP store.
You could wrap your local email inbox, your file system, even a pool of AWS resources with a Django meta duck, and browse, and even potentially do some light management of, your AWS resources in Django's admin. What if you don't like Django's ORM? Well, why not replace it? There's no reason you couldn't make a Django meta duck interface for SQL using SQLAlchemy. You'd be able to take a SQLAlchemy model, treat it as a Django resource, and expose it in Django's admin. For most of your business logic, you'd use the SQLAlchemy API, because that's the one you prefer; but for convenience and prototyping, you can just drop that model into Django's admin or into a Django form.

Now, if you do choose to take on one of those projects, you are going to hit some rough edges, so I will warn you. As I said, Daniel's Django Mailer depends on a couple of patches that haven't been merged into trunk. I can't guarantee that we've identified every single one of the places in forms and admin where you might trip up; there are a couple of places where we've probably made implicit assumptions about the fact that a relational database underpins the application. But if you do find a bug, it's a bug we're interested in fixing. And the more people we have contributing these Django meta ducks, the more confidence we'll have that we've broken the ORM dependency in model forms and admin. So I hope I've sparked your interest in the possibilities of the Meta interface. We probably have time for a couple of questions, or you can grab me over the next couple of days, or online. Thank you very much.

I can hear crickets. Here they come.

Q: How's the documentation for this?

A: Pretty good, on the basis that there's not much to document: there are literally two methods and a bunch of attributes. There's actually even some really good documentation for migrating: if you were using the old API, the one that wasn't formal, here's how you transform one to the other. So yes, there is documentation there.
It is pretty solid. Documentation of how you write a Django meta duck? No. But like I said, the only thing you need to do is essentially appear to be a meta model: if you implement get_field and get_fields and a couple of others, you're basically there.

Q: Do you think there's room in the meta abstraction for concepts that wouldn't make sense in a relational database, but that do in things like Mongo or other stores? Extending the abstraction to be a little more flexible around the edges?

A: I presume you're talking about things like documents as a data type?

Q: Exactly, things like that.

A: Yeah, absolutely. I mean, it sits alongside: is it a one-to-one? Is it a one-to-many? Is it a many-to-many? Is it a document, in some meaningful way? I don't know exactly what that abstraction is, but I think there's no reason we can't have that flag in Django, even if Django itself doesn't use it. But also, on top of that, you could set it up in such a way that if your MongoDB data type needs to filter on a particular flag, that flag doesn't need to exist on any other data type; you only need to have it in your data store. You can use getattr with a default of False, and any model that doesn't define the flag will just report False anyway.

Q: I don't think this is a new problem I'm about to describe. I'm just looking at the Django documentation, and there's this page on model Meta options, and then there's this other page on the Model `_meta` API, which talks about the Options class, and I'm really confused about what the relationship is between these two things. Is there a possible naming conflict?

A: Yeah, I guess some of that is historical. The name of the Options class is never surfaced to the end user; it's an internal name, and `_meta` is the only one that's actually there. If there's something we need to clean up, it's probably the documentation referring to Options,
because it isn't something you should ever really be instantiating. So, yeah.

Q: But there's also "Options" under the model Meta options page, which talks about things like `options.abstract`.

A: Oh, okay. I see where you're going.

Q: I think there are actually two naming conflicts, or potential naming conflicts, for people who are not intimately familiar with how all this works underneath.

A: Yes, I agree; there is probably something that can be cleaned up there. I'll talk with you later and see if we can put together some suggestions about how that could be cleaned up, and which one we publish or take out of the documentation. The word "options"... seriously, I can't think of anywhere it's actually surfaced as public API. It's used because that's the name of the class, not because it's a name you need to use. So it's effectively a candidate to be removed anywhere it's being used.

Q: Thank you.

All right, well, I think that's pretty much time as well. So thank you very much.