So we're going to go to lunch right after this. And I recently learned that lunch is two hours long, which is enough time for wine and a nap. So I'm now convinced that Spanish conferences are the best conferences. Very excited for that. I'm also glad that I went right after that last talk, because I'm using a lot of those same techniques in my project here. We're not going to talk about them at that lower level, but if anyone's wondering whether this can actually be used to do anything, the answer is yes.
So my name is Nick. I'm a web platform engineer at Opower, which is a small company in Washington, DC. We're trying to save the environment by convincing people to use less energy, by sending them personalized reports about how they use energy. So far, we have prevented over 6 million tons of CO2 from entering the atmosphere, which is equivalent to taking a million cars off the road for a year. So we're excited to have hit that milestone.
But completely unrelated to that is procedural content generation, which I'm going to tell you about. When I tell people that's what my talk is, the first question I always get is: I know what those three words mean, but what do they mean together? So that's what we'll explain first, along with why this is relevant. Then we'll look at the specific techniques and tools that we have to actually make this stuff work. And then finally, I'll walk you through a project that I did using these tools and tell you the lessons that I learned and the best practices that I came up with out of that. This is a very big field, but my goal is to inspire you and to give you a sense of what's out there, so you know where to get started if you're interested in pursuing this further.
So what is procedural content generation? Well, it's creating content algorithmically, and content can be anything that humans want to consume. So music or text or art, video game assets.
And it's generally a non-deterministic process, because that's where the big power in this is. Let's say you're making a level for a game. If you write an algorithm that makes good levels, you can have infinite replayability. And there are a lot of different things this is used for. Games are the first thing people think of, but I was actually surprised to learn that a lot of the journalism that we read today is generated by computers. We'll discuss more of that in a moment.
It's also just a totally new mindset when you're thinking about these problems. When I'm at work, sometimes I spend days wondering why these Grunt plugins aren't talking to each other and stuff like that. This is just totally different, so it was a nice mental break to explore these new ideas. And finally, because there is sort of this artistic bent to it, you end up working with people who you might not normally work with as a programmer. I just find those projects are really fun, and working with new types of people is good for the growth of everyone involved.
And just to clarify, this has nothing to do with Fortran. There are people who have heard this title and thought that I was gonna tell them about how to use Fortran and COBOL to make video game levels. Not at all true, so just to get that out of the way. Procedural content generation is actually a great place for functional programming, which is my favorite style.
So a big application of procedural content generation you might all be familiar with is Minecraft. As you may know, when you're running around in this world, it's generated for you on the fly. The total world is 10 times the size of the earth. And it was producible by an indie game company with 10 people when they started. It's pretty amazing that they can produce content that so many people get so many hours of enjoyment out of. The flip side is something like Grand Theft Auto, which at $256 million is the highest-budget video game of all time.
And they have humans who are crafting just about everything in their world. But even there, they're getting help from computers. So this is a program called SpeedTree, where you put in some parameters and it will produce trees for you. This is a much nicer way to generate your forest than crafting every polygon by hand.
And this book, which is an interesting book about computers in general and the way we use them in society: the authors are arguing that humans and computers work best together when people really try to focus on what computers are best at and what humans are best at. This case is a good example of that, because computers are best at creating this very well-defined asset that we can use a lot of. And that frees humans up for the higher-level tasks: is this a fun level to play on, is the user gonna get lost or frustrated, et cetera.
But it's not just for games. As I was mentioning earlier, I was surprised to learn that a lot of text is generated by computers as well. There's a New York Times interactive quiz I found where they put up snippets of text and ask you if each was written by a human or a computer, and we're gonna do a few questions here. I'll see what you all think. Just to set expectations, I got 50% of it right when I took the quiz, which means I am no better than a coin toss at figuring out the answer.
So here's the first one. "A shallow magnitude 4.7 earthquake was reported Monday morning five miles from Westwood, California, according to the US Geological Survey. The temblor occurred at 6:25 Pacific time at a depth of 5.0 miles." So who thinks a human wrote this? Who thinks a computer wrote it? Now, very cleverly, you might be thinking it would be weird for it to be a human given what this talk is about. You're correct, this was generated by a computer.
Let's try the next one. "Apple's holiday earnings for 2014 were record-shattering.
The company earned an $18 billion profit on $74.6 billion in revenue. That profit was more than any company had ever earned in history." Human? Computer? This one is a human, specifically that guy.
Last one. "When I in dreams behold thy fairest shade, whose shade in dreams doth wake the sleeping morn, the daytime shadow of my love betrayed lends hideous night to dreaming's faded form." Human? Computer? Room's about split. It is a computer. This is one of the ones that I got wrong.
So this is one company, and there are many others, whose business is built around generating this text. Essentially what you do is you send them a JSON blob that has the details of your event, so your earthquake or your financial report, and they will generate natural language to express that information. And just like with GTA, this is not necessarily going to replace all human journalism, at least not in the near future, but there are several key advantages. These stories can be generated within seconds of the event occurring. For humans, that's not really practical. You can also personalize the story for each reader. Just like everything else we do on the web is targeted to us, you can have certain details be accentuated or diminished based on what that reader is interested in. And sure, you could just dump the data into D3, but natural language can be a better way for some people to understand things.
Financial reporting is one place where you see a lot of this. Sporting events too, both major league sports and smaller-scale endeavors. It's not cost efficient to have a New York Times journalist writing a story about a Little League game, unless that journalist happens to have a kid on the team. But with an algorithm, you can produce a story about the game that still is of some value to people, and that is a cost-efficient way of doing it.
So now that we know what procedural content generation is and what we can do with it, how do we generate it?
Well, procedural content generation is similar to machine learning in that there's a general problem rather than one recipe. With machine learning, it's that we want to learn about some data set so we can classify new examples. Same thing with procedural content generation: there's no hard and fast set of rules, but there are some things that people have found work pretty well. A lot of these are defined by a lot of complicated math that is difficult to get through in a half-hour talk before lunch. Fortunately, we live in JavaScript land, where there's a Node module for just about everything. The Node module you're going to use was probably written by substack, but not all of them are. These are some modules that I used in my project, and I mostly found them to be quite reliable.
So just to tell you about a few specific types of these tools: you saw several different types of noise on the last slide. The reason that we have noise is so we can have non-determinism that has some sort of structure to it. You might just say, okay, we want there to be randomness, let's just call Math.random. Well, what you're going to get is the white static of a TV on the wrong channel. It will be random, but it's completely uniform. And that's actually not a great way to model things in the real world, because you don't see that very often. So we have different types of noise that are defined by their mathematical properties to have shapes and patterns you can recognize a little bit. This makes it easier to use them in your project. You can see that Perlin noise looks like it might make a good elevation map for a terrain, maybe. Voronoi noise is often used for cells or geographic borders or things like that.
The way the Voronoi diagram is created is you put a bunch of dots on a plane, and then you draw lines between them such that every point on a line is equidistant from its two closest dots.
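As a rough illustration of that construction, here is a brute-force TypeScript sketch that labels each cell of a small grid with its nearest dot; the borders between differently labeled regions approximate the equidistant lines. The names `nearestSite` and `voronoiGrid` and the sample dots are made up for this example, and a real noise module would not necessarily work this way.

```typescript
// Hypothetical helper names; a brute-force sketch for illustration only.
type Point = { x: number; y: number };

// Index of the site closest to p (squared distance is enough for comparing).
function nearestSite(p: Point, sites: Point[]): number {
  let best = 0;
  let bestDist = Infinity;
  for (let i = 0; i < sites.length; i++) {
    const dx = sites[i].x - p.x;
    const dy = sites[i].y - p.y;
    const d = dx * dx + dy * dy;
    if (d < bestDist) {
      bestDist = d;
      best = i;
    }
  }
  return best;
}

// Label every cell of a width x height grid with the index of its nearest site.
// Cell borders appear wherever neighbouring cells carry different labels.
function voronoiGrid(width: number, height: number, sites: Point[]): number[][] {
  const grid: number[][] = [];
  for (let y = 0; y < height; y++) {
    const row: number[] = [];
    for (let x = 0; x < width; x++) {
      row.push(nearestSite({ x: x, y: y }, sites));
    }
    grid.push(row);
  }
  return grid;
}

// Three arbitrary dots on an 8 x 8 plane.
const sites: Point[] = [{ x: 1, y: 1 }, { x: 6, y: 1 }, { x: 3, y: 6 }];
const cells = voronoiGrid(8, 8, sites);
console.log(cells.map((row) => row.join("")).join("\n"));
```

Printing the grid as digits makes the three regions and their roughly straight boundaries visible in a terminal.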
And that's what produces this structure, which looks nice for various applications like geographic boundaries.
Another technique is the L-system, which is a form of grammar-based modeling. L-systems are really powerful when you have something you're trying to model that is hierarchical in nature. So you might have a tree, and the tree starts with a trunk, and then it goes into the thicker lower branches, and those have thinner branches off of them, and then there are leaves. The way that you define that progression is through a grammar. The L-system is given by three components: the alphabet, which is all the things in the language that we're producing; the start state, which is just a member of the alphabet; and the production rules. This screenshot is from the demo of the L-system Node module, and you can see it's generating this fractal plant-like structure.
So as an example, we have an L-system that has these three components. The alphabet is X, F, plus, minus, and brackets. The start state is X, and the production rules are that X becomes all of this and F is doubled. You could easily look at this and wonder, well, what are X and F and plus? That doesn't have any meaning to me. That's correct. It has meaning to the rendering side. So the L-system is basically a contract between the grammar that you're outputting and however you're going to render it. In this case, our renderer does have meaning for this language. You can see F means draw forward on the canvas, and everything else controls the rotation and where we are on the canvas. So you're able to output a sequence of these characters that creates this fractal structure off to the left.
For an example of that, we would start at time step zero with the start state. Then we have the production rule that says X is replaced by all of this, so that's what we do in time step one. And then in time step two, we again go through and we see, okay, F gets replaced by two Fs.
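The rewriting steps being walked through here can be sketched in a few lines of TypeScript. This is only an illustration: the slide's actual X rule is not in the transcript, so the rule below is the commonly cited "fractal plant" rule and should be read as an assumption, and `expand` is a made-up helper rather than the API of the L-system module shown in the talk.

```typescript
// Maps a symbol to its replacement; symbols without a rule are copied unchanged.
type Rules = { [symbol: string]: string };

// Apply every production rule to every symbol, once per time step.
function expand(axiom: string, rules: Rules, steps: number): string {
  let current = axiom;
  for (let step = 0; step < steps; step++) {
    current = current
      .split("")
      .map((symbol) => (rules[symbol] !== undefined ? rules[symbol] : symbol))
      .join("");
  }
  return current;
}

// ASSUMPTION: the classic fractal-plant rules, not necessarily the ones on the slide.
const plantRules: Rules = {
  X: "F+[[X]-X]-F[-FX]+X", // X becomes a whole branching structure
  F: "FF",                 // F is doubled
};

console.log(expand("X", plantRules, 0)); // the start state
console.log(expand("X", plantRules, 1)); // X replaced by its production
console.log(expand("X", plantRules, 2).length); // the string grows quickly
```

A renderer would then walk the output string and interpret each symbol, for example drawing forward on F and rotating on plus and minus.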
Each X, in turn, gets replaced by its whole production, et cetera. So you just keep working your way through the string making those replacements, and you do as many time steps as you want until you're happy with the way your output looks. And here's an example. Visually, you can see we start with nothing, and then with each time step the structure grows more complicated.
So that's the basic L-system, but you can add some more features to make it more powerful. One is non-determinism, which again is important so that we can generate lots of different things and not just a single form. Here what we're saying is that instead of always transforming A to B, we're only gonna do that 30% of the time, and 70% of the time it'll become C. You can also have context sensitivity: if A is followed by C, then make it into B, but otherwise don't. And you can also have parameterization, where you attach a value to the grammar symbol, you pass those values through, and you can define rules in terms of them. So when X is equal to zero, A becomes B; when X is equal to one, A becomes C. This is valuable because it lets earlier stages in your generation have an impact on later stages without making your grammar hideously complicated.
Another tool we have at our disposal is the Markov chain. A Markov chain models a non-deterministic process where the next state depends only on the current one. Or more simply, it's a game of Chutes and Ladders. I don't know if this was part of all of your childhoods, but it's an incredibly tedious game, because literally all you're doing is computing a Markov chain. There are no decisions to be made. You're just cranking through it. The way that the chain works is you have a set of states, and then you have transitions between those states. Just like the L-system, it is a time-based system, so you move through it in steps. So what this says is, let's say we're in state A.
For the next time step, there's a 40% chance we'll go to E and a 60% chance that we're gonna stay at A. And when we're in E, there's a 70% chance we'll go to A and a 30% chance that we'll stay in E. You can use this to generate structures, if you think the thing you're generating is a good fit for this.
You'll actually see a lot of Twitter bots where people are generating text this way. What they do is read in a lot of text, and that tells them what the states are and what the transition probabilities are between each word. Then they just run through the chain to spit out some sentences. There are some pretty amusing examples of this on Twitter. One that I learned about recently is called Erowid Recruiter. Erowid is a message board where people talk about their drug experiences, and the recruiter part is LinkedIn recruiting spam. So it combines both of these and it outputs things like this: "People who are all tripping quite hard decided to sit on and in the redesign of Consumer Reports."
So the Markov chain can produce some funny and some absurd things. As for why this looks the way it does, you can see that there are phrases that are meaningful. "People who are all tripping quite hard." That's a good phrase. "And the redesign of Consumer Reports." That's also good. But when you combine the two, globally it's meaningless. This is a direct consequence of the fact that the Markov chain is only looking at the previous word to determine what the next one should be. So it has no way of knowing that the sentence it's producing makes no sense. You can increase the number of steps back in time that you're looking at, but that does get computationally more expensive, and depending on your data set it could be impractical. And if you increase it too much, then eventually you get to overfitting. Let's say you're looking 10 states back in time. It could be that for a given 10 words there's only one way to continue.
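A word-level chain of the kind those bots use can be sketched as follows. This is a minimal first-order version that looks only one word back, with hypothetical helper names (`buildChain`, `generate`); it is not the code behind any particular bot. The random source is passed in as a parameter so the walk can be made repeatable.

```typescript
// For each word, the list of words that followed it in the input.
// Repeated followers appear repeatedly, which encodes the transition probabilities.
type Chain = { [word: string]: string[] };

function buildChain(text: string): Chain {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chain: Chain = Object.create(null); // no inherited keys such as "constructor"
  for (let i = 0; i < words.length - 1; i++) {
    if (chain[words[i]] === undefined) {
      chain[words[i]] = [];
    }
    chain[words[i]].push(words[i + 1]);
  }
  return chain;
}

// Walk the chain: the next word depends only on the current word.
function generate(chain: Chain, start: string, maxWords: number, random: () => number): string {
  const out = [start];
  let current = start;
  while (out.length < maxWords) {
    const followers = chain[current];
    if (followers === undefined) {
      break; // dead end: this word was never followed by anything in the input
    }
    current = followers[Math.floor(random() * followers.length)];
    out.push(current);
  }
  return out.join(" ");
}

const chain = buildChain("the cat sat on the mat");
console.log(generate(chain, "the", 8, Math.random));
```

Looking further back in time would mean keying the table on the last two or three words instead of one, which is where the cost and overfitting trade-offs come from.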
At that point you're not generating new things, you're just picking out arbitrary sentences from your input, and that's not very interesting. But the other big reason that Markov chains fail to produce compelling text, whereas the stuff we saw earlier like the news stories was a lot more realistic, is that the Markov chain has no way to know about semantics. It's only learning about the syntax; it doesn't know what to say. Whereas with the financial reports, we're telling the computer what to say. We're solving the hardest part for it.
And so this brings me to my next point, which is: do we still need designers and artists and people, or can we just program them out of existence, like so many other jobs have been programmed out of existence? My answer is that yes, we absolutely still need them. Even in a game like Minecraft, which is heavily algorithmically generated, you can still see the intention of the designer in what is produced. Another example of this: this is my brother. He's a drummer, a musician in Los Angeles. A while ago I said to him, so computers can make music, maybe we should make music with computers. He bristled at this a little bit, and he said no, that would be some soulless, stupid thing; music should be made by humans. But the fact that it would be very bad if I were to do it by myself indicates that it's not soulless, because the human is still involved. You still need someone with musical talent to make a good music program. So we definitely still need designers, even though we have all these cool techniques to amplify their designs.
So my project was to generate a city. Apparently we have a very urban theme today for the talks. I wanted to make a map and I wanted to display it somehow. So my first question was, how am I gonna visualize it? I wanted to just be able to drop it into some visualizer and not have to spend a lot of effort on the rendering side, because I wanted to focus on the generation.
So the first thing I looked at actually was ViziCities, because I saw it on the agenda and it looked really cool. However, it seemed like it wasn't really trying to do what I wanted to do, and I didn't wanna spend all my time hacking it apart to do what I wanted to do. So then I looked at Mapbox, which is a much more mature product, and I just spent all my time hacking that apart instead. Mapbox is pretty nice, though I did have a fair amount of pain with it.
Then the next question was, what format should I produce? There is a dizzying array of different ways to represent geographic data. Ultimately I went with GeoJSON, because JSON sounds like something that would be good to use from JavaScript. There's also a really nice open source library called Turf.js which does a lot of the geographic calculations for you. This was important to me because I didn't wanna spend the time on that. I wanted to do the generation side of things. I also found this site to be really helpful for debugging. You can just paste your output right in there and you can see what it looks like. This is really important, because if you're just looking at the JSON file, you're not gonna be able to figure it out. With a lot of these projects you need a visual component to see what's going wrong.
So the start of my city was just a box. But this does not look very much like a city. So I used iterated subdivision to divide it into a smaller set of city blocks. But this still doesn't look very much like a city, because there's no city in the world that's just a square grid of uniform size. So then I introduced some simplex noise, and my algorithm was: divide the box into smaller sections if the noise value for that box is high enough. This produces that dense urban center, and then the outlying areas are not quite as populated. The other thing I did to make that happen was to make the requirements for the noise lower towards the center.
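The subdivision rule just described might look roughly like the TypeScript sketch below. It is a guess at the shape of the algorithm, not the speaker's code: the noise source is passed in as a plain function (a stand-in for a simplex noise module), and the `Config` fields, the four-way split, and the linear threshold falloff are all assumptions made for illustration.

```typescript
type Box = { x: number; y: number; size: number };
type Noise = (x: number, y: number) => number; // assumed to return values in [0, 1]

// All tunable numbers live in one place (hypothetical parameter names).
interface Config {
  maxDepth: number;      // stop splitting after this many levels
  baseThreshold: number; // noise value required to split at the city edge
  centerBonus: number;   // how much the requirement drops at the very center
  center: { x: number; y: number };
  radius: number;        // distance at which the center bonus fades to zero
}

// The required noise value is lower near the city center.
function splitThreshold(cx: number, cy: number, cfg: Config): number {
  const dx = cx - cfg.center.x;
  const dy = cy - cfg.center.y;
  const closeness = Math.max(0, 1 - Math.sqrt(dx * dx + dy * dy) / cfg.radius);
  return cfg.baseThreshold - cfg.centerBonus * closeness;
}

// Split a box into four quadrants while the noise at its middle is high enough.
function subdivide(box: Box, noise: Noise, cfg: Config, depth: number = 0): Box[] {
  const cx = box.x + box.size / 2;
  const cy = box.y + box.size / 2;
  if (depth >= cfg.maxDepth || noise(cx, cy) < splitThreshold(cx, cy, cfg)) {
    return [box];
  }
  const half = box.size / 2;
  const quadrants: Box[] = [
    { x: box.x, y: box.y, size: half },
    { x: box.x + half, y: box.y, size: half },
    { x: box.x, y: box.y + half, size: half },
    { x: box.x + half, y: box.y + half, size: half },
  ];
  let blocks: Box[] = [];
  for (const q of quadrants) {
    blocks = blocks.concat(subdivide(q, noise, cfg, depth + 1));
  }
  return blocks;
}

const cfg: Config = {
  maxDepth: 3,
  baseThreshold: 0.6,
  centerBonus: 0.4,
  center: { x: 32, y: 32 },
  radius: 48,
};

// Stand-in noise: a smooth made-up function, NOT simplex noise.
const fakeNoise: Noise = (x, y) => 0.5 + 0.5 * Math.sin(x * 0.37) * Math.cos(y * 0.29);
console.log(subdivide({ x: 0, y: 0, size: 64 }, fakeNoise, cfg).length);
```

Because the threshold depends on distance from `center`, even a constant noise value yields small blocks in the middle and large blocks at the edges, which is the dense-center effect the talk describes.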
So I'm trying to bias it towards having a city-center feel here. Next, I saw that the perimeter of the city doesn't look very natural, because again it's just one giant box. So I applied a rule that would remove all of the grid sections above a certain size, and that made the perimeter look a little more natural. Then I decided that no city in the world has all blocks of the same size, even within a size category. So I merged some of the blocks together, and this again creates a little more of an organic look and makes things look a little more varied.
Then, because most cities aren't built on a perfect grid system, I perturbed the streets. I took the points and I just sort of wiggled them around a little bit to give it a slightly more natural look. One thing that was important was making sure that the distance I was moving each point was relative to the size of the box, to the size of each grid square. Otherwise you could quickly have things get totally distorted. Additionally, I had different thresholds for the smaller grid squares versus the bigger grid squares so that I could fine-tune how I wanted it to look.
Then I added a river using a Voronoi diagram. The way I generated that was I drew the Voronoi diagram over the map of the city, I picked a point that was on one edge, and then I just kept walking over the diagram until I exited at the other edge. That created the river. And then I added a lake. The way I did that was I took Perlin noise and I put it over the whole city. I found the point that had the highest Perlin noise value, and then I just expanded outwards from there until I met some threshold. That creates this sort of water-fill effect. Additionally, for both the river and the lake, I went back and I removed the city grid squares that were on top of them.
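One plausible reading of the lake step is a flood fill: start at the cell with the highest noise value and spread to neighbouring cells for as long as their values stay at or above a cutoff. The sketch below assumes the noise has already been sampled into a grid of numbers, and `growLake` is a hypothetical name; the transcript does not say exactly how the expansion or the threshold worked in the original project.

```typescript
type Cell = { row: number; col: number };

// Grow a lake outward from the highest-valued cell of a sampled noise grid.
function growLake(noise: number[][], threshold: number): Cell[] {
  // 1. Find the peak of the noise field.
  let start: Cell = { row: 0, col: 0 };
  for (let r = 0; r < noise.length; r++) {
    for (let c = 0; c < noise[r].length; c++) {
      if (noise[r][c] > noise[start.row][start.col]) {
        start = { row: r, col: c };
      }
    }
  }

  // 2. Breadth-first flood fill through 4-connected neighbours above the threshold.
  const seen: boolean[][] = noise.map((row) => row.map(() => false));
  const lake: Cell[] = [];
  const queue: Cell[] = [start];
  seen[start.row][start.col] = true;
  const steps = [[1, 0], [-1, 0], [0, 1], [0, -1]];
  while (queue.length > 0) {
    const cell = queue.shift() as Cell;
    if (noise[cell.row][cell.col] < threshold) {
      continue; // too low: this cell is shore, so do not spread from it
    }
    lake.push(cell);
    for (const step of steps) {
      const r = cell.row + step[0];
      const c = cell.col + step[1];
      if (r >= 0 && r < noise.length && c >= 0 && c < noise[r].length && !seen[r][c]) {
        seen[r][c] = true;
        queue.push({ row: r, col: c });
      }
    }
  }
  return lake;
}

// Hand-written sample values standing in for sampled Perlin noise.
const sampled = [
  [0.1, 0.2, 0.1, 0.8],
  [0.2, 0.7, 0.6, 0.1],
  [0.1, 0.9, 0.3, 0.1],
  [0.1, 0.2, 0.1, 0.1],
];
console.log(growLake(sampled, 0.5));
```

The high value in the top-right corner is not part of the lake because it is not connected to the peak, which keeps the lake a single body of water. City blocks overlapping the returned cells would then be removed.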
And it works out well when you have a bunch of small grid squares that are right next to the water, because then it looks realistic, like the way that people do actually tend to build things right on the water. Next I added some parks, and the way I did that was, for each grid square, I looked at its size relative to its neighbors, and within a certain range of ratios I just designated those as parks.
So this was where I stopped. This was my final output. Obviously there's a lot more work to do if you want this to look like a totally realistic city, but I felt this was a pretty good balance point in terms of getting my feet wet and understanding how this stuff works. And I learned a good set of best practices and lessons that I'll share with you next.
I used TypeScript for this project, which I'm very happy with. TypeScript, like any other new tool, does have a bit of a learning curve, but after you get over it, it does give you a lot of value. Here you can see that I'm getting both type safety and some ES6 features like the fat arrow function. But the big thing that drove home for me how TypeScript is useful is this: when I'm programming, ideally there's only one thing wrong with my program at a time. What that means is I stop and run the code every time I've written enough lines that I've probably introduced an error. So maybe every 10 lines I have to stop and run it and see what it does. With TypeScript, the number of lines I could write before I introduced an error, at least one that would make it past compile time, was expanded. So I found myself spending more time writing code and less time switching back and forth to make sure it still worked. I often had the experience of going to see if it worked and then being pleasantly surprised: wow, that actually worked the first time, I kind of wasn't expecting that. And that's because of the support that TypeScript gives you.
Another thing that's very important for this is using a seedable randomness source. What this means is that instead of just using Math.random, you give it a seed, which is a string or some other starting value, and the randomness is generated based on that seed. So for the same seed you'll always get the same values. Without this you'll just go insane trying to debug, because as in science, you can only change one thing at a time if you want to know what is having what effect. If you change your algorithm and then your output looks totally different, and you're using Math.random, you have no idea if that's because you just got unlucky with a different set of random numbers or because your algorithm is bad. By keeping the randomness the same, you're able to debug it a lot more effectively. Then when you go to production with this, you would use a random seed, and the whole thing would be random end to end.
Another benefit is that this lets you do automated testing. If you are generating something different every time, that's basically impossible. It also lets you reproduce error conditions. If one of your users produces something that looks really weird, they can send you the seed value they used and you can reproduce that exact state, as opposed to just saying, well, I guess I'll hit refresh a bunch of times until Math.random does the same thing for me that it did for them. The Node module that I used for this is called alea, I guess that's how you pronounce it, and it's worked totally great.
It's also very helpful to be able to log everything, and here what I'm doing is decorating the output with information about how it was generated. This is very useful when you're inspecting your output: when something looks weird, you can understand why, whereas with plain console logs it's harder to connect the two.
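Both ideas, a seeded random source and output decorated with how it was generated, can be sketched together in TypeScript. To keep the example self-contained it uses a small Lehmer (Park-Miller) generator as a stand-in; this is not alea's algorithm or API. The names `hashSeed`, `seededRandom`, `makeBlocks`, and the `debug` field are invented for illustration.

```typescript
// Turn a string seed into a valid starting state (1 .. 2147483646).
function hashSeed(seed: string): number {
  let h = 7;
  for (let i = 0; i < seed.length; i++) {
    h = (h * 31 + seed.charCodeAt(i)) % 2147483647;
  }
  return (h % 2147483646) + 1;
}

// Stand-in seedable generator (Lehmer / Park-Miller); NOT the alea module.
// The same seed always yields the same sequence of numbers in [0, 1).
function seededRandom(seed: string): () => number {
  let state = hashSeed(seed);
  return () => {
    state = (state * 48271) % 2147483647;
    return (state - 1) / 2147483646;
  };
}

// Each generated block carries a record of why it looks the way it does.
interface Block {
  width: number;
  debug: { seed: string; roll: number; rule: string };
}

function makeBlocks(seed: string, count: number): Block[] {
  const random = seededRandom(seed);
  const blocks: Block[] = [];
  for (let i = 0; i < count; i++) {
    const roll = random();
    const wide = roll > 0.7;
    blocks.push({
      width: wide ? 200 : 100,
      debug: {
        seed: seed,
        roll: roll,
        rule: wide ? "roll > 0.7: wide block" : "roll <= 0.7: normal block",
      },
    });
  }
  return blocks;
}

// Re-running with the same seed reproduces exactly the same output.
console.log(JSON.stringify(makeBlocks("madrid", 3), null, 2));
```

With this in place, an automated test can assert on exact output for a fixed seed, and a bug report only needs to include the seed. In production the seed itself would be chosen randomly and recorded.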
It's also very important to be able to configure everything. Your life will be a lot easier if your configuration is pulled out from your code, instead of having magic numbers floating around everywhere, so that it's super easy to tweak them. The reason for this is that having good parameters is completely essential. You can have a really interesting algorithm that is torpedoed by awful parameters.
One way that I torpedoed myself was when I was generating the lake. As you may remember, with the lake we have Perlin noise, we're taking a sample of it for the city grid, and then we're trying to find certain points to generate the lake accordingly. The Perlin noise is defined over an infinitely large area, so we're just taking a subset. But when I started I was using latitude and longitude, and on the scale of a city those numbers actually don't change very much. So instead of getting a nice section like this, I was just getting a tiny little section, and when I drew a map where the color represents the value of the noise, it was basically all the same color. So then I applied a coefficient to try to zoom out, but this was zooming out by too much. You can see this basically looks like white noise static, because the resolution of the points we're sampling is too low for the size of the area. So then I found a good coefficient that was right in the middle, where you actually can see those patterns that we care about. So the lesson is that you have to have a way to test your parameters and to configure them correctly, because otherwise your life is gonna be really painful.
Another important lesson is that you should use the right algorithm for what you're trying to do. So this is a map of Seattle, and you can see a lot of Seattle's city grid is north, south, east, west, except for this cool area where it shifts and goes along the waterline.
I wanted to add something like this to my city grid, but it was fairly difficult because of the algorithm and the data structure I was using. In fact, a lot of things were fairly difficult, because what I was doing was taking a big box and just subdividing it into smaller and smaller boxes. When you wanna go back afterwards and merge the boxes, or perturb the corners, or do a shift like this, it's very hard, because you need to sort of reconstruct information that you didn't save originally. If you just have a big set of polygons, then finding neighbors is an order n squared operation, which is incredibly slow. So if I were to do this again, I'd like to use an L-system to generate the city grid as a grammar. I think this would also work nicely for having a hierarchical structure, where you can say, okay, you have the city center, and then you have the big highways coming off of that, and then that leads to smaller roads, et cetera.
Another important lesson is to pick your output carefully. As I said at the beginning, I really wanted to just drop my output into a renderer and have it look good. Mapbox has a lot of built-in mapping templates and styles to make your content look good, but for whatever reason I didn't produce the output that Mapbox was expecting. I had my own special snowflake output, and so that meant that I had to hack Mapbox's styles to work with what I had done. So if you are not interested in doing that, you should be conscientious about having your output be well formed.
And finally, async is not as helpful here as it is in other places, because our program is totally CPU bound. Async only helps with I/O, because the idea is that instead of the event loop being blocked waiting for something, it can do other tasks. Well, if it's CPU bound, the event loop is always busy. So you can't really get that easy parallelization that you can get with other problems.
And if you wanted that you would need multiple Node.js processes which would be possible but is a little more painful. So those are my lessons and now that you guys have seen this I'm excited to see what you all will make. Thank you.