Thank you everyone for coming. So before I get started I want to ask a quick question, which is: how many people in the audience right now either already use big data, or want to learn how to use big data so they can put it on their resume and get three times more recruiter emails? Exactly, perfect. All right, that's pretty much what I'm going to talk about today, how to optimize your recruiter emails. Building data-driven products using Ruby. So you're probably wondering, who is this guy, why should I listen to him, right? He looks like he's about 12 years old, his voice hasn't even broken yet. Well, I studied computer science and bioinformatics at UCSD before I eventually dropped out so I could join the startup scene in San Francisco, and right now I'm a data scientist at Sharethrough, which basically means I'm an engineer who also happens to be good at math, and since I live in Silicon Valley they decided to call me a data scientist. The company I work for, Sharethrough, is a native video advertising platform. What that means is we consume large amounts of data about users all over the web so that we can customize ad experiences, to hopefully make ads as a whole suck less, right? Which is good, because it means I basically get paid to use data to improve my business's bottom line. So really you want to listen to me because somebody pays me to do this, so I must kind of know what I'm talking about, at least. My goal for this talk is to help you answer the following four questions. What is a data-driven product? What does the development cycle look like for a data-driven product? Where does Ruby fit in this new world of data science? And how can Ruby be improved to stay relevant in the age of big data? But I just want to let you know I'm not gonna talk about whether you should use support vector machines, or some new type of regression, or whether you want to use principal component analysis.
This talk is really about how Ruby fits in and how we build data-driven products as a whole. So before I get started I want to give you a couple of warnings. Ruby is not your only option, right? The world of big data right now is pretty much a minefield. You have so many things to choose from: Hive, Pig, R, Scala, Cascading, Python, Java. All of these tools are in the ecosystem right now, and really you're not gonna use just one of them. In my day-to-day job I actually use a combination of Ruby, Python, Java, R and even a little bit of Scala, just to do what I do on a day-to-day basis, right? The key is, it's all about picking the right tool for the right job. But since this is a Ruby conference, I'm really gonna talk about the ways I see Ruby fitting in, and the places in the data-driven product cycle where Ruby really is a good fit. So now we've got those warnings out of the way, we'll first start with: what is a data-driven product? A data-driven product is really anything that uses data to improve the bottom line of your business, right? So it could be a standalone product, where the whole company just does data. Some examples of these are Boundary, Mixpanel, and Google to a certain extent. But I think it's more interesting to look at the ways you can incorporate particular data-driven products into a larger company offering, right? Examples might be ad targeting; product recommendations, if you're Amazon.com, based on which books you've bought; information aggregation and filtering, so if you go on a news website, which articles you're likely to wanna read, right? We can see some examples of these all over the web at the moment. We have GitHub, the classic example, right? This kind of seems simple and trivial.
When you really think about it, what GitHub is doing is they're giving you data, and they're letting you use data to understand how you interact with their product, and in understanding how you interact with their product, they know you're gonna become more attached to the platform as a whole, right? So using this simple graph, I can do things like know when my engineering team is most productive, right? And if I know when my engineering team is most productive, I can make sure I don't schedule meetings during those times. At the same time, I can check to make sure I'm not burning my engineers out, making sure they're not committing at one o'clock in the morning every single Saturday, because nobody really wants that, right? The other classic example is LinkedIn's People You May Know, right? So social networks realized a long time ago that a user's engagement with their network was highly coupled to how dense their social network was on that given network. So in order to help improve that, LinkedIn kind of pioneered this notion of People You May Know, right? They use algorithms and they use data, based on how you interact with the network, in order to suggest people they think you're likely to want to get to know. In doing this, they're able to help improve engagement and overall improve the value of their product, right? Because LinkedIn makes money off advertising. If users are on their site interacting with their friends, they get more ad impressions, which means ultimately they make more money. And the classic example is advertising, right? Google AdWords is probably one of the ultimate data-driven products. They use data from your search history, and they use the actual search term itself, all to target advertising to you, right? And the cool thing is, by using this data they're able to create value for Google as a whole, which lets them deliver the awesome search results that we're so used to on a daily basis, right?
Using this data product, they're able to make money for Google. But it doesn't have to be a product in the traditional sense. Another great thing you can do with data is improve marketing for your company as a whole, right? This is a great example. Facebook released this graphic back in December of 2010, and it got tons of pickup, right? All the tech blogs covered it, it was on TechCrunch, Mashable, and a bunch of people used it as their screensaver, right? A simple thing like this, you wouldn't normally think of it as a product, but when you take that step back, what it's doing is giving Facebook a way to reach users, right? If users see how connected Facebook is as a whole, they're more likely to want to join the network, and most importantly, brands now realize how connected Facebook is, because when a brand manager sees this, they're gonna say, hey, maybe I should spend some of my ad budget on Facebook. So that's great, you say, you've told me exactly what a data-driven product is, filled with lots of buzzwords, but I'm an engineer, so I wanna know: how do I actually go about building something? I think building a data-driven product comes down to this cycle, which has four major steps, right? You start off with asking the right question. Then there's collecting and cleaning your data. Then you move on to building the predictive model, and finally you get to the publishing-your-results phase, right? The important thing to notice is that this is not a linear flow. It's not waterfall. Building data-driven products is just like building every other product that we're so used to building, right? Like Rails web apps, we all do the agile thing, right? It's very much the same with data, but it's very hard to get out of that traditional waterfall, straight-down model, right?
Most people from a traditional enterprise background doing data stuff are using traditional business intelligence tools, Teradata, that kind of thing. You get in this kind of waterfall mode, right? Just give me a question, I'm gonna answer it, same technology all the time. But I think it's important to realize that to build good data products you need this cycle, right? Because you might finally get to the publish-results phase, you print out this graph, and then the business says, okay, that's cool, but we've kind of pivoted since then, so your graph really has no value now, so you need to go back and do it again, right? The same thing happens when you've built the ultimate model, tons of work, and then somebody in business development signs a new deal with a third party, and all of a sudden you need to integrate that new third party's data source, because hey, the business needs it, we need to put it on our website, we need you to put it in your model. All of a sudden you're gonna have to go from that whole building-a-model phase back to collecting and cleaning data, because until you've done that, you can't truly build a model. So now that we have the four main phases outlined, we're gonna step in and look at how we do each one. The first phase is all about asking the right question, right? This seems really simple, it seems kind of trivial. You're probably wondering why I'm even talking about this. This is a tech conference, I said I was gonna talk about Ruby. But this is actually one of the hardest phases, right? If you don't ask the right question, no matter how awesome your tech stack is, no matter what technology you pick, it won't matter, right? If you're not answering a question that really helps the business, you're not delivering value, and the whole point of engineering and product is delivering value to your business, right? Which means you really need to focus on asking the right question.
And conveniently, the only thing you need to do that is English, right? You don't need Ruby, you don't need Python, you don't need Java, you don't need Hadoop to help ask the right question. What you need to do is go out and talk to the business, right? You need business context. You need to know what makes the business run. What do partnerships look like? What does the market look like as a whole, right? All these things are traditionally the hardest things for engineers to do, right? We actually have to go out and talk to people, which is quite challenging. I don't personally like doing that that much. But we have to do it so that we can get that first phase done. We can use Ruby to help a little, do some exploratory stats, but really, at the end of the day, for that whole first phase you don't need any programming technology. You just need to talk to people. So for a personal example, I kind of want to guide this whole process through what I do on a day-to-day basis. The first example I have is: the marketing department comes to me and says, okay, Ryan, I want a data dump of the percent of users on publisher X that I've also seen on publisher Y, right? So I want to see how many users I've seen on Forbes that I later saw on The Awl or Business Insider, right? I could do that. That's simple. That's a very easy question, but the problem is, what value does that really give them, right? A data dump is very simple. So you take that step back, which is the next box down, which is the thing they're really trying to ask. The question they really want to know is: what is the value of a user on an ad network, right? Because if we can determine the value of a user on an ad network, we can better predict our revenue as a whole, and we can better gauge how much we charge an advertiser for each impression, right? But that's almost too big of a question, right?
Now I've found this huge theme, but what do I actually do with it? I can't really answer that. So you take that one step down, right? Which in this case is: what is the supply of users of a given type, right? So given a user's been seen on publisher A, what's the supply of users who will also be seen on publisher B, right? And most importantly, can we predict, given we've seen a user on one publisher, that we'll see that user on another publisher? And that's the question we chose to work on. So once we have that question formulated, we get down to the real code-writing phase, which is phase two: data collection and cleaning. This is not very glamorous, but you'll spend 90% of your time doing data collection and cleaning, right? No matter what anyone tells you. In computer science class, they're always like, focus on math, focus on stats. That's all great, but you're gonna spend 90% of your time cleaning data. Data in the real world is very, very messy. And the thing that really makes the difference between a good data product, a good engineer, a good data scientist, and someone who isn't very good, is your ability to deal with and clean data. So for example, you'd start off with something like this, right? This is what the logs I do most of my analysis on look like. You can kind of read them, but you can already see there are some missing values. We have the intentionally blank HTTP referer. That's awesome. And the whole thing is just kind of messy, right? What I need to do is take this mass of data and output something that looks like this, because I need this two-column CSV so I can input it into my graph algorithms to actually determine what percentage of users are gonna cross over, to build the product that the business really needs, which is a predictive model for how many users are gonna be seen across our network, right? But for you guys, how do you get your data, right?
Where does data come from? Right now, in the social web, most data comes from these four sources, right? We have server logs: your front-end boxes, Rails boxes, nginx boxes, they're all producing tons and tons of logs. Then you have third-party APIs, right? Everyone loves to collect Twitter data right now. Everyone loves to collect Facebook data. You have web scraping, right? Maybe they don't have an API, or it's just a page you wanna get some information from. So you're gonna go out there and scrape that data. And finally, we have direct user input, right? You have a questionnaire, you have a survey, something where the user's directly giving you data. The important thing to note about these four sources, which is very different from what we're used to, is that they all require programming skills, right? None of this data is just conveniently handed to you. Everything requires you to go out and write code to get it, right? And it could be even worse. What happens if someone gives you a PDF, right? They're like, hey, I gave you data. You're like, no, you gave me a PDF. I can't do anything with that. But ultimately, you're gonna have to pull data from all these sources in order to even start building a real product that helps your business. And this is really where Ruby comes in, right? Ruby has tons of tools to make it possible for you to collect and clean data, right? This is just a sample of the tools I use on a day-to-day basis. We have Nokogiri, if you wanna parse XML or parse HTML, right? If you're doing any sort of scraping, you're gonna spend tons of time using Nokogiri. And even if you're dealing with old APIs that still use XML, you're gonna spend tons of time using Nokogiri. We have Savon, which everyone loves, right? It's a SOAP client. If you're having to use an API that was built in the 90s and they haven't updated it to REST, you have to use SOAP. Savon will make it a lot easier, right?
We have rest-client, which makes it really easy to make HTTP requests, because you're gonna be making a lot of HTTP requests if you wanna cross-reference data sources, right? Pretty much every API right now is HTTP-based, especially if it's modern and using REST, so that'll come in very handy. We have pdf-reader, because you will most likely have somebody come to you and want you to do some analysis on a PDF. Like, hey, I have tons of data in a PDF, I need you to extract it, right? Ruby makes that possible. And then finally, we have Sinatra, right? Sinatra's a great way to quickly set up a survey. I could write a survey that I can put up on Amazon Mechanical Turk to ask somebody their opinion on the election in probably 10 minutes, right? And then we also have Twitter. Twitter is the classic example, and that's where we're gonna look at our first piece of code. So maybe a simple thing you wanna do is ask, what's the word frequency from Hurricane Sandy, right? And the awesome thing about Ruby is we can pretty much do that in, like, 11 lines of code, given that we've already configured our TweetStream client, right? And if you just take a second to read that code, what you can see is that it's really very basic, right? Ruby takes this hard process of collecting and cleaning data and makes it much easier to work with. So here we're already able to do basically the two hardest things. One, go out and collect the data, right? We're able to track all the tweets that have the hashtag #Sandy in them. But it doesn't just do that. We're also getting the cleaning phase done, right? Because no algorithm that's gonna do interesting things on natural language wants raw sentences, right? Most of the models fundamentally want words, or n-grams, or some other representation, and we can already get that, right? Ruby makes it so easy.
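[The code on the slide isn't captured in this transcript. A rough reconstruction of the kind of snippet described, assuming a configured TweetStream client; the `tokenize` helper and `counts` hash are illustrative, not from the talk:]

```ruby
# Hedged sketch of the word-frequency example described above.
# The TweetStream client setup (API keys, etc.) is assumed to be done.

# Turn a raw tweet into a clean list of lowercase words:
# split on whitespace, strip punctuation, drop blank strings.
def tokenize(text)
  text.downcase
      .split(/\s+/)
      .map { |w| w.gsub(/[^a-z0-9#@']/, '') }
      .reject(&:empty?)
end

counts = Hash.new(0)

# With a configured client, the streaming loop would look roughly like:
#
#   TweetStream::Client.new.track('#sandy') do |status|
#     tokenize(status.text).each { |word| counts[word] += 1 }
#   end
#
# Here we simulate a couple of tweets so the flow is visible:
["#Sandy knocked out power downtown", "Stay safe everyone #sandy"].each do |tweet|
  tokenize(tweet).each { |word| counts[word] += 1 }
end

puts counts.sort_by { |_, n| -n }.map { |w, n| "#{w},#{n}" }
```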
We're able to take that entire sentence, split it, and then remove blank strings, just like that, right? Which means now we can simply output a comma-separated list of words, which would be awesome to input into the next phase in our algorithm. And when we talk about collecting data, we can't not talk about Rails, right? Rails is so easy if you need to collect direct user input. If you wanna use Mechanical Turk for sentiment analysis, anything like that, Rails makes it so easy. There's an awesome open-source project from Twitter called Clockwork Raven, which has gotten a lot of press. It's basically a way to submit jobs to Amazon Mechanical Turk, get feedback, and then refine which workers you think do the best job, right? That's all written in Rails, and there was a funny quote the other day from one of their data scientists. He said, I was trained as a classical scientist, but I spend most of my time writing Rails. Because what it comes down to is, he spends most of his time writing the frameworks he needs to collect the data, so that he can even do the complex analysis that he spent all that time in school learning how to do. Now you might say, that's great, but you said you were gonna talk about big data, and my data is big data, right? Everyone right now wants to say their data's bigger than everyone else's, and we kind of have to get used to this buzzword and realize that just because your data's big doesn't mean you can't use Ruby, right? Everyone says, well, Ruby can't scale. It turns out some smart guys created this thing called Hadoop, which means you can make Ruby scale, right? Now, Hadoop is Java. I don't really like writing Java personally, so if I can get away with it, I'm gonna not write Java. And for a lot of tasks, you don't really need Java in order to write Hadoop jobs. So if you remember those log lines from before, when there's two of them, it's pretty easy to see which pieces of data are missing, right?
We can see pretty easily: blank HTTP referer, a couple of missing values. But in reality, in production, if you're dealing with big data, your logs basically look like that, right? An indiscernible mass of text. There's no way you can go through and manually inspect every single log line to see, hey, somebody introduced a bug in the client, and all of a sudden my data's not being collected, right? And that's where Hadoop comes in, and that's where we can really use the power of Ruby, because Ruby makes it very easy to do simple tasks across humongous clusters of nodes. So here's an example. From those log lines, I happen to know that at a given point, there was a bug introduced in our client, and lines were being passed back to me without user IDs. And without user IDs, there's not really much interesting stuff you can do if you're an ad company, because you need to track users. But in basically three lines of real Ruby, plus a header, we can take 10 billion log lines and output just the log lines that have missing user IDs, right? And that might only be 10,000 log lines. So what I can do is use Ruby to take this massive data that's basically indiscernible, and distill it down into something small, right? Once I have those 10,000 log lines, I can probably download them onto my local machine and see, what time did the log lines start appearing? What time was the bug introduced? What time was the bug solved, right? One of the best things you can do, one of the things Ruby's so good at, is taking big data and making it small, right? Because you really want to make your data much smaller, to make it much easier to work with. So when it comes to Ruby and Hadoop, there are kind of three good options at this point. There might be more, but these are the three mature options that we've experimented with and used. First off, you have vanilla Hadoop streaming.
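[The script itself isn't in the transcript. A minimal sketch of a Hadoop-streaming-style filter like the one described; the tab-separated layout and the position of the user ID field here are made up for illustration, the real Sharethrough log format differs:]

```ruby
#!/usr/bin/env ruby
# Hadoop-streaming-style filter: read log lines on STDIN and emit only
# the ones whose user ID field is missing. The field layout below is
# hypothetical, not the actual log format from the talk.

USER_ID_FIELD = 2  # assumed position of the user ID in each line

def missing_user_id?(line)
  fields = line.chomp.split("\t")
  id = fields[USER_ID_FIELD]
  id.nil? || id.empty? || id == '-'
end

# As a streaming mapper, the whole job is a pass-through filter:
#   STDIN.each_line { |line| puts line if missing_user_id?(line) }
```

Run under Hadoop streaming, every node applies this same filter to its shard of the input, which is how a handful of lines of Ruby scales to billions of log lines.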
That's the script I just showed you. Basically, Hadoop will take every line that it sees as input and just output it to standard out, and then your Ruby script can read it in and do whatever you want with it, right? This gives you ultimate flexibility and power, but tons of boilerplate code to write. If your data is serialized, you have to deal with all the deserialization, right? Trivial things like, I want to do a group-by and a count, right? That becomes hard, because all of a sudden you have to write your own Ruby code to do a group-by and a count, and it's a distributed count, so it's actually quite hard. Conveniently, the guys at Infochimps wrote this cool library called Wukong, which is an abstraction on top of vanilla Hadoop streaming, and it makes it much easier to work with Ruby on Hadoop if you just want to use streaming and regular Ruby. What they give you is an abstraction that makes it very easy to perform classic tuple operations, right? Something like a group-by and a count, a distinct, all those kinds of things become very easy. But streaming is not necessarily the most efficient way to work with big data. It's definitely not the fastest way you can run a Hadoop job. Really, what you want is to drop down to Java, right? You want to drop down to the native libraries. And there's a cool Java library, pretty heavily adopted, everyone really uses it these days, called Cascading. Cascading is all Java, with tons of verbosity. It makes it very easy to do the same kind of tuple operations, right? Group-by, count, those kinds of things. But the problem is, it's Java. So I end up writing like 100 lines of constructors, AbstractFactoryFactory stuff, awesome. So the guys at Etsy wrote this cool wrapper, cascading.jruby, right? And what it lets you do is write JRuby scripts that end up translating down essentially to JVM bytecode, right? So really, the power of Hadoop, the power of big data, is JRuby, right?
JRuby is an awesome project. It gives you the full power of the JVM, and it pretty much lets you do anything you want to do with Hadoop in Ruby syntax, right? Sure, you'll take a performance penalty, but most of us aren't running at Facebook and Google scale, right? We don't have petabytes and zettabytes. We have maybe terabytes, right? JRuby's fine for that. You can use JRuby to write Pig UDFs, right? Maybe your company already likes Pig. Maybe your company already has a Hive cluster. You can use JRuby to write Hive UDFs, right? Custom Hive functions. Say I want to do something like geocode an IP address, right? That becomes a lot easier when you can write it using existing Ruby libraries. And much like the way Square rolls everything up before deploying it using their framework, you can do the same thing with Hadoop, right? You just bundle up your JRuby in a jar, you create a big uber-jar, and you just ship it. And Hadoop is fine running your JRuby. We can really see the power of that with this example. I'll let you read it and see if you can get the gist of what's going on. So yeah, this script is basically the classic word count example, right? To give you a little bit of context: the classic word count example, if you use the raw Hadoop API, I think ends up being a couple hundred lines of code. If you use vanilla Cascading and you really compress it, it'll probably look a little bit shorter than this, but it's a lot harder to understand, right? The cool thing here is, this is all the Ruby code we're so used to writing, right? Ruby gives us blocks we can pass around. It gives you anonymous functions, things that Java doesn't have, things that make it a pain in the ass to write Cascading jobs, right? In Java I have to create a class just for a custom function. This script basically lets us take however much data we want, split it into words, and count them, right? Something we all know how to do on the command line.
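[The cascading.jruby version is on the slide, not in the transcript. As a rough stand-in, the same word count expressed as the two Hadoop-streaming-style steps in plain Ruby; function names here are illustrative:]

```ruby
# Word count as a map step and a reduce step, in plain Ruby.
# This is the streaming-style equivalent of the cascading.jruby job
# described in the talk, so it runs anywhere Ruby does.

# Map step: emit one [word, 1] pair per word.
def map_words(lines)
  lines.flat_map { |line| line.downcase.scan(/[a-z']+/) }
       .map { |word| [word, 1] }
end

# Reduce step: Hadoop sorts by key, so equal words arrive together;
# locally we just sum the counts per word.
def reduce_counts(pairs)
  pairs.each_with_object(Hash.new(0)) { |(word, n), acc| acc[word] += n }
end

counts = reduce_counts(map_words(["the quick brown fox", "the lazy dog"]))
p counts  # "the" appears in both lines, so it counts twice
```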
JRuby gives you the ability to harness that across a cluster of a billion nodes, if you have it. So Ruby as a whole is a powerful tool for data collection and cleaning, right? We all know how to use Ruby to clean data in Unix, right? Really, the great thing about it is you can use those exact same tools you're used to using every single day to build data-driven products. Just because you have big data doesn't mean you can't do the stuff you're so used to doing, right? You write a script, you can run it on Hadoop. You want to use JRuby, you can harness the full power of Hadoop, right? Which just makes Ruby so powerful. When you combine that with the ability to easily collect data using things like Rails, it's such a natural fit for this hardest part of building a data-driven product, because you will spend 90% of your time doing it. So if you can use Ruby for the thing that you spend 90% of your time doing, you already have a win, really, right? So once it's all cleaned, you move on to the next phase: statistical modeling and prediction. That's kind of the glamorous phase, right? If you go and interview for a job, this is what they're gonna ask you all the questions about. Like, what kind of distribution does my data have? All that kind of stuff. So it's definitely the glamorous part. And in my personal example, I said I need to be able to predict a user's likelihood of being on multiple publishers, right? What that really means is I need a function that takes as input a user ID and a publisher X, and outputs the probability that the user will be on publisher Y, right? So it's pretty basic. That's a pretty simple function, but it's gonna require me to do some stats, right? That's where the real data stuff comes in. I'm gonna have to do some stats.
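[The talk doesn't show the actual Sharethrough model. Purely as an illustration of the shape of that function, here is a naive first cut that just estimates the conditional probability from historical co-occurrence counts; the data layout and names are invented:]

```ruby
require 'set'

# Naive sketch of P(user seen on publisher Y | user seen on publisher X),
# estimated from historical (user_id, publisher) sightings. This is a
# simple frequency estimate for illustration, not the model from the talk.

# sightings: array of [user_id, publisher] pairs
def overlap_probability(sightings, from_pub, to_pub)
  users_by_pub = Hash.new { |h, k| h[k] = Set.new }
  sightings.each { |user, pub| users_by_pub[pub] << user }

  seen_on_x = users_by_pub[from_pub]
  return 0.0 if seen_on_x.empty?

  # Fraction of from_pub's users also seen on to_pub.
  crossover = (seen_on_x & users_by_pub[to_pub]).size
  crossover.to_f / seen_on_x.size
end

log = [[1, 'forbes'], [2, 'forbes'], [1, 'awl'], [3, 'awl']]
p overlap_probability(log, 'forbes', 'awl')  # 1 of 2 Forbes users also on The Awl => 0.5
```

A real model would smooth these estimates and generalize beyond pairs actually observed, which is where the statistics below come in.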
But Ruby sucks at statistical computing, right? If you go on Stack Overflow and you post a question about stats in Ruby, somebody's gonna say, don't be an idiot, use R. But it turns out, who cares, right? Most of your time is gonna be spent collecting and cleaning your data anyway, so the fact that Ruby might not be the best statistical programming language doesn't really matter. The other thing you can always say is, have you actually tried running R in production? It sucks, right? To take code and actually build a real product that your business is gonna make money on, you need to be able to monitor it, you need to be able to deploy it easily, you need to know when an exception occurs. All those things that Ruby already gives you, you don't get when you try to run languages such as R. You can do it, but it's definitely not the best way. And since we're all Rubyists, we wanna know how we can use Ruby to do the statistical modeling. So it turns out, whatever people may say about Ruby sucking at stats, there's actually a pretty good selection of libraries that let you do most of the stats you need to build the kind of products we're talking about, right? Statsample is an awesome library. It implements most of the statistics functions you'd ever really need, right? Sure, it will maybe lack some obscure genetics algorithm, but at that point you should probably be using another language anyway. We also have SciRuby, right? Which is a big push, kind of an attempt to make Ruby equivalent to SciPy, and it lets you do that same stuff: it provides you with matrix libraries, right? All these things that we need to do stats are there in Ruby. We can use them. And then, most importantly, we have libSVM, right? Support vector machines are one of the most popular ways to do classification right now, and there's a pretty battle-hardened, production-ready C implementation of SVMs called libSVM.
And there's a Ruby binding for it, right? So if you just go online and Google Ruby libSVM, you're gonna find a good blog post by, I think, Ilya Grigorik on how to use it, and you'll find tons of documentation, right? There's no reason we can't use these libraries, because most of them are in C, and there are Ruby wrappers on top of most of these hardcore libraries like libSVM. And if you really need to do some obscure stats, we have libraries like RSRuby and RinRuby, right? They basically let you send Ruby objects over to R. So say you need to do some obscure, complicated regression, like some least-angle regression that isn't in Ruby yet, but you already have your data in Ruby, from ActiveRecord, say. You run a big query, you have all these objects. You can just pass those over to R. R can do the hard part, do the little bit of statistics crunching you need, and then give the results back to you, right? And again, the other tool we really have is JRuby, right? I can't say it enough: JRuby really gives you the power to build data products in Ruby. You can harness the full power of the JVM, and when it comes to data stuff, there's a JVM library for pretty much everything, right? There are already Java libraries for hard matrix stuff, sparse matrices, pretty much every type of machine learning algorithm you wanna use, every kernel density method you wanna use. They're already implemented in Java. And the great thing about JRuby is it lets us write a nice wrapper on top of that stuff in a language we all already know, right? We already know how to use Ruby. And since the hard part has usually already been written, there's no reason we can't put Ruby on top of it, right? There's no reason we can't make data products approachable to all of our engineers. So really, although everyone will make fun of you for it, there's no reason you can't do statistical modeling in Ruby.
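[To make that concrete, it's worth seeing how far plain Ruby gets you before you even reach for Statsample or a C binding. A simple ordinary-least-squares fit, the kind of routine those libraries package up with much more rigor, is only a few lines; this sketch is not from the talk:]

```ruby
# Ordinary least squares for a single predictor, in plain Ruby.
# Returns [slope, intercept] for the best-fit line y = slope * x + intercept.
def least_squares(xs, ys)
  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n

  # slope = covariance(x, y) / variance(x)
  cov = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  var = xs.sum { |x| (x - mean_x)**2 }
  slope = cov / var

  [slope, mean_y - slope * mean_x]
end

slope, intercept = least_squares([1, 2, 3, 4], [2, 4, 6, 8])
p [slope, intercept]  # => [2.0, 0.0]
```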
And once you have that, you get to the final stage, phase four, which is publishing results, right? This is where you actually make money. If you don't get here, you haven't really made the business any money, which means you have an academic research project, not a product. And this is really, again, where I think Ruby is almost the perfect fit, right? When it comes to publishing results, if you really wanna talk about a product, you're gonna end up building a web UI or a mobile app, right? So here's an example. That's Sharethrough's analytics dashboard, and that whole thing is all Ruby. Because once you've done all that number crunching, all that modeling, you end up with actually really small data, right? And Rails is awesome at taking something from MySQL or Postgres and making it so you can put it on the client, right? And once it's on the client, you have D3, you have Highcharts, all those things in JavaScript that we're all so used to using and already know how to use. We just harness those, right? Once you have the data distilled, once you've cleaned it, you've modeled it, and you're ready to present it, really it's just building a web app or a mobile app, right? Here we have Yelp, kind of the same thing. They've done tons of behind-the-scenes coding so that they can give you good recommendations and filter out fraud. All of those things are already there, right? We already have that. And the reason we have that is Rails, right? So just because we're talking about building data products, and we're talking about big data, doesn't mean we can't use Rails. When it comes to publishing your results, oftentimes you're gonna end up building dashboards, all those kinds of things. Rails is great at them. So to close my personal example: I said I needed to basically predict user overlap. I'm not 100% done with that whole process, right? I've been having to iterate on those last two phases.
I had a model. People didn't really like how they had to interact with it, so we've gone back. But we get something cool like this, right? This is a little bit of eye candy I thought I'd put on the slide for you. And what this basically shows is a dense network of publisher overlap, right? So the dense publishers in the middle essentially have tons of crossover, because every single white line represents a user who you've seen on multiple publishers. So kind of the nice thing, the thing I'm trying to get across, is even though you're building a data product, there's no reason we can't use Ruby. However, I have to note, unfortunately, I did not generate that graph with Ruby. I was a little bit dirty and I used Python. And that's because even though I love Ruby and I love the stuff you can do with it to build data products, it's not all roses yet, right? There's definitely things I think we as a community need to work on to get more people talking about doing data things in Ruby. And here are the four things I think we really need to work on. The first one is a graphing library. I know there's been attempts, right? Like there's Rubyvis attempting to port Protovis, but there's really no Ruby equivalent to the things people are used to in Python and R, right? In Python, you have matplotlib, in R, you have ggplot2, right? If Ruby could just have a centralized, adopted graphing library, it would go a long way, because you spend a lot of time doing graphs when you're doing exploratory analysis. And the second thing is a unified matrix and vector library, right? Machine learning, if any of you guys do it, you pretty much know what it comes down to most of the time, it's just matrix math, right? You're trying to do some matrix transformation so that you can get a matrix that you can work with, or you're trying to fill in missing gaps in a matrix, right? All that kind of stuff.
It'd be great if Ruby had a more unified, centralized matrix library, right? The folks at SciRuby are working on it, and I think there's a couple of other libraries out there if you look on GitHub, but there's no centralized knowledge, which is really what we need, right? It's not that Ruby can't do it, it's really just that we need to centralize discussion around it, right? Just like we have that centralized web framework, which made it so that everyone started doing web stuff in Rails, and Rails to a certain extent helped us beat Python for web stuff, we kind of need that same centralization for data stuff in Ruby. And really it comes down to those last two things, which is we just need more publishing, right? We need people talking about it more. We spend a lot of time in the Ruby community talking about things like TDD and OO, and all those things are great, but it would be awesome if we started to get a little bit more publishing around Ruby and machine learning and how we can really use Ruby. And that ties into the last one, which is academic buy-in, right? So most of the time if you guys are going out to try and hire people to build data products or do stats or be analysts, they're mostly academics, right? And academics tend to use Python and R, simply because there's not a lot of academic buy-in, at least in California universities, for Ruby. If we could just get the academics to accept Ruby, see that we can use it to do computer science-y things, because that's what computer science programs are all about, it would go a long way, because people would come out into the workforce already knowing that they could use Ruby. So even though it has its warts, I definitely think Ruby plus data equals agile data products, right? It gives you the ability to iterate really quickly.
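[Editor's illustration, not from the talk: to make the "machine learning is mostly matrix math" point concrete, here's what the stdlib `Matrix` class can already do today. This sketch mean-centers a small data matrix, the first step of PCA, and computes its covariance matrix; the data is made up.]

```ruby
require 'matrix'

# Rows = observations, columns = features.
data = Matrix[[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]]

n = data.row_count
# Per-column means, then subtract them to mean-center the matrix.
means = data.column_vectors.map { |col| col.sum / n }
centered = Matrix.build(n, data.column_count) { |i, j| data[i, j] - means[j] }

# Sample covariance: (1 / (n - 1)) * X^T X on the centered matrix.
cov = centered.transpose * centered / (n - 1)
```

It works, but it's pure Ruby and slow on anything big, which is exactly why a unified, fast matrix library (the SciRuby effort) matters so much.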
You can harness the power of Hadoop, you can collect data really easily, and then you can present the data that you've built using Ruby, using Rails, using Sinatra. And finally, my obligatory We're Hiring slide. So if you wanna work with me on data things, feel free to email me or visit that link. Thank you.