Next up is Trevor Grant, who will be talking about one of the oldest but still-going-strong big data projects in the Apache Software Foundation: Apache Mahout.

How's everyone doing today? The magnificent, modular Mahout. My name's Trevor Grant, and I'll be giving this quick lightning talk. I'm not going to dwell on myself very much because we don't have a lot of time. Suffice to say, I formally trained in mathematics, I've done computers with varying degrees of success here and there, and I work at IBM, who brought me out here today. I also get paid to work on open source, which is wonderful. If anybody has the means to do so, I highly recommend getting paid to work on open source. It's very rewarding.

Apache Mahout is a big math library, and I'm going to get into what exactly that means. It's big data, it's high-performance computing, though those can mean different things to different people. It's linear algebra, not just machine learning. That's part of the reason it's not necessarily the most popular of projects: it's that thing you hated but muscled your way through to get through college, advanced applied linear algebra. And there's GPUs, I guess. I made these slides recently, and it was quick.

Clusters versus supercomputers. The idea behind this big math is this: you could buy a big fancy supercomputer that would be able to do your matrix inversion, but those are really expensive, and there are a lot of other costs that go along with them. So maybe we just buy a bunch of cheap commodity computers and put one part of the matrix over here on this computer, another part here, another part there. Now you've got a matrix spread across, say, 10 or 20 or 100 different computers. That is the problem we set out to solve. When a matrix is split up like that, simple things like finding the inverse or doing decompositions become tricky.

The other problem is that if you're writing in Scala or a lot of these other big data languages, you don't have a mathematically expressive way to say any of this. Even TensorFlow, which is Python, and Python is supposed to be very expressive, still isn't very expressive; it reads like dirty Java. Enter Mahout's R-like Scala DSL. It's Scala code, and your matrix A-transpose times A is expressed just the way you'd write it on paper; there's a sketch of what that looks like right after this. It's very easy to read, and that's important, because if you're expressing very difficult math formulas, you want them to be as easy to read as possible, so that you can keep up the code, so you can maintain these things over time, et cetera.

A distributed row matrix is, in general, an RDD in Spark; it could be a DataSet in Flink batch; it could be a SQL table. It's whatever structure your big data engine gives you to say: this is going to be my matrix. I'm saying this because one of the big features of Mahout is that it's very modular. You can plug in your own engine. Flink, Spark, and H2O are the three out-of-the-box engines, but you can write bindings for just about anything. What that means is, you implement things like matrix A-transpose times A and matrix A times B, some of these general BLAS operations, for these big datasets. It's a library that runs on your distributed engine.
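To make that concrete, here is a minimal sketch of the DSL the talk is describing. It follows the Mahout "Samsara" Scala bindings for the Spark backend as documented; the exact import paths and helper names (mahoutSparkContext, dense, drmParallelize) are taken from the project docs and may shift between releases, so treat it as illustrative rather than definitive.

```scala
// Minimal sketch of Mahout's R-like Scala DSL ("Samsara") on the Spark
// backend. Imports and helpers follow the project docs; check your
// Mahout release for the exact names.
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.sparkbindings._

object AtAExample extends App {
  // The distributed context wraps the engine, Spark in this case.
  implicit val ctx = mahoutSparkContext(masterUrl = "local[2]", appName = "AtAExample")

  // A small in-core matrix, promoted to a distributed row matrix (DRM)
  // whose row chunks are spread across the cluster.
  val inCoreA = dense((1, 2, 3), (3, 4, 5))
  val drmA = drmParallelize(inCoreA, numPartitions = 2)

  // A-transpose times A, written the way you'd write it on a whiteboard.
  // This only builds a logical plan; the optimizer picks physical operators.
  val drmAtA = drmA.t %*% drmA

  // Materializing forces execution and pulls the result back in-core.
  println(drmAtA.collect)
}
```

The point of the slide survives in the code: drmA.t %*% drmA reads like the math, and the same expression is meant to run unchanged whether the engine underneath is Spark, Flink, or H2O.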
Now locally, you've got these small chunks of your matrix: one chunk here, one chunk there, one chunk there. So when we get down to the level we're running at on each node, we also have optimizers that will run BLAS operations locally, and you can swap those out too. This is usually JVM-type stuff, but instead of using the Colt-derived JVM BLAS packs, you can use OpenCL, you can run ViennaCL for GPU BLAS packs, you can have CUDA BLAS packs, whatever you need to do. This is also very exciting if you're doing IoT things. Say you're doing some sort of IoT operation, you've got some weird architecture, and you want to run really fast on that architecture. You can let whoever knows that architecture really well take some very well-defined operations, it's matrix operations, matrix A times B, and implement them for that architecture. Then at compile time you just change the dependency, and now all of your advanced math code runs on your edge device. That's the exciting part; there's a sketch of that dependency swap after this paragraph.
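As a concrete picture of "just change the dependency," here is a hypothetical build.sbt fragment. The module names follow the native-solver artifacts from the Mahout 0.13 era as I recall them (mahout-native-viennacl for OpenCL, mahout-native-viennacl-omp for OpenMP); they are assumptions, so check your release for the exact coordinates.

```scala
// build.sbt sketch: the algorithm code stays identical; where the BLAS
// actually runs is decided by which solver module is on the classpath.
// Artifact names are assumptions based on Mahout 0.13-era modules.
val mahoutVersion = "0.13.0"

libraryDependencies ++= Seq(
  "org.apache.mahout" %% "mahout-math-scala" % mahoutVersion,
  "org.apache.mahout" %% "mahout-spark"      % mahoutVersion,

  // Pick a native solver; swap this line to retarget the same math code:
  "org.apache.mahout" %% "mahout-native-viennacl" % mahoutVersion       // GPU, via OpenCL
  // "org.apache.mahout" %% "mahout-native-viennacl-omp" % mahoutVersion // CPU, via OpenMP
)
```

Nothing in the DSL code above changes; the same %*% expressions get rebound to whichever solvers the new module provides.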
So, to tie this all together, why is this all exciting? Because it supports doing this data science, or big math, or whatever you want to call the buzzword, AI, but it makes it a team sport. You have the really smart math people, who can maybe program a little bit, working mainly in this very mathematically expressive Scala DSL. You have database engineers who really understand how to performance-tune their database; all they need to do is take these well-understood operations, matrix A times matrix B, and implement them on the database. If you want acceleration on specific hardware, you find someone who knows that hardware very well. It allows this division of labor where everyone who knows their specific piece of the pie can go all in on that little piece, and it all comes together into this beautiful system.

The other really cool thing: when you're working in a regular business, or a foundation, or wherever you're at, someone writes this very difficult algorithm, and it's documented, but there are only maybe a few people in the world who really understand it, and maybe the person who wrote it has left. That kind of sticks your organization on that architecture, because nobody understands how it works well enough to even begin porting it. The upshot is, if it's written in Mahout, all you need to do is rewrite the back end and everything just picks up and lifts. You change your back end, there's a level of abstraction there, and you move to your next engine as time goes on. Math nerds write the algorithms once.

Oh, but wait, there's more: best-in-class recommenders. Mahout has always had a reputation for having best-in-class recommenders, among open source and even closed source. The Universal Recommender of Apache PredictionIO (incubating) uses it; cars.com uses it. There are other places that use it, but I'm not sure exactly who I can talk about. Suffice to say, this recommender engine is in production at a lot of places, because it's a very, very good recommender.

It is also not MapReduce. I very much hope that everyone goes out and checks out Apache Mahout, and when you do, you're going to run into a lot of stuff talking about Mahout on MapReduce, and about how Mahout is dead because MapReduce is no more. We're not dead. However, everybody who's come out with an ML package they want you to buy since 2014 holds up Mahout, because Mahout was the machine learning library of MapReduce. They say, "we are 100 times faster than this same job on MapReduce," and they're talking about Mahout. And it's all Mahout, so, whatever. It's a very important project, and a lot of projects depend on it; if not explicitly, then code gets copied and pasted out of it. So please, if you're into math or this kind of stuff, join our mailing list, commit some code, et cetera, et cetera. We have a really nice new website. And I nailed it.

Again, one question from the audience? Anybody? All right. Well, let's thank Trevor.