Thanks for coming. This is the first time I've spoken about this in a couple of years; I did a short lightning talk about two years ago, so I'm happy to have the chance to talk about this project. It's a project I've been working on for a few years now. The repo started, I think, in March of 2011, and I made it public a little over a year after that. It's basically me learning Scala over a couple of years. As part of that journey, at some point you hear that category theory is something you should aim for as a functional programmer; it's kind of one of the end stages. I just went to Vlad Patryshev's talk on type theory, and I've been to a number of his category theory talks in the Bay Area, and every single one of them is well over my head. But I started to realize that was because there are actually all these other milestones in between. Most of my computer science education comes from chapter one of this (this is a table of contents from an abstract algebra textbook), plus, in computer science, maybe some regular expressions and context-free grammars, that sort of thing. And of course vector spaces, matrices, and lattices come up every now and then, but not necessarily in a very rigorous way. Discovering abstract algebra as a subject gave me some faith that maybe there was actually a path, a roadmap, to get to category theory at some point. And of course, one of the first stops on that journey is monoids. I remember Heiko Seeberger being asked at Scala Days in 2011, you know, "How can I learn all this stuff?", and his honest answer was: learn Haskell. So I took that to heart and got the Learn You a Haskell book on the strength of that. Monoids gave me the first foothold in this journey towards increasing abstraction. Also, Nick Partridge has a great demo, up on Vimeo, where he starts with summing a list of integers and derives, in something like 17 easy steps,
Half a dozen type classes, including Monoid. The syntax is a little out of date, but it's a great hands-on way of understanding where these things come from. And of course, after monoids there's all this other abstract algebra material. The first hint I had of that in the Scala ecosystem was Algebird in 2012: seeing Oscar Boykin and Sam Ritchie get abstract algebra running in large distributed systems was really inspiring. Then a few months later, at NEScala in Philadelphia, Tom Switzer gave his "Life After Monoids" talk, which is available on YouTube, where he talks about fields, groups, rings, and other patterns from abstract algebra, and how they're actually useful in day-to-day programming. And Adelbert Chang, who's here today, gave a great talk last August at Scala by the Bay on the Curry-Howard isomorphism, the deep correspondence between functional programming and logic. Once I started finding these concrete, actionable projects with concrete syntax, it really helped me towards the goal of category theory and subjects like that. I won't dwell on this too long, but these are all just definitions, pulled from Wikipedia, of a lot of these patterns from abstract algebra. For many of them we can find examples just in the integers; integer addition is an example of several of these things. Addition and multiplication of square matrices form a ring. A field requires division, so there you look to the reals or the rationals. And vector spaces will be somewhat familiar to most folks. So, the goals of this project. The first is to cover a broad set of algorithms. I'm not working at the level of Spire or Cats myself directly, but I want to understand those things, and in order to do that I want to cover a wide range of algorithms. So I'm pulling examples from the last couple of decades of coursework: computer science, or otherwise anything that involves programming.
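Since monoids come up repeatedly above, here is a minimal sketch of the pattern and of the kind of generic "sum" that Nick Partridge's demo derives. The names here (Monoid, combineAll, the instances) are illustrative, not Spire's, Cats', or Axle's actual API.

```scala
// A minimal Monoid type class: an associative binary operation with an
// identity element. (Illustrative; Spire/Cats define richer versions.)
trait Monoid[A] {
  def empty: A                 // identity element
  def combine(x: A, y: A): A   // associative operation
}

// Integer addition is a monoid: 0 is the identity, + is associative.
val additiveInt: Monoid[Int] = new Monoid[Int] {
  def empty = 0
  def combine(x: Int, y: Int) = x + y
}

// String concatenation is another monoid, with "" as the identity.
val concatString: Monoid[String] = new Monoid[String] {
  def empty = ""
  def combine(x: String, y: String) = x + y
}

// A generic "sum" over any monoid: this is the payoff of the abstraction.
def combineAll[A](xs: List[A])(m: Monoid[A]): A =
  xs.foldLeft(m.empty)(m.combine)

println(combineAll(List(1, 2, 3))(additiveInt))         // 6
println(combineAll(List("ab", "c", "d"))(concatString)) // abcd
```

The same combineAll works unchanged for any lawful Monoid instance, which is what makes the pattern so useful for the distributed aggregation that comes up later in the talk.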
That means natural language processing, bioinformatics, statistics, subjects like that. The diversity of platforms also plays an important role. In the middle here (this is a selected set of depends-on relations from my project and the related parts of the ecosystem) is axle-core, which is just some type classes, Functor and that sort of thing. axle-core depends only on Spire and discipline. We heard about discipline earlier today in the Cats talk; it's a way to organize the ScalaCheck properties that define the laws of the type classes. So I'm trying to keep the set of dependencies from axle-core small. Then the algorithms (the natural language processing and all that sort of thing) depend only on axle-core. And I've got some games defined as well. In axle-algorithms there are many algorithms that require a matrix to implement; linear regression or Levenshtein distance, for instance, is expressed in terms of matrix operations. But I don't actually have any concrete matrix implementation in axle-core or anywhere on the classpath. What I do have is a small jar called axle-jblas, which is just a set of witnesses for the linear algebra type class and some other type classes. So to make a concrete, deployable piece of software in terms of Axle, you would include axle-core, axle-algorithms, and a little shim, axle-jblas, which provides the type class witnesses. A couple of other things about the left-hand side of this chart. I gather that both Algebird and Spire will be refactored in terms of this project called algebra at some point soon, and Cats already is. So at some point I would actually love to get rid of axle-core and just depend on those kinds of fundamental libraries. And over on the right, I'll just point out that I would also love to get this stuff running in a distributed fashion, for instance on Spark.
So a few months ago I began defining type class witnesses for Spark RDDs, resilient distributed datasets: things like Functor. A Spark RDD has a map function, so why not define a Functor for an RDD? The last priority I'll mention for the project is documentation. Here we have a little bit of documentation for the k-means clustering algorithm. I saw that tut was mentioned earlier; I would love to start using something like that. I was just speaking to Rob earlier: if I could get embedded images produced by some of this code, that would be the last little feature I need. I'm trying to get to the point where I can basically just cut and paste any of the documentation into a REPL. That should aid the adoption and the clarity of the code. Some of the other things: of course I want this stuff to be correct, complete, and fast, and I also want to provide access to a diverse set of data sets. But it's just me, so some of these things are a lower priority. There is a test suite, though there's a lot more to be done there. I haven't really focused much on speed at all; I'm trying to build on nice, fast components, but the glue code that I'm writing has not been tested much in terms of speed. Here are some ways to get in touch with me; I'd love to hear what you think about the project. I'm now going to show a few examples, just in a Scala worksheet, of how this stuff works. The first area where these abstract algebra patterns made sense was the units of measurement part of the library. This part of the library has seen probably the most change over the last few years, and in hindsight it makes a lot of sense that abstract algebra applies to a units of measurement library: in the end we're just taking quantities and tagging them with units, but supporting all the same mathematical operations that the underlying numbers, the reals for instance, have.
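As a rough sketch of the "quantities tagged with units" idea just described: the dimension being measured lives in the type system while the unit is a value. All names here (UnitOfMeasurement, Quantity, the conversion scheme) are my own simplification for illustration, not Axle's actual implementation.

```scala
// The dimension being measured is a phantom type; the unit is a value
// carrying a conversion factor relative to a base unit of that dimension.
// (Illustrative sketch; Axle's real types and conversions are richer.)
sealed trait Mass

case class UnitOfMeasurement[D](name: String, baseFactor: Double)

val gram     = UnitOfMeasurement[Mass]("gram", 1.0)
val kilogram = UnitOfMeasurement[Mass]("kilogram", 1000.0)

case class Quantity[D](magnitude: Double, unit: UnitOfMeasurement[D]) {
  // Convert to another unit of the *same* dimension D.
  def in(u: UnitOfMeasurement[D]): Quantity[D] =
    Quantity(magnitude * unit.baseFactor / u.baseFactor, u)

  // Addition converts the left operand into the right operand's unit,
  // matching the "choose the right-hand side" convention in the talk.
  def +(that: Quantity[D]): Quantity[D] =
    Quantity(this.in(that.unit).magnitude + that.magnitude, that.unit)
}

val total = Quantity(10.0, gram) + Quantity(1.0, kilogram)
println(s"${total.magnitude} ${total.unit.name}")  // 1.01 kilogram
```

Because the dimension is a type parameter, trying to convert a mass into a unit of some other dimension fails to compile rather than failing at runtime, which matches the compile-error behavior described in the demo.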
One interesting thing that's different about this from a lot of units of measurement libraries is that the units themselves are not represented as types. Rather, the thing that's being measured, like mass or distance or time, is what's represented in the type system. So let's see. I'll also mention that I've got a little Show type class that allows me to pretty-print these things. Here we have a couple of example quantities; the examples here are all just for mass. And here's where some of Spire comes in. Typically you would just import spire.implicits._ and pull in a bunch of syntactic sugar; for the purposes of this talk I'm being really explicit, and I've also found from time to time that it can lead to some ambiguities if you just pull everything in, so I'll pull in just the syntax that I need. With the additive semigroup I can perform addition on units of mass: 10 grams plus a kilogram is 1.01 kilograms. I have to choose which unit to express the result in, so I just choose the right-hand side. The additive group ops give me subtraction, so the same two quantities subtracted give me a negative quantity in kilograms. And there's an in function which allows me to convert between units. If I were to try to convert this to feet or something like that that wouldn't make sense, I would get a compile error. The implementation is very simple; again, I'm not super focused on performance just yet, so these things are just case classes. They have a magnitude and a unit, and they have two type parameters: one is the thing that's being measured, mass, distance, time; the other is the number type, the magnitude type. So in this case I can write a function that takes a unitted quantity, and assuming I know how to show the number type, I can show the unitted quantity. I can actually also create unitted quantities on unusual magnitude types, like a cat. So we've got this cat case class, and I can define a witness for Show.
We've got a couple of cats here; these are my cats. And I can verify that the Show works. So now I can say "Cody grams", where Cody is a cat, and the function that I wrote up here does the right thing. But because I don't have a Field defined on cats, I can't pass it to a function like foo, and I can't convert cats from one unit to another. In this example I'm simply doing addition, but as I described earlier, there's conversion that potentially has to happen during addition; that's why even addition requires a Field on the number type and a converter for that quantity in that number type. I do have a Field defined on doubles, so this expression here works fine. I've got a couple more examples I wanted to show. This one is edit distance, Levenshtein distance. Who here is familiar with Levenshtein distance from natural language processing? I can show it briefly; Wikipedia does a good job. Basically it builds a matrix using one string down the left-hand side and another string across the top, and it fills in from top left to lower right. At the end, the value in the bottom right is the edit distance between the two strings. Most implementations of this will call the function a distance, but they're not really formally treating it as a distance; I'll work up to what that means in a bit. Here we don't actually need the full linear algebra for this, because we're just filling in values, but I'll create one anyway; it's a superset of what I need for this. And I'll create a space object and make it an implicitly available object. This thing can answer questions like the distance between two strings. You'll see here the string on the right is missing the C and this one has got the wrong vowel, so that's an edit distance of two. This one has got the right vowel but is still missing the C, so that's a distance of one.
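The matrix-filling procedure just described can be sketched as a plain dynamic program. This is the textbook implementation, not Axle's matrix-backed one, and the example strings are my own.

```scala
// Levenshtein edit distance: d(i)(j) holds the distance between the first
// i characters of a and the first j characters of b; the bottom-right
// cell is the answer.
def levenshtein(a: String, b: String): Int = {
  val d = Array.ofDim[Int](a.length + 1, b.length + 1)
  for (i <- 0 to a.length) d(i)(0) = i   // i deletions
  for (j <- 0 to b.length) d(0)(j) = j   // j insertions
  for (i <- 1 to a.length; j <- 1 to b.length) {
    val substCost = if (a(i - 1) == b(j - 1)) 0 else 1
    d(i)(j) = List(
      d(i - 1)(j) + 1,             // delete from a
      d(i)(j - 1) + 1,             // insert into a
      d(i - 1)(j - 1) + substCost  // substitute (or match)
    ).min
  }
  d(a.length)(b.length)
}

println(levenshtein("kitten", "kitten"))  // 0: identical strings
println(levenshtein("kitten", "sitten"))  // 1: one substitution
println(levenshtein("kitten", "sitting")) // 3

// Spot-check one metric-space law, the triangle inequality:
val (x, y, z) = ("kitten", "sitting", "sitter")
println(levenshtein(x, z) <= levenshtein(x, y) + levenshtein(y, z)) // true
```

Treating this function as a metric space, rather than just a function named "distance", is what lets the laws (identity, symmetry, triangle inequality) be checked mechanically.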
And these are identical strings, so the distance is zero. Then, if I import the syntactic sugar from Spire that allows me to use the infix distance operator, I can create expressions like this, which makes it feel a little bit more like an actual DSL. The nice thing about having defined it as a metric space is that I can test that this thing obeys all of the laws that govern a metric space, like the triangle inequality: the distance between two points should be no longer than the path through some other point. So that gets tested. The last thing I'll show is a little demo of an application of this stuff to Spark. It's a fairly contrived example: the Monte Carlo estimation of pi, which comes from the Spark documentation. Basically it says: if you have a random number generator, you can generate numbers between zero and one. Generate two of those to represent a point on the unit square, and then compute what share of those points fell within the unit circle centered at the lower left. The more times you do that, the more accurate your estimation of pi is. The difference in this version is in the number of trials, which in the Spark documentation is just an RDD whose size is equal to the number of trials. I've abstracted out the RDD-ness of that example and said that all I need of that thing, be it an RDD or whatever, is that it's a functor, that I can perform aggregations on it, and that it has a size, that it's finite. You can see here, this is the finiteness, this is the functor-ness, and then inside of this sigma is the ability to do aggregations, distributed aggregations. So what I've got, let me actually switch this up. [Audience question, inaudible] I do, yeah. So for a lot of the common Scala collections I'll define aggregation.
That actually came out of wanting to use Spire's unicode sigma on parallel sequences: I looked at it and noticed that it was implemented in terms of fold, so I reimplemented it in terms of aggregate and submitted that. I then got to thinking: well, what would it take to run that on Spark? So that's what this is for. I'm not going to run that one live because of closure serialization, so I've commented out the Spark stuff for this run. This is estimating pi using a parallel sequence to represent the trials, and we get something like 2.23. If I comment this out and replace it with an RDD, it will churn for a while, and this is just running locally, but it is Spark, and here we can see that we get another estimate of pi. And there is some serialization baked into the witness, so it serializes the aggregation operator. I've got a few other examples I could show, but I'll cap it at that and just take a few questions. [Audience] Did you say that you defined a Functor for RDD? Yeah, that's it there. [Audience] How did you deal with the ClassTag? So I threaded it along; I don't have a good workaround for that. It's a functor plus ClassTag, which I'd love to have a workaround for, but as far as I know there isn't a good one. [Audience] I think this is really great. [partially inaudible] Do you think it would make sense to do some kind of community site where you could find stuff like this? Yeah, I'd love to hear suggestions about that. I've been really just sort of working on this in private; I haven't done much marketing. I know Spark has some community efforts and advertising for that community. Whatever you think, I'd love to talk about that. [Audience] I would say more like for Scala in general. Yeah, yeah.
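To make the "abstract out the RDD-ness" idea concrete: the pi estimator below only asks for something it can map over, aggregate, and take the size of. The Aggregatable type class and all names here are my own illustration, witnessed for plain Seq; a Spark RDD witness would delegate to rdd.map and rdd.aggregate in the same shape but, as discussed above, would also need a ClassTag and serializable closures.

```scala
import scala.language.higherKinds  // for F[_] on Scala 2

// Illustrative type class capturing what the pi estimator needs from its
// collection of trials: functor-ness (map), distributed-style aggregation,
// and finiteness (size). Not Axle's actual API.
trait Aggregatable[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
  def aggregate[A, B](fa: F[A])(zero: B)(seqOp: (B, A) => B, combOp: (B, B) => B): B
  def size[A](fa: F[A]): Long
}

// Witness for ordinary Seq. combOp is unused here, but a Spark witness
// would use it to merge per-partition results.
val seqAggregatable: Aggregatable[Seq] = new Aggregatable[Seq] {
  def map[A, B](fa: Seq[A])(f: A => B) = fa.map(f)
  def aggregate[A, B](fa: Seq[A])(zero: B)(seqOp: (B, A) => B, combOp: (B, B) => B) =
    fa.foldLeft(zero)(seqOp)
  def size[A](fa: Seq[A]) = fa.size.toLong
}

// Monte Carlo pi: the fraction of random points on the unit square that
// land inside the quarter circle of radius 1 approaches pi/4.
def estimatePi[F[_]](trials: F[Long])(ev: Aggregatable[F]): Double = {
  // Fixed seed for reproducibility; a real Spark version would need
  // per-partition RNGs, since this shared mutable one won't serialize.
  val rng = new scala.util.Random(42)
  val hits = ev.aggregate(ev.map(trials) { _ =>
    val (px, py) = (rng.nextDouble(), rng.nextDouble())
    if (px * px + py * py <= 1.0) 1L else 0L
  })(0L)(_ + _, _ + _)
  4.0 * hits / ev.size(trials)
}

println(estimatePi((1L to 100000L): Seq[Long])(seqAggregatable))  // close to 3.14
```

Nothing in estimatePi mentions Seq or RDD; swapping the execution platform means swapping the witness, which is the point of the talk's Spark demo.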
[Audience comment, partially inaudible: something like a centralized way to discover all this structure would help.] Yeah, absolutely. These conferences are one way to do that, but that doesn't scale out enough. Anything else? There's also some fun visualization stuff in there; check out the contact information up there. I'll just leave you with this: this was tweeted a while ago. [partially inaudible] There are bar charts and time series, some of which can be animated. I'd love to hear if you have any thoughts.