I'm going to be talking about a new programming language that I've been working on. It's called Unison, and it's a really big project with a lot of new stuff in it, so I'm just going to focus on one small piece of it that I'm very excited about: its support for building very large-scale distributed systems in, hopefully, very little code. So just to motivate things a little bit, this slide is very much inspired by my daughter, Ariana. That's her, with her brother on the right there. And yeah, programming is awesome, right? It's fun, it's interesting, and when I think about the times that I most enjoy programming, it's when I'm able to focus on the essence of the problem, the essence of the code: the core data structures, the algorithms, the business logic, the stuff that I'd actually be excited to tell a fellow programmer buddy about over lunch. And programming is certainly some of that, but it's not all of that, right? There's a certain amount of boring or tedious stuff we have to do, maybe stuff that feels needlessly complicated, and I don't know how many people feel this way, but maybe sometimes it even feels like that's the majority of what you have to spend your time doing as a programmer. So yeah, I've really wanted to try to do something about that for a while, and that's why I started working on Unison. I think a lot of the boring stuff we have to do as programmers has a root cause, which is that our current model of what programming is has been crammed into this tiny little box. And I'm going to explain what I mean by that. I'll start by making what I think is a somewhat troubling observation, which is that what we currently think of as a program is actually just a description of what a single operating system process should do, in the sense that you run your program and what happens?
An operating system process starts up. And while it's certainly true that that OS process can communicate with other processes, there's a certain amount of friction in doing that, right? You generally need to work out some sort of protocol that these processes are going to use to communicate. Maybe you're even parsing and serializing JSON and issuing HTTP requests. You're doing stuff that's not very interesting, and it feels ancillary to the real thing that you're actually working on. And if you look back at the history of computing, there have been various times when we've needed to expand our concept of what a program is. By the way, I don't know if Joe's in the audience, but Joe's talk yesterday was great. But in the early days of computing, a program was just a description of a single sequential thread of control, right? And that worked pretty well for a while, until computers started getting more and more powerful and we started building multi-tasking operating systems so we could better take advantage of all that power. With that, we got the notion of the operating system process. And we were writing programs with that model of what a program was for a while, and it worked pretty well, up through, I would say, the era of shrink-wrap software. Because in the era of shrink-wrap software, you wrote your program, compiled it to your executable, and you were done with the software-writing process, you know? You stamped it on CDs and shipped it, and you were just printing money at that point, right? Okay, but things have changed in the last 20 years or so. Now we're in the age of the internet, and I love this cheesy clip art, by the way; particularly that middle image there is really great.
But so, yeah, now we're in this age of the internet, we're building larger and larger software systems, and it is increasingly the case, I would say, that what a single operating system process does is this tiny percentage of what we actually need to be able to specify to build a large-scale system. We need to be able to talk about all these other things: provisioning of computing resources, deployment, orchestration, scaling, failover, all these concepts that our languages aren't really the best at talking about. So I would say we have this gap. It's the gap between what our programming languages can talk about and specify well, and the set of things that we actually need to specify to build a large system. And we've been filling that gap with all these different, I would say, special-purpose technologies. So yes, we need to fill that gap somehow. But I would say the way to do it isn't necessarily with special-purpose technologies like this, which are kind of like machines or appliances. They solve one problem well, but they're not a general-purpose tool. They don't have the same power that a real programming language has, where you can abstract over things, you can compose things however you want, you can factor the code in lots of different ways. And so we see this proliferation of technologies and things that you have to think about. And they're always going to be insufficient, because we're always going to come up with some new use case that isn't covered by the existing special-purpose tools. And personally, it makes me very unhappy to have to think about 15, some very large number of these different technologies that you have to pull together to assemble a large system.
And, you know, I don't like having to think about that many things when I'm programming. Okay, so I think a better model is, rather than filling that gap with special-purpose tools, to simply expand our concept of what a program is, so that it can talk directly about things like provisioning, deployment, orchestration, and large multi-node computations. I want to just be able to describe all of that with a single program. And that's where Unison comes in; that's what Unison is attempting to do. So for the rest of this talk, I'm going to introduce Unison. I'm going to work up to the code for a search engine in Unison. It's going to be very little code. And so we'll get to see what it's like programming in a language that can talk about all these other concerns that we currently have to specify in other ways. And then after that, I'll talk a little bit about how this is all done in the Unison runtime. Okay, so just as a quick intro: Unison is a statically typed, purely functional programming language. I would say it's probably most similar to Haskell, if you've heard of that language. But it has this magic power, and that magic power is that we can transport an arbitrary Unison value to another Unison node. And I'll show you how that works. So, just some basic syntax here. To apply a function to some arguments, we just list the function name and then the arguments separated by spaces. That's the first example there. And then type signatures look a little bit different than you might be used to. This is the type signature for a sort function: it takes an ordering of A and a vector of A, and it returns a vector of A. We just have the arguments and the result separated by these arrows here. Okay, so let's look at our first function definition. This is the factorial function, and it's doing a couple of things. First we take the numbers from 1 to n.
And then we're just going to multiply them all together with this fold-left. And we could leave off the type signature; we don't need to say that it's a function from Number to Number. Unison has type inference, so that would be figured out for us if we didn't supply it. Okay, so we can evaluate factorial locally on our current node. Or we could evaluate factorial at another node. Here's how that works. Now our factorial function is taking a node, which we'll call Alice. And you can see in the body of the function we're starting what I'll call a remote block. In the remote block, the first thing that we do is transfer control of the computation to the Alice node. And so now the rest of the computation is going to happen on Alice. And the first thing we do on Alice is just evaluate this pure expression, factorial n. What will actually happen at runtime is that the definition of factorial will potentially be synced to Alice's node if she doesn't already have it locally. And then one more thing: you'll note the return type of this function is now a Remote Number. It's not just a Number. That's signifying that this is a function whose evaluation may happen on multiple Unison nodes. Okay, so let's keep going. Most interesting applications have some sort of persistent state that they need to manage, right? And currently Unison has one built-in type for this. It's just a local persistent key-value store, which is called Index. And here's a subset of its API. We can do three things: we can create an empty index, we can insert a key-value pair into the index, and we can look up a key in the index and get back an optional value. And these key and value types can be anything at all. We don't need to specify a serialization format or how we're going to translate it to the database, anything like that. And then just a couple of new things. Unit is just sort of like void.
There's only one value of that type, also called unit. And then Optional, we use that if we may or may not be returning a value successfully. There are two cases: it could be Some or it could be None. Okay, so here's an example of using Index. Again, we have a remote block. The first thing we do in our remote block is transfer control of the computation to Alice. Now we're on the Alice node. Alice is going to create the empty index, and she's going to insert a couple of key-value pairs into it. Then we transfer control of the computation back to Bob. And then Bob is actually going to do a lookup in that index that Alice created. These index values are serializable, and they can be transported to other nodes just like anything else. And so that lookup that Bob does will actually contact Alice's node, because that's where the index is actually stored; it's in Alice's local storage. So these index values are not some sort of global distributed mutable state. They're owned by the particular node that created them. All right, so far these are some pretty simple primitives, I hope, and nothing too complicated, at least at the API level. But we have most of what we need to actually build a search engine in about 15 lines of code, at least the core of the search engine. So before we get to the actual Unison code, let's take a step back. What are the actual data structures and algorithms we're going to be using for our search engine? For a really simple search engine, you have some sort of search index: a mapping from a search keyword to a set of URLs which contain that keyword, right? So for the keyword programming, there's a bunch of URLs whose page content contains the word programming. And the same thing for these other keywords. We could have a more flexible notion of what a keyword is. It could be a bigram or a trigram or some processed form of keywords. But we're just gonna ignore that for now.
Okay, so to do a search in this index is pretty simple. Say we're doing a search for unison programming, so there are two keywords in our query. What we do is look up in our index the set of URLs that contain programming, and the set of URLs that contain unison, and then we just take those two sets and intersect them, all right? Simple. So let's look at the Unison code. There are going to be a few new things here, so I'm going to walk through it. Don't be alarmed if there's anything that you don't quite understand yet. This is the advertised 15 lines of code here. So let's just get oriented. Let's look at our type. Our search function takes the number of search results that we want to return, because we might not want the whole set, we might want just the top ten. It takes a query in the form of a vector of keywords, and it takes a search index, and then it returns a vector of URLs. Okay, what's the search index? The search index is this DIndex type, which I'm going to explain in a minute. But the first thing we do in the implementation here is loop through our query, and for each keyword in the query, we look up the set of URLs for that keyword. Now we have a list of these sets of URLs, and we're going to intersect them. I guess before we do that, any keyword that is not found in the index, we map to the empty set. Once we've done that, we take all these sets and intersect them. And since we're only interested in the top ten, or whatever the limit is, we take the first limit elements of that set of URLs. So there are a couple of new things. One is this index traversal intersect business. We're actually a little bit smart about how we're doing these set intersections. We're doing the minimum amount of work necessary to just generate, say, the first ten results.
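That lazy, limit-bounded intersection can be sketched with generators in Python. This is my own sketch of the idea, not the actual 70-line Unison library: assuming each keyword's URL set can be traversed in sorted order, we repeatedly advance every iterator to the current maximum, yield a result only when all of them agree, and stop as soon as we have `limit` results:

```python
import itertools

def intersect_sorted(iterables):
    """Lazily yield the elements common to all sorted iterables,
    advancing each iterator only as far as needed."""
    iters = [iter(it) for it in iterables]
    if not iters:
        return
    try:
        heads = [next(it) for it in iters]
        while True:
            hi = max(heads)
            for i, it in enumerate(iters):
                while heads[i] < hi:
                    heads[i] = next(it)  # gallop forward to the max
            if all(h == hi for h in heads):
                yield hi
                heads = [next(it) for it in iters]
    except StopIteration:
        return  # some iterator ran out: no more common elements

def search(limit, query, index):
    """Intersect the URL sets for each keyword, missing keywords
    becoming empty sets, producing at most `limit` results."""
    sets = [sorted(index.get(keyword, set())) for keyword in query]
    return list(itertools.islice(intersect_sorted(sets), limit))

index = {"programming": {"a", "b", "c"}, "unison": {"a", "c"}}
# search(10, ["unison", "programming"], index) -> ["a", "c"]
```

Because `islice` cuts the generator off after `limit` elements, the massive intermediate sets never get fully materialized.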
So we're kind of lazily producing that result set, rather than fully instantiating these massive intermediate sets. That's just a little generic library. It was about 70 lines of Unison code, nothing to do with search engines, really. You could use it for lots of other stuff, too. Okay, the other thing that's used here that's new is this DIndex type, which is a distributed version of that key-value store type that I showed a little bit earlier. And I'm going to dive into that a little bit. We're used to thinking of distributed data structures and algorithms as these very complicated things that require mountains and mountains of code, right? And I feel like that's not necessarily true. I think when you have a language where it's very nice and pleasant to do this distributed communication, you're able to focus on the essence of the data structure and the algorithm. And yes, distributed data structures and algorithms work differently than their single-machine counterparts. But that's fine. I mean, they're not necessarily extremely complicated. So our distributed key-value store is based on this very simple idea: we're going to have a directory of nodes, and each node in the distributed key-value store has its own local storage. Okay? And nodes are allowed to enter and leave the cluster that's backing the DIndex. But then we have this question: if we have a key, say Alice, and we have a cluster of node one, node two, node three, how do we know which node or nodes to contact to do a lookup or an insert, right? We don't want to have to contact every node in the cluster, because the cluster might be huge. So there's a very simple approach that is used sometimes. It's called rendezvous hashing, or highest random weight hashing. And the idea is this: we compute a hash of the pair node one, Alice; a hash of the pair node two, Alice; and so forth for each of our nodes.
And we simply pick the node or nodes whose hash value is highest. And those are gonna be the nodes that we consider responsible for the key Alice, okay? So it's really just a uniformly random way of partitioning our key space among the available nodes. And here's the core part of the Unison code where we're picking the nodes for a key. The important part here is that we're calling that hash function on a pair of a node and a key, we're sorting by that hash, and then we're taking the top k. You might have some replication factor for your distributed key-value store, so that if one node goes down, you don't lose any data. But okay, so it's pretty simple, and DIndex, again, is totally generic; it has nothing to do with search engines specifically. It's just a generic distributed data structure. And I think the implementation is around 100 lines of pure Unison code. So yeah, I like this. It's simple, and I feel like I'm able to focus on the stuff that's important about what I'm trying to do. So, I haven't talked about creating nodes yet. Okay, we've got this nice distributed data structure, and we've got our search engine, our search lookup algorithm. So now we just need a massive cluster of Unison nodes to deploy it all to, right? So there's a primitive called remote.spawn, and it returns a remote node. And here's an example of how it's being used. We spawn a node, then we transfer control of the computation to that node, and then we do some other stuff. And that other stuff now happens on the node that we just spawned. You might ask: well, okay, we spawned a node, but where exactly did we spawn that node, right? And that is actually left deliberately somewhat underspecified. It might be literally on the same machine as the originating node. Or it might be just in the same data center. Or it might be just in the same data center region.
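Going back to the rendezvous hashing just described, the node-selection step can be sketched in a few lines of Python (illustrative names; Unison's actual hash function and node representation differ):

```python
import hashlib

def weight(node, key):
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}|{key}".encode()).hexdigest()
    return int(digest, 16)

def nodes_for_key(nodes, key, k=1):
    """Rendezvous (highest-random-weight) hashing: the k nodes with
    the highest hash of (node, key) are responsible for the key."""
    return sorted(nodes, key=lambda n: weight(n, key), reverse=True)[:k]

cluster = ["node1", "node2", "node3"]
owners = nodes_for_key(cluster, "Alice", k=2)  # 2 replicas for "Alice"
```

A nice property of this scheme is that when a node joins or leaves, only the keys whose top-k set actually changes need to move.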
Any of those placement choices might be fine. But let me just show an example of how we might use this to provision and deploy our search engine. Imagine we have a node for each of the computing regions where we might want to spawn new computing resources. So we have a node for US East, a node for EU Central; we could have a node for EU West, etc. And here's an example of how this could be used. We create an empty DIndex. Then we spawn a bunch of nodes off of the US East node; we're basically provisioning nodes from the US East region with this Unison code. And we do that 10,000 times, so we get back this list of 10,000 nodes. And for each one of those nodes, we just loop through and have it join the DIndex, or rather the cluster that's backing the DIndex. So yeah, what are we doing here? We're provisioning new computing resources. We're deploying code to those nodes that we've provisioned. And we're doing all that with ordinary Unison code. We're not calling out to all these special-purpose tools; we're just writing regular code. Any patterns we notice while writing this code, we could factor out into generic functions and reuse. And this is the kind of thing that at least I want to be writing. I don't want to have to deal with all these other tools when I'm building a large system. Okay, I'm going to post all the code, including these generic utilities I wrote, very soon on the Unison blog, along with a demo of the search engine actually running. But yeah, I want to talk a little bit about how this is all done, because maybe it just seems too magical, like this can't possibly work, right? So let's go back to our very simple example of evaluating factorial at another node. We kind of have this question, right?
If we're gonna evaluate the factorial function at Alice, how exactly do we tell Alice about factorial? We could just send her the name, factorial, right? But that's maybe a little too fragile, because how do we know that the meaning Alice has assigned to factorial is the same as the meaning that we've assigned to it? She could have a completely different codebase with a different set of libraries, right? So that's too fragile, not very scalable. The Unison solution is that we're going to identify things not by their name, but by a cryptographic hash of their content. And it's actually what I call a nameless hash of the content. So it doesn't matter whether we call the function factorial or whether we call it blah. It doesn't matter whether we call the parameter z or whether we call it n. It's the same function, right? And it's going to have the same hash; let's suppose that's q82-something. So what we're actually sending to Alice when we send her that expression, factorial of n, is not the name factorial applied to n. We're actually sending her that hash, the hash of the specific factorial function that we're asking her to evaluate. And you can maybe see how this solves the problem of how to transport an arbitrary value, including possibly a function, to another node: Alice can just check her code store. Either she sees she already has that hash, in which case it's guaranteed to be the same function, so she can just proceed with evaluation. Or, if she doesn't have that hash, she can ask the sender: hey, I don't have that hash, can you send it to me? The sender sends her the value, she verifies the hash, and adds it to her code store, where it's cached for next time. So this works really well; it's very robust. It almost feels like cheating. I mean, I think it's a great idea, and it has vastly simplified a lot of aspects of Unison's runtime and its implementation.
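The naming-independence just described can be illustrated with a toy version in Python that hashes a function body after replacing its parameter names with positional placeholders. This is only a crude stand-in: Unison actually hashes a normalized abstract syntax tree, not source text, but the effect is the same, renaming a parameter does not change the hash:

```python
import hashlib
import re

def nameless_hash(params, body):
    """Hash a definition after renaming each parameter to a positional
    placeholder, so the hash ignores the names the author chose.
    Toy stand-in for Unison's AST-based nameless hashing."""
    for i, p in enumerate(params):
        body = re.sub(rf"\b{re.escape(p)}\b", f"_{i}", body)
    return hashlib.sha256(body.encode()).hexdigest()[:8]

# Same function, different choice of parameter name:
h1 = nameless_hash(["n"], "foldl (*) 1 (range 1 n)")
h2 = nameless_hash(["z"], "foldl (*) 1 (range 1 z)")
assert h1 == h2  # identical hash despite the renaming
```

Two callers who each compute this hash agree on which definition is meant without ever exchanging names, which is exactly what lets Alice look the hash up in her local code store.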
But it does have some very interesting consequences, which I'll just give you a sample of; these could almost be another talk in and of themselves. What it does is change our model of what a codebase is. We're used to thinking of a codebase as this bag of text files, right? And you just modify those text files; that's how you edit and evolve your codebase. But in Unison, you don't really modify anything. So if we take our factorial function, which, remember, has that hash q82, and we quote-unquote modify it, say to factorial n = 43, well, we haven't really modified anything. We've simply introduced a new definition that has a completely different hash, zz-something. And all of the existing references to factorial are referring to it by hash as well. So they're not automatically updated to point to that new hash. And we probably wouldn't even want that anyway, because it's not really safe. Now, this raises a lot of questions about the overall programmer experience when you're developing code. How do you do things like refactoring? How do you do things like dependency management? So it calls into question the basic workflow that we have as programmers. But it turns out that there are a lot of really big benefits, beyond just the distributed programming stuff, to representing the codebase as an immutable data structure. Just to at least give a preview: you can have much better support for large-scale refactoring than we have right now, and much better support for dependency management. So you have all these benefits that kind of come with that, and I'd love to talk more about it. But if you do want to read more, there's a lot more on the Unison blog at unisonweb.org; it's mostly a technical blog.
There are design posts, there are status updates, and I definitely encourage you to follow along if this sounds interesting at all. And lastly, I just want to thank, first of all, Full Stack Fest for putting on an awesome conference. Yay. And also a number of people who have been helping with Unison, either by actually contributing code or just consulting, offering advice and ideas, so thanks to all the folks who've been helping. One last thing: Unison is of course open source. It's being very actively developed, so things are changing very, very rapidly, and definitely follow along if you want to see where it's going. All right, that's all I've got, thank you. What advantages does Unison provide over the distributed concurrency model of Erlang? Yeah, I was expecting somebody to ask this very question. So yeah, I mean, Erlang is great, and I think it's taught us a lot about how to build distributed systems. I think the thing that's kind of missing from Erlang is that you still have to work out some sort of protocol when you have two Erlang processes communicating, right? You can't safely send an arbitrary function, for instance, to another Erlang process, because you don't know whether the recipient is going to have all the transitive dependencies of that function that you're sending. I'm not really an Erlang programmer, but I gather it's even somewhat frowned upon to just send arbitrary functions around between processes. So yeah, you still have to work out some sort of communication protocol, and it's definitely a lot lighter weight and nicer than, say, issuing HTTP requests manually and doing JSON parsing and so forth. But I feel like it sort of doesn't quite go far enough.
And then there's one other thing, which is maybe more subtle, which is that I personally feel like message passing at the level of actors or processes is maybe too low-level; it's not really the level at which I want to program. And you can compare it a little bit to the Unison code you saw: yes, we are having to send messages between these nodes to implement our distributed programming API, but that's below the level of abstraction at which we're working. It's an implementation detail. Yes, there are messages flying all over the place, but I just want to declaratively specify: I want this part of the computation to happen at this node, and then I want it to move to this other node. I just like that model better. But yeah, Erlang is great, and if Joe's here, I hope he's not too offended by my talk. All right. I feel like you and Sasha, who was speaking yesterday about Erlang and Alexa, should have a conversation in the beach bar tonight, and anyone that's interested should sit around and see who wins. Okay. Okay, so another question is: can you give an example of a production system that uses Unison? Oh, no. Are these the guinea pigs? So yeah, like I said, it's being very actively developed. It's not ready for production use yet. It started as this crazy experimental project, and I was doing a lot of exploring of different ideas. I wasn't sure which ones were going to work out, and I wanted to get the ideas to a point where I felt good about them before really putting all the engineering effort into making it a production-usable system. But yeah, now I actually do feel like the ideas are really solid, and I'm looking forward to putting in all that engineering effort to make it something you could actually do real stuff with.
All right, well, the final question, and you might have answered it a little bit earlier, is: was Unison inspired by Erlang or the Cloud Haskell project, and how would you compare them? Yeah, I mean, to some degree, yes, I've definitely been influenced by lots of different languages. I guess Cloud Haskell is itself very much inspired by Erlang's concurrency and distributed systems support, so I sort of put those in a similar bucket. But yeah, definitely some inspiration from those places. So yeah, I know, it's kind of a vague answer. It's a big question. Okay, well, if Unison becomes the programming language of the future for distributed systems, you heard it here first. Round of applause for Paul Chiusano. Thank you very much. Thank you.