Okay, so my name is Mauro. I work at TenX, and we don't do Clojure there, but I'm a Clojure enthusiast and I've worked with Clojure at two or three other companies before. Today we're going to talk about Onyx, which is a library and platform for distributed computing; it also organizes your code and orchestrates the computation in a nice way even without distributing. Has anybody used Onyx? Has anybody heard of Onyx? Some people have heard of Onyx; nobody has actually installed it. Okay, good.

So this is the home page. Onyx can integrate with, consume from, and produce data to many other systems: Kafka, Datomic, SQL databases, Elasticsearch. They have all these plugins prepared, so it's easy to integrate with them. They also integrate with core.async, which is what we're going to use today.

Here in the user guide they have some concepts. I won't go over all of them, but I think it's interesting to see what they mean. There's a computation graph representing all the processing that your system is going to perform, and it's just a Clojure data structure. Something like this represents the edges of the graph: new information comes in, moves to increment, and then from increment to out. So this graph is just two consecutive edges. If you have something more interesting, data comes from one input and goes to two other streams — from the input to processing one and processing two — and from each of those it goes to a different output. The input and output could be a Kafka queue or a Datomic database, all of those things we saw on the other page.

It has some similarities with Datomic — I don't know if anybody here has used Datomic. You can install functions, and again, everything you install is just Clojure data structures, just like in Datomic you install the attributes and entities as data structures. You have flow conditions to decide whether the computation will stop at some node of the graph, whether it will traverse an edge or not; these are predicate functions that get evaluated for every edge the computation might traverse. There are lifecycles, where you can specify functions to be called when the computation starts, when it terminates, when it throws an exception, for example. And it has windows, in particular time windows, which we're not going to use today, but they are pretty cool. You can specify user sessions: if a new event from a user does not arrive within the next five minutes, that's considered the end of the session, and Onyx can aggregate all the events into user sessions. For example, if a user is on a phone app producing events to a server, on the server you can aggregate all those events into a single session if they arrived within a short period of each other, which makes them easier to process and analyze.

Well, let's look at the code. I have already opened the repo here and connected to it. Can you read the text? Can you read it now? Make it bigger? Sure, like this — I want the frame to be bigger. Is it okay now? Okay, so here we define the computation graph as we saw on the website: the workflow starts with in. Actually, first I have to explain what we're trying to do here. It's a small project that I made by myself, the classic example of a bank account.
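To make the workflow idea concrete, here is a minimal sketch of the two graphs just described, written as Onyx workflows. A workflow is just a Clojure vector of [source destination] keyword pairs; the task names here are illustrative.

```clojure
;; A linear workflow: segments flow from :in to :increment to :out.
(def linear-workflow
  [[:in :increment]
   [:increment :out]])

;; A branching workflow: one input fans out to two processing tasks,
;; and each of those writes to its own output.
(def branching-workflow
  [[:input :processing-1]
   [:input :processing-2]
   [:processing-1 :output-1]
   [:processing-2 :output-2]])
```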
So we are processing withdrawals and deposits, but with Onyx. The project is not different from something you have already seen, but it's done in Onyx, so I think it's interesting to compare, instead of showing you some totally different project that you have nothing to compare against.

The computation starts with in. In this case we're going to consume from core.async — it could be a Kafka channel, we'll see that later — and it goes to this router, which decides what kind of operation it is. If it's a deposit or withdrawal, we're going to write to a database, in this case Datomic. From the router it can either write to the database, if it's a valid operation, or go to error handling if it's invalid. If it's an error, we still persist it to the database, but as a different entity, so we have a log of all the errors that occurred. So after it goes through error handling, it returns to the database, and it also goes to the output. From one node in the computation graph we can follow two different edges.

In this case I'm not using lifecycles, windows, or triggers — there's much more we can do with Onyx. We start with the base job and add the tasks. Since we're using core.async as the input, we create a new core.async task and add it to the base job, and we do this for all the tasks. The other tasks are defined here.

When it gets to the router task, it's actually a multimethod. A segment is one message that goes through Onyx, and Onyx decides what to do based on that segment. It's basically a data structure, a Clojure map that has an ID — and that's pretty much it — plus some data that will be computed. So when it gets to the router function, we decide depending on the type of the message. People have accounts, and an account can have the operations I mentioned. If it's a new person, it will run here; we can create accounts, we can deposit and withdraw, and we return the transaction that will be performed.

The functions in Onyx can be defined this way: here I create plain Clojure functions for each of those operations — create-person, create-account, and so on — and these are the functions that will be called by the tasks. We check for errors and return a data structure: either a new entity, like the person here, or the error. Whatever we return will be persisted to Datomic. I follow this pattern for all the operations, so we always return the error if something went wrong, and as you can see they are all very similar to each other, so it's pretty easy to implement.

So here is where it starts running. On line 95 you can see we are putting new information onto a core.async channel that will be picked up by Onyx. The rest of this function is not so nice, but this is the important part: for each message we want to process, we produce a segment. Let's see it running to understand what I'm talking about. The messages we're going to process are simply these: every message has an ID and an operation — either a withdrawal with an amount, a deposit with an amount, or the account that we are creating with all the other information.
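As a rough reconstruction of what's on screen — the actual names in the project may differ — the bank-account workflow and the dispatching multimethod might look something like this:

```clojure
;; Hypothetical sketch of the workflow described above: :in feeds a
;; :router task, which sends valid operations to the Datomic output
;; and invalid ones through an error-handling task first.
(def workflow
  [[:in :router]
   [:router :out]
   [:router :handle-error]
   [:handle-error :out]])

;; A segment is a plain Clojure map; the router dispatches on the
;; kind of operation it carries (the :type key is an assumption).
(defmulti router :type)

(defmethod router :deposit [{:keys [account-id amount]}]
  {:account/id account-id :account/balance-change amount})

(defmethod router :withdraw [{:keys [account-id amount balance] :as segment}]
  (if (< balance amount)
    ;; Return an error entity instead of the transaction, so the
    ;; error gets persisted as its own entity, as described above.
    {:error/type :insufficient-funds :error/body (pr-str segment)}
    {:account/id account-id :account/balance-change (- amount)}))
```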
And here I'm using a simple account template to create an account with a balance and the person that owns the account. Let's run this test to see what happens when we create a duplicate person — actually, I think it's the other one, just a second. Yeah, run this one. So here we're running a test to check that we cannot create a duplicate account — two accounts with the same ID, basically.

First, we instantiate the job with some settings. Then we create a person from a template and send the message: we create the segment from the message and send it to Onyx, and Onyx will pick up this segment and run the submit-job function that we saw. So it will come here, wrap some more data around it, and put it onto a core.async channel. After producing the message to the channel, it will create an account for that person: it fetches the person from the DB, creates the account, creates the segment — which is what Onyx reads — and submits the new job, and then we have the new account. After that, we try to create the same account again, and here we expect to see an error, because we're creating a second account with the same ID. The error is consumed from the error task, and here we're getting the segment ID of the segment that produced the error.

Then we print — let's look at the output here to see what we are printing. After Onyx loaded, it first produces something to the core.async channel, which is this person. The segment has an ID and a message; the message is create-account, and the account has an ID, a balance, and a person, which is another entity pointed to by it, and the person has an email and a name. Then we create another account, and this account has the same ID as this one, with balance zero, and then we try to send it again. Oops, no, sorry — here, first we created the person, now we create an account, now we're going to create the second account, and now we expect to see an error. And there it is: the account was a duplicate. So in the error task we consume a new message that shows the details of the account that produced the error. We see that we got an error, the type of the error was duplicate-account, and we can trace back what happened. The body of the error could be any string; I decided to put the serialization of the entity that produced the problem. And since for the output we are using the Datomic plugin, it's persisted to the database, so we can see later what happened.

So that was creating a duplicate account. Now for deposits and withdrawals — let's run this. Okay, while it's running: we create a person, create an account for this person, and here we create this list of messages to send to Onyx. We are creating the person and the account, depositing 100, trying to withdraw 90 — that should work — then trying to withdraw 20 more, and that should not work, so we should be able to consume the error again.

Let's see the output. Here we send the messages to create a person, create an account, deposit, withdraw, and withdraw again. And here we're logging as we consume the messages: create person, create account with those IDs, deposit — the old balance was zero, now it's 100. Then we withdraw 90, so the old balance was 100 and the new balance is 10. Now we're trying to withdraw 20: the old balance is 10, and the new balance would be minus 10.
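The test driver boils down to putting segments onto a core.async channel that the Onyx input task reads from. A minimal sketch — the channel wiring and segment keys are assumptions, since the real project goes through the Onyx core.async plugin's lifecycle:

```clojure
(require '[clojure.core.async :as async])

;; The channel the input task reads from (hypothetical setup).
(def input-chan (async/chan 100))

;; A segment is just a map with an id and the operation to perform.
(async/>!! input-chan {:id         (java.util.UUID/randomUUID)
                       :type       :deposit
                       :account-id 42
                       :amount     100})
```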
So here we're printing what the new balance would be, but we don't actually persist it, because it throws an error before that. And now we get the error from the other channel, which is insufficient funds, and it includes the whole incoming message that caused the error. So we have the body here: for this account ID, we had this balance, and this is the amount we tried to withdraw, and we got insufficient funds. We can consume this, plug it into Datomic, and persist the error along with the rest of the state.

We did this with core.async, as we saw. I don't have Kafka installed here, and with core.async it's not distributed, but if we change this here to a Kafka channel, we can spin up many machines, all of them connected to the same Kafka topic, all producing. So it's very easy to make this a distributed system, and likewise we can consume from the same topic with multiple machines. Everything implemented here with core.async we can just replace with Kafka, and it should work. Yeah, I think that's all I had to show. Does anybody have questions?

Does Onyx have a web interface?

So, about a web interface: there is one other project by the same group, called Pyroclast. It's technically not part of Onyx, but it's very well integrated, because it's by the same people. I don't know if you can see it here, but it's made to let you see what's going on in your streams. You can perform queries against those streams, and it updates in real time. So we can write Clojure functions here — in this case using frequencies — and as he types, it shows the result; it's processing the streams in real time. I recommend watching this video; it's pretty cool. This is what I think is the closest thing to a web interface for Onyx. You can do filter and map here, so you can show only some of the messages in the stream, or transform them and compose those transformations. Here he takes an input, splits on whitespace, zipmaps. Does that answer your question?

Not exactly — can you get the output of each computation?

Right, so this is not something for defining the computation. He's not using this for development; he won't take the output of this pipeline, write the code, save it, and deploy it to his system. He does this in real time. So if you want to see how many messages were sent here, you just put in a function like count or something, and you'll see how many messages are going through the stream.

If I have a cluster of machines that I want to use as my Onyx cluster, what do I need to install on each of these nodes? For example, I compile my code and I have my jar file. Do I need to copy this jar file to each node in the cluster, or can I send the job from a separate client to run on the cluster?

Right. So the way I'd do it is what you said: create the jar, send it to the multiple machines, and then spin them all up.

Is there an Onyx agent sort of thing running on each node that is expecting a jar or something similar?
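For a sense of what "just replace core.async with Kafka" means: in Onyx you'd swap the catalog entry for the input task. The sketch below follows the core.async and onyx-kafka plugin conventions as I recall them; the exact keys vary by plugin version, and the topic, group id, and deserializer names are assumptions.

```clojure
;; core.async input task (local, not distributed).
(def in-core-async
  {:onyx/name       :in
   :onyx/plugin     :onyx.plugin.core-async/input
   :onyx/type       :input
   :onyx/medium     :core.async
   :onyx/max-peers  1
   :onyx/batch-size 10})

;; Kafka input task: many peers on many machines can consume
;; from the same topic.
(def in-kafka
  {:onyx/name             :in
   :onyx/plugin           :onyx.plugin.kafka/read-messages
   :onyx/type             :input
   :onyx/medium           :kafka
   :kafka/topic           "bank-operations"
   :kafka/group-id        "onyx-bank"
   :kafka/zookeeper       "127.0.0.1:2181"
   :kafka/deserializer-fn :my.app.serde/bytes->segment
   :onyx/batch-size       10})
```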
No, I don't believe there's something like an agent waiting for a new jar to be deployed into a folder and then creating a new computation. But if you have already defined the streams that you need, you can do these computations with Pyroclast on the fly. So if you just want to analyze something quickly, you would use Pyroclast. If it's not just a quick analysis — if you want to create those streams permanently — you would have to update the jars on the machines. At least as far as I know, that's how I would do it.

So if my task is a long-running computation, is it possible to do it with Onyx?

Yes, you can leave the process running, running the program that creates that stream. Why wouldn't a long-running computation be possible?

Maybe I'm not using the right terminology. I'm just curious how I can take advantage of multiple machines to run a computation.

Right. In that case, if you're using Kafka for example, you would have many machines consuming from the same topic, and Kafka would send each message to only one of them. So you'd be able to scale horizontally: many machines processing the same stream.

So how is the cluster coordinated? Does it rely on something like ZooKeeper, or does it have its own cluster membership at the core?

Right. If you're comparing with Spark, for example: Spark is a much bigger project, and Onyx corresponds to just the top layer of Spark. It doesn't deal with the infrastructure or with managing the machines. If a machine dies, Onyx won't restart it — Onyx sits one level above that. So using Onyx, you have to find some other way to manage the machines. You're not limited to one machine, but you have to monitor the machines and restart them with some other tool; that's not part of Onyx, and I don't know what you would choose for it. Is anybody here using Onyx on multi-node clusters? What do they do?

It's not just a hobby project for people running it on a laptop. I've never deployed it in production myself, so for me it's like a hobby project, but if you look at their web page, I think they have some case studies. And just to answer your question: Onyx is masterless. Onyx does not have any master holding state or configuration, and yes, you can run it in multiple configurations. They implement a snapshotting mechanism; I think there's a talk by Michael Drogalis that explains how they do masterless. So the idea of Onyx is that you don't need a coordinator — that's the short answer.

If we run our code on a cluster, do we need to distribute the final job, which contains the workflow itself?

Yeah, I don't know the answer to that. But if I'm not mistaken, there should be a way to start something like an agent waiting for a job and then send the job to that agent. Because Onyx is masterless, there isn't one node that coordinates everything; I think it's more like a ring, where neighbors coordinate with each other. That's how Mike explains it, from what I know. So job distribution can still happen in a masterless way. Anybody else?
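On the deployment question: whatever process has your jar on the classpath starts a peer group and some virtual peers, and any such process can also submit jobs. A minimal sketch using the public onyx.api functions — the ZooKeeper address, tenancy id, and peer count are placeholders, and the exact config keys vary by Onyx version:

```clojure
(require '[onyx.api])

;; Placeholder peer configuration; a real one also needs messaging
;; settings (Aeron ports, bind address, and so on).
(def peer-config
  {:zookeeper/address       "127.0.0.1:2181"
   :onyx/tenancy-id         "bank-demo"
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   :onyx.messaging/impl     :aeron})

;; Run this on every node that should participate in the cluster.
(def peer-group (onyx.api/start-peer-group peer-config))
(def peers      (onyx.api/start-peers 4 peer-group))

;; Any client with the same peer-config can submit the job.
;; workflow and catalog: as sketched earlier in the talk.
(onyx.api/submit-job peer-config
                     {:workflow       workflow
                      :catalog        catalog
                      :task-scheduler :onyx.task-scheduler/balanced})
```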
Is the code you used in your example available on GitHub or anywhere?

No, it's not available for now. If you want to see it, I can clean it up a bit and upload it somewhere.

It doesn't matter if it's not polished; it would just be nice to have a look at an example, if you're happy to do so. And I'd understand if there's a licensing reason or something.

Yeah, okay, right.

And what would be a hello world for Onyx? For me, at least, this example was quite involved.

Mm-hmm. Yeah, they have a series of exercises, a tutorial. Let's see what the simplest program is...

They have examples like log file management and time-based processing. One of the areas where Onyx is very strong is that it adopted the Apache Beam windowing concepts — it married those concepts to what, until a few months ago, Kafka and the others did not do. Onyx started as an open-source project; then they got some funding and formed Distributed Masonry, and they created Pyroclast, which is a completely cloud-based version of Onyx. Onyx remains an open-source project, but in terms of making money and funding the work, Pyroclast is the cloud-based version. So if you just want to kick the tires of Onyx, Pyroclast is the better solution. Pretty much any use case you have for something like Storm or Spark, you can implement with Onyx. And going back to your earlier question: one of the people who works on Onyx used to be part of the same company as us, and he moved over to work on it.

Is there something like the word count example from Spark?

Yeah, so in this repo under the official organization they have these examples. I don't know which would be the simplest one, but... actually, there is this example called aggregation, and it actually counts words. The workflow is: a new sentence arrives, we split it into words, then after splitting we count the words, and then it goes to the output. I think that's the closest thing you get to the Spark example. These are the operations it will perform: for split-sentence, it calls this function to split the sentence; count-words is identity, because after splitting we don't need to transform anything further. The output goes to core.async, but it could be anything, and the input is also core.async. Here it's using those time windows — in this case global, but if you change it to, say, five minutes, you can see how many words arrive in each interval. Triggers... I don't really remember; I never used triggers.

The input in the example is these five sentences. Each input has an ID, a time — everything at the same time, zero — and a sentence, and it pushes everything onto the core.async channel, and then we close the channel. This is just the configuration; they are using ZooKeeper here. I don't really know how they are using ZooKeeper, but I think it's the start of an answer to your earlier question — they're using it to orchestrate the machines. Let's see the main function here. Yeah, it will simply take all the segments from core.async.
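From memory, the core of that aggregation example looks roughly like this — a sketch, not the repo's exact code:

```clojure
(require '[clojure.string :as str])

;; New sentences arrive at :in, get split into words, the words get
;; counted, and the counts go to :out.
(def workflow
  [[:in :split-sentence]
   [:split-sentence :count-words]
   [:count-words :out]])

;; Returning a sequence of maps fans one segment out into many:
;; one segment per word.
(defn split-sentence [{:keys [sentence]}]
  (map (fn [word] {:word word})
       (str/split sentence #"\s+")))
```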
And since we have already pushed them all onto the channel, it will run here. Yeah, it follows that computation graph and splits the sentence here — I missed this part — it goes through the workflow: split-sentence, count-words. Count-words doesn't do anything, right? Yes, identity. And then there's the group-by key, right? Exactly, thanks — it groups by word. Yeah, I think this is the example you're looking for. Okay, anything else?

In a multi-node cluster, let's say we have Kafka running and some task is given to one node, and there is a failure. How is that handled?

You mean an unexpected error? Because expected errors we are already handling, as we saw.

Yes — like how Kafka replicates to three places.

Right, so you could have multiple machines consuming from that Kafka topic. They could be in the same partition, or actually in different consumer groups, so both machines would receive and process the message. You can have redundancy in that sense.

Is your question how failure handling works in Onyx? It uses ZooKeeper for distributed state management. If I'm not mistaken, submitted jobs get stored in ZooKeeper, so there's a mechanism for recovery — I don't know exactly how it works. And I think Pyroclast, which is the platform-as-a-service, is actually built on Onyx, like a ClojureScript interface overlaid on top of it. Just to clarify.

Right. Okay, more questions? Can you do a streaming word count for the next talk? Yeah, okay.