I'm going to give a presentation here today on scalaz-stream: the good parts. I'll talk a little bit about the not-so-good parts, but mostly I'll gloss over those today. As Colt mentioned, I'm using REPLesent, which is actually a really nice tool for doing presentations, except for graphics. So who am I? I'm Derek Chen-Becker. I've been doing Scala development since about 2007. I did a lot of work with Lift initially, then moved on to commercial work for a couple of different companies. At my current job at Simple Energy, we use scalaz-stream for processing energy-usage data from utility companies, and I've been using it for about a year. My friends either wince or give me a blank stare when I say it, but I like to think of myself as a pragmatically functional programmer: I see the benefit in some things; on others, you have to convince me a little more. We've already had a couple of talks today on scalaz, which is great, so hopefully everybody's at least a little bit up to speed on what that is. I'm not going to really touch on scalaz directly. What we're talking about today is primarily scalaz-stream, and we won't touch on much that's related to scalaz core itself. So what is scalaz-stream? As the name implies, it's a stream-processing library built on scalaz foundations: a compositional library for streaming data processing. What that means is that, at its core, scalaz-stream is a really simple library. A stream of data involves something that produces data, something that transforms that data, and something that consumes that data, and scalaz-stream provides a nice breadth of pieces for each of those that let you plug different things together in different ways in a really simple fashion. There's also a big part of scalaz-stream that's built around resource safety and around constraints on resources.
Resource safety means making sure you're not leaving file handles open and not trashing your VM in the process of dealing with things. The constraints let you do things like process two terabytes of data without worrying about blowing your heap or blowing your stack. Performance is a big goal, too. There are some areas of scalaz-stream where you trade a little bit of functionality for performance, but in general, in our experience, it's been a reasonable balance between the two. The examples I give here use scalaz-stream 0.7a and scalaz 7.1. There were some pretty significant bugs in concurrency before that — I like to be a little understated. Daniel will kind of go off the rails if we dive too much into what was fixed, but in general, if you can, try to use the latest version, because some pretty big things were fixed there. For this talk, we're going to use a set of imports that I'll bring in here. As Colt said, you can either be surgical about what you bring in from scalaz, or you can bring in the kitchen sink; for conciseness, I'm just going to bring in everything. We're using scalaz-stream, bringing in all the stuff from there. scalaz.concurrent.Task is the monad I'll be working in — generally, in the work that I do, Task is all we work in. It has a lot of nice functionality that was already touched on in Colt's talk and in Stu's talk, related to how some of that stuff is used for cats, but I'll show some more things we can do with it here for scalaz-stream. And then I'm bringing in some duration stuff, because part of scalaz-stream is some time-based streams of data, which are kind of neat to work with, and I want to show those off. So what is a Process? The Process is the core concept, the core type, of scalaz-stream.
A Process is basically a sequence of values, or more specifically, a description of a sequence of operations to produce values. What I've got here is pInt — I probably shouldn't call it that, because I don't believe in Hungarian notation — a Process operating in Task over a whole bunch of integers. Process.emitAll does just what it says: it takes that sequence and emits each of its elements. Now, if I run this, you get to see some of the sausage being made. When I say a Process is a description of a sequence of operations, I mean this is an interpreted kind of thing: you have this construct that is a Process, and when you need to do something with it, you have to turn that description into something that actually runs, interprets it, and does the things you want it to do. Here you can see the actual return value is an Emit of the list of this stuff, which is basically saying: emit these things. There's an emit, and there's also an emitAll. Process.eval lets you take an effectful operation and produce values from it. Here I've got a silly little sleep for a second, but you could envision, say, reading from a file or a socket or anything like that. eval lets you say: I'm going to do something within my monad, that's going to produce a value, and we'll deal with it that way. Anything that is not effectful — the emits and emitAlls — is evaluated directly. So we run these and we see more of this machinery. One other thing I want to point out here, for the Process.eval case: what we get here is an Await. The Await is a little more of this machinery, saying: we're going to execute this thing. And then you'll see the function, right?
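To make the "description, not computation" point concrete, here is a toy model in plain Scala. This is a simplified sketch of the idea only — the names Proc, Emit, Await, and Halt are illustrative, not the actual scalaz-stream internals:

```scala
// A toy model of the idea that a process is a *description* of steps
// rather than a running computation. Simplified sketch, not the real
// scalaz-stream machinery.
sealed trait Proc[+A]
case object Halt extends Proc[Nothing]
case class Emit[A](head: Seq[A], tail: Proc[A]) extends Proc[A]
case class Await[A](effect: () => A, next: A => Proc[A]) extends Proc[A]

def emitAll[A](as: Seq[A]): Proc[A] = Emit(as, Halt)
def eval[A](effect: () => A): Proc[A] =
  Await(effect, (a: A) => Emit(Seq(a), Halt))

// The interpreter: only here do the described steps actually happen.
def runLog[A](p: Proc[A]): Vector[A] = p match {
  case Halt                => Vector.empty
  case Emit(head, tail)    => head.toVector ++ runLog(tail)
  case Await(effect, next) => runLog(next(effect()))
}

val pInt = emitAll(Seq(1, 2, 3, 4))
println(runLog(pInt)) // Vector(1, 2, 3, 4)
```

Nothing in pInt executes until runLog interprets it, which is the same shape as building a Process and then calling one of its run methods.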
Processes are sequences of values: you can concatenate them together, you can compose them. The Await is essentially like a thunk — that function is going to be executed to produce the rest of the tail, essentially. Repetition is something that's pretty easy. You can do a repeatEval if you have an effectful operation. You can also call .repeat on the end of any process, and that turns the process into something that repeats ad infinitum. There's also constant: constant is kind of the analog to repeatEval for things that aren't effectful, so you can say Process.constant(42) to just produce a whole bunch of 42s. As of 0.7, rep2 here is not actually equivalent to rep1 — rep2 will sometimes hang, depending on how you're making your Task. This is what I said about Daniel knowing all the ins and outs of the bugs. But yeah, that's a good point: generally, you want to use repeatEval. repeat is more for when you've been handed a Process and you want to go over it again and again; if you're constructing it yourself, you should use repeatEval, because that's more in line with what you want to do. So we've shown a little bit of how we get data into a process, or how we construct a process. But then you really want to do something with it, right? The first thing you do is run the process. Running the process does not actually execute the process — it constructs the machinery that will execute the process. In this case, if you have a Process over F, where F is the monad it's working in, run will give you back an F that does the things the process is supposed to do. Here I'm working in Task, so I get back a Task. If I actually call run, then run — this is where things get a little silly, I think, in terms of syntax.
You run the run of a process. The first run says: take the description of the process, and turn that into a Task that's going to do something. The second run actually executes the Task. Now, if you remember, pInt is just a process of a bunch of ints. I ran it and got nothing, right? That's actually to be expected, because for a process to do anything, it either has to be an effectful operation or you have to consume that process. When I said run this, I said: make this process generate some values, but don't do anything with them. That's why I got nothing back. There are a couple of variants on run that let you do other things. runLast takes the process and executes it, returning whatever the last value produced was. It returns an Option, because processes don't always have to produce values — there are processes that are essentially a Process of nothing, which don't produce anything — so you can get a None there. The other thing you can do is runLog. Wait — is that deterministic? Right, so yeah, it'll do the same thing. Well, it's deterministic if you didn't do something like a nondeterministic merge upstream of it — as long as the process you're running is deterministic. Right, but that's encapsulated in the effect. runLast itself — the value from runLast — is referentially transparent. The value from run, on the Task. Yes, exactly. And I'm not putting any caveats on the merge upstream. Right. So runLog basically executes the process and collects the values it produces, returning them here as a Vector. We had a process that emits one, two, three, four; we go ahead and runLog that, and we get a Vector back out of those.
If you're doing anything of size with this, runLog is a really bad idea — obviously, it's collecting all the values. We process 10 gigs of XML a day at work; if I did a runLog on that, I'd probably get fired after someone came and had a very stern talk with me, because we'd blow the heap and do all kinds of other nasty things. runLog is nice if you know you have a fairly small thing, or if you just want to debug something or log things. But there are other ways to deal with processes that we'll get to later. runFoldMap is kind of the last run method on processes, and it's an interesting one — I actually like it a lot, because there are some things you can do really easily with it. Essentially, it's like foldMap in scalaz: you take a mapping function and then fold over the results of that mapping. It's map-reduce, right — I think that's how it was described. So, for example, here I've got my process of integers, and I just want to sum them all up. I instantiate a monoid that says: I'm going to add things together, and my identity is zero. When I do a runFoldMap on that — first of all, I'm not trying to transform the values, so I just use identity as the mapping — I get back the sum, so 1, 2, 3, 4 gives 10. Is that how it's implemented? I can't remember. Yeah — everything except for run is implemented through runFoldMap. runLast, for example. Oh, that's right, yeah. So run, runLog, runFoldMap — which actually is the source of some bugs. And processes are rerunnable, right? It's just a description, so if you execute it, then as long as the same state exists, you get the same results. I can run this as many times as I want, and I'm always going to get that back.
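The fold-with-a-monoid idea behind runFoldMap can be sketched in plain Scala. The Monoid trait and foldMap function here are hand-rolled stand-ins, not scalaz's:

```scala
// Plain-Scala sketch of the runFoldMap idea: map each element, then
// fold the results with a monoid (an associative operation plus an
// identity element).
trait Monoid[A] {
  def zero: A
  def append(x: A, y: A): A
}

// The monoid from the talk's example: integer addition, identity zero.
val intAddition: Monoid[Int] = new Monoid[Int] {
  def zero = 0
  def append(x: Int, y: Int) = x + y
}

def foldMap[A, B](as: Seq[A])(f: A => B)(m: Monoid[B]): B =
  as.foldLeft(m.zero)((acc, a) => m.append(acc, f(a)))

// No transformation wanted, so the mapping function is identity.
val total = foldMap(Seq(1, 2, 3, 4))(identity)(intAddition)
println(total) // 10
```

Swapping in a different monoid (string concatenation, max, set union) gives a different reduction over the same stream, which is what makes this shape so reusable.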
If you're dealing with external state — if you're doing anything IO-related, or, in this case, doing something really bad and defining some vars to mutate — you will get different results every time. What this is doing is basically a conditional on what that var's value actually is. So yeah, don't do this. Generally, you don't want to mutate things outside of your control if you don't have to, but that's a general Scala thing; it's not specific to scalaz-stream. Process relies on the stack safety of the monad you're operating in. There's nothing magic about Process — it has to operate based on what the monad can provide. If the monad doesn't trampoline, you can blow your stack. Even if you're using Task, that's not guaranteed: Task will often work on a different thread, depending on what you're doing, but there are cases where it's not running on a different thread, and it's very easy to blow your stack that way if you're not careful. If you don't have any effects at all — if you have a Process0, I think it is — there are some methods applicable to it, like toStream, toList, toVector, things like that, where you can just produce the values, and that runs it for you. Almost all of the stack-blowing stuff has been fixed, but the runFoldMap machinery — you'll blow your stack in that — and all of the Process0 stuff just referenced is not stack safe. Right now, yeah, don't use Process0 if you know you need something stack safe. It's more for cases like a Process.emitAll, which isn't actually operating in something that needs the stack, so you could do that. In general, the methods available on Seq and its ilk in the standard library are available: things like map, filter, flatMap, zip, take — all those kinds of things.
If you actually look at the library, not all of them are implemented on Process — in fact, I think most of them are not. They're implemented on another construct that I'll cover towards the end; it's a slightly more advanced way to deal with processes. But we can run all of these: we get our plus-tens, and we filter the Fibonacci numbers down with a mod 2 — fairly straightforward. ++ and append are basically what they say: you can append two processes together, and when you execute that, you get the concatenation of those processes. Now, that only works if the previous processes were successful. Here, I have an emitAll — I'm going to emit a couple of numbers — then I'm going to issue a fail. fail is basically a terminating process saying: this process is halting, specifically for this exception reason. There's also halt, which is kind of your normal termination condition. Then I try to emit a zero after that. But as you can see here, I only got the first five numbers, because as soon as we hit the fail, it terminated things. Another thing to notice: I'm working in Task, which has attemptRun on it. Instead of doing run.run, I do the run followed by attemptRun. attemptRun basically executes the Task, and if an exception is thrown, it's returned as a disjunction: the Throwable on the left, or the actual return value on the right. In this case, down at the bottom, you can see we got a left of the exception I expected. Given that you need to be able to handle exceptions or other things happening, there are a couple of methods you can use. One is onComplete. This is kind of like the finally clause in your code: if you have an onComplete at the end of a process, no matter how that process terminates, you will execute that code. There are some corner cases — some known bugs. Yeah, I know.
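The append-until-failure behavior can be sketched in plain Scala. This is a conceptual model only — runSteps is a made-up stand-in for the append/fail/attemptRun machinery, using Either where scalaz-stream uses its disjunction:

```scala
import scala.util.{Failure, Success, Try}

// Sketch of the ++ / fail semantics: steps run left to right, values
// are collected until one fails, and the failure short-circuits
// anything appended after it -- surfacing like attemptRun's left side.
def runSteps[A](steps: List[() => A]): Either[Throwable, Vector[A]] =
  steps match {
    case Nil => Right(Vector.empty)
    case step :: rest =>
      Try(step()) match {
        case Failure(e) => Left(e) // fail: the appended tail never runs
        case Success(a) => runSteps(rest).map(a +: _)
      }
  }

val ok   = runSteps(List(() => 1, () => 2, () => 3))
val boom = runSteps(List[() => Int](() => 1, () => sys.error("fail"), () => 0))
println(ok)   // Right(Vector(1, 2, 3))
println(boom) // the trailing () => 0 is never reached
```

The appended step after the failure never executes, which mirrors why the zero in the talk's example never shows up.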
Daniel's rolling his eyes here. There are some bugs. In theory, onComplete should always run the way you expect it to, as a finally clause would, but there are a couple of outstanding bugs where, if you're doing some not-so-friendly things with mapping and nesting, you can screw things up. But generally, onComplete is like your finally. There's onFailure, which is kind of like a recovery. onFailure takes a function from Throwable to Process, so if you just wanted to, say, log the error, you could log it there and return a halt, or you could recover from it, do something else, and return more process to produce. onKill is another one like that: there's a way to essentially kill a process, which says just halt what you're doing, absolutely stop — and onKill handles that. So we talked a little before about how the methods on Seq are available on a Process: you have map, flatMap, things like that. Well, it's nice to be able to express a transformation on a process in the context of a process, and the type for this is called a Channel. A Channel, as the name says, is something you run data through to transform it. What a channel really is, is a process that produces a mapping function over and over and over again, and what happens behind the scenes is that mapping function is applied to each element as it comes through. So here I've got a simple channel. Channel[Task, Int, Int] basically says: I'm working in Task, my input type is Int, and my output type is Int. And Process.constant, if you remember, just repeatedly produces a value. The signature of the function you have to provide goes from an A to an F of B — it's an effectful transformation — and that's why this is in Task.now. Task.now essentially says: immediately evaluate this, because presumably addition is not going to have any side effects. Not in my JVM. One more close paren there.
One more close paren. I don't feel like I should have to qualify that, but I'm sorry. Let me run that. So I get a channel there. Now, it's interesting: Channel is actually just a type alias. If you look at the source, a Channel is just a Process of functions from A to F of B. So these are all kind of interchangeable, but fortunately there are some nice type aliases that make it clear what you're trying to do. Working with a channel: if you take an existing process, you can transform it through a channel by using the through method. I'm taking my pInt and adding two to everything here, and then I do a runLog to see my results: I took 1, 2, 3, 4 and got 3, 4, 5, 6. You can also transform a channel itself. Given a channel, you can contramap the input — that is, if the channel expects an A, you can take a function from C to A and apply it as a pre-mapping. Or you can mapOut, which maps the output to something else: given a B to C, you get a channel from A to C. Similar to channels, sinks are the consumption side of processes. Again, this is really just a process that produces functions — in this case, a process of functions from A to F of Unit. The whole point of a sink is to be an effectful consumption of the value, so the return type is Unit. Because it's a process, you could do a through on this, but you would just get back units, which doesn't make much sense. What you normally do is use the to method on your process: you say process to sink, and when you run that, you get your effects. So here I'm not doing runLog — I'm doing run, because now I actually have something that consumes the values, via the to. When I run this, it actually consumes the values. That's my transform function. Additionally, you can observe things, and this is a really nice feature. Essentially, you're splitting off the stream.
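The "channel is a stream of functions" idea can be sketched in plain Scala. Chan, constantChan, through, and contramapChan here are illustrative stand-ins, with pure functions where the real channels produce effectful A => F[B]s:

```scala
// Sketch of the Channel idea: a channel is just an endless stream of
// functions, and `through` applies the next function to each element
// as it arrives.
type Chan[A, B] = LazyList[A => B]

def constantChan[A, B](f: A => B): Chan[A, B] = LazyList.continually(f)

def through[A, B](in: Seq[A], ch: Chan[A, B]): Vector[B] =
  in.zip(ch).map { case (a, f) => f(a) }.toVector

// contramap pre-maps the input; mapOut would post-map the output.
def contramapChan[A, B, C](ch: Chan[A, B])(f: C => A): Chan[C, B] =
  ch.map(g => f.andThen(g))

val addTwo: Chan[Int, Int] = constantChan(_ + 2)
println(through(Seq(1, 2, 3, 4), addTwo)) // Vector(3, 4, 5, 6)

// Pre-map strings to their lengths before the add-two channel sees them.
val fromStrings: Chan[String, Int] = contramapChan(addTwo)(_.length)
println(through(Seq("a", "bb"), fromStrings)) // Vector(3, 4)
```

A sink is the same trick with functions returning Unit: an endless stream of effectful consumers instead of transformers.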
It's like you're kind of forking the outputs as they come through. That's not really how it's implemented, but essentially, with an observe, your sink gets a copy of each value as it goes through, and you still get the value downstream. So in this case, if I do a runLog, I get my consumption that prints to standard out, line by line, and I also get the return value at the end. So if you had, say, a process producing values and a couple of different things that wanted to work with those values, you could just observe off to those various sinks — they split off and do their thing — and then have a final consumption at the end. It's also super useful for debugging. Yeah — say you have this huge pipeline of processes that goes through this, that goes through that; you can just stick an observe at any point in there to println and see what the heck is being passed through. Best thing ever. I mean, that's the thing — this is one of the things I love about scalaz-stream. It's such a simple concept, but it has all these really nice, simple tools that make a very powerful abstraction for working with big data. Concurrency within processes: essentially, when you run a process, you execute it sequentially. If you have a bunch of processes that all produce the same type of values, you can do what's called a merge. mergeN takes a number of input processes. They all have to be the same type — you can't merge strings and ints together and expect to get something normal out of that. Although, if you had a case like that, you might pre-map to a disjunction. You could use the disjunction really as an either, as opposed to as an error-handling type: you could say it's either going to be a string or an int, pipe that in, and merge those at that point.
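The observe behavior — the sink sees a copy, the value still flows on — can be sketched in a few lines of plain Scala. The observe function here is an illustrative stand-in, not the library method:

```scala
// Sketch of observe: each value is handed to a side-effecting sink
// *and* still flows downstream unchanged.
val seen = scala.collection.mutable.Buffer[Int]()

def observe[A](in: Seq[A])(sink: A => Unit): Seq[A] =
  in.map { a => sink(a); a }

val out = observe(Seq(1, 2, 3))(a => seen += a)
println(out)         // List(1, 2, 3) -- the values still come through
println(seen.toList) // List(1, 2, 3) -- and the sink saw each one
```

Because the values keep flowing, you can chain several observes into different sinks and still consume the stream at the end, which is the debugging trick described above.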
But you'd have to do a little work ahead of time. The merge is nondeterministic: when you execute it, it picks from those processes as it goes, and you're not necessarily guaranteed any order. mergeN also takes a maximum parallelism count, and this is kind of important. If you don't specify a parameter for max parallelism, you're basically saying the merge is unbounded: it can do whatever it wants, take up as much memory as it wants, use as many threads as it wants. In production, that's usually a bad thing — you want to constrain things. So typically, at Simple Energy, we bound it to some factor of the number of cores. We tend to be more heavily CPU-bound than memory-bound, but you might use something else as your heuristic for the bound. In general, although scalaz-stream does provide a couple of things that are unbounded by default, you want to use the bounded version where you can. Save yourself some future grief and 3 AM calls on a Saturday. So here I just took my pInt and ran it through a merge. You can see I'm flat-mapping — I'm sorry, I'm not flat-mapping, I'm mapping — each of the values to its own process, and each process simply prints out which thread it ran on and what the value was. And you can see here that it's nondeterministic. That was kind of funny — I don't know if it's going to do it again. I don't know if this is cache warm-up or what's going on, but after the first run, it would appear to be deterministic — but don't be fooled. So, error-handling patterns. I talked before about how you could have this global error handling at your monad level: if you're using Task, you can do an attempt or an attemptRun there and say, okay, the process exploded at some point and I got an exception out of it. But that's not that satisfying if you deal with a lot of data.
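The bounded-parallelism idea behind mergeN's max-open parameter can be sketched with a fixed-size thread pool in plain Scala. This uses scala.concurrent.Future as a stand-in for the merged Task processes — it is a model of the bound, not the mergeN implementation:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Sketch of mergeN's bound: run many computations, but never more than
// maxParallel at once, by backing them with a fixed-size pool. Which
// thread handles which value is nondeterministic.
val maxParallel = 4 // e.g. some factor of Runtime.getRuntime.availableProcessors
val pool        = Executors.newFixedThreadPool(maxParallel)
implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)

val tasks = (1 to 10).map { i =>
  // In the talk's demo, each merged process printed its thread and value.
  Future((Thread.currentThread.getName, i))
}

val merged = Await.result(Future.sequence(tasks), 10.seconds)
println(merged.map(_._2).sum) // every value arrives, whatever the interleaving
pool.shutdown()
```

The effects interleave in pool order, but all ten values always arrive: nondeterministic scheduling, deterministic contents.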
I mean, you don't know where it died. You don't know what happened exactly. On a per-element basis, it's very typical to use disjunctions. Here's an example parsing integers: I've taken a whole bunch of strings, and I'm going to try to parse integers from them. I actually get Validations out of this, because parseInt comes from a scalaz implicit that returns a Validation, and I just convert that to a disjunction. I leftMap the actual error I get — otherwise it's a NumberFormatException, I think — to just a description of the failure. When I go ahead and runLog this, you can see I get a whole bunch of lefts and rights: three rights for the successful parses, and a left for the one that failed, because foo is not a number — at least not in Arabic numerals. Could be in a different system. Earlier you showed that you can map a process directly — so how do you decide between doing that and having a constant channel? I would typically use a channel; this is more just demo code thrown together. Generally, in production, I'll write a channel specifically to do a certain task, because it fits in more naturally: when you're looking at what a process is doing, you see it goes through this channel, rather than a whole bunch of maps, which is going to be much more verbose for any non-trivial operation on your processes. And the eval-style stuff — reading things and so on — you can do that in a channel: you can have a channel that pipes the data through some other machinery and then puts it back. You can't do that with map. Yeah — the thing to remember is that the signature of the function a channel is built on is an A to an F of B, so inside that F you can do all kinds of effectful things. Whereas with a plain map, you can't, right?
You'd end up with a process of Tasks or whatever, which wouldn't make much sense. It does not, no — it's incumbent on whatever monad you're using to determine what happens in terms of threading. Can you join them back? You can, oh yeah — there's nothing preventing you. If you took a process of ints and turned it into, say, a process of Task of HTTP response or something, nothing prevents you at the end from, say, doing a runFoldMap and folding over the tasks, executing them there. But the machinery is kind of geared towards... there's also an eval you can use on a process like that, to join it up and flatten it if you want. Yeah — you could do an evalMap of identity. Or really, you can just do an evalMap: that's the other way to do it. It lets you take essentially the same function signature and use it outside a channel. So, back to error handling: when you're working with a disjunction, there's actually a type for this — a type alias. A Writer of Task, W, O is really just a Process of Task of W-disjunction-O. You've taken a disjunction, and now you're dealing with it in a way that's kind of designed for this. And the nice thing is, there's a whole bunch of methods on a Writer, through syntax, that give you things like mapW, observeO, or stripW. So if I've got a process of disjunctions, maybe I only want to do certain things to the error sides, or only to the valid sides — and you can do that. stripW basically says: run through this process, and wherever there's an error side of the disjunction, throw it away; anything on the success side, the O side, stays, and it becomes a Process of O at that point.
So you're throwing away the disjunctionality — I don't know if that's a word — the disjunctitude. I'm falling apart. One thing that might be a little confusing: this is not the same Writer as the monad transformer; this is strictly a type alias over Process. So this is just something that lets you say: I have two cases for every element — something that was maybe a success, something that was a failure, whatever — and I want to apply things specifically to one side or the other. So, example here: I have my parsed process, which gave us three valid parsed integers and one bad one. If I do a drainW here, that basically says: anything on the W side — the error side, in this case — I want to drain to this sink. io.stdOutLines is a premade sink that says: for every element, take that string and ship it to standard out. You'll see here, because I did a runLog, I get back a vector of collected values, and the collected values are the ones on the O side. drainW is something we use all the time. We parse XML data, right? So there could be something wrong with a particular XML record, and that becomes something we drainW off to error reporting. Further down the line, maybe we need to do some calculations — something was wrong here — and the ones that make it through on the O side, the ones that aren't on the W side, we send off to a different sink. And so you can kind of work through the different phases of your processing that way. So, real-world usage. Unless you have a really boring job, you're probably not going to be using emit and emitAll that often, or at least not for the bulk of your work. I mean, if you're getting paid money to spit out a whole bunch of integers, I need to talk to you, because that sounds like a dream job.
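The drainW/stripW pattern can be sketched in plain Scala, modeling the disjunction with Either. The drainLefts function is an illustrative stand-in, not the Writer syntax itself:

```scala
// Sketch of the Writer / drainW pattern: a stream of disjunctions,
// modeled with Either. drainW-style, the lefts (errors) go to a sink;
// stripW-style, only the rights keep flowing downstream.
val errorSink = scala.collection.mutable.Buffer[String]()

def drainLefts[W, O](in: Seq[Either[W, O]])(sink: W => Unit): Vector[O] =
  in.flatMap {
    case Left(w)  => sink(w); None
    case Right(o) => Some(o)
  }.toVector

val parsed: Seq[Either[String, Int]] =
  Seq(Right(1), Right(2), Left("could not parse 'foo'"), Right(3))

val good = drainLefts(parsed)(errorSink += _)
println(good)             // Vector(1, 2, 3) -- the O side, collected
println(errorSink.toList) // the W side, drained to the error sink
```

This is the per-element shape described above: failures peel off to error reporting at each phase, and only the successes continue to the next stage of the pipeline.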
But typically, you're going to be working with effectful things: IO, files, network, sockets. We use AMQP — RabbitMQ — and we have process wrappers around that. We use Kafka; we have process wrappers around that. There are a number of really handy utilities in the scalaz-stream io package. They're mostly geared around reading and writing files, standard in and standard out, or input streams and output streams. They're kind of broken down by what they read: whether it's a ByteVector — ByteVector comes from scodec, which I think Stu may have mentioned earlier, an awesome library for dealing with binary data, just really fantastic. I really suggest looking at scodec; I think every person who comes up and presents today feels obligated to push it, but you really should take a look. Even if you don't care and don't want to know about it, you can basically think of a ByteVector as an array of bytes. So you use ByteVectors when it's binary data you're chunking in and out of something, or you're dealing with strings, which is typically going to be lines — from standard in, standard out, whatever. New in the 0.7 branch of scalaz-stream is this really cool function called toInputStream. If you're dealing with any non-Scala libraries — which is not all that unlikely — and you need to get something from a process into some horrid Java library you need to deal with, toInputStream will take that process and turn it into a java.io.InputStream that something else can read from as if it were an input stream. All the magic is done behind the scenes for you; you don't have to deal with it. I'm not going to cover all the individual methods in io, but I do want to cover one specific one, because it's really nice and very handy.
This is what we've used to write things like our adapters for AMQP and Kafka. io.resource is basically a method that takes three functions. There's a function for acquisition, which says: whatever my process is going to read from or consume from, I set it up here. There's a cleanup function that lets you do resource deallocation at the end. And then there's essentially an iterative step function that says: given the current state of this thing, either do something to produce another value, or stop the process — and then keep going. It's a very trivial seven-line example, but you can extrapolate from it. I think we're going to open-source some of our AMQP and Kafka stuff, so there will be something to actually look at there. How often do you find yourself doing the throw-Cause.Terminated thing? Only in io.resource. I was going to say — that is the one ugly thing about this. It's terrible. That's the whole reason repeatEval isn't the same as what a repeated eval would be. Okay, yeah, so if you see here — I hadn't gotten to this yet — the meat of my function: I've basically created an iterator, and I just say, if I have another value in the iterator, go ahead and produce it; otherwise, throw an exception. Which is really awful — really, really awful — but this is how you terminate a process from the inside. From the outside, you can just say: I only want to take this many values, or I only need to produce this many values, and that halts it normally. But if you're actually inside the guts of the evaluation machinery, this is how you stop things. The good news is that that's only inside the guts of the sink and channel stuff, because resource is creating a channel under the covers — I don't remember exactly, but there's a channel under the surface. If you're inside a normal process, you have a bit more control, and you don't have to use exceptions.
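The acquire/step/cleanup shape of io.resource can be sketched in plain Scala. This resource function is a simplified model — it signals termination with Option instead of the thrown-exception trick the real thing uses internally, and it runs eagerly rather than building a process description:

```scala
// Sketch of the io.resource shape: an acquire function, a cleanup
// function, and a step function that either produces the next value
// or signals termination (modeled with Option here).
def resource[R, A](acquire: () => R)(cleanup: R => Unit)(step: R => Option[A]): Vector[A] = {
  val r = acquire()
  try {
    val out  = Vector.newBuilder[A]
    var next = step(r)
    while (next.isDefined) { out += next.get; next = step(r) }
    out.result()
  } finally cleanup(r) // cleanup runs no matter how we stop
}

// Toy use: treat an iterator as the "resource" being consumed.
var cleaned = false
val values = resource(() => Iterator(1, 2, 3))(_ => cleaned = true) { it =>
  if (it.hasNext) Some(it.next()) else None
}
println(values)  // Vector(1, 2, 3)
println(cleaned) // true -- deallocation happened
```

You can extrapolate the same three-function shape to a connection, a file handle, or a message-queue consumer in place of the iterator.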
Yeah, you can always flatMap into a halt; no, not flatMap, you can evalMap into something that produces... oh yeah, you flatMap into a halt. So, io.resource: love it, hate it, it's really useful, but that's the one wart on it. If I go ahead and runLog this, I get my "done" executed first. Remember, the cleanup happens at the end of the process evaluation, but the actual finishing, the collecting of values, happens even after that. That's why I get the "done" before I actually get the values out. So now I'm going to talk briefly about the time-based processes I mentioned at the beginning. These are kind of cool. They may or may not be useful to you, but I think they're good to know about. There are a lot of little nooks and crannies of scalaz-stream in terms of functionality, and I'm trying to cover the breadth of things, so I wanted to cover these. The first one is awakeEvery. This one's actually really useful; it's basically a timer. I run it, it executes every second, and it produces a continuous process that won't halt. That's why I have the take(3) at the end there: I'm saying I'm going to take three values from this and then stop. If I didn't do that, it would be a very boring presentation from this point forward. But you'll see here, if you're using these time-based processes, you have to bring in a scheduler, because that's what's used behind the scenes to actually schedule the values to be produced. sleep waits a specified duration before emitting a value. Well, it doesn't actually emit a value, so maybe that's a misleading title; sleep just sleeps for that amount of time, which is why I have to append another process to actually emit a value at the end of this. So if I run this, I get a sleep for one second, roughly two seconds, roughly three seconds.
You can see here that processes are a monad, so you can flatMap or use a for-comprehension on them. duration is a continuous stream. Now, when I first got into scalaz-stream and was poking through all the corners, continuous streams kind of blew my mind. I was like, wait, how can it be continuous? We have to do something to execute this. But really what's happening is that the range is continuous while the actual computation of the time is discrete. duration gives you the duration from the initial invocation of the process: from when the process started computing to where you are now. As a simple example here, we want to time how fast our println implementation is. I can run that and I get an interleaving: I get the println output, and then I also pipe to io.stdOutLines so I can see what the durations were, and that tells me it's probably fast enough. every is another continuous Boolean stream, which becomes true after each boundary of a duration. It's a continuous stream of Booleans, but you can think of it as a bit that gets flipped each time it crosses the duration boundary. So in this case, I want to run a timer every second, and I'm going to zip that with an awakeEvery of a quarter second. So I should get four values per second, but only one of every four should be true. And what happens is that when you consume that true, the next value will be false again; it basically resets until the next time it crosses the duration boundary. An example use case would be checkpointing: you're consuming off a stream, and you want to make sure that after some number of seconds you do some operation to push stuff off to the side.
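A sketch of the timer pieces above (assuming scalaz-stream 0.7a, where these combinators live in scalaz.stream.time and need a Strategy and a scheduler in implicit scope):

```scala
import scalaz.stream._
import scalaz.concurrent.{Strategy, Task}
import scala.concurrent.duration._

// the time-based processes schedule work behind the scenes, so they
// need these implicits
implicit val S = Strategy.DefaultStrategy
implicit val scheduler = DefaultScheduler

// awakeEvery: an infinite timer, so take(3) to keep the demo short
val ticks: Process[Task, Duration] = time.awakeEvery(1.second).take(3)

// every flips to true once per crossed one-second boundary; driving it
// with a quarter-second timer gives roughly one true in every four pairs
val sampled: Process[Task, (Duration, Boolean)] =
  time.awakeEvery(250.millis).zip(time.every(1.second)).take(8)

// ticks.runLog.run; sampled.runLog.run
```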
I'm gonna run this here, and you can see we get true, false, false, false, true. So, getting more into the asynchronous side of things: there's a scalaz-stream async package that has a couple of nice constructs for dealing with asynchronous things outside of processes. The first is queues. Like I said at the beginning, there are two versions of queues: unbounded and bounded. I recommend the bounded; you can live dangerously with the unbounded. Right, so a queue, I mean, a queue is a queue. The difference here is that when you construct a queue, the queue itself is not the process; the queue you construct has methods that allow you to create producer sinks and consumer processes that you can run. So here, I'm going to push two values into my queue. I'm creating a queue of size two. queue.enqueue gives me a sink, so I can sink values into that, which is basically pushing things onto the end of the queue. I do a runAsync when I execute the task because, obviously, if I block there it's never going to get to the dequeue part. Then dequeue gives you a process of the values coming off the end of the queue, so that gives you something to consume. Now, you can have more than one consumer and more than one producer, and it'll do the right thing in terms of interleaving. It's nondeterministic which consumers are going to get which values. I can't remember if there's fairness. There's some fairness in the merge stuff, so if you're using those to compose two processes that are pulling things off as fast as possible, then yes, there is fairness, but it's not inherent in dequeue itself; it's more about the processes that are producing or consuming values. Yeah, it'll block. I think the actual implementation is non-blocking. The implementation is non-blocking, so it's not just wrapping a blocking queue. Right, right. It will asynchronously block.
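The queue example looks roughly like this (a sketch, assuming scalaz-stream 0.7a; the values and sizes mirror the demo described in the talk):

```scala
import scalaz.stream._
import scalaz.concurrent.Task

// a bounded queue of size two; enqueue is a Sink, dequeue is a Process
val q = async.boundedQueue[Int](2)

// push two values asynchronously, so we don't block before consuming
(Process(42, 42) to q.enqueue).run.runAsync(_ => ())

// pull the two values off the other end, then shut the queue down cleanly
val out = q.dequeue.take(2).runLog.run
q.close.run
// out == Vector(42, 42)
```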
Yeah, you could tie a process to queue.enqueue and just shove a whole bunch of stuff in, but it's going to essentially stop as soon as the queue fills, until you attach consumers on the dequeue side, if that makes sense. It is important to note that if you're trying to do multiple consumers on the same queue, like Derek said, they're not going to get the same data. Also, as of master, they're not even guaranteed to have strict ordering. The ordering will basically work, but we can only guarantee ordering if you allow us to throw data away, which is not necessarily what you want. Or ever what you want. So the recovery stuff causes problems there. Right, so I go ahead and run the queue, and I get my two 42s back out on the consumer. Remember to close things down. There are a whole bunch of these constructs, especially queues and topics, which I'm going to talk about next, that have shutdown methods. They're not themselves processes, but they do have things running behind the scenes, so you need to cleanly shut them down. A topic is like a queue, except consumers receive all messages. It's pretty basic; it's like pub/sub for processes. One thing to note, similar to other pub/sub architectures, is that when you subscribe to a topic, you're only going to get messages from that point forward. There is queuing actually involved inside, because if a consumer gets behind, you need to be able to buffer up the values it hasn't consumed yet. And so, as I think Daniel pointed out when we were talking earlier, there is a possibility that you could blow up your resources that way. Yeah. Perhaps not for production use. Yeah. If you had a topic and one slow consumer, it's not going to stop the producer on that topic? No. Your heap will stop you, eventually.
With queues, though, it will. Yeah, with queues it will; with topics it currently won't. There are also signals. Signals are available in both discrete and continuous variants. Basically, a signal is, well, a signal: you can set values, and consumers of the signal get notified when the value is set. In this case, I'm just going to do a really simple Boolean signal, async.signalOf(false). And you see it actually prints out the "RX false". I have a discrete signal, I'm mapping it into a string version, and I'm sending that to standard out for the consumer. Because there is an initial value set, I get the initial value when I start pulling. So this is not like topics, where you only get things from the point you subscribe forward; you get the current value, I think, and then value changes going forward. For setting values going forward, there's a set method that essentially gives you a Task you can run. Interestingly, the consumer doesn't get notified when the value changes; it gets notified whenever the value is set. Is that a good thing or a bad thing? I was surprised when I saw this behavior, actually. I thought it should be when the value changes, but then you'd have to have an Equal instance; scalaz-stream would have to be aware of what it is that it's actually signaling on. And then I kind of understood why it is the way it is. There is network support, TCP and UDP. I think there are still some bugs on the TCP side, so I'm not sure that's ready for prime time, but I can fire up a UDP listener here. And this is just a process, like anything else: listen is going to open a socket, receive takes a buffer size, and I'm going to take some number of packets off that and do something with them.
So I'm going to use my handy netcat here and fire off some test packets. And you can see that I get these down here, and because I did take(3), the process halts and I actually get to continue. If you're going to do network-related stuff, I would really recommend looking at scalaz-netty. It's a really nice wrapper around the Netty library, all process-based. I think it's NIO1, yeah. The TCP stuff here uses NIO2, and it does not work; NIO2 has actual bugs on certain platforms, Mac OS X among them. The UDP stuff is solid, and it's NIO1. So if you need an echo server, you're set. If you have a job where you're building echo servers, I kind of want to talk to you. All right, so we've talked about a couple of ways to do transformations, but everything we've talked about so far has been stateless transformations. If you have a map or you have a channel, you don't have any way of looking back at what the previous values were, anything like that. But there are some really useful stateful things we want to be able to do. We can build these with another part of scalaz-stream called Process1. Process1 is basically a process that takes a single input and produces a single output. You can use it to build tail-recursive-like transformations. I say "like" because remember that when we construct processes, we're chaining together thunks, essentially functions that will give us whatever the next value is when we need it. So it's not actually tail recursion, but it'll look a lot like it. This is something we use at Simple Energy. Like I said, we ingest a lot of data, and we want to gather statistics and some other things on the data that's flowing through the stream.
And so we wanted a way to say: given all the stuff running through here, classify the values and gather that into some statistics, something that can be reported when we need it. That way we can do heuristics; we can say, okay, we have way too many failures, a low number of successes, that kind of stuff. So first we need a monoid to collect our stats in. And because it's the easy way to do things, I'm going to use a disjunction to count failures and successes. Just to make the code a little easier to read, I'll use a case class for the collector and for the monoid here. So I've got those set up. We're going to classify with an accumulator here. I've created a method, go, and it looks like a tail-recursive function, though I'm not going to use @tailrec. The first thing I do is call Process.receive1Or. What receive1Or does is wait for a single input to come in. If input does not come in, because the upstream process has terminated, I get this "or" clause that I get to execute. For right now, I'm filling it in with question marks; on the next slide I'll show what we're going to do there. But essentially, at the end of a process, we do something on the "or" side of things. The second function that's an argument to receive1Or is the actual processing function. It says: given a value that came in from the upstream process, do something with it. The first thing I need to do is actually emit it, because I want to observe statistics on what's going on without actually changing what's in the stream. Then I just append the next invocation of go, along with a semigroup append of the current accumulator, my StreamStats case class, with a fold. fold is a method on disjunction that basically says: if you have an A \/ B, you give it two functions.
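The counting idea itself is independent of the streaming machinery. As a stdlib-only aside (using Either where the talk uses scalaz's disjunction; StreamStats and the sample data are my own illustrative names):

```scala
// accumulator for the statistics, summed component-wise like a monoid
case class StreamStats(failures: Int, successes: Int) {
  def |+|(o: StreamStats): StreamStats =
    StreamStats(failures + o.failures, successes + o.successes)
}

// some parsed records: Left is a failure, Right is a success
val parsed: List[Either[String, Int]] =
  List(Right(1), Left("bad record"), Right(2), Right(3))

// fold each element into a delta and sum the deltas as they come through
val stats = parsed
  .map(_.fold(_ => StreamStats(1, 0), _ => StreamStats(0, 1)))
  .foldLeft(StreamStats(0, 0))(_ |+| _)
// stats == StreamStats(1, 3)
```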
One function gets applied if it's on the left, the other if it's on the right. And because my left side is an error, I want to produce a StreamStats counting one failure; if it's the right side, I want a StreamStats counting one success. And the append there, the |+| (I'm not even going to try to name these things out loud), is going to sum those case classes up as they come through. So to expand that a little bit, now I've got my full reportStats, and you can see the basic meat of my go method is in there. Then I just kick things off with go(M.zero), which says: I'm going to start, not by cooking the books, but with zero counts for everything, and go from there. And the way you use it: like everything else in scalaz, there's a symbolic name and a plain English name. Here, for conciseness, I just used pipe-greater-than, |>, which basically says, given the parsed stream, pipe that through this Process1 called reportStats. And all I'm doing here for my collector function is printing whatever the collected statistics were. I can run this, and I get back stream stats saying I had three successes and one failure, and you can verify that by looking at the output of runLog. So we use this on really big streams. Like I said at the beginning, the really nice thing about scalaz-stream is that it's designed around constraining the resources you're using at any given time. This is one of those things where you can do some work with really big data sets and not have to worry so much about how it's all going to fit. No, you couldn't do this with observe, because observe doesn't have any way to pass the accumulator, right? Observe you can use to basically look at what the values are within this particular process stream.
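Putting the pieces together, the Process1 might look something like this (a sketch of the reportStats idea, assuming scalaz-stream 0.7a; the names and the callback-on-completion design are my reconstruction, not the speaker's exact slide code):

```scala
import scalaz.stream._
import scalaz.\/

// accumulator, summed component-wise
case class StreamStats(failures: Int, successes: Int) {
  def |+|(o: StreamStats) = StreamStats(failures + o.failures, successes + o.successes)
}

// Pass every element through unchanged while accumulating counts; when
// upstream halts, receive1Or's fallback fires and we report the totals.
def reportStats[E, A](onComplete: StreamStats => Unit): Process1[E \/ A, E \/ A] = {
  def go(acc: StreamStats): Process1[E \/ A, E \/ A] =
    Process.receive1Or[E \/ A, E \/ A]({ onComplete(acc); Process.halt }) { v =>
      // left counts a failure, right a success; v itself is emitted untouched
      val delta = v.fold(_ => StreamStats(1, 0), _ => StreamStats(0, 1))
      Process.emit(v) ++ go(acc |+| delta)
    }
  go(StreamStats(0, 0))
}

// usage: parsed |> reportStats(s => println(s))
```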
That's why it's called observe, but you can't really do anything with the values other than report them. It's not like you can split the process so that you're observing here and doing something else over there. Any other questions? All right, well, thank you very much. Oh, sorry, I totally forgot, one last plug here. scalaz-netty I already plugged; you should definitely take a look at that if you're doing network stuff. If you're doing HTTP-related stuff, http4s is a really, really nice project for a lightweight REST-layer HTTP service that uses processes and scalaz-stream for input and output. So, thanks.