Welcome everyone to this session on Free-based DSLs for distributed compute engines. We are glad that Joydeep can join us today. So without further delay, over to you, Joydeep.

Thank you everybody for joining this talk on Free-based DSLs for distributed compute engines. I'll be talking about some of the work we have been doing at Zeotap, which is almost at the verge of being open sourced, and I'll also be sharing the repos.

First, a quick intro about myself. I work as a principal engineer in Zeotap's data engineering team, and I have around nine years of industry experience in distributed systems and big data. I started working with Scala around the end of 2019, when I was playing around with the Scala API of Apache Spark, and from there I slowly and steadily moved into functional programming. Apart from that, when I'm not working, I cook and try out new recipes, something I really got into during the lockdown period; I play table tennis; and of late I have taken an active interest in chess, playing on chess.com and Lichess. That's enough about me, so let's move on to today's topic.

First, the motivation: how we came about creating these projects. At Zeotap we have multiple heterogeneous data pipelines, by which I mean data processing jobs that run on Spark, on Dataflow (that is, Apache Beam), on BigQuery, and so on. Some are SQL-based pipelines and some have custom processing logic, written in both Java and Scala. That's where we started, and the problem statement divides into two parts.

The first problem is the coupling of the business logic and the runtime engine. What I mean is that a job written in Spark cannot be run on Beam. Why would we need that? In today's world we are bound to our cloud providers, and we hit a lot of bottlenecks with them: cluster scaling issues, unavailability of on-demand nodes, or we simply want to try out some other engine to see how the cost figures compare. But even if a job is written in plain Spark SQL, we cannot take it and run it on Beam, simply because of the amount of boilerplate involved and the developer effort required. That was one of the problems we wanted to solve.

The second problem is that there was no code and functionality reuse, and by that I mean two things: inter and intra. Inter is between Spark and Beam: if I have written a functionality in Spark, I cannot take that functionality and run it on Beam. Intra is within Spark itself: we wanted the compositionality of functional programming, where code written in Spark follows good industry practice and sits inside a library so it can be reused in other repos. Instead, even within our Spark code there was a lot of repeated, similar logic lying in different silos, all doing the same work.
So the first step was simply to have a well-defined library where things can be reused, and then extend that to something that works across platforms. That was the main problem we were trying to solve.

Let's look at the overall design considerations. First, we wanted to standardize all the actions in the system, and to standardize actions we definitely need a domain-specific language, a DSL, in terms of which we can talk and actions can be taken. The motive is to segregate the platform from the domain: the domain is the business language we speak to describe standard actions, and it should not be tied to any platform, Spark or Beam or anything else.

Another very distinct requirement we identified is that within this DSL, within the abstract syntax tree (AST) of the DSL, the atomic units need some way of being combined. We needed sequential control flow, where units are combined in a certain order and the output of the previous DSL unit is the input to the current one; as you will see, the DataIO and ZeoFlow projects have that. We also needed DSL units that run in parallel and independently; data expectation supports that, so we can run independent computations, get all their results independently, and then compute something out of them. That was our design consideration.

So what was our solution? Three distinct projects: ZeoFlow, data expectation, and DataIO. ZeoFlow is what we call our general model of computation. It reads data, it supports all the transformations we want to run on data, it supports data quality, and then you might want to send alerts and write your data out. That is the general DSL we wanted to create. Out of this, we figured out that data quality in itself needs a separate DSL, a business language of its own, so we created the data expectation project, which addresses the data quality part. And then we had DataIO. We figured out that reads and writes against blob storage, S3 or GCS, are not that simple. I'll give you one example of a Zeotap-specific feature: optional columns. When I read a data source, say in Spark, I get back a DataFrame, and I append some optional columns to it if they are not already present, because they might be needed in the downstream pipeline. All of these organization-specific use cases are something we wanted to support, so we created a separate business language for DataIO itself; a sketch of that optional-columns behaviour is below.
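To make that example concrete, here is a minimal sketch of the optional-columns idea in Spark. The function name and the choice of a null string column are my assumptions for illustration, not the actual DataIO API:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

// Append each optional column (as nulls) only if the frame doesn't already
// have it, so downstream stages can rely on the column being present.
def appendOptionalColumns(df: DataFrame, optionalColumns: Seq[String]): DataFrame =
  optionalColumns.foldLeft(df) { (acc, name) =>
    if (acc.columns.contains(name)) acc
    else acc.withColumn(name, lit(null).cast("string")) // assumed string type
  }
```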
Having said that, I now want to show how we used functional programming constructs to solve our problem. First, a little about the DSL part. A DSL, a domain-specific language, is a business language that generally talks about the what and forgets about the how. It is not bothered with the implementation; when we speak in terms of a DSL, we are just saying what is to be done.

Our design considerations for the DSL itself were clearly defined. We needed something that can accommodate a language with an AST-like structure, where I can define the atomic units of my DSL. It should behave like a monad, because we needed sequential control flow: we needed to combine the units in different ways to get our desired result. And the runtime, the distributed compute or execution engine, should become a plug-and-play operator: I write in this business language, plug in the Spark or Beam interpreter at runtime, and I am able to generate the output. Those were our key considerations. With that, let's move on to the functional programming concepts we explored to solve this problem.

The first thing I would like to talk about is reification. What is reification? It means taking something that was abstract or implicit in your code and making it concrete: you take a concept that was implicit in the code and turn it into data, which can then be manipulated by your program. Let me give a few examples. Take functional programming itself: in object-oriented programming, functions are not data, but in functional programming you can pass functions around like data and manipulate them. Another example is the Option monad, which reifies the concept of partiality. Earlier you would return either a null or a result from your code; now partiality is captured in specific data structures, Some and None, and you can use those data structures further along to write your program. Reification is everywhere: Kleisli is a reification of the concept of effectful composition; Trampoline, which is there in cats, is a reification of stack safety. And finally there are DSLs, which is what we really want to implement. Reifying the steps of a computational procedure in order to represent them without actually executing them is a very common implementation strategy in functional programming. That is what we want to do with our DSL: take the atomic units, represent them, and defer the implementation to a later point in time.

Let's see how we can do that. If you had to write these as plain functions in your code, they would look like loadSource, which takes some source paths and returns a DataFrame, and writeToSink, which takes some source paths and returns Unit. These steps would be implicit in your code. We took them and created data structures out of them, an ADT: we have case classes LoadSources and WriteToSink, and they extend the trait FlowDSL. So what does our DSL look like? These are its basic atomic units: load the sources, load the user-defined functions, run the transformations, assert data quality, write to the sink, send alerts. Let me skip ahead one slide; a sketch of the resulting ADT is below.
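As a concrete illustration, here is a minimal sketch of what such an ADT could look like. The constructor names follow the atomic units just listed, but the payload types are my assumptions, not the exact open-sourced API:

```scala
// Each atomic unit of the flow becomes a data structure that merely
// *describes* the step; nothing executes when these values are constructed.
final case class ExpectationResult(passed: Boolean) // placeholder result type

sealed trait FlowDSL[A]
final case class LoadSources(sourcePaths: List[String])         extends FlowDSL[Unit]
final case class LoadUserDefinedFunctions(udfs: List[String])   extends FlowDSL[Unit]
final case class RunTransformations(sqlQueries: List[String])   extends FlowDSL[Unit]
final case class AssertColumnExpectations(checks: List[String]) extends FlowDSL[ExpectationResult]
final case class WriteToSink(sinkPath: String)                  extends FlowDSL[Unit]
final case class SendAlerts(message: String)                    extends FlowDSL[Unit]
```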
So now, what we did is define a trait FlowDSL, and all my ADT constructors, all my atomic units, extend this FlowDSL. One thing to understand here is that it carries a generic type A, because different units return different things: loading the user-defined functions might return Unit, but asserting the column expectations should return some expectation result. That's why we kept the type generic.

So we have created this DSL; what are we going to do with it? Let me outline the steps we are going to perform. First, we create the DSL. Second, we lift it into something called Free; we'll come to that in the next slides. Third, we construct the program using the DSL. Fourth, we provide an interpreter for that DSL. Finally, we invoke the program by calling the foldMap method on it, passing the interpreter separately. That is the sequence of steps we intend to perform.

Before I move on, I think it's important to understand what a Free really is, conceptually. If you look at the structure on the left-hand side, we have Free[FlowDSL, A], which is of the shape Free[S, A]. You can think of this Free as the program which uses S, in our case FlowDSL, your language, to compute a value of type A.

There are two more distinct things I want to say about Free, which are on the right-hand side. First, Free provides a monad for your AST or DSL. What does that mean? You create these individual atomic units, deliberately without implementations, so that you can sequence them into a bigger abstract program. But your AST is not itself a monad, and in order to sequence its steps you need a monad: you cannot use a for-comprehension on top of your raw AST. So you lift your AST into a Free, which does have a flatMap inside it, and that flatMap is what lets you use it in a for-comprehension. That is what we mean when we say we are lifting the AST into a Free.

The second important aspect is that Free separates your DSL from the interpreter. You have written a DSL for which you have given no interpretation, no implementation, as of now; these are just algebraic data types. You provide the implementation in a separate structure, which we call the interpreter, and the interpreters can differ at runtime: it can be a Spark interpreter, it can be a Beam interpreter. You can call your program passing different kinds of interpreters. That's exactly what we are looking for. Having said this, let's complete the usage of Free in our program.
So you have your AST, the FlowDSL. You have to somehow lift it into a Free, because you need the flatMap that Free provides in order to write bigger programs out of it. We have defined a type called FreeFlowDSL, which is nothing but Free[FlowDSL, A]: a structure that holds your FlowDSL and returns a type A. The lifting is provided by the cats Free data structure, via liftF: it lifts your AST, the atomic units like LoadSources and WriteToSink, into a Free. One thing to note is that we explicitly annotate the return type as FreeFlowDSL, because if we don't, the inferred type would show up as something like Free[LoadSources, A], whereas what we actually need is Free[FlowDSL, A].

Once we have lifted it, we can start using it in a for-comprehension, and we can have multiple variants of our business flow. One can be a very simple flow: load the sources, run the transformations, write to the sink. Another can be: load the sources, run the transformations, and assert data quality, because I'm not really interested in writing to a sink, I just want a data quality report that maybe my downstream pipeline uses. Yet another can be: assert the data quality, and if the quality checks pass, write to the sink, otherwise send alerts. There are multiple flows you can create.

Let me show you an example. You can see a lot of code here, but don't focus on all of it, just the highlighted areas. We have two methods, endToEndFlow and endToEndFlowWithExpectations (maybe we could have better names for them). What we are really doing is taking LoadSources, LoadUserDefinedFunctions, RunTransformations, and WriteToSink, lifted into Free, and putting them in a for-comprehension; the flatMap we gained from Free is what lets us do that, and now we can combine them in multiple ways to achieve the desired output. In the first case the desired output is Unit, because after writing to the sink there is nothing to return. But in the other case, because we kept FreeFlowDSL generic in A, we are able to return an ExpectationResult instead. That is worth noting: because the units can be combined in multiple ways and the type is generic, we can return whatever we need.

That is how we write the abstract programs. Notice that up to now we have not given any implementation at all: we wrote an AST, lifted it into a Free, and wrote some high-level programs, and we haven't yet said whether we want to do this work using Spark, using Beam, or using BigQuery. We are talking purely in terms of a business language. A sketch of the lifted constructors and these two programs follows.
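Here is a minimal sketch of that lifting and program construction, building on the hypothetical FlowDSL above. The smart-constructor and program names mirror the ones from the talk, but the bodies are illustrative:

```scala
import cats.free.Free
import cats.free.Free.liftF

// Free[FlowDSL, A]: a program that uses the FlowDSL language to compute an A.
type FreeFlowDSL[A] = Free[FlowDSL, A]

// Smart constructors: liftF turns an atomic unit into a Free, giving it
// flatMap. The explicit FreeFlowDSL annotation keeps the inferred S = FlowDSL.
def loadSources(paths: List[String]): FreeFlowDSL[Unit]      = liftF(LoadSources(paths))
def runTransformations(sql: List[String]): FreeFlowDSL[Unit] = liftF(RunTransformations(sql))
def writeToSink(path: String): FreeFlowDSL[Unit]             = liftF(WriteToSink(path))
def assertColumnExpectations(checks: List[String]): FreeFlowDSL[ExpectationResult] =
  liftF(AssertColumnExpectations(checks))

// Two abstract programs; nothing is executed, and no engine is mentioned.
def endToEndFlow(paths: List[String], sql: List[String], sink: String): FreeFlowDSL[Unit] =
  for {
    _ <- loadSources(paths)
    _ <- runTransformations(sql)
    _ <- writeToSink(sink)
  } yield ()

def endToEndFlowWithExpectations(paths: List[String], sql: List[String],
                                 checks: List[String]): FreeFlowDSL[ExpectationResult] =
  for {
    _      <- loadSources(paths)
    _      <- runTransformations(sql)
    result <- assertColumnExpectations(checks)
  } yield result
```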
So now we provide the implementation, and this is what it looks like: it is a FunctionK, which takes a FlowDSL and returns an Id[A]. This is called a natural transformation. You can think of FlowDSL as the source AST, which is being mapped to your target structure; in this case Id is the target structure. This is where you define your implementation. The definition here is a very simple one: we just print things out. For LoadSources we say, at runtime, just print out the source paths, and for WriteToSink, just print that we are writing to the sink. But this could very well be some other implementation, some other interpreter we pass in, which actually does the computation, performing the reads and writes using Spark, say.

One thing to note is that having really separated out the abstract syntax tree, the FlowDSL, helps us interpret it in different ways at runtime. I've provided the Id implementation here, and I might as well provide some other implementation, which we will in fact see later. And this `~>` sign is the standard symbol we use for a natural transformation; I'll come back to that.

How do we invoke it? We've already spoken about it: you take endToEndFlow, you call the foldMap function on it, and you pass in your interpreter. The general form is program.foldMap(interpreter). That's it, you're done. A sketch of such a print-only interpreter is below.
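Here is a minimal sketch of that print-only interpreter over the hypothetical FlowDSL, using cats' Id as the target (Id[A] is just A):

```scala
import cats.{Id, ~>}

// A natural transformation FlowDSL ~> Id: each description is interpreted by
// simply printing what *would* happen. A Spark or Beam interpreter would
// target a different structure but have exactly this shape.
val printInterpreter: FlowDSL ~> Id = new (FlowDSL ~> Id) {
  def apply[A](fa: FlowDSL[A]): Id[A] = fa match {
    case LoadSources(paths)             => println(s"loading sources: $paths")
    case LoadUserDefinedFunctions(udfs) => println(s"registering UDFs: $udfs")
    case RunTransformations(sql)        => println(s"running transformations: $sql")
    case WriteToSink(path)              => println(s"writing to sink: $path")
    case SendAlerts(message)            => println(s"sending alert: $message")
    case AssertColumnExpectations(cs)   =>
      println(s"asserting expectations: $cs")
      ExpectationResult(passed = true) // placeholder outcome
  }
}

// Invocation, in the general form `program.foldMap(interpreter)`:
// endToEndFlow(paths, sql, sink).foldMap(printInterpreter)
```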
Since we have spoken so much about Free, we ought to see what a Free actually looks like, what the basic intuition behind it is. We have talked about lifting something into a Free, writing programs with it, and passing an interpreter, so let's look inside.

First of all, we wrote our DSL, which was not a monad; instead we gave all the monadic responsibilities to Free. So Free must be some data structure which holds the steps of our computations, so that later we can traverse it and interpret it at will. And just as our FlowDSL had an algebra, Free must also have an algebra of its own, which it uses to store our AST. But from the previous slides we can also infer that Free must have monadic qualities, since it allows us to use it in a for-comprehension: Free must have a point and a flatMap.

What we are really saying is that Free is itself reifying the concept of a monad. You take something, you lift it into a Free, and suddenly you get to call flatMap on top of it. This is the same concept we saw before: reification, this time of the monad itself. And how do you reify a monad? You take what is implicit in a monad, the point and the flatMap (the pure and the flatMap, whatever you want to call them), and reify those.

So look at what is written here: the AST of Free should resemble a point and a flatMap. We have defined this trait, or abstract class, Free, which takes a higher-kinded type. Look at Pure: is there any similarity between Pure and point? It takes an A and returns a Free[F, A]. That's one part. The second part, Suspend, does not look exactly like a flatMap, but notice what it stores: an F[Free[F, A]]. From this you can see that Free is a recursive structure: the recursion goes through Suspend, where F, your AST, is stored inside the Suspend, and the recursion is terminated by a Pure. What Suspend really does is apply the F to the previous layer to carry the computation of the Free forward. So the terminal point is Pure, it's a recursive structure, and the A is the recursion carrier here.

Having said this: Free is a monad, so it should also implement the interface of a monad, the point and the flatMap. Let's see how. As they say in functional programming, when you don't know what to do, it's best to follow the types. Point is really easy to implement: we can just take the Pure data structure we defined earlier and wrap the A in it. flatMap is a bit more involved, so let's follow the types again. First, the Pure case: we just pull the A out of Pure and pass it to the function f, and we're done. Next we have to implement the Suspend case. Since flatMap returns a Free, it only makes sense that whatever code we write gets wrapped in a Suspend, because ultimately we have to return a Free. Now, what operations are available to us? There is a Free in there: the S inside Suspend is an F[Free[F, A]], and if we can somehow pull that inner Free out of the F, we can call flatMap on it, recursively, and pass the f inside. The ubiquitous operation for reaching inside something is map: we could call s.map, pull out the Free, and call flatMap on it. But S is not a functor; this S is really your F, the AST you defined, and nothing says it has a map. So implicitly, this is the only requirement Free places on you: you need a Functor for your F, a Functor for the AST you have defined. That's the only requirement. A simplified model of this structure is sketched below.
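Here is a simplified, standalone model of that structure, written out just as described. This is the classic textbook encoding (named SimpleFree to avoid clashing with cats' Free), not cats' actual stack-safe implementation:

```scala
import cats.Functor

// Simplified Free: Pure reifies "point", Suspend stores one layer of the AST.
sealed trait SimpleFree[F[_], A] {
  def flatMap[B](f: A => SimpleFree[F, B])(implicit F: Functor[F]): SimpleFree[F, B] =
    this match {
      case Pure(a)    => f(a) // pull the A out of Pure and pass it to f
      case Suspend(s) =>
        // s: F[SimpleFree[F, A]] -- map over F (this is the Functor
        // requirement!) to reach the inner Free, recurse, and re-wrap.
        Suspend(F.map(s)(_.flatMap(f)))
    }
}
final case class Pure[F[_], A](a: A) extends SimpleFree[F, A]                      // terminates the recursion
final case class Suspend[F[_], A](s: F[SimpleFree[F, A]]) extends SimpleFree[F, A] // A is the recursion carrier

def point[F[_], A](a: A): SimpleFree[F, A] = Pure(a)
```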
So the question arises: we have not provided a Functor for F, so how do we do it? Well, cats does that for you: it provides you a Functor for F. In principle, Free is something which takes a functor, adds the pointed part to it, and adds the monadic behavior, the flatMap part, to it, and the functor itself is provided by cats. There's a neat trick there; it's really a lemma, called Coyoneda. I won't be covering it in detail here, but it's probably the last piece you need to know; inside cats this is already handled in the Free encoding (in a structure called FlatMapped), and that's how you effectively get a functor out of it. At a high level, Coyoneda says: if you provide it a higher-kinded type constructor, it will provide you a Functor for free. That's how you get your Functor and are able to use it in your flatMap; as simple as that. If you're interested, please go take a look at the Coyoneda lemma; a small illustration is sketched below.
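To give a feel for what Coyoneda buys you, here is a tiny sketch using cats' Coyoneda with a made-up type constructor (Box and IntBox are invented for illustration) that has no Functor instance of its own:

```scala
import cats.free.Coyoneda

// Box has no Functor instance...
sealed trait Box[A]
final case class IntBox(i: Int) extends Box[Int]

// ...yet Coyoneda lets us map over it anyway: the functions are merely
// recorded (and fused), to be applied later when an interpreter runs.
val co: Coyoneda[Box, Int]        = Coyoneda.lift(IntBox(1): Box[Int])
val mapped: Coyoneda[Box, String] = co.map(_ + 1).map(_.toString)
```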
Alright, we have covered a lot of theory, so let me reiterate what I've said in slightly more technical terms. What we have done is: we defined a higher-kinded type (the AST we defined is really a higher-kinded type, an F[_] of something); we lifted that higher-kinded type into a Free; we wrote an abstract program using the DSL, with no interpretation so far; and then we defined a natural transformation, which is really a function from F[A] to G[A]. What does that mean? The F[A] is the source, a blueprint: the AST you have written is really a model representing what you want to do. And the G[A] is the target interpretation. Keeping the A constant, maintaining the structure of A, the natural transformation helps you convert one higher-kinded type into another. These are first-order higher-kinded types, that is, type constructors with one hole. To give an example: when I say List, it is a type constructor with one hole, not yet a type; when I say List[Int], it becomes a type. So what a natural transformation does is convert a first-order higher-kinded type F[A] into a G[A]: keeping the structure of the A constant, it converts one container into another container, another monad or structure. In our case, the same idea applies: a DataExpectation, which we will look at next, gets converted into a SparkDataEvaluated, and both are type constructors with one hole.

Let's quickly look at the next set of items, free applicatives, because as I mentioned at the beginning, although we needed for-comprehensions for sequential operations, we also needed operations which are independent in nature. Let me give you an example. Suppose in our data set we have a column called gender, and there are two things we want to know about it: first, it should always be non-null; second, its values should be one of male and female. Now, whether it is non-null has no implication on whether the values are male or female; we want to compute both checks independently and then combine their results. We don't want to assert non-null first and only then check whether the values are within the allowed set. These kinds of independent computations are what we were looking to represent in the data quality piece, and hence we wrote this interface, our data expectation. It is more or less the same pattern, with just a small change in how things are combined, so let me show that.

Here you have DataExpectation, and we have written some constructs on top of it: shouldBePresent, oneOf with the allowed entries (say, male and female), and nonNullAlways. Next, it is exactly the same as before, except that instead of defining the type with Free, we define it with FreeApplicative. We put DataExpectation in the place of S, and the resulting type is what we wanted to define, a Validation. FreeApplicative also provides a lift, which lifts your DSL into a FreeApplicative, exactly as before.

Once you have lifted it, there is only one change to keep in mind. Say we want to check whether the column is present, whether it has one of the values male and female, and whether it is always non-null. Instead of a for-comprehension, we use a construct called mapN. This is really a product: all of the checks are computed independently, and their results are then joined by some business logic we provide. Here, for instance, the result is a Boolean, and we say: only when the column is present, and it has one of these entries, and it is non-null, return true. That is the only difference.

And the implementation is again exactly the same. One thing to note is the higher-kinded target type we are using here, SparkDataEvaluated, which takes a DataFrame and returns a result; as we will see a little later, the output type of the interpreter can be of multiple kinds. The invocation is exactly the same too, program.foldMap(interpreter): you take your mandatory column checks, call foldMap, and pass in your interpreter. One thing to point out about free monads versus free applicatives: you use the free monad when you have dependent computations, and the free applicative when you have independent computations, like we had for data expectation. A sketch of this is below.
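Here is a minimal sketch of the free applicative version, reusing the gender example. The constructor names approximate those in the talk and the combination logic is illustrative:

```scala
import cats.free.FreeApplicative
import cats.free.FreeApplicative.lift
import cats.syntax.all._

// The data-quality language: each check is an independent description.
sealed trait DataExpectation[A]
final case class ShouldBePresent(column: String)                      extends DataExpectation[Boolean]
final case class ShouldHaveOneOf(column: String, values: Set[String]) extends DataExpectation[Boolean]
final case class ShouldBeNonNull(column: String)                      extends DataExpectation[Boolean]

// FreeApplicative instead of Free: composition is a product, not a sequence.
type Validation[A] = FreeApplicative[DataExpectation, A]

def shouldBePresent(c: String): Validation[Boolean]                  = lift(ShouldBePresent(c))
def shouldHaveOneOf(c: String, vs: Set[String]): Validation[Boolean] = lift(ShouldHaveOneOf(c, vs))
def shouldBeNonNull(c: String): Validation[Boolean]                  = lift(ShouldBeNonNull(c))

// mapN: the three checks are independent; the Boolean business logic that
// combines their results is supplied at the end.
val genderChecks: Validation[Boolean] =
  (shouldBePresent("gender"),
   shouldHaveOneOf("gender", Set("male", "female")),
   shouldBeNonNull("gender")).mapN(_ && _ && _)
```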
Now let me show you an example of the interpreters, and then we will wrap up. We have written a lot of interpreters in our projects, and we have made heavy use of the State monad, so let me briefly explain that first. In functional programming, the State monad is used to handle application state. You start with a state S and you compute a pair (S', A), where S' is the updated state and A is the value that your function produces. When you put it in a for-comprehension, the updated state from each computation is passed into the next computation to be used. That, at a high level, is how the State monad works.

Let me show one example. What we are really doing here is: we load some DataFrames, we load some user-defined functions (which the SQL makes use of when it runs), we run some transformations, and so on, and this is how we use the State monad for those computations. Keep the shape in mind, S => (S', A), and say we call the for-comprehension in this order: load sources, load user-defined functions, run transformations, assert column expectations, write to sink.

When we start, I pass in an empty map as the initial state. We load the sources, so the S, which was an empty Map, becomes a Map[String, DataFrame] that actually holds the loaded sources, and in the A position I just pass back the old context, which has no real impact; whatever sources we loaded went into the S' side. Next come the user-defined functions: registering UDFs in Spark returns Unit and has no bearing on the current state, our map, so nothing changes in the map; whatever map came in is passed along as-is, we register the UDFs, and we return Unit as the value, because that is what this step computes. Next, run transformations: we take whatever map was passed in and run some queries that transform it in different ways, so the map goes in and an enhanced map comes out, holding more DataFrames under more names, and on the value side we again just pass along what came in. Once that is done, the enhanced map is the input to the assert-column-expectations step. Here we don't transform or change the state at all; instead we compute the data quality checks, all the checks we defined as part of data expectation, and we return a column expectation result as the value, which the caller can use or throw away, while the initial state remains unchanged. And finally, write to sink: here again the state does not change, but we take the map and write the output out to some external blob storage. That's it; that's how we are using the State monad. A compressed sketch of such an interpreter is below.
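Here is a compressed sketch of a State-based interpreter over the hypothetical FlowDSL from earlier. The context type and the "loading" logic are stand-ins; a real Spark interpreter would carry actual DataFrames:

```scala
import cats.data.State
import cats.~>

type Ctx          = Map[String, String] // stands in for Map[String, DataFrame]
type FlowState[A] = State[Ctx, A]

val stateInterpreter: FlowDSL ~> FlowState = new (FlowDSL ~> FlowState) {
  def apply[A](fa: FlowDSL[A]): FlowState[A] = fa match {
    case LoadSources(paths) =>           // S -> S': loaded sources enrich the state
      State.modify[Ctx](ctx => ctx ++ paths.map(p => p -> s"df($p)"))
    case LoadUserDefinedFunctions(_) =>  // registering UDFs leaves the state untouched
      State.pure[Ctx, Unit](())
    case RunTransformations(sql) =>      // map in, enhanced map out
      State.modify[Ctx](ctx => ctx ++ sql.map(q => q -> s"result($q)"))
    case AssertColumnExpectations(cs) => // state unchanged; a result value is produced
      State.inspect[Ctx, ExpectationResult](_ => ExpectationResult(cs.nonEmpty)) // placeholder logic
    case WriteToSink(path) =>            // state unchanged; the write is an edge effect
      State.inspect[Ctx, Unit](ctx => println(s"writing ${ctx.keySet} to $path"))
    case SendAlerts(msg) =>
      State.inspect[Ctx, Unit](_ => println(s"alert: $msg"))
  }
}

// Running a program: thread an empty map through the whole flow.
// endToEndFlow(paths, sql, sink).foldMap(stateInterpreter).run(Map.empty).value
```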
One note I would like to make: when you are writing these interpreters, the most popular choice we found is the State monad, because what you are really trying to do with the for-comprehension is pass the previous state along, taking the output of the previous atomic unit and feeding it into the next computation. That is why, in most cases, the State monad is probably the most natural choice.

But we also used the Reader monad quite a lot. If you look here, we have Reader[DataFrameWriter], that's Spark's DataFrameWriter, Reader[DataFrameReader], also from Spark, and Reader[PCollection], which comes from Beam. What a Reader monad does is take an environment and compute something using that environment. In this case, the DataFrameWriter and DataFrameReader already carry state inside themselves; with Spark's DataFrameReader, if you want to load a CSV you just set the CSV option on it, and the whole state business is maintained by the reader itself. So we did not need the State monad: we took the writer or reader, set options on it, used the Reader monad, and kept the steps independent. Otherwise, here too the State monad would have been the popular choice; it's just that these objects store their own state.

One more point: you have to be creative while writing interpreters. The target can even be a function. As I was trying to show earlier with data expectation, what we really wanted was different interpreters whose target type itself takes a DataFrame and returns a result; the type has one hole in the A, which is malleable, and we can change it according to our needs. So your higher-kinded target type need not be a data structure; it can even be a function. Be creative there: whatever fits the bill.

That's about it, and I think I'm done with the time as well. These are the repositories we have open sourced so far: ZeoFlow and DataIO have been marked public; data expectation, the first of the two remaining, is not yet public, but we will be doing that very soon, maybe in the coming weeks. For people using Spark, we have also open sourced the spark-property-tests library, for writing better property tests on Spark; it is available on Maven as well, so please go check it out. We have Gitter channels and issue threads too, so you can comment there and give us your feedback. That's about it. Thank you for joining this talk.

Thanks for sharing your experience with us today.