Yep. So a little over a year ago I came to the Haskell SG meetup for the first time and talked about a project of mine that came out of previous work, where I had spent, I think, two weeks writing an API client for some web service. And I thought: you do it the first time, it's learning, you figure certain things out; even if you could automate it, maybe it's not the best moment. But you do it a second time, and that's not learning, so it's not pleasant. And if you do it a third time, and you didn't automate it, and it's a repetitive task, then shame on you as a programmer. So I didn't do it by hand a second time, and I plan not to write any more JSON marshalling code by hand for an API, because I don't need to.

So I started the json-autotype project. It was initially quite rewarding: I played with union types and got something that parses a JSON file and generates a Haskell data type out of it, so that the Haskell data type serializes to the same structure of JSON, and the other way around. This is the story of how you use it. I'll start with what I talked about a year ago, how json-autotype works, then how I used it, and what the things to do next probably are. One year ago I was very creative about what I could do with it, maybe Servant auto-generation, maybe something else; now I have a more concrete vision of the applications.

The advantage of types, as everybody knows, is a very comprehensive API description. Nowadays every open source or non-open-source project usually has automatically generated documentation that mentions the types of all the input variables. If there are no input variable types, because the language is rather dynamically typed, then you still quite often annotate it with what should come in. Second, types allow very concise input and output validation. They facilitate a lot of runtime optimizations and let us decompose certain problem domains. That's the theory.

Then you have JSON, which is basically a fragment of very dynamically typed JavaScript. There are arrays, there are dictionaries, there are strings, numbers, and booleans, and that's all. And it's used to represent potentially very complicated objects. It is convenient because it's easily shuttled to the browser and from the browser. Even if you nowadays have some program running in Haskell in the browser, it will probably use JavaScript as a kind of virtual machine, so it's nice to be able to debug whatever is flying in between as structured text. Maybe people like me, who have a lot of experience playing with XML, think it's a bit unwieldy to have a big data structure in XML; people who actually work with big structures in XML tend to resort to all kinds of workarounds for very, very long documents, because as you probably know, if you write something as XML it usually comes out quite a bit longer. So the next step after writing your output or any data structure as XML is usually to compress it, so that it goes back to the size of the initial text representation. JSON saves us a little bit there. There are many other reasons; the foremost, I would say, is that it's parseable from any language very quickly, and all languages have parsing libraries for it, including Haskell: there is Aeson. But the problem with Aeson is that you get this representation of the JSON after parsing, and it will, of course, mirror the dynamically typed structure.
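For concreteness, this is roughly what Aeson gives you out of the box: an untyped Value tree rather than your own domain types. A minimal sketch; the sample file name is made up.

    import           Data.Aeson           (Value, decode)
    import qualified Data.ByteString.Lazy as BL

    main :: IO ()
    main = do
      bytes <- BL.readFile "sample.json"   -- hypothetical input file
      -- Aeson parses into its generic Value type:
      -- Object, Array, String, Number, Bool, Null.
      case decode bytes :: Maybe Value of
        Nothing  -> putStrLn "not valid JSON"
        Just val -> print val  -- still untyped: no record fields, nothing is checked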
It has no concept of the underlying Haskell types that you want to encode it as, and it doesn't check anything. There is JSON Schema, but most people don't use it yet. So what do we have? We can translate it. Basically, if you translate it using Aeson's ToJSON and FromJSON classes, you translate booleans to booleans, and numbers to either Ints or Doubles, so that you don't lose accuracy and you don't introduce extra insignificant digits. You translate arrays to either vectors or lists, and strings to Text, naturally. The problem starts because Haskell is very strict: if there is supposed to be a string, there has to be a string. And maybe we would like to know that certain attributes can be missing, so we add Maybe types for values that are absent or null in JSON. For dictionaries we use records, because dictionaries that describe objects of the same sort will usually always have the same attribute names. So we make them strongly typed records, and whenever we parse a new input that should conform to the same format as the original input, we use the same record type, and our software will be robust, because it relies on these statically enforced types.

And because JSON has a lot of untagged unions, we introduce a union type, written :|: (colon, pipe, colon), that works just like Haskell's Either type: it's a union of a left A type and a right B type, and whatever is inside can be any of the component types. Normally Haskell tags the Either type, and we don't want that; we just want to guess. So we try to parse type A, and if that fails, we parse type B.

Here is a simple example: you have a JSON document with a color name and a hex value. The tool will automatically parse this structure, generate the parser, and generate the validator that checks that the input file conforms to what we expect it to be, and turns it into a list, a vector, of records that have a hex value and a name. Once we have that, we can of course shuttle it in both directions, so we can assemble the first stage of software development: we can assemble example requests, test requests that we generate and requests that we accept, and we have types for both of them.

This is another example: a union of two different types of records. We have either parameters with a value that is an integer or a boolean, or a string, and json-autotype will correctly infer these possibilities and again turn it into a simple data structure. So if you have a very, very long JSON document and you want to discover the underlying structure, maybe the best way is to just put it through json-autotype and discover it that way. For example, I took quite a few kilobytes of YouTube API output and recovered a quite comprehensible record that describes what happens with different videos.

The way we do it is that we take these JSON values, encode them as types, and then use union type unification. Union type unification is very similar to normal unification, except that, first, there are no variable references that could make the occurs check fail, because the input is initially a static term without variables. But whenever two terms clash, we say that there is a union of possibilities at that spot in the term. So, for those of you who know how unification works, it's not actually unification, because normally, when there is a clash, unification simply fails, and we don't recover from it. Here it's the opposite: we don't have variables, but wherever we would have had a failure, we say, oh, there is more than one possibility.
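As a toy illustration of that clash-to-union idea, here is a sketch of my own (not json-autotype's actual implementation): a tiny structural type language in which "unification" never fails; a mismatch just widens the result into a union, and a field missing on one side becomes nullable, which is the intelligent joining mentioned just below. All the names here are hypothetical.

    import Data.List (nub)

    -- A tiny structural type language for inferred JSON shapes.
    data Ty
      = TString
      | TInt
      | TBool
      | TNull
      | TArr Ty
      | TObj [(String, Ty)]
      | TUnion [Ty]            -- the clash point: more than one possibility
      deriving (Eq, Show)

    -- "Unification" that never fails: equal types stay as they are,
    -- arrays unify element-wise, objects merge field-wise,
    -- and any other clash becomes a union.
    unify :: Ty -> Ty -> Ty
    unify a b | a == b          = a
    unify (TArr a)  (TArr b)    = TArr (unify a b)
    unify (TObj as) (TObj bs)   = TObj (mergeFields as bs)
    unify (TUnion as) b         = TUnion (nub (as ++ [b]))
    unify a (TUnion bs)         = TUnion (nub (a : bs))
    unify a b                   = TUnion [a, b]

    -- A field present on only one side is joined with TNull,
    -- i.e. it is recorded as nullable rather than causing a failure.
    mergeFields :: [(String, Ty)] -> [(String, Ty)] -> [(String, Ty)]
    mergeFields as bs =
      [ (k, field (lookup k as) (lookup k bs)) | k <- keys ]
      where
        keys = nub (map fst as ++ map fst bs)
        field (Just x) (Just y) = unify x y
        field (Just x) Nothing  = unify x TNull
        field Nothing  (Just y) = unify y TNull
        field Nothing  Nothing  = TNull  -- unreachable: k comes from as or bs

    -- For example:
    --   unify (TObj [("value", TInt)]) (TObj [("value", TBool), ("unit", TString)])
    --   == TObj [("value", TUnion [TInt, TBool]), ("unit", TUnion [TString, TNull])]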
And when we have unions, we use some intelligent joining to preserve the fact that a type can be nullable or not. So we can treat the types as refinements, and nicely, we can do it without the types swelling too much. Usually json-autotype generates a shorter description of what is in the input than the input itself.

I have actually already had queries from the C# community, from the OCaml community, and from Java programmers saying it would be good to run json-autotype and generate code in their language for parsing the inputs. So obviously there is demand. Unfortunately, I haven't had that much time so far to actually finish this part of the work. But at least the Haskell part of json-autotype is rock solid: version 1.0.1 is automatically tested, and I don't see any bugs reported anymore. My initial experience with open source was that as soon as I put it on GitHub, within two hours I got the first issue, and it was rather embarrassing. So now I'm happy I don't get any issues, and I hope somebody uses it. Nobody has reported a success story yet, which is a disappointment, but at least it doesn't break.

An example application: at the International Conference on Functional Programming, awards are presented to the coolest hacker teams from all over the world, who compete about a month and a half earlier, for 24 up to 72 hours (the upper time limit is 72 hours; 24 is the lightning division), to write a program, usually for some kind of game, that performs best in that game. This year it was also a game, but all the descriptions were in JSON. When the contest started, they put, I think, ninety-something example files on the web as test cases. Normally people waste the first few hours writing the parsers, making sure the game environment is simulated correctly, and so on. I didn't take part, but I was curious how long it would take me, so I ran json-autotype on it. Then I thought maybe I should have better names for these data structures. In total I spent five minutes on the issue of getting the data in, which I'm very happy with. If I hadn't been picky about variable names, it would have been literally five seconds, plus one minute for downloading the sample files, and maybe a few more minutes for reading the description. It took me about one hour to consider all the consequences of the task instructions. So that's a good start.

Then there is a company called Transcriptic. They have a JSON description of biological experiments that you can feed them; you pay them for execution, and then somebody in a wet lab makes sure that all the steps of the reaction are followed and that you get your products. So you send a protocol. Some of these protocols are basically books: you buy a book, like RNA purification protocols, a big book that tells you which reagent to choose under which conditions, what the exact ratio of this reagent to water is, and what is probably a good way to check that the company providing you the reagent provides it at high quality. So it's no longer like licking the bottom of the tube; it's usually safer nowadays. There's all the biosafety, you know.
Now it's more like submerging a piece of paper in the sample or something like that; it's crazy how you do it. But it's not unsafe anymore. In the protocol, you basically mark down the steps from those they allow: usually mixing different substances, centrifuging them, mass spec. Despite how simple it sounds, it's very, very work intensive. I did it only a little bit, as a predoc; I know people who did it for a long time. So if you have a protocol and a reproducible way to execute it, paying 300 along with this JSON request feels so cheap. I mean, it's cheaper than PhD students. Really, you need to train them so that they don't, for example, get their own skin into the sample. Certain samples are ruined completely when you just touch them or breathe on them. So it's very good.

And the API descriptions that I downloaded, that was a few megabytes of documents; some of the fragments are just examples of the protocols, some of the fragments are well described in the documentation. Initially I thought maybe it's so simple, so apparent, that you can just pick it up from the documentation. No. The documentation is mutually inconsistent: if you copy and paste examples from the documentation and examples from other places, json-autotype will show you that there are "or"s in places where there shouldn't be any. So what I did was basically pre-vet it: I ran json-autotype on it, and then I checked where there are unions that should not happen. For example, if you have a protocol, you should not have a choice between water and centrifugation, because water is not a protocol step, unfortunately. I'm exaggerating a little bit, but not much. Obviously, during development they didn't type-check their own examples and schemas, so they are a bit off. It's not like everything gets tested with autospec or Hspec, with the example documents always executed to check what the output is. And in a large project like this (it's basically a startup company that just wants to provide the service of executing biological experiments), that's a problem. There is a big team of programmers that tries to sanitize it, and because they generate new code faster than they sanitize the old code, you can expect that it's not enough.

There was also another issue. Many people use dependent records in JSON: there is always this one field that tells you what type of record, what type of object, you have. json-autotype doesn't handle that yet. It will have to, though, so I will add it.

And the Twitter API. Everybody who read the Joel Gross book knows there is this big example on YouTube videos and the Twitter API, and it should be simple. It is simple: you register with the secret and the API key, you download one example from Twitter, and you run json-autotype. And just like that you've parsed it, and you check that it's right. And if you write a program, it should work on future versions of the Twitter API without any problems; it will at least warn you when the schema of the JSON changes. Then I noticed maybe we need more than this. Maybe it would be nice for json-autotype to first tell you that there is a similar library that already handles this API or this data, and second, it would be nice to wrap the requests to the API. That would be my next step.
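To make the download-one-example-and-run-it workflow concrete, it looks roughly like this. This is only a sketch: the sample file name, the generated module name, and the TopLevel type name are illustrative, and the exact command-line flags and generated code depend on the json-autotype version you have installed.

    $ json-autotype sample.json -o Sample.hs    # generate a typed module from one example

    -- Then, in your program (illustrative names):
    import qualified Data.ByteString.Lazy as BL
    import           Data.Aeson           (eitherDecode)
    import           Sample               (TopLevel)

    main :: IO ()
    main = do
      bytes <- BL.readFile "sample.json"
      case eitherDecode bytes :: Either String TopLevel of
        Left err -> putStrLn ("input no longer matches the inferred schema: " ++ err)
        Right v  -> print v   -- strongly typed access from here on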
Because obviously, for large APIs where you have something like a hundred endpoints, I have already played with downloading the data from the different endpoints and then running json-autotype on all of it automatically. But it would be even nicer to wrap that whole process. And besides support for other languages, people also ask me for date and hex format support. It would probably be good if I started generating type mismatch errors for the JSON, to give better error messages, but strangely, no user has complained about that so far.

And my personal challenge: if you are using any kind of software to parse a lot of data, there is this first discovery stage. You spend quite a lot of time just finding out how to parse it, which libraries there are for parsing it, and which of them is the best. Is it Beautiful Soup or one of the alternatives for Python users? Is it this or that HTML parser for Haskell users? It really depends. So my challenge is to discover a way to make this whole phase automatable: you just put a set of input files in a directory, you run your program, you get everything parsed, all the libraries that are needed get installed automatically, and you can start processing the data. That covers a lot of data analysis tasks, because my friends tell me that they spend as much as 90% of their time just getting the data in and trying to sanitize it. That is a real issue, and if you doubt it, you can ask at the Digital Web Analytics meetup, for example, or any other data science meetup in Singapore. Any questions?

So you're saying something like 90% of the time is spent on getting and parsing the data? Yes. And for applications, web applications, which require very, very short latency, is using Haskell a roadblock, because you need to do all this, when there are ways in which you don't need to do all this?

Quite the opposite. Haskell compilers have come a long way, particularly over the past ten years, so you see Haskell often challenging naive implementations in lower-level languages like C or Fortran. And in the case of web development, it certainly has a good chance of being much faster than uncompiled PHP, Python, and Ruby; a lot faster than bytecode.

I was wondering, because right now I'm doing this web application which is really time-sensitive in terms of frames per second; you need to achieve a particular frame rate because it's streaming data really fast.

What kind of data do you stream?

It's EEG data, biomedical data again, brain data. There are already companies in the US and elsewhere which are doing live ECG streaming in your web app, so your doctor can look at your ECG in the hospital if you are, say, having a heart problem at home, and things like that. We're trying to do the same thing for brain data, because people with epilepsy get attacks, and the change in their brain waves at that particular moment is very hard to catch, so latency really matters for this data.

I would say the answer, and you are right, is the technology. I would not use naive HTTP; I would not serve an HTTP request each time you get one more second of ECG data. You use WebSockets or WebRTC. It can be done, but then you need WebRTC or WebSockets, not naive HTTP requests. And there is WebSockets support in Haskell, of course.
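For reference, here is a minimal sketch of pushing a stream of samples over WebSockets in Haskell with the websockets package; the host, port, and the one-message-per-50-ms loop are made up for illustration.

    import           Control.Concurrent (threadDelay)
    import           Control.Monad      (forM_)
    import qualified Data.Text          as T
    import qualified Network.WebSockets as WS

    -- Push one (fake) sample per tick to every client that connects.
    main :: IO ()
    main = WS.runServer "127.0.0.1" 9160 $ \pending -> do
      conn <- WS.acceptRequest pending
      forM_ [1 :: Int ..] $ \i -> do
        WS.sendTextData conn (T.pack ("sample " ++ show i))
        threadDelay 50000   -- 50 ms, i.e. about 20 messages per second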
There should be no problem with it. And in the case of Haskell, if each message carries a very small amount of data, as with ECG, then garbage collection should actually give you optimal performance, because the data will probably stay in cache almost all the time. So you should get very, very predictable soft real-time, or maybe even hard real-time, behaviour.

I'm interested in what kind of technology you use right now for that.

So we're using a JavaScript library for that, PCHR, P-E-E-C-H-A-R-D, which is openly licensed, and we are having a lot of problems. We initially used D3.js, because it gave us nice visualizations, but it performed very poorly in frames per second. We needed a minimum of 20 frames per second, and it was giving us maybe eight or nine. Then we moved to PCHR, which stripped away the fancy visualizations but at least gave us an acceptable frame rate. So that's what we're using right now.

But that's all on the client side, right? So the question was also: what would you use to stream the data, what would you use as the API server?

Right now we're working with real patient data, but it's already stored in the cloud, and we're just using an API to pull it. So we're not yet streaming the data from the patient side into a database; we still have to build that. We've only built the part where we already have a database stored in the cloud, and we have an API, which is in Python, and the front end calls that API to get the data.

So it's not real time yet? No, not for now; the real time part will come. We're trying to explore what we could potentially use to make it real time. But again, the first challenge was that you have to be able to get enough frames per second; that's the first step, and we've achieved that now. So one part of the pipeline is done. Say the cloud is your middleman, and then there's a front end and a back end: the back end is the patient, who is producing the data, and there has to be a way to stream it in real time to the cloud. Then the front end, through the API, pulls it and passes it on. That part is done, but the other part is a question mark right now.

Yeah, because HTTP itself won't fit that kind of requirement: it's the back end that would have to initiate and push the information; otherwise you have to poll from the front end.

Yeah, so right now we're doing caching as well, because we can't poll that fast. That's another aspect we have to deal with.

Yeah, I would say our platform had similar issues with streaming radio information, fetching it and drawing it. There's no way to keep doing that by polling quickly, because as your number of clients grows, the requests hammer your back end; even 10 or 20 users polling like that would load down your server.

Yeah, and the other thing is that we don't even want to do the caching part, because caching is not real time, and we want real time, perfect real-time communication. But at least the frames-per-second part, as I said, was the problem we started with, and that's sorted now, so we're working backwards through the pipeline; that's one of the reasons for looking into this.

So about your frames-per-second limitation: I mean, you said you changed libraries on the front end. Yes.
So is that an issue with the front-end rendering itself, or is it an issue with processing the incoming data? I think it was more of an issue with the plotting library itself.

Yeah, just one second, to add to what I already said: if you would, for example, use something like Python or Ruby on the back end, then I'm sure Haskell would be much faster. If you already use something like C++ or Java, then Haskell can compete with that if you just profile it, I think. No.