All right, so I'm going to present applications of DSLs. I'll show some code examples, but most of this is about the applications: why I decided to use Kotlin, why I decided to use DSLs. I'm a research assistant professor at UIC, working on natural products and drug discovery. We look for new drugs in plants and microbes, so we go out and collect plants and collect microbes from the soil. For example, most of the antibiotics we have come from bacteria that were found in soil. I also work with the Institute for Tuberculosis Research, so we are looking for drugs against tuberculosis and some of the really nasty hospital pathogens. The name of the field is pharmacognosy. Many people don't know it, but it's really about finding drugs. It's not just natural products in general; "natural products" as a generic term can be anything, what you make your clothes with, what you make baskets from. Pharmacognosy is specifically about drugs. And we deal with many different kinds of data. We deal with taxonomic data, the classification of plants and organisms. We deal with chemical data, the structures of chemicals. It looks easy when you see the structure of H2O on paper, but when you want the structure of a sugar, or of a protein, or things like that, it's much more complex. It's not something you can just look at with one machine: you have machines that each give you tiny bits of information that you have to link together, and then you can work out the structure of the compound. That's what I refer to as the spectrometric data. We also have biological data. So I'll gather microorganisms from the soil, make an extract with them, like a tea, separate the different molecules from it, and test them against, let's say, tuberculosis growing on a glass plate. And we have to see: does this thing kill the tuberculosis?
Does it make it grow more, or things like that? When we give it to humans, what happens? Do we kill the humans? Do we make them live longer? Does it cure the disease? Does it cure something else they had? That's all part of what I call biological data. And when we do drug discovery, we deal with a lot of different things, as you can see on the screen. We deal with bacteria. We deal with protein crystals, which we have to send to Argonne, where the particle accelerator is, to shoot a bunch of X-rays at them; that gives us the diffraction picture of the crystal, which tells us what is in the crystal and how those things are shaped and connected together. We have people in a biosafety lab, where you have to be fully gowned, because we are dealing with the worst kinds of TB: you don't want to breathe it in, and you don't want to carry it out. We have robots that handle liquids. We deal with superconducting magnets, and I'll show you a bit more about that. We deal with mass spectrometers and a shit ton of data. All right, nuclear magnetic resonance. You have probably seen or been in an MRI, and it works exactly the same way. What happens is that some nuclei, for example hydrogen, or a specific isotope like carbon-13, align with the field when you put them in a big magnet. Then you shoot them with radio waves, and they try to follow the radio frequency: if the radio frequency is coming from one side, they try to follow it, and they start to rotate. And depending on the speed at which each one rotates, we can tell what is around that specific atom. So what we look at is the rotation speed of all the atoms in a specific sample. If I take some beer and put it in a tube, well, water is symmetrical, so
I will see one signal for the water, several signals for the ethanol, and several signals for whatever else was in your beer, hops or whatever. We have complex equations for that, and we need really, really powerful magnets. What we use are superconducting magnets, bathed in liquid helium, which is itself surrounded by liquid nitrogen. So we keep everything pretty cold so it stays superconducting. The field is hundreds of thousands of times stronger than the Earth's field, so they're pretty fun machines to work with. In the end, the sample looks something like that. And MRI is exactly the same technology; the only difference is that it's a much bigger magnet with a lower field, usually, because it's harder to get a nice homogeneous field that big. And of course you are inside it and you want to do imaging, so you use different techniques to get a slice of your body. But with MRI you can, for example, use the same principle as in NMR and focus on your stomach to see what is inside it. So in practice, this is a simulation, but that's the signal we get out. We see the things that rotate; here we have two that rotate together, and when you stop sending radio frequencies, they relax back into alignment with the magnet. That's what you see: they rotate, and you gradually stop seeing them. You do a Fourier transform, and then you see the rotation speed of this particle and that particle, and using this information we can work out the structure of the molecule. It's used for a lot of things; you can even study how birds orient themselves. The problem is that when you simulate this data, you end up with huge matrices. And in the way we do it, when we say spins, think atoms.
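The "rotate, decay, Fourier transform, read off the speeds" pipeline can be sketched in a few lines. This is a toy illustration I wrote for this text, not the speaker's simulation code: it builds a decaying oscillation (like a one-spin free induction decay), takes a naive discrete Fourier transform, and finds the frequency bin with the strongest signal.

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.exp
import kotlin.math.hypot
import kotlin.math.sin

// A decaying oscillation, like a single-spin FID; `freq` is in cycles per sample.
fun fid(n: Int, freq: Double, decay: Double): DoubleArray =
    DoubleArray(n) { t -> exp(-decay * t) * cos(2 * PI * freq * t) }

// Naive discrete Fourier transform of a real signal; returns the magnitude per bin.
fun dftMagnitudes(signal: DoubleArray): DoubleArray {
    val n = signal.size
    return DoubleArray(n) { k ->
        var re = 0.0
        var im = 0.0
        for (t in 0 until n) {
            val angle = -2 * PI * k * t / n
            re += signal[t] * cos(angle)
            im += signal[t] * sin(angle)
        }
        hypot(re, im)
    }
}

fun main() {
    val n = 256
    // Simulate a spin rotating at 0.125 cycles per sample (bin 32 of 256).
    val signal = fid(n, 32.0 / n, 0.01)
    val mags = dftMagnitudes(signal)
    // The strongest bin in the first half of the spectrum is the rotation speed.
    val peak = (1 until n / 2).maxByOrNull { mags[it] }!!
    println("peak bin = $peak")  // → peak bin = 32
}
```

A real spectrum is a sum of many such decaying oscillations, one per chemically distinct nucleus, which is why the transform shows several peaks for ethanol and one for water.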
So we can deal with only about 10 atoms at a time with this kind of simulation. We have ways to say that if atoms are not coupled together, we can calculate them separately, things like that. In practice, with good computers, we can reach 15, and after that it becomes impossible unless you have tricks to make it easier. Another problem is that academics are usually pretty bad at making software. It lacks documentation, it lacks all kinds of things, and we had two programs we wanted to convert between, because they use different data formats. That's one of the Kotlin programs I made: it parses these really weird text files that date back to the Fortran days, pretty interesting stuff, and generates an XML file for the other program. So this is with TornadoFX, and as you can see here, it's a bit small, but I'm loading the data from the old Fortran-like text file and generating the XML from it. Then I said, this was one of my first Kotlin programs: let's solve this equation I had before and see if I can do it in Kotlin. So I made that, and it works. It's pretty bad; sorry, I looked at the code today to find that part, and yeah, it's pretty shameful. But that's how I learned that Kotlin is actually pretty easy. I had never used Java before; I decided to learn Kotlin because there were some libraries, which I'll talk about later, that I wanted to use. It turns out it's pretty efficient, and I was able to write that in a few days, so that's pretty cool. And that's the output of the thing: what you get with a small peptide, five amino acids, and that's the spectrum of that small peptide. And that's what I showed to the people who made the software: hey, we can do this by converting the data from the old software.
So that's what the real data looked like, that's what the simulation in the new software looked like, and that's what the simulation in the old software looked like. And you can superimpose the results from the two programs, which is what we wanted, because the new software does not handle as many features as the old one. So we had to convert certain things to be sure it matched, and it worked. Another part of my job is to manage NAPRALERT, which is a database of publications and their content about natural products. It's about what people find in terms of molecules from plants, and what people use traditionally in terms of plants. We call that ethnobotany: going to a country, even the US, and learning what plants people are using, why they use them, what diseases they try to treat with them, and what researchers and labs have found inside those plants. It's about gathering all of this data. I inherited this database; it was made in .NET with Microsoft SQL Server, and it was quite a horrible piece of code, so I rewrote it completely in Django. The data in it was made by humans: people went through 200,000 academic papers and annotated all the information from the papers into the database. And the problem, I don't know if you're aware, is that in academia the rate of publication has increased exponentially. We have more and more publications, and more and more noise with those publications, and that's a big issue for humans, because we don't have an exponential increase in salary lines. So that's the kind of information we have in a publication. It's pretty hard for a computer to deal with: everybody writes it the way they like, so nobody writes it the same way. It's a bit like health records in hospitals, a bit of a mess; though it's still plain English, so compared to health records it's a bit easier.
And one of our projects is to create an ontology so we can use automated named entity recognition and things like that, to transform that text into a graph like this. It's exactly the same information in the graph as in the text here. This one was made by hand, but our objective is to do it automatically. So I'm playing with all the new fancy tools like Word2Vec, which is a way to map words into a vector space so you can do computations with words. You can ask things such as: king is to man what queen is to...? And the system tells you: woman. In practice, what is really bad is that humans have huge biases, and if you feed these models with stuff from the internet, it ends up like the Microsoft chatbot, and you find really ugly things. So you also have to be pretty careful with that. I played with that on half a million abstracts from publications, and same thing: that's TornadoFX with a WebView component, so I can display how all the words relate to each other. That way, when we build our ontology, and I'll talk a bit more about that, but it's basically a fancy thesaurus, a vocabulary system where you link words together, I can know: in this publication, people use this word to mean that, and they use it in exactly the same way as that other word, so I can put them together or not. To deal with all these abstracts, one of the main sources is PubMed, where we don't have to pay, we don't have to sign NDAs or whatever, so it's a bit easier. The only thing is that it's a lot of XML files in a pretty weird format: 260 gigabytes of XML that changes all the time. And as I was developing the method, I wanted to rerun it all the time; I mean, when I add a new field that I want to extract from the XML, I don't want to wait two days for the thing to finish.
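The "king is to man what queen is to...?" arithmetic mentioned above is just vector addition plus a nearest-neighbor search by cosine similarity. Here is a toy sketch with two hand-made dimensions; real Word2Vec vectors have hundreds of dimensions learned from text, and the function names here are mine.

```kotlin
import kotlin.math.sqrt

// Toy word vectors over two invented dimensions: "royalty" and "gender".
val vectors = mapOf(
    "king" to doubleArrayOf(1.0, 1.0),
    "queen" to doubleArrayOf(1.0, -1.0),
    "man" to doubleArrayOf(0.0, 1.0),
    "woman" to doubleArrayOf(0.0, -1.0),
)

fun cosine(a: DoubleArray, b: DoubleArray): Double {
    val dot = a.indices.sumOf { a[it] * b[it] }
    return dot / (sqrt(a.sumOf { it * it }) * sqrt(b.sumOf { it * it }))
}

// Solve "a is to b what c is to ?" by computing b - a + c and returning the
// closest vocabulary word that is not one of the three inputs.
fun analogy(a: String, b: String, c: String): String {
    val (va, vb, vc) = listOf(vectors.getValue(a), vectors.getValue(b), vectors.getValue(c))
    val target = DoubleArray(va.size) { vb[it] - va[it] + vc[it] }
    return vectors
        .filterKeys { it !in setOf(a, b, c) }
        .maxByOrNull { (_, v) -> cosine(target, v) }!!
        .key
}

fun main() {
    println(analogy("man", "king", "woman"))  // → queen
}
```

The bias problem follows directly from this mechanism: whatever regularities are in the training text, flattering or ugly, become directions in the space that the same arithmetic will happily surface.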
So I played with it. I tried using SAX, I tried DOM, and all those kinds of things, and the problem was that they used a lot of memory and were pretty slow. In the end, I decided to use StAX, the streaming XML API of Java, and on a machine without SSDs I was able to process these 260 gigabytes in 400 seconds. I was pretty impressed by that, and it works really well. But I did not stop there, because the StAX syntax is horrible. John was talking about cases where you have a lot of code that repeats itself. There was this example about books and the chapters or pages of books: you write a loop over the reader, handling the next events and all of that, for the list of books, and then you have to write exactly the same kind of function again, but this time for the book itself and the chapters of the book. You have to repeat that code all the time. And I said, that's a perfect use for a DSL right here. So I made this little piece of DSL that acts directly on the stream reader, which is the thing you get from StAX, and in less than 100 lines, most of which is comments, I was able to have a syntax like this. It's not completely finished; I'm pretty sure I could do a better syntax than that. So it's a DSL, but not a flat DSL like what we see with HTML builders, because I still use a lot of logic in it; it's a kind of hybrid between a DSL and fancy functions. It gives you what you would do with XPath: if you have played with XML, you have XPath expressions that you can use on a DOM and things like that. The problem is that that approach is really slow; you have to go over the whole file and store it in memory to be sure you found what you were looking for. In this case, it's purely streaming, so the application used just a few hundred megabytes for the 260 gigabytes. And I was able to use coroutines, so it processed 12 files at the same time on the machine.
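The actual DSL isn't reproduced in the transcript, so here is a minimal sketch of the same idea on top of the JDK's own StAX reader. The names `inside`, `textContent`, and `skipElement` are mine, not the speaker's API: handlers fire as matching start tags stream past, everything else is skipped, and nothing is ever held in memory beyond the current element.

```kotlin
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.XMLStreamConstants.CHARACTERS
import javax.xml.stream.XMLStreamConstants.END_ELEMENT
import javax.xml.stream.XMLStreamConstants.START_ELEMENT
import javax.xml.stream.XMLStreamReader

// Skip the element the reader just entered, including all of its children.
fun XMLStreamReader.skipElement() {
    var depth = 1
    while (depth > 0) {
        when (next()) {
            START_ELEMENT -> depth++
            END_ELEMENT -> depth--
        }
    }
}

// Walk the children of the current element: run the matching handler for the
// tags we care about (each handler must consume its element), skip the rest.
fun XMLStreamReader.inside(vararg handlers: Pair<String, XMLStreamReader.() -> Unit>) {
    val byName = handlers.toMap()
    while (hasNext()) {
        when (next()) {
            START_ELEMENT -> byName[localName]?.invoke(this) ?: skipElement()
            END_ELEMENT -> return
        }
    }
}

// Collect the text of the current element and consume its closing tag.
fun XMLStreamReader.textContent(): String {
    val sb = StringBuilder()
    var event = next()
    while (event != END_ELEMENT) {
        if (event == CHARACTERS) sb.append(text)
        event = next()
    }
    return sb.toString()
}

data class Article(var pmid: String = "", var title: String = "")

fun parseArticles(xml: String): List<Article> {
    val reader = XMLInputFactory.newInstance().createXMLStreamReader(xml.reader())
    val articles = mutableListOf<Article>()
    while (reader.next() != START_ELEMENT) { /* advance to the root element */ }
    reader.inside(
        "PubmedArticle" to {
            val article = Article()
            inside(
                "MedlineCitation" to {
                    inside(
                        "PMID" to { article.pmid = textContent() },
                        "Article" to {
                            inside("ArticleTitle" to { article.title = textContent() })
                        },
                    )
                },
            )
            articles += article
        },
    )
    return articles
}
```

Because the handlers nest the way the XML nests, the repeated reader-loop boilerplate is written exactly once, in the helpers.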
And the way it works here is: I look for the PubmedArticle element and create a PubMed article data class. Then I look for the MedlineCitation element, and for the PMID element, the XML tag inside it. Once I reach the PMID, I store it in the PubMed article data class. Once I reach the Article element, I select only the Journal and Abstract children, et cetera. So instead of pages and pages of repeated code, I'm able to just do this as a replacement for XPath, and in the end it's less code than what I had with XPath. Okay, another machine we have is a mass spectrometer. That's basically a fancy balance; the only thing is that it's a highly precise balance. We are at 0.1 part per million precision. It means that if a fly sitting on my shoulder lost a wing, that's the kind of difference in weight we would detect; it's just that we do it for molecules. So we are able to tell exactly how many carbons, how many hydrogens, how many oxygens we have in a molecule. Inside the machine, we send things into the source, ionize them, and separate the ions depending on their mass. And we have a collision cell: we send nitrogen at the ions, break the molecules, and weigh the fragments. It's like knowing your weight, then breaking off your arms and legs and weighing the arms and legs: oh, you're missing a leg. In our case, we use that to identify what is in the extract we get from bacteria. So we get information about what kinds of molecules are in the extract, and we get gigabytes of data for one sample just from that. We also have bioassay data. That's a plate here: the yellow is the bacteria you want to kill, and the white disks are where we applied an antibiotic. Where you see a big clear circle around a disk, it means that this is something that kills the bacteria; and you can see that this one here doesn't kill anything.
So we measure that, but we do it in a miniaturized, high-throughput way. I showed you 96-well plates here: instead of doing it on big plates like that, we do it in tiny wells, and we use luminescent bacteria. So instead of looking by eye, we put the plate in a kind of microscope sensitive enough to detect single photons, and that way we can see whether the bacteria are growing, because they are luminescent, or not. Using this biological data and the mass spectrometry data, we can correlate them and say: it's likely this molecule that is responsible for this effect. So instead of what we usually do, separating all the molecules and spending weeks in the lab, you can say: it's likely that one, or likely that one. That's what it looks like in the mass spectrometer. And you have to realize that at each point of this curve we have a full mass spectrum. This here is the total signal where it detected things, and at each of these points we have a full spectrum with thousands and thousands of different masses detected. That's why the files are so big. So we have to treat them, filter them, process them. And then we can network them, so we can say: this extract, this bacterium, looks really like that other one we had two years ago, so these two likely go together; then we can do genomic analysis and find out they are exactly the same, or really close. We can do the same thing with molecules. You can say: when I break this molecule, I always get this arm fragment; when I break that one, I also get an arm with exactly the same weight. That way we can put them together in a network and say these two molecules are likely related: they don't have the same weight, but they lose the same parts, and the parts they lose have the same weight. So we can put them together and say they belong with each other.
And if we know one of the molecules in a network, we can infer that all the others in the same network look similar to it. Using all of that, here is another piece of software made with Kotlin, which lets me correlate all the MS data and the biological data. I can navigate the gigabytes of data really quickly and say: across all those files, show me that specific mass, show me all the extracts that contain that compound, and show me all the bioactivities. So it lets me navigate really quickly: if I want to find a compound, I can just type its weight and find it; if I want to find anything that is really active in a given bioassay, I can just select the bioassay and see what is in it. So, I was talking about ontologies. John was talking about RSS, which uses a format called RDF, the Resource Description Framework. That's a W3C specification, and we use it for creating ontologies, making graphs of information, connecting those graphs together, and doing inference on them. The way you work with this kind of graph is that you can describe everything as a triple: there is a subject, there is a predicate, and there is an object, and the predicate is what links the subject and the object together. So if I say that Kotlin is a programming language, that Kotlin is created by JetBrains, and that Kotlin is compatible with Java, that's how you describe things, and you can make a graph out of it: we will have the node Kotlin, with edges to JetBrains, to programming language, and to Java. Then you can do inference on all of that: Flipper is a dolphin, a dolphin is a mammal, so Flipper is a mammal.
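The triple model and the Flipper syllogism are small enough to sketch directly. This is a toy of my own, not RDF4J or any real triple store: a list of subject-predicate-object triples, a little builder DSL, and a naive transitive closure standing in for a real inference engine.

```kotlin
// One statement: subject -- predicate --> object.
data class Statement(val subject: String, val predicate: String, val obj: String)

class Graph {
    val statements = mutableListOf<Statement>()

    // Tiny builder DSL: "Kotlin".has("createdBy", "JetBrains")
    fun String.has(predicate: String, obj: String) {
        statements += Statement(this, predicate, obj)
    }

    // Naive transitive closure over one predicate, e.g. "isA":
    // Flipper isA Dolphin, Dolphin isA Mammal  =>  Flipper isA Mammal.
    fun closure(predicate: String): Set<Pair<String, String>> {
        val edges = statements.filter { it.predicate == predicate }
            .map { it.subject to it.obj }.toMutableSet()
        var changed = true
        while (changed) {
            changed = false
            for ((a, b) in edges.toList()) for ((c, d) in edges.toList()) {
                if (b == c && edges.add(a to d)) changed = true
            }
        }
        return edges
    }
}

fun graph(build: Graph.() -> Unit) = Graph().apply(build)

fun main() {
    val g = graph {
        "Kotlin".has("a", "ProgrammingLanguage")
        "Kotlin".has("createdBy", "JetBrains")
        "Flipper".has("isA", "Dolphin")
        "Dolphin".has("isA", "Mammal")
    }
    // The derived fact was never asserted; the closure infers it.
    println(("Flipper" to "Mammal") in g.closure("isA"))  // → true
}
```

Real inference engines go far beyond a transitive closure, which is exactly why the next paragraph warns about being careful with them.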
There are a lot of rules for inference, really complex things involving first-order logic, involving things a computer cannot solve, so you have to be pretty careful about what you do with inference engines, and I'm not going to go into that. You can do fancy examples like this one. That's what the rules look like; it's a demo rule, but that's the shape of them. So, why do we use ontologies? That's something that Google, for example, uses a lot for its search engine. When you type something into a search engine, a given word can mean ten different things; but if that person used that word together with that other word, it means they are talking about a specific field, so the engine can take a graph and restrict itself to things in that specific domain. That way it knows which domain you're talking about. It can also use your search history and infer the same kind of information from that. They used to use RDF, because they bought the company that had built a huge database using RDF; now we don't know exactly what they use. It's likely they still use it, but it's not certain. So here's the advantage of searching with ontologies, and that's what Google does when we type a query, even if we don't see it. Say I do a query and I want to find anything that happens in the brain. In an academic publication, people will say that this thing happened in some structure, and the problem is that the word is not literally "brain"; nowhere in the paper may they say "brain." People in the field simply know that this structure is a part of the brain. So if you want to do this query in the traditional way, you have to go through all the different parts of the brain, query each of them, and merge the queries. If you use an ontology-based system, you just search for "brain," and it also searches for all the parts of the brain.
And if tomorrow I discover a new part of the brain, I can just add to the ontology that this new part is part of the brain, and now all the publications associated with it are linked to the brain, without having to go back and re-annotate all the publications. We just change the domain description instead of changing all the data. Here's an example; you can play with this on Wikidata, a free system based on Wikipedia's data. The query is a silly one, but it shows the power of the thing. I'm looking for anything — that's what the question marks mean — anything that is a drug used for tuberculosis; I want another illness treated by that drug; I want that other disease to be named after someone; I want that someone to be born in a city with a certain population; and I want the molecular weight of the drug; and I want only the cities with a population of fewer than 500 people. These are the predicates: if you hover your mouse over them, they tell you, for instance, "population" of city X. If you had to answer this kind of query by yourself, it would take an awful amount of time. When you do it with a graph database, using RDF and an ontology, it takes 14 milliseconds, and that's on a server used by a lot of people. So you have what is called a triple store, which stores the triples, and then you query it with exactly this language: a standard language called SPARQL, the query language for the semantic web. There are five or six major triple stores, some free, some not. You just load the full RDF data, the Wikidata dump, onto one; it's pretty big. Then you have a query engine, and you just call the API, or use Java functions to call it. And I'm going to show an example of that.
Oh yeah; I'm not sure you can load the full Wikidata set on your laptop or a small computer, but apart from that, it's absolutely doable. So, this is the library that moved me to Kotlin. I wanted to use it because it's one of the most useful and efficient libraries for dealing with RDF data. I was mostly using Python, and the Python libraries for RDF were not really good. And I did not want to write Java at all; it's something I could not write. So I started with Scala, spent two weeks going through Scala books, and it's not for me. It's really cool; I found it really nice and really interesting to learn, but to actually use it and start writing programs, I kept getting stuck and having to go back to the book. It was pretty hard for me to get my head around everything in Scala. So the week after, I decided to look at another Java-compatible language, and I looked at Kotlin, and the same day I was able to write working code. It was really easy for me, pretty close to what I was doing in Python. But then I started to discover Kotlin's specific features and thought: I can probably make this easier. So I decided to make a DSL for dealing with RDF data; I did not show the original Java-style code here. I could also use the builder pattern so I don't have to call the builders by hand, but I did not put that here. So now, instead of whole forests of Java calls, I can just declare that my RDF uses this specific namespace, that it has these different subjects, and that each of these subjects has a name and is a Person. That's exactly how you describe the data in RDF. If you run this, you can directly export the RDF or run SPARQL queries on it immediately. Another thing I added is being able to use all the Kotlin collection functions on the results of the queries.
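Getting Kotlin's collection functions onto a Java-style result iterator is one line in modern Kotlin, via the stdlib's `Iterator.asSequence()`. The data below is made up for illustration; it just stands in for rows of query bindings.

```kotlin
fun main() {
    // Stand-in for a Java-style query-result iterator (rows of variable bindings).
    val rows: Iterator<Map<String, String>> = listOf(
        mapOf("name" to "Ada"),
        mapOf("name" to "Grace"),
    ).iterator()

    // asSequence() makes every Kotlin collection function
    // (map, filter, groupBy, ...) available on the iterator, lazily.
    val names = rows.asSequence().map { it.getValue("name") }.toList()
    println(names)  // → [Ada, Grace]
}
```

This matches what the speaker describes doing in the later XML project: exposing results as Iterables and Sequences instead of hand-coding a `map` function.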
So you can run a SPARQL query here, just take the data out of it, and use all the Kotlin collection functions on it. Coming from Java, I was not able to do that; I had to use iterators and all of that. So that's how I made the iterator; at the time I hadn't yet learned how to work with collections properly. In the XML project I'm doing it a bit differently: I transform the objects directly into Iterables and Sequences, which I could have done here too, but at that time I didn't know. I just needed maps, so I only coded the map function. The library tries to make life easier for people, but traditional queries with RDF4J look like this. And just for fun, I decided to make a DSL for that too, so you can write the queries this way. The big advantage is that now I can put functions in there, I can iterate over things, and I can build more complex queries than before. And again, it makes use of the infix functions John was talking about: whatever comes before the infix function offset, the offset is applied to that object. I still have some work to do on it: better infix calls, and a lot of things using DSL markers. Right now I can still do things like call query, have an expression, and call query again inside it; the compiler will complain about it, but it will not directly stop me. So I need to prevent that, and it should also give better code completion, I think. And using all of that, I made another TornadoFX-based piece of software where I combine everything. That's just an old screenshot. But I can query my XML-based databases, SPARQL endpoints, and the standard Postgres databases we have on various servers; I can query everything from one program.
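The infix-`offset` idea can be sketched with a small query builder. This is hypothetical code of mine, not the speaker's DSL or RDF4J's API (note the made-up `ex:` prefix in the example): each infix modifier returns the query itself, so the call chain reads almost like SPARQL.

```kotlin
// A toy SPARQL query builder; `select`, `where`, `limit`, and `offset` are my names.
class SparqlQuery(private val vars: List<String>) {
    private val patterns = mutableListOf<String>()
    private var limitValue: Int? = null
    private var offsetValue: Int? = null

    fun where(s: String, p: String, o: String) = apply { patterns += "$s $p $o ." }

    // Infix modifiers: whatever comes before `offset` is the object it applies to.
    infix fun limit(n: Int) = apply { limitValue = n }
    infix fun offset(n: Int) = apply { offsetValue = n }

    override fun toString() = buildString {
        append("SELECT ").append(vars.joinToString(" "))
        append(" WHERE { ").append(patterns.joinToString(" ")).append(" }")
        limitValue?.let { append(" LIMIT $it") }
        offsetValue?.let { append(" OFFSET $it") }
    }
}

fun select(vararg vars: String) = SparqlQuery(vars.toList())

fun main() {
    // Reads left to right: build the pattern, then page through the results.
    val q = select("?drug").where("?drug", "ex:treats", "ex:Tuberculosis") limit 10 offset 20
    println(q)
    // → SELECT ?drug WHERE { ?drug ex:treats ex:Tuberculosis . } LIMIT 10 OFFSET 20
}
```

A `@DslMarker` annotation on the builder, which the speaker says is still on his to-do list, is what would stop an inner block from accidentally calling `query` on an outer receiver.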
And the reason I made that is that I have users who work with the database but don't know anything about programming. They just want to open a program and load a query; they can tweak SQL queries, they've learned over time how to tweak them, but they don't want to go beyond that. So I made this software so they can go in, take a query, change the name of the plant in it, and just run it. And if they want to play and say, I just want the plants that have been studied more than 500 times, they can add that little bit to it. It makes my life and their life easier. So thank you, and I will take questions. Okay, I have a quick question about one of the comparison charts you had; you had the Kotlin-rendered simulation versus the regular one, and I was curious what program the PERCH simulation was done with. So PERCH is the name of the software, and it's software that is not developed anymore; that's the one I made the converter for. Oh, okay. It's software we still use, but the company is dead, so they don't maintain it anymore, and that's why we want to port all the data we made with it to the new one. Oh, I see, okay. And you mentioned that you've used Python in the past to write programs; maybe I missed this somewhere in the talk, but I was interested in hearing why you wanted to move on from Python, because I've talked to a lot of data scientists, and it's been a tough transition. Yes, so I still use Python for everything visualization-related, for example, because there is nothing in the Java world that reaches what we have there. With D3 you can do things, but you're going to spend days making one visualization, whereas with Python and pandas and things like that, it's really easy to make visualizations.
In terms of speed, nothing I do in Python beats what I do with Kotlin; it's much, much faster. And I can use coroutines and parallelize things really easily, which made my life much easier than working with Python and trying to do threads or processes, synchronizing them and all of that. Oh yeah, awesome stuff. Also, I love that you used TornadoFX, huge fan. Any other questions? Nope. I think that's it. Yeah, sweet talk.