Thank you so much for that wonderful introduction, and look, I can assure you I'm not unique; anyone can do what I'm doing. In fact, I think I'm looking at a lot of people here in the audience who can do exactly that and will be doing exactly that in a couple of months' time. So with that, let's kick off: Life Science at Scale. I want to tell you three stories. The first one is about the organization that I work for, CSIRO in Australia. The second is about the research that we do, the disease genes that we're finding and how we're finding them using Apache Spark. The third story is about whether, once we identify the disease genes, we can correct them, and doing that with a serverless architecture.

So with that, let's jump straight into the first one. CSIRO is Australia's government research agency, and we're in the top 1% of global research agencies. CSIRO is really passionate about translating research into products that people can use in their everyday lives. Probably the most famous product that we developed is modern WiFi, which is now used in billions of devices all over the world and is contributing to our healthy research budget of one billion dollars annually. But we also developed a vaccine for the Hendra virus, which is a virus three times more deadly than Ebola. On a lighter note, we also developed the Total Wellbeing Diet, which is a book filled with healthy and delicious recipes, and it actually rivals Harry Potter and The Da Vinci Code on the best-seller lists. So I think that's a fairly nice balance between stuff that people use and stuff that people enjoy.

On that note, I work for the Australian e-Health Research Centre, which is quite a unique digital health agency in the world in that it covers the full spectrum, from cells and basic research, to developing technology that can be used in clinical practice today, all the way up to measuring the impact the technologies we've developed have on improving people's lives. The vision we have at the centre is to improve healthcare research through digital technology and services. Our WiFi equivalent is Cardihab, the first clinically accredited mobile app that helps people through rehabilitation after they have had a heart attack. You would typically think that having a heart attack is a life-changing experience and that afterwards you rethink your lifestyle choices, but it turns out the uptake of rehabilitation is not quite as good as you would think. Having this app, which makes rehabilitation more convenient and gamifies the whole approach, has increased uptake by 30% and the completion rate by 70%, which is quite staggering, so this little app has already saved lives.

So, jumping straight into my research area, which is finding disease genes. As you might know, the genome holds the blueprint for every cell in our body. It affects the way we look, the disease risk we have, and even our behavior. I usually do this little exercise: there's a particular gene that causes the last digit of your thumb to either be straight, which is what I have, or bend all the way back. So I have an ordinary thumb, what about you? Oh yes, I can see some really impressive specimens in the audience. Similarly with coriander: there's a gene that determines the way you perceive the taste of coriander, and there is usually one coriander hater in six in the audience.
I think that ratio is a little bit lower here in India, but can I see a show of hands, who hates coriander? It's not your fault, it's your genome. With all of that, oh, sorry, of course there's a more sinister side to it, in that the genome also holds your future disease risks. For example, cystic fibrosis is one mutation in the three billion letters that we have, and it causes this devastating lung disease. With this, it's no wonder that genomics is being used more and more in clinical practice. In fact, by 2025, 50% of the world's population will have been sequenced, at least that's estimated by Frost & Sullivan, and that means genomics will produce more data than the typical big data disciplines. In fact, it will produce more than YouTube, astronomy, and Twitter combined, and that will amount to 20 exabytes of new data generated each year, which I think is quite exciting, actually.

The reason I know that analyzing that kind of data is actually quite challenging is because we are part of Project MinE, an international consortium that looks at the origin of a motor neuron disease called ALS. You might be familiar with it because Stephen Hawking suffered from it, or from the ice bucket challenge. This consortium, with all that publicity, was one of the first, if not the only one, with the power to generate large volumes of genomic data. In fact, they will generate 22,000 whole-genome datasets in order to find the origin, the disease gene that causes ALS, and ultimately what could be a treatment for it. The process is that all these patients and healthy controls spit in a tube or have blood taken, from that the genome is unlocked, and together this large cohort of 22,000 individuals will help identify the cause and ultimately the treatment.

So how do you actually find disease genes? Well, as I was saying, we need to accumulate a lot of data to compare individuals. Each line here represents an individual. We then identify the differences between this individual and a reference genome. On average, between you and the person sitting next to you, there are two million differences. Some of them are very good, in that they define who you are. Others might be less good, in that in some individuals they might lead to disease. Each box here represents a difference between individuals. Then, as I said, we have cases, the ones that have ALS, versus controls, the healthy individuals, and then we just spot the difference. In this case it's these lined-up boxes. But the reality is that complex diseases are not as easy as that. It's typically not one location contributing to the disease but a set of locations: there might be some drivers in there and then some modulating factors, your genetic background. For example, in ALS the time from diagnosis to death is usually three years, but some people manage to hang on longer, for example Stephen Hawking, who managed to delay the progression for 40 years. So there must be something in the genome that is protective, and identifying this will be part of our mission. What I'm saying here is that we need to build models over this whole feature set of the three billion letters in the genome, and we don't want to identify the single feature that contributes to the disease but the set of features that jointly contribute to it.
And therefore there needs to be some machine learning involved in that, and for us in particular it was random forest. But doing a machine learning task on this amount of data is quite challenging. Just to put it back in our heads: we have 22,000 individuals and we have 80 million features, because the two million differences per individual, accumulated across 22,000 individuals, add up to roughly 80 million distinct variant positions. So the matrix we do the machine learning over is 22,000 by 80 million, which is about 1.76 trillion data points, and that is by no means an easy feat. And again, our task was to identify the features, the columns, that correspond to the truth label, our disease status.

At this stage, because I know you're not a biological audience, I'd like to take the time to think about what other use cases could experience that kind of data, maybe not today but going forward. For example, you might want to predict churn rate, or the occurrence of failure in an industrial plant, or even do fraud or attack detection. Instead of 80 million genomic variants, you might have time-series data, or concatenated data from multiple events, or sensor data; the IoT community here will probably attest that the number of automatically collected data points will soon easily grow into millions of features. Or it might be log files. The task here, rather than detecting disease genes, is then to find predictive markers. For example, in a plant you want to predict the failure rate two weeks out and identify which sensors in the plant can forecast that catastrophic event.

So what do we generally need in order to analyze this kind of wide data, be it genomic data or a dataset you might have to deal with going forward? Bear with me while I tell you how I think about this ecosystem. We're all familiar with desktop compute, which is really geared towards small data: the convenience of running your analysis then and there, but of course it's limited to the amount of compute you have available, typically one node with a couple of CPUs. The next step up from that, in my mind, is high-performance compute, which is basically a set of these nodes strung together, computing things in parallel. The use case here is compute-intensive tasks where each individual calculation can be done independently of the rest; if you have to share information, that gets a bit complicated, but you can do it by writing bespoke code, with MPI for example. The problem is that this sharing of information between nodes is quite cumbersome and not automated, so it doesn't cater for data-intensive tasks, which is what we have here. For data-intensive applications, the ideal method in my mind is Hadoop and Spark, because the way I think about it, it dissolves the boundaries between those nodes by having a standardized way of transacting between them. So you can use all the CPUs on your Spark/Hadoop cluster rather than being siloed into the different nodes, if that makes sense. So when we developed our algorithm, we used Hadoop and Spark to do so, and the tool that we developed is called VariantSpark.
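Before getting to VariantSpark itself, here is a minimal, hypothetical PySpark sketch of the conventional "tall" layout that stock Spark ML expects, one row per individual with all variants packed into a single feature vector; this is the baseline the benchmark in a moment compares against. The file path, column names, and dimensions are placeholders, and at the real width of 80 million features this layout is exactly what becomes impractical, which is what motivates a variant-wise engine.

```python
# Minimal sketch: the "tall" layout stock Spark ML expects, one row per sample
# with all variants assembled into one feature vector. At 80 million features
# per row this is what breaks down, hence the need for a wide-data engine.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.appName("wide-genomics-sketch").getOrCreate()

# Placeholder input: one row per individual, columns v0..vN are 0/1/2 genotype
# calls and `label` is the case/control status.
df = spark.read.parquet("s3://my-bucket/genotypes.parquet")  # hypothetical path

variant_cols = [c for c in df.columns if c.startswith("v")]
assembler = VectorAssembler(inputCols=variant_cols, outputCol="features")
rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=500)

model = rf.fit(assembler.transform(df))

# Feature importances point at the variants (columns) most associated with the
# disease status, i.e. the "set of locations" the talk is after.
top = sorted(enumerate(model.featureImportances.toArray()),
             key=lambda kv: kv[1], reverse=True)[:20]
print(top)
```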
As I said, it's a random forest approach, and we benchmarked it against the machine learning technologies out there that are typically used for random forest. One of them, Spark ML, uses Spark as we do; the others are an R implementation and a C++ implementation, and I think H2O's is a C implementation. What I'm plotting here is the accuracy, how well the tools did on our dataset, against the speed. As you can see, VariantSpark in this particular example exceeds the other technologies in both accuracy and speed. The speed is probably to be expected, because we designed VariantSpark exactly for this application, but the accuracy was actually quite interesting, because underlying all of this is the same sort of algorithm, random forest, so you wouldn't expect there to be a real difference in accuracy. The interesting thing is that the other technologies were not able to cater for the full dataset, so we had to subset the data and compare the technologies on those subsets, and what I'm plotting here is the largest subset each tool was able to run successfully on. Spark ML was only able to use 80% of the dataset, and the other tools even less; H2O, for example, I think could only handle 50% of the dataset. That clearly shows you that using the full dataset to make your decision is a better approach than doing feature selection beforehand. Because when you do it the typical way, feature selection first and then building your beautiful complex model on top, you subset the data to features that are individually predictive, but those might not be the set that is actually the most predictive; that set might have individually weak associations with the truth label but jointly make the difference. So going in completely unbiased, picking from the whole dataset the features that together predict, in our case, the disease, was the best approach going forward.

This slide is just to quickly show you how VariantSpark scales with the number of samples; the traditional way of thinking about big data is that you get more and more samples, which is this dimension here, but it also scales equally linearly in the other dimension, with the number of variants we add to the dataset. Good, so with that, VariantSpark is already used, as I said, by Project MinE and by a couple of other universities in Australia, most noteworthy Macquarie University, but it has also been picked up by some commercial partners; Databricks, for example, partnered with us to generate a notebook, and I'm going to show you that in a minute.
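To give a feel for what that notebook does, below is a rough sketch of driving VariantSpark from PySpark: import a VCF, attach the case/control labels, and ask for the variants that are jointly important. The module, function, and parameter names here are approximations and the paths are placeholders; treat the exact API as an assumption and check the released varspark package and the Databricks notebook for the real calls.

```python
# Rough, hypothetical sketch of a VariantSpark importance analysis in PySpark.
# The varspark calls below are approximate and the paths are placeholders;
# consult the package documentation before relying on these names.
from pyspark.sql import SparkSession
from varspark import VarsparkContext   # assumed import path

spark = SparkSession.builder.appName("variantspark-sketch").getOrCreate()
vc = VarsparkContext(spark)

# Genotypes come in as a VCF (one row per variant, one column per sample),
# labels as a CSV with the case/control status per sample.
features = vc.import_vcf("dbfs:/data/cohort.vcf")              # hypothetical path
labels = vc.load_label("dbfs:/data/phenotype.csv", "isCase")   # hypothetical path

# Fit the wide random forest and pull out the variants that jointly predict
# the label, rather than pre-filtering on individually predictive ones.
analysis = features.importance_analysis(labels, n_trees=1000, seed=13)
print(analysis.important_variables(20))
```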
But let's take a step back and think about the typical workflow in a data science application, in the cloud or otherwise. You start with a business case; in our case that was predicting disease genes, in your case it might be something else. You then curate the data to make it computable, and arguably this is the most challenging bit of the whole thing, because we know data is noisy, it's missing, and you have to consolidate it from different silos; this in itself is already black magic, some people would say, but there are certain tools that will help you do that, and certain practices and skill sets that you can learn. Once you have a clean dataset, you build the actual technology on top, to predict something, for example.

This I call the minimal viable product, and here you need to scope the technology, what language you want to use, Python, C, R, whatnot, develop the prototype, and then iterate, because the first thing you put together is probably not going to be the best approach. Once you have that minimal viable product, in order for it to be used for the business case, you probably have to get it to a stage where it is actually production ready, and for that you need to provide an endpoint and you need to test at scale. Going through this on premise is quite easy, apart from the challenges we discussed; technologically it's quite easy. The only problem with on premise is that it's quite expensive, you have to put money into maintaining it rather than computing with it, and it's potentially not scalable. So a cloud-based solution might, in the majority of cases, be the preferred way to do it, but at this stage it's still quite challenging to put something in the cloud that covers the full spectrum, from doing experimental work to get to the minimal viable product, tearing down solutions and coming up with new ones, through to the next stage of having a stable endpoint that is easy to maintain. Databricks, for example, is able in my mind to cover the first two boxes, curating and cleaning the data and building the minimal viable product; it's probably not a good fit for the endpoint, but let's be grateful here, start with Databricks as a first step, and see how we go.

So specifically, VariantSpark is set up on Databricks, and if you're not familiar with Databricks, basically what it is: you can spin up a Spark/Hadoop cluster from a Databricks notebook and put in your code as you would with a Jupyter notebook. How many of you are familiar with notebooks? It's exactly the same thing, where you have code blocks and annotation blocks and can put some graphics in there. The other nice thing about Databricks is that it has Amazon and Microsoft Azure as back ends, so depending on which account you have, you can use either or both.

Obviously we wanted to put something out there that people can use and play with, but putting genomic data in the cloud is not a good idea. So we came up with a synthetic dataset, and we wanted to make it a bit of fun. It's the hipster index: we score people on whether they're a hipster or not, which is the truth label, and from there we predict the genes that make you a hipster or a non-hipster, and looking at the audience, there are some traits of a hipster here: textured, beautiful hair, coffee consumption. So if you're interested in playing around with this really fun dataset, I encourage you to go to our Databricks notebook and download it. In fact, this is what we're going to do on Sunday in the workshop.

With Databricks being nice and easy for building the minimal viable product but not so nice and easy for providing the endpoint, we were wondering: can we do this whole thing without Databricks? Can we set up something directly on AWS? Let's walk through the steps that are actually involved in doing that. First, we need to put VariantSpark in a Docker container.
From that, we then need something to provision the Elastic Kubernetes Service on AWS, to stand up the master nodes, and from there we need to spawn the worker nodes in that cluster. We need to connect to this beautiful infrastructure from the outside in order to monitor it, which means connecting to the Elastic Kubernetes Service for monitoring, and, to keep the nice data science approach, we also want to connect a Jupyter notebook instance to all of this in order to trigger runs and collect the results back. Sounds relatively trivial, at least that's what I thought, so I asked Lynn to look into it. Lynn Langit, you might know her, she's a very famous cloud evangelist and really at the cutting edge, so I thought this might tickle her fancy, and thankfully it did. She came up with this beast of an infrastructure, which stands up exactly the complicated workflow we just discussed for VariantSpark.

Now, I don't expect anyone to have the skill set that Lynn has to be able to create that, so we went one step further and put all of it into a convenient infrastructure-as-code template. How many of you are familiar with IaC, infrastructure as code? Okay, let me quickly explain. The way I think about it is that you have these beautiful architectures in the cloud that you might have put there manually or through the command line interface. To replicate that, maybe in a different availability zone, or to share it with your friends, you don't want to go through the painful process of setting it up a second time when you already know what you want. Infrastructure as code provides a template, a flat text file in JSON or YAML, that describes everything in your infrastructure: the permissions, the services, the connections between them, the S3 buckets, all described in one flat file. This flat file is given to an interpreter, in this case CloudFormation from AWS, and CloudFormation spins up the whole infrastructure from that file. So what Lynn managed to do is put it all into that flat file, and I can now share that file with each one of you; you just press a button, put it into CloudFormation, and you can stand up your complex Kubernetes service running machine learning with VariantSpark, connecting to S3 buckets, and having the whole complicated analysis done for you.

So if this tickles your fancy, and if you think, I would like to help find disease genes or write these cool infrastructures myself, can I help? The clear answer is yes. Just like Lynn did, there are lots of little things that people can contribute in order to build this ecosystem together. In fact, if this even slightly interests you, get in contact with me right now and say, yes, I would like to be a volunteer.
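As a hedged illustration of what that one-button deployment amounts to, the sketch below hands a flat-file template to CloudFormation from Python and waits for the stack to come up. The template file, stack name, and parameter are placeholders I made up for illustration, not the actual VariantSpark template.

```python
# Minimal sketch of "press a button": hand the flat-file template to
# CloudFormation and let it stand up the whole stack (EKS control plane,
# worker nodes, S3 buckets, IAM permissions). Names below are placeholders.
import boto3

cfn = boto3.client("cloudformation", region_name="ap-southeast-2")

with open("variantspark-eks.yaml") as f:           # hypothetical template path
    template_body = f.read()

cfn.create_stack(
    StackName="variantspark-eks-demo",
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_IAM"],               # the template creates IAM roles
    Parameters=[
        {"ParameterKey": "WorkerNodeCount", "ParameterValue": "4"},  # hypothetical
    ],
)

# Block until the stack is fully provisioned (or fails).
cfn.get_waiter("stack_create_complete").wait(StackName="variantspark-eks-demo")
print("stack is up")
```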
Good, so with this, let's jump into the last story, which is about whether we can correct the disease genes that we identified. You might have heard of a technology called CRISPR, which in my mind will revolutionize the way medicine is done going forward, because it enables you to edit the genome of a living cell, in order to remove a disease gene, for example. In fact, there was a paper last year that managed to do exactly that in embryos that would have suffered from a heart disease called hypertrophic cardiomyopathy, which makes the heart muscle thicken until eventually the heart stops working. They managed to correct that disease in seven out of ten embryos, which is great, but it also means that in three out of ten it did not work, and if this is your unborn child, then a three-out-of-ten failure rate is just not good enough. So we want to come in and make this process more efficient, so that it ideally works the first time, every time.

So we developed what we think of as a search engine for the genome. Researchers can type in the gene they want to edit and how they want to edit it, and the tool ranks all the possible ways of doing that, so that researchers know straight away which is their best option, their best choice, and don't put their precious resources, for example an embryo, in danger. There's a user interface here where each line represents a location where the genome can be edited; in green I'm showing the sites that are good, in black the ones that are not so good, and as you can see they're quite close to each other, and it's hard to differentiate them unless you have a computer run over it.

So why is this difficult? The way I think about it, it's like finding a grain of sand on a beach. It needs to have the right properties, the right color, the right size, and the right shape for the editing mechanism to interact with it. But once you have all your candidates, a bucket of sand, you then need to ensure that each grain of sand is actually unique on the beach, because the equivalent in the genome is making sure you're editing this particular gene and not another, healthy gene. So you need to compare that grain of sand with all the other grains of sand on the beach, and that makes it quite a complicated and compute-intensive task. It doesn't fall into the typical categories we just discussed: it's not compute intensive all the time, and it's certainly not data intensive, so the previous two solutions are not quite what we're after. But thankfully there's a new technology that has just come out, which is called serverless.

Serverless is really geared towards being agile. The way I think about it is that you recruit free-floating CPUs, as many as you need, when you need them, instantaneously, and for us, with the search engine for the genome being a web application, this is exactly what we needed, because people might want to search one gene, or they might want to search hundreds of thousands of genes or locations in the genome. The task can be quite small or it can be enormous, and you don't want an enormous Spark cluster running all the time. Serverless compute was exactly what we needed. So this is the architecture; I'm not going to go into detail, suffice to say that there is a web front end where the user interacts, this is connected to an API Gateway, and from there all the tasks are triggered using different AWS services.
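To make that serverless entry point concrete, here is a minimal, hypothetical sketch of the kind of Lambda function that sits behind the API Gateway: it parses the requested gene and candidate sites out of the request and returns them ranked. The request fields and the scoring are placeholders, not GT-Scan's actual logic.

```python
# Hypothetical sketch of the entry-point Lambda behind the API Gateway: parse
# the query, score the candidate target sites, return them ranked.
import json

def score_site(site: str) -> float:
    # Placeholder "on-target" score; the real tool scores each site on its
    # sequence properties and checks uniqueness against the whole genome.
    return sum(1 for base in site if base in "GC") / max(len(site), 1)

def handler(event, context):
    body = json.loads(event.get("body") or "{}")
    gene = body.get("gene", "")
    candidate_sites = body.get("candidate_sites", [])   # hypothetical field

    ranked = sorted(candidate_sites, key=score_site, reverse=True)
    return {
        "statusCode": 200,
        "body": json.dumps({"gene": gene, "ranked_sites": ranked}),
    }
```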
As such, it was one of the first serverless applications that went beyond Alexa skills, really demonstrating that you can stand up a complicated infrastructure that can cater for something as sophisticated as research, so we received a lot of attention. It is now actually also available on Alibaba Cloud, thanks to Sabbath and Jason from the serverless team there, who managed to do that. As you can see, it looks strikingly similar, and there are similar components in both, and this was something that was really important to me: being cloud agnostic, having a technology that runs on AWS as well as on Alibaba.

So here's a quick comparison between Alibaba and AWS that I thought you might be interested in. The database we use to collect our bucket of potential target sites is a NoSQL serverless database; on Alibaba it's called TableStore, on AWS it's DynamoDB, and the difference is that TableStore can store slightly larger volumes in each cell, which for genomic research is actually a plus. Similarly, for the actual function compute, one is called Function Compute, the other Lambda, and the difference here is that Alibaba allows functions to invoke other functions, which is great for spawning our tasks and collecting the results, that is, doing parallel serverless processing. AWS has a workaround that we used, through the SNS service, which gave us a similar result. The other comparison is Log Service versus CloudWatch; in my mind they're the same thing. And for the CloudFormation template, the infrastructure-as-code piece, Alibaba has a similar tool called Fun. It doesn't even have a logo yet, so it's not very advanced, but it does what it needs to do, so GT-Scan is available as a Fun template too; CloudFormation is for sure more mature than Fun.
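Before moving on, here is a small sketch of the SNS workaround mentioned a moment ago: rather than one Lambda invoking other Lambdas directly, each chunk of work is published to an SNS topic with the worker Lambda subscribed to it, so the chunks fan out in parallel. The topic ARN and message shape are assumptions for illustration.

```python
# Sketch of the SNS fan-out workaround: publish each chunk of work to a topic
# that has the worker Lambda subscribed, so chunks are processed in parallel.
# The topic ARN and message shape are hypothetical.
import json
import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:ap-southeast-2:123456789012:gtscan-offtarget-chunks"

def fan_out(candidate_sites, chunk_size=100):
    for i in range(0, len(candidate_sites), chunk_size):
        chunk = candidate_sites[i:i + chunk_size]
        sns.publish(
            TopicArn=TOPIC_ARN,
            Message=json.dumps({"sites": chunk}),
        )

# Each published message triggers one worker Lambda invocation; results are
# written back to DynamoDB and collected by a downstream function.
```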
So with this, in my mind, once you go serverless you never go back, because it's so easy, so convenient, so economical. Innovation is not slowed down by having to think about what kind of EC2 instance you need, whether you can afford it going forward, and things like that. You just write your function and let the infrastructure, Alibaba or AWS or Azure, do the rest for you. It caters for burstable workloads: while you can do auto scaling the traditional way, it is quite slow, not as instantaneous as serverless, so for burstable workloads serverless is the way to go. And innovation becomes easily affordable, as I said: you can stand up a minimal viable product quite cheaply using a serverless architecture. And because everything is modularized, you can exchange individual components, and I'm going to show you how easy that is in a minute. Actually, I'm showing it to you right now.

With all this ease of use, what is difficult, because it's a distributed infrastructure, is actually optimizing that infrastructure, finding the bottlenecks. In our case, for example, we knew there was one component of GT-Scan that was quite slow. But before we go into that, let me introduce you to something we call hypothesis-driven architecture. If you've been to one of the YOW! conferences or seen James Lewis talk, it's sort of in the way he talks about it. You start from your infrastructure as code, from your JSON or YAML file that defines a specific architecture. You then evolve it, making small changes, like replacing one particular function with another. You deploy this updated architecture on your provider of choice and you evaluate the runtime of each component, ideally with a method that automatically detects the infrastructure you just deployed. Before, we were doing this with X-Ray, which is an AWS service, and it worked fairly nicely, but now we're using Epsagon, a startup from Israel that specializes in detecting the infrastructure, evaluating each component in it, and providing a really nice visual interface for doing so. Once you've collected your measurements, you can evaluate whether the small change you made was actually a good idea, and then the cycle iterates. Through this you can do DevOps, in my mind, more securely and more easily, and we published a quite controversial blog article on DevOps.com that we titled DevOps 2.0. This is a new way in which we think DevOps should be done: you have your production environment running in the same availability zone, in the same location, you deploy your new experimental infrastructure, you evaluate both against each other, and then you swap over to the new one, and the cycle iterates. Again, we will be doing this in the Sunday workshop: standing up an infrastructure and evaluating it.

So, coming back to the use case where we wanted to improve GT-Scan and find the bottlenecks in the system. We recorded the runtime of all the functions in the system, and this is what I'm showing in the different bar plots. As you can see, there are two offenders that stand out and really soak up all the runtime. In the architecture, these are the two Lambda functions in the middle, the orange boxes, that bring in information from a DynamoDB database, compute over it, and write the result out to another DynamoDB database. Those were academic tools that we were simply running inside a Lambda function. But being a machine learning team ourselves, we thought, well, maybe we can do this slightly better using machine learning, and this is exactly what we did. We replaced those two functions with a new function that did the same analysis, but with machine learning this time; it was, again, a random forest approach, but we don't need to go into detail. What I want to show you is that the runtime dropped dramatically, and we were able to evaluate and quantify the improvement we made to our architecture. The business case is that by replacing those two Lambda functions with the one new Lambda function, we reduced the runtime by 80%, and that's probably a business case anyone can get behind.
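As a rough illustration of that replacement, the sketch below shows the shape of a Lambda that loads a random forest trained offline once at cold start and only does the classification per invocation, writing its scores back to DynamoDB. The model file, event shape, and table name are assumptions, not the actual GT-Scan code.

```python
# Hypothetical sketch of the replacement Lambda: the random forest is trained
# offline, shipped with the deployment package, loaded once at cold start,
# and each invocation only does the (fast) classification.
import json
import joblib
import boto3

# Loaded at cold start, outside the handler, so warm invocations reuse it.
MODEL = joblib.load("offtarget_rf.joblib")                       # hypothetical model file
TABLE = boto3.resource("dynamodb").Table("gtscan-results")       # hypothetical table

def handler(event, context):
    record = json.loads(event["Records"][0]["Sns"]["Message"])
    features = record["features"]                 # one feature vector per site
    scores = MODEL.predict_proba(features)[:, 1]  # probability of a "good" site

    TABLE.put_item(Item={
        "query_id": record["query_id"],
        "scores": json.dumps(scores.tolist()),
    })
    return {"scored": len(features)}
```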
Let's quickly walk through the rest, though, recapping the workflow we had. Again, remember: it starts with the business case; from that we curate and collect the data we need to act on the business case; we build a minimal viable product; and then we prepare for production. For VariantSpark, the business case is finding a disease gene. The curated data is genomic data, which we pre-processed using Python, R, and ETL. The minimal viable product is VariantSpark; yes, it's mature, but it's still not a production-ready environment. It's built on Apache Spark, using Elastic MapReduce or Databricks, or now Lynn's Elastic Kubernetes Service setup. Preparing for production means offering that Elastic Kubernetes Service as infrastructure as code, so people can just press a button and it spins up automatically, and testing at scale; we will be testing it on Project MinE data, which is 25,000 individuals. The other thing I showed you was GT-Scan. Here the business problem was that we wanted to build a search engine for the genome. The data, again, is genomic data; it's located in an S3 bucket, and we access it through NoSQL. The minimal viable product is GT-Scan, a serverless ecosystem on AWS that is now also available on Alibaba. The research community can access it through an API Gateway, and testing will be done at a research facility in Australia that has 10,000 mice coming through each day, getting added to the system, and out the door.

So where to from here? Really, what we want to do is find disease genes for a range of different diseases out there, ideally for things that really affect the healthcare system, like stroke and heart attack. From that, we want to be able to potentially correct them, or at least replicate the findings in the laboratory, in order to identify new ways of finding drug treatments, and this is where GT-Scan comes in. All of this is still firmly in the research space; we want to go into clinical practice and really have impact there, and this is a tool I haven't shown you yet, called GenPhen Insights, genome-phenome, where the phenome is the medical data. And remember, I said once you go serverless, you never go back: this one is also a serverless technology, so next time you invite me, I can showcase that to you.

So, three things to remember. First, the datafication of everything will make all datasets grow wider. There's no doubt in my mind that in IoT, where information about an event is collected automatically, the number of features we deal with will grow into the millions, if not billions. So while genomics might have to deal with this today, you will probably have to deal with it tomorrow and going forward. In my mind, this really represents a paradigm shift in machine learning, and we need to come up with new ways of dealing with this imbalance between samples and features; VariantSpark is one option, one solution capable of dealing with that. Second, serverless architectures can deal with application cases that are not just Alexa skills or individual components; they can provide a whole ecosystem that caters for something as complicated as a research application. So I would highly encourage you to investigate this area; in fact, Forbes was saying that 50% of the companies they interviewed are seriously thinking about moving to serverless infrastructure. This is coming, and it's predicted to be a $7 billion market going forward, so if you want to jump in, now is probably the time. But the main take-home message from my talk is that business and life sciences are not that different, right? The tools we develop in one can be used in the other as well. So let's build a healthier future together. With that,
Thank you very much. Perfect timing. Two minutes for questions. Fantastic. Right, we have a hand there.

Hi, thank you for the session, it was very nice. Just a curious question about the two Lambda functions, where when you reduced them to one, the time was less. Was it the case that the Lambda functions had some interactions and were doing something redundant? Otherwise it looks pretty odd that when two Lambda functions were made into one, there was such a drastic performance difference.

Yeah, so don't get hung up on it being two functions becoming one. They were not doing redundant things; they were just doing things horribly inefficiently. I don't mean to trash the academic community; their task is to come up with new ideas and demonstrators, but they're not known for implementing things efficiently. The statistical analysis their functions were doing could easily be replaced with a machine learning model that is trained offline and only has to do the classification on the fly, which of course reduces the time drastically.

Yeah, I have a question. You mentioned that serverless is the future, or is the present. How did you solve monitoring in production? Monitoring of your applications in production? Because serverless makes it really difficult to monitor compared to a server architecture.

That's exactly right. We faced that problem in production, and I can wholeheartedly feel your pain. None of the cloud providers have a good solution for that. The one I'm intimately familiar with is X-Ray on AWS, which lets you, to some extent, label the functions and then monitor them: whether they are down, whether they time out, and what kind of resources they consume. But it's painful. So, not to be too marketing here, and I have no stake in Epsagon whatsoever, but Epsagon really was the savior for us, in that they take care of all of this: you just point it at a new architecture in the cloud, it automatically surveys the connections between the individual components and then monitors them in a dashboard. For us, the end-to-end runtime was the main thing we were after: where most of the time is drained, and what kind of processes run over and over again so that we know where to focus our optimization efforts. All of this Epsagon gave us.

All right, thank you, Dr. Denis. This is very helpful. I hope people are asking for more tech stuff, and this probably gives a glimpse of some of the tech stuff that's going on in the data science community. So thank you so much.