Okay, let's start. So, who we are: I'm a technician, and I have been working for 10 years at a plant research institute about 200 kilometers from Berlin. Our focus is to search for genes that are involved in plant-pathogen interactions. We use barley and powdery mildew as a model system, and here you can see young barley plants infected with powdery mildew, which is a major problem in the field. You have a lot of losses due to this pathogen. We try to find genes involved in these interactions by reverse genetics. So how can we do this? You might know that the genomes of higher organisms are very large. For instance, the barley genome is almost twice as big as the human genome, and that means you have about 30,000 genes. So how do you find the genes that are involved specifically in plant-pathogen interactions? We use a tool called RNA interference, which was discovered in the 90s. People had petunia plants that were light blue, and they wanted to make them dark blue. They thought: okay, we know the gene for the blue color, so let's add more of it to the plant, and then we'll have dark blue petunias. They did this, and what actually happened was exactly the reverse: the flowers were white, or had only stripes of blue. They scratched their heads and repeated the experiments, but it always turned out the same, and it took quite some years to find out that there is a mechanism behind this, which is called RNAi. So how does RNAi work?
Double-stranded RNA entering the cell is very unusual, so people think this mechanism is used to defend against RNA viruses. If double-stranded RNA enters the cell, it is recognized by an enzyme called Dicer, which you can see here. Dicer cuts the double-stranded RNA into small pieces about 21 base pairs long, which are called small interfering RNAs, or siRNAs. Then other proteins pick up these small RNAs and remove one of the two strands, which is important for later. Only one strand stays and forms the so-called RISC, the RNA-induced silencing complex. RISC then binds to the target, which is the mRNA, here, and this leads to the mRNA being cut or its translation being blocked, so you get no protein. If we go back to our blue color gene: here you have the natural gene in the plant, which gives the mRNA, and people added more of the blue color gene, here, as double-stranded RNA. This was recognized by the plant, bound to the mRNA of the natural gene, and blocked production of the blue color. So you have no color anymore. This is how it works. With this you can efficiently silence, or knock down, genes, and it has worked very well for many years, so people use it in humans, in worms, in plants, in many species. We also use it: we have developed a testing system where we can use this tool. It's very efficient and very cool.
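To make the Dicer step concrete, here is a minimal Python sketch that enumerates candidate 21-nt siRNAs from a double-stranded RNA trigger. It is a simplification and an assumption on my part, not a model of the enzyme: it takes every overlapping 21-nt window of one strand, whereas Dicer in the cell cuts in a more phased way and leaves duplexes with overhangs. The function name `dice` and the example sequence are hypothetical.

```python
def dice(dsrna: str, size: int = 21) -> list[str]:
    """Enumerate every overlapping `size`-nt window of one strand of a dsRNA trigger.

    Simplification: Dicer produces phased duplexes with 3' overhangs; here each
    possible window is treated as a candidate siRNA.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    rna = dsrna.upper().replace("T", "U")  # accept DNA input, work in the RNA alphabet
    # A sequence shorter than `size` yields an empty list
    return [rna[i:i + size] for i in range(len(rna) - size + 1)]


# Hypothetical 39-nt trigger, only for illustration
trigger = "AUGGCUACGAUCGAUCGGAUCCGAUAGCUAGCUAGGCUA"
candidates = dice(trigger)
print(len(candidates))  # 39 - 21 + 1 = 19 candidate siRNAs
print(candidates[0])
```

Enumerating every window is the conservative choice for design work: it considers every siRNA that could possibly arise from the construct, at the cost of some that Dicer may never actually produce.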
That's why we call it reverse genetics: if you don't know what a gene does, you switch off gene by gene and look at what happens. So people use it, but there are two problems with it. The first problem is that you might hit other, unintended genes, and the other problem is that not every silencing is efficient. So we wanted to write software to predict these problems, and also to use this tool for designing good and efficient constructs. We wanted software that can predict off-targets and predict efficiency, and we also wanted to test this software. We made the software some years ago, but in the last year we wanted to validate it with real experiments, to see whether the predictions are true. And of course we wanted nice software that runs on Windows, that everybody can use, and that combines all these features. So what is an off-target? Imagine you want to make a construct to knock down a gene. You make a double-stranded RNA construct artificially in the lab, and then you have to check that you only target the gene you want and not another one.
As I said, you have about 30,000 genes in the barley genome, so the chance that you hit another target is quite high. As you can see here, the double-stranded RNA is again split into small RNAs, and then we use a short-read aligner called Bowtie, which we run from Python. Bowtie tries to find all matches of the small interfering RNAs in a large database, for instance the barley genome or the human genome, whatever. If you designed a good construct, you should only find hits on your target. But now and then you might have an off-target, like here, which of course has far fewer hits, but they might still be efficient enough to cause silencing, that is, an off-target effect. So you might hit not only the gene you want but also other genes, because in barley, for instance, it is very common to have gene families with very similar sequences, and you might also hit another gene from the same family, which you usually don't want. So it's important to predict this, to know it, and maybe to change your construct design. Then the question arises: how much similarity is required for something to be an off-target? We used a model called the molecular clock, which says that about every one million years in evolution, one nucleotide is changed. So if you have a common ancestor with a sequence like this, then after one million years you usually have one base exchange: here, for instance, C turns into G, and here A turns into T, and so on. After two million years we have another change. We used this to design our experiments, and we made 15 constructs.
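The molecular-clock idea can be turned into a small simulation. The sketch below introduces a chosen number of point substitutions at distinct positions and then measures the remaining identity. The helper names `evolve` and `identity`, the random seed, the 500-nt hypothetical ancestor and the mapping from requested identity to a number of substitutions are all illustrative assumptions, not the procedure used to design the real synthetic constructs.

```python
import random

BASES = "ACGT"


def evolve(seq: str, substitutions: int, rng: random.Random) -> str:
    """Return a copy of `seq` with `substitutions` point changes at distinct positions."""
    if not 0 <= substitutions <= len(seq):
        raise ValueError("substitutions must be between 0 and len(seq)")
    out = list(seq)
    for pos in rng.sample(range(len(seq)), substitutions):
        # Replace with a different base, e.g. C -> G or A -> T as on the slide
        out[pos] = rng.choice([b for b in BASES if b != out[pos]])
    return "".join(out)


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
ancestor = "".join(rng.choice(BASES) for _ in range(500))  # hypothetical 500-nt target
for target_identity in (1.00, 0.98, 0.96, 0.94, 0.92, 0.90):
    n = round(len(ancestor) * (1 - target_identity))  # substitutions needed for this identity
    variant = evolve(ancestor, n, rng)
    print(f"{target_identity:.0%} requested -> {identity(ancestor, variant):.2%} observed")
```

Because positions are sampled without replacement and each substitution always changes the base, the observed identity equals the requested one exactly.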
We made them as synthetic constructs. We used one target that we know works: we have worked on this for 10 years, and we have very good, efficient, well-tested constructs and targets. So we took our best candidate and made 15 synthetic constructs with decreasing sequence similarity. At zero million years you have no change, so 100 percent identity. First we did this in smaller steps, 98 percent, 96 percent, and so on down to 90 percent, and then in larger steps. Then we ran these through the short-read aligner, Bowtie, and you expect the number of hits to decrease along with the identity. In the best case, 100 percent, you have 408 hits from the siRNAs to the target, and it decreases to zero at the end, where you don't hit your target anymore. We made these constructs and tested them in our system; unfortunately I have no time to explain how it works, so you have to trust me. Here you can see all the constructs. 100 percent means the construct has no effect: that's the control, at 100 percent here. The further down a bar goes, the stronger the effect, so zero would be best. Red bars mean the effect of the construct is significant. We usually do about five experiments; we can't afford more because they are expensive and very laborious. After five experiments we do statistics and see what is significant and what is not. So red means significant, and blue means no significant effect. And you can see here, which is very interesting, that down to four million years, 92 percent identity, you still have an effect. So silencing works; you knocked down the gene. But from 90 percent onward, all the way to the 50-million-year construct at 30 percent, you have no effect anymore.
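The relationship between identity and the number of hits can be illustrated with a toy example. This sketch counts how many 21-mers of a construct occur verbatim in the target. It is a stand-in, not how si-Fi calls Bowtie: exact matching only, on one strand, whereas an aligner can also report hits with mismatches. The function names and the periodic toy target are hypothetical.

```python
def kmers(seq: str, k: int = 21) -> list[str]:
    """All overlapping k-mers (candidate siRNAs) of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_hits(construct: str, target: str, k: int = 21) -> int:
    """Count k-mers of `construct` that occur verbatim in `target`.

    Exact matching only: an aligner such as Bowtie can also report hits with
    mismatches, so this undercounts relative to such an aligner.
    Only the given strand of the target is searched.
    """
    target_kmers = set(kmers(target, k))
    return sum(1 for s in kmers(construct, k) if s in target_kmers)


target = "ACGT" * 25                           # toy 100-nt target
one_change = target[:50] + "A" + target[51:]  # single substitution at position 50 (G -> A)
print(count_hits(target, target))      # 80: every 21-mer of an identical construct hits
print(count_hits(one_change, target))  # 59: the 21 windows covering the change are lost
```

A single substitution removes every 21-mer that overlaps it, up to 21 hits, which is why hit counts fall quickly as identity drops.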
You've lost it; it's not working anymore. Now the question arises: at 90 percent you still have 59 hits from Bowtie, from the alignment. So you have 59 siRNAs that match your target. Why don't you have an effect anymore? This raised the suspicion, which people have known for a long time, that not every siRNA is efficient. So you might have siRNAs that are not efficient, and as you can see here, most likely these 59 hits are there, but they are simply not efficient enough. When we started making the software, we used very basic rules to estimate which siRNAs are efficient, because people did not know exactly what makes them efficient. So we used things like an A or U at the beginning, the GC content, and the melting temperature. But in the last few months we wanted to reimplement the software, to have a better prediction of siRNA efficiency, and also to confirm it with our experiments. Recently many research papers appeared that investigated this efficiency a lot, especially in humans, and it turned out that thermodynamic properties are very important, as well as the structure of the RNA: things like minimum free energy, folding free energy, or target accessibility. So we used this, and the first very important thing is strand selection. You have the siRNA here; it binds to Argonaute, and one strand is removed, if you remember. How does this happen? It happens like a zipper, which opens more easily at the less stable end. This is governed by the free energy at each end: the end with the higher, that is less negative, free energy is the less stable one.
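The end-stability comparison just described can be sketched as a filter. In the known asymmetry rule, the strand whose 5' end sits at the less stable end of the duplex is retained in RISC; for silencing, that must be the antisense strand, which is complementary to the mRNA. As a loudly stated assumption, this sketch uses the number of G/C pairs in the terminal four positions as a crude stability proxy; real tools compare nearest-neighbour free energies. Treating ties as unreliable is also a hypothetical choice, and the names are illustrative.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (A<->U, C<->G)."""
    return rna.translate(COMPLEMENT)[::-1]


def gc_count(seq: str) -> int:
    """Number of G or C nucleotides (proxy for pairing stability)."""
    return sum(base in "GC" for base in seq)


def select_guide(sense: str, end: int = 4) -> Optional[str]:
    """Return the antisense (guide) strand if it is predicted to be loaded, else None.

    `sense` is the siRNA strand identical to the mRNA. The antisense strand's 5' end
    pairs with the sense strand's 3' end, so the antisense strand is favoured when
    that end is the less stable one. Proxy (assumption): G/C count in the terminal
    `end` positions; ties are treated as unreliable and rejected.
    """
    five_prime_gc = gc_count(sense[:end])
    three_prime_gc = gc_count(sense[-end:])
    if three_prime_gc < five_prime_gc:
        return reverse_complement(sense)  # complementary to the mRNA: a useful guide
    return None  # sense strand likely kept instead: identical to the mRNA, useless


candidates = ["GGGG" + "A" * 17, "A" * 17 + "GGGG"]  # hypothetical 21-mers
kept = [s for s in candidates if select_guide(s) is not None]
print(kept)  # only the first candidate passes the filter
```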
The strand whose 5' end sits at that less stable end is kept, and the other strand is removed from that side. But now you might have the problem that the wrong strand is kept: it is no longer complementary to your mRNA, so that siRNA is completely useless. So we included this in our software: we do a strand selection and remove all siRNAs that no longer target the mRNA. For all the calculations afterwards, we first ran everything through this strand selection, so we only work with the siRNAs that really hit the target. The second thing that seems to be important, and it has been published in many articles, is target-site accessibility. As you can see here, this is the primary structure of the RNA, but in the cell it is not really like this. You have a lot of folding: complementary pairings, loops and such things, and this is the secondary structure. And this is how it actually looks in the cell; this is a very small RNA, and you see how much is folded and how much is going on. So it's not a flat line, and you can already see that some places are easier to target than others. If you have a paired, folded region here, for instance, it is very difficult for the siRNA to access because it is already double-stranded, whereas loop regions might be much easier to access. You can calculate this with a tool called RNAplfold, which we also run from Python. It calculates local base-pair probabilities, so it takes the siRNA and part of the target.
We do this only on part of the target mRNA, because that is much easier to calculate than on these whole structures. It calculates the probability of how accessible the target is to the siRNA. You can do this for each base, but also as an average over stretches: the 1-mer, the 2-mer, and so on, up to the whole siRNA, the 21-mer. People found that results are reasonable and interpretable if at least eight bases of the siRNA can access the target. I guess if it's fewer, like two or three bases, the energy might not be enough to hold the rest of the siRNA. So people found it's good to look at whether at least eight nucleotides of the siRNA can pair with the target, and this is what we did. We took our constructs and checked each of these stretches, 1-mer, 2-mer, and so on, for each construct. For simplicity I will only show the 8-mer average, because, as I said, we also found that using eight looks the most reliable. Here you have the constructs we tested. Along this axis is the sequence position, the 500 base pairs of target sequence, and here the accessibility: zero means this position of the target was not accessible, and one means it was very accessible. If you remember, the constructs from zero to four million years had an effect and the five-million-year construct did not. You can see that for zero million years, and for one, two, three and four, there are quite a lot of accessible sites. Four is the last construct with an effect and still has quite some accessible sites, whereas for the five-million-year construct they completely disappear. So this fits our data: with this, you can say how efficient the siRNAs you make with a construct are.
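The 8-mer averaging can be sketched as a sliding-window mean over per-position accessibility values. RNAplfold reports unpaired probabilities over an ensemble of local structures (for example via its `-u` option); since that tool is not called here, this sketch uses a crude stand-in: a 0/1 profile derived from a single dot-bracket structure. The helper names, the example structure and the 0.5 threshold are hypothetical, not si-Fi's values.

```python
def unpaired_profile(dot_bracket: str) -> list[float]:
    """Per-position accessibility from one dot-bracket structure: 1.0 unpaired, 0.0 paired."""
    depth = 0
    profile = []
    for symbol in dot_bracket:
        if symbol == "(":
            depth += 1
            profile.append(0.0)
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
            profile.append(0.0)
        elif symbol == ".":
            profile.append(1.0)
        else:
            raise ValueError(f"unexpected symbol {symbol!r}")
    if depth != 0:
        raise ValueError("unbalanced '('")
    return profile


def window_means(values: list[float], width: int = 8) -> list[float]:
    """Mean accessibility of each `width`-nt stretch (the '8-mer average')."""
    return [sum(values[i:i + width]) / width for i in range(len(values) - width + 1)]


def accessible_starts(values: list[float], width: int = 8, threshold: float = 0.5) -> list[int]:
    """Start positions whose window mean reaches a (hypothetical) threshold."""
    return [i for i, m in enumerate(window_means(values, width)) if m >= threshold]


structure = "((((((........))))))........"  # hypothetical hairpin plus single-stranded tail
print(accessible_starts(unpaired_profile(structure)))
```

With real RNAplfold output, the per-position values would be probabilities between 0 and 1 rather than 0 or 1, but the windowing step stays the same.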
If you look more closely at the first constructs, zero and one million years, you can actually see clusters. From 0 to 100 base pairs you have not many accessible sites, from 100 to 200 you have some, from 200 to 300 you also have some, from 300 to 400 you have none, or only very few, and from 400 to 500 you have quite a lot. This brought us to the idea of making five more constructs, which we call window constructs, where only one 100-base-pair stretch matches the target. So from 1 to 100 the construct matches the target and the rest is random sequence, which will not match the target; then likewise for 100 to 200, 200 to 300, 300 to 400, and 400 to 500. We created these constructs and tested them in our system, and the results look quite interesting. I have to say that I only started looking at the results just before I left for the conference, and we are not finished with this yet, but it already looks quite interesting. We got the strongest and most significant effect with the 400-to-500 construct, which also corresponds very well to the accessible sites: there are a lot of them. We have no significant effect for 300 to 400, where there are also no accessible sites, and for 100 to 200 it's similar. The only thing that puzzles us a little is the 1-to-100 window construct. But as I said, we are still looking at the data, and we also have to find and define a threshold here, because what is known from humans does not fit plants. Okay, I already want to come to an end. We created software called si-Fi, which can be downloaded and which at the moment runs on Windows. I'm currently making it cross-platform with PyQt; I switched the GUI framework.
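The window constructs described in the talk can be sketched as follows: each construct keeps one 100-bp stretch of the target and replaces everything else with random filler of the same length, so all constructs have the target's length. This is a hypothetical illustration with made-up names and a fixed seed, not the design pipeline used for the real constructs. One caveat worth stating: random filler can, by chance, share short stretches with the target or other genes, so a real design would screen the filler against the target and the database.

```python
import random


def random_dna(length: int, rng: random.Random) -> str:
    """Random DNA filler of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def window_constructs(target: str, width: int = 100, seed: int = 0) -> list[tuple[int, int, str]]:
    """One construct per `width`-bp window: the window keeps the target sequence,
    everything else is replaced by random filler so the total length is unchanged."""
    rng = random.Random(seed)
    constructs = []
    for start in range(0, len(target), width):
        end = min(start + width, len(target))
        construct = random_dna(start, rng) + target[start:end] + random_dna(len(target) - end, rng)
        constructs.append((start, end, construct))
    return constructs


target = "".join(random.Random(1).choice("ACGT") for _ in range(500))  # hypothetical 500-bp target
for start, end, construct in window_constructs(target):
    print(f"window {start + 1}-{end}: {len(construct)} bp")
```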
You can use a custom database, you can find efficient siRNAs, and you can find off-target genes. These are just screenshots of how it looks. I think it's very useful, and I wanted to give this talk to show that non-programmers can also do quite useful things with Python, and that it's accepted by the community. We will soon publish an article about it. Yeah, I hope you liked it.
How fast is this? I have no idea how long such a sequence is, or how big the database is. If you have some piece of RNA and want to do something with it, how much data is there?
It's a lot of data.
Is this done in a batch process?
Yes, it runs as a batch process, and Bowtie uses database indexing to do this very fast. I don't know exactly the details, but it does it very efficiently. One check with the software takes about five seconds, so it's quite fast.
Hello. I have two questions, actually. I see you're using Biopython, right? Do you use it to do your BLASTs?
No. We used BLAST at the beginning, yes, but it performed badly for short sequences; it actually makes a lot of mistakes. It does not find all the siRNAs that are actually there. If you look at the first graph, at the blue line: if you align all these siRNAs with BLAST, you will not get this line; you will have gaps in between. It cannot find all the siRNAs, and that's why we switched to Bowtie. I only mentioned Biopython here because I use it for some handy tools, like converting, reverse complementing, opening FASTA files, and such stuff.
So you're not using it for that at all? Because you said at the beginning that you are also trying to find off-targets, right? That's also part of the software?
Yes, it's also part of it.
We do this with Bowtie. In Python I split the query sequence into 21-mers, or whatever length you like, then I take these sequences and use Bowtie to find them in my database, which is also built with Bowtie. Bowtie gives me, for instance, the hits you can see here. This is good: this is the target you want to hit. But you may have an off-target, so this also shows up: this is another gene, and in this region you might have off-targets. So this is done with Bowtie. Maybe I can just show this: if you have a query and a target that are the same and you split the query, you will find each of these substrings in the target, so here you have eight hits. But if you have a mismatch in between, an X here, you'll find only three. It's like this, and for this we use Bowtie.
And I was just wondering whether those target-site predictions are based on the secondary structure in some way?
Yes. I just started to work with it. We are trying to find thresholds, and maybe we'll also go for machine learning and things like that, but as I said, I'm not a professional programmer, so I need some time to develop it.
Great, thanks.
You said that you are using Bowtie as a short-read aligner, presumably against some short reads, but maybe I misunderstood. Where do the short reads come from?
You can paste your query sequence into the software, here; this is called the query sequence. Then in Python I just split it, and here you can choose the size.
No, that's not what I meant. If you use Bowtie, what are you running it against? Against the barley genome, or some short...?
Yes, you can. This is a very big advantage of si-Fi: with all the existing tools, online or for download, you cannot use a custom database. They are usually made for humans, because siRNAs are very often used in humans, so everything is adapted for humans, but that is useless for us. We need to check against the barley genome or whatever, so you can make a customized database and check against it. This is very important because sequencing of the barley genome is ongoing, and next week you might have a completely different sequence than last week, so we need to be very flexible here.
Okay, so Bowtie is used against the published reference genome.
Sometimes we have projects with confidential data; that's why we also don't go online at the moment.
And not against some kind of short-read libraries?
You can do that if you like: you can download them, paste them in, and use them. We wanted to be very flexible at this point, because all the other software is not very interesting for us.
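The custom-database idea from the Q&A can be sketched end to end in pure Python: read gene sequences from a FASTA file, count how many query k-mers land in each gene, and report every gene other than the intended one as a potential off-target. This is an exact-match stand-in for building a Bowtie index and aligning the siRNAs, searching the forward strand only; the function names and the toy sequences are hypothetical, and `k=3` is used only to keep the example readable (siRNAs would use 21).

```python
from collections import Counter


def parse_fasta(text: str) -> dict[str, str]:
    """Minimal FASTA parser: header ID (first word after '>') -> upper-case sequence."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks).upper()
            name = line[1:].split()[0]
            chunks = []
        elif name is None:
            raise ValueError("sequence line before the first header")
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


def hits_per_gene(query: str, database: dict[str, str], k: int = 21) -> Counter:
    """For each gene, count query k-mers found verbatim in it (exact-match stand-in
    for aligning siRNAs with Bowtie; forward strand only)."""
    index: dict[str, set[str]] = {}  # k-mer -> genes containing it
    for gene, seq in database.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(gene)
    counts: Counter = Counter()
    query = query.upper()
    for i in range(len(query) - k + 1):
        for gene in index.get(query[i:i + k], ()):
            counts[gene] += 1
    return counts


def off_targets(counts: Counter, intended: str) -> dict[str, int]:
    """Every gene other than the intended target that received at least one hit."""
    return {gene: n for gene, n in counts.items() if gene != intended}


fasta = """>gene1 intended target
ACGTTG
>gene2 family member
CGTAAA
>gene3
TTTTTT
"""
db = parse_fasta(fasta)
counts = hits_per_gene("ACGTTG", db, k=3)
print(off_targets(counts, "gene1"))  # {'gene2': 1}
```

Swapping the FASTA text for a new genome release is all it takes to re-check a design, which mirrors the flexibility argument made above.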