Welcome to another edition of RCE. Again, this is Brock Palen. You can find us online at RCE-cast.com, where you can find our entire back catalogue as well as links to Jeff's blog, our Twitter accounts, and other ways you can get a hold of us. So send us show recommendations or any other comments you might have. Also, we are in the Supercomputing season, so Jeff and myself will be there, and Jeff is also on the line. Jeff, thanks a lot for your time. Hey, Brock. Yeah, today, actually, I think we should mention, this is going to be after the fact, but it's a special spooky edition of RCEcast because we're recording on Halloween. This won't go live till after the fact, but just know that there are ghosts and goblins floating around as we are recording. Yes, it is the holiday season. Actually, on the way down to where I record this, I ran into the Dean of our College of Engineering here at the University of Michigan, and he was dressed up as a gigantic ear of corn. Outstanding. Yes. Because when I think Michigan, I think corn. Right, right. So our guest today, I'll give her an opportunity to introduce herself, but she is Boyana Norris. Boyana, why don't you go ahead and introduce yourself? Thank you. Yes. So my name is Boyana Norris. I am a faculty member in the Computer and Information Science Department at the University of Oregon. I'm very fresh here; I just joined after 13 years as a research scientist in the Math and Computer Science Division at Argonne National Lab near Chicago. Cool, and we are going to be talking to Boyana about her project today, called Orio. Now, I wonder if you could tell us what Orio is and how you got the name, because when I search for Orio I find the obvious cookie references, and I also find that Google helpfully tells me that it's a city in Spain. So what is your definition of Orio?
So I didn't know about the city in Spain, but when we were looking for a name for our new research project, we wanted something that had to do with performance. We searched a lot of languages and dictionaries online, and finally found a very obscure word in Greek that contains "orio" as part of it, the word for speed limit. I can't pronounce the whole thing, and because it also sounds like the cookie, we thought that would be a very good name for a new tool. But it is spelled O-R-I-O. So what exactly is Orio? What's the elevator pitch? Orio basically is a framework that allows people to do a couple of different things. You can define new experimental little languages to express some computation that may be in your specific domain. So instead of starting with C or C++, you can come up with new representations of your problem. It allows you to do that, and also, assuming you don't want to define a new language, it lets you generate code that's optimized for different architectures. And what we do is try to target most of the architectures that are of interest to high-performance computing, including CPUs, multicore CPUs, GPUs, and some new things we're working on with many-core processors and FPGAs. Those aren't ready yet, but we always try to think of new code targets. So you could use it in several different ways, but typically people use it to optimize performance for a particular architecture. Now, how did you get started on this? Because that sounds like a pretty wide-scale project. I mean, it fits under the umbrella of performance optimization and performance improvement, but you touched on a lot of different things there. How did this get started as a project? Yeah, it actually was inspired, it started, out of sheer frustration.
So Bill Gropp, who is currently the Thomas M. Siebel Chair in Computer Science at UIUC, back at Argonne when he was working with me in 2006, expressed the desire to come up with a way to specify different optimizations that you could do on your own code. So suppose you implemented something in C: you write your loops, and you know exactly how they could be optimized, and the compiler doesn't do it, for some reason. He wanted to be able to explicitly force certain optimizations. So the first goal we had was to express what can be done on a piece of code in a way that lets us automate it. And it was not very complicated; it was basically, here, unroll these loops, or do a few other things that compilers typically should be able to do, but maybe don't. I should point out we've actually had Dr. Gropp on the show before, and he's a friend of the show. So that's kind of cool; we're going full circle on this. So, define your own language. What would you see as the benefit of telling a graduate student or researcher to go define your own language, which we will then turn into something that a compiler can turn into machine code? Yeah, that's a great question. The current languages people use mostly for their HPC projects, which are usually C, C++, and Fortran, definitely allow you to express all your computations, obviously. But sometimes, in order to optimize something more fully, if you knew, for example, that you're working with matrices, you could actually employ what you learned in linear algebra and perform some matrix transformations before you have loops. So that's one high-level example of where you may want to give the compiler or the tool some extra information about the objects you're working with.
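The original idea, expressing an optimization like loop unrolling explicitly so it can be applied automatically, can be sketched in a few lines. This is only an illustrative source-to-source generator in Python, not Orio's actual implementation; the function name and the naive string substitution are my own simplifications.

```python
def unroll(var, lo, hi, body, factor):
    """Emit C source for `for (var=lo; var<hi; var++) body`
    with the body replicated `factor` times, plus a cleanup
    loop for the leftover iterations (a classic compiler
    transform, here forced explicitly rather than hoped for).
    Note: the single-character replace below is deliberately
    naive; a real tool rewrites a parsed AST, not text."""
    lines = [f"for ({var} = {lo}; {var} + {factor - 1} < {hi}; {var} += {factor}) {{"]
    for k in range(factor):
        lines.append("  " + body.replace(var, f"({var} + {k})"))
    lines.append("}")
    lines.append(f"for (; {var} < {hi}; {var}++) {{ {body} }}")
    return "\n".join(lines)

print(unroll("i", "0", "n", "y[i] += a * x[i];", 4))
```

The point is simply that once an optimization is written down as a parameterized code transformation, a tool can apply it on demand even when a compiler declines to.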
And this is where, in different domains, you may not have a matrix, but you may have some other higher-level description of what it is you're doing that could potentially enable some transformations that aren't possible at the level of C or Fortran. So then, what do we define the language in? Is it like a normal parser generator, if I'm using the right word? Does it have its own, or is it something we've all seen before? Well, okay, so there are a couple of choices. Orio already has a few; it doesn't have to be a domain language, so it already supports a few types of inputs. But if you wanted to add a new one, yes, you'd have to write your own little parser. One of the philosophies behind the tool was to have as little baggage as possible. So it's actually all in Python, and we use the PLY parser-generator tool. And so it's fairly trivial to define your parser. You do have to know how to do that. The reason it's a little bit easier than some real compiler projects is exactly that we're sticking to Python and very, very simple tools for expressing your syntax and implementing the parser. So you kind of hit on my exact upcoming question, which was: how does this compare to something like the Clang compiler, which prides itself on being able to put all kinds of hooks all over the place and do all kinds of transformations once the software has been parsed, where you can directly manipulate the data structures for the emitted code and things like that? How does this compare? This framework is a lot simpler and a lot more open. You could basically put your own modules in at any point. So, like we mentioned, you can do your new language, for example, whereas if you take projects like Clang, they limit their input languages to, say, C and C++.
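To give a feel for how small such a frontend can be, here is a toy parser for a "little language" of additive expressions, hand-rolled in plain Python for brevity. Orio itself builds its parsers with the PLY parser-generator library; the grammar, token pattern, and AST shape below are my own illustration, not Orio's.

```python
import re

# Toy grammar:  expr := atom (('+'|'-') atom)*
#               atom := NUMBER | NAME
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[+\-])")

def tokenize(src):
    """Split the source into numbers, names, and +/- operators."""
    pos, toks = 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"bad input at position {pos}")
        toks.append(m.group(1))
        pos = m.end()
    return toks

def parse(src):
    """Build a left-associative AST of nested (op, lhs, rhs) tuples."""
    toks = tokenize(src)
    ast = ("atom", toks[0])
    i = 1
    while i < len(toks):
        ast = (toks[i], ast, ("atom", toks[i + 1]))
        i += 2
    return ast

print(parse("a + 2 - b"))
```

A domain-specific frontend for Orio is conceptually this plus a code generator walking the resulting tree, which is why defining a new experimental input language is a rapid-prototyping exercise rather than a compiler-engineering project.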
And so we let people specify new syntax more easily, I think. Now, that could be a pretty subjective opinion, and that is not to say that you couldn't do it in one of those larger infrastructures if you were using a real compiler toolkit. But we believe that enabling rapid prototyping is one of the main goals here, so that you can make a decision whether pursuing a certain type of language or a certain type of transformation is really worth your time. So: get results quickly in order to decide whether what you're thinking actually makes sense. So Jeff mentioned Clang, but we also had the Chapel language from Cray on the show, and they are like this intermediate language: they run a parser to generate, I think, C or C++, which they then feed to a traditional compiler. Are there any projects like that, an abstract science-specific language, using Orio? Well, no. There's a big difference between full-language approaches like Chapel and what we're trying to do with Orio. With a language, if you say, here's this amazing new language, and I really like Chapel, by the way, then if you're a developer, you now have to make a decision: do I rewrite all my code right now in Chapel? And how are you going to decide whether that is really something you should do or not? What Orio lets you do is not rewrite your whole code; you replace small pieces of your code with something that was generated, potentially, from a different language. So it's a different approach, and it lets you hook up the resulting code to the rest of your application that you've already implemented in some legacy language, which I don't believe you can do with Chapel at the moment. Okay, so going back just a little bit here, it sounds like Orio is a tool that you invoke to generate code that you then compile with an actual compiler.
But it sounds like there's a lot of intelligence and smarts in what it actually generates. Is this static analysis of code, or does it do any dynamic analysis? How does it determine, oh, I see this code, and this is how I will make it better? Give us a little insight into what kinds of things you do. Yeah, it's nice that you bring that up, because it's actually a pretty dumb approach at the moment. It's not very smart, in that it will attempt many, many, many optimizations. Basically, the way it works is that it knows about certain things it can optimize; for loops, it would apply typical compiler loop transformations. Then it employs search to determine which version should be kept, that is, which is the best-performing version. And if you did that on even a not very complicated piece of code, with a lot of transformations, that can result in exponentially many versions. So part of the smarts is actually not so much in being clever about how you optimize the code, because you just consider all optimizations. The smarts are mainly in how you search that space. There are some numerical optimization algorithms implemented in Orio that help you explore the space without having to test each and every code variant that we generate. Now, at what level do you search for optimizations? Is it just at the block level, the loop level, or do you do cross-function analysis? How deep does it go? It is not cross-function yet, although we are thinking about this. If you're talking about input at the level of C loops, for example, then really, portions of functions is what you're typically thinking of. However, when you start talking about domain languages that express something at a higher level, and MATLAB is a good example because most people know it,
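The "generate variants, then search" loop described above can be sketched as follows. This is an illustrative Python model, not Orio's code: the parameter names, the synthetic cost function standing in for an empirical timing run, and the random sampling standing in for Orio's numerical search algorithms are all my own assumptions.

```python
import itertools
import random

# Toy variant space: unroll factor x tile size x vectorize flag.
# The cross product is 4 * 4 * 2 = 32 variants; real spaces with
# more transformations grow exponentially, hence the need for search.
space = {
    "unroll": [1, 2, 4, 8],
    "tile":   [16, 32, 64, 128],
    "simd":   [False, True],
}

def cost(v):
    """Stand-in for 'generate, compile, run, and time this variant'."""
    return abs(v["unroll"] - 4) + abs(v["tile"] - 64) / 16 + (0 if v["simd"] else 1)

def search(space, budget=None, seed=0):
    """Pick the cheapest variant; with a budget, sample the space
    instead of testing every point (losing the optimality guarantee)."""
    names = list(space)
    points = [dict(zip(names, p)) for p in itertools.product(*space.values())]
    if budget is not None and budget < len(points):
        points = random.Random(seed).sample(points, budget)
    return min(points, key=cost)

print(search(space))             # exhaustive over all 32 variants
print(search(space, budget=10))  # budget-limited, best of a sample
```

The budget parameter mirrors the trade-off discussed later in the interview: an exhaustive search guarantees the best variant but may be infeasible, while a limited search returns quickly with a merely good one.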
So if you had your input in MATLAB, you could express some pretty complicated linear algebra computations with just a few lines of matrix operations, right? That can result in potentially complicated implementations, and the analysis to generate them is pretty extensive, but we don't have multiple functions as input. We have a very simple, high-level computation specification, but the code that results from it may be complex. So you've been talking about optimizations for passing to a compiler, but you also talk about domain-specific languages. I could see this enabling a researcher to make a language that lets them develop something faster. Like you said with the MATLAB example, you can do in three lines what, if you had to write it in C, would be very complex, because you'd have to manually loop and everything else. So is the focus mostly hardware performance, or is the focus maybe application-developer performance, in terms of the amount of time they have to think? It's actually both. And maybe, I don't know if I'm too idealistic about this, but I believe you can do both at the same time. And it's pretty justified, actually, when you think about it, because if you give enough information to a tool to perform more code transformations, that's great. If you encode things in very low-level C, with sequential semantics, it's very hard, for example, to parallelize. But if you wrote it in a way that allows us to decide how it can be parallelized better, that would result both in you taking less time to write it and, ultimately, achieving much better performance on the target architecture. So I think it really is aimed at both; you go at it from both directions. So I'm going to circle back just a second. I realized I should have asked you this question before. You have this big database of all these optimizations and whatnot. Where do you get all of those from?
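As a concrete picture of "a one-line matrix statement expanding into complex code", here is a minimal Python sketch that turns a matrix-multiply specification into the triple loop a C programmer would otherwise write by hand. This is purely illustrative; Orio's actual high-level frontends work differently.

```python
# Expand a one-line matrix statement, C = A * B for n x n matrices,
# into explicit nested C loops.  Knowing the operands are matrices is
# what makes higher-level transformations (loop interchange, tiling,
# parallelization of the i/j loops) easy to reason about.
def gen_matmul(C, A, B, n="n"):
    return "\n".join([
        f"for (int i = 0; i < {n}; i++)",
        f"  for (int j = 0; j < {n}; j++) {{",
        f"    {C}[i][j] = 0;",
        f"    for (int k = 0; k < {n}; k++)",
        f"      {C}[i][j] += {A}[i][k] * {B}[k][j];",
        "  }",
    ])

print(gen_matmul("C", "A", "B"))
```

One line of specification becomes three nested loops here; with the optimizations discussed above (unrolling, tiling, padding) applied on top, the generated code quickly grows far past what anyone would want to maintain by hand.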
Many of them are really directly out of what compilers already do. We're not reinventing the wheel there. There are reasons why compilers don't apply them to a certain piece of code, and this is just a way to remedy that. And yes, you do potentially rewrite your computation somewhat to be able to enable that. Others we just think about, and we try to figure out, okay, how could this be optimized? So some of it is just research, the usual thinking process of: here's a problem, what types of optimizations can you do? And the rest is taking everything that compilers do today and being able to basically enforce it. Now, going back all the way to the very beginning: why don't compilers do this? If these are things that compilers know how to do, why don't they do them? It's not the compiler's fault; it's mainly the language's fault. When you're a compiler and you have this very complex control- and data-flow situation, you have to be conservative in your analysis in order to ensure that you generate correct code, or you cheat, in a way, by not doing this kind of global analysis at all and focusing on relatively simple portions of the code. And it's true that you can achieve some similar effects with some compilers: you can guide them with pragmas, if they support them, and achieve, to an extent, similar results for a small portion of the code. So basically, it's really hard to have accurate program analysis, and that, of course, is a prerequisite for being able to optimize the code. So let's talk a little bit about the things it looks at and supports right now. Does it take any consideration of the type of hardware you're on, like, I should be rewriting this to pad my data structure so that the compiler auto-vectorizes it for AVX, or something like that? Right. Well, that's what you have to do if you're using a standard compiler, and our philosophy is to never, ever have you do that.
So you should write your code in the simplest way possible, and the tool is going to explore the optimizations. Whether you do padding or not might be one of them. That's kind of the goal here: to have a simple, non-architecture-specific input and then be able to generate the optimizations from that. So then, do you have something like a hardware database? Can I have my simple code, describing what I'm trying to solve in the simplest way, and on platform A it does, say, 32-byte alignment, but on platform B it now does the new 64-byte alignment? Yeah, in a way. When you do the process of optimization, which is also known as auto-tuning, you generate all these different versions, and you will experimentally arrive at that data. We don't actually store it right now. You do get that information; we just don't reuse it, and we're actually working at the moment on being smarter about that. So suppose you've tuned some piece of code on a given architecture, and you then later tune a similar code on the same architecture: it's possible to take advantage of the fact that you already know what potentially good optimizations are. But at the moment we don't; you just do it all over from scratch. So is this kind of like an out-of-band PGO, profile-guided optimization? Yeah, it is profile-guided; right now it just does not take advantage of past results, but we are actually working on enabling that. And the outcome of that will be not that you would be able to get better code, necessarily, but that you would be able to get it faster. Now, you mentioned earlier, too, particularly when you have your own domain-specific language, having some high-level concept that can be parallelized under the covers. What kind of parameters do you accept for that?
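The reuse of past tuning results described above, which Orio does not do yet per this interview, might look something like the following sketch. Everything here, the class, its methods, and the keying scheme, is hypothetical, intended only to illustrate the idea of keying results on the kernel source plus an architecture tag.

```python
import hashlib
import platform

# Hypothetical cache of auto-tuning outcomes.  A later tuning run of
# the same kernel on the same architecture could start from the stored
# parameters instead of searching from scratch, getting equally good
# code faster (not better code, as noted in the interview).
class TuningCache:
    def __init__(self):
        self.store = {}

    def _key(self, kernel_src, arch=None):
        arch = arch or platform.machine()
        digest = hashlib.sha256(kernel_src.encode()).hexdigest()[:16]
        return (digest, arch)

    def put(self, kernel_src, best_params, arch=None):
        self.store[self._key(kernel_src, arch)] = best_params

    def get(self, kernel_src, arch=None):
        return self.store.get(self._key(kernel_src, arch))

cache = TuningCache()
cache.put("for(i=0;i<n;i++) y[i]+=a*x[i];", {"unroll": 4}, arch="x86_64")
print(cache.get("for(i=0;i<n;i++) y[i]+=a*x[i];", arch="x86_64"))
```

Keying on the architecture matters because, as the alignment example above shows, the best parameters on platform A are generally not the best on platform B.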
For example, and again I'm going to touch on hardware, like Brock just did, let's say I have some high-level operation like a matrix multiply. I don't want to parallelize it for a specific platform. Do you understand things like MPI or threads, and how wide you want the parallelization to be? How does that work? Currently we are mostly at the thread level, and of course GPUs, where a kind of parallelization also happens, but we have started thinking about MPI; it's a much more complicated issue, obviously. So I would say yes for threading and for GPU-style architectures, not yet for MPI, and the way you express it is not any different; that's the goal. I don't think that for MPI it can remain as simple, and that's part of our struggle there: how do you describe your problem in a way that makes it MPI-parallelizable? We haven't cracked that; we may not be able to do it, but we are thinking about it. All right, along those lines, with threads and whatnot: since everything today is NUMA, do you pay attention to any of those effects? Do you care about processor or memory affinity and these kinds of things? Not explicitly, but implicitly, yes, because most of the, for example, tiling optimizations for loop-based computations end up targeting the memory hierarchy, so you are going to address these issues. We can't necessarily control it directly; for some architectures you can, but not for all. But given that we're doing empirical tuning, you will generate possibly close to the best variant by doing these optimizations that take all of these parameters into account. So if we want to use our own domain-specific language, we'd have to write our own parser, but what languages work out of the box? If I have existing Fortran, C, or C++, does Orio understand that, so I can just feed it in there and it will give me back what it thinks an optimized rewrite of that code would be?
There are actually other projects that try to do that; the CHiLL project at the University of Utah can accept C and, I believe, Fortran input. We on purpose don't want to do that with Orio, because in reality most implementations are not ready for optimization, so we actually do want people to rewrite them. So far, the most versatile way of doing that has been to have people rewrite them using syntax that is basically a subset of C, and this C with restrictions is what we call the loop language. It looks very similar to what you started with, most likely, but it makes it completely clear to the transformation engine that certain things are possible. So yes, a rewrite is always required, because Orio at the moment ignores the original code completely. Now, we have started planning a more automated way of getting that input to Orio: basically, looking at your existing code with some of the traditional compiler tools. I have long experience with the ROSE compiler toolkit, so that's my usual go-to tool, but basically, we'd parse your original code and extract the parts that Orio should be focusing on. So when you say restrictions, what kind of restrictions are we talking about here? Anything that a scientific programmer would care about, or not? Usually not; I'm hoping not. So far it's been possible to do pretty much everything we've looked at that comes primarily out of scientific applications; the really time-consuming portions of many scientific applications are not very complex, I mean, not in terms of code complexity. So, sorry, it just seems like Orio touches a lot of different things. So, Cilk Plus, the Cilk syntax for describing that this is an array instead of looping over it: would any type of syntax like that benefit Orio, to make it more aware of the actual structure of data and where it could parallelize it?
Oh, absolutely, yeah, and I think that's something we want to pursue much more actively in the future. Right now, we've integrated it with the MATLAB-like language that the Build-to-Order compiler at the University of Colorado has, and we've implemented a simple language for finite-difference-type computations on structured grids. But really, something that's at the array level would also be helpful, I think, unless you introduce some really complex indexing and such things. So the answer is definitely yes. So, I'm sorry, just to follow up: could you give us a concrete example of one of these restrictions? For example, do you say no aliasing? Because I know, for example, that's one of the things the Fortran language uses to enable its compilers to optimize highly, and things like that. Right, so the conventions are: no aliasing. We are thinking of allowing some aliasing, but you'd have to explicitly say what it is; at the moment it's no aliasing, so we can assume that for the input, which helps a lot, obviously. For the index expressions, when you're indexing arrays, you can't use arbitrary types of index expressions; there are limits to that. And with C++, if you get really complicated user-defined types, we obviously can't handle those as input, so there has to be some sort of translation between what you're really doing in terms of floating-point computations and how your data may be organized in a more complex class hierarchy. So there may be glue code between what we are optimizing and what the rest of the application is seeing; that's another side effect of that.
Okay, so you don't expect the restrictions to cause any type of issue for most scientific programmers. You also talked, very early on, about how Orio doesn't force you to rewrite your entire application. At what level do I bolt my input-parser code, which I assume I'm not running through Orio, to code that has been run through Orio? Is it at the function level, and how do I define those so that the one can call the other and have all the symbols and everything match up? Yeah, exactly, so a very good point there. In some cases it's really straightforward, because you have the same arrays going through both. The level, first of all, depends on what part of your code you're replacing. If you're replacing something that calls other functions with Orio-generated code, then that code, together with everything it calls, will not be executed anymore. So I can't say, oh, it's just function level or loop level, because it could end up replacing larger portions of code. The answer is that you basically have to decide: okay, here is the part of the code that should be optimized; express it in a language Orio understands, and then that may potentially replace a single loop or a collection of functions. So that's one aspect of it. Then, regarding the variables: suppose you use the loop language, your input language is C or Fortran, and there's a pretty straightforward mapping between your arrays and what the Orio input is using. Then it's all pretty much automatic. Now, if it's not a one-to-one match and you have to copy data structures around, then at the moment that's a manual step; you actually have to do the gluing together of the generated code and your existing application yourself. Although, as I said earlier, integrating some static analysis of the original code would help with that, because we'd be able to at least automate some of that mapping by having parsed your actual legacy implementation,
so that you don't have to manually convert data structures. I'm not saying it's possible in all cases, but it may be possible in a lot of cases. So, that's an interesting point you just brought up, and it made me think: what is the most important metric that you are trying to optimize for? Is it wall-clock execution time? At the moment, wall clock is the focus, although power is becoming more and more interesting, and so we have some initial, I'm not saying this is ready for prime time, but there have been some initial studies where we try to do multi-objective optimization, where you may consider both time and power. We just recently added measurement of hardware counters to this tuning process, so that you could potentially define a completely different metric based on those hardware counters, if you're interested. So maybe you want to minimize cache misses, or some derived quantities. It will be possible at some point in the near future to do that, but right now, yes: wall-clock time, and, to a limited extent, power. Okay, now, going off in a slightly different direction here: when I invoke Orio, I have a bunch of code and I feed it in there. What kind of information does it give me? Does it say, oh, I took your function foo and I wasn't able to do anything, but I took bar and, boy, I optimized the heck out of that, and everything underneath the bar call tree has been replaced? What do you give back to the developer? Well, we may have some work to do there, because I don't know if it's quite as understandable as you just described. What we do is take the input and then report: okay, here are the parameters we tried, and the parameters basically refer to what optimizations were applied, and here's the time it took to run this version. So we have a complete list of those options. We could also preserve all of these versions; typically we don't, but
you can ask it to keep them around, so you could look at everything it attempted to do. But that, again, is not nearly as readable as what you describe, so maybe we should think about how we actually explain the result to the user, as opposed to saying: here, this is the best, use it, which is pretty much what you get now. Okay, so let's talk about the actual workflow. That was a good example, but if I was sitting down to write a new application, and I was planning on using Orio, what would I do? What would be my steps, and how would I iterate to actually get a working scientific application? Well, you say new, so that's a little bit different from our usual scenario, but I suppose you could do the same kinds of things as you do for existing ones. You basically have some implementation, and you have perhaps profiled it, so you're aware of certain parts of the implementation that are not performing as well as you want. And then you have to decide: typically, if it's a C or Fortran code, you have some set of loops that are not doing as well as you think they should be, and then you can rewrite them and include your Orio input, specifying the same computation, as a comment in the original, non-Orio implementation, C or Fortran for example. So that's part of the input you provide, and it's just a comment, so you can keep using your implementation completely separately from Orio; you don't have to tie yourself to it. Then, in addition to that, you need a bit of extra input. At the moment you have to generate, in other words write, what we call a tuning spec, and inside the tuning spec you tell us a little bit more about that particular code region, for example, what you want to try to optimize. Now, we're automating the generation of some of the tuning spec, for example the types of transformations that should be allowed, but you can also write those yourself,
and so you need to learn a little bit about what Orio can do, to enumerate those options. Then you have to tell us a little bit more about the data. You have to give some information about, okay, what array sizes, for example, we should optimize for, because there are different types of optimizations based on the size of your data; if it fits in cache, you'll get different code from something that doesn't. So we need a description of the inputs that you want to optimize over, and there's no way to get that out of your existing code. And then a little bit more about how you build it and all that, which you already have in your build system, and that's about it. That's the input to Orio, and then you apply it basically the same way you do a preprocessor: it looks like a compiler command line. You give it your code, which has this comment that Orio recognizes in it, and you give it a tuning spec, which tells it what transformations to do and what inputs to use. Then it goes off and does something for quite a while, and ends up with a new file that you can now use instead of the original code fragment that you annotated with the Orio command. You made a glib comment right there at the end: you said it goes off and does something for quite a while. Does the Orio process take a while to do the searching and the benchmarking and figuring out which version is best for this particular code, and things like that? Well, as with everything in life, there are trade-offs. If you want the optimal solution, guaranteed optimal, and it's a non-trivial computation, not tiny, then that may not be feasible, or you may have to pay with a long wait. However, you can always limit the search: how long is it going to spend looking for the best version? You can enforce any limit you like and not explore exhaustively, but then you're of course
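To summarize the kind of information a tuning spec carries, here is a stand-in written as a plain Python dict. Orio's real tuning spec has its own syntax, and the parameter names, ranges, and option values below are all hypothetical; the point is only the categories of input described above: transformations and parameter ranges to explore, problem sizes to tune for, how to build each variant, and how long to search.

```python
# Hypothetical analogue of an Orio tuning spec (not Orio syntax).
tuning_spec = {
    "performance_params": {
        "UF":  list(range(1, 9)),       # unroll factors to try
        "PAR": [True, False],           # whether to emit threaded code
    },
    "input_params": {
        "N": [10**4, 10**6],            # array sizes to optimize over;
                                        # cache-resident vs. not gives
                                        # different best variants
    },
    "build": {
        "command": "gcc -O3 -fopenmp",  # how each variant is compiled
    },
    "search": {
        "exhaustive": False,            # sample the space instead
        "time_limit": 600,              # seconds to spend searching
    },
}

n_variants = (len(tuning_spec["performance_params"]["UF"])
              * len(tuning_spec["performance_params"]["PAR"]))
print(n_variants, "code variants per input size")
```

Even this tiny spec yields 16 variants per input size, which is why the time limit under "search" matters: it bounds the "goes off and does something for quite a while" step at the cost of the optimality guarantee.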
not guaranteed that it's the optimal solution. So, yeah, you get to decide how long you're willing to search. But keep in mind that you don't do this very often; you would only ever do it again if you change your code's semantics or if your underlying system changes. So, you mentioned earlier that Orio is written in Python. Does that also imply that Orio is open source? It definitely is, and we've had some external contributors, not many yet, I'm hoping for more. So yes, and it will actually probably move very soon to a more accessible site such as GitHub; that's coming up very shortly, so it will be much easier for developers to jump in. And my unbiased opinion is that it's pretty easy to pick up and start adding to. We have simple interfaces and some modules that are basically empty template types of modules that you copy off and start doing your own thing with. And it's designed to be very dynamic, in that you could add new functionality in a very modular fashion, so that you don't have to go and change a bunch of existing code in the implementation in order to add your new functionality and test it. It's not perfect, but it's definitely a much lower bar to entry than a real compiler toolkit that's meant for production use. Now, you've done this a lot in our interview here: you touched on exactly what I want to ask about. Something I frequently ask many developer teams is: what version control do you use, and why? And this is actually interesting, because this is the first time I've gotten this. I'm looking at your web page today, which will probably change by the time this comes out, but I get a little view into the past, so to speak, and you're using Subversion, but you just mentioned GitHub. Yeah, well, Subversion is better than CVS, right? And this has been in use for a few years, right? So think of Subversion as being the next step
And the next step after that: I'm not really all that picky, but Git is definitely better than Subversion, in my opinion, for a multi-developer project, so that's kind of the next logical step.

Okay, so since we're on the path of development here, what are the futures that you want to work on and see in Orio?

So, one I already mentioned that's not directly Orio: I want to make it even easier for people to use. I'm not necessarily saying adding new languages or new transformations, just improvements for the users of what already exists, and what that means is automating more of what people have to do manually right now. So I definitely want more automation in tuning spec creation, and also, you brought that up, the output right now is not ideal for usability, so we're working on that. Another aspect of making it much easier to use is to couple it with other tools that let us identify which parts of your code should be auto-tuned with Orio. That work in progress is really exciting, because we've been working with the TAU group here at the University of Oregon to integrate more detailed measurement and collection of performance data, which I think will ultimately let us automate a lot more of the difficult part of the process: figuring out exactly what part of your code you should be optimizing, and how. Then, from the research point of view, I definitely want to keep thinking about what new languages we can implement in this. And if people actually get interested in using it for production, we have to focus on correctness; we do validation right now, but I think you can prevent the need for some of it by doing more analysis during the transformations. So that's another future direction. And there is always an ongoing effort to target different architectures.
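The measurement-guided step mentioned above, deciding which region of code deserves tuning at all, can be mimicked with nothing more than wall-clock timing. In the real workflow that data would come from a profiler such as TAU; the region labels and workloads below are made up purely to show the idea of letting measurements pick the tuning target.

```python
import time

# Toy measurement-guided step: time each candidate region, then report
# the dominant one as the region worth annotating for tuning. In practice
# this data would come from a real profiler such as TAU; the labels and
# workloads here are invented for illustration.

def timed(label, fn, ledger):
    """Run fn once and accumulate its wall-clock time under label."""
    t0 = time.perf_counter()
    fn()
    ledger[label] = ledger.get(label, 0.0) + (time.perf_counter() - t0)

ledger = {}
timed("dense_kernel", lambda: sum(i * i for i in range(200000)), ledger)
timed("setup", lambda: list(range(1000)), ledger)

# The region with the largest share of runtime is the tuning candidate.
hottest = max(ledger, key=ledger.get)
print("annotate this region first:", hottest)
```

With real profile data in the ledger, the same `max` picks out the hot spot automatically, which is exactly the manual step the TAU integration aims to remove.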
On the architecture side: I mentioned we do code generation for GPUs, both with CUDA and OpenCL, but we still have a lot of ground to cover in the types of optimizations. Intel Phi we do through OpenCL now, but again, you can do a lot more there. And there's some initial work on FPGAs, which is really preliminary, that I think may turn out to be quite useful in the long term too.

Now, one thing we kind of forgot to ask you in the beginning: what is the license of your code?

It's a BSD-style open source license. The only requirement is to retain the copyright notice, and you can do pretty much anything you want with it. I've noticed that some people have done extensions or used it for some other purpose, so it's completely open, and I don't believe there are many restrictions that would prevent people from doing whatever they want with it.

Is that the same with the generated code as well?

We have absolutely no claim on the generated code; it belongs to whoever is running it. That is your code.

Perfect.

So we don't get it back, we don't see it. It would be nice if we could get it back, but we would only do that with the owner's agreement.

So what is one of the strangest, most unique, or unexpected uses of Orio you've ever seen?
Let's see. There are a couple that I think contend for first place. One was by someone who was interested in being able to create millions or billions of code variants to study performance properties. He didn't care what the code was doing; it was just the ability to generate so many related code variants, which could then be analyzed, basically to develop machine learning approaches to study them. That was interesting, and had nothing to do with code optimization per se. The other one is that we are trying to optimize some purely non-computational portions of a code, such as basic memory operations, copying and such, within MPI. That is not the typical optimization target, but again, there are some promising results there, so maybe that's something we will focus on more in the future: instead of focusing only on flops, we could also optimize memory operations.

Users should be aware that this is a very aggressive optimizer. If you think that -O3 for conventional compilers is pretty aggressive, then Orio pursues more like -O300. Even though it does quite a bit of validation of the generated code, you should always be aware that some extra care needs to be taken to validate the final results of the computation once you use your optimized code.

Okay, Boyana, thank you very much for your time. Where can people find out about and download Orio?

People can Google for "Orio performance", and that will take them to the Orio web page, which has instructions on how to download it and also directs people to the GitHub source repository, where they can get the latest version or join us as developers.

Great. Well, thank you very much for your time. This has been great.

Well, thank you so much for having me.