My latest research has been on tackling the genuinely difficult problem of writing a data-parallel compiler that itself runs on the GPU, and doing it in a way that is very general, very simple, and very portable and performant, without having to become a low-level hardware hacker. So the goal is to avoid lots of low-level work while still getting very good low-level performance, maintaining a high level of programming and a high level of thought, in a domain that is very difficult.

The results of that research, which should be published this year, are that I now have what I believe is the only complete compiler where the compiler itself actually runs on the GPU. It is written in a style that takes only 17 lines of code, which is ridiculously small, and those 17 lines use only array operations and function composition, with basically no abstraction on top of that. It doesn't use any of the other techniques you would typically see in functional or conventional programming: no if statements, no branching control flow, no explicit looping constructs. And yet it is able to express things that you would typically write with structural recursion, and it does so a lot faster, with a lot less memory, and with much simpler code.

So it's sort of a trifecta, which doesn't happen very often in computer science: you get simpler code that is also faster, easier to work with, more portable, and more high level. I'm pretty happy with it.
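To give a feel for this style (this is my own minimal sketch using NumPy as the array language, not the actual compiler described above), here is a problem you would normally solve with structural recursion, the maximum nesting depth of a string of parentheses, expressed purely with array operations and no if statements, branches, or explicit loops:

```python
import numpy as np

def max_depth(s: str) -> int:
    """Maximum nesting depth of a balanced parenthesis string,
    computed with array operations only: no recursion, no branching."""
    # View the string as an array of byte codes.
    codes = np.frombuffer(s.encode(), dtype=np.uint8)
    # Map '(' to +1 and ')' to -1 with a data-parallel select.
    deltas = np.where(codes == ord('('), 1, -1)
    # The running depth is a prefix sum (scan); the answer is its maximum.
    return int(np.cumsum(deltas).max())
```

For example, `max_depth("(()(()))")` walks the depths 1, 2, 1, 2, 3, 2, 1, 0 via the scan and returns 3. The recursion that a tree-walking solution would perform is flattened into a scan, which is exactly the kind of transformation that makes such code data parallel.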