Welcome, everyone. This talk is part of what we call FCCI Tech, which started off as a sort of internal seminar series to teach our new hires how things work within Science IT. But then we expanded and realized there's no reason not to invite our counterparts at other universities in Finland. And for that matter, why not invite everyone in the world? So it's basically become a series about how we support research, both very technically and at a very high level.

Today our talk is about the research software engineer service. This is a summary of where it's come from, what we do, and where it will go in the future. As I already said, this talk will be recorded, so please ask your questions in the HackMD and someone will answer there. On the agenda: we'll talk about the start of the group, the cases, and then where we're going in the future. So let's begin.

These days, software is behind almost everything. Certainly almost any field of science can use software, but so can the humanities and social sciences and, well, whatever you can imagine. In any field, there is some part where someone needs to do their own programming in order to accomplish their work. This is a fundamental shift. As someone said recently, I forget who, one of the other presenters: custom software is like the new math. Everyone has to have some minimum basis in order to accomplish their work. And it's not just programming, but also many other skills that go around it: scripting, automation, data management, open source, code reuse, reproducibility. These are all really important, and they're really different skills. Just like when you learn chemistry, there's a difference between the chemistry you learn in your classes and the practicalities of the lab work.
So just the same, there's a big difference between the programming that you learn in a basic programming course and all of these extra tools that go around computational science to make it actually enjoyable and useful to the world.

If you can't make good software, then you have a lot of problems. First off, there's productivity. You'll end up far less efficient than others, and you fall more and more behind. Then there's a big impact problem. If you know that your code is really terrible, then you're probably not going to want to share it. And for that matter, you probably can't even reuse it yourself in six months. If the code's acceptable, then maybe you can share it, but it's really hard for others to use it. And a good method or a good article that needs software written for it, where that software is not available to others, practically doesn't exist. If others can't reproduce your work and use it on their own data, then you're going to get only a fraction of the citations you could otherwise.

This leads to the open science problem. These days, society has realized that it doesn't make sense to pour funding into research which other people can't reuse. The same goes for code, and all the things I said above directly impact this. And then it turns into a societal problem. Not in the simple open-science sense that research can't be reused, but in that when not everyone has the same chance to succeed, that really doesn't match our values as a university in society.

So from this came the Aalto research software engineer concept. I first heard of the RSE idea around 2018, and it took me a while to figure out just what this thing people were talking about is. The initial idea came in December 2018, where the concept was: we need people to support our researchers better. What if we hire fresh PhDs who would like to make a transition to industry, but who think their skills aren't quite up to date?
They can do the programming, but they don't know all the standard tools and practices that industry would need. So we can hire these people for a few years, basically as a postdoc. We train them, they train other researchers and support them, and then they can move on with their careers.

In 2019, the Dean of the School of Science approved seed funding, assuming that other departments came on board, and then Computer Science, NBE, and Physics did. With this, we were able to hire two people full time. Our first hire was in November 2020, delayed for, well, the obvious reasons. And by January 2021, we were at full power with two people. Unlike the initial plan, we've hired some of the most experienced scientists you can imagine. Not just people that have finished a PhD and been the programming expert there, but people that have done postdocs basically as research software engineers. Their whole career has been built around being the experts in any software that you might imagine. So this is actually a much better situation than we would have imagined at the start of the idea.

With that being said, we're going to some example cases. The two people we hired are Jarno Rantaharju and Marijn van Vliet, and they are here to present the cases. Through these cases, you should figure out what it means to be a research software engineer. So that said, Jarno, please take the screen.

All right. Thank you. So the first case study is an adventure in software maintenance. There are a lot of these research projects where you write a piece of software for a project, publish a paper, or maybe do an entire multi-year project. And then at some point, it gets abandoned. Either it's written by a postdoc, the postdoc leaves, and nobody is updating the code anymore. Or it's a researcher who is a long-term staff member but is just out of that project for a while. Maybe that project is on hiatus for a number of reasons.
After half a year, or two years, it can very easily happen that the software becomes unusable. You basically run into the problem you see in the image: you try to install it, and something is missing. Something has changed and nothing works anymore. This project was that kind of problem. It started with someone asking in the garage, the SciComp garage, to get an older version of a library installed on Triton. But that turned out to be much more complicated than it sounds. After looking at the problem for a while, it turned from an "install this library" problem into an "install this application" problem, and then into a "bring this code up to date" problem.

The problem, essentially, is that when software gets old and the person who knows how it works goes away, it wastes research time. It wastes the time of the researchers because it is hard to use. And it reduces the possibilities of future work. In half a year, in a year, in two years, it is even harder to figure out, even if you wrote the software yourself, how you used it and how you managed to install it in the first place.

Okay. But there are many ways around this. One of those is to have the research software engineers first bring the code up to date, if it's already fallen behind. We are experts in software development tools, the tools that you originally use to write software and install it. So even if the person who originally wrote this is gone, we can probably salvage the software and bring it to where you can use it easily. But secondly, we're also experts in best practices and tools that can make this a lot easier.
So in fact, in the end, we went down from several tens of lines of commands that you need to write, and libraries that you need to install manually, to three commands where you set up an environment and then run pip to install the application. That's pip in case you want to use the Python interface, or a different command for other interfaces. In any case, using these best practices also makes the code more future-proof, so it is in fact easier to maintain going forward.

Now, the other solution is that we do, in fact, offer a software maintenance service, so we can take your software under our wing. This is showing 10 repositories. We probably help maintain four or five different large software packages right now. And good maintenance of your software is really important. It makes the difference between trying to reuse something that's hard to use, reinventing the wheel every time there is a new postdoc or a new student who rewrites everything because they can't figure out what the previous person wrote, and building on top of your previous work and actually getting things to move forward much faster and much more painlessly. So I'll throw the ball to Marijn for the next example.

Thank you, Jarno. Okay, let's continue with another story: an adventure in software publishing. It's somewhat related. I think this was during the initial survey that we did to gauge interest in the RSE program. There was one case there that stood out to us, and we picked up on it. This is the case of a researcher developing a new analysis method for genomic data. The method is outperforming the competition, and they managed to publish this in a very good journal. It's great to have a paper out, but now they want more, of course, like everyone should. If you spend years developing this nice new method, you want everyone who wants to use it to be able to do so. You want this to become the new standard.
So they came to us and said, well, can you take a look at the code base and maybe clean it up a bit, or just make it so that everyone who wants to use our new method can do so. We agreed, and we took a look at it. The code was already in pretty good shape, because what this code base has that, I must add, many code bases do not, is documentation. They had pretty good documentation of how everything was supposed to work. However, the installation instructions, careful as they were, were seven pages long. There are loads of software packages that need to be installed in order to run this pipeline. This is sort of a barrier. Then, if you want to reproduce the analysis that they did in the paper, you need to first download and pre-process the data. That's five pages of instructions. And then if you actually want to apply the analysis as they did in the paper, that's 15 pages of instructions. So it's very good that all the documentation is here. But maybe we can make it a little bit easier for people, a little bit better.

So we got to work. The initial situation was that we had 42 analysis scripts in a folder, basically. The first thing we did was to go through all these scripts and identify the core functionality, the core principles behind this method. We extracted these and put them into a clean R package that you can install and then have access to the new algorithm through a nice general-purpose API. Once we had that, we rewrote the analysis pipeline that reproduces the paper in just three R scripts. Using this R package, this becomes much easier to do. And now the analysis as published in the paper is a really nice example of how to use the R package for your own analysis. The installation instructions went from seven pages to two commands using a package management system.
Pre-processing the data was five pages of instructions. We put that into a workflow tool called Snakemake, and now it's a single command: Snakemake, pre-process my data. To do the analysis, it was 15 pages of carefully laid out instructions for all types of things you need to do. Again, with Snakemake, you can just do it in a single command. Most importantly, the analysis code was obviously written to analyze a specific data set, the data set that they were working with. This is how it starts. But if you want anyone to use this, of course, the code needs to be made general purpose, so it can be applied to any data. And finally, they also asked us to take a look at some performance bottlenecks, some points where they knew, okay, this is probably taking more time than it should. We took a look, and we found that this was indeed the case, and we brought the runtime down from originally multiple days to just one hour.

The main point I want to make with this, other than just bragging about how good we are, is that we often get asked this question: scientists should be able to program, it's part of their job, so why do we need specialized RSEs to help people do the job that they're paid to do anyway? And here's a good example. Yes, scientists should be able to program these days. And they do. But there's a big difference between being able to program and write your analysis scripts, get your analysis done, and publish your paper, all of that stuff, which is the situation on the left of this line, and actually transforming this into a general-purpose package that can be installed with a single command, that is this fast, and that can be used by everyone, which is on the right. There's actually quite a large skill difference there, a knowledge difference. And in this gap, there's a whole world of tools that suddenly come at you.
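The idea behind a workflow tool like the one in this story can be sketched in a few lines of Python. This is a toy illustration only, not Snakemake and not the researchers' pipeline: the rule names, file names, and the `build` function are all hypothetical. Each rule says which output it produces from which inputs, and asking for the final result makes the runner work out and execute the steps before it, which is what turns pages of manual instructions into one command.

```python
"""Toy make-style workflow runner (illustrative only, not Snakemake)."""
import tempfile
from pathlib import Path


# Hypothetical steps: each takes a list of input paths and one output path.
def download(inputs, output):
    output.write_text("3\n1\n2\n")  # stand-in for fetching raw data


def preprocess(inputs, output):
    numbers = sorted(int(x) for x in inputs[0].read_text().split())
    output.write_text("\n".join(str(n) for n in numbers) + "\n")


def analyse(inputs, output):
    numbers = [int(x) for x in inputs[0].read_text().split()]
    output.write_text(f"mean={sum(numbers) / len(numbers)}\n")


# Hypothetical rules: output file -> (input files, step that creates it).
RULES = {
    "raw.txt": ([], download),
    "clean.txt": (["raw.txt"], preprocess),
    "result.txt": (["clean.txt"], analyse),
}


def build(target, workdir, log):
    """Build `target` after building its inputs; skip outputs that exist.

    Real workflow tools also compare timestamps or checksums; this sketch
    only checks for existence to stay short.
    """
    input_names, step = RULES[target]
    for name in input_names:
        build(name, workdir, log)
    output = workdir / target
    if output.exists():
        return
    step([workdir / name for name in input_names], output)
    log.append(target)


def run_demo():
    """Run the whole pipeline twice in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        log = []
        build("result.txt", workdir, log)  # runs all three steps in order
        first_run = list(log)
        build("result.txt", workdir, log)  # nothing left to do
        return first_run, log, (workdir / "result.txt").read_text()
```

Calling `run_demo()` requests only the final file. The runner executes the download, pre-processing, and analysis steps in dependency order on the first call and does nothing on the second.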
You need to know about R packaging, version control, unit testing, documentation systems. API design comes into play. We have workflow tools, we have package management, we have profiling. There's all of this stuff that suddenly comes into play. And my point is that, yes, scientists should be able to program, but scientists should not have to know all of this in order to achieve impact, in order to have reproducible work. This is where RSEs come in. We are experts in these sorts of things. This is the transition. Okay, let's go back to Jarno for a new adventure.

All right, thank you. So, talking about a knowledge gap, let's talk about software platforms. In this case study, there is a study coming up relatively soon, this corona study, where a research group wants to collect behavioral data from wearable devices. Aalto has this whole infrastructure, MAGICS, both for these devices and for helping to use these devices to collect data from people. This is really great, but you also need software to handle the data. And there are a lot of overlapping issues here. It's personal data, very personal data, a different level of personal data in a way, because it tells you things about your mood, about your body, moment to moment. It's really, really personal.

Okay, so you have to be able to get data from the devices, and you need to be able to use the devices. You need to set up a framework to collect the data. So you need to set up servers. That means web servers and websites, but it also means a service for collecting and handling the data and passing it on to the researchers. So it's a whole infrastructure. And if you haven't done this before, you don't really even know where to start.
Now, of course, the researcher is a specialist in everything that goes with the research: the research subject, the behavioral side, and how to handle the data once they get it. But the researcher really should not need to be a specialist in all of this software and server infrastructure. So this is the solution that the RSE service provides.

On the light side, we can do consultations about personal data. We can talk about the ethics side: what you can do, what you should do, and how you can go through the ethical approval process. We have experts in personal data who can talk with you about that. And we can also talk about security. When you collect personal data, security issues will come up. You need to store the data in a way that is as secure as possible, to avoid any data leaking. And that in itself can be a somewhat complicated thing.

There's a quick sketch of the infrastructure for this particular project in the image here. It already looks a bit complicated, mainly because of security issues. You have multiple servers. One of them, or two of them in fact, are web servers. They are connected to the internet and are therefore a bit less secure. Then there is this data gathering server that holds personal data, and that goes through an encrypted connection that's mostly one way. It will talk a bit back and forth with the device or the API, but really only to fetch the data. It's basically a one-way flow. And then, again through an encrypted connection, the data itself goes into Triton, but in a way that tries to minimize the amount of personal data as much as possible. The researcher really only needs access to this.

Now the other, much larger solution is to use the RSE service to really create the whole software infrastructure, which in fact is happening in this case.
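As a rough illustration of the data-minimization step in that flow, here is a minimal Python sketch. It is an assumption about what such a step might look like, not the project's actual software: the field names, the `SECRET_KEY` placeholder, and both functions are hypothetical. The idea is to keep only the fields the analysis needs and to replace the participant identifier with a keyed pseudonym before anything is passed on.

```python
"""Sketch of data minimization before records leave the gathering server.

Everything here (field names, key handling) is a hypothetical example.
"""
import hashlib
import hmac

# Hypothetical secret; it would stay on the data gathering server only.
SECRET_KEY = b"replace-with-a-real-secret"

# Hypothetical whitelist of fields the researchers actually need.
ALLOWED_FIELDS = ("timestamp", "heart_rate", "steps")


def pseudonymize(participant_id: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable pseudonym using a keyed hash (HMAC)."""
    digest = hmac.new(key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def minimize(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Keep only whitelisted fields and swap the identifier for a pseudonym."""
    reduced = {name: record[name] for name in ALLOWED_FIELDS if name in record}
    reduced["participant"] = pseudonymize(record["participant_id"], key)
    return reduced
```

Using a keyed hash rather than a plain one means the pseudonym cannot be recomputed from a guessed identifier without the key. The same participant still maps to the same pseudonym, so records can be linked for analysis.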
So we are actually writing the software to do this workflow. We can do these longer-term projects as well, more development-oriented projects. And this is especially great in this case because we can reuse software. There are many projects out there where people want to collect personal data, so a lot of this expertise and a lot of this infrastructure can be reused in other projects. The bigger picture here is that beyond this one project, there are similar projects, research teams that have similar needs in different departments and in different schools. And they can share the resource of the experts. The same experts can work in different teams within the university and use the same kind of non-researcher software expertise to enable the research, to make it happen. Not all researchers really need to be experts in all of this infrastructure. So by sharing people between research groups and sharing the expertise, we help the research happen without requiring impossible things from the researchers. Okay, and back to Marijn for building a foundation.

Thank you, Jarno. This is a story about building a foundation. At one point we got an email from a newly appointed professor, of course ecstatic that they were appointed, and they had big plans. They were hired into a position between Aalto and the HUS hospital. They would communicate with and organize research here at Aalto and also collaborate with the medical professionals at the hospital, to really implement research solutions there and get research access to patient data and those sorts of things. He was building a network here, and he reached out to us and said: what we actually need here is also an expert in code, because the work that we're going to do is going to rely in large part on applying advanced machine learning algorithms to this medical data.
And the professor is a medical expert, an expert in research and an expert in their field, but they don't have a computer science background. So they reached out to us and said: come get involved in this project that we're building here, and can you make sure that the software that is developed here, mostly analysis code, is of the topmost quality? Because I'm starting a new research group here. I'm going to hire multiple students and PhDs, and it would be great if we could jump-start their work by already having some data analysis pipelines there that are written to the highest standards. They can work off of those and they can copy those. They can look to those as examples of how you analyze your data in a way that follows all the best practices.

And of course we agreed to do this, so we're now involved in this growing network. The idea right now is that we develop a series of data analysis pipelines in such a way that students can build on top of them. And this goes beyond just writing the software, so we will communicate with the students. We are sort of an extra supervisor for them, the supervisor for their code. We are part of this growing research consortium as experienced actors who provide guidance for the analysis software that gets written, making sure that every student, every PhD, and every postdoc always has someone to talk to who is an expert in programming, who knows their code, who knows the research that is going on, and who can just help them solve problems. Many students luckily have a person like that in their lab, the tech person that you go to when you have a problem, when your code doesn't work or you get errors. There should be one, but it's not always the case. When you involve RSEs in this network, we can make sure that every student has a person like that to talk to.
Those were four of our stories. We have many, many more to tell, because since we started this January we've already had quite a lot of projects and requests coming in, and they come to us in multiple different ways. So let's talk a bit about the progress that we've made and how things are going right now.

One good way that we receive many projects and questions is of course the daily garage. Every day at one we have this open garage. It's a Zoom room, and it will probably stay a Zoom room after the pandemic as well. If you have any question, you just hop into the room and you're immediately faced with experts that can help you. RSEs will be there, but often the Triton administrators will also be there, and our data agents are there. So we actually have a really nice collection of experts here that can handle a wide variety of questions, from installation to data pipelines. It's getting more and more popular, and this is a good way to just go and ask and get help.

But it will sometimes happen that you have a bigger project. Come to the garage anyway, because then we will set up a meeting with you. Let's discuss this properly: we will sit down with you and arrange a meeting, and when that happens you end up in our tracker. We have a project tracker for the RSEs where we keep track of all the things that are going on right now. There are currently 47 longer projects in there. We've already managed to finish 24 of them, so hopefully we've already achieved some impact here. And of course there are many open projects, most of which are waiting for something: for a student to start, or for some data to become available. I think we have about nine actively running projects at the moment. So we keep track of that. One other thing we try to keep track of is how much time we invest in each project, so we can keep a bit of track of how much time goes to each department that comes to us.
And you see here that our main funding comes from Computer Science, NBE, and Physics. They're well represented. Maybe Computer Science is even a little bit overrepresented. So if there are any NBE and Physics members in the audience right now: a warm welcome, come to the garage. We'd love to spend some more time on you. But we also have a small snippet here: we also get requests from people outside of the school. So that seems to indicate there actually is a demand for our services outside the people that actually pay us. But of course there's currently limited time that we can spend on them.

So for the people in the audience here that feel, hey, great, I could use an RSE: come talk to us, right? Many people get referred to us through our network. That's a common way that people come to us: someone says, well, I don't have time, but why don't you ask the RSEs? They're paid for this. If anything, come to the daily garage. Of course, you can also always just send an email: RSE group at aalto.fi ends up with us. And if you want to know more, just check the website.

To summarize where we are right now, I think we fit nicely into the bigger picture of what Aalto Science IT is doing right now, allowing them to offer much more hands-on support. The RSEs have time to really sit down with you and go through your code, actually read your code, clone your code, and even have time to work on it. I think that's a really nice thing to have. With that said, let's pass it on to Richard, and let's talk a bit about the future of the RSE service.

Yeah. So the first thing about the future is the current funding situation. Right now our long-term vision is two different forms of funding. One is basic funding, which comes from the schools and departments or the university level, and which provides a short-term service for staff for free.
Short term started as a few days, and then practical difficulties have slowly made this larger and larger. But the basic idea is that anyone is able to do their work. Anyone is able to reach their full potential by having someone that they can consult with immediately and without bureaucracy. This is great.

In the long term, we'd also like some project funding here. Basically, whenever a particular group or a particular project has more intensive needs, which we can't justify funding from the department or school funding, then the project can pay for them directly from external grant money. This can happen either by taking our RSE directly onto the staff of the project, basically through Halli, just like any other researcher, which in theory can be done right away, or it could be done via internal invoicing, which unfortunately doesn't work for many externally funded projects. But in practice, right now, this is too difficult. We have projects that come and would like to give us money: okay, yes, please just take our money and help us do the work, let's get it done as soon as possible. But the difficulty is too much. Either it takes too much effort to set up Halli, or the grant can't do the internal invoicing.

So right now we have this sort of wise idea: we just don't worry about the money. We do everything with the basic funding. And then after a year, we see what the actual needs were and where the money could have come from, and then try to find a proper solution. I guess the moral of the story here is that this internal accounting, with all the grant conditions and so on, is really complex. I expected it to be a bit of a problem, but it's even more complex than I imagined.

So what else can we do in the future? One thing I would like to see is more long-term development projects, directly as part of grant work or not.
Instead of these sort of short-term things, like "a project's basically done, so let's clean it up", why can't we work directly when a new project is starting? Perhaps these needs would be more outside of the School of Science than inside. Within the School of Science, many people and groups have these sorts of programming skills already. But within the whole university, there are many people who need some software written and don't have these skills right now. For that, we could basically come in at the beginning, develop what they need, and then, most importantly, not disappear and go on to the next job right away. We're permanent university staff, so we'll be here a few months and a few years after the project to continue making small improvements and keep the project alive.

So this means that the needs aren't limited to the School of Science. We'd like to expand our funding base to the rest of Aalto and then also Finland-wide as part of the Finnish Computing Competence Infrastructure, which is basically the Science IT equivalent staff in other places. Our goal is to form these kinds of groups or services in other universities and make a network of them where everyone can share their best skills. I'd like to connect better to these open science and data support kinds of things, because this could be where a lot of the problems are noticed, when the work is not as reproducible as it should be. Or maybe, on the other hand, we introduce people to the open science and reproducibility concept when they come to us for other, unrelated problems.

We'd expect that within a few years we'll make some more hires, possibly some people that can focus more on the raw software development rather than open science. I mean, it really depends on where we go. Right now our staff is at the right level, but after everyone starts learning about us, then we'll certainly need more people, and we'll then work out these funding practicalities.
So, some of our open questions. First, identifying the best customers. Right now, with the amount of outreach we've done, we have no shortage of customers and work to do. But really, are we reaching the people that need us the most, or are we reaching the people that are already the best at what they do and happen to be the biggest users of the Science IT services anyway? This is a big question.

Then, the sustainable funding model. Anyone that has ideas, please get in touch and let us know.

We also can't do everything ourselves. We need to build some sort of better communities of people that do this. Every group should have some people that are the experts in best practices and can be the first line of support. Then we support these people, and we can also help with the biggest and longest-term projects that go beyond what people can do.

And then there's cross-organizational work. Sometimes we've had groups that would like to fund us for their projects, but the money is with a collaborator at another organization, whether it's HUS or the University of Helsinki or whatever. This becomes really difficult. We basically have to say, well, we can't really help you, because the overheads we would charge would be just too much, and it doesn't make sense.

Finally, this is not just about research. You can easily say, well, everyone can do this themselves. People already do their research. Is this really necessary? But that assumes one correct model for the researcher. Yes, many people have learned this themselves, and especially in the School of Science they can do this. But when we're making these big pushes towards diversity, we can't assume that everyone has the same background. People come from many different fields. There's more multidisciplinarity. This is something that, over the last years, I've seen more and more of within my job at Science IT. And that's why I think this project is so important.
So with that said, we can discuss some of these questions. Let's see what comes from HackMD. I will share that quickly.

We had one question: do we help everyone at Aalto, or do you need to be some group member? Right now, we focus on the School of Science. Sometimes people come to our daily garage and they're not from Science, in which case we try to set them on the best path as much as possible. But sometimes we've seen that the issue is more than we can handle in a short consultation. So we've started a conversation with them about how we can get funding. But really, these conversations are much more difficult than other schools funding us directly, so that we can just go and help their staff immediately without needing to talk about money first.

Do we develop a database for the web? So yeah, we can do things like that. If something was a pure software development project, like building a mobile app or something like that, maybe we're not the right people to do that. But whenever the project is internally connected to the research, then we might be a good place to start.

Let's see. This next question: it sounded like the research software engineers are going to be a permanent part of the HUS-Aalto group, so what's the reality? Well, I think it's too early to know what the on-the-ground plan will be. But within Science IT, we do have people that are permanently funded by another research group, which is basically keeping the staff on retainer there. This is a great model, so this could happen with this group. Or it could be sort of on-and-off consultation, and when there's a more involved project, then we would be directly funded to do that project for the months it takes.

Okay, "what is MAGICS" is answered here. Fair enough. Oh, let's see. We have a redness project. Would you like to unmute and answer the redness question? What if the question isn't exactly computational, but about data collection? So yes, definitely.
These kinds of things are perhaps, well, I was going to say perhaps more relevant, but I'd say maybe as relevant. For anything at the intersection of computation, software, and data, or even just one of them, at least come to our garage and ask us the initial questions. If we're not the right place, then we'll send you somewhere else. But we can basically learn and figure out anything.

Oh, this next one's a good point: mentioning the efforts towards the Nordic RSE collaboration. There's this group, NordicRSE.org. We're a big part of the people there. And we collaborate with people in other, well, I guess it's not really work collaboration, but more like RSE development collaboration with these other people. We're constantly discussing best practices, sharing what we do, and learning from others.

And what are the lower and upper limits? I think the answers here are pretty good. Once we get funding worked out, I think there's really no upper limit. If there's a lot of work, you can basically talk with us and reserve someone for as long as you need.

We should point out one of the benefits of working with us instead of hiring your own people. Well, I already mentioned this at one point: the RSE does not disappear immediately after the project is done, which really was a problem with several of the projects which we've taken over. You hire someone for a summer or a little while, and then the contract's over, and the thing's not quite ready yet. This is a big problem. We don't disappear right after the contract is over, and we can continue with the consultation and so on. And also, you don't have to hire someone. Hiring a software developer is really not what most research group leaders are good at. How do you know who to get? How do you even supervise this person? With us, well, you don't have to worry about that.
And you don't have to worry about explaining things to some external consultant who wants to come in and sort of wave their hands and say: oh, I did this, now we're done, good luck, if you want more, give us more money. No, we're here permanently, and we know what you actually need. If you did have to hire someone from outside, you could also come and talk with us and have us help to supervise that person, to make sure they're keeping on the right track and that the result is something that will be maintainable long term.

And if you're watching from somewhere else and you would like to set up one of these kinds of groups, you can ask for our advice. This talk is a good starting point, but a lot of it is figuring out how it fits within your own institutional culture and how to do that bootstrapping.

I'd like to thank everyone that attended and asked questions, and Jarno and Marijn for presenting. So yes, thanks for coming.