My name is Vas Vasiliadis. I've got lots of affiliations listed up there. Mostly I'm with the University of Chicago, where I wear a couple of hats. I do teach, so I can scratch my technical itch, but mostly I work on a project called Globus. For those that are not familiar with Globus, it's a service for researchers to manage their data and manage their computation. It's a service that we've been developing and operating at the University of Chicago, in the office of research, in various forms for over 25 years now, but in its current form for about 15 years. And we're a sort of strange group, but we're a mission-driven organization that's really out there trying to help researchers do their work better and more efficiently, and do so in a sustainable manner. Dan pointed out sustainability as an issue; that's something we've had as a requirement from day one. A lot of our funding does come from federal agencies, but we've also worked on a model to keep things going in the absence of continued federal support. I just wanted to quickly go back down memory lane. The last time I was here with this group was five years ago, which is hard to believe. At that time, we had just started helping people manage and work with protected data as part of their research data environments. I had shown a timeline, but the thing I want to focus on here is that we had about 100,000 users, which we thought was wonderful at the time. Fast forward five years: that's grown substantially. We've got over half a million right now. We have a lot of institutions supporting us through our hybrid subscription model for sustainability. And as I say at the bottom, we're getting there. That line keeps nudging up against the crossover point, but we're close to the point where the service will be sustained by the community, with the help of many of you in this room, in fact.
And this is sort of the world today. What was our tagline back then has evolved as the scope of what we do has evolved, and it now says "research IT reimagined," which is very big and fuzzy and all-encompassing. But I wanted to focus today on a couple of the things that are really driving why we need to be reimagining research IT. If you were a teen in the 80s, which I was, just dating myself a little, you might have watched B movies like that: the aliens are coming. A few years ago, I started showing this slide: the instruments are coming. Back then, this was scary to many of the large R1s, but most others hadn't seen this monster coming around the corner. I think it's here now for just about everybody. So the growth of instruments is a big thing. The other thing is that we have a lot more, and in many cases much larger, collaborations in research. As a proxy for that, I use some data from my own service. This is an indication of the number of systems being used to share data with collaborators across our community; you can see the number that are active in a given month. We're approaching two and a half thousand systems out there that people are using to share data. That, at least, is a strong indicator to me that collaboration continues to grow. So if you combine the growth in instruments, particularly in the sensor data rates, and I'll show you a couple of slides in a second here, with the growth in collaboration, we have this increasing need to automate a lot of what people are doing, just because we're well beyond the point where you can point and click your way through things. So what I thought I'd do is share a few examples of the work we've done with various institutions around helping them automate their instrument data environments.
On the somewhat simpler end of the scale, though by no means easy, is the typical next-gen sequencing genomics core. The University of Michigan has been a longtime supporter of ours. They sequence thousands of samples every year from many different groups. And there are a number of steps that these samples go through, all the way to data ultimately being shared with some larger group, perhaps even beyond the project boundaries. So this is a recurring pattern that we see: data having to be moved from instruments, going through some initial analysis, then perhaps through further analysis downstream, and, very importantly, being shared more broadly at the end. In some cases these data sets are public, and obviously that is a responsibility that a lot more researchers have nowadays with federal funding mandates. The other point is that it tends to touch a lot of systems, so the infrastructure required here can actually be quite complex. And it's really something that researchers shouldn't have to concern themselves with; that's where we've done a lot of work to help people. So next-gen sequencing is one. Actually, I was thinking about this the other day. I don't know if anybody here is familiar with sequencing and nanopore technology, but if you look at the amount of data coming out of these systems nowadays, devices literally the size of what used to be desktop computers in my early days, it's multiple tens of terabytes if you run them continuously. It's really, really scary. The other instrument area where we've seen a lot more progress is cryo-electron microscopy. If you don't have one of these on your campus, I believe you will; it's just a matter of time. And it's a cool-looking thing, all closed up, because it operates at very low temperature and does interesting things.
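The recurring pattern described above (capture, initial analysis, downstream analysis, sharing) can be sketched in a few lines. This is purely an illustration of the pattern, not the Globus implementation; every stage name and function here is a hypothetical stand-in for what would, in practice, be a transfer or a compute job on a different system.

```python
# Minimal sketch of the instrument-data pattern:
# capture -> initial analysis -> downstream analysis -> share.
# All names are illustrative, not Globus APIs.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Sample:
    sample_id: str
    history: List[str] = field(default_factory=list)

def step(name: str) -> Callable[[Sample], Sample]:
    """Record each stage a sample passes through."""
    def apply(sample: Sample) -> Sample:
        sample.history.append(name)
        return sample
    return apply

# In a real deployment, each stage touches a different system,
# which is exactly why end-to-end automation matters.
PIPELINE = [
    step("capture_from_sequencer"),
    step("initial_qc"),
    step("downstream_analysis"),
    step("share_with_collaborators"),
]

def run_pipeline(sample: Sample) -> Sample:
    for stage in PIPELINE:
        sample = stage(sample)
    return sample
```

The point of the sketch is only that the sequence is fixed and repeatable, so it can be handed off to an automation service rather than driven by hand.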
But this slide, when I first saw the data, kind of blew me away. And it's already five years old. Down there is the highest resolution of one of these instruments, approximately two angstroms, which I'm told is about twice the width of a hydrogen atom. So these things can see detail in the samples at essentially atomic resolution. Again, Dan pointed out some of this in the earlier talk. So that's pretty scary, right? There's a lot of data. And this is the flow that's typical. You're getting data off of these instruments, and having a human manipulate the images works at a small scale. But these instruments are very expensive, shared devices, and you want them to be fully utilized. You want this experiment done as quickly as possible, the next one lined up, and the throughput as high as possible. So you definitely do need some level of automation. And this is a case that's a little different from next-gen sequencing, perhaps, because you have a human in the loop a lot of the time. That's another aspect of automation that's a little challenging. It's easy to automate something end to end when it's a lights-out thing: you press the button and it just goes. It's a little different when you have humans sticking their fingers in the pie and the downstream analysis taking different paths. So that's an area where we've worked really hard to understand what the needs are, again trying to get to the point where the automation works all the way from the point of capture of the data through to the point of publication. And I use publication in a loose sense, not the traditional I-publish-a-paper sense. That can be one aspect of it, but in many cases this is still working data.
But it has to be published in the sense that it has to be shared and made available to others: sometimes just within the lab, within the project, within a collaboration, or to the broader community, and so on at different scales. One example of where we've been working on this is with the Rosalind Franklin Institute. I have a video there which probably won't work. They are a group in the U.K. at Harwell that is investing in a lot of very advanced instrumentation, particularly around cryo-electron microscopy. And they are taking data all the way from the instrument through to, again, the publication level. A very big part of that, beyond the initial analysis, is figuring out how they're going to store it, who they're going to share it with, and, more importantly, how they're going to describe it so that it can be discovered and reused down the road. The third example I wanted to talk about is another collaboration that we've been part of for many years. There's a facility at Argonne National Lab called the Advanced Photon Source. It's a synchrotron, and it has many beamlines that are used to shine very bright x-rays through different samples and do various types of science. One of those methods is serial crystallography, in which I am by no means an expert. In fact, I know very little about it, but I do know that this is a technique used, in one instance, to try to discover the structure of proteins. And this is something that was very much at the forefront of the early COVID research. So we were fortunate to work with a number of groups.
This was a multi-institutional collaboration where we took data from these beamlines and put it through various transformations: first getting an initial quality check on the data, then analyzing it, then doing the image rendering, and then extracting facets and metadata from the images so that the results could be published in a data portal made available to everybody else working on these projects. In many instances, these were large open-science collaborations where the data in these portals was immediately available worldwide, and was used worldwide by others supporting those early efforts in drug discovery. A lot has been written about this particular one; if you care, I have some references here. I should probably have put the DOI, shame on me. But the key point is at the bottom there: a 10 to 100 times speedup in time to solution of protein structures at the Advanced Photon Source. And again, I'm not a life sciences or drug discovery person, but I do know that figuring out protein structures is critical to a lot of the early stages of any therapeutics and drug development. So being able to have that kind of impact, I think, is quite significant. And this really was, by and large, a result of the ability to automate these processes end to end. Because, as I said, they touched many, many different systems and many different people, and it's not something that you could easily orchestrate in a manual sense. So how did we, as a service provider, as Globus, fit into this picture? The systems that I work on comprise multiple different services that enable these kinds of capabilities. I want to talk about them working from the point of publication back, because this to me is probably the most important part: making sure that the data's out there and discoverable.
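The last step of that pipeline, extracting facets and metadata so a result can be indexed in a portal, can be sketched as building a small, schema-agnostic record. The field names and the example subject below are illustrative only; they are not a required Globus Search schema.

```python
# Sketch: wrap an analysis result in a metadata record for indexing.
# Field names and values are illustrative, not a mandated schema.
import json

def make_record(subject: str, facets: dict, visible_to: list) -> dict:
    return {
        "subject": subject,        # stable identifier for this dataset
        "visible_to": visible_to,  # who may discover this record
        "content": facets,         # free-form, project-defined metadata
    }

# Hypothetical beamline run, described in whatever way makes sense
# to the project (the service is agnostic about the schema).
record = make_record(
    subject="aps/beamline-19id/run-0042",
    facets={"technique": "serial crystallography", "resolution_A": 2.0},
    visible_to=["public"],
)
payload = json.dumps(record)
```

Because the metadata is free-form, each collaboration can describe its data in its own vocabulary while still getting uniform search and discovery on top.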
One of the services that we give researchers is called Globus Search. Not very original, I guess, but hopefully it conveys what it does. The key point is that we allow the researcher to describe the data in whatever way makes sense to them; we are agnostic about the type of schema that's used. And we make it really easy to query these indexes using simple URL queries. The other part of it is that we have interfaces to DataCite so you can mint DOIs, and ideally you have DOIs associated with every single subject in your index. An important part of this is also the ability to share the data beyond, as I said, the immediate group. With sharing, we have the ability for anybody, in an ad hoc manner, to share a directory with someone in their lab. But it's becoming more and more important to have much more fine-grained control and different policies in place around data sharing. So as part of the Globus Search service, we've got the ability to allow the researcher to specify not only who can share and see the data, but also who can share and see the metadata. It can be all the way open, and in many cases that's appropriate. But in more and more cases, especially in the life sciences where protected data are involved, we're seeing the need for that bifurcated model, where some things are available to all whereas others have additional restrictions imposed on them. So that's another aspect that we've had to deal with. As I said, I was here five years ago, and supporting protected data management was the big thing for us then. Of course, if you're in the space at all, you know that the list of compliance frameworks on the left keeps growing. We're doing our best to keep up.
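The "simple URL queries" point can be illustrated by building such a query. The endpoint shape below follows the publicly documented pattern for the search service, but treat it as an assumption and check the current API reference; the index UUID is a placeholder, and no request is actually made here.

```python
# Sketch: a search index can be queried with a plain GET URL.
# The endpoint shape is an assumption based on the public pattern;
# the index UUID below is a placeholder, and nothing is fetched.
from urllib.parse import urlencode

SEARCH_BASE = "https://search.api.globus.org/v1/index"

def search_url(index_id: str, query: str, limit: int = 10) -> str:
    """Build a GET query URL for a search index (no request is made)."""
    params = urlencode({"q": query, "limit": limit})
    return f"{SEARCH_BASE}/{index_id}/search?{params}"

url = search_url("00000000-0000-0000-0000-000000000000",
                 "cryo-EM mouse brain")
```

The practical upshot is that any tool that can issue an HTTP GET, from a portal to a shell script, can query the index without a special client.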
But we are seeing many, many more of the institutions that we work with blending protected, CUI-type data with open data, and using the same platform to enable researchers to work with those data. And this is really our mission in talking to those folks. If I go back to early conversations we had, especially with some large hospital systems, the answer was just no. You go to the hospital's legal team, and it's no. Sharing means it's all open? No. And in fact, we've tried to work with them to help educate the stakeholders to understand that you don't have to take an all-or-nothing view: you can be compliant, and you can collaborate within those compliance frameworks as well. The other area where we've been really, really active is in supporting the evolution of storage systems, from mostly on-premise systems on campus at the institution out to the cloud. As this diversity of storage systems grows, it's becoming incumbent on the researcher to figure out how to use all of them. We've tried to make that easy by providing a unified interface, so that whether they're looking at data on their laptop, data on one of the large cloud providers, or anything else, it all just looks the same. And that's been really, really important to making the rest of this infrastructure accessible to them. In some of those previous slides, I showed a couple of computation steps. This is a relatively recent thing that we've been working with folks on: essentially the ability to run code on a compute resource that you have access to, irrespective of where it is, and do it in a consistent, unified way, just as if you were running the code on your laptop.
Without getting too technical, if anyone's familiar with the term function as a service, this is essentially what we're doing, but we're allowing the researcher to run code all the way from a laptop up to a supercomputer by doing the exact same thing; the platform takes care of getting it out there. And then, bringing it back to the automation point, there's a service called Globus Flows that provides the reliable orchestration behind those looped flows I was showing. The researcher can decide what series of actions they need to take on the data, on the analysis, and so on; they can codify it in one of these flows and then, if you will, outsource it to the Globus platform to manage repeatedly, at scale. What's been really interesting over the past couple of years is that, while we've helped people build end-to-end solutions with these services, we're starting to see more and more people building their own. They're accessing these services directly and integrating them with some of the portal frameworks that we have out there. This is just a sampling of some of the data repositories and portals that exist. A defining characteristic of these kinds of services is that they serve a lot of large data sets, which are very difficult to handle if you're using repositories where you're limited to downloading from your browser. We work, for instance, with a lot of climate data; the Research Data Archive at NCAR is one such example. Some of the smallest data sets are terabyte-size, and there are many larger ones. That's not something you can handle through a browser. So we've started to see more and more folks adopting these services in their own environments and building more bespoke solutions. And that's also reflected in some of the data.
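"Reliable orchestration" here means, at minimum, that each action in a flow is retried rather than failing the whole run on a transient error. The toy runner below illustrates that idea only; it is not the Globus Flows engine, and all names in it are made up for the example.

```python
# Toy sketch of reliable orchestration: each action in a flow is
# retried a bounded number of times before the flow fails.
# Illustration only; not the Globus Flows engine.
import time
from typing import Callable, Dict, List

Action = Callable[[Dict], Dict]

def run_flow(actions: List[Action], state: Dict, max_retries: int = 3) -> Dict:
    for action in actions:
        for attempt in range(max_retries):
            try:
                state = action(state)
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise
                time.sleep(0)  # a real engine backs off between retries
    return state

# Example: a flaky "transfer" that succeeds on the second attempt.
calls = {"n": 0}
def flaky_transfer(state: Dict) -> Dict:
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient network error")
    return {**state, "transferred": True}

result = run_flow([flaky_transfer], {"dataset": "run-042"})
```

Outsourcing this retry-and-resume bookkeeping to a platform is what lets a flow run unattended, repeatedly, and at scale.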
This is the number of applications that have been registered with us and use some of these services. We're approaching the 10,000 mark, which I think is actually pretty cool as far as community adoption goes. The other key thing we've focused on: I had FAIR in my title, and that wasn't just clickbait. Besides, I was competing with two other AI talks. Hopefully you've heard me say things like accessible and findable, et cetera. That's really, ultimately, what we're all about: through automating these processes and using these services, the data are by default FAIR. When they come out the other end, whatever that end looks like, they're there, they can be found, they can be reused, they can be moved to other places very easily. So they're really accessible. One of the ways in which we do that is by allowing researchers at a lot of institutions to log in with their existing institutional credentials. Since day one, actually, we've been working with Internet2, with the InCommon federation, and through eduGAIN and other federations like that. So anybody that's part of those federations already has access to these capabilities, and we continue to expand on that. I know I've got maybe a couple of minutes here, and I want to leave time for questions, so I'll talk just briefly about what we're doing now, what the next wave looks like. Beyond just automating from capture to publication, obviously we're seeing AI and other things coming into the fold. So we're starting to see things like this idea of smart instruments. This is an example of a collaboration: on the left is SLAC, just a few hundred miles up the road here, and Argonne, a few miles from where I live. Again, it's one of these experiments where the sample is being hit by a beamline, and some analysis is being done at the point of capture.
Then the data is sent over to Argonne, which takes a few seconds, another AI model is run at Argonne, and the result, a more finely tuned model, is sent back to SLAC. So roughly every 30 seconds, you have this experiment being incrementally improved and tuned. This kind of smart experimentation is something we're starting to see a lot more of. Ian Foster, who leads our group, is heavily involved in these things called self-driving labs, where even the physical infrastructure controlling the experiment is modified and controlled by these automated processes, with some intelligence and machine learning built into them. So I will leave it there. We have about five minutes for questions. Thank you. I will say, as a general approach, we're not involved directly in those discipline-specific solutions. In the examples that I showed, we were part of those projects, and in some cases we did some of the development ourselves, but really it's up to those collaborations and those folks to build those solutions. In a sort of negative way, we're the plumbing that allows things to flow, and in some ways people shouldn't even know that we exist. That's the role; that's the philosophy, if you will, behind it. But if you have that kind of environment, which is actually very common, not just in astronomy but in many other disciplines, it is incumbent on the institutions as well to invest in pulling some of these pieces together into a solution. Yes, I don't know. Dan put the slide up there with uncomfortably big data. Is there an uncomfortably small data set? I don't know.
Generally, when you're dealing with lots of small files, that presents all sorts of problems, not just in our infrastructure but in storage systems and compute systems generally. So it's a judgment call. Does the system perform optimally at those extremes? Probably not. But then most other systems that we connect to don't either. So there's probably a sweet spot. But we see people all the time; right now, for example, we're talking to some folks that have sensors out on ships in the ocean, gathering thousands of readings, but every one is a little file with the reading of something. And they have to bring all this together and transfer it to shore, where it can actually be processed on real machines. So we're working with those kinds of systems, and there are ways you can optimize those processes. Yeah, it's a great question; we get asked that a lot. There is a way for you to specify that the files should be archived, tarred if you will, and then pushed out. But that's something we leave up to the user, or the institution, or the research computing administrators, to put policy around, because depending on how those systems are being managed, you can do lots of weird things, break them, and make a lot of people very unhappy. But we do have the ability to do that. What I said was that you can have differentiated policies to control access to the metadata and the data. As an example, we worked on a federated set of cancer data registries with the University of Pittsburgh. There, in the search, you could see cohort-level data, but you couldn't see anything more specific in the metadata. And then, as far as the data, you sometimes had to go through an IRB or other mechanisms to get access to the actual data.
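The bundling option mentioned above, archiving many small files before transfer, can be sketched with the standard library: pack the tiny sensor readings into one tar archive so the transfer tool moves a single large file instead of thousands of small ones. The file names and layout are illustrative.

```python
# Sketch: bundle many small "sensor reading" files into one archive
# before transfer. File names and layout are illustrative.
import tarfile
import tempfile
from pathlib import Path

def bundle(files, archive_path):
    """Tar-gzip a list of small files into a single transferable archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for f in files:
            tar.add(f, arcname=Path(f).name)
    return archive_path

# Demo: create a handful of tiny readings and bundle them.
tmp = Path(tempfile.mkdtemp())
readings = []
for i in range(5):
    p = tmp / f"reading_{i}.txt"
    p.write_text(f"value={i}\n")
    readings.append(p)

archive = bundle(readings, tmp / "readings.tar.gz")
with tarfile.open(archive) as tar:
    names = sorted(tar.getnames())
```

Whether and when to bundle is exactly the kind of policy decision the talk leaves to the institution, since it trades transfer efficiency against the ability to access individual files directly.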
The point is that you can have that separation, because otherwise it's all or nothing, right? Either I see everything, and therefore have to really control who can get to it, or I see nothing, and then it doesn't really help with collaboration. So we try to break that apart a little bit. Yeah, that's a great question. On the authentication side, we just act as a broker, so you can show up and log in with whatever credentials you have. Authorization is controlled by whoever owns and controls the system. If you are the owner and manager of that storage, you decide who has access to it. We give you mechanisms within the Globus service to reflect those policies in the installation that you put on campus, and we enforce them, but it's really up to you to define them. Sure. No, that's actually not the case. We have this hybrid model, so I'll talk about the subscription just briefly. The core services for moving data between systems are available to any nonprofit researcher at no cost. Many of these other features are also available with limited access. For instance, you can have a search index without a subscription, but you can have only one. If you want to scale out, we do ask people to subscribe, but you can use a lot of these capabilities without a subscription. Oh, yeah, sure, I agree. And that number you saw of active users: I think at last count we're at well over 2,000 institutions actively using the service on a daily basis, and of those, a little over 10% are subscribers. We also have, for instance, many partnerships between, say, a pharmaceutical company and a university, where the commercial partner needs access to the data; the university can make it available to them at no cost. We have lots and lots of these scenarios.
So if you have a specific case, I'd be more than happy to chat with you afterwards. Yes, the eternal question. And again, Dan stole all the thunder there. We don't have any mechanisms for doing that because, again, it's your data, or whoever's data it is, and you have to make sure that the DOI resolves to something that still exists over time. We don't own or in any way control the storage, or what happens to the data when it moves between storage systems; we know that it's moved, but it's still moving between your own systems. So we just don't have a way of ensuring that, unfortunately. Well, thank you very much. I'll be around for the rest of the conference. Happy to chat.