Welcome to another edition of RCE. Again, this is Brock Palen. You can find us online at rce-cast.com. You can find our entire back catalog there of over 100 episodes of all your favorite research, computing, and engineering topics. I have again here Jeff Squyres from Cisco Systems and one of the authors of Open MPI. Jeff, thanks again for your time.

Hey Brock, happy new year. Welcome to 2017 here. And you know what? I was actually just wondering, how many episodes have we done?

Oh man, you had to ask this right here. I really did have to. Okay, I just pulled up 108.

All right, that's not too bad. Yeah, okay. Yeah, doing okay. That's respectable. Continuing a strong tradition, we've got a new episode today here. And we've got some friends from out east.

Yeah, so we have Terrell Russell and Jason Coposky from iRODS. So guys, why don't you take a moment to introduce yourselves?

All right, thank you. My name is Jason Coposky. I'm the executive director here at the iRODS Consortium. I've been writing software for about 20 years now, and I've spent a good solid six or seven years on this project.

And I'm Terrell Russell. I'm serving as the chief technologist for the iRODS Consortium here at UNC Chapel Hill. And I've been working on iRODS for a few years as well.

So can you give us a quick overview of what exactly is iRODS?

Sure. So iRODS is open source data management middleware. You can think of it as an abstraction layer above your storage, compute, and other services, and below a set of clients. It provides a wire-line protocol through which we have a number of client APIs, in Python, C++ and so on, and a number of clients that speak to it that are written in the wild by various users. iRODS does four things well, which we call the core competencies. The first is data virtualization: it provides a distributed abstraction layer over your storage and a unified namespace on top of that. We provide data discovery, which speaks to our capabilities of storing metadata and providing query interfaces to that metadata. We have an integrated rule engine, in a domain-specific language as well as other languages, which will allow you to operate on the data. That can be driven by metadata and allows reaching out to compute and various other services. And it provides secure collaboration: since we have a unified namespace, we have the ability to federate these unified namespaces and share resources across multiple instances of iRODS.

So now, to the uninitiated there, i.e. me, that sounds like a file system on steroids. Tell me why that's wrong and too simplistic.

Absolutely. So first of all, iRODS does provide some POSIX semantics, but it is not just a file system. This unified namespace reaches across object, tape, standard SAN, NAS, whatever. And it's also distributed. So you can have iRODS servers running anywhere in the world that serve this unified namespace to any number of users. On top of that, we provide metadata as well as the integrated rule engine. So every operation in iRODS has policy enforcement hooks that are pre and post every operation and that allow you to influence how those operations perform.

So when you say a unified namespace, are you talking a file system-like namespace, or is it a URI-type namespace, or what?

So this is Terrell. The namespace itself is, yes, like a file system. It has a root, so to speak, which is the zone name. And then from there you've got home directories, but we call them collections because they're fake.
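To make that layout concrete, here is a minimal sketch of browsing a zone with the python-irodsclient; the host, credentials, zone name, and paths are all invented for illustration.

```python
from irods.session import iRODSSession

# All names here are invented; a real deployment would use its own zone,
# host, and credentials (often supplied via an irods_environment.json file).
with iRODSSession(host='irods.example.org', port=1247, user='alice',
                  password='secret', zone='tempZone') as session:
    home = session.collections.get('/tempZone/home/alice')
    for coll in home.subcollections:    # "folders" are virtual collections
        print('collection: ', coll.path)
    for obj in home.data_objects:       # "files" are logical data objects
        print('data object:', obj.path)
```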
They are themselves abstractions. The collections are purely virtual; they only live in the catalog. And then when it comes time to actually go get a file, what we call a data object, because it's a logical abstraction as well, it can actually be in multiple places at the same time. There's a concept of a replica. So you can have your files in a collection either on the same computer or on different types of storage all around the world, but you still have that kind of logical view into that namespace and see those different data objects regardless of where they live physically in the grid.

Now, by the same token, you say that your collections, I think that was the word, those are virtual as well. So could I have a collection, quote unquote folder, that is comprised of files that come from storage in all different places, but iRODS makes it look like they're in one place?

That is exactly correct.

Okay, and then tell me a little bit about what are these rules things? You've talked about this rules engine, or you've mentioned this rules engine a little bit. Tell me, how does one use that?

So data management policy itself is typically written down as a set of rules that users must abide by. What iRODS allows you to do is provide computer-actionable implementations of those rules. So your data grid, or your logical namespace, whatever you wanna call it, your zone, will automatically enforce that data management policy. And there are various official sets of data management policy out there. What iRODS also allows you to do with the rule engine is leverage that metadata. So we can automatically ingest data, and metadata that is resident in that data, and apply it to the data so it's immediately discoverable. So if you have an instrument that is staging data into iRODS, when it arrives in iRODS we can extract the metadata and use that metadata to drive anything from data discovery for scientists, to the fact that it needs to be taken out to computation for sequencing alignment, and then capture the products and annotate them with provenance. And so the metadata and the workflow engine allow us to provide a wide range of use cases, each of which can be tailored to the particular deployment of iRODS.

So those examples of use were primarily when ingesting data. You're kicking off what sounds like pipelines, which can extract metadata, or you actually even talked about kicking off compute. So this could actually be a front end for "drop data in, do something." Does it only take actions when a new data object comes in, or can it do it at other times?

At many other times. So iRODS is based on plugins, and every operation within those plugins has its own policy enforcement point. So anytime the API is extended, that API also has a policy enforcement point. This goes for network traffic, authentication, reaching out to our catalog in order to capture information or metadata, and so on. Every operation has policy enforcement points. And so anytime a user touches data within iRODS, policy may be enforced.

So the policy for the ingest is basically on the operation named create, right? When the file is created on a disk somewhere or on a tape somewhere, the post-API hook for that create operation can fire. And that's the one that would then start off the compute or the extraction of the metadata or whatever it is that your policy, at your organization or university or company, has laid down. Other operations can include: I open that file, or I delete that file, or I make a replica of that file.
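As a minimal sketch of one of those hooks, assuming the Python rule engine plugin discussed later in the conversation is enabled, a post-put policy point might look roughly like this; the log line is only a placeholder for whatever policy an organization actually wants to run.

```python
# Sketch of a policy hook for the iRODS Python rule engine plugin (core.py).
# acPostProcForPut is the legacy "after a put/create" policy enforcement
# point; the body below only logs, but this is where replication, metadata
# tagging, or kicking off compute would be wired in.

def acPostProcForPut(rule_args, callback, rei):
    callback.writeLine('serverLog', 'post-put policy fired for a new data object')
```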
Any of those operations can trigger any arbitrary code that you would want to run inside the zone. So that arbitrary code that I'm talking about is the policy. If your organization votes and says we should have three replicas across three continents, and we should email the boss when this happens, and this person should not touch this file, that is the policy for your organization. Now you can actually write rules to enforce that. And of course, since the rules are just text, you can now keep those rules in version control, you can map them over time, you can track them and make sure that nothing surprising is happening inside your zone.

You said something in there about somebody shouldn't access this, but you're talking about this in terms of a rule. So do I actually have to write my own ACL access, or is this more like logging who accessed it?

We have ACLs, and you can manipulate those ACLs through the rules, which is what I was suggesting. You can limit read, write, any of those kinds of things. In addition, you could move the data to different physical pieces of hardware. If you've got certain human medical data that shouldn't be outside of a certain geo-restricted place, your rule could say this should only be on this set of servers and never go to a different data center. Just as importantly, you can say we definitely have to have two copies of this and they have to be in these two different data centers. And so, again, whatever the humans in charge decide the policy is for the organization, you can now write the policy to reflect those rules. And it doesn't depend on that one system administrator who remembers everything before they retire.

Yeah, and rules can also run asynchronously. And so another typical use case, simplistically, you can consider as tiering. So you have object, tape, flash array, what have you. And we have users who will just tier data based on access patterns. And so if this data object has not been touched in three months, they'll stage it to tape and trim the replica on the flash array, and vice versa. And that's simply an asynchronous rule. Another popular one would be data retention. So in the UK, research data has to exist for 10 years after the last time of use. And so University College London runs iRODS and they have iRODS automatically manage all of this. And so they have, as I said, various levels of storage, and as it ages out, you know, they keep increasing those counters, and when it reaches 10 years, they trim that final replica.

So, you keep giving me new things to ask about. What you described there sounds like a classic HSM policy engine. If you just wanted to have an HSM, would you look at iRODS as an option for that? Or would there be reasons you wouldn't use iRODS for that, even though it has this functionality?

So people show up to use iRODS for one or two of our four core competencies at any given point in time when they become new users. HSM is one that they show up for. Certainly they have lots of different flavors of storage, lots of different vendors of storage. And so the unified namespace we provide gives you the ability to get your arms around all of that different infrastructure and provide a layer on top of it that insulates your users from change. And so if you have three to four different kinds of storage, you can set up a tiering system across it.
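A very rough sketch of that kind of tiering pass, driven from outside iRODS with the icommands; the resource names and the 90-day cutoff are invented, modification time stands in for access time, and in production this logic would more likely live in an asynchronous rule than in a cron script.

```python
#!/usr/bin/env python3
"""Hedged sketch of a tiering sweep: find cold replicas on a fast resource,
make sure an archive copy exists, then trim the fast copy."""
import subprocess
from datetime import datetime, timedelta

CUTOFF = int((datetime.now() - timedelta(days=90)).timestamp())

# GenQuery: data objects on the (invented) fast tier not modified since the cutoff.
query = ("select COLL_NAME, DATA_NAME where RESC_NAME = 'fastResc' "
         f"and DATA_MODIFY_TIME < '0{CUTOFF}'")
rows = subprocess.run(['iquest', '%s/%s', query],
                      capture_output=True, text=True).stdout.splitlines()

for logical_path in rows:
    if not logical_path or logical_path.startswith('CAT_NO_ROWS_FOUND'):
        continue
    # Replicate to the archive tier, then trim the replica on the fast tier.
    subprocess.run(['irepl', '-R', 'archiveResc', logical_path], check=True)
    subprocess.run(['itrim', '-S', 'fastResc', '-N', '1', logical_path], check=True)
```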
And then once new storage shows up, you can add it under iRODS and then stage data out of whatever the old things are into the new things, and then simply decommission the old things without anything changing above the layer of iRODS. And since it's open source, that insulates the users also against vendor lock-in and so on. iRODS provides you the ability to get out of vendor lock-in.

So what kind of clients does iRODS support? I mean, are we talking desktop users? Are we talking server users or large data users? What's the target audience here?

That's a wide-ranging answer. So in large part, people mostly show up first for the sysadmin parts. We've got installations around the world that have just service accounts that actually talk to iRODS. And then we have other installations where there are tens of thousands of humans, scientists, who have direct access through client tools into their iRODS namespace. But there are clients that range from command line interfaces to drag-and-drop things on the desktop. We've got web clients. We've got access through WebDAV, through NFS. There's a variety of different ways that you can get into the namespace through the iRODS protocol. And in the last year, I think we've had three new ones show up written by people who are not us, which is great.

That namespace, when you look at it through something that looks like POSIX, like NFS or FUSE or something like that, does that have to be set up in advance, or can the structure of the way that it appears be driven by, say, metadata? So I could look at the same collection of data, but look at it by patient or by diagnosis or by common trait or some form of metadata.

Right, so the different clients are gonna have different capabilities. In the command line, you can do all of the things that the API provides, but you have to type a lot. In certain clients, web clients, there may be some forms that can do some metadata queries, and then you can actually save those queries as virtual collections. That will allow you to have a slightly different slice through the catalog. By default, we all think in terms of these trees of files, but when everything is in a database, that's just a particular query: please show me all the files that have this path as a parent, and that's what you're getting back out of the database that is the catalog. You can just as easily ask it, show me all the files that have an attribute with this name and this value, and you'll get a listing of results that looks just like the listing that came out of the tree version. So yes, depending on the client, you can definitely tailor the views into this namespace.

Now, with these rules and this whole metadata infrastructure system that you've got here, is there actual computational power available to the end user there? Or is it really more intended to be just infrastructure, albeit a very flexible and powerful infrastructure, and not intended as a computation engine in itself, right?

Well, that is not really true anymore. So given the changes and the advancements that we've made in the last number of years here at the iRODS Consortium, iRODS has grown the ability to actually give you reach into your compute infrastructure, and we actually have two computational models, the first of which we talk about as taking the data to the compute.
So iRODS users, and this is a use case out in the wild, will stage data to parallel file systems from iRODS for computation, launch their jobs, all driven by metadata, and then capture the resulting products back in iRODS, at which point you can stage it out of the expensive parallel file system back into object or what have you, and maintain provenance from front to back. And this is what gives us the ability to create reproducible science, which is really important to us here at the university. You have full provenance of the inputs, the outputs and the operations that have been performed on that data, and we have written all of that down for every time that data has gone to computation. So now you have the ability to page back through what has happened to that data via auditing and see why this run differed from that run. The flip side of that is taking the compute to the data. So in that second model, iRODS clients have the ability to package computation in Docker containers and ship that off to where the data is resident, because you don't want to move your 20 terabytes of data around to perform your computation. And so we have very impressive iRODS servers out there that simply manage the compute on a scheduler resident with the data.

So you're talking about capturing a lot of information over a lot of time, like an awful lot of metadata. And I'm thinking of some workflows where the amount of data stored would actually be less than the amount of metadata captured during the process. What back-end system are you using to capture all this metadata? If I remember right, it was SQL, and a lot of times SQL doesn't scale to billions of entries, which I think you could hit very easily here. So what are you guys doing to scale?

The easiest thing possible: we're externalizing it. So as of iRODS 4.2, the rule engine itself has become part of the plugin framework. And one of those plugins' sole purpose in life is to simply write down what has happened to the data and ship it off over an AMQP message bus. We call this our audit plugin. And we plug that directly in, as a reference implementation, into the ELK stack. And so you can scale that in any direction you like. And you can also tune, via regular expressions, what parts of the workflows through iRODS you wanna capture and get written down. And so for instance, if you're really only interested in things that have gone out to compute, you can capture those and you can filter out what you do and don't want. And then from there provide dashboards into what is actually happening with your data. But you are correct: by default, we ship that with the asterisk regular expression. So every operation that gets hit gets recorded and shipped over that message bus. And I think for a simple listing of a directory, it was about 1,200 API endpoints that were hit inside of the iRODS software, which, yeah, adds up pretty quickly.

Okay, so ELK stack, I mean, that's Elasticsearch, Logstash, and Kibana?

So we use Elasticsearch as the backend, primarily.

Okay, so you use Elasticsearch here. So, like, if I was using this in front of a massive tape library and I was doing millions of subjects and millions of jobs and was collecting billions of files and then many billions of pieces of metadata, this should have absolutely no problem managing that?

As far as we know, that should work just fine. It's just now a question of scaling the rest of your infrastructure and network to support the load you're putting on it.
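As a sketch of what might sit on the consuming side of that message bus, here is a minimal bridge that pulls audit messages off AMQP and indexes them into Elasticsearch; the queue name, index name, hosts, and the assumption that each message body is a JSON document are all illustrative, since the real exchange and routing come from the audit plugin's configuration.

```python
import json

import pika                              # AMQP client
from elasticsearch import Elasticsearch

# Invented names: the real queue/exchange and index come from the audit
# plugin configuration and from however the ELK stack is deployed.
AMQP_HOST = 'localhost'
QUEUE = 'irods_audit'
INDEX = 'irods-audit'

es = Elasticsearch(['http://localhost:9200'])

def handle(channel, method, properties, body):
    # Each message describes one policy enforcement point firing; assuming a
    # JSON body, simply index it so Kibana dashboards can be built on top.
    es.index(index=INDEX, document=json.loads(body))

connection = pika.BlockingConnection(pika.ConnectionParameters(host=AMQP_HOST))
channel = connection.channel()
channel.queue_declare(queue=QUEUE, durable=True)
channel.basic_consume(queue=QUEUE, on_message_callback=handle, auto_ack=True)
channel.start_consuming()
```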
Yeah, and as we said, that is very tunable. So you may know exactly what it is you wanna know, and that may be just a few pieces of information per operation or job. You also can send different bits of your auditing to different locations. You could shard them based on topic or based on namespace or based on type of operation. It's just a question of that regular expression being able to support what it is that you care about at that moment.

Okay, so let's take a step back here for a second. We've been talking about iRODS itself and how it works a little bit and what its capabilities are and things like that. But let's talk a little bit about your organization and the structure and things like that. For example, there are multiple organizations listed here. There's UNC Chapel Hill, but your email addresses are renci.org, and then there's the iRODS Consortium and so on. Can you explain the relationship between all of these?

So the iRODS Consortium is primarily driven by membership, and so that entails a yearly fee and some benefits that go along with that yearly fee. So if you look at our members page, we have genomics institutes, we have storage vendors, we have other institutes, enterprise corporations like Bayer and so on, that are all running iRODS in production and feel the need to both sustain this software as well as be a member of the community. And this is my primary goal: growing this community, providing the education and the suite of tools to help solve these problems. And the Consortium's job is to sustain the project as a piece of open source software. And then the Consortium is inside of the Renaissance Computing Institute, RENCI, which is one of, I don't know, 60 or 70 research institutes at the university, and so RENCI is a research institute at UNC Chapel Hill here in North Carolina. And so we are all state employees and that's our job.

Okay, so then, does the iRODS Consortium define iRODS? 'Cause you mentioned that I think you're up to version 4.2. Is there an associated standard that goes along with this that is defined by the membership? How does that all work?

So yes, some of that. So the Consortium itself being member-driven now, we have a very nice setup where we can simply ask them what the future should be, and that's part of the reason they join: so that they can sit at the table and vote and yell at us and tell us what it should be. So we have a protocol which has not changed in a while and so has been backwards compatible for a long time. That's the wire-line protocol. And that is mostly defined with regard to that federation that we talked about earlier. So our current version of iRODS can talk to the last few versions of iRODS. Once we get to the point where we want to change that, yes, the technical groups inside of the Consortium can vote and lay out a roadmap and decide when we want to change either the protocol itself as well as define the feature sets and kind of the timeline for fixing bugs and things like that, for both the people in the room as well as the people who don't even know they're part of it or are having a bug afflict their day-to-day usage.

So in terms of development, is most of the work still being done there at RENCI, or do you have a lot of external contributors? Who's doing the development?

Right, so most of the work is still done here, in one room. We've got a few people we keep in there. I think we're up to nine or 10 now.
We do have a lot of input from other places, but not a lot of pull requests. We have more now than we have ever had. We are getting some external pickup, especially as these larger pharma groups with their own expertise show up and have opinions. But for the most part, they're happy and the things they're finding are pretty small. We're not finding too many problems in the sense that we have to fix things. It's mostly goals around the next sets of features and the abstraction layers we're providing through the plugins.

Yeah, the plugins are an important detail. So everything within iRODS as of 4.2 is pluggable. And so we've had multiple storage vendors show up with their own plugins that are proprietary, that work against the various versions of iRODS. And so since we've externalized a lot of this, the core no longer has to change rapidly, and that has really changed our development model entirely. We'll be able to release new suites of features that are simply plugins that can work for iRODS 4.1, 4.2 and beyond.

If I can add something else to that, I think another kind of surprise feature is that when iRODS is used inside the government, and there are large reasons for keeping things secure and mandates with public dollars, in the past, if a new version of iRODS came out and someone wanted to upgrade, it had to go through a full audit again because it was a new release. Now, since the surface area of new code can be reduced because we're talking about plugins, it's much quicker for an agency or for their security people to be happy, because they can just audit the new stuff, the things inside the tiny box, rather than the entire code base.

So something I've been wondering about, because I've been running across these different tools: how would you position tools like Hydra and DSpace and Fedora and tools in that space relative to iRODS?

iRODS is what I like to consider an integration layer. And so iRODS runs with Fedora, and we have a Java API that directly integrates with Fedora 4, and so it works as a storage backend for Fedora, and you can wire the various services through Fedora out to whatever client it is you wanted to use. And we've also had integrations with DSpace, and I believe Hydra has also taken a look at iRODS, but I haven't been in contact with that group in a while. And so really it's a question of at what layer you wanna use iRODS, either as sort of the front-end abstraction layer or more of an automatic storage management backend that abstracts you from the necessity of integrating with various other storage technologies. If you're keeping metadata at the Fedora layer and you're keeping metadata at the iRODS layer, you may have a problem. So now it's a coordination effort of keeping track of who's in charge. So in large part, people who have done that have made a decision at the organization level about where they're going to keep the truth. Some people have chosen to keep it above, in the kind of portal layer, and some people have decided to keep it at iRODS and basically sync it up into the Fedora layer. It's just a matter of preference and a matter of use cases and a matter of disaster planning, I guess, really.

Now, something we've kind of jumped around a little bit and kind of indirectly discussed: is iRODS multi-tenant? I mean, can I have multiple different projects with different access requirements and different sets of rules from a single iRODS installation?

Absolutely, it can be achieved in a number of ways.
And so something we haven't touched on yet is federation, so I'm going to blatantly use this as an excuse to talk about that. We have had organizations like the Wellcome Trust Sanger Institute that run seven or eight grids, independent iRODS zones, internally. Each project runs its own iRODS zone, and they all federate through a single federation hub of iRODS at the user level, and so they can hop between projects and collaborate across projects using our ability to federate at the unified namespace layer. Another way that this has been achieved is by having pools of iRODS servers that are all members of the same namespace. Each server can have its own set of rules, its own rule base, as we call it, which defines your policy, and each user can be a member of a particular group that belongs to that pool. So if you are, say, a pharma company and you have certain users that have access rights to personally identifiable information or public health records, they can belong to a set of pools and have rights only to those pools, or be the only users that have rights to those pools of servers that manage the storage underneath, and so it can work both ways.

So you said something very interesting in there that piqued a question for me, particularly throwing it back to earlier in the conversation. So if I'm a user that's supposed to have access to personally identifiable information within a data set, but then I'm another user who is not supposed to have personally identifiable information, but to the same data set, is this computation or filtering that can be done upon access of the data, so that if I am a user with or without access to the personally identifiable information, that will either be stripped out or left in upon my access? Is that a use case?

That is a use case. That is something that can be accomplished through our rule engine, once again, and it's a use case that is being worked on actively right now by a couple of groups. And so the idea there is that user rights or roles can be tagged as metadata, and based on that you can get a replica of the data that has been previously anonymized, and so you are only allowed access to a particular replica. Or it can be done automatically on the fly, and that's a question of whether or not you trust your software and whether it's passed audit.

Have people integrated this into, like, IRB processes and stuff like that, where a given person has access to a set of data for a limited period of time?

That is certainly a use case that can be implemented. I can't say that that has been done in the wild as of yet, but I mean that's a use case that can be extended to a range of various other utilities. You can simply flip that flag based on the metadata, or remove them from a particular group within the iRODS namespace.

So then what about speed and size? You mentioned federation, a few other things. Can any of those be leveraged for total data throughput performance, or, like, large data operations or anything like that?

So yes. As soon as you span large physical networks, you're going to have latency. So you can chop into that a couple of different ways. One is by having replicas that are close to the users. Another is to have the database itself, the catalog, in different places, by having different zones and just using the namespace federation capabilities. In terms of size, largely that's independent, because that's gonna be a function of your storage kind of footprint, so to speak, through different vendors or physical locations as well.
And if you're talking about just throughput, we have had a wide number of iRODS servers stood up in front of parallel file systems, and that is one way to tackle that problem: simply round-robining through the various iRODS servers to get aggregate throughput.

Now, along those lines, you mentioned hardware vendors, and we've talked about replicas a little bit and whatnot. Do you also handle hardware failure, or is that kind of a side effect?

You can look at it both ways, which is a feature or a challenge with iRODS. So given our replica model, if a server goes down and is marked down in the catalog, iRODS will simply just give you one of the other replicas. And so we've set up monitoring, in Nagios and other systems, that simply heartbeats the servers, and if something goes wrong, they'll mark a particular storage resource down, at which point you can send an email or page somebody or what have you, and then the other servers will pick up the slack. And that is hardware agnostic, which is the important part there.

So you've mentioned parallel file systems, object, tape. Do you wanna kind of go through a list of backends that iRODS supports out of the box, or is that probably overly verbose?

We can tighten it up. So iRODS will talk to anything that is a mount point, primarily, so that's where people like to start. And that of course can reach out to many numbers of things. We have native connectors or plugins for S3, DDN WOS, HPSS, tape, TSM. I know that there is an Isilon plugin out there that was proprietary and has just been open sourced by Dell EMC, and Cleversafe has a plugin that is out there. And I know of at least two other storage vendors I can't mention that are also working on plugins of their own.

All right, so you've named a couple of corporate types of users and environments for using iRODS, but who else uses iRODS? Give me a general classification of maybe a couple of different buckets of types of users that are out there.

So we've got a few different domains. iRODS probably got its start mostly in the library kind of archives and records management world. The metadata was a big deal at the time and very hard to do. It was originally conceived by a plasma physicist. They had a lot of data back in the day when it was all simulations and too big to write down. In recent times we really have seen an explosion in the genomics space, mostly because of the sequencers and the kind of volumes of data coming with that. We expect very soon to see a relative explosion in the medical space because of the new different types of optical things that can happen, taking pictures of very small things at very high resolution. But for the most part, we've also seen pickup in some of the media and entertainment spaces, and we've had trickles of interest from oil and gas. They have a lot of data and, in large part, they have a lot of money as well. So they have not necessarily focused their beam of attention yet, but we expect that to happen pretty soon as well.

So what language is iRODS itself actually developed in?

iRODS itself is entirely written in C++, and on the client side we have Python clients, Go clients, Java clients and of course the native C++ client libraries. On top of that, there are a number of GUIs out there as well as plugins for FTP clients and so on. We also have an NFS front end as well as a WebDAV front end.
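For example, the attribute-driven "slice through the catalog" described earlier might look roughly like this with the python-irodsclient; the attribute name, value, and connection details are invented.

```python
from irods.session import iRODSSession
from irods.models import Collection, DataObject, DataObjectMeta
from irods.column import Criterion

# Invented connection details and AVU; the point is that the listing is
# driven by metadata rather than by a position in the collection tree.
with iRODSSession(host='irods.example.org', port=1247, user='alice',
                  password='secret', zone='tempZone') as session:
    query = (session.query(Collection.name, DataObject.name)
             .filter(Criterion('=', DataObjectMeta.name, 'experiment'))
             .filter(Criterion('=', DataObjectMeta.value, 'run-42')))
    for row in query:
        print(f"{row[Collection.name]}/{row[DataObject.name]}")
```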
This might be my own ignorance: can I actually make a request to iRODS via simple REST calls or anything like that?

Oh yes, there is a REST front end built in Java.

Okay. And then the plugins. You mentioned that the rules themselves have a domain-specific language but could also be developed in other things. Can you elaborate on that?

Sure. So as of 4.2, the last part of iRODS that hadn't been broken out and put into its own tiny box, as we like to say, was the rule engine itself, because it was so tightly integrated with the original iRODS code base, and so we went through that heavy lift and pulled it out into its own plugin. So the native domain-specific language is a plugin, as well as a Python plugin that we're gonna be shipping shortly, any day now. And I know that there's at least one other rule engine plugin out there that is being written for Go. I think that is it. We have written some other rule engine plugins for JavaScript and Haskell and some other things just for fun, but I don't think we're gonna roll them out to production.

So the rule engine plugins themselves are written in C++, but they provide the ability for an interpreter of another language to interpret rules written in that language. So when we say a Python rule language plugin, it's written in C++ but provides the ability for scientists, or people who are not necessarily administrators, who already know Python, to be able to write their own logic in a language they understand and are familiar with, which is a big deal. We've gotten a lot of feedback around the Python, which is why it's the first external, first other language that we're going to release. The other thing to know about these plugins is that they can actually be run concurrently in the iRODS server. So a particular iRODS server can actually run multiple rule engines at the same time and support rules written in different languages at the same time, including the ability for those rule engines to call rules from each other. So you can actually call some of the legacy functionality in the original rule language from Python, and vice versa.

For some of the stuff you talked about, though, for extracting metadata and things, can I just invoke a shell and use something like ImageMagick to capture, like, the dimensions of the image? Can I do something like that? Or do I have to implement it in this language?

Yes, yes to both. So part of our training that is out there in the world right now is extracting metadata out of images using ImageMagick. And that goes through their API, but you can also just shell out and call convert or what have you. That's mostly a question of speed and an implementation detail. We've now seen people do both. Part of the rest of our training is to show that you can prototype in one of these scripting languages, be it the DSL that we have, called the iRODS rule language, or Python. And then as you understand exactly what you want it to do, and you prove that your proof of concept is good but you wanna scale it up, you can actually write your own rule engine plugin in C++ for speed and deploy that at scale.
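As a quick client-side sketch of that prototyping pattern, shelling out to ImageMagick and recording the result as metadata; the file names, zone, collection, and attribute names are invented, and in the training material this kind of logic lives inside a rule rather than a standalone script.

```python
#!/usr/bin/env python3
"""Hedged sketch: read image dimensions with ImageMagick's `identify`,
put the file into iRODS, and attach the dimensions as AVU metadata."""
import subprocess
from irods.session import iRODSSession

LOCAL = 'photo_0001.png'
LOGICAL = '/tempZone/home/alice/images/photo_0001.png'  # collection assumed to exist

# Ask ImageMagick for "<width> <height>" of the local file.
width, height = subprocess.run(['identify', '-format', '%w %h', LOCAL],
                               capture_output=True, text=True,
                               check=True).stdout.split()

with iRODSSession(host='irods.example.org', port=1247, user='alice',
                  password='secret', zone='tempZone') as session:
    session.data_objects.put(LOCAL, LOGICAL)
    obj = session.data_objects.get(LOGICAL)
    obj.metadata.add('image.width', width, 'pixels')
    obj.metadata.add('image.height', height, 'pixels')
```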
Well, that might indirectly answer my question. I was gonna ask: if you have these language rule engine plugins that you said are all written in C++, but they support the various different languages and things like that, are you actually compiling the rules that people write in, say, Python, so that it's in some kind of byte code, in an LLVM kind of sense, or are you actually interpreting the language on the fly every time the rule is fired?

So right now the Python rule engine plugin is written using Boost.Python, and so that comes along with all of what that means. One thing we haven't touched on is microservices, and we've been using that word a long time; it means something different now, but they're basically compiled C++ shared objects that give you the ability to effectively expose a function and bind that microservice into the rule language itself. And so through our domain-specific language, Python, or one of the other rule engines that are supported, we can do things like write a microservice function object that will extract the metadata from the data at rest for you, using ImageMagick, at which point you can use the rule language to simply apply that to the data object at rest in the catalog. And so that was used historically for things that allow you to reach out to other libraries that the rule language wouldn't support, which is no longer as important now that we can use other languages, or to do things that are computationally expensive. If you're doing that work in Python, now, yes, you would be using the interpreter, but you wouldn't have to write microservices necessarily; you'd just be writing Python functions. So some of the nomenclature gets collapsed once you start to put them in tiny boxes, which is good, because that word, microservices, now means different things to different people.

So you mentioned iRODS is open source. What license is iRODS distributed under?

So iRODS is and has always been a BSD 3-clause piece of software, so people can use it, they can put it in a box and call it, well, I don't think they can call it something else. I guess they can. It would be pretty obvious that it's iRODS, but yes, it is BSD 3-clause.

So what's the largest iRODS installation that you've seen, in terms of pick your favorite metric, right: total data, total number of objects, total number of users, whatever?

So the CyVerse project out of Arizona has probably over 50,000 users at this point, so that is the largest user base that we know of. As far as data, there are tens and tens of petabytes out there. I don't know that we've hit a hundred yet; it's hard to say, since it's open source these are just the things that people have shown up and told us about. And we're looking at hundreds of millions of objects as far as total size in that direction. We know that the Sanger Institute, for every data object in their system coming off the sequencers, we think they've told us they have over a thousand metadata attributes for every data object. So they do a lot of work based on the metadata, and they are actually using an Oracle RAC database to hold the catalog. The other two that we support out of the box are Postgres and MySQL. All three are relational because we need that big C, consistency, across our namespace.

And then what was your most unexpected use? What was the thing you've run into where you're like, I'm surprised someone's doing that, but it works?

So my favorite answer to that is that a health sciences institute used iRODS to automate a real-world workflow in a lab, and we actually helped implement that. So this was a lab that was synthesizing viruses, which is scary enough as it is. And they used iRODS and the metadata, as well as the asynchronous rule engine calls, so that at every stage of the workflow they would put data into iRODS with associated metadata, iRODS would then run some rules that would generate the next steps, and so on and so forth.
So we helped them take something that was very hard to teach, in a lab that had a lot of turnover in the people that worked there, which meant they were always teaching somebody, and normalize that, from how the forms were generated to how the ingest of the initial requests came in. And we also provided them the ability to do queries for billing, and so they became much more efficient. And it was interesting to me because we're always implementing computational workflows.

Okay, well, thank you very much, Terrell and Jason. Where can people find out more, download a copy of iRODS, and get involved with the community?

Well, irods.org is the main hub. That's where everything lives, and it points to lots of other stuff. Our GitHub account, everything we do is public, so github.com/irods. And then of course, you know, we are on the Twitter, so also named irods. And that about does it. We've got a discussion group, a Google group, that's easily findable. A lot of people show up there first, and if they're brave, they'll ask a dumb question, but after that, it's a lot of email.

Okay, thanks a lot for your time. Thank you, gentlemen.

All right, thank you. Take care.