OK, hi. I'm Nigel, and I'm here with my colleague David, who's going to take you through some of the other content. I work for IBM, I'm based in the UK, and I'm a maintainer on the open source Egeria project. Egeria is part of the LF AI & Data Foundation, where it's a graduated project, and we've been involved since the project's inception. Egeria is all about metadata, so that's what we're going to talk about today: what exactly is the problem we're trying to solve with a solution like Egeria, and with the other software out there in the metadata space? Now, some people will know a lot about metadata and some maybe not so much, so the first thing to think about is what I mean by metadata. What we have here is a picture from Brighton, where I live; this is a famous pier. You may or may not recognise it: more people would here in the UK, fewer if I were over in the US. What metadata gives us in this context is all of this additional information. This is from Apple Photos, and it's telling me things like the location where the photo was taken, what device I used, and what shutter settings were used. It also includes things I've curated, like tags I may have added, which might help me when I search for photos later. The important thing is that, over time, the format of that metadata has become much more standardised, so you can use different photo apps with all your photos and get a consistent view across them. So standards are really important as well. Now, if we look at an organisation, this is a map of many of the systems within a bank that we've worked with, and which is still involved in the project.
This shows all of the different systems that are collaborating, the different flows of data, the different service calls. It's complicated, and it's got more and more complicated over time; this has been going on for many, many years. When we're doing data science or developing new AI algorithms, it's really important that we understand not just what all these data elements are, but also things like the flows between them and who owns them. And when we look in these enterprises, we find that they're using a lot of different tools. They may have tools from one vendor that work very well for their development teams, and very different tools in their data science teams. What we typically see is that, although there might be metadata, for example about the database tables used in the data science space, it's probably in isolated silos. Other tools, maybe in development or business reporting, don't have access to that same metadata. So there may be repetition and inconsistencies, and these tools are not working together. We can see one example of that here. In this scenario we have one person, Callie; we have a whole set of personas in our documentation that you'll get used to if you refer to it, and this is one of them. She's using public data from a local government data source and from the employee directory, and she's going to try to join them because she wants to send birthday cards. What's actually happened is that she didn't understand enough about the information she was using. In the UK, and in many other countries, you may have two dates associated with a birth: the actual date of birth, and the date when the birth was registered.
So she picked the wrong field and sent birthday cards to people on the wrong date, when it's obviously not their birthday. She failed in her objective. That's a very simple example, and with birthdays nobody will be too upset, but you can imagine many cases where using the wrong data is going to get you into trouble. The other thing about metadata is that you'll see a lot of different types of it in an organisation. There's technical metadata, which describes the fields, columns, tables, stored procedures and so on within a relational database. You may also have an enterprise glossary, where you try to describe in a consistent way all the terminology used in the organisation: how you define a customer, or how you define remuneration, and you want those definitions to be consistent. So there's an implicit linkage when glossary terms are used to describe the technical data. And of course you've got your policies around how you control data and how you handle things like archiving. What's important to recognise when we manage metadata is the linkage between these types; that's a critical element of managing metadata. There are lots of opportunities if we get this right. It allows you to look for new business, become more efficient and more agile, do that across many different parts of the organisation, and meet compliance requirements. So, a few quotes to finish this section off. David Weinberger, the American writer and technologist: metadata liberates us, liberates knowledge. And we can think of examples going back to the Second World War, when metadata was very important in understanding how different organisations communicated. Tim Berners-Lee: data is a precious thing and will last longer than the systems themselves.
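To make Callie's mistake concrete, here is a minimal sketch assuming a hypothetical person record with two date columns; the column names `date_of_birth` and `birth_registered`, and their descriptions, are invented for illustration. Both columns are just dates, so only the field-level metadata (the description) tells you which one to use.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical column descriptions: this is the metadata Callie was missing.
FIELD_DESCRIPTIONS = {
    "birth_registered": "Date the birth was officially registered",
    "date_of_birth": "Date the person was born",
}

@dataclass
class Person:
    name: str
    date_of_birth: date
    birth_registered: date

def birthday_field(descriptions: dict[str, str]) -> str:
    """Choose the column whose description says when the person was born."""
    for field_name, text in descriptions.items():
        if "born" in text.lower():
            return field_name
    raise LookupError("no column describes the actual date of birth")

def card_date(person: Person, field_name: str, year: int) -> date:
    """Day in `year` on which to send the card, according to `field_name`."""
    d: date = getattr(person, field_name)
    try:
        return d.replace(year=year)
    except ValueError:  # 29 February in a non-leap year
        return date(year, 2, 28)

sam = Person("Sam", date(1990, 3, 14), date(1990, 4, 2))
print(card_date(sam, "birth_registered", 2024))                  # 2024-04-02: wrong day
print(card_date(sam, birthday_field(FIELD_DESCRIPTIONS), 2024))  # 2024-03-14
```

The keyword match on "born" is deliberately naive; the point is that the decision is driven by metadata rather than by the column looking plausible.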
That just supports the fact that data is really important, and metadata tells you what that data is. There are a few more quotes here that you can read as well. With that, I'm going to ask my colleague David to talk a little bit more about Egeria itself and metadata.

Thanks, Nigel. My name is David Radley; I'm an Egeria maintainer and I work with Nigel at IBM. Nigel has explained the business problem, which I think you're probably familiar with. Let me check how many people here work with data and metadata; put your hand up. All right, most of you, and I assume the others are interested in this area. Good. The title of the talk is about building an ecosystem around the governance of data and metadata in open source. We've talked about the business problem. Now I want to talk about the vision, how we'd really like things to be, and then about the software that can enable it, which in our case is Egeria. So this is a sort of new dream, a new manifesto for metadata and governance. The maintenance and governance of metadata for all of this data, which is now vast and diverse, needs to be largely automated; that's what this robot represents. But it also needs to be provided to business users appropriately. It needs to be nurtured in a special way so that the data is appropriate for the different roles in the community. That's one of the things we think is very important. We want metadata to be available everywhere. We want it knitted through our tools, so that metadata discovery and maintenance (the bottom one here) happen around every tool that accesses the data, and the metadata is extracted and can be seen. This is about making sure that access is open as well. And it's not enough to see this metadata; we need to know what it means.
We need to codify it in some way, with open, agreed standards for the elements of the metadata. And, similar to this bespoke consumption, it needs to be personalised and appropriate. A dataset needs to be prepared one way for a data scientist; the owner of the data needs a different view of it; and a consumer of the data building applications needs yet another view, even though at the technical level it's fundamentally the same dataset. My quote at the top is "You may say I'm a dreamer", because this is the manifesto, "but I'm not the only one". We're in a community, the open source community. "I hope someday you'll join us", which is the whole point of open source. I changed the last line to "metadata can drive you on". It seemed quite appropriate, because we're all talking about metadata that we want to understand in an open way, with a community around it to agree the standards we want to go forward with. So Egeria is an open source project in the Linux Foundation, as Nigel said, and it's trying to make that vision real. It enables the sharing of knowledge. We have a distributed, federated model, and we have agreed open types that describe the various kinds of metadata: everything we think is important in an information ecosystem, from governance and policies to the semantic meaning of things and the technical metadata. I'll talk about those open types next. These open types are really the core of Egeria. They give us a common language that enables one tool to talk to another repository. Traditionally, integrations between these metadata services and repositories have been difficult. You may know what something means in one repository.
And you may know what it means in the other repository. But can you get a consistent mapping between the two? It's very difficult, because each has a slightly different, nuanced view of what things mean. The way we solve that is that both map to an agreed open standard: these open types. (Technically they're not a standard, for the standards people here, sorry.) It's a set of open types in Egeria that are very well understood, so we say a concept has to mean this for you to map to it. Communicating using these open types is probably the main value Egeria brings. Of course, many vendors have their own set of types for these things. And there are many metadata standards, but they're often industry-specific or cover only a very small area of metadata, whereas this is looking to describe everything. We made a first pass with some very expert people to come up with these open types, which we hope are good enough. We've improved them over time and versioned them, with backward compatibility. If these types don't suit you, you can extend them or add new types, so it's a growing, organic thing. But the more we can agree on, the more tools we can write around those agreed things. That's the idea behind the open type model: we can write code around it, exploit those types, and know what they mean. So the more we can agree, the better. If we all have our own language for everything, it's difficult to communicate. A common language is the first step towards communication across cultural boundaries. We all know that the data warehouse people, the data scientists, the people who own the data and the DB admins have very different languages. They're very different cultures.
They're different people, with different motivations and different roles. The open types give us a common language that crosses those cultures. I can definitely recognise that within organisations: different ways of thinking and different motivations lead to different cultures, hopes and dreams in each of these groups. So these are the main areas of the open types. We have base types for systems and infrastructure. We have technical metadata such as assets, APIs, databases, topics or events, and metadata discovery. We have reference data and collaboration. We have policies and governance. And we have the semantic world. The most important part is actually the linkage between these areas because, as we just discussed, they're often separate. So within Egeria's metamodel we have lots of linkages. You don't need to learn this now; it's in the slides if you download them afterwards. One that's quite interesting is that you can associate a meaning with technical metadata: a column that holds a National Insurance number can be associated with its meaning in the semantic layer. This slide shows that some parts of the organisation may already know what something means while other parts don't. Because of the siloed nature of existing infrastructure, it's difficult to connect the person who has the question with the person who has the answer. That's where Egeria comes in, with the common way of representing metadata that we talked about: the open types. It can be deployed on many different types of platform, such as data lakes in the cloud, IoT, and application development. I'm going to move on quickly. This is an example of how metadata gives value.
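A minimal sketch of the two ideas above: two tools describe the same column in their own vocabularies, both map onto one shared type, and the technical element is then linked to its meaning. The record shapes for "tool A" and "tool B", the `OpenColumn` and `GlossaryTerm` classes and the `semantic_assignments` list are all hypothetical simplifications, not Egeria's type definitions or APIs; the last step is only loosely analogous to what Egeria calls a semantic assignment.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OpenColumn:
    """Simplified shared ("open") type for a database column."""
    qualified_name: str
    data_type: str

@dataclass(frozen=True)
class GlossaryTerm:
    name: str
    description: str

# Each tool describes the same column in its own vocabulary (hypothetical shapes).
tool_a_record = {"tbl": "EMPLOYEE", "col": "NI_NUM", "type": "CHAR(9)"}
tool_b_record = {"path": "hr.employee.ni_num", "sqlType": "char(9)"}

def from_tool_a(rec: dict) -> OpenColumn:
    # Tool A has no schema field; this sketch assumes the "hr" schema.
    return OpenColumn(f"hr.{rec['tbl']}.{rec['col']}".lower(), rec["type"].lower())

def from_tool_b(rec: dict) -> OpenColumn:
    return OpenColumn(rec["path"].lower(), rec["sqlType"].lower())

# Both tools map to the shared type, so their views can be compared directly.
col_a, col_b = from_tool_a(tool_a_record), from_tool_b(tool_b_record)
print(col_a == col_b)  # True

ni_term = GlossaryTerm("National Insurance Number",
                       "UK identifier used for national insurance and tax")

# Link technical metadata to its meaning in the semantic layer.
semantic_assignments: list[tuple[OpenColumn, GlossaryTerm]] = [(col_a, ni_term)]

def meaning_of(column: OpenColumn) -> list[str]:
    """Names of the glossary terms assigned to a column."""
    return [term.name for col, term in semantic_assignments if col == column]

print(meaning_of(col_b))  # ['National Insurance Number']
```

Because the meaning was attached via tool A's column, tool B finds it too: that is the payoff of mapping both into one common language rather than mapping tool to tool.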
At the bottom layer we have a string, and we don't really know what it means; some of it looks like numbers that might be interesting. The first thing we can do is apply structural metadata to get technical metadata, for example database columns, or just splitting the string up. The next level is to relate those columns to meanings in the semantic world, the glossary terms. Then we can create a semantic model describing how managers and employees are related, with is-a and has-a relationships. This is a really powerful way of working in the language of the business and then mapping it down to the actual technical names, which look like gobbledygook if a business person has to deal with them directly. We also have the ability to classify things. Here it says sensitive: we have confidentiality levels that you can place against technical assets, or against glossary terms and the like. So the data needs to work harder. When the data lived in a little silo surrounded by an application, it was very safe, because the application provided its own protection. Now we bring it out so it can be used for analytics, or for governance, to get a coherent view across the data so that we can meet compliance regulations like GDPR: where's all your customer information? It could be split across many different silos, behind many different applications. As we take away the protection of those applications, we need to re-add it and re-secure all the data we've now potentially exposed. That's why making data available is so closely tied to the governance programme. We need to work out how to govern this data, probably based on its meaning rather than on whether it's in DB2 or Oracle or whatever, although there can be factors around particular databases that make them sensitive as well.
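The layers in this example can be sketched in a few lines, assuming an invented record format, invented column and glossary term names, and an invented four-level confidentiality scale (none of these are Egeria definitions). It covers the raw string, the structural metadata, the link to glossary terms and a classification placed on the terms; the is-a/has-a semantic model is left out for brevity.

```python
from enum import IntEnum

class Confidentiality(IntEnum):
    """Hypothetical ordered confidentiality levels."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    SENSITIVE = 3

RAW = "E1047,Ada Lovelace,M0007,52000"  # bottom layer: just a string

# Structural metadata: turns the raw string into named columns.
COLUMNS = ["emp_id", "full_name", "mgr_id", "salary"]

# Semantic layer: each column is related to a glossary term (its meaning).
COLUMN_TO_TERM = {
    "emp_id": "Employee Identifier",
    "full_name": "Person Name",
    "mgr_id": "Manager Identifier",
    "salary": "Remuneration",
}

# Classification on glossary terms, so it applies to every column with that meaning.
TERM_CONFIDENTIALITY = {
    "Person Name": Confidentiality.CONFIDENTIAL,
    "Remuneration": Confidentiality.SENSITIVE,
}

def interpret(raw: str) -> dict[str, tuple[str, str, Confidentiality]]:
    """Return column -> (value, glossary term, confidentiality)."""
    values = raw.split(",")
    if len(values) != len(COLUMNS):
        raise ValueError("record does not match the structural metadata")
    result = {}
    for column, value in zip(COLUMNS, values):
        term = COLUMN_TO_TERM[column]
        # Unclassified terms default to INTERNAL in this sketch.
        level = TERM_CONFIDENTIALITY.get(term, Confidentiality.INTERNAL)
        result[column] = (value, term, level)
    return result

def visible(record: dict[str, tuple[str, str, Confidentiality]],
            clearance: Confidentiality) -> dict[str, str]:
    """Keep only the values the caller is cleared to see."""
    return {c: v for c, (v, _, lvl) in record.items() if lvl <= clearance}

rec = interpret(RAW)
print(visible(rec, Confidentiality.INTERNAL))  # {'emp_id': 'E1047', 'mgr_id': 'M0007'}
```

Classifying the glossary term rather than each column is the "govern by meaning" idea from the talk: any new column mapped to "Remuneration", in DB2 or Oracle, inherits the same protection.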
So that's the idea: these turtles, or tortoises, have lost their protective shells, and they're looking a bit embarrassed and exposed. We need to put something back. As Egeria effectively takes away the shells, it needs to add something back to protect them again. So we now have a big tortoise shell called Egeria, and it has to look after that metadata. It has concepts like zones for assets, so you can deal with assets at the right level. You could have a development zone, a production zone, an archive-oriented zone, or sandboxes. There are different approaches it takes, as well as effectivity. Egeria is a metadata layer, so it needs to be able to store the policies and rules that we place in the metadata to keep the data safe again. In operation, Egeria can live in all of these different spaces: as we saw in the other picture, in the cloud, on-premises, or close to the data with IoT. Here we have Egeria platforms. The way you run this is that you have a platform, which is effectively a process. Instead of writing lots of different bespoke software for all the different ways we want to use the metadata, for example as a view service, a discovery engine or a metadata repository, we've encapsulated that in configuration. The platform is there, and your configuration describes what type of server it is. You run these orange blobs as servers on the platform, and they're all configuration-driven; by default the configuration is stored in files. Am I just about out of time? OK. So that's what the Egeria platform and servers are about.
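The zone idea described above can be sketched as a simple visibility filter. This assumes hypothetical asset names and zone names, and it assumes (for this sketch only) that an asset with no zone membership is visible everywhere; it illustrates the concept rather than Egeria's actual zone implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    name: str
    zones: set[str] = field(default_factory=set)

def visible_assets(assets: list[Asset], supported_zones: set[str]) -> list[str]:
    """Names of assets that a server or user supporting `supported_zones` may see.

    Assumption for this sketch: an asset with no zone membership is visible to everyone.
    """
    return [a.name for a in assets
            if not a.zones or a.zones & supported_zones]

catalogue = [
    Asset("weekly-clinical-results", {"production", "research"}),
    Asset("patient-extract-draft", {"sandbox"}),
    Asset("2019-trial-archive", {"archive"}),
    Asset("public-drug-list"),
]

print(visible_assets(catalogue, {"research"}))
print(visible_assets(catalogue, {"sandbox", "archive"}))
```

Zones give back some of the "shell": a research-facing server only sees research and unzoned assets, without every tool needing its own access logic.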
Nigel is going to do a demo and then talk about how you can get involved with the community. This was an overview of some of the general ideas about why we think Egeria is an important place to manage and govern your metadata consistently.

And it's demo time. The challenge is always how to represent what you've been working on. It's middleware, right, and metadata; it's so important to us all, but what does it mean in terms of actually visualising it? What could we show you? What I'm going to show you today is some Jupyter notebooks that we have. They can be deployed locally, but typically we deploy them through Kubernetes using a Helm chart. You can look at our documentation at egeria-project.org, which is on most of the slides apart from this one (I think it's just a bit dark), and that gives you instructions for running what I'm going to show you now. If you look at the slides later, there are a few links here to the specific demo; I'm not going to talk about them now. So let's flip over to the web browser, hopefully, if I press the right button. Always the hardest part. Mirror. OK, let me just exit. Here we go. What I've done here is start up this demo before the presentation; it only takes a few minutes, and we might look at that later. It's based on a fictitious organisation we've created called Coco Pharmaceuticals, a medical research company that gathers results on a weekly basis and has to manage them. They have a certain infrastructure and a lot of personas defined; this is all in the documentation. And this lab is one you can step through. Let me just check whether I've run this one; I'm going to go to one I created earlier.
Bear with me one second. Asset management: building a data catalogue. This takes us through a scenario using Egeria, which is running in the background. We're going to focus on two people here, Peter and Erin, who are going to catalogue some new data that they've got. But first, let's look at the infrastructure Coco Pharmaceuticals has. They've got a number of what we call platforms in Kubernetes, running in different pods as containers. Within each platform we have Egeria servers, and these effectively service the requests associated with different lines of business. For example, the manufacturing organisation has one virtual server. The users on the data lake have their own server, cocoMDS4, and that's the one we're going to use. So we're going to run through this notebook step by step. OK, what I haven't done is configure it, so let me run this again. It's always the way with live demos, isn't it? They don't work. Let me go to the top of this; I thought I'd run it. Let's try again. We'll be there in a second. We have a notebook that configures all of the servers, which should have run but obviously hadn't, so it's running now. It issues lots of REST API calls; you can read through them in your own time, but that has now configured the servers. Then I'm going to run this one, which starts everything up. These are all Python scripts, so you can look at the Python code and see what's actually happening. OK, now we're ready. Back to the demo, and this time it should be fine. It's starting; this will probably take about two minutes.
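The configuration notebook issues REST calls to a platform's admin services, first to configure a server and then to start it. The sketch below only builds those requests (it does not send them). The URL pattern is modelled on Egeria's admin services as we recall them, and the two path suffixes, the platform address with port 9443 and the user id `garygeeke` are assumptions for illustration; check the admin services documentation for your release before relying on any of them.

```python
from urllib.parse import quote
from urllib.request import Request

def admin_url(platform: str, user: str, server: str, suffix: str) -> str:
    """Assumed admin-services URL pattern: .../users/{user}/servers/{server}/{suffix}."""
    return (f"{platform}/open-metadata/admin-services/users/{quote(user)}"
            f"/servers/{quote(server)}/{suffix}")

def configure_and_start(platform: str, user: str, server: str) -> list[Request]:
    """Build (but do not send) the POST calls to configure and then start one server."""
    steps = [
        "local-repository/mode/in-memory-repository",  # assumed: choose the repository type
        "instance",                                     # assumed: start the configured server
    ]
    return [Request(admin_url(platform, user, server, s), method="POST")
            for s in steps]

# Illustrative values only: platform address, admin user id and server name.
calls = configure_and_start("https://localhost:9443", "garygeeke", "cocoMDS4")
for r in calls:
    print(r.get_method(), r.full_url)
```

Keeping configuration and start-up as separate calls mirrors the two notebooks in the demo: one writes the configuration, the other starts the servers from it.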
Whilst that's running, let's talk a little about the scenario. Peter is going to add some data sets to the catalogue. We record a data set as what we call an asset: the thing that really provides business value, so it's the thing we really want to manage. An asset has attributes, and, through the relationships we have, things like an owner. The Python code in the next section, which I'll run once the servers have started, adds those assets to the data catalogue in Egeria. It calls the REST API services directly. We also provide Java client libraries, and sometimes you get more validation by using those than by calling the REST API directly. We're hoping to wrap up some of the functions in the shared Jupyter notebook into a Python library at some point, and if anyone wants to contribute to that, it would be really useful. This is running on the cloud at the moment, and it's taking a minute, so we should be there soon. When we look at an asset, in this case a file, we're actually going to create a whole set of different elements: the file system, the folders the file is in, and the actual data file. All of these appear in Egeria as linked entities, so we can navigate the relationships between them. We should be there very soon. OK, there's a problem; I don't know why that's not working. That's getting a bit problematic, so let me flip back to the slides, and if we have time we'll come back to the demo at the end. I think that's probably the best thing to do. So let me talk a little about the community. We have one project here, Egeria, in LF AI & Data, but we're part of a much larger organisation. These are not just LF AI & Data projects but other projects in this space too.
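Going back to the file asset from the demo: cataloguing one file creates a file system, its folders and the data file as separate, linked entities. The sketch below shows that shape and how you navigate it. The ids, names and the plain parent/child relationship list are hypothetical stand-ins; Egeria itself uses typed entities and typed relationships.

```python
from collections import defaultdict

# Hypothetical entities: id -> (type, name).
entities = {
    "fs1": ("FileSystem", "/"),
    "f1": ("FileFolder", "data"),
    "f2": ("FileFolder", "clinical-trials"),
    "df1": ("DataFile", "week1.csv"),
}
# (parent, child) containment relationships.
relationships = [("fs1", "f1"), ("f1", "f2"), ("f2", "df1")]

parent_of = {child: parent for parent, child in relationships}
children_of: dict[str, list[str]] = defaultdict(list)
for parent, child in relationships:
    children_of[parent].append(child)

def path_to(entity_id: str) -> list[str]:
    """Walk up the relationships from an entity to the file system that holds it."""
    chain = [entity_id]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return [f"{entities[e][0]}:{entities[e][1]}" for e in reversed(chain)]

def contents(entity_id: str) -> list[str]:
    """Names of everything below an entity, depth-first."""
    found = []
    for child in children_of.get(entity_id, []):
        found.append(entities[child][1])
        found.extend(contents(child))
    return found

print(path_to("df1"))
# ['FileSystem:/', 'FileFolder:data', 'FileFolder:clinical-trials', 'DataFile:week1.csv']
print(contents("f1"))
# ['clinical-trials', 'week1.csv']
```

Storing the folders as entities, rather than as a path string on the file, is what lets you ask questions in both directions: where a file lives, and what a folder contains.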
And we can see lots of potential for linkages between them. For example, there's the OpenLineage project, which is really interesting, and projects like Marquez. We see a lot of other projects integrating with OpenLineage. There's a presentation you may be interested in from our Egeria project lead, Mandy Chessell, on Thursday afternoon, so I'd definitely recommend it if you're interested in lineage. Another part, of course, is what you can do and how you can get involved. We release monthly; that happens without fail. We care about things like security. The talks we've had here on SBOMs have been really interesting; it's something we've talked about doing but haven't done yet, and we need to. We manage all our dependencies and try to minimise attack surfaces, because we're focused on making this an enterprise solution. We also support different storage back ends. You can have a metadata repository with Egeria, but you don't have to; you can also link to an existing metadata repository. We've been looking at XTDB, which is very scalable and lets us do things like time travel. Areas we're also looking at at the moment include improving tutorials and education, and building connectors to multiple technologies; we're working on a better JDBC connector at the moment. There's a user interface you can play with, although I'd recommend starting with the notebooks; if we have time, we'll quickly touch on that. Then there are our repos: github.com/odpi/egeria is our main repository, and we have about 25 or 30 in total. If you like what you see, please star it; that's always really good for us. And if you use Egeria or take a look at it, give us feedback. We have Slack channels, mailing lists and, of course, issues in the repository.
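On the XTDB point above: "time travel" means asking what a piece of metadata looked like at an earlier point in time. Here is a plain-Python sketch of that idea only; `VersionedStore` and its methods are hypothetical, and this is not XTDB's query API or Egeria's repository interface.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional

class VersionedStore:
    """Keeps every version of each entity so you can ask what it looked like at time T."""

    def __init__(self) -> None:
        self._times: dict[str, list[datetime]] = {}
        self._values: dict[str, list[dict]] = {}

    def put(self, key: str, value: dict, valid_from: datetime) -> None:
        """Record a new version; this sketch requires versions in time order."""
        times = self._times.setdefault(key, [])
        if times and valid_from < times[-1]:
            raise ValueError("versions must be added in time order")
        times.append(valid_from)
        self._values.setdefault(key, []).append(value)

    def as_of(self, key: str, when: datetime) -> Optional[dict]:
        """Latest version valid at `when`, or None if the entity did not exist yet."""
        times = self._times.get(key, [])
        i = bisect_right(times, when)
        return self._values[key][i - 1] if i else None

store = VersionedStore()
store.put("asset:week1.csv", {"owner": "Peter"}, datetime(2022, 1, 1))
store.put("asset:week1.csv", {"owner": "Erin"}, datetime(2022, 6, 1))
print(store.as_of("asset:week1.csv", datetime(2022, 3, 1)))  # {'owner': 'Peter'}
```

Never overwriting a version is what makes audit questions ("who owned this data when the report was produced?") answerable later.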
We have an Egeria docs repository, which builds the egeria-project.org website. People who are new to the project are often the best people to comment on those documents, because they point out the obvious things that are missing. So we'd absolutely love your contribution. With that, how long do we have left? Do we have time to flip back to the demo? I tell you what, why don't we start with questions, and I'll try to get the demo working while we take a few. Are there any questions from the virtual side or the physical side?

I mean, the way most people would get involved would probably be to think about which technologies they're interested in connecting, and then look at creating Egeria connectors. There's a lot of detail we haven't gone into, since we're looking at things at a very high level here, but creating connectors is how you'd bring a new piece of third-party technology into this ecosystem. We're accumulating connectors. I've written a couple, and a JDBC one is being written at the moment, which will cover connecting a lot of sources. We've done one around Kafka events, and we're working on one for schema elements around events. We can do APIs, and we can take in OpenLineage events. So there are a lot of these connectors. We have a connector framework that everything in Egeria is written around; everything is pluggable. It's been written in a very rigorous, architected way. When it was first written, the person who wrote it wrote the audit log first and then the connector framework. So that was a very rigorous, layered way to create the Egeria code base itself, as well as the open type system and the protocol around it that facilitates communication.
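To illustrate what "everything is pluggable" means, here is a minimal Python sketch of a connector plug-in pattern: a common interface, a registry, and creation from configuration alone. Egeria's real connector framework is written in Java and differs in detail; `MetadataConnector`, `register`, `connect` and `KafkaTopicConnector` are hypothetical names for this sketch only.

```python
from abc import ABC, abstractmethod
from typing import Callable

class MetadataConnector(ABC):
    """Hypothetical plug-in interface: each connector harvests metadata from one technology."""

    @abstractmethod
    def harvest(self) -> list[dict]:
        ...

_REGISTRY: dict[str, Callable[[dict], MetadataConnector]] = {}

def register(technology: str):
    """Class decorator that makes a connector available under a technology name."""
    def wrap(cls):
        _REGISTRY[technology] = cls
        return cls
    return wrap

def connect(technology: str, config: dict) -> MetadataConnector:
    """Create a connector from configuration alone; callers never import its class."""
    try:
        factory = _REGISTRY[technology]
    except KeyError:
        raise LookupError(f"no connector registered for {technology!r}") from None
    return factory(config)

@register("kafka")
class KafkaTopicConnector(MetadataConnector):
    def __init__(self, config: dict) -> None:
        self.topics = config.get("topics", [])

    def harvest(self) -> list[dict]:
        # A real connector would query the broker; this one just reports configured topics.
        return [{"type": "KafkaTopic", "name": t} for t in self.topics]

for element in connect("kafka", {"topics": ["clinical-results"]}).harvest():
    print(element)  # {'type': 'KafkaTopic', 'name': 'clinical-results'}
```

Adding a new technology then means writing one class against the interface; nothing that calls `connect` has to change.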
So we have that layer, and then we have the consumption layer for accessing the metadata from different tools or for different personas; that's what the different coloured dots on my slide showed. So you could have a data science way of accessing metadata, or a discovery-oriented view of it. That's a bit more detail on the technical side. Where an organisation might want to get involved is in assessing whether there are existing connectors it could use, or whether new ones are needed. You may also like the idea of it not being owned by one vendor; that's one of the big advantages of having it in open source: no single company owns it, we all own it. It's not a Microsoft versus Google versus IBM sort of standard. It's something we can all own as a community, so people tend to be less reticent about adopting it, because it's ours to begin with. I don't know if that's triggered any questions. It's written mostly in Java, and we do have some Python bindings. If you're looking to contribute, there are UIs: there's a React UI in JavaScript, so we've got Node and React in the user interface area. So from a developer's point of view, if someone wanted to get involved with the community and contribute code, the main areas would be React and JavaScript, Java, or Python. We do have Kubernetes support, as Nigel mentioned; these Helm charts are based on Kubernetes. I don't know if you saw, but I was reinstalling the charts while we were talking. We have a Helm repository, and there are instructions. We've built these things called dojos, which are education sessions. Day one is called getting started, and it walks you through using these notebooks and gives you a feel for what Egeria can actually do.
Then we have a second day of dojos, which works more at the Java API level and walks you through building an application or a small plugin that works alongside Egeria, showing you how to build your Java code. We'll have more dojos: we're working on one on governance and one on production cases. I've been doing some of the Kubernetes deployment work. Some of the people using Egeria are obviously deploying it in Kubernetes, and it's still an area we're working on very much, so if anyone wants to contribute, that's a fantastic area, especially in this community, where so much is going on around Kubernetes. I've started working on an operator. That makes these systems easier to manage, because you can clearly define some of the constructs we have in Egeria and operate on them in the normal Kubernetes way using custom resources. That's quite an exciting area of development as well. As you can see, we've restarted the lab completely with about three minutes to go, so unfortunately we'll probably run out of time on this step. There's a link in the presentation to egeria-project.org, which talks about our community and how to get involved. We have weekly calls, and a webinar programme where we educate people on particular subjects. Come along to GitHub: I'm planetf1 on GitHub, and David is davidradl. So come and take a look at the project. If you want to try this demo and have any difficulty with it, I'm happy to point you in the right direction. It will teach you a little about how to make those REST API calls and how to actually make use of Egeria, and you can also look at some of Egeria's UIs. So the question was: what kinds of policies can be defined for data security, and are there any examples?
The way we define a policy in our open types is as a descriptive, prose statement: maybe "all confidential data must stay in the building", or something similar. Then, in the metadata, we define rules associated with it. Security-oriented rules might be around PII sensitivity, since you need to treat PII data differently to comply with GDPR. GDPR has a whole series of constraints on how you need to manage that data, and those can be articulated as what we'd call rules that live under a policy. That might involve obfuscation, for example. But we're the metadata layer, so we're not the ones actually implementing the rules. You could look the rules up and then run them, either in enforcement mode, if that's how you want to run things, or just to assess how compliant the data is, using the rules to describe that. Did that answer your question? I don't know. They probably can't hear me; there's a bit of a delay, so I'll let you know. OK, any other questions? I can see you're all raring to start coding Egeria PRs. I think everyone wants to go and start working on their SBOMs, because everyone is doing that these days. So thanks, everyone, for coming; I hope it was useful to you. Egeria is on GitHub, so if you're interested in metadata, come and take a look. Thanks for coming today. Thank you.
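Returning to the question about security policies: a policy is a prose statement, and the rules under it are what tooling can evaluate, either to report compliance or, in enforcement mode, to change the data. Since Egeria is the metadata layer and does not execute rules itself, the `apply` function below stands in for whatever engine would. The `Rule`/`Policy` classes, the masking rule and the column names are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict[str, str]

@dataclass
class Rule:
    name: str
    applies_to: set[str]              # columns the rule governs, e.g. those classified as PII
    check: Callable[[str], bool]      # True if a value complies
    remedy: Callable[[str], str]      # how to fix a value when enforcing

@dataclass
class Policy:
    description: str                  # prose statement, as described in the talk
    rules: list[Rule]

def apply(policy: Policy, record: Record, enforce: bool) -> tuple[Record, list[str]]:
    """Return the (possibly obfuscated) record and the violations found."""
    out, violations = dict(record), []
    for rule in policy.rules:
        for column in sorted(rule.applies_to.intersection(record)):
            if not rule.check(record[column]):
                violations.append(f"{rule.name}: {column}")
                if enforce:
                    out[column] = rule.remedy(record[column])
    return out, violations

def is_masked(value: str) -> bool:
    """Everything except the last two characters is '*'."""
    return set(value[:-2]) <= {"*"}

mask_pii = Rule("mask-pii", {"ni_number", "email"},
                check=is_masked,
                remedy=lambda v: "*" * (len(v) - 2) + v[-2:])
gdpr = Policy("Personal data must not be exposed in clear text outside HR.", [mask_pii])

row = {"ni_number": "QQ123456C", "email": "ada@example.com", "dept": "R&D"}
print(apply(gdpr, row, enforce=False)[1])  # ['mask-pii: email', 'mask-pii: ni_number']
masked, _ = apply(gdpr, row, enforce=True)
print(masked["ni_number"])                 # *******6C
```

The same rule definitions drive both modes, which is the point made in the answer: the rules are metadata, and how strictly they are applied is a separate decision.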