Hello everybody, and welcome to this next session. This session is about the ODPi, which is one of the foundations that are part of the Linux Foundation. The aim of this session today is to talk a little bit about the purpose of the foundation, but also to talk through some of the projects that we have. We are also looking for new projects relating to data, so if you like what you hear, like the way that we work, and have a project to suggest, then please feel free to contact us through the ODPi web page. The ODPi has actually been around since 2015. Its initial role was to help the different Hadoop vendors, who were all shipping different versions of the components that make up the Hadoop platform, to come together and agree a common set of standards: which versions they were using in combination, and other aspects that would help an organisation using Hadoop to switch between different suppliers of the platform. That was very successful and it created important discussions between the vendors. Now of course, the number of Hadoop vendors has continuously diminished over the last couple of years, and so the ODPi found that it really didn't have a role anymore. However, there was a strong belief that the data space is an important space for open source; in fact, there's a lot of open source supporting the data space. And so the ODPi looked for a new role. [Pause to resolve a screen-sharing problem.] Alright, so let me continue, and apologies for the technical issues. We're now looking at the header page for this deck, and you can see that the title is ODPi: Making Data Better.
So I was talking a little bit about the history of the ODPi and the fact that it was a foundation that originally started around the Hadoop platform, but felt that going forward, even though there were far fewer suppliers of Hadoop now, there was still a role for a foundation focused on data. About that time, a number of projects were starting, and so the ODPi moved into a new phase. And that's actually what I'm going to talk about today. If you look at our website, you'll see that we're still very focused on big data platforms and how you use data at scale for things like artificial intelligence and other types of technologies that make use of a large amount of data. And we're still focused very much on bringing additional value to an organization based on their use of data. Many organizations have a past where they focused very much on data belonging to applications, and maybe they extracted that data and built a data warehouse. But with new technologies like AI, there's a much greater opportunity to use data beyond creating reports through a data warehouse. So that's the area that we're focused on. And if I show what we are today: as you can see here, the ODPi is this blue box. We have a board, since we are our own nonprofit organization. Then we have a technical steering committee, which I lead, and which is why I'm talking to you today. And under there, we have three active projects. As I say, we are always looking for new projects, so if you have any suggestions, then we're very happy to hear from you. The idea of the technical steering committee is to basically act as a set of mentors to the projects as they come into the organization.
And each month the leaders of the different projects share their status with each other, so that they can see and pick up on best practices and good ideas from any of the other projects. So I'm going to go through and talk a little bit about the projects that we're involved in. But what you'll notice is this isn't just about code. Many organizations, as they try to become more data driven, face cultural issues and organizational issues as well as technological issues, because many organizations operate as a sort of controlled hierarchy that creates silos between each of the different IT systems. And to make use of data, you have to break down those silos and allow data and collaboration to flow laterally across the organization. There's also all the learning and new skills in terms of how to treat data, how to manage it and keep it safe, and also how to have a proper understanding of how far the data can be believed and what its appropriate use is. So we focus not just on technology, but also on best practices and education, and you'll see that as we go through the projects. I'm going to start with our newest project, which is called OpenDS4All, which, if you spell it out, is Open Data Science for All. This is a project focused entirely on education. The leaders of this project are Anna and Andre, and they are providing a lot of energy and leadership to bring this project to a very mature state very rapidly. What they're aiming to do is to create material that universities, colleges, and organizations all over the world can use to build their own data science curriculum. So it's all the building blocks that the educators need to customize their own environment. Everything they use is open source, making it easy for people all over the world to take advantage of this technology. The charts that I'm showing you have come from them, so they take credit for the content.
What they have here is an illustration that the demand for data science skills far exceeds the ability of universities, colleges, et cetera, to deliver them. And a lot of places actually don't have people with the skills needed to build the curriculum. So the aim here is to support the educators as they build these programs. And when we talk about the educators, it's not just universities; it's actually a wide range of organizations that can take advantage of the material that's being created. And the delivery could be face-to-face or, in today's world, in a digital manner as well. I said earlier that this is all open source, and that's actually very important. The programming language used is Python, it's developed in Jupyter Notebooks, and the data sets are provided. The presentations are in PowerPoint slides (actually the one non-open-source piece), but they're all shared as part of this project. The initial versions were built by professors at the University of Pennsylvania, so they have a very good initial heritage, and the initial version has been available since February this year. So the material is very up to date and created by experts. And there is a very lively team of people from some very knowledgeable organizations working on the Technical Steering Committee for OpenDS4All. I talked about the ODPi's Technical Steering Committee looking after all of the projects of the ODPi; the OpenDS4All Technical Steering Committee has experienced people in data science who are overseeing all the contributions coming into OpenDS4All. So this material is being very well groomed and validated before it's made available. Now I'm going to show you some charts from GitHub, because across the ODPi, GitHub is our key delivery mechanism.
It's the way that we collaborate and share information, and provide information about any calls that we're doing and other types of projects where we're looking for help. So this is the main page for Open Data Science for All, which gives a view of the goals of the project, who the audience is, and hints and tips on how to use it. And here you can see all the different types of modules that have been provided to date, and details of the content. If you drill down into GitHub itself, what you'll see are the modules that are available. And you can drill down again and you'll see the content for each of the modules, which, as I say, is available for you to download, use, change, do whatever you like. And hopefully, if you've got some new insights and some new content, maybe even a new module, then you can use the GitHub processes: you can create an issue describing what type of contribution you'd like to make, and then provide the material through the pull request mechanism. The Technical Steering Committee for Open Data Science for All will have a look at it, probably have a conversation with you, and hopefully bring it into the project. So that's Open Data Science for All. At the moment they are still looking to expand the team, looking for people who are interested in creating new content, and people who want to be part of the team that validates new content coming in from different contributors. They're also looking for contacts with organizations that are interested in becoming adopters and providing a bit of publicity about their use: maybe writing a blog post, connecting with the team, and talking about the work that they're doing and how they're using the project, to help spread the knowledge that this very valuable resource is available for anyone to use. So if you have any suggestions, content, or volunteers, please contact either Andre or Anna; you can connect through to the team on GitHub.
As I say, it's a very exciting new project, and something that has the potential to offer a lot of value to the world as we need to grow our data science skills across the board. Any questions? Let me just double check whether there are any questions for me around Data Science for All. Let me just give a little pause for that. What I'm going to do is go through each of the projects, and you can ask me questions either towards the end of each project or while I'm talking about it. I will also leave time at the end to give you a further opportunity to ask anything else that you're interested in. Okay, so let's move on to our second project; we started with the youngest. Oh, wait a minute, we've got a question: what level is this targeted at? I would say undergraduate level, so it's really designed to take someone with little to no skill in data science, obviously educated to a fairly reasonable level, through to being reasonably knowledgeable. So that was the question: what level is the material targeted at. Okay, so let me go on a little bit to our second project. As I said, this is called BI and AI. So Open Data Science for All is the youngest project, and BI and AI is the oldest project. It originally started as a special interest group in the ODPi when the ODPi was very focused on Hadoop and the big data space. Cooper, who leads this project, has always been very interested in how business intelligence and artificial intelligence come together, particularly in a world of big data. If you think about most business intelligence platforms used for reporting in large organizations, they have at their doorstep a huge range of data sources that have been specially prepared to support the reporting process. So that's a very valuable source of data that could be used for AI.
There are also very mature user interfaces, displays of the data, and processing capabilities in business intelligence platforms. So how would AI fit into that environment? That's really the goal of the project, but also to help BI vendors extend into the new world of AI, taking advantage of their heritage. The team has been going a fair while now, as I said, because they're one of the original projects. So there have been some very interesting collaborative pieces of work done between the different BI vendors on their approaches to bringing AI into that type of world. They tend to do roughly a project a year, and the current project is actually very exciting, because it's about defining a standard interface for a bridge that allows AI models to be plugged into a BI platform. The first phase that they're working on now is the specification for this bridge between AI and BI, and the key vendors of BI platforms are sitting together working out what this interface should look like. The second phase will be to build a reference implementation of that bridge that the vendors can each use to demonstrate the bridge operating with their platform. So for the first phase we're looking for people with architectural skills or knowledge of a particular BI platform. In the second phase, we will be looking to add developers into this environment, because then we can actually build the code for the bridge itself. So, as I say, a very interesting project around how we bring AI into a BI platform and also make use of the data that's associated with business intelligence in order to create new insights through the use of AI models. Okay, so there's a really good question here: what is BI an acronym for? BI stands for business intelligence. It's a marketing term. These platforms are typically the product that is used to create reports for an enterprise.
So it will display charts and dashboards and things like that. That's what it means. And most of them have now been extended to support analytics, so not just building a view over the data, but also analyzing it, doing simple analytics like predictive analytics based on that data. And now there's further interest in expanding that to support artificial intelligence as well. So, good question. Okay, any more questions around the BI and AI world? So again, we've had one project that's very focused on providing education resources to organizations around the world, and this one is about taking something that is very powerful and valuable to many organizations, the BI platform, and bringing new capability to it by bringing artificial intelligence into that environment. Okay, so I am going to transition to the third project, which has been going for about two years. Oh, we've got another question: are there any specific open source AI models or communities, from the Linux Foundation or elsewhere, that the BI and AI project is working with? At the moment, I don't believe there are. I think there has been a sort of call for participation, and most of the people involved in it are people who work directly for vendors. So if there are suggestions, or people listening today who would say, actually, it would be great to get my AI project or my AI modeling tool into this discussion, and I'd like to be a part of the conversation, then please again contact us and we would be delighted to hear from you, because the broader the perspective we have on these projects, the better the results. So again, another great question. And as I say, volunteers are always welcome at the ODPi. I'll just give people a minute or so to make sure I'm not missing any questions. I think I've got them all. Okay, very good. Right, so I'm going to talk about the third project, which is called ODPi Egeria.
This project is two years old. It is a combination of a code project and an education, or best practices, project. Before it was started, there were a couple of years of talking to different organizations about why they found becoming data driven so difficult. One of the topics that came out in many organizations was that every tool seems to support the ability to store information about the data they have, to build what's called a data catalog, and to add extra information to that: maybe it's the profile of the data, maybe it's the structure of the data, maybe it's the classification of the data. But each tool's repository, or catalog, or metadata repository is completely proprietary and locked down. So if you buy a tool suite from another vendor, you're starting again, gathering that information. Many of them are trying to build bridges between them to allow that exchange of metadata, and it has proved quite brittle and very expensive for many organizations. The result was that different professions, or different business units, within an organization were operating in isolated silos, and knowledge wasn't being shared. I talked earlier about how a data driven organization needs knowledge and data to flow laterally between the different silos, and the tools were actually reinforcing the silos because of the fact that they use these private metadata repositories. The other thing that was happening was that regulations were becoming much more widespread around an organization's use of data, and they look at it from an external perspective. So I'm going to use the European Union's General Data Protection Regulation, or GDPR, as an example. Now, this particular regulation talks about the fact that an individual has rights over their data.
So even though an organization may have collected data about that individual, and is paying for the storage and processing and keeping it up to date, the individual themselves still has some rights. And it also has provision for those rights, including the fact that the organization has to keep that data protected, can only use it for the purposes that that individual allows, and also must delete it if the individual asks. Now, these sound very data oriented, but actually when you start looking at how you implement that in an organization, you need to make sure that the infrastructure is secure, that the security processes around it are good, and that there's a good knowledge of where the data is located and how it's being used by the organization. And an organization can be very siloed, not just in the way its data is stored, but also in the way that it might have a team doing security, a team managing infrastructure, a team doing data governance, and another team doing privacy. All of these teams have their own tools, and they need to come together. So not only do we need to bring together the tools that describe and work with data, we also need to bring together the tools that different professionals and different governance teams use, so that they can create a coordinated response to these more modern data oriented regulations. There was a lot of discussion along the lines of: well, there are many metadata standards, why doesn't everybody use them? And these metadata standards are very good, but they are limited in scope and they don't cover the whole space that needs to be managed. And so this is where the idea for Egeria was born. We originally started working with the Apache Atlas open source team, because we thought that we would use Apache Atlas as the basis for an open ecosystem. But actually it very quickly became way too big to be a sort of add-on to Apache Atlas; it really needed to be a project in its own right.
And so the ODPi at that time was looking for a new role, and Egeria, the open metadata capability, was looking for a new home. That's really where Egeria was born. It's now two years old, and a very active and interesting project in terms of how it's starting to enable this metadata exchange ecosystem. Because although different vendors have tried to create open APIs on their technologies, other vendors are of course reluctant to integrate with them. So what we need is a neutral space, with APIs and protocols, where vendors can feel safe integrating: they will still be able to deliver the value that they've created in their products, but also release the knowledge that is coming from their users to allow other decisions to be made, and receive the metadata from other tools. So when a new tool comes into the organization, it comes in empty; we connect it to the open ecosystem, and that tool now knows about the data of the organization and can start delivering value. That's really the idea of Egeria, and I'll take you through some of the functions and content that has been created for it. If we said, what is the elevator pitch for what Egeria is doing, it's very simple: it's creating protocols and APIs to allow tools from different vendors, from different parts of the life cycle, and for different professions to exchange metadata. And we always draw it as this bar, because we can't have everybody integrating with a central repository. It has to be a peer-to-peer environment. It lets each tool operate with its own repository in the way it's always done, but then we need to augment that existing function with those white arrows that allow this peer-to-peer exchange. It's got to be fair, otherwise vendors won't join in. It's got to be functional. It has to ensure the integrity of this metadata.
So we can't allow a rogue tool to disrupt this ecosystem and create incorrect metadata in it, because as you start to look at the more advanced uses of metadata, it can be used for security, for identifying where sensitive data is, for sorting out data breaches, and for responding to regulators. So this metadata is very important to an organization, and effectively needs to be treated as valuable data. I've talked about it being peer-to-peer and multi-vendor, and it also needs to be very comprehensive. So this next picture has a sort of patchwork quilt look to it, but you can also see that there are lots of connections between lots of different types of data. What we did as a very early exercise was to look at the many hundreds, almost thousands, of metadata standards, and they're good; I mean, there's nothing wrong with these metadata standards, but they each cover a particular, very specialized topic. What we've done is we've knitted them all together and mapped them, and we also looked at many use cases from different regulations and different organizations, asking: what linkage do we need to establish between these different concepts that are described in the different metadata standards? The result is nearly 500 different types of metadata that need to be supported by an organization, and this number is growing. Two years ago I said it was about 400 types; now it's 500, as our use cases expand. These are what we call entities. So these are facts, nodes, collections of concepts (objects, you might call them), and relationships between them. So this particular data has this policy applied to it, or this meaning attached to it, or this classification attached to it. So we have this notion of things and relationships between things, and we also have a special thing called a classification, which is like a label that says this thing is like this group. So we can talk about data that is sensitive, for example.
So sensitive is a classification. We have these three basic ideas, and then the type system is built on them. It says: there's the idea of a data set; that's an entity; it has these types of properties associated with it; and as a data set it can be connected to the following things. So it can be connected to a glossary term, which is also an entity, and there's a special type of relationship between them called semantic assignment. That's the very basic idea, and it gives us a language that the tools can integrate with. So basically, if you're a database management system, you say: well, I know about databases; okay, there's a thing called a database, and I have tables and columns, and there's something in the metadata model that talks about tables and columns and how they relate together. That starts with different database vendors saying, oh okay, that's how the metadata in my database fits in. And as each vendor maps to the same model, we start to understand the correspondence between the different technologies and the data that they store. So that's really what the base model supports, and so we can start to understand how consistent and how common metadata actually is across the ecosystem. But now we need to connect the tools together. And today's world is even more complicated, and is getting increasingly complicated. Not only do organizations have many different data centers, but they're also using multiple cloud vendors. So their data is not just inside their company walls; it's also being hosted by third parties. Applications are not just sitting inside large computers and data centers, whether on the cloud or on premises; they're also out in the devices we use every day, in our mobile phones, storing data, and there are many different types of sensors and IoT (Internet of Things) devices and systems that are also storing data.
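Before moving on, the three building blocks just described, entities, relationships, and classifications, can be pictured with a minimal sketch. This is illustrative Python only, not Egeria's actual API (Egeria itself is implemented in Java and exposes REST interfaces); the class shapes here are invented, while type names like GlossaryTerm and SemanticAssignment follow the open metadata types mentioned in the talk.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """A 'thing' in the metadata model, e.g. a column or a glossary term."""
    type_name: str                 # e.g. "RelationalColumn", "GlossaryTerm"
    properties: dict
    classifications: list = field(default_factory=list)  # labels, e.g. "Confidential"

@dataclass
class Relationship:
    """A typed link between two entities, e.g. SemanticAssignment."""
    type_name: str
    end1: Entity
    end2: Entity

# A database column, labelled as sensitive via a classification...
column = Entity("RelationalColumn", {"name": "date_of_birth"})
column.classifications.append("Confidential")

# ...linked to the glossary term that gives it meaning.
term = Entity("GlossaryTerm", {"displayName": "Date of Birth"})
link = Relationship("SemanticAssignment", column, term)

print(link.type_name)
print(column.classifications)
```

The point of the sketch is simply that two tools which each know only half of this picture (one the column, one the glossary term) can still exchange and connect their metadata, because both halves map onto the same shared types.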
So we're looking at places where data is stored, and where metadata is being kept, that are becoming highly distributed and also highly variable in terms of the platforms they're running on and the number of vendors involved. What Egeria needs to be able to do is, if you imagine that the metadata being stored in all these systems is the orange cylinders, connect these together through the dotted lines. It has to follow the topology of the deployment for a particular organization, and bring that together in an effective way to give the impression that there is a virtual metadata repository, even though we don't centralize the metadata. The other thing that organizations are doing is they're also talking to... So I've got another question, which is excellent, so I'm going to respond to these questions because they're very good. The first question is: for the metadata repository, AWS Glue is widely used; does Egeria work with, or is it compatible with, that? So it is perfectly possible to build a connector. The way Egeria works is that third-party technologies plug into Egeria. Egeria has lots of plug points for different types of technologies, and the thing you plug in is called a connector. So there are connectors for metadata repositories, and you could build a connector for AWS Glue. That could be provided either by the Egeria project team, if they felt that was valuable, or by AWS, or by a third-party provider. The interfaces are very clear, and the connection, the plug-in nature of it, is handled by configuration. So it's possible to connect in lots of different types of technology through our different interfaces, and a little bit later on I will start talking about those different choices as well. So that was one question, and it's a good one.
The next one I've got here is: does the ODPi have tools for exchanging metadata between blockchain platforms, or are they perhaps working on such tools? So again, the answer is around the plug-in nature: there is no reason why you can't build connectors to any type of third-party technology. What we typically try to do is focus very much on minimizing the amount of code that's required to connect in any technology. So rather than having one very generic plug point that everybody codes to, we are creating an increasing number of specialized plug points: there may be one that's very focused towards ETL tools, another that's very focused towards database engines, and another that's very focused towards BI or reporting tools, so that the mapping required for a particular vendor or third-party implementation is as small as possible. I'm not sure, when we talk about exchanging between blockchain platforms, whether we're looking at an application-style integration that uses multiple blockchain platforms, or whether we're actually looking at a low-level integration between blockchain platforms. I would suggest that those are two different types of connectors, but either is possible, and whether we simplify that process through a new specialized connector is really a question for discussion rather than a problem with the interface. So the simple answer is yes; the complicated answer is exactly how we would do that integration to give the maximum value to everybody. I also have a question here which says: any suggestions for resources for further research? I'm not sure which project that is for, so if you could just expand that question, as to whether you're asking for suggestions for research projects that we would love you to do or something different, just clarify that and I'll come back to it in a little while.
Okay, so I'm going to go back into more discussion about how Egeria works. The picture that you've got on the screen here is showing all these dotted orange lines between the different repositories, and what I was about to talk about was that organizations, increasingly, when they're working with data, don't work with just themselves; they also work with business partners. The business partner might not want to connect up their tools and their metadata with yours, but what we can do is export metadata about a particular data asset in a standard format that can then be sent with the data, and loaded in when the data arrives. So imagine a pharmaceutical company receiving information from a particular hospital: the metadata there would have all the terms and conditions about how that pharmaceutical company is allowed to work with that data. That's also supported by Egeria, by providing that open format. What inspired us on that was actually the way that digital photographs work today: you can take a digital photograph on any type of camera, and lots of metadata about the camera settings, where you were, and what date and time it was is incorporated into the photo. Then you can use many different vendors' photo albums and library-type tools, and that metadata is readable because it's both open in format and included with the photo. So that's what we're trying to accomplish with business data in this particular function. Let me have a little look at the questions to see if I've got any more... Let me continue on; I'm going to talk a little bit about the infrastructure underneath and how that works. This high-level picture is actually from our website, which you'll see on GitHub, and it tries to describe what it looks like when Egeria is deployed. Imagine the green clouds are all the different places where metadata, tools, platforms, and information about an organization's IT, data, people,
and activity is located. The blue boxes are what we call the Egeria Open Metadata and Governance platform, or OMAG platform (the OMAG Server Platform is its proper name). Basically, you put one of those in each place, and it is able to host what we call servers, or OMAG servers; I think I've called them Egeria servers here. Each server hosts a connector to a tool, and depending on the type of connector you want, you use a different type of server. The server type has code that we provide that does all the blue arrows: it connects everything together, makes sure that that particular tool connector is notified when something interesting for that tool type is available, and also makes sure that the information coming from the tool is properly distributed to everyone else. So all the orange dashes in the previous section are handled under the covers by Egeria. I'm just checking the questions again; I think we need to give a bit more time for some more questions. So let's look in a little more detail at these orange circles here, the servers, or OMAG servers. We'll look at our website to find out a little bit more; these pictures are all from GitHub and our web pages, so if you want to learn more you can go and look at the information. So, the orange circles are, as I say, the OMAG servers (Open Metadata and Governance servers is their full name), and here you can see the hierarchy. Generically, they are called OMAG servers, and there are three main types. Cohort members are for doing peer-to-peer exchange of metadata. View servers provide REST APIs for user interfaces, including our own, and we keep that in a separate type of server so that it can be separated by a firewall from the metadata itself; it's just a useful architectural separation that helps people, when they're deploying Egeria, to make sure the security is in place. And then we have another type of server, which is a governance server, and this helps us provide
Governance servers provide additional value on top of the metadata. If you think of the cohort members as the messengers for passing metadata around, the governance servers are for adding value, making more use of metadata, whether that's distributing it to third-party tools, actively receiving metadata from third-party tools that have no metadata integration, or providing automated metadata discovery or stewardship services. That's what the governance servers do; always think of those as the value add on top. Then for each of these broader types we have specific subtypes that do specific jobs. So when we look at the cohort members, we provide something called a metadata access point, and this provides a whole series of specialist interfaces for different types of tools, which allows you to integrate directly. There's one called Data Engine OMAS that has specialist APIs for ETL engines; another called Data Manager, which has specialist interfaces for databases, file systems, file managers, content managers, and so on; and the list goes on, with 25 different types of interfaces for different types of tools. Then we have a metadata server, which can also be an access point, but which also provides storage. That's very important, because if you have one tool that, for example, supports the ability to define ontologies and glossaries to describe datasets, and another tool holding other metadata, and neither supports the other's metadata, then if you want to create relationships between them, we need a place to store those relationships. That's the job of the metadata server: the Egeria metadata server is filling gaps. So if you imagine crazy paving, where you've got lots of different pieces of stone with different shapes all fitted together, I feel that the Egeria metadata server is the concrete, and the slabs of all different shapes are all the different third-party technologies. So that's the access point and the metadata server. The repository proxy is a host for connectors to third-party metadata repositories, and you'll see that in a minute.
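The gap-filling role of the metadata server, holding relationships that neither originating tool can store, might look like this in miniature. This is a toy in-memory store with invented names; the real server persists typed entities and relationships in a repository.

```python
class MetadataServer:
    """Toy store for entities from different tools plus the relationships
    between them that neither originating tool can hold itself."""
    def __init__(self):
        self.entities = {}       # guid -> (origin_tool, description)
        self.relationships = []  # (guid1, relationship_type, guid2)

    def add_entity(self, guid, origin_tool, description):
        self.entities[guid] = (origin_tool, description)

    def link(self, guid1, relationship_type, guid2):
        # the relationship lives here, in the 'concrete' between the slabs
        self.relationships.append((guid1, relationship_type, guid2))

store = MetadataServer()
# a glossary tool knows the business term; a database catalog knows the column
store.add_entity("term-42", "glossary-tool", "Customer churn rate")
store.add_entity("col-7", "database-catalog", "CHURN_RATE column")
# neither tool can store this cross-tool link, so the metadata server does
store.link("term-42", "semantic-assignment", "col-7")
```

The cross-tool link is exactly the kind of metadata that would otherwise have nowhere to live.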
And then the conformance test server is a server we can connect into a cohort to validate that one of the members is behaving correctly. It will test the member through the API and through its events, and pass it valid and invalid bits of metadata. We use that in environments where a vendor wants to validate that their integration into Egeria is safe; that's what the conformance test server does, and we have a conformance program that sits with it, so vendors can demonstrate that they have a safe integration into the open metadata ecosystem. So that's the cohort members, and we've talked about the view server; then there are the different types of governance servers. The integration daemon is for exchanging metadata with third-party tools that don't integrate directly with Egeria. The discovery server hosts discovery engines for doing automatic metadata discovery, so they could run quality rules, do profiling of data sources, or actually run different types of analytics on data sets and create results. The discovery server might also be doing something like deduplication of metadata: you've got two tools that have both loaded in information about the same database, so now we've got two copies of the same asset, and the discovery server can identify where it thinks we have multiple copies of the same asset. Then the stewardship server is for running remediation, allowing triage of issues raised by the discovery server or by any other system in the ecosystem, so it's really where the humans can interact and make changes to the open ecosystem. The security officer server is for configuring complex security requirements around the use of particular data based on metadata values, and the open lineage server provides a historical view of lineage from many different technologies. So we have, as you can see, a very wide variety of servers in various stages of development: some are shipped, active and ready for production use, while others are still in development.
That's the interesting thing about this project: we do everything in the open through GitHub, so that people can be watching, and as new interfaces come out they can comment on them before they are fixed. We find that this is a very valuable way of enabling the community and our consumers to guide us, to make sure that we create the most valuable interfaces, events and other types of integrations possible. A little look at the questions... I think that's all fairly static, so let's keep going. There's a sort of standard way that these different types of servers connect together, and I've shown them as a series of rings. In the centre are the cohort members; we've talked about those, and they have two main communication mechanisms. The cohort is a dynamically organised and configured environment: basically, a new server comes in, puts a registration document on the central topic, and says hello, I'm here, I want to join. The other members look at that information, and as long as there are no problems with it, they reconfigure themselves to include the new member and send information about themselves to the new member. So where we had two members, now we've got three members configured to talk, and it means that when they issue a federated query, metadata from all of the members is included. If at a later stage a repository leaves the cohort, it sends out an unregistration request, and all the other members reconfigure to remove that particular server. It means that there's no central control beyond providing a shared topic in something like Apache Kafka to provide that mechanism. The topic can also be used as a background notification mechanism to say that various pieces of metadata in each of the servers have changed, so repositories are able to cache metadata from other repositories through the cohort mechanism. That's the heart of the environment, and as I say, it gives you access to all of the metadata in all the members from any point.
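The registration dance just described, a newcomer posting a registration document on the shared topic and the existing members reconfiguring themselves, can be simulated in a few lines. This is an in-memory stand-in for the Kafka topic, and the class and event names are mine, not Egeria's cohort protocol.

```python
class CohortTopic:
    """Shared topic (standing in for an Apache Kafka topic)."""
    def __init__(self):
        self.members = []

    def publish(self, event):
        for member in list(self.members):
            member.receive(event)

class CohortMember:
    """A repository that joins and leaves the cohort dynamically."""
    def __init__(self, name):
        self.name = name
        self.peers = set()
        self.topic = None

    def join(self, topic):
        self.topic = topic
        topic.members.append(self)
        topic.publish({"type": "register", "sender": self.name})

    def leave(self):
        self.topic.publish({"type": "unregister", "sender": self.name})
        self.topic.members.remove(self)

    def receive(self, event):
        if event["sender"] == self.name:
            return
        if event["type"] == "register" and event["sender"] not in self.peers:
            self.peers.add(event["sender"])
            # introduce ourselves so the newcomer learns about us too
            self.topic.publish({"type": "register", "sender": self.name})
        elif event["type"] == "unregister":
            self.peers.discard(event["sender"])

topic = CohortTopic()
a, b, c = CohortMember("a"), CohortMember("b"), CohortMember("c")
a.join(topic)
b.join(topic)
c.join(topic)   # everyone now knows everyone, with no central registry
b.leave()       # the remaining members reconfigure themselves
```

Notice there is no coordinator anywhere: membership knowledge emerges purely from the events on the shared topic.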
Then we've got the governance servers; I talked to you about the five different types: the integration daemon, the discovery server, the stewardship server, the open lineage server and the security officer server. They sit in the next layer out, providing that extra value add: in the middle we're just exchanging metadata, while with the integrated governance we're trying to make additional value from it. Then the final thing, the view servers and the UIs, are adding the sort of governance solution, the support for business users doing governance work through the interfaces. So that's how we build out an environment. If you were planning to have this in your own organization, you would probably look at the different tools and things you have deployed and think about which ones you want to be integrated; then, based on their type, you would look at the different interfaces for that particular type and maybe create connectors to create that integration; and then, depending on their type, they would plug into one or other of the server types, and as you configure them they would basically follow this pattern. There are a number of things we always say, like: make sure you have a metadata server in your cohort, so that any additional metadata has a place to go, because often third-party metadata servers only support the metadata that they can work with, which is quite reasonable. Let me look at the questions... that's looking okay, so let's move on. I mentioned earlier on that there are a huge number of different deployment platforms and scales that we need to support, and so the idea of this OMAG Server Platform, and the servers that sit on top of it, has enabled us to be very flexible in our deployment. We also have quite strict coding standards around bringing in dependent libraries, to keep our footprint at a point where we
can run it on a Raspberry Pi, but we can also do horizontally scaled deployments using Kubernetes to support very high workloads. That's because, right from the start, Egeria was architected to be this integration platform that has to go everywhere. We also divide our technology into three layers, and this is very similar to the circular picture. At our heart we are a developer platform, so we provide libraries that vendors and other third-party technologies can take and embed in their products to give them open metadata capabilities. At the base there is also a connector framework, which provides the plug-in architecture: all of our plug-ins are done through connectors that follow a consistent framework, and then, depending on the type of connector, there's a specialist interface that augments the core connector interface and allows the connector to behave in a certain way or support a particular type of technology. So that's the developer platform, and we could have stopped there, but the integration platform says, well, it would be really helpful to have a library of pre-built connectors, and also a user interface to allow people who are running Egeria to monitor the entire integrated ecosystem end to end and make sure it's operating correctly, and then also utilities to load different types of files in a particular format. For example, we might have a design model in the JSON-LD format that we want to load into the ecosystem so it can be distributed. That's really what the integration platform is doing: it's saying, here is something that you can use out of the box to start to integrate the metadata in your organization. Then on top of that we're building governance solutions, which are things that organizations on their governance journey, or looking to fill a gap in what they've currently deployed, could take from Egeria and use as their core metadata and governance function, and that's really where the governance solutions come in.
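The pattern of a core connector interface augmented by specialist interfaces per technology type can be sketched like so. The interface and class names here are illustrative inventions, not the actual Open Connector Framework API.

```python
from abc import ABC, abstractmethod

class Connector(ABC):
    """Core connector interface that every plug-in implements."""
    def __init__(self, endpoint: str):
        self.endpoint = endpoint
        self.active = False

    @abstractmethod
    def start(self) -> str:
        """Each specialist connector defines what 'start' means for it."""

    def disconnect(self):
        self.active = False

class DatabaseIntegrationConnector(Connector):
    """Specialist interface for one technology type, augmenting the core
    with behaviour that only makes sense for databases."""
    def start(self) -> str:
        self.active = True
        return f"polling {self.endpoint} for schema changes"

connector = DatabaseIntegrationConnector("jdbc://sales-db")
status = connector.start()
```

The framework layer only ever sees the `Connector` contract; the specialist subclass decides how a given technology is actually integrated.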
We basically started at the bottom, so let me show you that in a bit more detail on the next chart. Here you can see more of a traditional block diagram, but the bottom piece is that developer platform: you can see all the different libraries, and you can see the services that are plugged into the different types of servers that I've talked about before. In this corner here you see content: the open metadata types, that's the 500 types. The type system is completely dynamic, so you can define your own types, but by sticking with the ones defined by the Egeria project you increase the chances that your metadata will be shareable across a wider range of technology. We also use those types to build our solutions on top, so again, using the open types means that there will be further capability that will exploit that metadata. As I say, right at the bottom here you can see the connector framework, which is our plug-in mechanism. Then the discovery framework allows you to write discovery services that do this automatic metadata discovery; the governance action framework allows you to create plug-ins for different types of remediation and actions that you need to take around issues in the open metadata space, or even actually in the data space; and the audit log framework allows monitoring systems to be plugged in, so the operation of the Egeria ecosystem can be monitored or automated as necessary. We believe that automation is key: as humans we're really not very good at admin, and so the more the Egeria ecosystem can be automated, can be self-managing, self-configuring and self-healing, the more accurate and valuable the metadata will be to the organization. So that was the bottom layer; then we talked about the next level, which is the integration platform. There's a UI that is really designed for people running Egeria to understand that integrated platform, and the utilities allow us to bring in third-party content.
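The dynamic type system, open types seeded by the project plus your own extensions, might be modelled roughly like this. This is a toy registry with invented names and a tiny slice of the roughly 500 open types; the real type system also covers relationships and classifications.

```python
class TypeRegistry:
    """Toy registry: open metadata types plus locally defined extensions."""
    def __init__(self, open_types: dict):
        self.types = dict(open_types)       # type name -> required attributes

    def define_type(self, name: str, attributes: list):
        # dynamic: an organization can add its own types at any time
        self.types[name] = attributes

    def validate(self, type_name: str, entity: dict) -> bool:
        """Check an entity carries the attributes its type requires."""
        return all(attr in entity for attr in self.types[type_name])

# seed with an open type, then add a local extension; sticking with the
# open types maximizes how widely the metadata can be shared
registry = TypeRegistry({"Asset": ["qualifiedName", "displayName"]})
registry.define_type("WindFarmAsset", ["qualifiedName", "turbineCount"])

ok = registry.validate("Asset", {"qualifiedName": "db1", "displayName": "Sales DB"})
```

The trade-off in the talk is visible even in this toy: `WindFarmAsset` metadata is only understood by parties that know the local type, while `Asset` metadata is understood everywhere.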
That includes files and things coming from standards bodies, and then the connectors are pre-built connectors to different third-party technologies. If you're looking at this thinking it's a great project and you'd love to contribute, this is an area where you could provide tremendous value to the team, in terms of connecting your favourite technology into this ecosystem and providing us with those connectors for other people to use. Okay, so that's the integration platform, and then the governance solutions. If you're a governance person you'll start to recognize these names; if you're not, then don't worry too much about them. Asset ownership and management is around how you define who owns a particular data set, who is responsible for classifying where the sensitive data is located, who gives access to that asset: all of the types of responsibilities that sit around owning something of value. Duplicate asset management is the support to work out when you've got duplicate copies, from different tools, of the same thing. Secure database sandboxes uses metadata to create secure access to data in, say, a corporate data lake, and to create safe copies of that data for data scientists to work with; so that might have certain pieces of sensitive data masked out, and so on, all driven by metadata. The historical lineage is a sort of UI to allow people to look at where things were, what was the lineage of a particular piece of data. Lineage is the set of processes that provided that data, and you can start to ask questions about time: it was working two weeks ago and it's not working now, so what was the lineage two weeks ago, and what has changed in those two weeks? Then the next one is all about capturing knowledge of a particular subject area. A subject area could be customer, employee, or energy generation; it's a topic that people know about, and there's terminology in that topic, there are rules in that topic, and there are common data structures and standards in that topic.
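The "what was the lineage two weeks ago" question amounts to filtering time-stamped lineage edges by an as-of date and walking upstream. A minimal sketch, with a data model invented purely for illustration:

```python
from datetime import date

# each edge: (upstream, downstream, valid_from) -- when the flow was introduced
edges = [
    ("raw-orders", "orders-clean", date(2020, 1, 10)),
    ("orders-clean", "sales-report", date(2020, 1, 12)),
    ("customer-feed", "sales-report", date(2020, 3, 1)),  # added later
]

def lineage_as_of(edges, asset, as_of):
    """Upstream assets feeding `asset`, as the flows stood on `as_of`."""
    upstream = set()
    frontier = {asset}
    while frontier:
        node = frontier.pop()
        for src, dst, valid_from in edges:
            # only follow edges that already existed on the as-of date
            if dst == node and valid_from <= as_of and src not in upstream:
                upstream.add(src)
                frontier.add(src)
    return upstream

then_ = lineage_as_of(edges, "sales-report", date(2020, 2, 1))
now = lineage_as_of(edges, "sales-report", date(2020, 3, 15))
```

Comparing the two answers shows exactly what changed between the two dates, which is the question the historical lineage UI is built to answer.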
Subject area management is about how you manage those definitions that describe everything that should be common around the data, APIs and systems that work in that area. And then governance program management is for governance teams to define their policies and rules and the different types of management options; it's where you can define security classifications and manage terms and conditions around particular data and APIs. So that's the structure, and one piece builds on another. Since this is an open source project, obviously not everything is finished, and we're actually pretty proud of that: it's done visibly, everything is done in the open, so you will see some modules that are clearly still being worked on and other modules that have a lot of maturity in them. You'll see here we have these frameworks, and the large part of the developer platform is actually production-ready and in a very strong state, and then work is ongoing; most of our focus here is actually building out the integration layer, and that's really what we're looking at. The view services are around UIs, and the UIs really sit up at this top level, so they're a little less mature than what you'll see in the repository services and access services. Over time you'll see us build out new function, and as I say, this is the area where we are delighted to get help. So I've got a question here about whether we're having other projects for the ODPi. At the moment a number of different people have approached us with projects, but we haven't got anybody who's in the process of joining right now, so we are very interested in new projects, particularly if there are one or two companies that would like to work together in an open source way. This is really what the ODPi does very well: providing that sort of open and safe environment for competitors, or people who work in slightly different industries and are not
used to working together, to come together and build something that is not possible in an individual organizational environment. So we have three stages for a project. There is the incubation stage, and this is where the project is coming together: they're forming the team, and the governance processes the team is going to use are being defined. Then, once the project looks ready, the people involved are happy to get going and they're happy to bring in new members, we go into the active stage, which means the project is running and it's active. And finally we have an emeritus stage, for when a project has run its course, nobody is interested in it anymore, and it's being shut down. So yes, we have a three-stage process for projects; good question, thank you. Okay, let's have a little look at some of the other things we've been doing. One of the things that we've discovered as we go through this project is that we're breaking all the rules. We're ignoring the boundaries that have been set by the way software product markets work and the way organizations are organized, boundaries that create the silos in organizations; and the silos between different professions create silos in the software product market. That meant that if you imagine two different groups, they buy two different tools and their knowledge is segregated; we put Egeria in the middle, and suddenly that metadata is being shared. And so it's possible that some of these assets being shared now are actually highly valuable or very sensitive, so we need to think about how we scope what people can see and how we secure the metadata. We are enabling it to be shared, but we also need to be responsible and provide very accurate and fine-grained control over who can see what. We need to think about how we create this environment so it can be self-configuring, self-managing and so on. So in many respects we are doing a lot of research-type work and solving
problems for the very first time, because we've never been in a situation where we've brought together metadata from so many different places. Even around the visualization there's been some very interesting work done, because of the internal model: I talked about these entities, relationships and classifications, and you can think of it like this: we represent metadata logically as a massive virtual distributed graph. So how can we take advantage of that? These screen captures come from a new user interface we have called the repository explorer, and it allows you to step through the entities and the relationships, irrespective of the products or servers that they're actually coming from, and build a view of a certain piece of knowledge. This has been a very interesting research project in terms of how a person should interact with a graph: it's not just a question of doing a query and throwing a graph on the screen; you want to allow people to choose how the graph is expanded and to select different types of nodes, and the result of that work is shown in our repository explorer. We have another tool called the type explorer, which is for looking at those 500 types, and being able to look at types with respect to an entity or types with respect to a relationship; again, that was a very interesting exploration of different ways of visualising graph data. So we do that type of work as well as the very middleware-focused data integration work that's going on in the cohort and the other integration environments. As with all of our projects, GitHub is where everything is located, so if you go to our GitHub repositories, we've got four or five repositories, but they all start with Egeria, and you can see how vibrant and how fast-paced this project is, and also get involved in the different pieces of work on GitHub. We have a Slack channel as well,
which will allow you to ask questions, and we have weekly calls too. So I've got a question here which is an interesting one for this conference. It says: does IBM provide behavioural data, e.g. banking, for data industrial? I don't really think I can answer that question; I think you need to ask IBM that question, as it's not really something we can answer in an open source environment, so sorry. Let's go back onto this. As I was saying, we don't have anywhere else; we just have GitHub and the Slack channel and the calls that you can join and be a part of. How am I doing on time... right, so I mentioned the fact that we have the technology and also that there is an educational aspect to this. One of our GitHub repositories, called Data Governance, has been used to develop out a series of best practices. We've got a set of personas that are all characters from a fictitious company, a pharmaceutical company that is going through a massive transformation: they're going from creating general-purpose drugs to personalised medicine, and that means, from a data point of view, rather than releasing a new product every six months to a year, they're having to have very tight integration across research, sales and manufacturing on an individual basis. The whole of their IT ecosystem and infrastructure needs to change, along with the way they operate: they need to work with business partners digitally, rather than in a much slower, person-to-person type of way. So we're using this as an example to show how integrating metadata and good governance practices can help a business do this type of business transformation, and all of this is documented in what we call the guidance on governance, the GOG. We then take those best practices and link them down to how you do it in Egeria. So this is basically about giving practitioners, somebody who's suddenly been told, oh, by the way, you're the Chief Data Officer, a place to go: it will help them think through the
questions and types of issues they need to tackle in their own organisation, and then hopefully give practical advice on how they would take that forward. So that's basically trying to help practitioners get to practical solutions, as well as providing support to vendors to allow them to plug into this ecosystem, so that their tools give the maximum value to their customers and have a much lower cost of ownership and a faster time to value, because they're able to plug into the organisation's knowledge base from day one. So that's Egeria, and I have a little picture here that just sort of shows what we're trying to do with it. I've spent most of the time on this one because it is the biggest project and there's a lot to it, but the other two projects are still important to the ODPi. On the data science side, we're hoping that Egeria will provide a module on governance and management to the Data Science piece, and as we enable our data science APIs, hopefully they will tie back into the education material. Similarly with the BI and AI project: you can see the opportunity for integration between those projects as BI and AI come up with their bridge definitions. So we see ourselves as part of a collective family working together, and that's what the ODPi gives us, but each project also has a very focused mission and a set of consumers that they're trying to serve. So I think we are at the end; I have a chart here for questions, and we've had some great questions during the session. Let me have a look to see if I've missed anything in the questions here... I can't see anything, so if anybody's got any more questions I am very happy to answer them; let's give people a minute or two. I didn't put up any links today, but if you search for ODPi Egeria you will certainly get to our websites, since that's a pretty unique combination; ODPi on its own will probably bring you to our
main homepage as well, and then you can navigate to the different projects of interest. So there's a question here about how we fund all of this. The ODPi is a foundation in its own right, and organisations that value our work can become members. There are different levels of membership that an organisation can choose to join at: there's a premier level for people who want a board place, which will allow you to have input into how we spend our money, and then there are other levels. Really, the value of membership is that you get to influence and support the work we're doing, and basically have your logo on our website to say publicly that these are valuable projects you're supporting. So our funding comes from our membership fees, but you don't need to be a member to take part in the projects or to make use of the technology; all of that is free and open. There's a question: what is the online reference for examples of use cases? That is in the GitHub repository under the ODPi, so if you go to GitHub and look under ODPi you'll see the use cases there. We are also improving our documentation around different solution examples in the main Egeria repository as well, so it's an ongoing effort in that we're not only improving the code but improving our documentation all the time. So I think that covers that one, and I can't see any more questions. Right, so if anybody would like to contact me, you can find me on LinkedIn; my name is Mandy Chessell, and that's a unique name, so you should be able to find me, and my name is at the front of the deck. Or you can contact me, or any of the other people involved in the ODPi, through the ODPi website or through GitHub and our Slack channels. Alright, one more check for questions... that looks like it, so I think we're finished for today. Thank you all so much for both listening and providing me with an excellent set of questions, and I wish you all well. Thank you very much.