Good morning, good afternoon, good evening, wherever you are. This is a presentation about the OSDU platform architecture. It was originally given by Steven on May 18, but it seems that due to some scheduling issues a lot of people couldn't join, so I am repeating it at his request. A quick antitrust reminder: we shall not discuss or exchange information related to companies' proprietary or confidential information, prices of products, purchasing plans, or any other company-specific aspects.

As I said, let's get into this. What we're going to do is establish a series of these talks to make sure that everybody in the community understands what's going on with the OSDU platform. As we move into R3 and subsequent releases, there are new components and services coming online, so this should be a good way to bring everybody up to speed. This series is sponsored by the EA and the PMC, and the plan is to hold a one-hour session every two weeks in the same slot we have right now. This initial one is a gentle overview of the platform architecture. We also have another one coming up two weeks from now on ingestion and the generic ingestion framework we've been working on, and we will try to schedule more presenters across the different development organizations to make sure we can build this as a strong community. The link on the screen, which I've also put into the chat window, is where we will keep track of the slides and materials presented here. We've been talking to Judy and Dennis about getting this recorded, so we will have it on YouTube as well; for those of you who missed a session, or want to share it with your peers or colleagues, you will be able to do that. Another thing to note: this is a presentation session where the development team tells you about what they are implementing. It is not a discussion session — we're not going to architect on the fly — but rather a way to understand the current status, plans, and designs.

That said, this one is the kickoff for the series, and what I wanted to do today is give you a functional architecture of the overall data platform. It's not going to get into the details of any individual service or implementation, but it gives you the broad spectrum. Keep in mind that this really covers just the functional portion; there are also topics like validation, certification, documentation, operations, security, and so on. Those are things we will schedule as subsequent sessions, and a lot of them are also in the R3 plans. We were really developer-ready with R2, and some of these features are not mature enough today to get into much more detail.

So let's introduce the OSDU data platform. Some of you may have seen these slides from the virtual face-to-face meetings as well. This shows the data flow and the different aspects we need to think about. The flow goes roughly from left to right, and security and operations are the cross-cutting concerns that serve the whole platform. Any time we bring in data from an external source — be it a file, a database, a national data center, a vendor data source, whatever the case may be — we need to make sure that governance is applied to the data.
This means tagging the data: making sure it is compliant for retention, compliant for access, and compliant for delivery. That is a precursor stage to what we would call ingest. Once the data lands in the cloud, how do we parse it, how do we understand the format of the data and extract the metadata that is necessary for discovery? That leads to the next phase, which is using that metadata to provide rich capabilities for discovering the data that is there.

One of the tenets we've adopted here is the schema-on-read principle. What that means is we want to keep track of the source data as it came in, and then enrich it within the platform to create either a canonical model for discovery and access or a consumption model that may be better suited for the workflows you see on top. For example, the perspective on a piece of data that you need in a field development workflow may be different from unconventionals or production or something like that. Those are enabled by the phase we're calling enrichment. So ingestion is about teasing apart the data and extracting the structure while preserving it in source form, and enrichment is the sequence of transformations necessary to bring the data into a consumption-ready format.

Across all of this we need to think about authentication — user authentication, and authentication of the calling app or service — and authorization, or data entitlements, to make sure that only the right person receives the data. This goes hand in hand with governance: the tags we put in at the point of ingestion need to be verified at the point of access. So if there are TCC, embargo, or other legal considerations for access for a particular use case or individual, those are enforced at the point of access.

Last but not least, all of that is the functional portion of the architecture. How do we deploy it? How do we keep the environment refreshed? How do we monitor it? How do we secure it from an operations perspective? How do we think about service level indicators, service level assurances, SLAs? Those are the aspects that come into play from an operations perspective, so the platform needs to provide the right level of capabilities to hook these operational tools and procedures into the platform. That's roughly the high-level view. What you see down below are the different domain-optimized stores that these services operate on; these typically come from the underlying cloud provider, or the on-prem provider, from an infrastructure perspective.

To put that in the context of a data flow: what you're seeing here is the first step we talked about, bringing in the data as-is and tagging it. That's the upload phase, if you will — making the data ready for the cloud. Once the data is in the cloud repository, the ingestion platform service takes over. It parses the data from the staging storage and moves the file, if it was a file-based ingestion, into persistent file storage; if it's array data, it may move it into specialized, optimized array storage. The metadata it extracted goes into a data store — think of it as a JSON document store, for lack of a better term. At this point, the data is readable.
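As a rough illustration of that landing shape — the identifiers, tag values, and attribute names here are all hypothetical placeholders, and the exact record contract is defined by the platform's schemas — a record in the JSON document store might look something like this:

```python
# Hypothetical sketch of a record as it might sit in the JSON document store
# after ingestion: extracted metadata plus the governance tags applied at upload.
record = {
    "kind": "opendes:osdu:well:1.0.0",  # namespace:source:entity-type:version
    "legal": {
        "legaltags": ["opendes-private-default"],  # checked at the point of access
        "otherRelevantDataCountries": ["US"],
    },
    "acl": {  # entitlements: which groups may view or manage this record
        "viewers": ["data.default.viewers@opendes.example.com"],
        "owners": ["data.default.owners@opendes.example.com"],
    },
    "data": {  # the metadata extracted for discovery
        "WellName": "Example-1",
        "Source": "vendor-file.las",
    },
}
```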
Then we make the data available in a discovery sense by publishing it into different types of indexes. What we have now is a search index based on Elastic, which allows you to take the metadata you've extracted into the data store and make it searchable and accessible to users (a small query sketch follows at the end of this section). That's the basics of it: at that point, you could directly consume the data in your target application or service. If you need to further enrich the data, improve its quality, curate it, et cetera, those are all things done in the enrichment portion of the platform. And on the consumption side, you may be thinking about additional caches for the data and so on — that's what the optimized stores and the delivery services are for. That's roughly how data is brought into the system and consumed, and any data produced by an OSDU-based application or service effectively follows the same loop back: it goes through the ingestion, indexing, enrichment, and delivery steps. So that's the 50,000-foot view.

Now let's look at a few core principles for the data platform that help you understand the design before we get into the functional details. On the data side of things, we put these into four categories. The first one is valuable data. Like I said, we want to minimize friction on ingestion and preserve all of the data. We want to follow the schema-on-read principle and transform on use rather than at the point of ingestion, so any data that's brought in is preserved in its source schema, and the data is immutable. The data also needs to be secured — we talked about this already: entitlements, compliance, and authentication of the user and application. The framework that hangs the platform together provides the right level of discovery and consumability of the data. That means data needs to be globally identifiable, it needs to have enough attributes to make it discoverable, and the framework needs to provide the support that makes it consumable across different personas, workflows, and domains. From a data management and data context perspective, we want to establish and continuously improve the data, which means providing data quality capabilities on consumption; there's already some work being done on quality tagging and so on. And in combination with the immutability principle, any enriched data becomes new data that is linked back to the source data. So effectively you have the lineage: you have the enriched data and the ability to go from the enriched data back to the source data if you're trying to understand the context in which the data was processed by the platform.

On the software and system design side, the principles favor agility. We're not building a monolith here; these are microservice structures, which means continuous integration and continuous delivery are really key. All of this you can find on the community site; we've listed the projects and who's working on each one, so we have autonomous teams that are aligned on the core principles of the platform.
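To make the discovery step concrete, here is a hedged sketch of querying the Elastic-backed search service. The host, token, and partition ID are placeholders; the endpoint shape follows the R2-era search API, but treat the details as illustrative rather than a contract.

```python
import requests

SEARCH_URL = "https://osdu.example.com/api/search/v2/query"  # placeholder host

payload = {
    "kind": "opendes:osdu:well:1.0.0",     # restrict the search to one kind
    "query": 'data.WellName:"Example-1"',  # query over the indexed metadata
    "limit": 10,
}
headers = {
    "Authorization": "Bearer <access-token>",  # OpenID token, see the security notes
    "data-partition-id": "opendes",            # placeholder partition
}

results = requests.post(SEARCH_URL, json=payload, headers=headers).json()
for hit in results.get("results", []):
    print(hit["id"], hit["data"].get("WellName"))
```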
And of course, these autonomous teams are working with the standards teams for architecture, data definitions, and information security to make sure the platform stays aligned — security is absolutely key. We're tapping into the expertise of the community and the cloud service providers on things like data encryption in flight and at rest, the deployment of the services, static code scans, protecting the API endpoints, authentication, authorization, and compliance. All of those are concerns we look at from a security perspective.

Given that this is an operational service, we also need to think about it from a DevOps perspective. As operators deploy this, how do we provide hooks in the platform for logging, monitoring, and operational support, and fault-tolerant release patterns so you can bring new services online with minimal interruption? And to improve the efficiency of the delivery teams, we've adopted lightweight architecture decision records, or LADRs. These are principles that we agree upon with the PMC, the contributors, and the committers within the teams — for example, which frameworks and which design principles we want to keep in mind — so that, from a DevOps perspective, the environment you're deploying into stays manageable and maintainable.

On the poly-cloud side: obviously, different operators consuming this will have different choices of public cloud, or even in-country types of deployments. We want to support this in a poly-cloud manner, but at the same time we don't want to dilute it to the point that we always go for the lowest common denominator and cloud providers are unable to differentiate. This is going to be one of those places where we strike the balance between a lowest-common-denominator approach and a common-code-platform approach in an elegant manner. And of course, where we can, we leverage other open source projects wherever possible.

Given those data and system principles, let's go back to the same picture we saw before and look at it from an API perspective for each of these services: where we were with release two (R2) and the top things we're working on. Jumping into governance first: what we released in R2 is a legal service that helps with compliance. It helps with the tagging of data — things like country of origin, other countries where the data may have originated, those types of tags you can put on top of it. But it's still limited to a few attributes, and the policy is actually evaluated by the legal service itself. The improvements we want to make in release three are to increase the flexibility of the tagging, so you can bring in additional attributes that may be relevant to the country you're operating in, or to corporate policies. And if we increase the flexibility of the tagging, then we should do the same for the enforcement, which means moving from policy coded into the service to more declarative policies and dynamic policy evaluation by the services. Where we are: we're still finalizing the requirements, and there are a couple of policy engines we've been looking at.
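To illustrate the shift from hard-coded checks to declarative policies, here is a toy evaluator — this is not any of the policy engines under evaluation, and the attribute names are invented for illustration.

```python
# Toy illustration: the policy is data, and a generic evaluator interprets it,
# so new rules can be added without changing the legal service's code.
POLICY = {
    "deny_if": [
        {"attribute": "exportControl", "equals": "embargoed"},
        {"attribute": "countryOfOrigin", "not_in": ["US", "NO", "NL"]},
    ]
}

def is_access_allowed(tags: dict, policy: dict = POLICY) -> bool:
    """Evaluate the declarative deny rules against a record's legal tags."""
    for rule in policy["deny_if"]:
        value = tags.get(rule["attribute"])
        if "equals" in rule and value == rule["equals"]:
            return False
        if "not_in" in rule and value not in rule["not_in"]:
            return False
    return True

print(is_access_allowed({"exportControl": "open", "countryOfOrigin": "US"}))       # True
print(is_access_allowed({"exportControl": "embargoed", "countryOfOrigin": "US"}))  # False
```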
The next step on the governance side is to get an endorsement from the team to kick off what will become an incubator project.

Then let's move to ingestion. What we accomplished in R2 is that, effectively, the entire preparation phase is done outside the platform, and at the point when you ingest, we assume the preparation has already transformed the data into the OSDU schema, if you will. The data loading itself was done through a set of scripts and, as I mentioned, is limited to the OSDU R2 structures. We also did a few manual enrichments, if you will, to make sure the data brought in could meet the needs of some of the search workflows that were in place. What we're doing with R3 is moving this into a more composable ingestion framework — something that can support a directed acyclic graph (DAG), so you can plug in parsers and pre- and post-processing steps in your ingestion pipeline (a minimal sketch follows at the end of this section). This opens things up to the community — operators, SIs, ISVs, whoever — to build additional parsers or additional ingestion data flows. For the ingestion framework itself we have a prototype based on Apache Airflow, and the schema service was contributed by Schlumberger, so for both of them we have a starting point from R2. This is what we're going to bring forward into R3 and take to the next step. We are also collaborating — this started last week — with the data definitions team and the teams working on data loading and the software engineering itself. That's what's going on in ingestion.

Moving on to discovery. Like I mentioned, we have Elastic-based indexing, and we've made a number of functional improvements over R1 — for example, new schemas, and streamlining how the data gets automatically indexed. There were a few things impacting usability, for example mapping the data types between the OSDU format and what Elastic supports. Very specifically, we have JSON documents that contain arrays, arrays of objects, or nested objects — how do we handle those from an Elastic perspective, and what flattening techniques may be used? One of the things we can do in R3, now that we have a data flow framework that supports ingestion and enrichment, is to optimize the mapping for the index configuration as part of that framework. This is something we're collaborating on with data definitions, the software development team, and Elastic. In the long term there are other things we can do: today we do basic property search, but could we do semantic search or NLP or other additional capabilities? Those could be improvements after the basics are done as part of the R3 short term.

Then let's move a little bit into enrichment. Like I mentioned, in R2 this was effectively done outside the platform as part of the scripts. When you brought in data from, let's say, a LAS file or a CSV file or any other source for that matter, you were effectively generating a manifest compatible with the OSDU R2 format, and then the transformation and enrichment of the data happened through scripts.
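Since the ingestion prototype is based on Apache Airflow, here is a minimal sketch of what a composable ingestion DAG could look like. The DAG id and task callables are hypothetical; the point is the pluggable pre-process / parse / post-process shape described above.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.x path

def validate_manifest(**context):
    ...  # pre-processing step: check the incoming manifest

def parse_las_file(**context):
    ...  # pluggable parser for one data type (LAS here)

def store_records(**context):
    ...  # post-processing: write extracted records to the storage service

with DAG(dag_id="las_ingestion",
         start_date=datetime(2020, 1, 1),
         schedule_interval=None) as dag:  # triggered per ingestion request
    pre = PythonOperator(task_id="validate_manifest",
                         python_callable=validate_manifest, provide_context=True)
    parse = PythonOperator(task_id="parse_las",
                           python_callable=parse_las_file, provide_context=True)
    post = PythonOperator(task_id="store_records",
                          python_callable=store_records, provide_context=True)
    pre >> parse >> post  # the DAG: new steps or parsers plug in between
```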
With release three, the same sort of data flow framework we're using for ingestion should help us create a DAG — a directed acyclic graph — for the enrichment phase as well. This gets triggered by notifications from the storage service: any time new data is brought into storage, that triggers the enrichment, and your workflow can transform the data into a form that is more conducive to discovery and consumption.

We're also introducing things that are very relevant to this data flow, such as frames of reference and the ability to transform data across frames of reference — think units, think coordinate reference systems. Say you're looking for data within a particular spatial region, or wells deeper than 10,000 feet, but your data may be coming in with feet, US survey feet, meters, and so on. What these frame-of-reference services let you do is, as part of enrichment, bring the data into a homogeneous frame of reference for discovery, or, from a consumption standpoint, translate the data from one frame of reference to another based on what your application or consumption workflow needs (a small unit-conversion sketch appears at the end of this section).

Moving on to security. With R2 we now have OpenID-based authentication; this is how we verify user or service account authentication into the system. The entitlement of the data — the authorization itself — is done through basic access control, with specifically defined data groups and user groups, and access is basically the intersection of which user groups have access to which data groups. The services themselves support encryption: TLS all the way from the client application or service to the OSDU data platform, so data is encrypted in flight, and based on the choice of storage technologies, the data is also encrypted at rest. For static analysis, from a SAST perspective, we are using tools like SpotBugs, for example, which run as part of our build pipeline to check the code not only for defects but also from a static security analysis standpoint. And of course the infrastructure itself is provided by the CSPs, so together with the CSPs we've looked at hardening the infrastructure to be at least developer-ready.

Where we want to go with release three is to move from basic ACLs to declarative policies, so that we can enforce security policies and compliance policies the same way. We want to bring additional security tests into the CI/CD pipeline, so that as the code evolves we catch more of these issues early. We are working with the infosec team to get those requirements and see what additional tests we may need to put in, and to draw the line between what the platform should do from a code standpoint, versus the responsibility of the managed service provider or operator running the system, versus the infrastructure. That's what we mean by a shared responsibility model, and it will translate into crisper scope definitions and specific requirements for the security components of the system. Again, you can go to the community site at The Open Group, go into platform, under security, and see the current list of issues that have been prioritized.
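Picking up the frame-of-reference example from above, here is a small sketch of the unit side of the idea, assuming meters as the canonical unit. The conversion factors are the standard ones, but the service API itself is not shown and the function is illustrative.

```python
# Normalize depth measurements so a discovery query such as "depth > 10,000 ft"
# compares like with like, whatever unit the source data arrived in.
TO_METERS = {
    "m": 1.0,
    "ft": 0.3048,             # international foot
    "ftUS": 1200.0 / 3937.0,  # US survey foot, slightly longer than ft
}

def normalize_depth(value: float, unit: str) -> float:
    """Convert a depth into the (assumed) canonical unit, meters."""
    return value * TO_METERS[unit]

print(normalize_depth(10000, "ft"))    # 3048.0
print(normalize_depth(10000, "ftUS"))  # 3048.0060960121926
```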
Last but not least, operations. With R3 we certainly want to move from the developer-ready release of R2 to something that is deployment-ready from a release three perspective. What that means is we need to think about CI and CD as two distinct things: the platform does the CI and provides the basic scripts needed for CD, but the deployment itself happens in different environments at different operators. They should be able to hook the deployment scripts coming from the platform into their own CD pipelines, and they should be able to hook monitoring tools, telemetry tools, and operations support tools into the logs being emitted by the platform. We're also looking at out-of-the-box tools we can provide to help with telemetry — things like Prometheus, for example. We're also looking at GitOps, a technique where, very similar to how we manage code from a DevOps perspective, the infrastructure and the operations themselves are treated as code: they are versioned, they are put into Git, and an update in Git can then trigger a CD run on the operations side of things. Here again, there's a work stream going on in R3 to look at these types of requirements — disaster recovery, logging, performance, et cetera — and we're waiting on additional input from that work stream to crisp up the requirements for R3. That's roughly the high-level picture in terms of the services.

Now let's look at it from the data perspective. What we did in R2 for data really covers the well and seismic effort, as per the OSDU data definitions team. The storage itself is largely what I said before: a JSON store, plus an optimized file or blob store that captures the rest of the data. The one place where we have more optimized access is seismic trace data, where we're using the OpenVDS library to provide optimized access to the seismic data. What we're doing with release three is expanding on that and formalizing it by setting up what is called a domain data management service (DDMS): something that can provide type-safe access for the different data types, and something that can provide optimized access. For example, imagine you're writing an application and you want to show a cross-section window with five different wells, three different curve types, and different channels for a particular depth interval. You should be able to express that as a query and go after the individual log curves, rather than having to parse LAS files or other original files that may be capturing the data (there's a sketch of the idea at the end of this section). Those are the two things the DDMS tries to do. There are a few more domains we're looking at in R3 — one is reservoir, the other is well delivery — so depending on where the definitions team ends up and the contributors coming forward for those two projects, those may get added on top of R3 as well. We have worked with the data definitions team to make sure that what they are defining becomes the source for the code side of things, so their definitions can become the JSON specifications for these DDMSs, and they also need to be linked into our test cases to make sure that when we make a release of the platform, it's compliant with the standards set by data definitions.

That was a very quick overview and contrast of where we were with each of the individual services in R2 versus R3.
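As a sketch of what that cross-section query could look like — the client class, method, and curve mnemonics below are hypothetical, since the DDMS contracts are still being defined:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LogCurveRequest:
    well_ids: List[str]     # e.g. the five wells in the cross-section window
    curve_types: List[str]  # e.g. gamma ray, density, neutron porosity
    depth_from_m: float
    depth_to_m: float

class WellboreDDMSClient:  # hypothetical typed client
    def get_log_curves(self, req: LogCurveRequest):
        """Would return just the requested curve chunks for the depth interval,
        instead of whole LAS files for the caller to parse."""
        raise NotImplementedError  # sketch only

req = LogCurveRequest(
    well_ids=["well-1", "well-2", "well-3", "well-4", "well-5"],
    curve_types=["GR", "RHOB", "NPHI"],
    depth_from_m=1500.0,
    depth_to_m=1800.0,
)
```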
Now let's look at the API layer and see how you can consume this and understand the APIs a bit more. These slides were previously presented, so I'm going to go really fast on this; if there are any questions, we can take them at the end and come back to the details. The API itself is divided into two big categories: the platform services and the data services. Both have a footprint in R2, and of course both are improving in R3 based on the functional improvements I covered in the previous slides.

The data services relate to the data flow side of things — the ingestion of data, the enrichment of data, the consumption or delivery of data — plus the services that go hand in hand with handling data from a heterogeneous set of sources, such as frames of reference like units. The data services depend on the core services of the platform. The platform services are responsible for keeping track of the schema, the objects, and the metadata of the objects; these help you with identity, versioning, lineage, context, indexing, and search. On indexing and search, a few people have asked: isn't that related to consumption, a type of data service? But the ability to discover something that you've stored is so primal, so fundamental to the platform, that we would rather view it as a platform service. And last but not least, there are the CI/CD scripts and the support for operations, as we discussed. APIs that are domain-agnostic and that enable both data flow and domain-specific capabilities are, effectively, the platform services.

Jumping into a bit more detail on the platform services: what was delivered in R2 is the storage service. You may have seen that, with the OpenDES contribution, there is a slight adjustment to how the schema itself is captured, and the data records also have a slightly different format compared to R1. On the security side, this is ACL-based. Just to recap what's coming in R3: global deployment, updates to indexing and search, policy-based entitlements, and a new schema registration API. So if you're bringing in data from other sources and you want to keep the data in source fidelity, you now have a convenience API to register and manage schemas (sketched at the end of this section).

On the data services side: pretty much all of this was done through scripts outside the platform in R2, so these are effectively going to be brand-new APIs in the R3 sense. The ingestion framework was a prototype in the R2 sense, and OpenVDS is the only piece really delivered as part of R2. With R3, you're seeing enhancements on the framework side, to both the ingestion and the enrichment frameworks. We talked about domain data management services, which provide a pluggable way of doing things — I'll get into that in a little more detail. We will also have a DDMS registry, so it's easy to plug new domains and domain-specific services into the framework. Likewise, the ingestion and enrichment frameworks will support DAGs that allow us to plug parsers and new data types into the framework.

Very quickly on DDMS: today what we have is a generic set of APIs. If you go to storage or search, you get a JSON document out, and the structure of the JSON document is hopefully compliant with a schema.
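On that new schema registration API, here is a hedged sketch of what registering a source-fidelity schema might look like. The path, payload shape, and identifiers are placeholders rather than the final contract.

```python
import requests

SCHEMA_URL = "https://osdu.example.com/api/schema-service/v1/schema"  # placeholder

schema_request = {
    "schemaInfo": {
        "schemaIdentity": {
            "authority": "mycompany",  # your namespace, preserving source fidelity
            "source": "vendorX",
            "entityType": "well",
            "schemaVersionMajor": 1,
            "schemaVersionMinor": 0,
        },
        "status": "DEVELOPMENT",
    },
    "schema": {  # the JSON Schema describing records of this kind
        "type": "object",
        "properties": {"WellName": {"type": "string"}},
    },
}

resp = requests.post(SCHEMA_URL, json=schema_request,
                     headers={"Authorization": "Bearer <access-token>",
                              "data-partition-id": "opendes"})
print(resp.status_code)
```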
Coming back to those generic APIs: if you try to generate typed client code against them, then because those structures aren't baked into the definition of the APIs, you're not going to get type-safe access. If you generated a Java client library, for example, you're not going to have an entity called a well; you're going to have an entity called an entity whose kind is a well — effectively a name-value-pair bag of values. And that's good: it's flexible, and it's the foundation we need. But it's not possible to bring in the additional semantics and the type safety necessary to build robust applications that can run natively on top of it. That's what the DDMS does: it brings the semantics, it brings a type-safe API, and it also provides optimized array access. So from a consumption standpoint, you don't have to think of this as an export or delivery of the source data, but as something that can actually give you the relevant chunks of the array data that has been enriched and digested from the different sources from which it was ingested (a small illustration follows at the end of this section).

So which APIs should you use? Of course, everybody should know how the core platform itself works — everything related to the platform services: how authentication works, how search works, how storage works, how delivery works, how compliance works. Those are things everybody should know. If you are an author of a new type, or you're going to connect the OSDU platform to a new source, then you may need to know a few more things: how the ingestion framework works and how you can plug your parser into it, how you can build a DAG using pre- and post-processing steps, and how the security of the system works from an entitlements and compliance perspective, so you can tag and enforce the data properly. And like I mentioned, if the source data you're bringing in comes with different frames of reference, you'll want to know how to use the helper functions to process the data during enrichment for homogeneous discovery, or to make those available in your consumption workflows within the applications, transforming the data to how the user wants to consume it in their work.

Then let's look at a few more use cases. If you're building an application or providing enrichment services — data quality classification, or other types of data curation that take data in, transform it, and put higher-value data out — you may want to look at the enrichment framework and how you can plug work steps into it: reacting to notifications of specific input types, then transforming, extracting, matching, mapping, merging, classifying, and curating, if you will, to bring the data into an enriched form. And coming back to the data principles: of course, you write this back as a new instance of the data, with the appropriate lineage pointing back to the source data you improved upon, so the consuming user or application has the relevant context of the lineage and what transforms have happened on that piece of data.

Then there are the domain object management service APIs to register your DDMS, if you want to bring in a new domain. For example, Energistics and Emerson are looking at bringing in the reservoir domain, and Schlumberger and EPAM are looking at the well delivery domain.
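To illustrate that generic-versus-typed point in Python rather than Java — the Well class below is a hypothetical stand-in for what a DDMS client library could expose:

```python
from dataclasses import dataclass

# What the generic storage/search APIs hand back today: a name-value bag.
generic_record = {
    "kind": "opendes:osdu:well:1.0.0",
    "data": {"WellName": "Example-1", "SpudDate": "2019-04-01"},
}
print(generic_record["data"].get("WelName"))  # typo: silently returns None

# A typed view that a DDMS client could expose instead.
@dataclass
class Well:
    well_name: str
    spud_date: str

well = Well(well_name=generic_record["data"]["WellName"],
            spud_date=generic_record["data"]["SpudDate"])
print(well.well_name)  # a typo here fails loudly instead of returning None
```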
So if domains like those — reservoir, well delivery — have optimized array accessors and type-safe accessors, and you want to package them into a DDMS, you will look at the DDMS APIs to register and link them in. And of course you need to make sure you're using all the platform APIs, shown here as core APIs, so that you register your schemas, register the identity of your objects, make enough metadata available so it's indexed and searchable, and tag the data from an entitlements and compliance perspective. Any new domain we bring in still follows the same cross-cutting principles of the overall OSDU platform.

Last but not least, if you are an SI, an administrator of the system, or responsible for operations, then these are some of the APIs you'll be looking after: entitlements and compliance — and again, with R3 you will also have the policy-based definitions of compliance and entitlements, so you can define and modify the policies as needed for your deployment environment. And if you have specific tools — maybe your company subscribes to Splunk, for example — then how do you route the logs into those kinds of enterprise tools for your operations monitoring, your telemetry and insight support, and how you want to notify users? Those are all things you may want to look at. The platform will provide you capabilities around logging information, telemetry information, and notifications on new and changed data items that you can react to and hook up to the other tools you may have set up in your enterprise (a small sketch of such a hook appears at the end of this section).

Hopefully that gives you a flavor of the different services in the platform, where we are with R2, some of the work going on with R3, and what the API landscape is — again, the two broad categories, platform services and data services, and in particular where these DDMSs fit in the R3 time frame. And hopefully the last few slides gave you an overview of, based on your persona and what you're trying to do with OSDU, which APIs are targeted towards you and how you can benefit from them.

So if you want to learn more, what do you do? A couple of options. There is an application developer bootcamp that EPAM has organized, and additional sessions could be created. We are also putting together a platform developer bootcamp — this would be to join the PMC group in one of the projects as an active contributor. And of course, this is also a pitch from my side, from a PMC standpoint, to all of you: look into your organization for who could be a good resource to become a contributor. The best way to learn the ins and outs of the system is to join the development of the platform itself, become an active contributor, and eventually even work up to becoming a committer within the platform.

So hopefully that gives you an overview of the OSDU data platform: the principles, the overall functional architecture, the APIs, and so on. Like I said, let me wrap up and close by saying I'd appreciate it if you can bring additional contributors to the system. At this point we have more requirements than the current contributor pool can handle, so any helping hand you can provide is really welcome. And again, this is the first in this series on the data platform architecture.
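As a sketch of such an operational hook — the event payload layout is assumed, and the forwarding target is whatever enterprise tooling you run:

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("osdu.notifications")

def on_record_changed(event_body: str) -> None:
    """Called by your subscription endpoint when the platform publishes a
    new/changed-data notification; forwards a line to your log pipeline."""
    event = json.loads(event_body)
    for item in event.get("records", []):  # assumed payload layout
        log.info("record %s changed (kind=%s)", item.get("id"), item.get("kind"))
        # ...route to Splunk, Prometheus exporters, or other telemetry here...

on_record_changed(
    '{"records": [{"id": "opendes:well:123", "kind": "opendes:osdu:well:1.0.0"}]}'
)
```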
The next session, two weeks from now, will be presented by Google, and it's going to cover the new ingestion framework coming in the R3 timeframe. So that should leave us maybe 10 or 15 minutes for questions, Dennis.

Thank you for the presentation. I was wondering whether the well-known entity will be included in the R3 timeframe, so you can merge data from different sources with different schemas.

I think the enrichment framework will certainly support that, Philip. The first step we want to get to is a well-known schema, meaning: if I loaded a well from, you know, Petrel or OpenWorks or anything else, I can map it into the standard OSDU structure so I can look at all my wells through a single lens. And then, perhaps after we've reached that milestone, the next phase could be: okay, now that you have this, how do you then rationalize it — de-duplicate instances, merge parts of instances, and so on.

Hey Philip, can you hear me? It's Jay. Yeah, so for the Energistics projects, which will obviously be the first well-known entities defined publicly in the system, we have two alternatives. We can map, in our ingester or our ingestion process, between the incoming file format and the OSDU schema, so that what gets loaded into OSDU is already in the schema of the OSDU JSON documents — and we're talking about only that small number of attributes. Or we have the option, under OpenDES from Schlumberger, to simply load the WITSML exactly as it is; but then it wouldn't be in the OSDU namespace — it would be in an Energistics WITSML namespace, or an Energistics namespace — and we'd do an enrichment process later to convert that into the OSDU format. That has a big advantage and a big disadvantage. The big advantage: I now have that document available as JSON in a document store. You can choose for the indexing service to notice that it's there, or choose not to have the indexing service notice it, and that's a really good thing. Plus, the document comes with all of its attributes, not just the tiny set of attributes defined in the OSDU data model. So that seems like a good thing — until you realize that all of a sudden you're going to have this chaos of hundreds of different styles of JSON documents, and that seems very chaotic. The Schlumberger-donated code supports that for lots of workflows. It may be that OSDU as a community will choose that they don't want that to happen — that they want to see data, right when it appears in the OSDU document store, already subsetted down to the set of things that are in the schema and translated into those terms. But then the full document would go into a well-related domain-specific data management store, so you still never lose the data; the question is how much of it makes its way in. To me, that's an OSDU Forum decision, how they would like that to work, but that decision is key to what Energistics is doing. It's also really key to what we're doing on the data prep and loading team, because there are three possibilities: you do the mapping before the data appears at OSDU, you do the mapping in the ingestion pipeline, or you do the mapping after the fact, after you've put foreign data into the document store. I'd like Raj to comment, but I think everything I said is correct, and OpenDES allows all of the above.
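To make Jay's two alternatives concrete — the kinds and attribute names below are hypothetical:

```python
# Option A: map during ingestion, so the stored record is already OSDU-shaped,
# carrying only the small attribute set defined in the OSDU data model.
osdu_record = {
    "kind": "opendes:osdu:well:1.0.0",  # OSDU namespace
    "data": {"WellName": "Example-1"},
}

# Option B: load source fidelity first, enrich into the OSDU namespace later.
witsml_record = {
    "kind": "opendes:energistics:witsml-well:2.0",  # source namespace, all attributes
    "data": {"nameWell": "Example-1", "operator": "Acme", "numAPI": "42-501-20130"},
}

# An enrichment step would read witsml_record, emit osdu_record as new data,
# and link lineage from the new record back to the source record.
```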
So the OSDU Forum, in order to make a standard that's going to guarantee the kind of interoperability and marketability, if you will, of the solution, could choose to say: we want that mapping to happen during the ingestion pipeline, which will be talked about in two weeks. So, Raj, would you make a remark?

Yeah, thanks — nothing wrong there. Let me make a few comments. Yes, in fact, with R2, effectively the first mode is what you said, meaning the processing of the data — the extraction, the transformation to an OSDU schema or a manifest file — is happening outside the system. So as the OSDU system evolves and we want to bring a new attribute into the mix, unless you know what your source is, and hopefully can still reach it at the point that happens, the system will not be able to auto-upgrade and evolve its model in that sense. That is the reason why — just to re-emphasize what is ingestion versus enrichment — the idea was to say: look, I want to bring the data in with source fidelity, so that once I've brought the data into the platform, I do not have a reliance on the source file, source database, source web service, or whatever, as the system continues to evolve and the data model becomes richer over time with attributes, relationships, et cetera. Effectively, what you're doing is re-running an enrichment to process that input data and generate the OSDU perspective of the data for consumption. What that also allows you to do is keep track of the lineage, because the enrichment isn't some random Python script that somebody's running outside the platform. You can actually say: this OSDU well was actually generated from this WITSML file, and this is the specific DAG that we ran to generate it. That sort of context builds trust, so a user knows whether to trust this information more than something else, and what processing has happened on top of the data.

With respect to the pool of data and how to make sense out of it, that's where the entitlements come in. In a lot of cases — now wearing my Schlumberger hat rather than my PMC hat — the way commercial customers do it is through entitlements. You would basically say that the source data is entitled to the curators of the data — think data managers, think IT-type people — and then you effectively provide access to only the well-known schema to the end user. That way you're providing good-quality, standardized data for consumption, but from a management standpoint you have a single platform, a single place where you can manage and curate all of this. And if you choose to do the reverse, like I said, you could have the indexing service index data that is not yet in OSDU format. The indexing service today does not discriminate: if you have schemas from other namespaces, it will index all of them. But the ability to retrieve anything, including from the index, is driven by the entitlements. So if I say Jay is a data manager, he can see the Energistics namespace data; but Philip is an end user, and therefore he can only see the OSDU one — from Philip's standpoint, that's the only thing that even exists in the index.

Maybe I'd like to ask Doug to take this issue back to the data definitions subcommittee for some discussion, to provide some guidelines. We've actually talked about this.
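A toy sketch of the visibility model Raj describes — the group names and kinds are invented for illustration:

```python
RECORD_KINDS = [
    "opendes:energistics:witsml-well:2.0",  # source-fidelity data
    "opendes:osdu:well:1.0.0",              # well-known OSDU schema
]

GROUPS = {
    "jay":    {"data.curators"},   # data manager: sees source namespaces too
    "philip": {"data.consumers"},  # end user: sees only the OSDU namespace
}

def visible_kinds(user: str) -> list:
    """Everything is indexed; entitlements decide what a caller can retrieve."""
    if "data.curators" in GROUPS.get(user, set()):
        return RECORD_KINDS
    return [k for k in RECORD_KINDS if ":osdu:" in k]

print(visible_kinds("jay"))     # both kinds
print(visible_kinds("philip"))  # only the OSDU kind
```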
Okay — so in data definitions. Remember there are many data definitions calls, but, at the risk of sounding like a broken record — and Doug, please jump in — the data definitions crowd is waiting for this deeper dive into the details of the ingestion pipeline. We don't really have the next level of detail yet: what services are there now, what services do we think we're going to develop? Energistics will probably develop a few; Mikkel and the CGI guys will probably develop a few; and as we identify things that are needed, we may spin up tiny PMC projects to add little services into the core. But Doug, is that a fair statement of where we're at? We're kind of waiting for everyone to get onto the same level of knowledge?

Yeah, I think it is, Jay. I don't know if we've said that explicitly, but we certainly are forming our understanding right now and aren't really able to answer those questions, Philip.

So, two weeks from now, we are very excited to hear about the ingestion pipeline. Then after that, I think the data definitions subcommittee can have a good discussion on this very issue.

I think a lot of these workflow things belong to Mikkel Founderhoven's group — they belong to the data prep and loading team. That's the way it's fallen out within the data definitions subgroups: Mikkel's group, which I'm on — so I can say "we", and Doug's on it, so we can all say "we", because the same people are on a lot of the same subcommittees — but the truth is, rather than having the people doing well delivery worry about this, it belongs either to core concepts or to data loading, and James has pretty much driven it that this particular thing sits with the data prep and loading team. So if people have energy around that, join our calls.

Yeah, and I think we should also encourage the data loading team to join the next session in the architecture series.

Oh yeah, we need to let everybody know about that. The data loading calls are at 9am Houston time on Wednesdays, so if people want to join those, please do.

Any other questions, guys?

On this theme: we saw that the problem would really be the data preparation — the state of the data preparation when we want to ingest it. I think that is the main problem we have for the time being. It would be very easy for us to ingest a RESQML file generated by Paradigm or similar software, which already respects the rules, and in the same way we could easily ingest something that is already in a PPDM environment, to which we can afterwards apply the rules. Because, for example, when we look at all the data, it is so inconsistent that it may be necessary to pass it through software before going to the ingestion part.

That's a good point. That's also the reason why we want to make sure this type of manifest generation, one, is not manual, and two, isn't done outside the platform. So if we need to redo it, we don't have to go hunting for wherever we ended up dropping those scripts and manifests when we initially loaded the data.

Raj, where can we find this PowerPoint after the meeting?

Yeah — in the chat window I've put a link. It's a community wiki page that contains the different topics coming in this architecture series, and I've already linked the slides there.

I don't see it in the chat. It's not in the chat.
My chat is empty.

Sorry to interrupt your flow, Raj — I can see it; I've gone to the link.

My chat is empty. That's interesting. Okay. So if you go under Platform, there is a new folder called Enterprise Architecture, and then under wikis you will see where the topics are.

Okay, okay.

And you can see the slides are already attached. We also have the slides for the next one, but hang on until the team has actually done the presentation two weeks from now. I'm also working with Judy and Dennis to get these recordings uploaded to YouTube, and we will put a link in there too, so if you want to hear the speaker and you missed the session, you have a way to replay it.

Yeah. Thank you so much.

No problem, guys — thanks for joining this morning.