Hi, I'm Karin Becker from the Federal University of Rio Grande do Sul, and this presentation was prepared together with Givanil do Nascimento, who is from Petrobras and is also a PhD candidate at our university. What I'm going to talk about is our experience in assessing the OSDU platform for use within our digital twin system. This work is done in the context of the PeTWIN project, a cooperation between two countries, Brazil and Norway, which joins academic partners (the Universidade Federal do Rio Grande do Sul, in my case, and the University of Oslo), the Libra Consortium in Brazil, and Shell and Equinor in Norway. As an overview of the presentation: I will briefly talk about digital twins, in case you are not familiar with the concept, and what they mean in our project; then I will focus on the requirements of a component of digital twins called the data management component, and on the strengths and the opportunities we are working on in our project. There is no consensus on what a digital twin means. In the definition I prefer (and nobody agrees on any single definition), it is basically about emulating physical systems as virtual systems in order to make decisions. That requires us to collect data in near real time, emulate the physical behavior with different kinds of models to generate new data, and make decisions; those decisions, in a closed feedback loop, go back into the physical system. Among all the definitions, we can see a digital twin as a descriptive, prescriptive, or predictive system, and the latter is the focus of our project. In terms of architecture, in the earlier days people just focused on having a physical system and a virtual system, with data flowing in and out of those systems through ad hoc communication.
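The closed feedback loop just described (sense, emulate, decide, actuate) can be sketched minimally in Python. All names here (sensor tags, the choke action, the averaging "model") are illustrative placeholders, not part of our system:

```python
from dataclasses import dataclass

@dataclass
class SensorReading:
    """A near-real-time measurement collected from the physical system."""
    tag: str      # sensor identifier, e.g. a pressure gauge tag
    value: float  # measured value

def predict(readings: list[SensorReading]) -> float:
    """Stand-in for the virtual system: a model that emulates the
    physical behavior and generates new (predicted) data."""
    return sum(r.value for r in readings) / len(readings)

def decide(prediction: float, setpoint: float) -> str:
    """Decision step: compare the predicted behavior to a target."""
    return "reduce_choke" if prediction > setpoint else "hold"

def actuate(action: str, state: dict) -> None:
    """Closing the loop: the decision goes back into the physical
    system (here, a toy state dictionary)."""
    if action == "reduce_choke":
        state["choke"] -= 1

# One iteration of the loop: sense -> emulate -> decide -> actuate.
state = {"choke": 10}
readings = [SensorReading("PT-101", 205.0), SensorReading("PT-102", 215.0)]
prediction = predict(readings)
action = decide(prediction, setpoint=200.0)
actuate(action, state)
```

A real twin would run this loop continuously against streaming data; the point is only the shape of the cycle.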
In a more modern architecture, we now have a specific component with the role of data management, which acts as a bridge between the physical space and the virtual space, and we also have a distinct service system, which is less about the exact configuration of the virtual space and more about services such as event monitoring, visualization, simulation, or predictive modeling. Our project is about a production digital twin for the subsurface. At the application level we have different kinds of applications for visualization, prediction, event analysis, and simulation. We want to store the data in corporate and cloud-based databases, but using open models and industry standards as much as possible, and the glue among everything is an integrated view provided by a domain ontology that describes the concepts of the domain as well as the concepts of the disciplines involved in the digital twin. In this presentation I am focusing on the data management component: our evaluation of the requirements of that specific component, and our assessment of the OSDU platform against them. We did a systematic literature review on what the data management component of a digital twin should be; this is brand new, it went online this Sunday. There is absolutely no consensus on what it should do. The way we see it, it is a big data problem: it has a lot of Vs. Here we focus on the variety of data, the volume of data, and the velocity with which it is produced and must be consumed. But because digital twins are decision-making, data-centric systems, we believe the V of value and the V of veracity are also very important for making valuable decisions.
When we see this in the big data value chain, there are activities for acquiring the data, activities for preparing the data for analysis, activities for curating the data, activities for storing and querying the data, and activities for using the data, which in the digital twin is the role of the service system. We are focusing on the first four key activities. When we consider that, we see that in terms of role and functionality it is much like a data lake, or a data lakehouse, which is the more modern concept. A data lake is a scalable storage and analysis system for data of any type, retained in its native format and used mainly for the purpose of knowledge extraction. So it should support decision-making. For that, it must support the integration of heterogeneous data in their native formats; it should support the logical and the physical organization of the data; it should support various user profiles using this data; it must describe the data with metadata and enforce its quality; and of course it must provide scalable storage and processing. At the end of the day, it is all about knowledge extraction: you have all the sources of unstructured data, and it is about indexing them, providing metadata about them, and having different kinds of APIs to provide access for different types of tools and different kinds of knowledge extraction. In conclusion, the data management component in a digital twin should provide support to represent data at different abstraction levels: from raw data, to context for that raw data, and from there to knowledge. It must also integrate all the functions to support the digital twin data pipelines that produce data at these different abstraction levels. And of course it must provide data storage and consistency.
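The data-lake role just described, keeping data in its native format while indexing it with metadata for later discovery, can be sketched as a tiny catalog. All field names and paths below are illustrative, not OSDU's:

```python
# Minimal sketch of a data-lake-style catalog: the data itself stays
# in its native format on storage; what we manage is metadata about it.
catalog: list[dict] = []

def register(path: str, fmt: str, **metadata) -> None:
    """Ingestion step: index a dataset without converting it."""
    catalog.append({"path": path, "format": fmt, **metadata})

def search(**criteria) -> list[dict]:
    """Discovery step: find datasets by their metadata."""
    return [entry for entry in catalog
            if all(entry.get(k) == v for k, v in criteria.items())]

# Heterogeneous data, native formats, one index.
register("wells/w1/pressure.csv", "csv", kind="pressure", well="W-1")
register("wells/w1/survey.segy", "segy", kind="seismic", well="W-1")
hits = search(kind="pressure")
```

Everything else in the value chain (curation, quality, access profiles) builds on this basic register-then-find cycle.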
And because we put data there, we want to get it back: we want to find it for the different purposes of exploring it in knowledge extraction activities. So it must provide support for data search, discovery, and analysis, and all of that in a way that promotes interoperability, so that the different systems of the digital twin are highly cohesive with low coupling among them. Our first question was: can the OSDU data platform provide that core functionality? We have two papers on that, together with a student of mine, Jacqueline Correa; those are referenced there, and the answer is yes, it can. The interesting thing, in my perspective, is that it has different architectural roles and can be used in all of them. It was initially devised as a system of record for long-term data. For that, it provides a microservice architecture, it is cloud native and provider agnostic, it has centralized storage, and it has all these APIs, which I will not go through because this audience knows about them. But for real-time, short-term data, it can also be a system of engagement, with decentralized storage. And in our perspective there is a third role, the system of reference, which is particularly useful in prescriptive systems: in that case, it must be the trusted source to feed the machine learning. I am improvising a bit because I do not think this is the last version of the slides, but I will manage. Because we are focused on production systems with predictive behavior, we are interested in having the single source of truth, the golden records, for our predictive modeling systems, and this is very related to the idea of a data lakehouse. Because we need to provide metadata about our data, OSDU has all these group types, as they are called: a lot of different schemata provided for different domains. There is a lot of work on that, and a lot of results.
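As a rough illustration of what a record stored through those group-type schemata looks like, here is a sketch written from memory: the exact kind string and field names vary by schema version, so treat every value below as a placeholder rather than a verified payload. What matters is that each record carries its schema identifier, access control, and legal tags alongside the data:

```python
# Illustrative shape of an OSDU-style metadata record (field values
# are placeholders; consult the platform's schema docs for real kinds).
record = {
    "kind": "osdu:wks:work-product-component--WellLog:1.0.0",
    "acl": {"viewers": ["data.default.viewers@opendes"],
            "owners": ["data.default.owners@opendes"]},
    "legal": {"legaltags": ["opendes-public"],
              "otherRelevantDataCountries": ["BR"]},
    "data": {"Name": "W-1 pressure log",
             "Curves": [{"Mnemonic": "PRES", "Unit": "bar"}]},
}

def validate(rec: dict) -> bool:
    """Every record must declare its schema (kind), access control
    (acl), and legal compliance (legal) alongside the payload (data)."""
    return all(k in rec for k in ("kind", "acl", "legal", "data"))
```

The point for us is the discipline: no payload enters the platform without a typed schema and governance metadata attached.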
And there are huge efforts being made. You can store all the records related to all this metadata using standard schemata, and there is a significant effort in aligning with existing standards like PPDM or CFIHOS for production systems. We can also see that it covers the basic life cycle: it ingests heterogeneous data in their native formats; it provides a system of record, a system of engagement, and a system of reference as well; and it allows us to break down silos, having all data in a single platform and avoiding multiple versions of the truth among the applications. This is definitely not the last version of the slides; I will manage. But there are challenges, in our perspective, and some of them were addressed in the previous presentation. One concern of ours is that there are no patterns for modeling: there is no rigorous definition of how to develop a new schema when, for example, we need to handle missing concepts, and no rigorous semantics for how to extend existing ones. This is not a matter of the OSDU working groups not working hard enough; some concepts will always be missing whenever I want to model something. The issue is that, at the end of the day, we describe metadata about the data because we want to find it. If everyone can take their own interpretation of how to do it, which concepts to use, and how to link and extend them in their own way, or if the schema evolves over time, how am I going to find the data I want for all my applications? And how do we balance the flexibility for modeling, which is a good thing, with standard ways of representing things, so that we do not need to do mappings when we want to put together data that people represented in different ways while exploring that flexibility? There is also the time needed for people to work on new domains; for example, we work on production.
We have been waiting for the production data definitions to be ready for a while, and this is not meant as criticism: this is very hard, and we are always going to need new concepts. So how do we create customized descriptions that can survive over time? In our perspective, we want to explore semantic enrichment, using standards and ontologies as a means of helping with these issues and extending core functionality. We are also addressing another problem: because these are time-centric, decision-centric systems, we wonder whether the lineage mechanisms embedded in OSDU are sufficient to link the data to the decisions that were derived from it and put back into the physical system, but I am not addressing that here. As a very simple example, imagine we try to represent time series data, say a pressure time series. For me, it is about the what that is being represented, some production measurement, plus the where, the who, the when, and the why. Somehow I can do this in OSDU. This was our first version; we now have three other versions of the same thing, but I can make the point here. I can easily do the where and the who, and this is improving over time. I can do the why, for example an extended well test as a business activity. And I can partially say the what: I can say it is a time series, but I can only say the unit of measure. I cannot say, for example, that it is pressure or temperature; that is inside the technical description of the header in the file. So what if I benefit not only from the standard schemata that exist in OSDU, but also from standard ontologies that already describe this knowledge and are well known in the industry? For example, for sensors and sensor networks we have SOSA and SSN. In our project, we are developing ontologies such as O3PO, which is about production, or OntoTag, which semantically describes a tag.
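The enrichment idea can be sketched as annotating a record with ontology IRIs so that the "what" (the measured property) becomes machine-readable instead of living only in a file header. SOSA and SSN are real W3C vocabularies, and `sosa:observedProperty`, `sosa:hasFeatureOfInterest`, and `sosa:madeBySensor` are real terms of SOSA; the record identifiers and the `o3po:` concept names below are illustrative placeholders:

```python
# Sketch of semantic enrichment: ontology annotations alongside a
# platform record. SOSA terms are from the W3C vocabulary; record
# IDs and o3po: concepts are made-up examples.
SOSA = "http://www.w3.org/ns/sosa/"

annotations = {
    "record:timeseries-001": {
        SOSA + "observedProperty": "o3po:BottomHolePressure",  # the what
        SOSA + "hasFeatureOfInterest": "o3po:Well-W1",         # the where
        SOSA + "madeBySensor": "tag:PT-101",                   # the who
    }
}

def records_measuring(prop: str) -> list[str]:
    """Find records by the property they measure, which the native
    schema could only express as a unit of measure."""
    return [rid for rid, ann in annotations.items()
            if ann.get(SOSA + "observedProperty") == prop]
```

With this in place, "give me all bottom-hole pressure series" becomes a query over concepts rather than a guess over units and file headers.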
There are also other standards in production, like CFIHOS, for example. On the left side of the picture, I have an ontology that says, basically, that there is a concept which is a time series sample, and that this time series sample is about a specific physical measure, like a bottom-hole pressure measurement. Then I can relate my time series to these different concepts, at the concept level and at the instance level, and complement the data that is missing in the original OSDU schema. We can link the information models in different ways, and by doing so we can do different things: we can guide modeling, and even if we build different models we can relate them; we can do consistency checking on the things that need to be consistent; we can do model alignment; and we can also enhance the kinds of search we can do on the data. OSDU is very powerful with regard to structured data search, but we can also leverage the ontology to do semantic search, and even data reasoning, which we are not doing here. And we can also evolve toward digital twins that are systems of systems, where we can exchange information and decisions between digital twins. So thank you very much.
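The semantic search mentioned at the end can be sketched with a tiny subclass hierarchy: a query for a general concept also matches records annotated with its specializations, which plain structural search cannot do. All concept and record names are illustrative:

```python
# Sketch of ontology-backed semantic search: a toy subclass
# hierarchy lets a query for "Pressure" also find records annotated
# with more specific concepts. Names are made-up examples.
subclass_of = {
    "BottomHolePressure": "Pressure",
    "WellheadPressure": "Pressure",
    "Pressure": "PhysicalQuantity",
}

def broader(concept: str) -> set[str]:
    """Transitive closure over subclass_of: the concept itself plus
    all of its ancestors."""
    seen = {concept}
    while concept in subclass_of:
        concept = subclass_of[concept]
        seen.add(concept)
    return seen

records = {"ts-001": "BottomHolePressure", "ts-002": "Temperature"}

def semantic_search(query_concept: str) -> list[str]:
    """Return records whose annotated concept is the query concept
    or any specialization of it."""
    return [rid for rid, concept in records.items()
            if query_concept in broader(concept)]
```

A real deployment would delegate this closure to an ontology reasoner or a SPARQL endpoint rather than a dictionary, but the retrieval gain is the same.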