Good morning. My name is Michael, and today we'll talk about data governance in health information systems, and specifically about data storage. To begin with, we'll make a short introduction and a short historical detour on the topic, starting with the era that probably everyone is most familiar with: the Internet in general and the history of the Internet, which may seem not that related to the topic, but it has some relation. Can you turn the sound up a little bit? Is it better? Is it all the way up? Okay, we'll try.

So let's talk a bit about the evolution of the web. If you look at the slide, you'll see that, first of all, the Internet was simply about connecting different services. Everyone was thinking about how services are connected, and the core challenge of that time was to make things work. That dates back to the 1970s and 1980s. Then people started talking about the consumption of information, and this is where Web 1.0 comes in. Most of the websites, as you remember, were just static websites: everyone was downloading pages and content, some lightweight graphics, and it was pretty straightforward. It was new, and it became a kind of mass culture. Then Web 2.0 came, and we got a web where we can not only consume content but also create it: share information, upload data to online services, and use that data in an easy and straightforward way. And now more and more people talk about Web 3.0, where we are not only sharing data and reading information, but also asking how we control what we share and how we, let's say, own the information that we have.

From this point of view, if we look at the world around us, and especially at health information systems, we are somewhere in between Web 2.0 and Web 3.0. For most of us it is not a very clear-cut stage: we think about centralized services, we think about services that are used for sharing information, but we rarely think about ownership. Yet in recent years, maybe the last two or three, if we look at what's going on in the world of regulation and policy, there has been a strong trend towards open information, control over information, and a holistic approach to managing data. This topic is data governance, and this is what we'll talk about today.

We'll start with the definition of data governance. The one that comes from the standards is quite complicated, so I tried to make it a bit more straightforward. For those who are more interested in the topic, I would point you to the ISO/IEC 38505 standard on the governance of data, which is quite readable, builds on the broader ISO/IEC 38500 governance standard, and describes quite a lot of concepts. Just a heads up: what we will discuss here today will not be fully aligned with the standard. We will discuss different approaches and different ideas around data governance; it is not a summary of the standard, and the people who wrote the standard may have a different view on this topic. At the same time, the standard is a good reference for implementers, and once you start thinking about how data governance applies to your systems and your scope of responsibility, looking into the ISO standard is a sensible starting point. So if you look at the definition, it talks about availability, usability, integrity, and security of the data.
If we look into the aspects of data governance, we will talk about three things: what it describes, what it defines, and what it impacts. It describes availability, usability, integrity, and security of data, that is, the practical, business-relevant qualities of the information we process. It defines processes, roles, policies, and standards, all the components of the framework that make it live. And it impacts the information and data themselves, the processes, the technology, the people, and, most importantly, the culture. Beyond all the methodology and all the definitions that make up a governance framework, we should remember that everything related to technology and information eventually hits the barrier of people's adoption, the cultural barrier, and whatever we try to implement, the related change is always about people and culture.

Let's go further and talk about two definitions. The first is data sovereignty, the concept that data is subject to the laws of the country where it was generated or collected. The second, related concept is data residency, or data localization, which is about how governments try to restrict the use of data outside their jurisdictions. To go a bit more in depth, one of the questions we receive most often is: is there a restriction to process the data within the country only, or can the data be transferred? The answer is that it depends. Sometimes there is a restriction on certain processing outside the country, sometimes there is a requirement to keep the data only within the country, sometimes it can be both, and in many cases it depends on the law and the type of data. Hopefully that makes it a bit clearer.

The main topic of this conversation is data sovereignty, and it came from the concern of governments that they are not able to control the data. If you look at the whole evolution of data use, governments understand that data is becoming a more valuable asset. Sometimes that value is commercial, sometimes it is national or strategic, and it relates both to finances and to personal information about citizens: their habits, their interests, and of course the health information that national systems hold. So there is an inherent tension. On one side we have the development of cloud services, on the other side the growing value of data, and these pull in two different directions: one is to make data more accessible, which is broadly how the infrastructure is developing; the other is to keep control over the data that you possess. And we see that once governments claim data sovereignty over their most critical assets, it creates a conflict with the need and demand for data sharing, globally and regionally. This is one of the internal conflicts here.

This is a very recent picture showing how many countries have recently adopted data privacy laws. If you look at the statistics behind them, more than 50% of them, in one form or another, have statements or clauses about data sovereignty, and quite a lot of countries are considering or thinking about introducing data sovereignty requirements into their laws.
At the same time, if you compare this picture to what it was five years ago, the number of countries enacting data privacy legislation, including legislation restricting the transfer and disclosure of data outside the country, keeps growing. We expect that within five years the coverage and the impact of data protection regulation will be much stronger than it is even now.

Okay. On one side, we talked about clinical information and the opportunities that shared use and reuse of data give to researchers, to communities, to mankind in general. There are also the technical aspects: cloud services and cloud storage are very easy to use, relatively inexpensive, and give us processing opportunities and capabilities that we have never had before. On the other side, this comes at the cost of limited control over the data, a kind of blind trust in whoever runs the cloud, whoever has control over the infrastructure that processes our data. On the contrary, if a country introduces a data residency regime, it feels that it has full control and ownership of the data and no dependency on all kinds of external factors, but at the same time it may have reduced capabilities for research using this data. There is no silver bullet and no single right solution. We see how the situation is evolving, and the tendency is that we will see more data residency requirements in action rather than more freedom in using and reusing the data. The impact is not fully predictable, but I think the industry and the technology are slowly adapting, and we will see what kinds of technical solutions and approaches exist to handle this type of data sharing and data transfer.

Let's think about why data sovereignty matters to us and why we talk about it in relation to health information systems. First of all, we have to meet regulatory requirements: whatever we create, whatever we use as an information system, it should be legal and covered by the legal requirements within the country. Second, when planning or refactoring a system, we need to understand the impact on the system architecture. This is where the technical part starts: can we use cloud services, should we have everything on premise, can we use a local cloud, where do we store backups? Different questions follow: what is the latency, can we afford having the data served from a server in another country, what is the cost of that, and so on. The next important topic is data ownership. It is generally mandated by most information security and data governance standards, and it is one of the core concepts that govern the use of data: once there is an owner of the data, he or she can define what to do with this data, what risks apply, and what decisions the organization or the country should make in relation to it. Then we need to plan the data capacity: data governance shapes where we store the data, and the whole framework of requirements is formed by the restrictions that exist from the legal point of view and from the data governance process.
We need to choose technologies depending on the conditions we have, understand what kind of capacity we need, set up operations that keep data quality at the desired or required level, and finally we need to protect the data. These components are maybe not the full list, but they are a high-impact list of things to consider while implementing our systems or thinking about data governance.

Now let's look a bit into different types of data processing, starting with local data processing. One of the clear benefits that comes to mind first is supporting the development of local infrastructure capability. Once we say that we would like to store data in the country, we unlock opportunities for building data centers, for creating infrastructure, and maybe for local businesses to use the data and transfer information within the country. On the slide there is an index of internet connectivity, which reflects the speed of connection to the internet in different countries. The same idea can be applied to the data capacity that exists in a country: the number of data centers and the latency to those data centers. I definitely recommend using this kind of resource to check and compare the level of development of data processing facilities in a country and see what the best approach might be. With local processing, the data owner also has some leverage over providers, which is another way of keeping control and having a better security and control framework around critical data protection. There is also, if you look at trade restrictions and recent geopolitical developments, protection from sanctions and similar geopolitical risks when the data stays within your country. And there is a practical technical reason: you do not depend as much on international internet connections, which are not always available at high quality in every market.

At the same time, local data processing comes with quite significant challenges and issues to consider. First of all, when you start developing data centers, businesses, or services in a new market, there is not that much competition, and in order to reach a certain level of stability and quality you need to go quite a long way; maturity issues are visible from day one. It means that, especially if there is not enough developed infrastructure in the country, you will have a lower SLA and some service quality issues. It also comes with a higher operational cost. If you look at recent developments in server pricing and the time needed to procure and deliver servers, it takes months, sometimes half a year, up to a year for specific configurations. It means that it is not something that is immediately available in the market, even if you have the funds to buy it. Another technical difficulty is over- or under-provisioned machines and capacity planning issues, and there is also the problem of skills and access to the skills market in the country. Relying on the idea that local universities produce ready-to-hire graduates or experienced engineers in this field is, quite often, not realistic.
This also happens, and when we talk about statistics on uptime, I think every engineer and every manager of an information system has experienced something like this at least once. And the longer we are in the industry, the higher the probability of getting into such a situation.

What else can we do? We can use cloud data processing, which, as mentioned, is cost-effective and easy to scale quickly: you can buy more disk space with a single click of the mouse. You probably have fewer concerns about managing the infrastructure, because the provider takes responsibility for the service and you literally don't know where it all runs. And you can get quite a lot of complicated configurations out of the box without any extra effort. At the same time, there are drawbacks, as always. Payments are complicated: sometimes payments are subject to delays, sometimes to inter-country sanctions, trade wars, or other complications, and sometimes the payment methods supported by providers are not recognized by the budgeting authorities in many countries. Sometimes you cannot influence the maintenance schedule at the very moment you have a mission-critical task: the provider may have a downtime window, and their priorities will not necessarily be yours. And the biggest issue, I think, is the issue of trust, because even if it is a public cloud with a lot of certifications and assessments, it is quite hard to establish trust in this infrastructure, and in the arrangement in general.

So whenever we look at the alternatives for storing data, there is a lot of work, and there is always this tension between storing the data within the country and having it in the cloud. And regardless of the path chosen, as we discussed at the very beginning, there are other factors that impact data storage, and one of them is security. Security is not only about having trust; it is about following a set of measures and procedures in relation to the data. Security has now become a self-contained, independent discipline, which means that whenever you have sensitive data, stored locally or in the cloud, different legal regimes require specific processes and specific procedures for handling that data. The more sensitive the data is, the more obligations you have in relation to it. Typically it is not just about putting the data on a local server or in the cloud; it is also about ensuring quite a complicated, or rather quite a robust, process for handling this data, including access control, backups, capacity planning, change management, and incident management.

As a quick illustration of that, we would like to talk about incident handling, some requirements related to it, and a practical example on this point. What I'm going to discuss is actually related to my function as a co-chair of a small working group from the State Department's PEPFAR initiative. Is anyone here who does any kind of work with PEPFAR? All right, there are a few of you. Just before I go into what I'm going to describe: this is very early, so I'll keep it to a few points. This came up in a conversation with Bob and David, and they asked me to make a quick point because it's an example of data governance. To get to the point, there is a requirement in the Country Operational Plan 2022.
That's on the State Department's website, for those of you who are not familiar with it; it covers plans and governance around PEPFAR work, and there is more IT language in the Country Operational Plan than in any previous year, from my understanding. In there, I can't remember the exact page number, there is an incident reporting requirement. So if a PEPFAR-funded partner, or some PEPFAR-funded activity, has a breach of personally identifiable information, that is, any kind of data that, put together or individually, can identify someone, and breach is a broad term: it could mean a loss of data, it could mean a third party accessing your national data repository or EMR, then you need to contact and notify your PEPFAR agency, so that could be USAID, the State Department, CDC, and so on. We're still developing standard operating procedures at the agencies, and we're developing training, so I don't have the timeline yet, but you should see some kind of notification on this requirement pretty soon, hopefully in the next six months, we'll see. If you have any questions, feel free to come and find me, and I can let you know what I know. Okay.

So the requirement to report incidents is generally present in any kind of privacy law. And if you look at the whole life cycle of the data we process, that life cycle can take two years, five years, ten years, or even fifty years, while with incident reporting we have a very short timeline: literally hours, maybe days, depending on which law applies. And typically we are not very well prepared to act that quickly. Staying aware, staying on top of this process, is quite an unusual practice, and it requires quite a lot of training and a lot of internal work for the team that processes the data, to ensure there are enough safeguards and controls in place. The more requirements like that we have, the more difficult and challenging the environment becomes; each requirement is quite reasonable on its own, but this is something we are generally not very well prepared for.

Let's talk a bit about the practical aspects of data governance. The first one is data ownership. It is a key concept, because the data owner is the party that makes the key decisions about the data. First, it's about data strategy: how we use the data, what we use the data for, what the most important elements of the life cycle are, what the retention of the data looks like, and so on. The data owner establishes the high-level data processing requirements: who processes the data and what kind of processing generates value. He or she also approves functional changes and access to the data, and, of course, defines the backup requirements. Sometimes we hear, okay, we have a retention policy, we have data backups, we have an access control process, but when we start asking who the data owner is, who decides, it turns out there is no person at the appropriate level of seniority to make these decisions. And in case of an incident, or during a post-mortem, we need to find the proper stakeholder who can make decisions about the data, handle the incident, revise the current strategy, and set the direction. A lot of technical decisions, and a lot of issues with those decisions, come from this lack of ownership, which is quite natural given how information systems have developed historically.

So one piece of immediate advice: talk to your data owner, make sure this is the real data owner, and confirm that your current view on data governance is aligned with theirs.

Next, data life cycles. We talked about the data life cycle, and this slide shows what we do with the data at its different stages and the types of security measures to be provided at each stage. There could be a lot of comments on it, but I would like to keep it as a kind of connection between what we do with the data and how we protect it; I think it is one of the more useful illustrations of how the data life cycle should be applied. Depending on the methodology used to describe the data life cycle, there can be fewer or more stages, but this one is quite comprehensive and shows a lot of measures that you should probably implement, or at least remember about.

Then capacity planning. This is quite a complicated topic, and it involves a lot of math to study and apply, but at a minimum you can simply go through your current capacity planning process and see what is there: your production data size, your backup size, how much data you store in caches, and how much data you actually use. Then you can make an estimate for the next one to two years and see whether your current storage capacity matches this plan. There are different ways to do this, but this is at least a simple, common-sense way of doing capacity planning.
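To make that concrete, here is a minimal back-of-the-envelope sketch of such an estimate in Python. All of the numbers, the growth rate, and the function name are hypothetical placeholders rather than figures from any real system; the point is only to show how projected demand can be compared against what is currently provisioned.

```python
# Back-of-the-envelope storage capacity estimate (hypothetical numbers).
# All sizes are in gigabytes; growth_per_year is an assumed annual growth rate.

def projected_storage_gb(production_gb: float, backup_copies: int,
                         cache_gb: float, growth_per_year: float,
                         years: int) -> float:
    """Project total storage demand after `years` of compound growth."""
    current_total = production_gb * (1 + backup_copies) + cache_gb
    return current_total * (1 + growth_per_year) ** years

# Hypothetical inputs: 500 GB of production data, 2 backup copies,
# 50 GB of caches, 40% growth per year, a two-year planning horizon.
demand = projected_storage_gb(production_gb=500, backup_copies=2,
                              cache_gb=50, growth_per_year=0.40, years=2)
provisioned = 2_000  # GB of storage currently available

print(f"Projected demand in two years: {demand:,.0f} GB")
print("Current capacity is", "sufficient" if provisioned >= demand else "not sufficient")
```

A real plan would also account for retention rules, index and log overhead, and headroom for restores after an incident, but even this rough arithmetic tells you whether you are in range.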
There is also the constant comparison between a local data center or local cloud and the global cloud solutions available on the market. First, you have to determine whether your current on-premise solution can work: you can study the typical latencies for public cloud services and measure what you have locally, and compare the two. That helps you decide whether local storage is sufficient, or what can be done to improve it. And of course you need sign-off from the data owner on whether they accept your current solution. Most of these figures are publicly available, and there are different approaches to collecting them; we can put together a deeper guide on this topic if you are interested.

I also mentioned some solutions on the market. For those who would like to use cloud solutions but are still thinking about localization and keeping the data on premise, the key cloud providers, AWS, Microsoft, and Google Cloud, have offerings for that. They differ a bit in what they offer, but the key technology providers all have solutions for government clouds, sovereign clouds, and data localization. Moving to the cloud can also sometimes be subsidized, and markets are very different, so there are options available; you can reach out to your legal teams and your engineering teams to see if it makes sense to consider such options.

Another topic, which is more of a grey area, is data sovereignty as a service, which is often offered by commercial providers who promise to guarantee sovereignty by encrypting the data. There are still big discussions about who holds the keys and how valid this approach is, and even if the data is encrypted, a lot of the processing still happens in the cloud, so there are concerns about whether transferring the data into a commercial cloud still gives the provider a form of access.
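As a rough illustration of that idea, here is a minimal sketch of client-side encryption, where the data is encrypted before it leaves your environment and only ciphertext is handed to the cloud, with the key staying on the data owner's side. It uses the Fernet recipe from the Python `cryptography` package; the upload function is a hypothetical stand-in, not a real provider API.

```python
# Minimal client-side encryption sketch: encrypt locally, keep the key,
# and hand only ciphertext to the cloud provider.
from cryptography.fernet import Fernet  # pip install cryptography

def upload_to_cloud(blob: bytes) -> None:
    """Hypothetical stand-in for a real object-storage upload call."""
    print(f"Uploading {len(blob)} bytes of ciphertext...")

# The key is generated and kept on the customer side (e.g. a local key store).
key = Fernet.generate_key()
cipher = Fernet(key)

record = b'{"patient_id": "12345", "diagnosis": "..."}'  # hypothetical record
ciphertext = cipher.encrypt(record)
upload_to_cloud(ciphertext)  # the provider only ever sees ciphertext

# Later, the data can only be read back by whoever holds the key.
assert cipher.decrypt(ciphertext) == record
```

The open question from the discussion above still stands: encryption at rest only helps with sovereignty if the provider genuinely never holds the key, and it does little for workloads that need to process the data in plaintext in the cloud.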
Let's sum up what we discussed today. First, privacy laws and data residency laws are changing quite often. In recent years we have seen constant change in the regulation, with new requirements appearing literally every quarter, and it's important to track what applies now and what will apply in the next two to five years. It's especially important if you are considering developing new systems or a major upgrade or renewal of your architecture. So I think it's smart to look into the current government plans and to design your system architecture accordingly, with a view to what will happen in the next two to five years. Data residency will definitely impact your operations: it affects the architecture, financial controls, data management, and the technical aspects of how comfortably you can use the data depending on whether it is stored remotely or locally. We also talked about local versus cloud solutions; cloud is great, but, coming back to the very beginning where we talked about Web 3.0, the web is developing in a direction where people, governments, and corporations all want to own their data. It means that data ownership will become a key topic for the coming years, and the ways to ensure control over the data will be one of the predominant topics in system development and infrastructure. This is the direction the industry is going to develop in. And generally, think strategically: we see more and more data processing and more and more applications for the data, which means that in the coming years we will see a huge spike in new uses of data, more and more data being collected, and, as the value of the data grows, the use of data will be more and more regulated. That's all we wanted to tell you in this session. If you have quick questions, we will try to answer them right now. And as a reminder, we will have another session at 4:30pm, where we'll go further in depth. Thank you.