Thank you very much, and thank you for inviting me to speak today. It's a great pleasure to do that. So in my talk, I'm going to talk a bit about data protection and IP and some things to think about. What I want to do, I suppose, is build on the previous session, where I think we've got some of the building blocks in terms of data protection laws and intellectual property laws, and then think about how some of these apply in relation to projects involving the use of data. So what I'm going to cover today: a quick introduction to, I suppose, some of the key themes in relation to data. This stuff will be very familiar, I'm sure, to the audience, but just to go back to some basics there, and then look at what rights there are in data. And so this builds on the previous talk. And then we'll look a little bit at data stewardship, which is a bit of a hot topic, a phrase we're hearing a lot at the moment. So what does that mean and what might that look like? And I'm then going to talk, and this will be the main part of the talk, I suppose, about how you manage some of these risks in practice. And as I said in the introduction, I've worked with a number of clients on data projects in both the public and private sector involving data taken from various sources. You know, that might be, say, mapping information from the Ordnance Survey or measurements from the Met Office and then other research sources, and combining that data together, applying layers to it, sending the data wherever, and then producing some sort of output. So based on the sort of work I've been doing there, I hope to share some things you might want to think about when involved in data projects. And then, as I said, we'll finish up at the end with some questions. So I just wanted to start, and these are terms which I'm sure you're all very familiar with, by just looking at some of the terms we're talking about here. So when we're talking about big data, what are we talking about?
Well, I've got the Gartner definition here, which is always very handy to go to. And they talk about big data being high-volume, high-velocity, high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. So you're taking large amounts of data from lots of different sources, and that requires you to look at them in a different way. So there's clearly a lot of data out there and we can do lots of interesting things with it. So part of the challenge is actually trying to work out where the interesting stuff is within that and what we can do with it. If we do a bit of a comparison: in traditional research, you will have statistically representative samples, you might have random sampling, you might have particular sources of data, and you'll be testing your hypothesis. In research based on big data, you're going to have far more data sources from lots of different areas, and they may have different levels of information within that, different levels of accuracy or completeness or whatever. But you're looking at all the data that is there and you're trying to use algorithms to find correlations within that data. And that in turn enables machine learning and AI, and therefore hopefully you're able to derive some interesting information from that data and actually properly explore it. And we'll look at what that means in the context of IP in particular further on. The other thing that is obviously a much bigger issue now is metadata. So not just the data itself, but actually the other data that is associated with that. And again, I turn to the Gartner definition here. So it's information that describes various facets of an information asset to improve its usability throughout its lifecycle. And it's the metadata that turns information into an asset. It's the metadata definition that provides the understanding that unlocks the value of the data.
So the metadata is what tells you about that information or that piece of data, so that you can actually understand it and then look for the correlations with other data. So, you know, there are lots of examples here. We can go back a long, long time: library catalogs are a classic example of that. Meta tags in web pages. Digital photos are one that we all know, and there's a huge amount of metadata embedded within a digital photo. So you've got exposure length, ISO rating or the geolocation data showing where that photo was taken. And when we're looking in the broader sense at sensors, then there's a whole load of information that sensors will gather as well. And for all of that, there are international standards out there which deal with how metadata is organized and sorted. And the important thing here again is that metadata can be created independently of the data to which it relates. So that metadata may itself be protected by IP, and it may belong to you or to someone else. And in terms of what analysis can be done on it, again, you may be able to create new information, new data, which in itself may create new IP. And then the final sort of theme I wanted to cover to start with was open data. Again, it's something we hear a lot about, so what do we mean by open data? Here we have the Open Knowledge Foundation, and they define open data as data that can be freely used, reused and redistributed by anyone, usually subject only to very minimal obligations around attribution and sharing on the same terms. So what does that mean? Well, it doesn't mean there are no intellectual property rights in that data. It means you're granted a far broader right to use it. So, as I say, open data is usually licensed on a basis that enables people to freely reuse it or to distribute it.
You can pass it on again or publish it, provided you acknowledge and attribute the original source. And remember that open data is generally distributed, a bit like open source software, on a basis where there's very little protection that you get in terms of it being free from infringement of copyright or other intellectual property rights, or in terms of accuracy or completeness. So it is a great source, but it also needs to be used with care, and you just need to be aware of the risk. Within the UK, we have regulations that apply to the re-use of data held by public bodies. So we've got the Re-use of Public Sector Information Regulations, and they set out the rules that apply to enabling access to data sets held by public bodies in the UK. And this is different, for those of you that are familiar with freedom of information laws. It's a different set of legislation to freedom of information laws; in particular, access to data may be subject to a license fee. It isn't necessarily going to be available for free. And information that is held by further and higher education institutions is also outside the scope of the 2015 regulations. Alongside that we also have the INSPIRE Regulations, which deal with spatial information that is held by public authorities as well, and the basis on which that is to be made available. The government has a portal, data.gov.uk, which is designed to try and bring together the various sources of open data within the UK. There's no restriction on commercial use of that, and in fact that's very much encouraged. The point of opening up access to data sets held by public bodies, and public bodies hold a huge amount of data, is to actually try and encourage innovation and to enable the commercial sector, the private sector, to find new ways of using that data on a commercial basis and making that work better for everyone. So in terms of licenses, if you're looking to use public sector data in the UK, the UK government has produced a couple of model licenses.
And the main one here is what's called the Open Government License. So for those of you that are familiar with things like Creative Commons, the Open Government License is a very similar model to that, and it is the default license that is expected to be used for data held by public bodies in the UK. There's also a non-commercial variant of that as well, the Non-Commercial Government License. And generally speaking the expectation is that public bodies, whether central government or other public bodies, will use the OGL to make available their data to third parties. And there are some exceptions to that. So if the provider is able to charge, then you can't use the Open Government License. So take for example the Met Office, which has powers to commercially license its data, and that's where it gets its revenue from to do what it does. Similarly, there are circumstances where bodies can limit the use of the data to non-commercial purposes, in which case the NCGL can be used. If the dataset has personal data in it, then you can't use the OGL for that, and we'll talk a bit more about data protection issues later on. If there are third party rights that don't align with the OGL, then you can't use the OGL there either. So say for example a public body has licensed in data from a third party, then its ability to license that out will be subject to the terms of the inbound license. So you can't grant broader rights to a third party than you've got from the party that's licensed to you. So in that case you need a bespoke license. If there are patents or trademarks and design rights, then again the OGL can't be used. And then the final one is around software code. So there is an exception if there are particular technical benefits of using another open source license, and there are many open source licenses out there for software, then you can deviate from the OGL.
And I've got a link there in the slides to the webpage on the National Archives website which provides all the information on this licensing framework. And if you're involved in projects that are using data from public sources, it's a really good starting point to understand, I suppose, what you can and can't do with the data and how the different models work. The key thing about both the OGL and the NCGL is that they are incredibly easy to follow. So they're written in plain English, they are barely a page long, and they are very permissive in terms of what you can do with the data. But it is worth, if you're looking at a project involving a data source from the public sector, actually having a look at that page. So having covered those sort of basic themes, I wanted then to look at what rights there are in data, and as I say, in this I really want to build on what was said in the previous talk and not go into too much detail on it before going on to some of the practical issues. But first off, when I'm doing these talks I always like to dispel one of the myths. People quite often talk about owning data: that's my data and I own that and you don't own that. You can't really own data. It's not something that you possess. It is something which you have rights in, but different people can have different rights in the same information. So information on its own generally isn't owned. And indeed you can't steal information. So there's a very interesting case, and this I promise you is, I think, the only bit of proper law I'm going to talk about, the only case, but it's always an interesting one: Oxford v Moss, which was about a university student, just over 40 years ago, who took from his professor's office the proof for an exam question, so the answer to the exam question. And it was never in doubt that he was going to return that bit of paper. He was just borrowing it to read what was on it.
And there was an attempt to prosecute him for theft, and that failed because the court said, well, you can't steal information. You can steal the physical bit of paper, but if you accept that he wasn't looking to steal that bit of paper and was always going to put it back, then the charge can't succeed. So quite an interesting one in terms of just differentiating between what we call tangible assets, paper and things like that, versus intangible assets, which are electronic files, information, things that you can't actually touch. But while you can't own information, you can restrict its use. So intellectual property laws grant you certain monopolies to use certain information where it is protected by IP, and data protection laws can also limit what someone can do with information where that contains personal data. And lastly, there's also the law of confidentiality. So you may have a duty of confidentiality in relation to information, and that might be implied by law because the information has the necessary quality of confidentiality to create that obligation of confidence. Or it might be through contract, where you write into a contract that this information is confidential and you will not disclose it to any other person. Okay, so when we're looking at rights in data, here's my little diagram of a database. Within it, the IP laws apply in different ways. So we've got the database structure, which might be protected by copyright or might be protected as confidential information, and then we've got the contents of the database itself, which are protected by different rights. So there may be database rights in that, there may be copyright in that, there may be privacy rights under data protection law, and there may be confidential information. And I'm not going to go into this in detail because I know that was covered in the previous talk.
So in this process we might have new research, where new things are created. We might have automatically generated data, so that might be data that comes from a sensor or through some analysis or something like that. And we might have data that is provided by partners on the project, and so they're licensing data into that. And then there's the information which is already in the public domain. So you can see from that that when we're talking about rights in data, we need to be clear about what it is we're talking about here, in terms of whether it's the structure, whether it's the contents of that database, and then where that data has come from, because it can come from different places. So each of those sources may be subject to different rights, in terms of someone already owning IP in that, or there being personal data in that, or it being newly created, in which case the creator of that will own the IP. And licensed data is the last one there. I did want to talk a bit more about data protection laws because I think that's quite an important one to work through. So data protection laws apply to the use of information that relates to an identifiable living individual. So if the individual in question is dead, then data protection law doesn't apply to that data. And the person needs to be identifiable from it. Now that doesn't mean it's only limited to names and addresses and telephone numbers. Any unique identifier which can identify a specific individual is personal data. That may potentially include things like IP addresses or other identifiers used online; unique reference numbers, things like that, are all potentially identifying an individual. So it's quite important when we talk about anonymizing and pseudonymizing data to be clear about what's actually involved here, because if data has been genuinely anonymized such that you can't identify that individual anymore, then it will be outside the scope of data protection law.
But if it's only been what we call pseudonymized, that is, you've done something to it but you can still identify that individual, then data protection law will still apply to what you're doing there. So it's quite important when you're dealing with personal data that you understand what the information is, whether that person can be identified, and, if you want to anonymize it, what steps you can actually take to genuinely anonymize that information. There are different rules under data protection law depending on whether you're dealing with what we call personal data, which is the broadest category, or special category personal data. So that's data that relates to health or to religious beliefs or sexuality or membership of a trade union or political affiliation, that kind of thing. And there are stricter rules that apply to that, which I'll talk a bit about on the next slide. But with all personal data, there are a series of principles which you need to follow, and this is what all data protection law is built on. So the first one is that whatever you're doing needs to be fair and lawful and transparent. So fairness means what you're doing is fair and not unreasonable. Lawful means you've got to have a legal basis for it; we'll talk a bit about that. And transparent means that you've been clear with the individual about what you're doing with the data. We have purpose limitation. That means if you collect data for one purpose, then you can't use it for another purpose that's incompatible with that original purpose. So if I collect data, say, to service a customer, I may not then be able to use that for something like certain types of marketing without having told the individual I'm going to do that. So it's about being clear about what the purposes are and not using data for incompatible purposes. There are exemptions around research, which are quite helpful and can be relied upon for research projects. Data minimization.
So that is that you only collect the amount of data that's necessary for the purpose; you're not getting more than is necessary. You can't collect as much as possible just because you can, and that can sometimes be a problem with using big data. Accuracy: you have a duty to keep the data accurate and, where appropriate, up to date. Now that doesn't mean to say that if you have a database that is several years old you have to contact someone five years down the line and check the information hasn't changed. There are limits to the obligation to keep data up to date, but you do have an obligation to keep it accurate and to correct it if someone tells you that the information that you hold about them isn't accurate. Storage limitation: you shouldn't hold that data for longer than is necessary for the purpose. So once that purpose has passed it should be deleted or destroyed. Data security: you keep that data secure. And then we have a final principle here, and this is one which came in under GDPR a couple of years ago, and that's the accountability principle. And that is that the controller has to be able to demonstrate compliance with all of this. So it's not enough now just to comply with the law. You also have to show how you're compliant with it and show that you're able to comply with it. And this was the big challenge for everyone when GDPR came in, because a lot of organizations kind of broadly complied with data protection law, but not necessarily because they had thought it through; it was more by accident than by design. And quite often things would be justified retrospectively: someone complained about it and it would all be fine. But with the accountability principle, organizations now have an obligation from the outset to be able to show that they are processing data in accordance with the law. So that means showing that they have policies and procedures in place and showing that they have training in place for staff.
And that they have data retention policies that are being adhered to, and that they're carrying out internal auditing on that kind of stuff. So, some of the key issues that come up in compliance. First of all, I mentioned you have to have a legal basis. If you're not processing special category personal data, then you have a choice. There are a number of possible legal bases. So that might be that it is necessary for you to comply with a legal obligation. It might be because you are a public body and it's to help you carry out a task in the public interest. It might be on the basis of your legitimate interests, so something that is reasonable for you to do and doesn't unduly impact on the privacy or the rights of the individual. And if you can't find a legal basis in one of those then you might fall back on consent, but we always leave consent to the end of the list because that's the one you want to be relying upon the least. If you have another legal basis it is much better to rely on that, because consent can be withdrawn, and if you rely on consent and someone withdraws consent, then you can no longer process that data. And getting consent may also be problematic in certain situations. If you're processing special category personal data then there are, as I said, additional legal bases that you need to identify, and they are much narrower and much stricter. I'm not going to go into those in detail, but they are set out in Schedule 1 of the Data Protection Act if you want to look at those. The second issue, and this comes up quite a lot with data projects, is understanding who is the controller and who is the processor. So the controller is the legal entity that decides how and why personal data is being processed. And that could be one party, or it could be two or more parties acting together, in which case they are joint controllers.
So it's always important at the outset to understand who the controller is, whether it's one or more parties, and who is responsible for what's being done. You might also engage a third party to do some processing. They're called a processor, and they have to process only on the instruction of the controller. They have a narrower set of obligations under data protection law, but there are requirements in terms of what you put in your contract with that party. You also need to issue privacy notices explaining what data is being processed, for what purpose, what your legal basis is, who you are and how an individual can exercise their rights. So you need to think about how those are drafted, how you ensure they're accurate and how they're issued to individuals. If you're sharing data then you want to think about how that's dealt with. So what's the basis for that, where's your data coming from, who are you sharing it with, how are you ensuring it's being used properly? If you're using AI for automated decision making then you also need to be aware of the additional rules that apply to that as well. So there are particular rules where decisions are made solely on the basis of automated means and have a legal effect on that individual or a similarly significant effect. So that is things like, you know, making a decision on whether someone can enter the country for immigration purposes, or perhaps whether they're offered an insurance policy or something like that, but there are many decisions out there where AI could be used to have that impact. And if that's the case then individuals have rights to understand how that decision is being made and to challenge it in certain situations, and that's quite difficult with AI. If no one really quite knows how the black box actually works and how the decision is being made, how do you explain that?
Data subjects have rights under data protection law. So you will all be familiar with the right of subject access, and also the right to be forgotten, or the right of erasure, which has had quite a lot of press. But there are other rights as well, to object to processing or to restrict the way in which your data is being processed, so you need to think about how those are managed. A lot of them aren't absolute rights; they are qualified, so they may require some thought as to whether they apply. And then the other point to be aware of is around transfers outside the UK or the EU, which I know was covered in the previous session. There have been a number of cases in the courts in recent years on international transfers and what can and can't be done under EU and now UK data protection law, so that's one to be aware of if you're hosting or transferring data outside the UK or the EU. And then the final point, and something again which has been good practice in the UK for a number of years but which GDPR put into writing, is carrying out a data protection impact assessment. So that's doing a risk assessment on the project you're carrying out with personal data, so trying to identify what the risks are and then the steps you can take to mitigate those risks. Is the processing lawful? And recording all your steps. Going back to the accountability principle, carrying out a DPIA is a very helpful way of actually showing that you've thought about all this stuff, because you can provide a document that shows all your working on this. Okay, so that follows on quite nicely to data stewardship, and this is again a term which we're seeing quite a lot. There was a report from the Ada Lovelace Institute and the AI Council which was published last month. And it tried to define data stewardship, so it says it's the responsible use, collection and management of data in a participatory and rights-preserving way.
So it identifies a number of issues, a number of challenges. So firstly, there are clearly issues of trust, which come out of a number of high-profile data breaches we've had in recent years and other scandals involving the sharing of data or the use of data for purposes that weren't terribly transparent. And that then touches on things like power imbalances, so individuals are in a very weak position versus a large organization with lots of data that they hold, and there is this power imbalance in terms of how that's done. Lack of transparency: I think the report specifically calls out public-private sector partnerships, where you've got the private sector and the public sector, in particular the NHS, carrying out projects involving the use of data where there's not a huge amount of transparency. And there's this question of who defines what is good. So the parties involved in the project to actually process the data might think it's good and it is worth doing, but what do the individuals think about that, and how do you deal with who defines what is good there? And I suppose the report is largely looking at personal data management, but some of the principles here are equally applicable to projects involving non-personal data as well. What the report then looks at is some of the legal mechanisms that we could use to manage some of this, and which one might work will depend on the nature of the project in question.
But there are a few different concepts which the report identifies. So one of them is around data trusts, and this is using trust law to have trustees who are responsible for exercising the rights granted in data on behalf of beneficiaries. So individuals will provide their data to the trust, and it's up to the trustees then to use that data, to process it and to exercise those rights. But it's creating a distinction between the individuals to whom the data relates and the beneficiaries, and having someone in the middle who will have particular duties as to how that data is being used. It's quite a novel idea, and what this is trying to do is deal with that imbalance of power by separating out the interests of the individual from those of the beneficiaries or the parties that want to use the data. The second concept it talks about is something called data cooperatives. So in a data cooperative the idea is you would have members who would come together in a cooperative and they would pool their data in some form of commonly owned enterprise. So that cooperative would then steward the data: they would look after it and they would decide what was done with it for the benefit of the members. And that sort of model, you know, we can see working where you have members who want an equal stake and a direct input into decision making, so everyone has a part to play in that and they have a vote that counts in terms of deciding how the data is being used.
And then the final one that they identified is using corporate or contractual mechanisms, and this is, I suppose, the one that is most familiar and tends to be the one being used most to date. The idea here is, you know, if you're using a contractual model, that you have some form of standardised agreement setting out the rules, either between two parties or between all parties, where you could have a sort of deed of adherence that everyone signs up to. Or there's a corporate model where you instead establish some form of new legal entity, and that is the entity that actually manages the database and what is done with it. So it's quite innovative work in terms of thinking about how we can deal with some of these challenges and what frameworks can be used to actually try and address some of the challenges that the Institute had identified. And they give an example, and this is one that I picked out from the appendix in the document: there's something called Safe Havens, which is used by the NHS in Scotland as a way to provide access to patient records. So there are, I think, five of these safe havens, and they're designed for providing access to anonymised electronic patient records. They provide an environment where researchers can analyse that data in a secure way, but the data then never leaves the environment, so it's always under the control of the haven. There's a charter setting out the rules in terms of what can be done with this access, and there's an approval process for anyone getting access to it. And each haven has its own individual responsibility, but it's cited as a good way of providing a sort of sandbox, a way for researchers to get access to data, but in a way that keeps it secure and avoids it being duplicated or potentially compromised by going outside the environment. Okay, so in the next bit I just want to talk about how we manage in practice some of these risks that we've talked about. And so I go back to my little database diagram.
I thought it might be useful just to give an example of what might be in a database that you're dealing with. So you may have information that comes from a public sector data source; that might be on the basis of an OGL licence, or it might be from, say, Ordnance Survey or the Met Office and licensed on more specific terms. You may have personal data in there. You might have some proprietary information from your own organisation or another that might be confidential. And you may have new data which has been derived from analysis of all of this. And then the question is, in terms of that output, what can you do with it? And as I said, you know, if you have data within this data set which is, say, licensed on the OGL, then your ability to license the output to a third party will be dependent on that input licence. You can't give someone greater rights to the output data than you've got to the bits that come into your organisation or into the project. And mapping data is a really good example of this, where we might license mapping data from the Ordnance Survey and then we might layer other information on top of that. And the licence terms you grant to a third party will be subject to whatever terms the Ordnance Survey has imposed on you in terms of the underlying mapping data as to what you can allow third parties to do. So one of the things I find really useful in these sorts of projects is actually to try and draw it out and understand all the inputs and the licence terms that apply to them, to then work out what's actually in there. And who owns what? What, if anything, is new? If there's new stuff, is it derived data, in which case it might still belong to one of your licensing partners, or is it new stuff that you now own? And when you're making that available to third parties or publishing it, what's being published? Is it just the newly created stuff, or is it actually a layered database or some form of output that has data from lots of different sources?
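To make the point about inbound licences concrete, one way of "drawing it out" is a small inventory of inputs and the uses each one lets you pass on. The following is only a minimal sketch in Python under stated assumptions: the source names, the `InputSource` structure, and the sets of permitted uses are all invented for illustration and are not a reading of the OGL, any Ordnance Survey terms, or any other real licence. It simply treats the rights you can grant on a combined output as the overlap of what every input allows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputSource:
    """One data input to the project (hypothetical structure for illustration)."""
    name: str
    licence: str
    permitted_uses: frozenset  # uses the inbound terms let you grant to third parties


def permitted_output_uses(sources):
    """Return the uses that could be granted on a combined output.

    You cannot pass on broader rights than you received, so the result is
    the intersection of what every inbound licence permits.
    """
    if not sources:
        return frozenset()
    uses = set(sources[0].permitted_uses)
    for source in sources[1:]:
        uses &= source.permitted_uses
    return frozenset(uses)


def blocking_sources(sources, wanted_use):
    """List the inputs whose terms do not cover a use you want to grant."""
    return [s.name for s in sources if wanted_use not in s.permitted_uses]


# Hypothetical inventory: these permitted uses are assumptions made up for
# the example, not the actual terms of any named licence.
inputs = [
    InputSource("open public dataset", "open licence",
                frozenset({"research", "publish", "commercial", "sublicense"})),
    InputSource("mapping layer", "bespoke licence",
                frozenset({"research", "publish"})),
    InputSource("newly derived analysis", "own IP",
                frozenset({"research", "publish", "commercial", "sublicense"})),
]

print(sorted(permitted_output_uses(inputs)))   # ['publish', 'research']
print(blocking_sources(inputs, "commercial"))  # ['mapping layer']
```

In this made-up inventory the mapping layer is the input that stops a commercial licence being granted on the layered output, which is the kind of thing you would then take back to the licensor. A real exercise would of course turn on the actual wording of each licence rather than a set of labels.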
And so this takes me on to how you manage the risk. So the really important thing here is carrying out your diligence and doing your planning in terms of the project. Thinking about what data sources you have, what data you're creating, how you structure the project and the collaboration, and why you're doing it. So are you doing it just for research purposes? Are you doing it to publish something? Are you doing it to commercialize and exploit it? In which case, what's your model for that? And then thinking about how you mitigate those risks. Part of that is looking at the data. So what types of data have you got in there? Is it personal data, or special category personal data, in which case you need to think about data protection law? Is the data anonymized, or can it be anonymized, in which case you can perhaps de-risk some of the data protection compliance? If it's non-personal data, then is it confidential, in which case you need to deal with that, or is it publicly available? So understanding your data source is really important. Likewise, the provenance of the data. So where's it actually coming from? Is it a public data set, which may be fairly free for you to use? Is it licensed on strict licensing conditions? Is it coming from within your organization? Is it research, in which case is it academic or is it commercial? Have you got multiple parties involved in the project? In which case, who owns what data that's been created, or is it jointly owned in some way? And what about automated data creation? So you may have analytics, you may have sensor data or whatever. And then thinking about how that data will be used. So why are you doing this? What rights might be acquired? Think about those. Think about, if you're going to be data sharing, what are you sharing, with whom, and for what purpose? And it's particularly important with personal data to be clear.
If you are sharing personal data with another organization, are they a controller or are they a processor? Are they a joint controller? What are you allowing them to do with that data? Because the act of sharing the data is processing in itself. So you need to be confident that what you're doing is lawful in terms of sharing with that organization. You want to think about access terms. So who has access to that? Is it controlled or is it uncontrolled? Might you lock down certain bits of data compared to others? Will it be on a commercial basis or free to use? And, say, commercialization versus academic freedom. And thinking about, are you improving the data you're getting, or are you packaging it in some way, or doing something new and interesting with it? And there are lots of ways you can try and mitigate your risks here. So the important thing, first of all, is to analyze all the sources and understand those IP rights. So what IP is in the data that you're getting access to? Understanding the license terms is really important, because they set out what you can do with that. And if the license terms you're offered don't allow you to do what you want to do, then think about going to the licensor and saying, actually, can we change those terms, because these terms don't work here. I've seen on a number of occasions organizations provide a license agreement that's just not the right one for the project. So quite often they will have different templates for different purposes, and they will send one out to a particular person because they think, based on the type of organization, that sort of license is required, but actually it's not the right one. It doesn't allow the parties to do what's actually intended. So it's important to actually read the terms and make sure you've got the right ones. From a data protection perspective, look at the type of data. If you can anonymize it and work with anonymized data, that's much better.
If not, you need to think about your legal basis and conduct your data protection impact assessment. If you are getting data provided by third parties, look at those terms of use and the license scope. Look at what responsibilities the provider has to you, and you might want to put some warranties or some contractual promises in that document in terms of what you can do and what you can rely upon in the data that you get, so you know that you are comfortable using it for your preferred purpose. When it comes to exploitation and use, again, it's thinking about what rights have been created and then thinking about how you assert those rights. So if there's newly created IP and it might potentially be registered in some way, then who's actually going to seek registered protection? Which party does that? Is it you or someone else? And what would you do about potential infringement? So how do you stop someone else from using it? Who's responsible for that? Who will grant licenses to use or access the data? So who will do that, and what terms are they on? And, as I say, there's a really important point about ensuring that your outbound terms are back to back with the inbound terms that you've got, so you're not granting more rights than you actually have and potentially putting yourself in breach of that original license agreement. And in terms of project structuring, if you're collaborating with third parties, this is a key part of all of this. So if you are dealing with institutions, so higher or further education, then their approach to risk will be very different to industry, which will be different again in turn to the public sector. You may have all three or four in a particular project. So how do you allocate risk? How do you know who's doing what? How do you identify ownership of IP? And what are you doing around what each party can do with the output from that?
So different parties might need the data for different purposes and do different things with it. So the institution might want to publish the academic research. They may want to use it for further research. Industry might want to commercialize the output. The public sector may have something else they want to do with that. So it's important at the outset to be clear on all of this stuff. And with personal data, you potentially have issues around joint controllership, where more than one party is responsible for compliance, in which case you need to actually map out who's responsible for ensuring compliance, who deals with data subject requests, and who's responsible for keeping the data secure. And with all of this, the best thing to do is to set this out in some form of collaboration or project agreement that ensures all the parties are clear on what they can and can't do. And then on data security, again, who's responsible for managing security and controlling access? If this is personal data, then the important thing to bear in mind here is that when we determine what is an adequate level of security for data, it's not based on the likelihood of that data being compromised. It's to do with the risk to the individual if it is compromised. So the expectation in terms of the security measures that you have for financial or health information will be much greater than for, say, a marketing database that just has an email address in it. In the latter example, if that database is compromised, the risk to individuals is pretty low. In the first example, if that database is compromised, then the risk to individuals may be quite high, so the expectation in terms of the security measures you have will be that much higher for the first example.
And so if you're using a third party to host it, you want to ensure that you've done your diligence on the hosting environment, the structure of the database, the access controls, etc. And you want to ensure that you test and review those on a regular basis. If it's being hosted offshore, then, as we talked about, there are some issues around international transfers, particularly if they're transfers to the US, which are quite difficult at the moment given recent case law. If you're hosting data there, that's something that's going to require an additional assessment on compliance. And then finally, it's just around ensuring that you document and agree what those information security requirements are, so you have a common understanding of what's being done with that. And this is all good practice regardless of whether personal data is involved or not. If there is potentially sensitive IP in there, or confidential information, you want to ensure that it's kept secure as well. So that's just about it from me. I just wanted to finish off with some final recommendations before we go to any questions that we have. And so, my key recommendations. Well, carrying out due diligence at the outset of a project is absolutely key to identifying all these risks and working out how you manage them. If you are potentially using personal data, then carry out a data protection impact assessment, and review and update that assessment as the project develops, because what you set out at the outset of the project and what you're doing will likely evolve as time goes on. And what you don't want is to have your impact assessment based on information that becomes out of date, so make sure you keep that up to date and review it. Be aware of the rights that exist in publicly available information.
So, yes, it's a great source of information, but these sources are quite often subject to specific rules on how they can be used. Some of those are quite permissive, but some of them may also restrict what you want to do with that information. And ensure that you label the provenance information, so internally you understand where the data comes from and what rules and restrictions apply to its use, so you don't end up inadvertently breaching those licenses by doing something that you shouldn't be doing, or making it available to someone that shouldn't have access to it. And think at the outset about what you want to do with the output of this data or the information, because if you can think at the outset about that, it's much easier to plan to ensure you're actually able to achieve it. And that involves a bit of crystal ball gazing in terms of thinking about what you want to do, but the more you can do that, the better. And then build appropriate steps and controls into each stage of this process. So auditing your sources, looking at the licenses, carrying out security and access controls, and reviewing it on a regular basis is really important. And if you're collaborating with other parties, have clear rules in place and just keep on reviewing and auditing your compliance. And that was all I wanted to say. Thank you very much. It was such a rich talk, and it was very interesting to see how all these issues become relevant when we are constantly operating in a digital environment. It was very, very interesting. Thanks very much. And we already have a number of interesting questions from the audience as well, and I would like to now open the Q&A part. So, one of the questions relates to ownership of the intellectual property, and that's something you also mentioned while you were exploring how to mitigate the risk.
So, if hybrid data from multiple sources is drawn together and a relationship established, who actually owns that IP? And perhaps the relationship might permit substantive financial gain. So would you like to elaborate on that question a bit further? I'm going to give the classic lawyer's answer of "it depends", and I'll then elaborate on that in a bit more detail. So, as I tried to show in that diagram, if you have a database which comprises data from different sources, then each of those parties that provides that data will continue to own that part of the data. So you will use it on the basis of a license, which is a contractual permission to use that data. And that's why it's really important to understand what your sources are and what applies to them. And then also trying to work out what is new and being created, because it may not be the case that what you create using that data is actually IP that you own. There are two reasons for that. One is that it may just be a sort of derivative of the inbound data. So there's nothing really new, it's just an evolution of that, in which case it belongs to the original provider. Or it may actually be that the license conditions that apply to your use of it say, well, anything you do with that, we will own, and therefore you don't own it. So it's a very difficult one to answer in the abstract, but it is the absolute key question here. And that's why trying to map this out and having a clear understanding of what your sources are, who owns that data, and what's then being added to it or done with it, and then working out what's new and what you want to do with it, is really important. And the next question is relating to citizen science. What are your views on rights in data generated by citizen science? Um, can you clarify what you mean by citizen science in that sense? Yeah, I wonder whether you would like to follow up and elaborate on the question. Sorry, yes. I know this is wrong here.
I gave a very general question. So citizen science: in a lot of the digital environment now, we co-opt citizens into collecting data for scientific results. Ecology, biodiversity data, air quality data or water quality data. A good example is where they put particulate monitors on the backs of children as they walk to school, and then we collect that data. I've always been very curious about exactly what the various rights and obligations are on the data gathered, because it's not being gathered by the scientists but by the citizens. Are there digital rights of those individual citizens that need to be taken into consideration? So that's a really good question. And I suppose the first point here is that, from an IP perspective, there's unlikely to be much IP in what each individual person creates. So it may not actually be that the individual sensor readings from that individual create anything, and it's not clear whether they are the ones actually creating it. If the sensor is deployed by someone else, they own that sensor. So I don't think each individual person would really own very much in that. I've looked at this, bizarrely, in the agricultural context, where sensors have been used on farms and things like this to track cattle, and that's come up before. And the way they usually deal with it is that whatever agreement you have with the individuals taking part says, to be clear, any IP that does exist in this, we will own, and you don't actually own that. So I think there's unlikely to be much IP in it.
What you do need to be aware of, though, is data protection, because if you are tracking an individual in terms of their movements, then you will be building up information that relates to an individual. And I go back to a point there around to what extent you can de-identify that data, such that you can't identify that it was Martin who was carrying that sensor and that this represents Martin's walk to work, or whatever he was doing. So I think, again, it kind of depends on what information is actually being collected. But what you probably want to do in the model is try to anonymize that data in some way that reduces the ability to associate it with a particular individual. And then that means that you are less beholden to data protection laws in terms of what you can actually do with that data. Okay, thank you. Another question is that it's becoming more common for academic journals to request data to be made publicly accessible via an open access database. Given that many of these open access databases do not have contractual warranties in terms of infringement protections, what rights does the researcher have if their data is downloaded from an open access database and used for a purpose that's different to that for which it was intended? Yes, again, another good question. I think there are two different things here. One is the right to stop that if you don't want it to happen, and I suppose you would need to know who was actually downloading it to do that. I don't know whether the database or repository would enable you to track that. Clearly there's a tension in terms of whether you can do that, and we talked about the reason why that might be the case, but you would want to try and understand who was getting access to it, or perhaps try and impose some license terms on what they can do with it.
The second issue is around the liability of the researcher or the researcher's employer. If you've signed up to a license agreement and you have inbound data from a third party, then the question is whether you are permitted to do that, and whether actually doing so would potentially put you in breach of those license terms. So, I think again, if you're asked to do that, you need to be clear whether or not it's something you're actually able to do as a matter of law. Thanks very much. The next question is about licensing. Is there a point whereby the created data is so far removed from the original data that the license no longer holds? Yes, that can be the case. You can do enough that it ceases to be relevant to the original data. It very much depends on the context, but yes, the short answer is yes. Thank you so much. And I was also wondering, Martin, as you have been doing a lot of practical work, how much awareness is there out there in terms of all these issues relevant to the digital environment or digitalization? And the answer to that is that certain sectors are much better, much more aware of it than others. And I've done quite a lot of work in relation to public sector clients who are, say for example, compiling a statutory database and things like that, where they take data sources and they have to make information available. And that's a really interesting one, because you've got your data coming in: mapping data, meteorology data, and other sorts of data like that. And it's a real minefield to actually understand all the different aspects to it, because you've got all that input data, and you've got what's been done with it to actually create the database. And you've got academic researchers getting access to it to then do other things with it.
You've got information made available to the public through a portal, and you've got subcontractors doing other things with it. And it's an incredibly complex area to try and get your head around, particularly when you're dealing with things like the OGL or the NCGL and your various overlays of statutory duties, in addition to the commercial or academic pressures that you're dealing with within the university. So it is something where I think there's been a lot more awareness since the 2015 regulations came along, with a lot of people asking for access to data from public bodies. And so I think people are becoming more aware of it, but we still do see from time to time projects where the issues haven't been thought about by the public bodies, and the license doesn't actually allow them to do what they want to do. And they're in a bit of a stuck situation. Thanks very much. I think this brings us to the end of our webinar. Thanks very much to Martin for this fascinating and interesting talk, and thank you very much everyone for joining us today. I hope to see everyone again for our next webinar on the 29th of April, when we will welcome Natalia Domenia-Lahouz, the head of data ethics at the Cabinet Office, Government Digital Service, and we will explore issues around data ethics, the framework. But thanks very much everyone, and hope to see you online then.