Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We would like to thank you for joining the latest in the monthly webinar series, Data Architecture Strategies with Donna Burbank. Today Donna will be joined by a special guest, Becky Russell, to present Data Modeling at the Environment Agency of England, a case study.

Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. We very much encourage you to chat with us and with each other throughout the webinar. To do so, just click the chat icon in the bottom middle of your screen to activate that feature. For questions, we will be collecting them via the Q&A section, and if you like to tweet, we encourage you to share highlights or questions via Twitter using hashtag #DAStrategies. And as always, we will send a follow-up email within two business days containing links to the recording of the session and additional information requested throughout the webinar.

Now let me introduce to you the speakers for today, Becky Russell and Donna Burbank. Becky is the National Lead for Data Standards at the Environment Agency, a role she has held since 2013. Previously she held several other jobs in the Environment Agency, including leading a data team, and both managing a team and acting as a technical specialist to regulate industrial activities and implement European legislation. Becky is a qualified chemist and initially joined Nestlé through their graduate program before working for Cadbury's and then the Environment Agency. Very interesting, very nice.

And now let me introduce the speaker of the series, Donna Burbank. She is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information.
She is currently the managing director of Global Data Strategy Limited, where she assists organizations around the globe in driving value from their data. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa, and speaks regularly at industry conferences. In fact, she was just with us at Enterprise Data World last week. And with that, let me give the floor to Donna to get today's webinar started. Hello and welcome.

Thank you, Shannon. Always a pleasure to do these, and thanks to some of the familiar, I would say faces, but names, although some of the folks, as you mentioned, we were able to meet last week at Enterprise Data World in Boston. And for those of you for whom this is your first time on the webinar series, welcome. The good news with any DATAVERSITY webinar, and always the top question, is "Is this recorded?" And yes, any of the past webinars that you see there on data architecture or data strategy are on demand on both the DATAVERSITY website and the Global Data Strategy website, and for any of the upcoming topics that may be of interest to you, we'd love to have you again.
This one particularly I'm excited about. Becky Russell and I worked together on this data model with several others on our team, and data modeling is always one of the hottest topics at DATAVERSITY, so to have a real-life use case I think is sort of fun. And as you look through it, it's a really interesting application. Who doesn't love animals and water and nature? So it's a little different than your standard banking or industry use case. I thought for many reasons it's a really great story of how we actually got some scientists and environmental chemists and people like that actually using data models. So for those of you who say business people don't and can't use data models, here's a great example where these were really business stakeholders using data models to do their job.

So if we could just go to the next slide. You've seen this slide before; I've shown this at a lot of our presentations. What I thought might be helpful is this: I talk a lot theoretically in these different sessions about things like strategy and governance and architecture and metadata and modeling. Well, this webinar today will actually be a real-world example, especially around data architecture, but you'll see how it tied into larger metadata management and things like standard lists, and you'll see some of the actual metadata standards that were used for the Environment Agency to really help them do their job, which helps with governance, right? Becky will talk a lot about how people sort of collaborated together to really get some of these standards, really buying into that business strategy. I use "business" here, but this is obviously a government organization, and their business is the environment. So this was a really great use case for some of these building blocks we've been talking about over the past months, and years for some of you.

So if we go to the next slide, many of you who are data modelers on the
call will sort of recognize this pyramid. Everyone has their own version of it. We'll be talking today mostly about the ones in the blue. So we started with a conceptual layer, and we also went to the logical layer, because if you've heard me speak, or others who speak about data modeling, it really is those levels that are about communication and defining core concepts. And yes, we'll also talk a little bit about some of the physical implementations, and we did do some reverse engineering as well, but the main focus here was on communication.

In fact, when Becky first came to us, years ago now, I loved it, because Becky, as Shannon mentioned at the beginning, is a qualified chemist. She's a scientist. She's technical, but she's not technical, as we say, in sort of data modeling. And she said, you know, "I've heard about things called data models, and I don't know how to build one, but I know I need one." And she was sort of the best learner, very quickly, because the best thing about data models is they're just very logical and very intuitive. So Becky knew the business, we knew some of the artifacts, and I thought that was a nice combination. And this happens often, and it happened with some of Becky's stakeholders, and she'll talk more about that: it took a few minutes, literally five or ten minutes, to explain a model, and then people dove right into it, because you were talking about their day job and we were able to help with that.

So if you go to the next slide. You've heard me say this before if you've joined any of the previous webinars: the beauty of these high-level data models is that they do tell a story. So this is kind of a facetious cartoon from one of my books, and yes, there are data modeling cartoons, and my co-author Steve Hoberman claims that he actually reads data models to his children, the poor things. But in a normal business environment, when a data model is successful, part of the reason it is so successful and so easy to understand
is that you're using, quote, the language of your audience, right? We're not getting into tables and columns, and although we did in this case, that's not what we showed to the business user. We showed their terminology, their story. And those of you on the call who are data modelers or have done data models will realize that that language can be different. So what one person might call, actually, it was funny, I was laughing about the song before we joined, potato, potahto: you call it one thing, someone else calls it another. That's classic with data modeling, and we ran into that at the Environment Agency. I don't want to steal Becky's thunder, but that was a lot of the aha moments: different teams were really doing similar things and calling them something different, or calling them the same thing with a slightly different meaning. So a lot of the effort was around getting that common language.

So if you go to the next slide, I want to pass it over to Becky to really talk about her work at the Environment Agency and how we used that language to really get the stakeholders together. So I'll pass it over to you, Becky.

Brilliant, that's lovely, thank you very much, Donna. So, hi everybody, I'm Becky Russell and I'm what's called the National Lead for Data Standards at the Environment Agency, and I'll talk a little bit more in a moment about the Environment Agency and our work. But as Donna said, I am technical, I suppose, but from a science point of view rather than a data modeling point of view, and I think that will become evident throughout the presentation. I've got technical skills, but with the data modeling, you'll see how we sort of had the moment where we suddenly realized that we needed a data model. The other thing around my role at the Environment Agency is that I'm very much focused, or I started off with my focus, on standardizing data, which literally meant the terms that we
use to describe things, or the way in which we record location. So it was very much down in the detail of the data, not at the data model level, but again it became evident throughout our work and throughout our story that that was where I needed to move to.

So just to give you a little bit of background to the Environment Agency, because it puts the whole story in context. We are an agency of about 10,000 people regulating the environment for England only, so Scotland, Northern Ireland and Wales have their own organizations. And we have a huge breadth of activities that we do. We regulate industries, from very small factories, maybe doing some sort of surface metal treatment works, through to intensive agriculture, through to the biggest power stations that there are in England. We have a particular focus on the waste industry, where we do quite a lot of prosecution and enforcement work, actually, because people like to throw their rubbish in places that they're not supposed to.

And we do more positive work, I suppose, in the sense that we protect habitats for wildlife and we manage wildlife. We reintroduce fish stocks to rivers, and we make sure that the habitats are maintained. We also get rid of invasive species and things like that to make sure that our habitats are stable. We also do a huge amount of monitoring of air, land and water quality, and we're responsible, for example, for publishing that data, particularly in the summer months, for bathing waters around the coastline of England, where people want to check if the water is clean enough for them to go swimming in. I think the problem often enough is whether it's warm enough, but people seem to want to know whether it's clean enough. We also regulate recreational activities around water, so fishing, rod licensing and boat use. And finally, probably our biggest piece of work, and the one that we're most well known for in England, is our flood
defence work. So we have a huge portfolio of predicting and modelling the impact of floods, building flood assets to protect our homes and our utilities, but then also responding in the event of a flood. So we have a huge breadth of what we do. I work at the national level, so I work across all of those potential activities, or can be asked to support a particular activity at a particular time.

So we have got a huge number of activities that we do. What does that mean for our data? This isn't a picture of any particular data set or database; it's just a representation of what our data looks like. It's also much too simple. We have about 850 applications holding our data, so we're much more complex than it looks here. What's interesting to note, though, I think, is that we arrange our data in the same way that lots of other big organisations do, which is around the activity that the data is held for, or within a particular business domain. We're an organisation of about 20 years old, and I think 20 years ago it was fine to hold data in silos and manage it in that way. Increasingly, we now really need to join the data up across those systems. We need to get a big picture of what's happening in the environment. And one of the big barriers to doing that is both the definitions and then the standards that actually apply to the data underneath that as well.

You can see here, it's not particularly spectacular, but the word "catchment" sort of comes out of the screen at you. Catchment is a really important concept for the Environment Agency. To try and describe it, it's like an area of land from which water flows into a particular water body. So if you have pollution in that area, it can end up in that water. It's a really important concept and it's used across many of our business domains for monitoring, improving the environment,
measuring things, reporting. And we knew we actually had a problem with this data because, and you wouldn't believe it, there have been some very, very heated debates about whether sections of river should be called Upper Avon or Avon Upper, and there have been plenty of debates of that nature. So we knew that this was a problem, and actually one of the very early pieces of work that we worked with Global Data Strategy on was a catchment standard. So again, talking about standards rather than models, very much bottom-up.

But what was really interesting was that when Global Data Strategy did the business analysis to try and understand what was happening and why the lists were in a different state, we actually discovered that we had a fundamental misunderstanding of what we meant by catchment. There was a whole group of people who had the idea that it was some sort of hydrological area. There was another group of people who thought it was something to do with environmental monitoring, other people who thought it was to do with water abstraction, other people who thought it was just there for reporting purposes, and finally someone who thought it had something to do with schools, which was quite an interesting concept. So I think we realized at this point that solving the problem of whether you call it Avon Upper or Upper Avon is not really relevant yet. You actually have to come to a single definition of the word catchment. And I think this was the point at which we started to twig that definitions were really important.

However, I would say in the Environment Agency that in addition to the definitions of those entities, the data that we capture about them also has to be consistent. So the picture you can see on screen there is the head office of the Environment Agency, based in Bristol, down in the south-west of England, and it's known as Horizon House. That's its name. Oh yeah, and five other names as
well. So it's known by six names. I think the building is only about seven years old, so I'm not quite sure how many names it'll have by the time it's 20. But clearly that is actually a problem. Now, as a human, we can interpret that those names are the same building, but a machine can't, and we have to be able to report on things that are happening around the building, or people have to find the building.

And talking of finding the building, we even have to control things around location data. So that's the building actually plotted using three different coordinate reference systems, and the building moves. Now, the data points are fairly close; the difference between them isn't huge, but it could be the difference between receiving a flood warning or not, for example, which is based on location. So those are the kinds of problems that we face.

It's very easy, I think, to look at these problems as academic and theoretical problems. It's nice to look at pictures of data and say we need to join this up, or to look at different definitions and say, yeah, they should be the same. But that remains sort of theoretically a good idea. To actually make these definitions the same takes a lot of business effort and a lot of business energy, so we need to understand why we need to do it. Why does it matter that we need to standardize our data?

I won't talk in too much detail about this example, but just very briefly, this is us again with our flooding responsibility. In the event of a flood, so the rain is falling and the rivers are rising, we are sharing huge amounts of information with a lot of people very, very quickly, and it's a very rapidly changing situation. We'll be taking measurements of river levels and river flows, and taking pictures of where things have flooded, and using that information to warn the public that a flood is coming their
way and they need to do something about it. We are talking to our own teams to tell them where to put temporary flood assets, where to put the pumps in to get rid of the water. We need to understand where our vans are and where our equipment is. We need to understand where our people are and what skill sets they've got. We're also talking to government to give them constant situation reports on what's happening. And perhaps most critically, we're talking to other emergency responders as well, so someone like the fire brigade. You can imagine that if we give a location or an address for a property that needs to be evacuated, and that is misunderstood by the fire brigade because they have a different standard way of representing it, we could end up evacuating the wrong properties, which could have some severe impacts.

Even after the event, the data is actually really critical. We use it to plan and to decide where to build the next big flood defences. The picture in the bottom left there is actually one of our big floodgates outside the city of York. We also share the data with insurance companies so they can make better judgments about insuring properties, and we work with local authorities to improve the overall flood response. So there is a real business need for us to do standards, and I think it's important to make sure that we don't get lost in the academic side of it.

So just moving on. I'm telling you, I suppose, the story of the particular example that we have done in the Environment Agency. We've gone on a journey to try to talk in a common language about chemicals. I'll try and explain the journey we took, both a little bit technically but also how we gained the business support and brought them with us on this journey, which was essential. The business actually came to us with a problem. We were building, or the Environment Agency was building, a new application
to gather all of the monitoring data from our regulated operators. We've got lots and lots of people operating little factories or landfill sites or waste sites, and as part of their authorization to carry out their activity, they have to send us regular information about the chemicals or the other parameters that they might be emitting to the environment. We have got thousands and thousands of these operators, and at the same time we are also doing an awful lot of monitoring in the environment ourselves. So what are the chances that all of those thousands of operators and all of our own staff are all using the same chemical names for the same chemicals? Yeah, not a chance. Not a chance.

So this project approached us and said, we need to be able to standardize these chemical names to a single list, so that we know what we're talking about, we can share information, and we can understand the bigger picture. So I thought, well, okay, this will be fine. I am the national standards lead, and actually I'm a chemist, so this shouldn't be too bad. I thought chemicals might be quite easy, might be quite nice. How wrong could I be?

So first of all, the chemical list itself is inherently complex. There is no single international master name, or master list of chemicals, or master identifier, so there are aliases for chemicals within the global community. There's also a lot of hierarchy and groupings, which adds to the complexity of that chemicals list. But even more complex than that, when we started diving into our own databases to get the chemical names out so we could understand what we'd got, we also found information that wasn't a chemical: things like conductivity, the size of something, the shape. "People on a beach," I think, has to be one of my favorite things that I discovered we measure. We also discovered in there information about the methods used to sample
or to measure for those chemicals, and we also found the units of measure themselves, all squished together in a field called "chemicals." But not to be put off, I thought, okay, well, we can still create a nice controlled list of chemical names. So we took out all the erroneous information, we took out all the things that weren't chemicals, we took out all the monitoring methods, and we took out all the units, and we de-duplicated the list. And to be honest, we felt quite proud. We had a nice clean chemicals list, and we gave it to the business to ask them to check. And the feedback was terrible.

It took us a little bit of time to understand what had happened, and what had happened was this: we discovered there was no common understanding of the words or the concepts or the entities around chemicals or measurement at all. There was a big disagreement about chemicals versus parameters, results versus set of values, measurements versus monitoring. And then what about the non-chemicals? Actually, they do need to be represented. And I think what we finally realized was that all of these applications had different data models. And to add complication, because they were quite old, in some instances where the business requirement had changed slightly, redundant fields were being used to capture data that really didn't relate to the name of that field at all. So there was quite a lot of misunderstanding, an absolute recipe for miscommunication, and this is why the feedback on that list was so bad: because everybody understood a different thing by "chemicals."

So I think this is the moment that Donna talked about earlier, where the penny finally dropped and I went, oh, I know this isn't right, I know we've got to do something, and I think I need a data model. I knew that we needed to understand the concepts around chemical measurements, and we needed to identify and define each entity really clearly so that we could say what was a chemical and what was the
unit of measure and what was the measuring method, and we needed to understand their relationships and their logic. But we did also face a problem. I realized that's what we needed, but all of the intelligence that we needed to build that data model, the people who understood what these things did mean, or perhaps should mean, were all out in the business, all really busy people with their own objectives and their own targets, trying to do their own work. So how were we going to bring them on board and convince them to give us some time to work with us on this data model?

So before we leapt in, we took a moment just to decide on some principles of engagement, as I've called it. The first major one that we decided was that we were going to take a consensus approach. We were going to try and create this data model using the consensus of our business partners, our stakeholders. I have been criticized for that. I have had people ask me why I don't just take an external model and basically mandate it, force it onto the Environment Agency. But actually the business culture is not right for that. It wouldn't be at all successful.

The second principle we have is that we're going to actually solve a real business problem. There is no point in choosing something which is theoretical. It actually has to be a real business problem which is hurting the business now, that they really want to solve. And related to that is the fact that we want it to be business-led. We want them to be asking for this, and to be needing it, and having the problem, not us just tapping them on the shoulder saying that we think they should have this. They really need to want it.

The other principle we had was to use our existing networks. We're quite fortunate: we have a very well established data custodian network within the Environment Agency, so it made sense to add data standards and data modeling to their remit and their
responsibilities, because they would be familiar with being responsible for data, and it also gave us a group of people to start talking to.

We also wanted to be very transparent about how we were working, so we had to think quite carefully about the technologies or the platforms on which we might share information with the business, because we wanted everything to be in the open. We didn't want to go away and do work behind closed doors.

And finally, and this is a crossover from one of my technical principles around both data standards and data models, but it was useful here as part of our engagement principles as well: this applies to new IT only. Again, that was really important with the business, because it enabled them to leave behind their concerns about their current systems and how this might fit and how it might work, and it stopped them being protectionist about their current data. It allowed them to free up their minds a little bit and think about, for the future, if we were to do this properly, what would that look like? And I actually think that was a really important distinction to make.

So having got our principles, we didn't quite know how we were going to make all of these things work, but we had these principles about how we wanted to work, and we needed just to take some first steps. The first thing was really that if I wanted to have people's time, I actually needed to get the support of all their managers and directors. So I spent quite a lot of time going out to management meetings and directors' meetings to explain what we were trying to do, the importance of what we were trying to do, and the problems that we might be able to solve, so that they would basically allow me to call on the time of some of their staff. Once we'd done that, we made sure that we communicated widely across our business, using existing networks and people that we knew, but also using things like
internal social media platforms to try and get to all the people that we needed to. And finally, once we'd got this big group of people, we whittled it down a little more into a much smaller group of specialists with whom we were going to work to really take this data model forward.

And it was at this point that we actually realised we needed somebody to come and do data modeling. I'm not a data modeler, so at this point we needed the support to come and do the work with us. And this was the point at which we were fortunate enough to be put in contact with Donna and Global Data Strategy, who not only are data modelers but, critically, understood the importance of the business culture element of this for the Environment Agency, and were very supportive in working with us to make sure that the business culture part of this package and this work was successful. And I think, Donna, I'm handing back to you now just to talk in a little bit more detail about the work you did with us.

So I think Becky talked about a lot of the concepts that we data modelers often face, and one of the questions I often get is: where do you start? Do you start from the top down with the business language? Do you start from the bottom up? Where do you begin? And the answer is yes, right? I think the examples we'll give, and that Becky already did give, really were a good example of doing both. So there's that top-down, where we spent a lot of time, and Becky did pre-work even before we came on site, getting buy-in from the stakeholders, understanding all of the different terms and ways of looking at the world that existed in the organization, and making sure we got the right breadth of people, because getting those core terms is very important. But we also looked from the bottom up. As Becky mentioned, there were a lot of different existing systems, and that's where some of the different terms did come from, because each tool
used its own technology or its own data model, as you're aware. So what we did was really that iterative approach, and I felt that both of them were equally important. We'll go through this in some examples. Sometimes, when there was a term and we weren't quite sure why that term was used or why there might be a difference, we went to the physical implementation, and sometimes that helped give us some clarity. Sometimes, conversely, when we were down in the weeds and looking at the attribution of these, it was helpful to zoom out and look more at the big picture: what are we talking about anyway? These look like they're different fields, but are they? And Becky was really great with that, saying let's step back: is this the same as a measurement or a sample? And I know that sounds so strange when you're outside of it, because these are just simple terms we use every day in the regular world, but when you're thinking of scientific measurements, it's a big deal.

I saw one of the comments in the chat, and I think Becky addressed it as well: can't you just take an industry model? Well, an industry model doesn't exactly match the way of working of any organization. So of course they're always good to start with, I would say, regardless of the industry or the organization, but every organization is unique. And even with standards, which one, right? There are many. So I think standards are great, and we did reference some. I know when we were building the model, we looked at the US, and we looked at some of the EU and other agencies that were doing this work, as a reference, so we did not do it in a vacuum. But culture, as Becky mentioned, is so important, and you need to pick the right battles and use the terms that people are using. Becky mentioned that her mantra was, don't take a standard and force it upon people. So there's that cultural aspect, and I think everywhere that's the case,
or very often that's the case, that people have already been working a certain way. So pick your battles. If 80% is right, keep that 80% and tweak what you need to. So it was an iterative approach.

And if we go to the next slide, ta-da, this is a subset of the actual model. It may look different than a lot of other models you've seen, partly with a lot of the subtyping and supertypes. I often get the question, and we get academic in the modeling world: "Oh, you would never show subtypes or supertypes in a conceptual data model." And this is a conceptual data model, and I use them all the time, because I think this is one of the easiest concepts to understand. We did a little data modeling workshop, data modeling 101, with each of our workshops, and it lasted all of 10 minutes or five minutes. You know, a person could be an employee or a customer; they can be both, or perhaps they cannot be both. Supertypes are very easy to understand, and doing this detailed subtyping really helped us understand: we have a measurement, we have air measurements and soil measurements and water bodies and organisms, and if we step back, at its core, are we just measuring things? And we went back into the attribute level to really understand, which we'll go into in a minute. But I think just going into the detail to see what's different and then what's the same helped us with some of these very high-level terms, and with arguing about things like: is a monitoring event different than the measurement we're monitoring, and the sample from that event? Those three entities, I think, took quite a bit of time. And what about the parameters that we're monitoring in that event? And these are very important when you're sending these terms back.

So if we go to the next slide, we'll show you an example of some of that detail. Actually, we won't. We'll talk about some of the
workshops. Becky talked a lot about getting the buy-in, and I think one of the biggest successes of this was these workshops. Correct me if I'm wrong, Becky, but I think we were all a little nervous going into the first one. Becky mentioned that she didn't do data modeling every day and had a day job but saw the value, but she had already had that light bulb go off before the meeting. So we were challenged not only with moving people's cheese, as they say (they're used to doing things one way, and now we're going to be asking them in some cases to change), but also with introducing them to a completely different concept that we were afraid might seem too nerdy. And I think people grasped onto it right away. This is not an actual picture of the scientists, no, just kidding. We had a draft model. We often get questions: where do you start? Do you just whiteboard? Do you have something to start with? I think it was helpful that we had a draft. We had done our research, a bit of top-down and a bit of bottom-up, and then we had enough to start, but we were flexible. We did not come in and say "thou shalt, this is how it is." We came in and we asked, and I think that attitude went a long way. We did have a lot of arguments and personalities, and I think it probably took a few workshops for people to realize that we were on their side, and to see that we did make some changes and listened. And it wasn't just the workshops. We had a lot of pre-meetings before the workshop and a lot of post-meetings, because, as you know, people react differently in groups than they do individually. So part of it was the pre-work Becky did with managers and the entire team, and the pre-work our technical team did with people
individually to really understand and listen. We did a playback and we iterated. So the iteration wasn't just between top-down and bottom-up; it was also with the business stakeholders, listening and repeating, and I think that was a big part of it. If we go to the next slide, this was the more bottom-up approach. When we talked about those supertypes and subtypes, this, I know, was very helpful to me, not being a chemist. I liked chemistry in school, but we can only do so many things in life, so that's one of the reasons the other side of my nerdery came out; I found this fascinating. What we did was really look at a lot of the ways people were sampling and taking these measurements. That might have been systems, it might have been existing standards, and we really went down to the attribute or column level in the database, and that's often how we found out whether these things were different. So is an air sample the same as a water sample? Well, there are certain things in common: we have a parameter, a time, a place, the asset that monitors it, and the value, so in that sense it wasn't different. But the source stream of that air type or the emissions type is very different from, say, a land body or a water body. So we were able to use those attributes to help us understand whether these things were the same. I think one example had to do with water bodies. We thought they were the same thing until we started looking at the values, and one of them was really talking about the shore and how high the shore was. That's really the land body associated with the water, not the water itself; it's not the values of the chemicals in the water. So again it was a mix of going down into the details of the systems, the actual measurements people were taking, and then going back up and really understanding how people spoke, you
know, in their daily job. So if we go to the next slide, this was going an even higher level up. In that pyramid we talked about, with conceptual and logical, this was more at the enterprise conceptual layer. Partly, we sort of nerded out extra here; this wasn't something we planned to build when we were working with Becky. But I know I need to do this, and some of our consultants felt the same way. When Becky was talking about the organization earlier on the call, in my mind, and I'm sure some of you on the call did it too, I started doing entities and attributes. So what do we do? We do flood protection, and we have local authorities we work with at certain locations. Whenever I hear about an organization I start to pick out the nouns, and the first thing I have to do when I work with any company is really understand what it is. So that little introduction that Becky gave you, she gave us, and we did this very high-level model. There were several reasons. One is that it's a great education for us, if nothing else, or for someone new coming in: what does the EA do? It also helped with prioritization of which piece we are working on, because, as you know, when you go top-down and bottom-up the easiest thing is to keep iterating, and then you start to get out of scope: "Oh, what about this? Why don't we?" We had to keep ourselves in check through this model. We're looking at the natural environment at the bottom and we're trying to do samples, but there are a lot of other pieces. So that helped. The other thing it helped with, when we stepped back, was prioritization. Becky will talk next about some of these data standards, and I think it also helps with the "so what?", as they say, about some of these models: they really helped build some of these control lists. So we did this mapping, and it's also a kind of future state: what have we done, what do we
need to do, in terms of, as Becky was saying, the flood sources control list, or what is the catchment? I know that with any of these terms, in any organization (what's an address, what's the catchment, what's the location, and why are they different?), things can seem so academic and abstract until you start to use some real-world examples. I think some of the real-world examples Becky mentioned show that this actually has an impact: someone may not be evacuated from their home. So I think that also puts it in perspective. I'm going to pass it back to Becky to talk more about the controlled lists that were generated as well. Brilliant, okay. Thanks, Donna. You'll be pleased to know, by the way, Donna, that of the ones you've identified on there, local authorities and address have been done, location is nearly done, and catchment is in progress. So we have actually used your prioritization to help us drive things forward, which is great. So, moving on, I'll talk very briefly about creating the control list. Obviously there was an awful lot of work done on the data model, and I will touch on it a little at the end. What's interesting about that whole process, doing the top-down and the bottom-up and working with the business, is that we started out really having to bring the business with us, and I think it's fair to say that by the end of the process they were, and still continue to be, incredibly keen and incredibly supportive of what we're doing. That's been a real success of the project. I also think it's educated people about data models and the importance of definitions, so there are now people out in the business who are also supporting that and blowing the trumpet for data models and data definitions. But just
coming back a little to creating the control lists, or the standards, because ultimately that's where we started from: we wanted this chemical standard. So we've got our data model, and as Donna said, you can use it to identify where you might need standards or control lists, and indeed the measurement one does. But chemicals was already one of those control lists identified, and we already knew we needed it, so that was quite clearly our next step. I suppose the question really was: do we go back and create a list of single chemical names? Now that we've defined chemical and we know what it is, can we just have our list with a single name and identifier on it? No, actually, it's not that simple. We've got our nice definitions, and we can now say what a chemical is and what it's not. It is not people on a beach; that is not a chemical. Nor is it a unit of measure, nor does it include sampling methods. Sorry, I've lost my train of thought there. I've gone completely blank. Sorry, Donna, what was I saying? You were talking about trying to get a list of the chemical substances, and that it's not somebody on a beach. Sorry, yes, thank you, Donna. So even though we've got the definition nice and tight, we can't just create a simple control list. The chemicals list itself is just too complex. I think I mentioned at the beginning that there are multiple aliases that are quite valid in international chemical communities. It is quite normal for different scientific communities to call the same chemical by different names, so there is no international standardization. And I think we've realized through this process that there were domains of the Environment Agency which needed to speak in a different language about the same chemical. So if you're working in farming, for example, and
regulating farmers, you're going to call a pesticide by a trade name. If you're in the lab monitoring for that pesticide in the water, you're probably going to call it by its chemical name. Trying to get the two to speak the same language doesn't work. The other thing that came out when we were talking to the business was that they needed to know which groups these chemicals belong to, a really important concept. Is this a pesticide? Is it a persistent organic pollutant? Is it an annex for pollutants or some other legislative requirement? They wanted to know which groups they belong to. So what we had to do was effectively create a slightly more complex list and then use the concept of a register, which is where you can link control lists together. Very simply, and this is right down at the data level now, we took names from, say, the regulation domain and chemical names from the water quality domain, which you can see are sometimes the same and sometimes different, and then worked out how they join up. I've made it easy for myself this time; they join up very simply. Now clearly that's a big piece of work, to do that understanding. But instead of just leaving it there with the translation table, which is useful up to a point, we looked to add external identifiers, which might be useful as well. But again, we couldn't find an external identifier that identified every single chemical that we need, so we've decided to add our own Environment Agency global ID. That's not actually the structure of it; it's just an example of the fact that there's a nonsensical identifier on there, just a globally unique identifier. So the idea is that we would build up slightly more complex controlled lists which would allow for aliases to be represented, but each alias would still have the same global identifier. So in the future, when chemicals are used, not only is the chemical name
recorded, they must also record that globally unique identifier, and that's what we can use to join our data in the future. The label can therefore be a bit more domain-specific, because we know we're talking about the same chemical. The advantage is that we can start to add lots and lots of other domain names if we need to. We could even start to identify a preferred term or a mastered term within the Environment Agency, which over time we could move to, because as long as the identifier remains stable we can start to get people to change their data. And this is the bit which starts to get interesting and starts to be more like a register: we can hold another list of chemical groups with a global identifier and link the two lists by their identifiers. So then you'd be able to say that cyanide, via its identifier, belongs to a particular chemical group, and you can start to represent the hierarchy and the complexity of the lists that people needed. And then finally, of course, you need start dates and end dates so that you can manage life cycle and changes. So that's our approach to solving it. We've got our definition, we did the data model, we got our entities defined, we've got all the nice definitions, and now this is how we're trying to solve the puzzle right down at the term level. We're still in the middle of finalizing that register. We're probably about 90 percent of the way there, and we've got to do some checking with the business, so we are nearly at the end. This whole story has changed our approach, I think, to how we're doing data standards. We've learned quite a lot. Whereas previously we thought we would just create a data standard, which is what I tried to do with the chemical list at the beginning, and just create a controlled list, we now realize it's much more complex
than that. In some instances, not every instance, you also need a data model, which gives you the objects and definitions and identifies the standards, which then allows you to create them, potentially using registers where we need to link controlled lists to get added value, and then you can deploy it into IT. So it's given us a much more enhanced process, and that's much more appropriate for what we're trying to do. The other thing that is really useful, and again this is from working with Global Data Strategy, and this methodology is Global Data Strategy's, I don't take any credit for it, is a nice repeatable process. It was originally designed for producing our data standards but can equally apply to data models or registers, and it just sets out the steps that we will take in order to develop and define standards or models or registers. It makes the process repeatable, but it has also helped us with that business buy-in: if we can show the business the process and get them to agree that it is acceptable, then when we follow it they've got a lot more confidence in the outcome. So, moving forward to some lessons learned from this journey, and I think we did learn quite a lot. We set out with those principles of engagement, and we weren't necessarily sure how we were going to implement all of them. I think we learned a lot through working with Global Data Strategy and that whole process we went through. It was about six months, I think, that we worked with Global Data Strategy on that particular project. The first lesson that we learned, and I think Donna has said this many times, is that talking the business language is absolutely critical. The minute that something starts to look technical or unfamiliar, or requires some real hard thinking for people to imagine how it applies to their own business area, to be quite honest you just lose them, and they disengage. Creating these
virtual teams is really powerful. Again, I think Donna mentioned that in relation to the workshops, but we even did it through webinars as well. What was interesting was that when we were working with our stakeholders we kept them as a group: we did webinars, we did workshops, we kept them on email groups together, and it got to the point where they were arguing things amongst each other rather than arguing with us, which was actually quite a relief sometimes. That was a really powerful thing to do. Again, Donna's mentioned this, but the webinars and workshops are absolutely essential. There is only so much you can do electronically, and although you have to be careful with people's time for workshops, and they do take investment, what you get out of them is really, really important. The other thing, and again Donna mentioned this before, more from a technical point of view, is how the top-down and bottom-up analysis was needed to deliver the model technically. I'd echo that, but say this was also really important to the business engagement. The top-down approach meant the business felt like they had been involved: they'd been spoken to, their input had been made part of the data models, and they were happy that they'd contributed. But they also wanted to see us do the bottom-up analysis, to make sure that we had gone to the right level of detail and that this was correct. So that was really important from a business point of view. We also learned that, to be honest, we just have to keep talking to people. People are very busy, and if you go silent for two or three months and then come back, they can't always remember what it was they were doing. So we needed to keep telling people what we were doing and reminding them why it was important. And finally, the other thing was that we had to act on their feedback, even if that was to reply to them and
just say, "We've received your feedback, and actually we don't quite agree with it, for this reason." What we must do is continue to act on their feedback. You ask people a question; the first time, they give you an answer. If nothing happens, or they can't even see that their response was valid, then when you go back a second time they're not really prepared to work with you again. So it's really important to do that. And then finally, and this is my last slide, you might be pleased to hear, just some successes and benefits of this project. We did go down some rabbit holes, particularly when I tried to do that chemicals list in the first place, but it was part of the journey to get us to where we are now, and along the way we've had some really big successes. I think one of the big ones is that we have started to solve problems that the business knew they had but didn't know why. I have been in meetings where there are data people in a meeting room, and I walk in and they know I'm going to talk about data standards or data models, and they've got their arms folded, really defensive body language. After about 10 or 15 minutes they suddenly realize that this problem they've had with their data, that they haven't been able to solve, is all because there's not been a data model. Suddenly the body language changes and they really engage with it. So that's been really, really important. The other thing, which I think we've mentioned a few times now, is that the business support has been absolutely excellent, and I think that has been a success of our business engagement and how we work with them. We've had incredible support from the business, and we continue to: we continue to get emails and messages now, asking where we are with things or offering to help us. When the penny dropped and I spoke to Donna about doing a data model, I think I was the first person
who'd mentioned data models in the Environment Agency. We've now got about four or five projects developing data models, and lots of people are coming to ask for them. So it's really interesting how those people that we spoke to in the business have gone out and sold the story of data models, and other people are now asking for them. I think in the future the work that we've done, the journey we've been on, will save us quite a lot of time. It's very easy for us now to recognize when a data standard just isn't enough and we actually need to do a data model. There are still some standards that can be done alone, but we're much quicker to recognize when we need a data model, and that's saved us an awful lot of wasted time. And finally, this is probably a benefit for the future. Because we're a regulator on behalf of UK government, we are required to make all of our data open unless we've got a good reason not to, which means all of the controlled lists and registers that we build will be made available publicly. Where we start to standardize, define, and create controlled lists or registers for concepts like catchment, which are used outside of the Environment Agency as well, we're actually starting to put together the building blocks for improved data across UK government. So the benefits extend well beyond the Environment Agency. That brings me to the end of my story. I hope that was useful. I hope there are some things in there that rang true with your own organizations, or some approaches you've taken yourselves, or perhaps some new approaches that we've tried. I think it's important to say that this is a long journey. This is the start of a journey for the Environment Agency, but the work we've done here has been really critical to our future success. And I think I hand back to Donna at that point. Sure, yeah, thank you so much. And just adding on one thing
that Becky said about the workshops, which I want to reiterate because I think it's so important: the one-on-one was important, but so was having people hear from each other that they have disagreements and work it out amongst themselves. You, the data modeling team, become less of the issue, and I think people have their own light bulb moments once they hear that other people did see it differently. Often you forget this as a modeler: you've talked to the six different groups, but they haven't talked together. You might tell them that somebody else has a different definition, but it's really helpful for everybody to come along the journey with you, and, as Becky mentioned, giving that continuous feedback was, I think, super important to this one. So this will be on demand for those of you who want to watch it again or share it with a friend. Next month we'll be talking more about data management with our other special guest, Nigel Turner, who was actually also on this project with Becky and me, and we'll be talking about some of the fundamentals of data governance. So at this point, Shannon, I know there were some questions that came in, if you wanted to open it up for questions. There are a lot of great questions, and if you have questions, feel free to submit them in the bottom right-hand corner in the Q&A section. And just to answer the most commonly asked questions, a reminder: I will send a follow-up email to all registrants by end of day Monday with links to the slides and links to the recording of this. So let me dive right in here. Becky, this question is for you: are there any international environmental standards you are following in practice, and what are the respective challenges of that? So yes, we do look to the external international standards, and particularly European standards as well. Particularly for chemicals there is a European list of chemicals, but again it didn't quite cover
the extent of what we were trying to do. In terms of the data models, again, there are standards out there, but they just don't quite fit our business, which I think was one of the challenges that we had. So what we've done has been inspired by external standards, but we've not been able to adopt them. The other barrier to adopting external standards sometimes, particularly when you get to things like British and ISO standards, is that they're chargeable standards, and we're trying to be open across government, as I mentioned at the end. There is a feeling that things like British and ISO standards don't necessarily fall into that open category because they are chargeable. So we might end up being at odds with some of the things that other people adopt. Also, there are issues in coordinating cross-agency and cross-disciplinary standards, and how does the data model help in that respect? Is that one for Donna? Probably both of you can answer. Yeah, if you want to start and I can chime in, or if you'd like me to start. You start, I'll chime in. Okay. So you were asking about cross-functional standards? I didn't quite hear you, Shannon. Yeah, so there are issues in coordinating cross-agency and cross-disciplinary, if I can talk today, disciplinary standards, there we go, and how the data modeling helps in that respect. I think that was one of the key findings, that the workshops helped with that. When we had these workshops, there were different teams. When we think of things like even just the soil or the water or the air samples, those are really different teams, and often they hadn't spoken to each other, or had spoken to each other using different language. So part of it, and I think Becky and I both spoke to this a bit in the session, was knowing when to pick your battles. In some cases it was
important to say, you know what, this will be an open standard, we must use the same terms, we must use the same control list. In some cases I think we have to listen more and be a bit flexible, and that's why I think Becky's point of not just taking a standard and forcing it from above was really important. But yes, this was very much a cross-functional effort, and I think one of the key benefits was to get that common language across, or at least a common understanding. As Becky mentioned with some of the chemical standards, yes, you can use a slightly different term, because a brand name is different from a chemical name, as long as we have a common ID, so we can make that translation. Yeah, and I was just going to say, I think where we went with that was that the definition had to be the same. Even though there were lots of people from across disciplines, what we had to come to was a single definition. What we couldn't accept was having lots of different definitions of measurement, because that just doesn't work. What we then worked hard to do with this register approach was to allow flexibility in the actual terminology used. So down in the detail, yes, they can use different words for the same thing as long as they use our global identifier. That gave them their domain flavour, but what we were quite decided about, quite forceful about, I suppose, was that the definition had to be consistent. There have been some interesting conversations, and I'm sure there are plenty to come from people who disagree, but in the end the business benefit is there. And I think that helped. To that last point you made, I think people did have that common understanding that you had built even before we came on site, that people
realized that there was a need for this, and then I think people can be a little more understanding if they have to make a change. Yeah. So how did you find some quick ways to build support for this initiative? Gosh, good question. If I'm honest, within our organization, because we are so big, within each of those business domains there are data experts who support their own applications and who deal with problems in these applications every day. We targeted those people first, because we knew that they would be far easier to influence. They would already be quite supportive of what we were trying to do, because they were experiencing the problems with bad data every day. So those were the people that we targeted first, and those were the quick wins in terms of bringing people on board. In terms of quick wins for actually doing data standards, and this is probably slightly aside, we found standalone lists that didn't really need data models and started to create small standards around those, to be able to demonstrate business benefits. So we did two sides of that coin. I think you're right, too, that once you get one happy customer, everybody else is going to follow along, and you've already seen that, right? "I want some of that too." So I think picking someone who is already supportive is not a bad idea. Don't push water uphill if you don't have to. I think we have time to slip in just one quick question here. So many great questions. So, Donna, why did you say that people say you should never use supertype/subtype relationships in a conceptual model? My understanding is that where they shouldn't be used is in logical modeling, but they're an extension of ERD. I say that because people say that. I don't agree with it. I think
they, I do not agree, I think they have an excellent place in the conceptual layer, for that very reason: they're a very easy way to have simple concepts explained. But I have gotten a lot of pushback, so people can tell me why they say that. The misconception I've heard is that it's too complicated or too technical, or they may be thinking of physical implementation. But don't get me wrong, I'm a big fan of using them, and I do it all the time, because I think they're almost a perfect example of the conceptual: is it this or this, or can they be both things at once? That's a very simple thing to put right in the model. So I'm a fan. I love it. Well, thank you so much, Donna, as always, and Becky, thank you so much for sharing your story. It's very much appreciated. It's been great. And thanks to all of our attendees for being so engaged in everything we do. I'm afraid that is all that we have time for today. Just again, a reminder: I will send a follow-up email by end of day Monday with links to the slides and links to the recording, as well as all the great information presented here today. Thank you both. Thanks, everybody. I hope you all have a great day. Thanks, Becky. Thanks, Donna. Thank you very much. Thank you.