This project concerns the archaeology of Central India, in South Asia, focusing broadly on the fourth and fifth centuries. We examine the sites themselves, the inscriptions and artefacts associated with them, and the wider environment and the space within which everything happened.
Examination of all of these subjects generates lots of different types of data. I'm not going to list all of those, but what's important is that there's a complexity to the data in this project. It's not just one thing, it's lots of different things. There's not only a breadth to the range of data being incorporated in our analyses, but also to the sorts of information that they provide. We can, if we're thinking about data management, reduce all of that down to different categories of data. We have the basic components of the data: the physical objects, the material, the sites in which that material is found and the landscape in which it's all contained; and then the digital data that comes from or is associated with that, which itself can be categorised into documents and texts, databases and spreadsheets, raster imagery and vector imagery. I know that what we're all talking about is digital data, and that's by and large what I'll be concentrating on, but I just want to make the point here that in any sort of archaeological project we have to recognise that it is impossible to separate these two broad categories of data, the physical and the digital. They only have meaning in relation to each other, and both are physically carried right the way through the course of the project, through that data management process. Obviously each of these different types of data has to be managed in a different way. It's not quite as simple, though, as managing all of the texts in one way and all of the spreadsheets in another, because each type of data comprises different data pertaining to various subjects of inquiry and all of the different questions we might ask of them, and so they all have to be approached, interrogated and managed in various different ways.
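The categories just described can be sketched as a simple data model. To be clear, the class and field names below are illustrative only, not the project's actual schema; they just show how physical sources and digital categories might be kept explicitly linked.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Illustrative categories only -- not the project's actual schema.
class DigitalCategory(Enum):
    DOCUMENT = "documents and texts"
    TABULAR = "databases and spreadsheets"
    RASTER = "raster imagery"
    VECTOR = "vector imagery"

@dataclass
class Dataset:
    name: str
    category: DigitalCategory
    # The physical object, site, or landscape the digital data derives from;
    # keeping this link explicit reflects the point that the two categories
    # only have meaning in relation to each other.
    physical_source: Optional[str]

# A hypothetical example entry:
finds_register = Dataset(
    name="site_artefact_register",
    category=DigitalCategory.TABULAR,
    physical_source="excavated artefacts",
)
```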
So that means that a fairly complex data management plan has to be formulated, which governs every single aspect of the workflow, from the collection and production of the data to its analysis, its curation, its storage, how it's shared and ultimately how it's archived, though of course there's considerable overlap in how each dataset is managed at any part of that process. So rather than detail the data management plan for each and every single type of data that exists in this project, I'm just going to talk through each stage of that data management process and highlight some of the complexities involved in managing that data. The first stage in the data management plan is data collection. Data in this project is both collected and produced. I make a distinction here between collection and production. That's largely artificial. I use that separation just to distinguish two different stages in the workflow: between data that is collected and becomes instantly interrogatable and usable, and data that is then produced from other datasets through a variety of processes. In terms of that initial stage of data collection, data is collected through both desk-based research and fieldwork. Looking first of all at data collected through desk-based research, that's essentially background archival research, the research that has to happen before anything else can happen. So looking at sites and inscriptions that have been found, and pulling together existing mapping resources and spatial imagery, all of which results in texts of inscriptions (in translation, I must admit), spreadsheets of data pertaining to sites and the artefacts in those sites, raster images and vector images. Those resulting datasets either become the subjects of analyses themselves, in which case they go straight to the next stage in the workflow, or they feed back into and inform certain decisions governing other areas of data that are collected through fieldwork.
Either way, much of this data is digitally born at the point of its collection. It's collected in such a way as to enable its use by multiple participants in the project. So we all agree on particular file formats, which for the most part means Microsoft software specifically for documents, spreadsheets and databases, and Adobe for illustrations. For raster imagery, we use TIFF and raw formats as standards. For vectors, we use various formats specific to the programs that use them. All of this is with a view to eventual archiving, to facilitate the ease of eventual archiving, though of course we recognise that many of the file formats that end up being archived will have to be changed. At the point of collection, data is collected according to predefined and commonly agreed templates using predefined terminologies, with no additional formatting that would introduce difficulties in file conversions later on in the process. Data that's collected during fieldwork involves a whole bunch of different types of datasets collected in various different ways. We have archaeological surveys, we have excavations, we have sampling, all of which generates attributes about sites, artefacts from those sites, environmental remains, samples for dating and various environmental data. I'm not going to go into any one of those methods or datasets specifically. In digital terms, what all of that results in is a series of written records, objects, physical material, raster images and vector images. Obviously, with the data that's collected in the field there is a lot more analogue data, if that's the correct term in these digital forums, that needs digitising: the paper records and, of course, the objects and physical remains themselves. That digitisation is done using the protocols that we've already defined during other phases of data collection. On top of that data collection, we also have data being produced in four main ways.
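A check against commonly agreed file formats like these can be automated very simply. The whitelist below is a hypothetical sketch reflecting the kinds of standards mentioned (Microsoft formats for documents and spreadsheets, TIFF for rasters), not the project's actual list; raw formats in particular vary by camera.

```python
import os

# Hypothetical whitelist of agreed file formats -- the project's actual
# list may differ, and raw formats vary by camera manufacturer.
AGREED_FORMATS = {
    "document": {".docx"},
    "spreadsheet": {".xlsx"},
    "raster": {".tif", ".tiff"},
}

def check_file(path: str, category: str) -> bool:
    """Return True if the file's extension matches the agreed standard
    for its category."""
    ext = os.path.splitext(path)[1].lower()
    return ext in AGREED_FORMATS.get(category, set())
```

Running a check like this whenever data is handed between participants catches format drift early, before it causes conversion problems at the archiving stage.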
First through documentation, second through recording, third through compilation and fourth through analysis. The documentation stage is crucial. While we are collecting data in the field, we are documenting absolutely everything that we do. That documentation of what we're doing, why we're doing it and how we're doing it becomes another dataset in itself. It's really important that that documentation stage is in place and that that dataset is produced in an archaeological project specifically, for three main reasons. One is that archaeological practice is inherently destructive. You cannot ever repeat exactly what you are doing at that moment. You can never dig the same hole in the ground, you can never discover that same artefact again. You absolutely have to ensure that a record is taken of exactly what's being done, primarily to inform other people who may want to return to that site, so that they know how to interpret an absence of information in that space you've been looking at, because you've taken all of the artefacts from it. They need to accommodate that in their own interpretation of what they're looking at. The second reason, and the flip side of that, is that so much of how we approach and interrogate that data is utterly dependent on where it was found, how it was recovered and why it was recovered. Third, because archaeological sites are being destroyed and landscapes are changing at a really alarming rate, in South Asia especially, a sort of parallel point to the exercise is to record exactly what you're seeing at that moment in time. That then becomes a snapshot archive to help future generations of archaeologists, who might come along in just two years' time and not even see that there's an archaeological site there. We're preserving things at the very point of destroying them, if that makes any sense.
For all of those reasons, there needs to be that detailed record of what's being done, which then runs parallel to all of the datasets that we are creating. Data is also produced through recording. That's very much pertinent to the artefacts that are found through archaeological fieldwork: the pottery, the coins, the monuments. They're all catalogued and recorded and photographed. What data, and thus how much data, is recorded very much depends on and is dictated by the material that's being recorded and what analyses are going to take place on that data. Most of that data is recorded directly and stored in spreadsheets and databases; which one is used, again, very much depends on the material type being recorded and the analyses that are going to take place. At the same time, multiple photographs are often taken of the same object, which in some instances absolutely have to be taken at a very high resolution to record the level of detail that's needed. There are certain issues of scale there as well, though not nearly of the same magnitude as the Buddhist project. If we just think about pottery assemblages, any one site might have 150,000 sherds of pottery. Each one has about 40 different variables recorded and at least one or two photographs taken of it. Suddenly there are terabytes of photos and tens of millions of little bits of data pertaining to individual potsherds, all of which needs to be managed. We rely a lot on implementing the same protocols that we've already implemented at the primary data collection stage, being ruthlessly standardised about our file naming protocols and the file formats that we're using, really trying to ensure some level of standardisation and integrity to the data within those datasets, but we also now start needing to link various datasets together in more meaningful ways.
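A naming protocol of the kind described can be pinned down in a few lines. The pattern below is a hypothetical example of such a convention, not the project's actual one, and the quick arithmetic shows where the "tens of millions of little bits of data" come from.

```python
def photo_filename(site: str, sherd_id: int, shot: int) -> str:
    """Build a standardized photo filename embedding the unique sherd ID.

    The SITE_SHD_000123_v01.tif pattern is a hypothetical illustration of
    a ruthlessly standardised naming protocol, not the project's actual
    convention. Zero-padding keeps files sorting correctly.
    """
    return f"{site.upper()}_SHD_{sherd_id:06d}_v{shot:02d}.tif"

# Scale check: 150,000 sherds x 40 recorded variables is six million
# data values per site, before a single photograph is counted.
values_per_site = 150_000 * 40
```

Because the sherd ID in the filename matches the ID column in the spreadsheet, photographs and attribute records can be joined mechanically rather than by hand.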
At this point of data production, we're also making sure that we're using unique object identifiers within datasets that are carried through into the file naming protocols for photographs, so that they can be linked together with the attribute data that's stored in spreadsheets and so on. The third type of data production is data compilation. In many instances, datasets are single entities. They can be looked at separately or in conjunction with one another as needs be, but we have another aspect to data in our project, which is the construction and use of a geographical information system, which, in a very crude sense, is a compilation of numerous digital data into a single geodatabase that then becomes a usable resource that is larger than the sum of its parts. That GIS comprises all sorts of different datasets, from aerial and satellite imagery and digital elevation models to certain types of spatial data that pertain to various things in the landscape. These need to be created: points, lines and polygons are generated based on pre-existing information and then also go towards the compilation of this GIS. In the construction and use of a GIS, there are all sorts of data management concerns. For instance, there are different processes required to make all of the data relatable to each other, converting them all into the same projections and coordinate systems. It's also crucial, really, for the use of a GIS that all of the spatial data, our points and our lines and our polygons, are associated with the attributes that explain what they are. That necessitates not only having relatable spreadsheets that contain the values for that data, but also assigning metadata to particular shapefiles within the GIS. There are, thankfully, various criteria and standards and conventions for the management of that data, so we're not having to think anything up ourselves. We're very much relying on decades of codes of practice that exist for how to do this.
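The attribute linkage just described is, at bottom, a join on a shared identifier. The toy example below sketches that join in plain Python; the field names and site IDs are invented for illustration, and in the real GIS the features would of course be shapefile geometries in a common projection rather than bare coordinate tuples.

```python
import csv
import io

# Attribute rows as they might sit in a spreadsheet -- field names are
# illustrative, not the project's actual schema.
attribute_csv = io.StringIO(
    "site_id,site_type,period\n"
    "S001,settlement mound,early historic\n"
    "S002,rock-cut monastery,early historic\n"
)
attributes = {row["site_id"]: row for row in csv.DictReader(attribute_csv)}

# Spatial features as (site_id, lon, lat) points; in practice these would
# be geometries in the geodatabase, reprojected to a shared CRS.
features = [("S001", 78.12, 22.45), ("S002", 78.30, 22.51)]

# Join geometry to attributes on the shared site_id.
joined = [
    {"site_id": sid, "lon": lon, "lat": lat, **attributes[sid]}
    for sid, lon, lat in features
]
```

The join only works because the same unique identifier is carried through every dataset, which is exactly why those identifiers are fixed at the point of data production.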
Then there's data analysis. Data is produced during the analysis of the data that's been collected; the results are data, all of which then feeds back into other points in the data management plan. Some results of analyses are research findings in themselves and so feed directly into later data curation stages. Some results feed back into informing further strategies for data collection. Some results of analyses feed back into informing further sets of analyses, and some of the data produced during analyses goes towards the construction of our GIS, which is then capable of performing further analyses in itself. In terms of the analyses themselves, various types of analysis take place at various points in time during each stage of project activity. I'm not going to go into the specifics of each type of analysis, but just to give you some idea, we might talk and think about the analysis of background site data, the analysis of the texts of inscriptions, and a whole bunch of different analyses that are applied to different artefacts and environmental remains. Sites themselves are analysed at various scales; that's GIS analysis. All of these different types of analysis follow, indeed have to follow, set protocols specific to those types of analysis. They each have certain processes, they each have certain methodologies, and all of that is again documented. We're adding all the time to this parallel dataset that runs the entire way through, which is the documentation of each and every stage of the process. In terms then of data curation, the fact that data curation comes after analysis is a bit artificial, of course. It's not just after analysis that we think about curating data; it's positioned here in this order for convenience. Curation we identify as having three main elements. One is the storage of our data, which of course pertains to both the material objects and the digital data. The second is the sharing of that data.
We, within the project, share data via email, quite simply, or using a cloud service. All parties agree to and work within certain parameters: preserving and maintaining data integrity, keeping to the same file formats, continually checking for errors. But because we work in so many different locations geographically, there are certain logistical problems when it comes to sharing the data. Not every single member of our project team has access to the internet, for instance. So from time to time, digital data has to be transferred by hard copy. There is just no other way around that. That creates certain challenges, both logistically and in terms of workflow and the timetabling of research. Importantly, this post-analysis stage is also our main preservation intervention point. That's where we decide what data we are going to discard and what we're going to keep. It's largely older, redundant versions of files, which pretty much means incomplete datasets that we feel comfortable discarding once finalised, clean datasets have been analysed. Data is disseminated, I suppose, in the usual academic formats, that is, through reports, publications and presentations. All of the different foci of research and lines of inquiry that lead up to publications and presentations on specific topics have datasets appended to them. And these, as well as the main outputs themselves, are all uploaded to open access repositories, which leads neatly into the archiving of data. Zenodo, as has already been mentioned, is our open access repository of choice. Having implemented, and been so strict with, all of our protocols for maintaining clean datasets from the very point of data collection, turning them all into an archival format really doesn't take that much work.
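The error checking mentioned above matters most when data travels by hard copy, where there is no transfer protocol to catch corruption. A common approach, sketched here as one possible way of doing it rather than the project's documented method, is a checksum manifest: the sender hashes every file before copying to the drive, and the receiver re-runs the check on arrival.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Hex digest used to fingerprint a file's contents."""
    return hashlib.sha256(data).hexdigest()

def verify(manifest: dict[str, str], files: dict[str, bytes]) -> list[str]:
    """Return the names of files whose checksums no longer match the manifest
    (corrupted or missing files)."""
    return [
        name
        for name, digest in manifest.items()
        if sha256_of(files.get(name, b"")) != digest
    ]

# Sender builds the manifest before copying files to the drive; the
# filename here is a made-up example.
files = {"context_register.xlsx": b"...binary content..."}
manifest = {name: sha256_of(data) for name, data in files.items()}
# The receiver re-runs verify(manifest, files) after the drive arrives;
# an empty list means every file survived the transfer intact.
```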
We are able simply to upload the datasets that lie behind publications and other outputs to Zenodo, and there we really take advantage of the fact that they are assigned unique DOIs to create further linkages between our outputs and our datasets, so that they will always relate to each other and are thus discoverable and usable by anybody who might want to use them. There are certain challenges within this project. As I just mentioned, working across countries, we've identified certain weak points, specifically in terms of the transfer of data; ensuring that similar standards of data accuracy are maintained across these different countries is something that we are aware of, though none of those issues is insurmountable. On a different level, though, and this is my last slide, all of this really works within this project. How all of these datasets link together, and how all of that's managed, very much makes sense within an archaeological project, in part because doing this is such well-established practice within archaeological projects. We're not having to reinvent the wheel here. This is how we run archaeological projects. This is how data is managed within archaeological projects and has been for at least a couple of decades. The next challenge, as I see it, is how we can take this project, this data management plan, and the linkages that are made between disparate datasets within this project, and establish links between the other sub-projects that are part of this wider project, thinking beyond the boundaries that still separate them from each other. That's where we still need to do some thinking. Thank you.