One of the use cases where all of this came into production was TERENO, the Terrestrial Environmental Observatories, and I will give you a brief introduction to what it is about. It obviously has to do with environmental monitoring. It is an infrastructure initiative by the Helmholtz Association in Germany to provide environmental monitoring infrastructure for the scientific community, so it is an infrastructure put in place that people can then group projects around. Construction started in 2008 and operations are planned to run for 25 years. It is subdivided into four regional observatories, and I was involved with the TERENO Northeast observatory, which has eight study sites, 32 platforms, and, as of earlier this year, 35 million data entries from various sensors, with more platforms being added. The other three regional observatories are of similar scale.

The idea behind this is that with climate change there will be some areas in Germany that will be more affected than others, and affected in different ways. These areas of vulnerability were identified, and the regional observatories cover them: they are located in the Alps and pre-Alps, in the western mountains looking at a river catchment, in the central mountains and lowlands, and in the northeastern lowlands. The idea here is to look at the interactions and feedbacks between the different compartments of the ecosystem, the atmosphere, the terrestrial biosphere, and the terrestrial hydrosphere and pedosphere, so the water and the soils, but also to look at different scales, from basically the cubic-centimetre scale all the way up to whole river catchments, and to bridge the gaps between these different scales.

In the northeast this is fairly spaciously spread out over the eastern lowlands, which is an interesting area from an ecosystem-development perspective because it used to be heavily farmed in the Middle Ages. You can see this in the cross-section of a soil horizon in the lower right corner, where there is a medieval soil covered by wind-blown material from younger times, because since the Middle Ages the area has been increasingly depopulated, and very quickly so in the past 20 years. So it is very interesting to see how things changed from intensive agricultural use to the almost national-park-like situation of today. A particular trait of the northeast observatory is the use of geo-archives, which means looking at lakes and at trees as long-term archives of the past, to then look at processes that happened decades or even centuries ago.

So this means there is a lot of data coming together from four different observatories that is collected into a common catalogue, but the data are held in the four local systems, so the catalogue has to point back to the local databases. The central portal should not only help to discover data, but also allow visualization, access, and download of the data.
For the northeast observatory, this is the more detailed system architecture. The type is a bit too small to read on the screen, but basically it has two parallel branches: the branch on the left-hand side, which is file-based, to keep a record of the science, and the branch on the right-hand side for the services. The data come in at the top from the sensors in the field by FTP, basically over mobile phone networks, and are collected on an FTP server. When the data import tools recognize that something new has arrived, they start a workflow to import the data.

To start with the left-hand side: the data are imported into the data storage infrastructure, which can also incorporate external data sets. This data storage infrastructure has a metadata editor front-end. Metadata are mostly added as part of the import process, because at the time the data arrive we know what they are, so they can be annotated automatically. But sometimes the metadata need some editing, and that is what can be done at that stage. All the different metadata records are then harvested over OAI-PMH, the Open Archives Initiative Protocol for Metadata Harvesting, into the GeoNetwork portal software. In the case of this system, GeoNetwork serves only one purpose: to do the translation from OAI-PMH to CSW, the Catalogue Service for the Web, so that the catalogue entries can then be served to other metadata portals based on OGC standards, like the central TERENO portal, the German federal data infrastructure, or any other metadata portal.

On the right-hand side are the services, and there are some other processes going on there. There are, for instance, format transformations, transforming things from the original formats they were delivered in into things that can be used in the services, and also some initial quality checks. These checks trigger email alerts to the scientists responsible for certain time series, telling them they should have a look because maybe a sensor is broken or something else went wrong. That stage is then used to feed the data into a PostgreSQL database.
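To give a rough idea of what the harvesting step described above looks like on the wire, here is a minimal sketch of an OAI-PMH ListRecords request in Python. The endpoint URL is a made-up placeholder, not the real TERENO endpoint, and a real harvester would also page through the full record set via resumption tokens.

```python
# Minimal OAI-PMH harvesting sketch (illustrative; the endpoint is hypothetical).
import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
endpoint = "https://example.org/oai"  # placeholder, not a real endpoint

# Ask the repository for its records as simple Dublin Core.
resp = requests.get(endpoint, params={
    "verb": "ListRecords",
    "metadataPrefix": "oai_dc",
})
resp.raise_for_status()
root = ET.fromstring(resp.content)

# Print identifier and datestamp of each record header on this page.
for header in root.iter(OAI + "header"):
    print(header.findtext(OAI + "identifier"),
          header.findtext(OAI + "datestamp"))

# A full harvester would follow the resumptionToken element to fetch
# subsequent pages until the repository reports no more records.
token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
if token is not None and token.text:
    print("more pages available, resumptionToken:", token.text)
```

This is the same mechanism GeoNetwork uses when it harvests the metadata records before re-exposing them over CSW.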
The data model we used here is the CUAHSI data model, the US hydrological data model. We used it because TERENO is very much about hydrology, and CUAHSI is a very active community working with these data that has developed a lot of tools and standards to deal with this kind of data. But one of the requirements in setting up the system was that we had to provide a Sensor Observation Service to serve the data and allow users to query the data sets. So we use the 52°North SOS data server, which has a different data model than the CUAHSI data model, and to get from CUAHSI to 52°North we created views: views onto the CUAHSI model that conform to the 52°North data model. That is the Sensor Observation Service, which then serves the TERENO data portal or other OGC clients. As a screenshot it looks like this: you can search things in the geographic context and then look at the time series, download them, filter them, whatever. This gives a good first overview, but it is certainly only the starting point, from which you then hook up your OGC-compliant client and start working with the data.

Where do I think this is all heading? Data-driven research is certainly one of the buzzwords around, and it is now hitting the geological sciences with some delay, because in the geological sciences getting your hands on data is quite difficult. One of the things going on with DOIs, I think, will be identifiers for software, and it is starting already: CSIRO is already assigning DOIs to software. This is something that I think is very necessary, because, similar to data and specimens, software should be identifiable in this persistent way. That would create the now-missing link between papers and data, because then we could understand how the data were processed and interpreted. It would also make software recognizable as a scientific achievement, which is a gap at the moment; it is not always recognized that creating software is a contribution to science. And it would make science more transparent and reproducible. So assigning DOIs to software is a good start, but it might not be enough. We would also have to think about other questions: again the questions of identity, versioning, or location and repository, to pin down what we are referring to when we say we have an identifier for a piece of software.

Then, sensor networks are becoming more important in the geological sciences than they were a few years ago, and these sensors can be manifold: they can be drilling rigs, they can be satellites, they can be measurements in the field, they can be drones, or they can be instruments in the lab. At the moment these different subsystems are not well integrated, and the ability to create metadata as the data are being created is not used to its full potential.

With more sensors around we also have more data, so we have to find ways of working with very large data sets that are too large to be inspected in detail, or even to be loaded onto a desktop computer. A lot of data sets you can easily download from the web are already too big to be handled in your standard desktop software. And sometimes, with time series, the question could be: how do you inspect three years of meteorological radar for anomalies? You cannot sit down and watch three years of rain radar.
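To make that concrete, here is a small sketch of what automated screening of a long time series could look like instead of watching it by eye. The file name and column names are invented for the example, and the anomaly test is a deliberately simple per-chunk z-score rather than anything domain-specific.

```python
# Sketch: screen a multi-year time series for anomalies without loading
# it all into memory at once. File and column names are hypothetical.
import pandas as pd

CHUNK = 100_000   # rows per chunk read from disk
THRESHOLD = 4.0   # flag values more than 4 sigma from the chunk mean

anomalies = []
reader = pd.read_csv("rain_radar_2012_2015.csv",
                     parse_dates=["timestamp"],
                     chunksize=CHUNK)

for chunk in reader:
    values = chunk["reflectivity"]
    # Simple screening rule: distance from the chunk mean in standard deviations.
    z = (values - values.mean()) / values.std()
    anomalies.append(chunk[z.abs() > THRESHOLD])

report = pd.concat(anomalies) if anomalies else pd.DataFrame()
print(f"{len(report)} suspicious records flagged for human inspection")
```

The point is not the particular statistic but the pattern: the machine reduces years of data to a short list of candidates that a human can actually look at.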
Also, the process of data mining today deals mainly with numerical and text data, but maybe we want to work more with images and with quite different other materials, not only numbers and characters. With these challenges, this also means that for large data sets, processing will have to move from the desktop to the cloud. You know this, but I think this is something that still needs some research into how we make it operational.

And then there is linked data, which has been around as a buzzword for some time. Tim Berners-Lee formulated four principles of how linked data should work: you use Uniform Resource Identifiers (URIs) to denote things; you use HTTP URIs so that these things can be referred to and looked up, or dereferenced as it is called, by people and by machines; you provide useful information about the things, using standards like RDF, the Resource Description Framework, or SPARQL, the query language; and you include links to other related things when you publish data on the web. This is basically what I showed in the earlier illustration, starting by looking for papers and then going on to find data, other publications, et cetera. The question is how DOIs fit into this picture of what the linked data community calls "cool URIs". DOIs, being resolvable through HTTP services, have a resemblance to that and could be used in the same way; there is a small sketch of this at the end of this section. But I think we still need to do some thinking about how to bring these two worlds together.

So, in summary: persistent identifiers now allow us to publish, cite, and identify data, specimens, and software, and as we see from the numbers, data publication is becoming more common. The principles of data identification can also be used with other materials and with software, and we encounter the same problems there. But certainly the future publication, I think, will consist of elements linked by identifiers, where the paper is only the interpretation but also provides access to the data, to the materials that were used, and to the software and workflows. More and more repositories are now offering application programming interfaces based on linked data principles; not all of them do yet, but I think that is the way they have to go, because it will make them more useful, and it also fits with this idea of pushing processing into the cloud rather than downloading and processing on your desktop PC. And the future data publication, whatever "publication" may mean by then, will cater both for people as consumers of that publication and for user agents, machines, making use of these publications.
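Picking up the point about DOIs and cool URIs from above, here is the promised sketch of DOI content negotiation: asking the DOI resolver for machine-readable metadata instead of the human-readable landing page. The DOI used here is a placeholder under the DataCite test prefix and may not resolve; substitute a real one to try it.

```python
# Sketch: dereference a DOI like a linked-data URI via HTTP content negotiation.
# The DOI below is a hypothetical placeholder (DataCite test prefix).
import requests

doi = "10.5072/example"
url = f"https://doi.org/{doi}"

# A plain request follows redirects to the human-readable landing page.
page = requests.get(url)
print("landing page:", page.url)

# Asking the same URI for RDF returns machine-readable metadata instead,
# which is what makes a DOI behave like a "cool URI" in the linked-data sense.
meta = requests.get(url, headers={"Accept": "application/rdf+xml"})
print("status:", meta.status_code)
print(meta.text[:300])
```

The same identifier serves both audiences the summary mentions: people who want the landing page, and machines that want RDF.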