Welcome everyone to this webinar. Welcome to all our participants; we are happy you are here. There are a lot of you: over 170 participants, so we are very happy about that. And most of all, welcome to our great speakers. Let me introduce Mr. Daniel Kantar from the European Commission, who takes care of the policy driving and governing the Copernicus program, and Professor Thomas Blaschke from the University of Salzburg, who will talk about the scientific applications of Earth observation data and of Earth observation dissemination platforms, from the point of view of their use in the scientific domain.

Here is our agenda for today. First, I will introduce the challenges facing Earth observation dissemination systems: what these systems are and what challenges they face. Then Daniel will present these dissemination systems from the point of view of the Copernicus and Sentinel programs and of the European Commission. Then I will come back with some ways to address these challenges and some best practices we have found while operating large-scale dissemination platforms. Then Thomas will present the challenges and opportunities for scientists and other users in exploring big data with these platforms. We will wrap up with conclusions and a short Q&A session, and we will try to close within an hour.

So, starting with these challenges: what are they? First, a few words about CloudFerro itself. We are a cloud operator specialized in the Earth observation data processing sector, running several platforms for storing, disseminating and processing Earth observation data. We are a fully European company, with European origin, infrastructure and capital.
We concentrate on the Earth observation sector, but also on climate research and science; we diverge a little into neighboring sectors, but this is the core of our activity. We are not a very old company, but we have been here for over six years, and we mostly operate big-data Earth observation clouds for several organizations and institutions, including of course Copernicus-related projects such as EO Cloud, CREODIAS and WEkEO. We also operate a German Earth observation platform. And our business is continuously growing, as the whole sector is.

So much for CloudFerro itself; let's get down to the subject of this presentation: how to store, process and disseminate Earth observation data. As an introduction, let's see where this data comes from, how it is processed and how it reaches users, the path the data takes from satellite to user.

First there is the observation satellite, which takes measurements and performs some onboard processing on them. This can be more or less sophisticated depending on the satellite: it can be a systematic-acquisition satellite that transfers mostly everything it observes down to Earth, or a tasked satellite that receives orders on what should be observed. The data may be initially cleaned on board, and some newer satellites can even decide what should be acquired and what should not, using, for instance, artificial intelligence. The acquired data is then either transferred directly to a ground station, and these ground stations tend to be located around the poles, north and south, because that is where the satellite passes on every revolution; these observation satellites are mostly in polar orbits.
Or some satellites go through a relay satellite, which relays the data before it reaches the ground. This extended path through a relay satellite allows for faster delivery: the data can be transferred down to Earth almost immediately after it is observed. With a direct ground station link, the satellite may have to wait until it overflies the station before it can download the data.

From the ground station, the data must go through processing, which is sometimes quite heavy. Sometimes it is delayed because additional metadata or other information must be included before the final product is generated. Then the generated products of different levels can be distributed to users.

This whole data path is currently under heavy transformation. It used to be very silo-like, with vertical infrastructures processing data from a single mission, while now it has mostly been transformed into a cloud-based operation, where different stages of the processing may be performed by different cloud-based actors, and the data tends to be available from common sources across satellites.

Within this ecosystem there are different users and actors, and they have different challenges and different needs. Let me start with the users, who need the data most of all. They have many platforms, clouds and data sources from which they can get it, which creates problems, as they need to switch between these sources. They have data consistency issues: the data may or may not be on the same scale, both temporal and spatial, and may come from different instruments, while they would mostly like coherent data. And the access methods are diverse.
Download time and reliability are also an issue, because users often need the data quickly, and downloading it in large volumes takes time over narrow links. Clouds and infrastructures have limited processing capabilities, and different tools are available on different platforms, so the landscape is not homogeneous. This locks users into one cloud or one platform and makes transitions difficult. There are also complex data licensing issues.

The issues of data providers are mostly concentrated around the growing data sizes all along the path we have shown, the growing number of datasets, and the downlink bandwidth, which is limited. They also need ways to reach customers: of course, data providers want to have as many customers and users as possible, and while doing this they need to keep costs reasonable.

From the point of view of policymakers and society, what is important is strategic autonomy, which is especially important in Europe. Sustainability of the projects is an issue, and standardization is another, so that the data and the users are able to interoperate. While supporting the development of the sector, policymakers need to keep the market competitive and level the playing field for the different players to make it fair. And of course, they want to promote the usage of these platforms.

So, these are the challenges. Now I will hand this presentation over to Daniel, who will talk about the Copernicus point of view, about the Copernicus program in the context of big data, and then we will talk about how we address those challenges on our platforms. Thank you very much.

Thank you. Your last slide was very interesting, and indeed we could spend quite a lot of time on it, because it summarizes a lot of those issues. Now, I'll take a bit of a larger picture.
What Stanislav has presented was very much focused on space, and indeed this is where the amount of data acquired and processed is the biggest. I'd like to introduce Copernicus for those who do not know it already. I will go a bit fast, because I have quite a lot of slides, but you will receive them anyway.

Copernicus is an environmental and security public service, more environmental than security. We use data from space; we have our own Sentinel satellites, but also other missions. We use in situ data. We distribute those data and provide them to specific services, and then both the Sentinel data and the information provided by the services are distributed to users.

This is the structure of the Copernicus program. The program manager is in fact the European Commission, but we really rely on the expertise of international organizations: on ESA, of course, for technical coordination, and on EUMETSAT for the operation of the satellites. We also acquire other mission data from commercial providers. For the central part, the services, we also rely on specialized organizations in the different topics of those services. And we rely on in situ data, because the services are provided by mixing and using different sources of data, both in situ and from space.

This is another picture of what Stanislav just presented as the ground segment, getting the data from the satellite. I will go quickly over that one; it was quite well explained before. Of course, there is a full process for those data, which are provided both to the users, under the free, full and open data policy, and to the Copernicus services. The amount of data produced is quite substantial, and we have a lot of registered users; you see the figures in green on your left-hand side, over 400,000 registered users.
A lot of data is being downloaded and used by those users. The green tables show the ratio between the data produced and the data downloaded and re-used; the numbers shown are for the full archive.

These are the satellites for the Copernicus program, the specific, dedicated ones. We have eight satellites flying in orbit at the moment. We have radar systems and optical systems. We have Sentinel-3, which carries both radar and optical instruments, but at coarser resolution than the two other constellations. We have atmospheric chemistry with Sentinel-5P, and then Sentinel-6, which measures the height of the oceans.

More satellites are already coming to this current Sentinel constellation; you see the arrow at the top. There are two further instruments, Sentinel-4 and Sentinel-5, which will come in the next two years. But we are also preparing for other missions, which we call Sentinel expansion missions, with new types of sensors, and for the continuation of the current Sentinel constellation beyond 2030, of course with a new generation of sensors to equip those satellites. For the first generation, we have foreseen four units each for the Sentinel-1, 2 and 3 constellations. At the moment the first two units are flying, and they will be replaced by the C and then the D units, so we are already covered for quite a number of years to come. What matters, of course, is continuity of provision: we are establishing Copernicus for the long term, to keep providing this information to the users.

These are the different services. All of them provide their own specific information in the thematic areas in which they act: land, marine, atmosphere, and climate change as well. And then there are two specific services which act on demand: emergency management, for man-made or natural disasters, and the specific security services.
Those two usually consume data from higher-resolution, hence commercial, providers, but also use the Sentinels as input data, depending on the resolution they need.

What is interesting to see is that even for the satellite data, the way it is acquired, processed, quality-controlled, archived, disseminated and then exploited is driven throughout by user needs, because Copernicus is a user-driven system; we really put a lot of emphasis on user needs. The same data lifecycle works as well for the different services: ingestion of the data, processing, quality control, archiving, dissemination, exploitation. That means many of these processes are replicated throughout Copernicus for different needs and purposes, but of course we are looking at systems which would help with the two blue boxes, dissemination and exploitation.

Lots of information, lots of data coming from different places. We know that there are now the ICT, the information and communication technology, systems in place, and we look at the cross-fertilization of Earth observation and ICT to better provide services to our users. For this we came up with the concept of the DIAS: a system taking the data on board, putting it in the cloud, and then providing services to the user. Those are OGC standard services or INSPIRE standard services, plus APIs. This is basically the concept of the DIAS: to create an ecosystem on top of those data lakes. We have five of them: CREODIAS, Mundi, ONDA, WEkEO and Sobloo. The different DIASes have different technologies and ways of addressing users. They are quite autonomous in how they act, because there is one part, the services required by us, and then there is the possibility for those DIASes to develop their own services for the users, of course investing their own money to do so.
The idea is to be able to chain different services from third parties that would come on top of the DIAS and build this Earth observation ecosystem. The DIASes respond to a baseline set of requirements and then add other data: we have asked for the Copernicus data and information to be stored, but a DIAS can also store users' data or any other data it would like to offer. They can of course provide other services, and this is needed, because the technology provided to users is always evolving.

Looking at this technological trend, we see, after having established those DIASes, that we still have the full Copernicus system and then those DIASes. Stanislav has talked about the modernization of the ground segment, the transformation project where those specific Copernicus infrastructures are progressively being cloudified, and the same is true for the other services. So there is now a convergence between the services needed to provide the information and the data to the users and the dissemination and exploitation side.

We are now reflecting a lot on an integrated data management concept, looking inside Copernicus at those infrastructures, cloud-based but also HPC, for the modernization of the Copernicus information services, but also looking towards the users, connecting to national infrastructures which use Copernicus data, to the research world and to businesses. This is work that needs to be done, and we very much look to the new services that industry is able to provide to Copernicus, to cross-fertilize, as I said, Earth observation with ICT technology, and also to look beyond Earth observation to other datasets, because we know that mixing Earth observation data with other datasets creates a lot of value for the users and opens the possibility of innovation and new services. Thank you very much.
This concludes my presentation.

Thank you, Daniel, for this great overview of Copernicus data processing and dissemination. I have talked a little about the challenges around the three communities this ecosystem addresses, and while we do not have a fully systematic way of addressing all of them, through our operation of different DIAS and other Earth observation data processing platforms we have found some ideas on what works well on such platforms, what attracts users, what is useful for them, what addresses at least some of the challenges we have presented, and what the ways forward are.

So we have built and operate a few of these platforms. There is CREODIAS, where we are the builder and provider of the platform. We also participate in the consortium running the WEkEO platform. Before WEkEO and CREODIAS we created EO Cloud, which was a kind of precursor to the DIAS idea on a smaller scale. We also operate CODE-DE for DLR, and the Climate Data Store (CDS) for the European Centre for Medium-Range Weather Forecasts also runs on our infrastructure.

First of all, through these different platforms we have found that users have a few distinct ways of using them. Some users just want to browse and view the data that is there and find things in it; these users require easy tools and a broad range of data. Then there are users who want to download the data and process it locally. This is the old way: you download the data to your infrastructure and process it there. This approach is traditional, many users still follow it, and that is perfectly fine. A more modern way of doing things comes with cloudification, as Daniel rightly pointed out: instead of bringing the data to the users, it makes more sense to bring the users to the data.
I don't mean that the data is bigger than the users, but it is easier to process the data and do things with it close to where it sits, because this data is very voluminous; we are talking about petabytes here, and transferring petabytes of data is not an easy task even with current bandwidth.

So there is processing in the cloud. Some users process the data using classical IaaS infrastructures. By IaaS we mean that users have virtual environments, with virtual machines or virtual infrastructure as a service, and they deploy their own applications to process the data within this infrastructure-level service, with easy, close access to the data lake. And there are users, and this is a very interesting model that many appreciate, who use predefined processors that they can run at scale in the cloud. These are the basic consumption models.

Users of these models tend to run into certain issues, and certain practices work well for them. Through our operation of the platforms we have found that some practices are beneficial for users and also for the operator of such a platform, such as we are, and we have a few general recommendations for running such platforms, which I will discuss in the next few slides.

First of all, it makes sense to calculate: in order to decide, it makes sense to estimate things. We made an interesting estimation comparing the cost of a satellite mission, that is, what it costs to acquire a data product, with what it costs to downlink it to Earth, to process it, and to store it. We found that the cost of processing and storage is relatively low compared to the cost of acquiring a satellite product. The calculation here is for Sentinel-1 A and B; roughly, the budget of this project was around 400 million euros.
Assuming these satellites operate over a 10-year period, which is quite an aggressive, though perfectly reasonable, assumption, and given the number of products generated each year, we come down to the cost of acquiring one product: the cost of the mission divided by the number of products the mission generates. In these terms the cost of one product is 91 euros. This is to be compared with the cost of downlinking a product; our estimate is based on the cost of an Amazon service for downlinking, because that was the easily available cost estimate, and it may of course vary. Downlinking these 1.3 gigabytes of data costs roughly 3 euros. Processing it, assuming such a product takes one hour on quite a large virtual machine, would cost 90 euro cents. The cost of storing this data over one year is just 16 euro cents, and the cost of 10 years of storage, considering the cost degression of storage of roughly 30% per year, which is what we have seen over the last 10 or 15 years, comes to just under 1 euro.

So the conclusion is that the cost of acquiring a product is quite high compared to the cost of processing and storing it. Once you have acquired a product, you had better keep it; it makes sense to keep it. And even once you have generated a derived product, it still makes sense to keep it as long as there is a chance somebody will use it. On this usage of products, I really liked Daniel's slide on the things we should be doing to boost usage and applications. The point is that many users will never use a product if it is not there; they will have no chance to use it. And if they have to generate the product themselves, that creates a few issues.
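The back-of-envelope calculation above can be reproduced in a few lines. The mission budget and per-product costs are the figures quoted in the talk; the products-per-year rate is an assumption I chose so that the result matches the quoted 91 euros per product:

```python
# Per-product cost comparison sketched in the talk (speaker's estimates for
# Sentinel-1 A/B; PRODUCTS_PER_YEAR is an assumed rate, not an official figure).

MISSION_COST_EUR = 400e6        # approximate Sentinel-1 A/B budget
MISSION_YEARS = 10              # assumed operating period
PRODUCTS_PER_YEAR = 440_000     # assumption reproducing ~91 EUR per product

acquisition_per_product = MISSION_COST_EUR / (MISSION_YEARS * PRODUCTS_PER_YEAR)

STORAGE_EUR_FIRST_YEAR = 0.16   # storing one ~1.3 GB product for one year
ANNUAL_COST_DECLINE = 0.30      # storage gets ~30% cheaper every year

# Geometric series: each year of storage costs 70% of the previous year.
storage_10y = sum(STORAGE_EUR_FIRST_YEAR * (1 - ANNUAL_COST_DECLINE) ** y
                  for y in range(MISSION_YEARS))

print(f"acquisition: {acquisition_per_product:.0f} EUR per product")
print(f"10-year storage: {storage_10y:.2f} EUR per product")
```

Whatever the exact assumptions, acquisition dominates storage by roughly two orders of magnitude, which is the argument for keeping every acquired product.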
First, they need all the software and infrastructure necessary to generate a higher-level product from lower-level data; then they need to finance it, and only at the end can they use it. For many product categories it therefore makes perfect sense to pre-generate these products, because it boosts usage of the base product that was downlinked, effectively reducing the cost of acquiring one product in terms of the mission cost. So, to amortize the mission, to make it pay off, it makes sense to generate a lot of high-quality, higher-level products that users will actually use, because this reduces the per-product cost of the mission. This is one interesting finding, and I think the comparison is interesting.

Of course it all depends. This is a slide we have borrowed from the Copernicus technical operational budget; it explains how an acquired product translates into higher-level products. The sizes of these products may vary, but we think the general idea presented on the previous slide is valid regardless of whether the generated product is 50% bigger or twice as big. This was the flow of Copernicus Sentinel data, but it just illustrates what was shown on the previous slide, and it holds for many, many use cases. Though of course not for all: it does not make sense to pre-generate everything, obviously.

The second finding is that it makes sense to distribute data in standard, easily accessible formats: OGC-based standards for tiled data, object access or file system access for raw data. We think it makes perfect sense to store data in uncompressed formats, so that it is available to users easily and directly, and so that in order to process the data they do not need to go through a download-and-copy cycle. This is important. Then there are the data cubes that Daniel mentioned; in Thomas's presentation, too, there will be data cubes.
We think they are a perfect format for keeping and distributing the data and making it available to users. Formats like Zarr are perfect for multi-dimensional data cubes, and they make the use of special libraries optional: users should not be forced to use a very exotic library they do not want to use. Data should be as readily available as possible. A fast, homogeneous catalog, available through an API and through a graphical user interface, is also very valuable for finding the proper data quickly. So much for data access.

The next thing we found crucial for users is to have enough bandwidth and processing power. To run such an infrastructure, and this is big data, you need carrier-grade, scalable, redundant internet access; that is obvious. You need storage at the tens-of-petabytes scale that is redundant and reliable at that scale and provides the bandwidth necessary for processing; this is very often a very I/O-intensive business. Processing power is also necessary to enable easy, repeatable, large-scale product generation when products need to be generated on demand; running such a processing campaign quickly is important. And we think it makes sense to pre-generate useful datasets upfront to boost their usage, as the earlier reasoning showed.

Another finding about bandwidth is that the infrastructure you build should avoid multi-step pipelines and bottlenecks, because these just do not scale. Such pipelines often become the bottleneck of the system, and even with very fast infrastructure, if the architecture is not right, they slow the system down.

Regarding data sizes: in the left column you see the estimates from the technical budget of Copernicus, the storage and dissemination volumes we are currently seeing or will see in the very near future, and on the right side you see where we stand with our current infrastructure.
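To illustrate why chunked cube formats such as Zarr make large data directly usable without special libraries: the cube is split into fixed-size chunks, each stored as a separate object, so a client can compute which chunk keys intersect its query and fetch only those. The sketch below is my simplified illustration of that addressing idea, not the actual Zarr library, and the cube dimensions are made up:

```python
# Minimal sketch of Zarr-style chunk addressing: given an element index and a
# chunk shape, find which stored chunk holds it. Real Zarr stores each chunk
# under a key like "10.4.2", so clients read only the chunks they need.
from itertools import product

def chunk_key(index, chunk_shape):
    """Map an element index to the key of the chunk that contains it."""
    return ".".join(str(i // c) for i, c in zip(index, chunk_shape))

def chunks_for_box(lo, hi, chunk_shape):
    """All chunk keys intersecting the half-open box lo..hi (per dimension)."""
    ranges = [range(l // c, (h - 1) // c + 1)
              for l, h, c in zip(lo, hi, chunk_shape)]
    return [".".join(map(str, idx)) for idx in product(*ranges)]

# A hypothetical cube: 365 time steps of 10980 x 10980 pixels,
# chunked into (1, 1024, 1024) blocks.
chunks = (1, 1024, 1024)
print(chunk_key((10, 5000, 2048), chunks))            # -> "10.4.2"
print(chunks_for_box((0, 0, 0), (1, 2048, 1025), chunks))
```

The point of the design is that any HTTP or object-storage client can fetch these keys; no exotic library is required on the user side.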
So we are quite comparable with the estimates for the next years of Copernicus. We have even tested the platform as being able to disseminate two petabytes of data per day. This is just an illustration.

The next finding is that it makes perfect sense to federate both users and data access. Users often need to combine datasets from different sources, and considering the data sizes, these petabytes, it does not make sense to keep too many copies of the data, because that gets too expensive; you need a few copies for redundancy, but no more than that. So you should federate user access and data access, to give users transparent access to a large number of datasets. We do this in a few models. For instance, on the platforms we run we provide access to external datasets, some commercial and some not, which can be ordered by the user and delivered to the platform within a few hours, depending on the size of the order, and then used like any other dataset on the platform. Then we have platforms like CODE-DE, which uses some local datasets of its own but also has a way to access the Copernicus datasets stored in another technical infrastructure. Such federative moves can boost access to data and the effectiveness of its use. Of course, all of this should happen through homogeneous interfaces.

The next recommendation is to use and contribute to open source projects. This is our strategy, and we think it makes perfect sense to be both a user of and a contributor to such projects; it is a good and fair way to develop the business and to profit from it.

And the last point is about energy and resources, and I think it is an important one.
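For a feel of what the two-petabytes-per-day figure mentioned above implies, here is my own back-of-envelope arithmetic, using decimal units (1 PB = 10^15 bytes); real links add protocol overhead on top:

```python
# Back-of-envelope bandwidth arithmetic for the volumes mentioned in the talk.

SECONDS_PER_DAY = 86_400

# Sustained rate needed to disseminate 2 PB per day:
gbit_per_s = 2 * 1e15 * 8 / SECONDS_PER_DAY / 1e9
print(f"{gbit_per_s:.0f} Gbit/s sustained")            # ~185 Gbit/s

# And why "download and process locally" scales badly: pulling just 1 PB
# over a generous 10 Gbit/s user link already takes more than a week.
days_over_10g = 1e15 * 8 / 10e9 / SECONDS_PER_DAY
print(f"{days_over_10g:.1f} days for 1 PB at 10 Gbit/s")
```

Numbers like these are what make bringing users to the data, rather than data to the users, the more practical architecture.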
The good news is that energy efficiency often translates into cost efficiency. Not always, but often. The famous green premium that many people talk about turns out to be low or even negative in many cases. By green premium we mean the delta you have to pay to make your solution greener, and sometimes making it greener actually improves cost efficiency.

We have a few examples of this. First, use effective storage: large hard drives, and erasure coding, which is a wise data-encoding method that gives you redundancy without multiplying the size of the data by the number of copies you keep. This is a very good way to reduce the cost of storage. Compression is another good thing, though it needs to be balanced. We, for instance, use block-level compression, which lets you keep the data compressed while providing it transparently to the user, so they do not need to decompress it in order to access it; decompression happens automatically when the data is read. Cold storage also helps reduce storage costs. Energy-efficient computing is quite obvious, but there are other things you can do.

If you can, and this is something we like doing, optimize over the whole technology stack. We like to keep competency across the whole stack, from the bare metal to the software, and to be able to tweak things anywhere in it, because that allows you to optimize end to end, and it works perfectly well for us. Another thing is to optimize the hardware, including its configuration, which you can do if you manage the whole stack; we try to avoid pre-built configurations that we cannot manage and tweak. And, something I think is quite important from the sustainability point of view, we like to keep selected hardware parts for a long time.
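The block-level compression idea, compress on write, decompress transparently on read, can be sketched with Python's standard zlib module. This is a toy illustration of the principle, not our production storage layer:

```python
import zlib

class BlockStore:
    """Toy block-level store: blocks are compressed on write and transparently
    decompressed on read, so callers always see plain bytes. A sketch of the
    idea only, not of any particular storage product."""

    BLOCK = 64 * 1024  # 64 KiB blocks

    def __init__(self):
        self._blocks = []

    def write(self, data: bytes):
        # Split into fixed-size blocks and compress each one independently.
        for i in range(0, len(data), self.BLOCK):
            self._blocks.append(zlib.compress(data[i:i + self.BLOCK]))

    def read(self) -> bytes:
        # Decompression happens here, invisibly to the caller.
        return b"".join(zlib.decompress(b) for b in self._blocks)

    def stored_size(self) -> int:
        return sum(len(b) for b in self._blocks)

store = BlockStore()
payload = b"geotiff-scanline " * 10_000   # highly repetitive, compresses well
store.write(payload)
assert store.read() == payload            # transparent round-trip
print(f"stored at {store.stored_size() / len(payload):.1%} of original size")
```

Fixed-size blocks are what keep the scheme transparent: a reader can seek to any block and decompress only that block, instead of inflating the whole file.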
For instance, metal cases, server cases, power supplies, and to a certain extent processors have quite a long lifetime, and the evolution of technology in these areas is much slower than the marketing story from hardware vendors would have us believe. In some cases you need the newest technology as soon as it appears, and you need to switch quickly, but many components can sensibly be kept for four, five, six, seven years. It works perfectly well, saving on both the carbon footprint and costs.

Another finding, especially valuable for offline processing, batch or workflow-based, which is a large part of what we are talking about here, is that we can make better use of renewables if we can slightly delay the processing to times when energy or processing power is more available, cheaper or greener than at peak time. This is our last finding, and I will hand over to Thomas for the continuation.

Thank you very much. I will try to be fast, hopefully not too fast, to save some time, maybe using less than 15 minutes before people have to leave at the top of the hour. My name is Thomas Blaschke; I am with the Department of Geoinformatics at the University of Salzburg. Sorry, I was too fast; just reorganizing my slides here.

Maybe let me look backward with one slide at where we come from, the situation in the late 90s. There was a very visionary talk at that time by Vice President Al Gore, who sketched the vision of a Digital Earth, where every child should be able to touch a virtual 3D globe and zoom in and out seamlessly. Nowadays it does not sound so spooky, because the technology is there, but at that time it was not. This is how certain developments started.
There was some political movement then, which ended up creating the International Society for Digital Earth, initially through an American initiative and then with China putting in quite some resources through the Chinese Academy of Sciences. Let me make a little advertisement: in two weeks' time we will host the 12th International Symposium on Digital Earth. You can still register; there is a relatively cheap online-only registration. Just check it out; I can put the URL in the chat. There will be very interesting sub-events, one of them organized by ESA together with the European Commission on Destination Earth and the European way of implementation, going beyond the idea, the concept, the metaphor of a Digital Earth. There will also be other sub-events on the situation in Africa and in the Americas, with GEO also involved. If you have time, join us two weeks from now for this International Symposium.

This matters because nowadays, as Stanislav pointed out, there is a lot of technology behind all this, and you need to know about OGC standards; you probably need to understand what a WMS is, what Web Feature Services are, and whatever else is behind them. But there are also so many political programs that now depend on this data. Here I have put a few abbreviations as examples, and that is only from GEO, and GEO is obviously just one political framework. You are probably not aware of all those programs, and all of them have monitoring tasks that urgently need Earth observation plus in situ information.

Today I claim, and I will talk about this a little more at the conference I mentioned, that there is something like a silent revolution. Others call it a data explosion. Stanislav, or rather Daniel, I think, already showed some slides about that, and I had one slide about it too. But how did it happen?
I mean, there are not only the couple of hundred Earth observation satellites launched by ESA, NASA, JAXA, etc. Private industry is now really taking over through the smallsat movement: smallsats about the size of washing machines, if you wish; nanosats of less than 10 kilograms, say; and CubeSats, very standardized small gadgets of about 1.3 kilograms, which you see here. But they are more than gadgets, because there are so many of them and they can build large, risk-tolerant, disaggregated sensor networks that collect images, and not only images but also measurements of atmospheric quality, every day. This all comes together with lower launch costs through what some call sharing-economy launches: you do not necessarily have to have your own launcher; one rocket can carry many of these satellites. So there is a lot going on in this private-sector movement. Some talk about a democratization of space; I doubt a little whether this is really true, but there is not enough time to go into that. The second kind of revolution, which has already been touched upon, sorry, that was the wrong slide, is this increase of data. That is not the newest slide, it is one I borrowed from Maurice Bourgeois; there are newer ones, but it is really about the steepness of the graph, not about the figures. Some years ago we thought that Landsat was huge data, but it is not. Sentinel, and in future other missions, that is what we call big data. But maybe 10 years from now people will laugh and say, oh, this was small data. This is all a moving target, but you know what I am talking about. And this is bringing these big changes; this is really a paradigm shift. I refer here to a paper that I cite in these slides by Martin Sudmanns, my colleague, and a couple of other colleagues from the University of Salzburg, where we claim such a paradigm shift. It doesn't make sense anymore.
In the old days, you went to a library and borrowed a book, or you copied something out of a book, and then you had it on your desk; nobody could take it away. Later on, you tried to get satellite data, and once agencies started to make it available for FTP download, people downloaded the data. But nowadays, just as it makes no sense to store every single PDF from the library, it makes no sense to store every single image locally. So bring the users to the data in the cloud, and not only to access and visualize it, but to do the processing and analysis of images, and increasingly of GIS data, in the cloud. That is the clear message. This is the paper I am referring to, a 2020 paper in the International Journal of Digital Earth; if you have a little time, I recommend this read. So, and we saw this already, I have to be quick here, there are obviously several approaches to big Earth observation data. In Europe we know the DIAS concept; personally I never understood why we needed five DIAS systems, but that is another story. There is Google Earth Engine, which I would call a brute-force mechanism: to me it is not a very intelligent system, but if you have enough computing power, it really works, right? Amazon is now becoming a big player in this market. But to me, and this is something I want to point out here, a quite intelligent solution is data cubes. Again, for time reasons, I refer to this paper. Interestingly, the array databases behind data cubes are not really new. Peter Baumann always says that he started his work on rasdaman in 1998, and it is a functioning array database. But for whatever reason it only became popular a few years ago, and I think the reason is the huge amount of data.
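The core data-cube idea, slicing a stack of rasters by time and by area in a single query, can be shown with a deliberately tiny toy. This is an illustrative sketch only; the class and method names are invented here and do not correspond to rasdaman or any real data-cube API:

```python
from datetime import date

class MiniCube:
    """Toy data cube: a time-indexed stack of 2-D rasters."""

    def __init__(self):
        self.slices = {}  # maps acquisition date -> 2-D list of pixel values

    def ingest(self, day, raster):
        self.slices[day] = raster

    def query_mean(self, start, end, y0, y1, x0, x1):
        """Mean pixel value over a space-time window: the characteristic
        data-cube operation, selecting by time range AND bounding box at once."""
        vals = [v
                for day, raster in self.slices.items()
                if start <= day <= end
                for row in raster[y0:y1]
                for v in row[x0:x1]]
        return sum(vals) / len(vals)

cube = MiniCube()
cube.ingest(date(2021, 6, 1), [[1, 2], [3, 4]])
cube.ingest(date(2021, 6, 11), [[5, 6], [7, 8]])
# Mean over both acquisitions and the full 2x2 extent:
print(cube.query_mean(date(2021, 6, 1), date(2021, 6, 30), 0, 2, 0, 2))  # 4.5
```

Real array databases make exactly this kind of query fast over petabytes by tiling and indexing the arrays, which is why the approach pays off only once the data volume is large.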
But I think the future, which we are working on at the university, because it is our task not to stop at existing technology but to try to go further, is this: my colleagues in the working group of Dirk Tiede, Martin Sudmanns and others are working on semantic data cubes, ingesting data in a certain way so that you can set up queries that retrieve certain areas based on semantics: you are looking for a land use, a land cover, or, say, islands. That is something you cannot do based on pixels, because a pixel will never know that it is part of an island. It needs to become an object, virtually or on the fly. And then we enter a field I worked in many years ago called object-based image analysis: we need to start grouping pixels to make them somehow intelligent, to let them know about their environment, to put them in context. For semantic data cubes, you may want to check out the Sen2Cube project, or refer to this short article published in a journal simply called Data; I think you will see what I mean by that. And now, I am rushing a little bit: so the future is AI, isn't it? Machine learning will do everything for us. Well, yes, that is part of the future; there are very interesting things going on, no question. But I doubt a little that it will solve everything. Let me find the next slide. Here we go. I was involved in an article where, for the same case, we compared around 20 different versions of CNNs against support vector machine and random forest classification. Interestingly, only one version outperformed the support vector machine and the others. Most researchers in computer vision would simply go for the one version that outperforms the rest.
But our problem is that we cannot explain why using three layers, or choosing a five-by-five convolution, and so on, led to that result. This is just an example showing that yes, there is big potential in AI and machine learning, but there is still a need, and this is probably our role, for academia to steer such a process, to ask the right questions, to gain insight into the results, to help understand the world. As my concluding slide, let me make a little advertisement again: three universities, in Salzburg in Austria, in France, and in the Czech Republic, run a European master's programme, the Copernicus Master in Digital Earth, where students learn exactly that. They really are the workforce of the future, and we are always asked to produce more of those students. At the moment we have 16 fully funded students per year, funded by the European Commission, but there is also room for industry to sponsor some of those students, if somebody is interested, and to gain a workforce later. Thank you very much for giving me the chance to tell you a little about this, and sorry for rushing to stay on time. Back to the organizer.

Okay, thank you Thomas for this presentation. As we are a bit late, we will go very quickly to the conclusions. These are very generic conclusions we have drafted. Generally, we see that the data is growing in both size and variety, which is quite obvious.
Another conclusion is that data availability drives usage and creates an industry, from the satellite through all the stops in between: when the data is there and available, scientists and other users can do things with it, which is also quite obvious. There are many new challenges, with Digital Earth, with simulations, with monitoring, many new applications that need the data, the processing, the technology. Happily, the technology is here, with the cloudification model becoming omnipresent, and this is a good thing: it opens up the whole system to new players, it opens new possibilities. And out of these possibilities come new things: new storage, new processing, analysis, artificial intelligence, visualizations, many new methods and technologies that will bring Earth observation data applications doing things we cannot yet imagine. With that as the conclusion, let me quickly ask Daniel and Thomas: would you like to add something to these conclusions?

Very briefly, thank you. It is very interesting to see the science world coming with innovation, with analysis, with new ways of extracting information from the data; the ICT world, information technology, being able to sustain those analysis efforts with new technology; and, with the Copernicus programme, the assurance that there will be a flow of data. We have multi-year programming, and of course the budget for the next seven years for providing those data from Copernicus and preparing the next generation. Thank you.

Thank you, Daniel. Maybe just adding here a message I would like to get across and re-emphasize: the good news is that if a scientist is interested in a certain phenomenon in the world, the cost of access and data is less and less a barrier.
For scientists in Africa, for scientists in countries with fewer resources, as long as you have a good internet connection, the Copernicus programme, which is all open-access data, really helps researchers without big resources to do their research. This is probably not something that has been emphasized here, since a private company naturally emphasizes the technological challenges and opportunities, and they are huge, but they are always huge: they were huge if you asked a computer scientist 10 years ago, and they will be huge if you ask one in 10 years. Technology is always new and exciting, growing and developing fast. But again, there are some paradigm shifts in society that we now really see happening, based not only on the Earth observation information itself but on this wide access to Earth observation made available through these cloud-based infrastructures. Thank you very much.

Thank you very much, Thomas, this was very interesting. I know we are behind time and many of you have to leave, so we will skip the Q&A session, unless someone wants to stay with us beyond the end of the webinar, in which case we will of course be happy to answer. Let me just advertise a few upcoming events: there is the Digital Earth symposium organized by the University of Salzburg, by Thomas, which we think will be a great event to attend, and there are a few webinars of our own coming up. You will receive all this information with the slides, which will be available for all of you to use. So I would like to thank Thomas and Daniel, it was really great to have you in this webinar, and to thank all the participants; welcome to our following webinars, and of course we stay with you for the Q&A. Yes, I have one question from the audience that came to me through the chat, about
how to avoid multi-step pipelines at the architecture level. In a generic way this may be difficult, but there are situations where you can really avoid them, and one example is the organization of a storage system meant to deliver data to users. On one extreme is a tape-based long-term archive: to process data from that archive, you need to order the data from the tape library, the tape library makes it available in a temporary staging area, and the data needs to be decompressed, transferred to the user, and then processed. On the other hand, if the same data is reasonably often used and you store it in a directly accessible, uncompressed form, accessible through something like NFS or an object storage mounted via a file system, you avoid all the steps otherwise needed to access and process that data. This is just one example; many others could probably be given.

Okay, I see no new questions appearing in the chat. If you have any, please come to us and we will answer them offline; you are generally welcome to contact us and ask things. Thank you once again, Thomas and Daniel, for this webinar, it was great to be with you. Thank you to all the participants. Thank you, bye bye.
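The storage trade-off described in that answer can be illustrated with a small, self-contained sketch. It simulates both paths locally with standard-library tools (gzip standing in for the compressed archive copy, a memory-mapped plain file standing in for the directly accessible copy); the file names and sizes are invented for the illustration:

```python
import gzip
import mmap
import os
import tempfile

payload = b"granule-" * 1000  # stand-in for an EO product's bytes

workdir = tempfile.mkdtemp()
archived = os.path.join(workdir, "product.gz")   # "archive" copy
direct = os.path.join(workdir, "product.bin")    # directly accessible copy

with gzip.open(archived, "wb") as f:
    f.write(payload)   # compressed: nothing is readable until inflated
with open(direct, "wb") as f:
    f.write(payload)   # uncompressed: any byte range is readable in place

# Archive path: decompress the whole product, then take the slice we wanted.
with gzip.open(archived, "rb") as f:
    slice_from_archive = f.read()[4000:4008]

# Direct path: memory-map the file and read just those eight bytes.
with open(direct, "rb") as f, \
        mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
    slice_direct = m[4000:4008]

print(slice_from_archive == slice_direct)  # True: same bytes, fewer steps
```

The same bytes come out either way; the difference is that the direct-access layout removes the staging, decompression, and full-transfer steps from every read, which is exactly the pipeline-shortening the question asked about.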