My name is Francesco. Today I will talk about climate change, and about handling climate data in Python with xarray and Dask.

This is the XKCD comic about climate change. It is an excellent picture of the Earth's temperature over the last 20,000 years, compared with the 1961-1990 average. We can see that over those 20,000 years the temperature rose by about 4°C, while in the last decades it has risen by about 1°C. Scientists use models to project how the situation will evolve, and the comic shows three emission scenarios. At first glance the scenarios may look similar, but even the optimistic one, with immediate action, shows a rise comparable to what happened over the previous 20,000 years.

To monitor the situation we have a lot of data: satellite observations and in-situ observations, such as ground station observations and radiosondes. All this data is combined with models to produce global gridded datasets. The most important gridded data is the reanalysis: a meteorological model run over the past, constrained by the observations. ERA-Interim is the reanalysis produced by ECMWF, covering the period from 1979 onwards. The same meteorological models are also used for forecasts, from about ten days ahead up to a couple of seasons ahead.

Here I come back to the scenarios that we already spoke about in the first slide. I represent three scenarios in this chart, and we can see the same situation as in the first slide.

Let's look at the size of this data. For the reanalysis data from ECMWF we have a spatial resolution of 80 km by 80 km, and a vertical coverage up to 64 km on 60 levels. The temporal coverage goes from 1979 to today, with around 200 variables and a temporal resolution of up to four times a day. So in total we are talking about roughly 100 terabytes of data, and we need the right tools to work with something this size.

Python is very widely used in scientific and numerical computing, and xarray adds labels, in the form of dimension names and coordinates, on top of raw numerical arrays.

There is a lot of climate data distributed by ECMWF at this link, and we can download it from ECMWF through a Python interface, as shown in this slide. In this case, we selected two variables, the two-metre temperature and the total precipitation rate, from the ERA-Interim dataset, for the period from 1979 to 2016.

Once we have the NetCDF file, we can open it with xarray as a Dataset. An xarray Dataset is the xarray multi-dimensional equivalent of a pandas DataFrame. If we print the dataset, we can see that an xarray Dataset has coordinates, which are latitude, longitude and time in this case, and data variables, which here are precipitation and temperature. Every data variable depends on all the coordinates; the dependencies are shown in the parentheses: (time, lat, lon) for tas, which is the temperature at the surface, and for tp-rate, which is the total precipitation rate.

We can select a variable from this dataset, and we get an xarray DataArray, which is an implementation of a labeled multi-dimensional array. Now we can see the attributes related to the data, which in this case are the standard name, air_temperature, the long name, two metre temperature, and the units, kelvin.

Handling the data is very simple. The xarray Dataset and DataArray have all the methods and attributes of NumPy, and many more methods to perform operations on data. One of the simplest is selection: we can use the name of a coordinate to select. In this case I selected January 2016, and now we have a map, because this is monthly data.
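As a concrete illustration of the retrieval and selection steps just described, here is a minimal sketch. It is not the exact code from the slides: the request values (stream, parameter code, dates), the file name, and the NetCDF variable name (assumed here to be t2m, while the slide reads it as tas) are assumptions to be checked against the MARS documentation and the actual downloaded file.

```python
from ecmwfapi import ECMWFDataServer  # ECMWF Web API client
import xarray as xr

# Retrieve ERA-Interim monthly-mean 2 m temperature as NetCDF.
# All request values below are illustrative assumptions; check the MARS
# documentation for the exact codes (167.128 is the GRIB code usually
# listed for 2 m temperature).
server = ECMWFDataServer()
server.retrieve({
    "class": "ei",
    "dataset": "interim",
    "stream": "moda",              # monthly means of daily means (assumed)
    "type": "an",
    "levtype": "sfc",
    "param": "167.128",            # 2 m temperature (assumed code)
    "date": "20160101/20160201",   # one date per month for monthly data
    "format": "netcdf",
    "target": "era_interim_t2m.nc",
})

# Open the file as an xarray Dataset, the multi-dimensional equivalent
# of a pandas DataFrame: coordinates plus data variables.
ds = xr.open_dataset("era_interim_t2m.nc")
print(ds)

# Selecting a variable gives a labeled N-dimensional DataArray,
# with attributes such as standard_name, long_name, and units.
t2m = ds["t2m"]
print(t2m.attrs)

# Label-based selection: picking one timestamp leaves a (lat, lon) map.
jan_2016 = t2m.sel(time="2016-01-01")
```

The label-based sel call is what replaces positional indexing here: the axis order of the underlying array never needs to be known.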
If we select one month, our data depends only on latitude and longitude, so we have a map. If we use the plot method of xarray, it recognizes that the data depends only on lat and lon, and it will automatically plot a map. We can also select a point, using lat and lon; in this case we have to specify the method used to interpolate the data, and I chose the simplest, nearest neighbour. Now the data depends only on time, so we have a time series, and xarray will plot a time series.

We can also perform more complex operations on the data, like computing a climatology. The climatology is essentially the mean annual cycle. Starting from the time series, we can use the groupby method to group the data by month, and then perform the mean over time. xarray will create a new coordinate called month, and if we plot it we get the annual cycle, where we can see the difference between winter and summer.

The integration of xarray with Dask is crucial to handle large data such as climate data. Dask divides the array into many small pieces called chunks, each of which is presumed to be small enough to fit into memory. Unlike NumPy, which has eager evaluation, operations on Dask arrays are lazy: operations queue up a series of tasks mapped over blocks, and no computation is performed until you actually ask for values to be computed, for example to print the data, to plot it, or to save it to disk. At that point the data is loaded into memory and the computation proceeds. The actual computation is controlled by a multiprocessing or thread pool, which allows Dask to take full advantage of multiple processors.

To open a dataset using Dask, we just have to add the chunks keyword to the open_dataset function, and specify over which coordinates we want to chunk. In this case I selected lat and lon: Dask will create a chunk every 200 values of latitude and 200 values of longitude, and since time does not appear, only one chunk will be used along that dimension. We can represent the workflow that Dask built by calling .data.dask and using the dot_graph function from Dask. This is the NetCDF file, and these are the chunks that Dask creates to open the dataset. If we select an exact latitude and longitude, Dask will import only the chunk that contains it, and the others will not be imported. In this way we have a huge saving of memory, because Dask loads only one chunk into memory to perform the computation we want (see the sketch below).

The Copernicus programme is the world's largest Earth observation programme, directed by the European Commission in partnership with ESA. It aims at achieving a global and continuous Earth observation capacity. In this context, the Climate Data Store will be a central component of the Copernicus Climate Change Service, and it will provide climate information on the past, present, and future, in terms of essential climate variables and derived climate indicators. The Climate Data Store will be a distributed system, and it will simplify access to climate data through a unified web interface. It will contain observations, reanalyses, climate projections, and seasonal forecasts. It will provide a software platform called the Toolbox that will allow users to develop applications using all the information in the CDS, the Climate Data Store. The services are designed to meet the needs of several types of users, like policy makers, experts, and scientists. As B-Open Solutions, we are in charge of the development of the CDS Toolbox.
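Here is a minimal sketch of the climatology computation and the chunked opening described above, assuming a local ERA-Interim file; the file name, variable name, coordinate values, and chunk sizes are illustrative:

```python
import xarray as xr

# Point time series via nearest-neighbour selection, then a climatology:
# group by calendar month and average over time. xarray creates a new
# "month" coordinate with 12 values (the mean annual cycle).
ds = xr.open_dataset("era_interim_t2m.nc")
point = ds["t2m"].sel(latitude=46.0, longitude=13.0, method="nearest")
climatology = point.groupby("time.month").mean("time")
climatology.plot()  # 12-value annual cycle: winter vs. summer

# Dask-backed opening: the chunks keyword splits the arrays into blocks
# of 200 x 200 grid points; time is left as a single chunk. All
# operations on the result stay lazy until values are actually needed.
ds_chunked = xr.open_dataset(
    "era_interim_t2m.nc",
    chunks={"latitude": 200, "longitude": 200},
)
series = ds_chunked["t2m"].sel(latitude=46.0, longitude=13.0, method="nearest")
values = series.compute()  # only the chunks covering this point are loaded
# The underlying task graph can be inspected through series.data.dask.
```

The key design point is laziness: building `series` costs nothing, and only the final `compute()` touches the disk, reading just the chunks the selection needs.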
This Toolbox has essentially three types of users: the developers, who are the current and future developers of the system; the experts, who are experts in analyzing climate data and will actually use the Toolbox to create and publish custom climate applications, using xarray and the custom climate tools that we developed; and the end users of the web applications developed by the experts. The end users don't interact with the Toolbox directly, but only through the applications developed by the experts. The applications are submitted to the compute servers provided by ECMWF.

This is the expert interface for developing applications. On the left we have the resources from the CDS: a list of example apps and user apps, climate data, and documentation. On the right there is the editor where the experts actually write applications, Python applications using xarray and the custom climate tools provided by the CDS and developed by us. Below there is a preview of the end-user application, and if we run an application, it is submitted to the servers and the results are shown below.

Let's see a typical end-user application. For instance, this application investigates the impact of climate change on wine production. We used the ERA-Interim data for the past and the climate projections for the future. We can see that this is the situation in 1979-1986. Every colour corresponds to the optimal conditions for the growth of a group of grape vines, so we have different colours: in the north we have champagne, the red is the typical wine that we produce in Italy, and the rose is the Sicilian wine. If we move forward from 1986 towards the present, we can see that the areas are moving north; this is today's situation, and if we go into the future, the movement continues. This is the first part, between 2016-2026, and we can go on over the next 100 years. This is the optimistic scenario that I discussed in the first slide, and we can see that the areas keep moving to the north: in the last period there is a huge production of wine from France and also from Germany, and champagne will be produced by England. Thank you for your attention.

We have plenty of time for questions. Thank you for your talk, Francesco. Thank you. I use pandas and NumPy quite a lot. Could you give a little bit more information about the differences between xarray and NumPy? What are the key differences? Sorry, can you speak louder? Can you tell me a little bit more about the key differences between xarray and NumPy? xarray has many more methods than NumPy. The main difference is that with NumPy you can only select data by positional indexing. In xarray we have indexes, so you can have an array with a large number of dimensions, select easily, and forget about the order of the axes of your array: you simply select by label. We also have a lot of methods to perform operations on indexes more easily in xarray, and the integration with Dask optimizes these operations. NumPy does not come with Dask; you can use Dask arrays, which are very similar to NumPy arrays, but xarray has this integration built in. (There is a short example of this difference at the end of this transcript.) Thank you. More questions?

Thanks for your talk. As far as I understand, Dask aims to be a sort of distributed computation framework, right? Yeah. Have you thought of using, for instance, Spark instead of Dask? Or can you tell us the advantages of using Dask with respect to Spark? Thanks.
Dask has integration with xarray out of the box, so we use only Dask and we are developing with Dask. Maybe we will try Spark, but we are pretty happy with Dask.

Thanks for your talk. It's really good to see climate change in one of our presentations. I just wanted to ask: how accessible is the data to casual users like myself, who may want to go in and do some experimentation? ECMWF provides some free data on MARS; I have the link, this is the link. Not all the data from ECMWF is free, but the reanalysis data is, and I think you just have to have an account with ECMWF. MARS is the environment used to download the data. More questions?

This is actually a comment on the previous answer; I'm working with Francesco. The data is available, both the reanalysis, which is what we know about the past, and the projections, which are the scenarios we think will happen in the future. They are hard to use, first because of their large size, and then because the datasets are very different from one another, even in things that are nominally the same: for example, for the surface temperature you need to know how it is called and how it is measured; there are a lot of subtle differences. This is the reason the Copernicus programme is trying to give free access to the data and to provide a kind of homogenization layer, so that you can easily move from one dataset to another produced by someone else, even a completely different one. Not really as a casual user, because these are complex things to do, but as a researcher it should be easy once the service is operational, which should be partly later this year and completely at the beginning of next year. All of this treasure of data should then be much more accessible than it is today.

I've got one more question: how large is the computational infrastructure you need to analyze this data, in terms of numbers of servers and hard disks? The actual infrastructure is not ready yet; right now we, the developers, are its only users, so I don't know what it will be in the future. But right now, how many machines do you need to generate those plots, for instance? No, no, no, this is a normal machine, our laptop. Excellent. Are there any more questions? Then let's thank Francesco again. Thank you.
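To make the indexing difference from the first question concrete, here is a minimal, self-contained comparison; the array shape, dimension names, and labels are made up for the example:

```python
import numpy as np
import xarray as xr

# Plain NumPy: selection is positional, so you must remember the axis order.
data = np.random.rand(2, 3)      # e.g. (time, location)
first_time_step = data[0, :]     # axis 0 must be known to be time

# xarray: the same array with named dimensions and coordinate labels.
da = xr.DataArray(
    data,
    dims=("time", "location"),
    coords={"time": ["2016-01", "2016-02"], "location": ["A", "B", "C"]},
)
# Select by label, without remembering the axis order.
first_time_step = da.sel(time="2016-01")
location_b = da.sel(location="B")
```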