My name is Pilar and I come from Spain. It's sunny and beautiful in Spain, but my English is not perfect, so I will try to do my best. I work for the European Space Agency, and first of all I want to thank all of you for being here, because I know it's very difficult to choose between these four parallel sessions. It also gives me a clue that you are probably interested in astronomy, or at least that you like to watch the sky at night.

I'm going to show you a new tool, ESASky. It is a science-driven portal that allows us to study the universe in a way that has not been done before. The tool is aimed at professional astronomers, but it is also for people who just like to look at the sky. You know Google Sky, maybe? This is going to be a more complete tool, and I will explain why later. All of this is powered by Postgres, so that's why I'm here. I also have to thank the organizing committee for accepting this paper on astronomy, and from Europe, which is not so common.

Just to put you in context, what is the European Space Agency? I'm sure you all know NASA. The European Space Agency is like NASA, but in Europe. Actually, when people ask me where I work, they say, "okay, you work at NASA", and I say no, at ESA, because we call it ESA. So we say it is the European NASA, and then they understand better. What you see here is not a sketch; this is a castle, and this is where I work. Most countries in Europe participate in ESA, and it has different establishments around Europe. I work in the one in Madrid, which is called the European Space Astronomy Centre, ESAC.
This is where we do the processing for the different satellites, missions, and so on. To understand what we are going to see, we have to review some physics. I'm pretty sure you studied physics at some time in your life. Do you remember wavelengths? Good. And the electromagnetic spectrum? Very good. Why does it matter? Because it's very important to remember that humans can only see in the visible part of the spectrum, unless we are X-Men, and we are not. There are some animals that can also see in infrared, and if we take an X-ray plate we can see the bones, right? The same happens in the universe: we can observe objects at different wavelengths, and each wavelength lets us study different characteristics of objects such as galaxies.

For example, does anyone here know which galaxy this is? No clue? This is Andromeda, our nearest galaxy. If we look at Andromeda in radio, we can better study the outskirts of the galaxy. In infrared, we see the dust and the gas more clearly. In the visible, we see the stars in the arms of the galaxy, the ones about the size of our Sun. In the ultraviolet, we see the giant stars, the hot stars. And X-rays, the highest frequencies here, show us the high-energy objects. This tells us that here a lot of stars are being created.

What ESA does, like any other space agency, is launch satellites into space. NASA also has many telescopes on the ground, but ESA is more focused on sending satellites out to different points in space, because only from there can you observe, at the different wavelengths, the characteristics you want to see.
Many of them are collaborations between space agencies, because this is really, really expensive, of course. Do you know the Hubble telescope? Hubble is a collaboration between ESA and NASA. It has been flying for 25 years and has generated lots and lots of data. So this is Hubble, in the visible. There are many others. For example, Planck and Herschel are also collaborations with NASA. Planck is dedicated to studying the Big Bang, and Herschel to studying the dark energy. The last one, just launched this year, is LISA Pathfinder, a very successful satellite because it has demonstrated the existence of gravitational waves.

There are also other satellites. For example, when I started working at ESA, I was working as a DBA on Gaia, because with Gaia ESA entered the big data world: we are going to a petabyte of data. And with Euclid, which will be launched in four years, it will be a hundred times more, a hundred petabytes. So this is really a large amount of data, and the point is to analyze all of it across the different parts of the spectrum to see what we can learn from each.

So, what is the motivation for this tool? Inside ESA I work in the ESAC Science Data Centre, which is the archives team. The idea is that we get the data from all of these satellites and we have to archive all of it, because this is a public institution, and the return on all of this is data: catalogs and maps for the users, meaning scientific users and PhD students who will do a lot of things with this data.
So the idea is to have a tool that brings all of the archives together, and not only ESA's. The beta release, which is the one I can show you, was presented last October at ADASS, an astronomical conference, and other space agencies came to the team asking to put their data in this tool. The idea of this beta release is to show this multi-wavelength exploration and to allow single-target and multi-target searches. With what? With astronomical data. The ingredients are the maps, pre-processed adaptive maps called HiPS, which I will explain later and which are based on HEALPix, and of course the catalogs, which are the lists of sources: stars, asteroids, and so on. The important thing to remember is that all of this is real science data. But this is a sky, it's for astronomy data, so there is no planetary data.

So we have the goal, we have the ingredients, and now we need to cook it. But this is not going to be a piece of cake. To put you in context about the amount of data we manage in this version: we have more than 29 million sources and more than one billion observations. An observation is the picture that the satellite takes of a certain area, and this is also what is called a footprint.

Now I'm going to do the demo. You can always Google ESASky and this page will appear. This is another galaxy. Does anyone know this galaxy? This is the Whirlpool Galaxy. The Whirlpool Galaxy, also called M51, is a spiral galaxy that can be seen easily. It has a companion galaxy; this here is M51B. We see different things in the portal. First of all, we can drag the map, we can zoom in, we can zoom out. Here we see the coordinates and also the field of view.
For the objects, here we have a menu of skies ordered by wavelength and by the satellite that took the observations. This one is in the visible, but if we want to see it in the far infrared, we get the kind of image I showed you before, where we can study the dust. We clearly see these dark areas here. And if we look with XMM, which is an X-ray satellite, we see this big activity of the objects inside. So this lets us explore across multiple wavelengths.

It's also important to say that the technology behind all of these maps, HiPS, which I will explain later, was developed by the CDS, the Centre de Données astronomiques de Strasbourg. The first HiPS maps looked like this for X-rays, but after the ESA astronomers calibrated them, flattened the images and everything, they produced maps of very good quality. So the good thing is that this carries the stamp of ESA.

There are other things we can do in this tool. We can upload a list of targets, and we have the search field to go to each of them. Let's see them, for example, in another sky that will look nicer, the Digitized Sky Survey. You can play and put one after the other to see the differences. The names are resolved by a service called SIMBAD, which resolves object names to coordinates so that we can go to the different positions. The sky can also be shown in two kinds of coordinates, equatorial and galactic. And here we also see two histograms. Let's stop this.

For the last part of the demo, let's go, for example, to the galaxy M82, the Cigar Galaxy. Every time we open this panel, we see two kinds of histograms: the ESA observations and the ESA catalogs.
For ESA observations, you have seen that it's quite fast, which is good. We apply a strategy to count the number of observations and the number of sources, because this histogram panel is project agnostic: it is based on the wavelength, not on the project. Then if you click, of course, you go to the mission. In space we call them missions, not projects, but it's the same thing.

So what is the idea? If we look for observations, I can click, for example, on X-rays, and the footprints, that is, the observations for an instrument, will appear. Let's go. We can see the number of observations that the satellites have made of this object. We can also add the ones in radio, and they will appear as well across this field of view. Then we have a list of these footprints, and if we click on one of them, we go directly to the archive for that specific mission, where the image and more data are shown.

Now I can switch these off and show you the catalogs. If I zoom in and in, for example in Hubble, we have a mosaic composed of the different maps, and then we can see the sources. I don't know if you can see a very small one here. Sorry, this was for INTEGRAL, but if we go to Hubble, we will see many more. Well, it says I have to zoom in closer to show all of them, because there are too many results. We had to put in an algorithm that orders the sources by brightness and limits the number shown; otherwise it would be very difficult to retrieve all this information in a reasonable time.

Finally, if we go, for example, to Planck and we zoom all the way out, we see the whole sphere onto which the universe is mapped. We can go around.
We can go to the galactic plane as well, and if we are zoomed out enough, like now, we can see all the footprints that have been made through the whole life of the satellite. The same thing for Hubble, for example, and this tells us how the sky is covered. But this is too much information; it couldn't be done with the database. Actually, to make this map we use something called the Multi-Order Coverage map. And then we could click on every part. So I hope you think it's a very interesting tool and that you will use it sometimes. We tend to play with it for more time than we should, but it really gives a lot of information, and it's important because it's something new.

So, technically speaking, how do we do this? Sorry, this is the slide I want. Okay, first the maps. As I said, they are based on HiPS, the Hierarchical Progressive Survey, which is based on the HEALPix tessellation. I don't know if any of you here have worked with HEALPix. The idea is that we divide the sphere into 12 equal-area diamonds, and each of them is divided into four more. So at level zero we have these 12 diamonds, at level one we have 48, at level two 192, and so on. Every time we go into one of them, the resolution increases and this gives us more information. That's why they are progressive maps. The same thing can also be applied to geospatial data, and it's being used more and more. The HiPS technology is an International Virtual Observatory Alliance note. This alliance is like the W3C, so it follows standards, and you can go and read this note.

So HiPS are very good, as you have seen. But, as Shakira says, hips don't lie. Well, they do, actually. Why?
Because, as we saw in the demo, we can see the CCD borders; we have duplicated sources, because the astrometry does not match the same source perfectly between two maps and it has to be calibrated again; or we have different calibrations and spatial types. You have to remember that the images are taken by the telescope with CCDs, like in a reflex camera, so it's photography, right? So the image has to be flattened to get good quality.

Now, how much data is there? For Hubble, as I said, we have the largest number of footprints, of observations, per instrument. This is broken down into the number of polygons per observation and the total number of polygons, so you can get a feel for how much it is. And the same for catalogs. Every observation has a number of sources, but different observations can contain the same sources, which is why the numbers are not exactly the same. As I said, for Hubble we have more than 29 million sources.

As I told you before with the histograms, when the field of view is very large we apply what we call a count strategy. To make the bars update very fast, we pre-compute the number of sources, and if the field of view is very small, we just run the SELECT COUNT against the database.

So yes, we use PostgreSQL, and we are very thankful for it, because we like open source and this is the open-source database we chose in the archives team. Most of the archives are using Postgres. There are some old archives that were using Sybase. For processing, other databases have been used at ESA, like Oracle, and then, for Gaia for example, InterSystems Caché, but the scientific data model runs in Postgres. It covers what we need, and this is good for us.

So, what does our architecture look like? It is a typical three-layer one. We have a web browser, as you have seen.
Then we have something called the Table Access Protocol server, TAP, which is also a Virtual Observatory standard. It accepts requests written in ADQL, the Astronomical Data Query Language, which is very similar to SQL. It sends the request either to the Postgres database, which contains all the metadata, or to a data distribution system that fetches the files from the different archives. The HiPS maps are also redundant now and, with our load balancer, reside on different servers.

On the front end, what do we have? HTML5 and CSS3, compiled with Google Web Toolkit, with a wrapper for Aladin Lite. Aladin is the tool for visualizing astronomical data made by the CDS, the Centre de Données astronomiques de Strasbourg. We also use Highcharts for data visualization. We use the International Virtual Observatory Alliance protocols because we like standards and want to follow them as much as possible. So we use TAP, the Table Access Protocol, and STC, the Space-Time Coordinates, to describe the complex fields of view that we have seen. As I said before, for the search field we use SIMBAD to resolve names to coordinates and angular sizes.

On the back end, we have an HTTP server to serve the HiPS requests and a Tomcat to resolve the other requests and also to help with the metadata. The database is Postgres. I will go deeper into this now, with the foreign data wrappers and the materialized views. We use spherical data type libraries, pgSphere and Q3C, to resolve all the polygons, searches, and everything. We are now investigating PostGIS for resolving ephemerides and time series, which we are going to add in the next release together with some solar system objects. For the footprints, we use the spherical data types.

What is pretty amazing about this beta version is that it only started in 2014.
There were only two people working on it: one developer and one looking after the HiPS maps. Then, as we are in the archives team, we decided to go horizontal. I am the expert on databases and work across the different archives, other people bring their technology, such as the Google Web Toolkit, and so on, and each of us dedicates only part of our time to this tool. But it turned out very well.

For this beta release we have only one node, but it's quite a good machine, with 512 gigabytes of RAM and big storage. Actually, we don't need that much storage so far, but the idea is that it's going to grow. We requested the machine and they gave it to us, so that's fine for us. The PostgreSQL release was 9.4.3, because we wanted to use materialized views with concurrent refresh. In development we had a virtual machine, so we cannot run benchmark tests on the development machine because it has nothing to do with the operational one; we have to run them on the operational one. In the new release this will be different, because this has changed.

The PostgreSQL extensions we use are, as I said, Q3C and pgSphere, for the GiST indexing and the support for spherical objects. For this release we are using pgSphere 1.1.1, but Bartunov told me yesterday that 1.1.5 is out now and that it's way, way faster, so we are going to test it as soon as I'm back, because he said it will help us a lot. And then pg_healpix, of course, for the count strategy.

Just to explain the count strategy: what we do is divide these density maps into areas and pre-compute the number of sources in each area. Then, as I said, if the field of view is larger than three degrees, we go directly to these pre-computed values. They are built like this. We create a HEALPix ID for the area. Then we index it.
And then we populate a table for querying the counts. It's very difficult to see the results here, but for each HEALPix index we have the coordinates of the area and then the number of sources in that area.

A lot of improvement came from the materialized views and the foreign data wrappers. At the beginning, all the data from the different catalogs was being ingested from the archives into the ESASky database, and then all the complex queries were run in the ESASky database. This was really slow. I came with my Oracle mind: DB links, materialized views; Postgres has to have the same, I was sure. So we found a way to use the PostgreSQL foreign data wrappers, which also allow us to work with archives in other databases like MySQL and Oracle. Then we created the materialized views, with indexes on the materialized views. You can see the indexes here, and here we created the ones for HEALPix.

But there are still some queries to resolve. Some queries take longer than is acceptable for a web portal, more than five seconds, and some are even much longer than that. We need to find a way to resolve these kinds of queries, so there is a lot of work still pending.

For performance, we use pgbench and PgTune. pgbench was run last August. Our final release will be in two months at most, so we are going to run it again. And with everything I have seen this morning, pgBadger and pgCluu, I think we will do more work on this. What I showed you is the beta release. The final release, as I said, is going to open next month to 200 users, the XMM people. There will be more in June, and so on, until it's public in July or so. And now we have two nodes of the same capacity to try to get high availability.
Because what we cannot accept is downtime: if this is not good enough, people will not use it, and astronomers are very picky. They will say, okay, this is not working, so we are not going to use your tool. So we need the uptime to be as high as possible. We have moved to 9.5.2, and the database size, as you see, is really small, 54 gigabytes, although we expect it to grow in five years with the Gaia catalogs, which are 50 gigabytes. We were thinking, this is so small that we could also put everything in memory. So there are different tests to perform.

One of our questions, and maybe you can recommend that we go one way or the other, is that we have been thinking of using pgpool-II replication. We haven't had time to test it so far, but pgpool would be in charge of the load balancing and the replication itself. What we have tried is hot standby. It was very easy to set up, easier even than Oracle Data Guard, and it works very well. For Gaia, it's already working in the archive. Here we set it up just as a test between one operational node and the development node, and that worked pretty well. For the load balancer, we were thinking about putting in an F5 to do the job. But as we don't have time and we have to release for the XMM people in one month, what we will do is have two application servers, each going to its own database node. So whenever we have to update any data, we do it in parallel on both. It's true that the updates are not so frequent, but they happen, because the data is always being reprocessed, so the catalogs change and have releases as well. If you think one solution is better than the other, or that we should try some other possibility, please let me know. I will be here today and tomorrow.
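Going back to the count strategy described earlier, the logic can be sketched in a few lines of Python. This is an illustration only, not the ESASky code: the function names, the dictionary standing in for the pre-computed counts table, and the `exact_count` callback standing in for a `SELECT COUNT(*)` query are all assumptions. Only the three-degree threshold and the 12, 48, 192 progression of HEALPix areas come from the talk.

```python
from typing import Callable, Dict, Iterable, Tuple

# Threshold quoted in the talk: above this field of view, use pre-computed counts.
FOV_THRESHOLD_DEG = 3.0


def healpix_area_count(level: int) -> int:
    """Number of equal-area HEALPix diamonds at a level: 12 at level 0, then x4 per level."""
    if level < 0:
        raise ValueError("level must be non-negative")
    return 12 * 4 ** level


def count_sources(
    fov_deg: float,
    covering_areas: Iterable[int],
    precomputed: Dict[int, int],
    exact_count: Callable[[], int],
) -> Tuple[int, str]:
    """Return (count, strategy used).

    covering_areas: hypothetical HEALPix IDs of the areas overlapping the view,
                    assumed to be computed elsewhere.
    precomputed:    stand-in for the table of pre-computed counts per HEALPix ID.
    exact_count:    stand-in for running SELECT COUNT(*) against the database.
    """
    if fov_deg > FOV_THRESHOLD_DEG:
        # Large view: sum the pre-computed counts instead of scanning sources.
        total = sum(precomputed.get(area_id, 0) for area_id in covering_areas)
        return total, "precomputed"
    # Small view: counting directly is cheap enough.
    return exact_count(), "exact"


if __name__ == "__main__":
    for level in range(3):
        print(level, healpix_area_count(level))
    demo_counts = {10: 1200, 11: 800}  # made-up numbers for illustration
    print(count_sources(10.0, [10, 11], demo_counts, lambda: 0))
    print(count_sources(0.5, [10], demo_counts, lambda: 37))
```

The point of the design is that the large-view path never touches the source tables, so the histogram bars can update quickly however many sources fall inside the view.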
As for the release timeline, as I said, the final release will be in mid-2016. Then we will continue with the technology roadmap. In this first release we will have the high availability and the load balancing. Then we will go for time series, solar system objects, and virtual reality; there is a nice project going on about that. For the integration roadmap, you have seen these nine catalogs, but the number will grow a lot. In the next version we will also add some from the Japanese space agency, JAXA, the Suzaku catalog, and more will come.

In conclusion, ESASky is a web tool for visualizing all the ESA space science data, and that of other agencies, from a single interface, a single entry point, which is the important thing. It is built on top of the ESA science archives, so this is real science data, and it gives access to the final, best data products from the ESA space science missions through the Virtual Observatory protocols. And all of this, of course, using PostgreSQL.

I learned last week that Bruce Springsteen is playing this Saturday here at the Barclays Center in Brooklyn, and I'm a big fan of his, so I decided to finish this talk by saying that we will continue "all that heaven will allow". Thank you very much. If you have any questions, your feedback is really appreciated. Please put it here, tweet it if you liked it or have any suggestions, or send it directly to me; this is my email address. Thank you all.
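For readers who want to see what a request through the Virtual Observatory protocols mentioned in the talk could look like, here is a minimal sketch that builds a synchronous TAP request carrying an ADQL cone search. The base URL, the table name, and the `ra` and `dec` column names are hypothetical placeholders, not the real ESASky schema. The `REQUEST`, `LANG`, `FORMAT`, and `QUERY` parameters and the `CONTAINS`, `POINT`, and `CIRCLE` functions come from the TAP and ADQL standards. The coordinates are approximately those of M82, the galaxy used in the demo. The sketch only builds the URL and does not contact any server.

```python
from urllib.parse import urlencode


def build_cone_search_adql(table: str, ra_deg: float, dec_deg: float,
                           radius_deg: float, limit: int = 100) -> str:
    """Build an ADQL cone search; assumes the table has columns named ra and dec."""
    return (
        f"SELECT TOP {int(limit)} * FROM {table} "
        f"WHERE 1 = CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))"
    )


def build_tap_sync_url(base_url: str, adql: str) -> str:
    """Build the URL of a synchronous TAP query (standard /sync endpoint)."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "FORMAT": "votable",
        "QUERY": adql,
    }
    return f"{base_url.rstrip('/')}/sync?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical service URL and table name, for illustration only.
    adql = build_cone_search_adql("example_schema.example_catalog", 148.97, 69.68, 0.1)
    print(adql)
    print(build_tap_sync_url("https://example.org/tap", adql))
```

A TAP server such as the one described in the talk would translate the ADQL geometry functions into the spherical operators of the underlying database, which is where libraries like pgSphere or Q3C come in.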