Hello everyone, welcome to the next webinar about Jupyter notebooks. My name is Stanisław Krzyżanowski and I will have the pleasure of introducing you, for the next 60 minutes, to Jupyter notebooks on CREODIAS. First, a couple of housekeeping rules. If you have any questions or remarks, or would like to share any thoughts with me or with the rest of the viewers, please type them in the chat. I will answer all the questions at the end of the presentation, in a short Q&A session. Secondly, today's presentation will consist of two parts. In the first part I will answer the question: what is CREODIAS, for those of you who are not familiar with it or would like to refresh their knowledge. Yes, there is a question whether the video will be available after the workshop. Yes, it will be available on CloudFerro's YouTube channel; you can find it from our website or by typing CloudFerro into the YouTube search engine. Coming back: during the first part I will answer the question of what CREODIAS is and what its most important components are, and I will try to show you how and why it is a very useful platform in this day and age. During the second part I will walk you through Jupyter notebooks on CREODIAS. I will show you what a notebook is, from the very beginning, guide you through how you can use it, and we will hopefully end up with a very simple example of data processing, let's say Sentinel data processing, using the notebook. Let us begin. First question: what is CREODIAS? CREODIAS is one of the five Copernicus DIAS platforms, four of which are commercial. A DIAS is a Data and Information Access Service: each DIAS, and CREODIAS in particular, is a cloud platform that brings processing to the data. What data, you may ask? I suppose you know, because you are here.
We host lots and lots of Earth observation data. Our main data source is the Copernicus program; we will come back to it later. But now let's focus on the major components of CREODIAS as a platform. The first very important part is the EO Data repository and catalogue. As I've mentioned, CREODIAS is a platform that hosts and disseminates Earth observation data, the majority of which is sourced from the Copernicus program, one of the biggest Earth observation data providers in the world. We are talking about an influx of data of around 25 terabytes daily. Currently we are at, I think, above 21 petabytes of Earth observation data hosted online on CREODIAS, and the EO Data repository and catalogue component is extremely important: it is our backbone when it comes to handling the data and disseminating it to the users. It consists of a couple of substantial elements. Firstly, we have the data repository, in which we keep the online products. Then we have the data catalogue, which stores information about the data: to access the data, you have to be able to search for it when we are talking about such massive quantities. We have a set of tools called ingestion engines, which are applications that automatically download and index new data into the system. And we have another set of tools for accessing the repository. We offer a wide variety of those tools, from NFS through HTTP, and we also host web map service-based applications, which I will talk about a bit later, via our cooperation with Sentinel Hub. This is the data component. The second important component is the cloud services component, because having only the data online would not be enough: to derive added value from Earth observation data, you usually have to interact not with one or two products, but with bigger sets of products.
That would require you to download the data and then to have the processing power. That's why we offer cloud services, placed within the same environment as the data repository. Our users can access the data immediately, process it, and then share it with other users; we have the cloud services that allow for that. Our cloud services are based on the OpenStack technology. We offer all the classical cloud services, such as computing (meaning virtual machines), networking, and security appliances: all the things you may require when building your own application and value chain on our platform. But that's not all. We have also built a set of tools, applications, and platforms to make it easier, and in some cases possible at all, for CREODIAS users to interact with the data. We have the CREODIAS portal, the EO Data Finder, the EO Browser, the Cloud Dashboard, which is the graphical management interface for the cloud services, and other third-party applications. Today we'll be talking about the Jupyter notebook, but I will also show you the EO Data Finder, so that you can compare different means of accessing and interacting with the data. I was talking a lot about the Earth observation data itself, so let's talk about it a bit more. Here you can see a very compact table listing some of the products that are available in the CREODIAS environment. As you can see, the very backbone of our data offer are the products produced by the Sentinel satellites. Depending on the type of the product and its processing level, they are either available fully online, meaning you can access them immediately from the environment, or they can be ordered by the user and retrieved from ESA's long-term archives. Additionally, apart from the Sentinel data, we also host the coverage of Europe from the Landsat satellites, and we have some additional satellites, such as Envisat.
We also recently started offering very high-resolution data from a number of satellite operators. If you're interested in that, you can check out that data on the CREODIAS portal; we'll go to the portal, so you will see what it looks like. Additionally, and not listed here because I wanted to keep the presentation short and switch to the hands-on part as quickly as possible, we also host a number of the Copernicus services on CREODIAS, I think four out of six. The Copernicus services are quite wide sets of data collections; those data are usually derived from satellite data, sometimes combined with in-situ data or with some processing performed on them. So this is our data offer, and it's wide. Currently, we are at over 21 petabytes of data and, as I mentioned, we are growing by 25 terabytes of data daily, on average of course. We are doing our best to keep all the data online, or as much of it as we possibly can, because that is the easiest for our users and makes the data the most accessible in all cases. And, as I mentioned when describing the platform itself, we have developed a number of user tools that help our users manage the environment, utilize the data, and leverage the cloud services that we offer. The main tools we have on CREODIAS are, first of all, the CREODIAS portal, which serves as an information and management hub. From the portal you can find links to all the other tools, our extensive FAQ and knowledge base, and a number of guides on how to use different parts of the environment. There is also a forum through which you can interact with our user support, which is very responsive and will gladly help you with any questions. Then we have the aforementioned Cloud Dashboard, a graphical user interface tool for managing your cloud resources.
As I mentioned, our cloud is based on OpenStack, so you can manage your cloud resources not only through the dashboard but also through the API, which is a standard OpenStack API, and through the command line interface. Then we have two important tools for data browsing and search. As you have heard a couple of times by now, we have lots of data, and to interact with it in an efficient manner, one needs the proper tools. First of all, we have the EO Browser, which serves as, let's say, the easiest tool for accessing the data. It allows users, even non-registered ones, to choose certain data products and visualize them on the map; it uses the Sentinel Hub engine, generating web map services. Then we have the more advanced EO Finder. This is our main tool for data search, and we will take advantage of the EO Finder API today, because we'll be looking for some specific products. And finally, we have the Jupyter Notebook, which is the topic of today's workshop. The Jupyter Notebook is available only to registered users; however, it is free of charge. It is hosted on our virtual machines and offers each user up to two gigabytes of RAM and up to two gigabytes of storage. It's probably the easiest and the first tool to use when you want to try the environment, get to know the platform and how it works, and when you want to actually learn how to process the data using different programming languages. I will now break my own housekeeping rules, because there is a question very much in line with what I'm talking about. Paulina Bartkovic is asking: is it possible to use the R language in JupyterHub? In fact, yes, the CREODIAS Jupyter notebooks support the R language. We'll not be talking about it today, but it's fully usable. So please follow me to the CREODIAS portal. As you can see, we have now ended up on the main site of the CREODIAS portal, which should be your hub for anything you need within the environment.
Today, we'll be talking about the Jupyter Notebook, so we will head over to the tools and then select JupyterHub. As you may remember, I mentioned that in order to use JupyterHub you have to be registered. I was pre-registered, so there was no registration step here, but should you try it, you'll have to set up an account and then log in. First of all, I would like to talk a bit about the notebook in general. What you can see here is the main page of the notebook server. We shall now create a new notebook. Today we'll work with Python 3, but as mentioned before, there is a number of programming languages available for the notebooks: you can also use the R language, and Julia is supported as well. So let us go into the notebook. What you can see here is a standard notebook interface. You can work in two modes in this interface: you can either interact with the cell itself (this single line is called a cell), or you can interact with the notebook. The difference is indicated by this colour here: when it's blue, you're interacting with the notebook; when it's green, you're interacting with the cell. We have two main types of cells in a Jupyter notebook: code cells and so-called markdown cells. Markdown cells are basically plain text with some formatting options; code cells are where you prepare your code to be executed. I will now very quickly show you what you can do with markdown. From my experience, it's very useful to learn a couple of shortcuts. The first shortcut is M: this way you change your cell type from code to markdown. You can also do it using this drop-down menu: this is code, this is markdown. As I mentioned, we have two cell types, code and markdown, and I'm using different formats to show you what you can do with markdown as a whole. In markdown, you can do a couple of things: you can make lists, you can bold text, you can italicize text.
You can also create code blocks; an example of a code block would look like this. As you can see, I'm now typing and formatting the text, and in a minute our notebook will process it. Now I'm showing you an example of a code block: we can do the most typical hello world. If you want the cell to be processed, you can either use the Run button or use a shortcut: you can press Ctrl+Enter. Then we have one cell ready with some description of what we are doing. The next useful shortcuts are for creating additional cells; you actually have two of them. If you want an additional cell below the one you are currently in, you use B; if you need a cell above it, you use A. Easy as can be. If you want to delete a cell, you use X. That was the markdown cell. Now we have a code cell, indicated by the brackets here. We'll once again use the most basic code to show you the basic principles of the Jupyter Notebook. We have created a variable x, and if we process the cell, it will say hello world. Additionally, we can create another cell with our B shortcut and ask the notebook to give us the output: in this cell I've asked the notebook what x is, and it tells me that x is a variable holding that string. Some more information about markdown: we can have bigger headers. Here I forgot to change the cell type, so I used my shortcut M to change it, and after processing I have the header. We can have smaller headers; now I will use the drop-down menu to make a smaller header, header 2, and if we process it, there it goes. There is much more to markdown and its means of formatting. If you need anything, I would suggest you just google "markdown cheat sheet" and you'll get a comprehensive list of all the formatting options available in markdown. It's very useful, and it's commonly used to make a notebook comprehensible and understandable for everyone.
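The variable demo above can be written down as a minimal Python sketch. The only Jupyter-specific behaviour it relies on is that a bare expression at the end of a cell is echoed back by the notebook:

```python
# A notebook cell assigning a variable; the kernel keeps it alive
# for every cell executed afterwards.
x = "hello world"
print(x)  # explicit print, as in the webinar's first code cell

# In a Jupyter cell, ending with a bare expression makes the notebook
# display its value, which is how "asking the notebook what x is" works.
# In a plain script this line evaluates silently and prints nothing.
x
```

In the notebook, running the second cell shows `'hello world'` with quotes, because Jupyter displays the value's repr rather than printing the string.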
Now I would like to talk a bit about kernels and variables in Jupyter notebooks, and how they behave. First of all, all the code cells are interconnected: they are only a front end, and behind them is one Python kernel. Should we ask for a list of the methods and variables of this Python session, we'll get a list with our x at the end, because we defined it previously. I will show you something. We can also define another variable: let's say we'll have a variable y, defined, let's say, as "goodbye world", and we'll process it. Now, if we perform the same action as in cell number 4, we get a list of variables including y. Now we come to an important point about keeping order in the notebook: you can add a cell above and actually delete one of the variables, if you like, and should you do it, the output of the next cell still shows the variable, because that cell was processed before the variable was deleted. That's why such a command should sit at the very bottom. I hope that's clear. Another useful trick for working in Jupyter notebooks is collapsing. Let's say we list a long result: now it prints numbers from 0 to 100 and starts to look basically unreadable, so the notebook allows you to collapse a certain cell's output and scroll through it; a neat little trick. But I promised that we'll be talking not only about collapsing outputs but also about kernels. The kernel is the programming language process that sits behind the notebook's front end, and you can do a couple of things with it. First of all, you can interrupt the kernel. Why would you want to interrupt the kernel? Because you don't want to crash your engine. Let's say that we give a command to print ones every half a second; this command would eventually crash our server. Not very quickly, but it would. So if we... I'm sorry, that's embarrassing. Of course, while True, I'm sorry.
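The point about execution order versus page order can be reproduced with a tiny sketch: cell output reflects the state at the time the cell ran, so deleting a variable afterwards does not erase earlier output, but re-running the earlier cell then fails.

```python
# Cell 1: define and show a variable.
y = "goodbye world"
print(y)  # this output stays on screen even after the del below

# Cell 2 (added later, e.g. above the variable listing in the webinar):
del y

# Re-running cell 1 now raises NameError, even though its old
# output is still visible in the notebook.
try:
    print(y)
except NameError:
    outcome = "y is no longer defined"
print(outcome)

# The runaway-cell demo from the talk looked roughly like this
# (deliberately not executed here; it only stops via Kernel -> Interrupt):
#   import time
#   while True:
#       print(1)
#       time.sleep(0.5)
```

This is why the speaker recommends keeping introspection commands at the very bottom of the notebook: their output is only guaranteed to match the kernel state at the moment they were run.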
So if we run that command, it starts printing ones every half a second, and it will never end on its own. This indicator is important: in the brackets you can see that the cell is still executing. Until a number appears in the brackets, the cell is still executing, and since we don't want it to run forever, we can select Kernel, Interrupt, and now it has stopped. It has printed quite a large number of ones in there, and we can collapse that output as well. Finally, the last thing I want to show you about Jupyter notebooks is that you can actually use bash terminal commands from the notebook level. How do you do it? You use the exclamation mark. Let's say we want to see where we are. We can see that in our current directory we have the untitled notebook, which is the one we are working in right now, and the TIFF Creator, which will be our example in a minute. We can also finally get to the Earth observation data at last. And now, this is important: this notebook is set up on a virtual machine that is connected directly to the EO Data repository, and that's why I will now list all the satellite data that is available from this notebook level. It will take a bit. I was hoping to list it; let's give it another try. Well, I will show you this in the terminal. With this, we have gone through the basic activities in the notebook. I would like to talk just a bit about the toolbar and what you can do with it. You can do all the basic things, such as saving, renaming, and making copies. You can download the notebook; the standard notebook format is here, and all the others require some conversion, but it's useful and it works. You can convert it to LaTeX and then share it, but the most useful way is to share it as a notebook. You can perform all the activities that we have performed using shortcuts
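In a notebook cell, a leading exclamation mark hands the rest of the line to the shell, so `!pwd` and `!ls /eodata` work exactly as in a terminal. A rough pure-Python equivalent, runnable outside Jupyter as well (the `/eodata` mount path is as shown in the webinar, and the guard is there because the mount only exists inside the CREODIAS environment):

```python
import os
import subprocess

# Equivalent of "!pwd": where are we?
cwd = os.getcwd()
print(cwd)

# Equivalent of "!ls": list the current directory, where the notebooks
# (e.g. Untitled.ipynb and the TIFF Creator) would appear.
entries = sorted(os.listdir("."))
print(entries)

# Equivalent of "!ls /eodata" on CREODIAS, guarded so the script also
# runs outside the environment where the mount does not exist:
if os.path.isdir("/eodata"):
    result = subprocess.run(["ls", "/eodata"], capture_output=True, text=True)
    print(result.stdout)  # satellite folders: Sentinel-1, Sentinel-2, ...
```

The `!` prefix runs each command in a fresh subshell, which is why a bare `!cd somewhere` does not change the notebook's working directory; for that, Jupyter's `%cd` magic or `os.chdir` is needed.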
You can also perform them using the tools from the Edit menu. You can toggle some things, such as line numbers; these are quality-of-life changes, and whether you use them is up to you, it's as easy as can be. You can manipulate the kernel: now I can actually restart the kernel, and this clears all the variables that we have defined. I can also restart and clear all the output, which has removed the output from the cells shown here. I have mentioned that shortcuts are useful; you can learn them from the Help menu, because there is a whole table of keyboard shortcuts. I wouldn't ask you to learn all of them, but some of them are really useful if you spend a bit more time in notebooks. Now we can exit this one. We have come back to the view of the directory in which we set up our notebook. As you can see, there is a difference in the indicators between those two notebooks: the TIFF Creator is black, and this one is bright green. Bright green means it's still running, that is, the kernel behind it is still running. We can turn it off by switching to the Running tab and shutting it down. So yeah, now it has been shut down. I promised you that I would show you evidence that this programming environment is connected to the EO Data repository, and I will show it to you using a terminal. Now we have opened the terminal. The main directory of the Earth observation data is called eodata, so if we go there, we can list the directory. As you can see, we have the satellites listed, and in each of those folders you have all the satellite imagery produced by that particular satellite. Today, in my short use case, which we will go into shortly, we will be working on the Sentinel-2 data. So let's go to the actual use case: what you can do with the notebook with regard to the data. I hope it's a bit more readable now. We have prepared a very short example of what you can do with the notebook to process the data. First of all, we shall select a single satellite image.
We will be working on a Sentinel-2 L1C image; it's a standard Sentinel-2 image in the visible range. But to find it, we have to interact with the CREODIAS EO Data Finder. The Finder is an engine that allows us to search for products according to some defined parameters. In our case, as you can see here, we shall query for imagery covering Warsaw during the last winter, in 2020, and we allow the cloud cover to be between 0 and 5%. We shall then list 10 of the products with the lowest cloud coverage and select the one with the lowest. So let us do it now. You can see we have listed the products, and, as I mentioned regarding collapsing and expanding of the results, we have listed the products and selected one. If you are familiar with the Sentinel-2 data structure, you can see it here, but you don't have to be familiar with it to recall that, as I mentioned, all our data sits in the EO Data repository, and the output of this cell is the path to a certain product. Now that we have selected it, we have to open it. For this case, we've decided that we'll open bands 2, 3, and 4 to create a close-to-visible image as a GeoTIFF. To do it, you have to familiarize yourself with the structure of Sentinel-2 data; it's available in ESA's documentation and pretty straightforward, and we use it here in order to be able to open the selected bands. So, using the product identifier, we open the bands we would like. It has been done. Then we shall create a GeoTIFF into which we will write our bands, to be later displayed as an image. This GeoTIFF will be saved in the parent directory of this notebook, so after we execute the aforementioned cell, we shall see a truecolor.tiff file in this directory. It will take approximately 30 seconds; as I mentioned, the computing resources assigned to a single notebook are a bit limited.
So if you start using Jupyter notebooks on CREODIAS and come to the point when you decide there are not enough resources for you, that you have done your maximum with what is given here, you can set up your own Jupyter notebook on virtual machines in our environment, in the CREODIAS cloud, and it will have the same access to the EO data as shown here. In the meantime, we have produced our GeoTIFF; as promised, here is the truecolor image file. So the last step of this very simple use case will be displaying the image. You can do it in many ways, using any number of tools; we shall use rasterio, a pretty simple library for this purpose. And it's done. We have selected an image of Warsaw and processed it: we have selected the product, opened it, saved it as a GeoTIFF, and now displayed it. And, what is probably most important, it was all done within the CREODIAS environment and it didn't require us to download the data anywhere, so we basically skipped the big obstacle of downloading lots of data. We have done it with one product, but it can be done with dozens of images; you can make time series with it. You can do it in the free-to-use notebooks, or using notebooks installed on virtual machines. It's a really powerful capability. This concludes the hands-on part of the workshop. Now I shall stop sharing my screen and we'll come back to the presentation. I'm sorry, that wasn't intended. Here we have the Sentinel-2 product structure, which is what I mentioned: when we were opening the particular bands, that's what we did. We went from the product, through the GRANULE folder to the tile folder, all the way down to the image data, and opened bands 2, 3, and 4. With this in the background, I can now answer your questions, because we still have 15 minutes to do so. We have a question: is the notebook online, to let us follow your example?
No, this notebook was not available online; however, it can be made available, and I'm sure we can share it with you after the presentation. Next question: what are the memory limits for Jupyter notebooks? For RAM usage, there is a strict limit of 2 gigabytes. As for storage, there is no strict limit per se; however, should we notice that too much storage is being used by a notebook, we will probably limit that user's storage. You can freely expect to be able to store a couple of gigabytes of data there; it's not a problem. Do you have any other questions regarding the Jupyter notebooks, what we have been talking about here, or any other matter with which you think I may be of help? Okay, there is another question: is there a way to import other data? Yes, there are a couple of ways to import data. You can upload the data into the notebook directly from your workstation. You can interact with external databases on the web; however, this ability might be limited depending on the database you select, but should you face any limitations, you can contact us and we can probably lift them. And you can interact with the notebook from your virtual machine within the CREODIAS environment: it is possible to set up a virtual machine that will store certain data sets and share them with your Jupyter notebook, in a manner similar to how we accessed the EO data. I can see that somebody is also typing; that was a thanks. You're welcome, it's my pleasure. There is another question: is it possible to analyze data for free, for example NDVI? If I understand the question correctly, yes: you can prepare a script that will calculate the NDVI of a Sentinel-2 product in the Jupyter notebook. It is fully doable. And the next question: is it possible to share the notebook with other users on CREODIAS?
Unfortunately, it is currently not possible to share notebooks; however, we are planning to create a shared space, or shared library, for CREODIAS users to share their work and notebooks as well. Thank you so much. Have a good day, and till the next time.