COVID-19 is undoubtedly the news of the year, if not of the decade. While thousands of health workers are working tirelessly to help those who fall sick, data scientists are trying to explore the virus through data mining and machine learning. In this video series, we will show you how almost anyone can take the COVID data and explore it.

Let's start by getting the data from Johns Hopkins University's GitHub page. You can find the link in the description below. Just copy and paste it into the File widget's URL line. Press Enter and voila, the data is here. A quick glimpse into the Data Table shows us what the data looks like. Please note that this data is updated regularly, so your results will differ.

Now let us see the data. Line Plot is the perfect visualization for sequential data such as ours. The curves represent the countries, and the x-axis in our plot is the date. From this plot, we cannot really see anything, since we have so many countries and the lines are thin. Let us select a few lines by dragging across them and inspect them in the Data Table. Unsurprisingly, these are the US, Italy, Spain and so on: the countries with the highest number of COVID-19 patients.

Okay, we know which countries have the highest number of COVID patients. They're all over the news. Surely we can dig a little deeper. Orange comes with a handy Datasets widget, which happens to have an HDI data set available. HDI, or Human Development Index, is a data set with many country statistics. So let us add it to the COVID data. Place Datasets on the canvas, load HDI and use Merge Data to merge the two data sets. Make sure to match Country/Region from the COVID data with Country from the HDI.

Let us inspect the result. Data Table shows us there are some countries with missing HDI values. These were likely not matched properly, since their country names differ in the two data sets. To fix this, we will use Edit Domain and insert it between Datasets and Merge Data.
We first transform the country variable from string to categorical, which enables us to edit the country names. In the Data Table before, we saw that the United States was not matched correctly. So look for it in Edit Domain, double-click its name and change it to US. You can find all the other countries and change their names too. Press Apply once you're finished editing the data.

Great! Now we have brought in data such as population and the number of doctors per million. Let us construct a pipeline that will tell us how many COVID patients there are relative to the entire population. First, we will remove countries with an unknown population. I will connect Select Rows to Merge Data and set total population to be higher than zero. This will come in handy in the next step, where I will use Feature Constructor to compute the cases-to-population ratio. I will select the final date from the COVID data set. In my case, it is April 9th. Then I will divide this number by total population. We removed zero populations in the previous step to avoid dividing by zero and tearing a hole in the fabric of the universe.

Perfect! Now we have exactly what we need. Connect Scatter Plot to Feature Constructor. Select cases per million for the x-axis and physicians for the y-axis. This will show us how COVID spread is related to the number of doctors in the country. Let us label our points with country names to make sense of the plot. Apparently, Qatar, Cuba and Greece have a low number of COVID cases with a high number of doctors, which means their healthcare systems are in a good position to handle the crisis. Luckily, no one is in the lower right quadrant, but Luxembourg, Andorra and Iceland seem to have a worse cases-to-doctors ratio. Of course, we know nothing about the availability of respirators, protective equipment and so on. This is just an approximation of health system capability and should not be taken out of context.

We can inspect other variables too.
For example, the number of old people in the population, which is a variable called dependency ratio old age. It is easy to find a variable by opening the drop-down in the Scatter Plot and simply typing its name.

Now it is up to you to explore potentially interesting relations, or even bring in your own data, now that you know how Merge Data works. If you wish to dig even deeper into the COVID-19 data set, have a look at our blog, whose link you will find in the description below. And don't forget to follow us for more videos on analyzing COVID-19 data with Orange.
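
If you would rather see the idea behind the pipeline in code, here is a minimal sketch in plain Python. It is not what Orange runs internally; it only mirrors the steps from the video: rename mismatched countries (Edit Domain), match the two tables by country (Merge Data), drop countries without a positive population (Select Rows), and divide cases by population (Feature Constructor). The rows, the numbers, the field names `latest_cases` and `population_millions`, and the helper `merge_and_ratio` are all made up for illustration, and we assume population is given in millions.

```python
# Toy stand-ins for the two tables. All values are invented for
# illustration and are NOT real COVID-19 or HDI statistics.
covid = [
    {"Country/Region": "US", "latest_cases": 1000},
    {"Country/Region": "Italy", "latest_cases": 600},
    {"Country/Region": "Atlantis", "latest_cases": 5},  # has no HDI match
]
hdi = [
    {"Country": "United States", "population_millions": 100.0, "physicians": 2.5},
    {"Country": "Italy", "population_millions": 50.0, "physicians": 4.0},
]

# Edit Domain step: map HDI country names to the spelling used in the COVID data.
RENAMES = {"United States": "US"}


def merge_and_ratio(covid_rows, hdi_rows, renames):
    """Match rows by country name and compute cases per million people."""
    # Index the HDI rows by their (possibly renamed) country name.
    hdi_by_name = {renames.get(r["Country"], r["Country"]): r for r in hdi_rows}

    result = []
    for row in covid_rows:
        match = hdi_by_name.get(row["Country/Region"])
        # Select Rows step: skip countries with no match or no positive
        # population, so the division below can never divide by zero.
        if match is None or match["population_millions"] <= 0:
            continue
        # Feature Constructor step: cases divided by total population.
        result.append({
            "country": row["Country/Region"],
            "cases_per_million": row["latest_cases"] / match["population_millions"],
            "physicians": match["physicians"],
        })
    return result


table = merge_and_ratio(covid, hdi, RENAMES)
for entry in table:
    print(entry)
```

Without the `RENAMES` entry, the US row would find no partner in the HDI table and would be dropped, which is the same mismatch we spotted in the Data Table before adding Edit Domain.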