Good morning, good afternoon, good evening, wherever you are, and welcome to the AI for Good Global Summit: all year, always online. I'm Charlotte Kan from the ITU, the International Telecommunication Union. Like most of the world, the AI for Good Global Summit has gone digital with weekly online programming, allowing us to reach even more people in 2020. AI for Good Perspectives offer expert insights, global visions and shared solutions for the AI for Good global community. Today we bring you the first episode of ETRI's AI for Making a Better Tomorrow, on the subject of spatio-temporal algal bloom prediction using deep learning. ETRI is the national AI research institute of the Republic of Korea, so let's take a look at this short video about ETRI.

Three revolutions changed human history. And now, another great revolution is about to transform our daily lives. AI: it's the age when imagination becomes reality. A new era in which everything is connected, where artificial intelligence becomes commonplace and where a reality beyond the bounds of human imagination is realized. We at ETRI are preparing for this new era. Faster than the changing world, we at ETRI are leading the core technologies of the fourth industrial revolution through innovative research and development that reaches beyond everyone's expectations. We are creating a better world. ETRI, the National Intelligence Transformation Research Institute.

There are three different meanings, or interpretations, of AI from my own experience. First, many people consider AI a technology, specifically a kind of software or information technology. The second interpretation is AI services. Based on AI technology, together with computing power and network technology, we can invent and provide useful AI services. One example is the Go match between the AI AlphaGo and a real person, Lee Sedol.
Many people regard this event as AI technology, but when this AI service was actually implemented, Google had a large computing system behind it, and a high-speed network linked that AI computer with the venue, the hotel where the game was held. That means that to implement an AI service, you need AI algorithms and data, and also supercomputing infrastructure and a high-speed network. This is the second interpretation for me. And finally, AI is a paradigm that has a great impact not only on technology but also on society and the environment. So for many politicians, including our president when he invited some Japanese businessmen, the first is AI, the second is AI, and the third is AI. In this case, AI is the paradigm. So I can give you three different meanings, or interpretations, of AI in this context.

Now, let me hand over to our expert host, Dr. Miran Choi. Dr. Choi is principal researcher and standardization specialist at ETRI. Dr. Choi.

Thank you, Charlotte. Ladies and gentlemen, I'm Miran Choi from ETRI, Korea. Thank you very much for joining us in our Perspectives session. The first episode is on the subject of spatio-temporal algal bloom prediction using deep learning. This project uses deep learning to predict algal blooms a week in advance. That makes it possible to suppress an algal bloom earlier and efficiently, at low cost. Currently, this project has been applied to Daecheong Lake in Korea. Next, let me introduce the speaker of this session. Dr. Ji-yeong Kim is the managing director of the smart data research section at ETRI, leading one of the AI projects. He received the MS degree from Seoul National University in Korea, and his research topics include deep data analysis, machine learning, and IoT. Now, let's watch the presentation.

Hi, everyone. My name is Ji-yeong Kim.
I'm leading the research project on spatio-temporal algal bloom prediction using deep learning. Before this presentation, it will be better to play the introduction video. After that, I will explain our research in more technical depth.

Daecheong Lake contains pristine nature and has a mysterious view of the cliffs. The place where migratory birds, winds, and people took a break turned green. The cause is blue-green algae, due to climate change, global warming, drought, and so on. Algal blooms have spread across the country. The algal blooms are threatening us with dangerous toxic substances. Algal blooms have already emerged as a serious global environmental problem. The biggest worry for our people is the safety of drinking water sources. To handle this, the Korean government has established new technology policies. ETRI is developing, by 2022, spatio-temporal AI blue-green algae prediction technology based on direct-reading water quality complex sensors and hyperspectral images. There are two key technologies here. First, real-time high-density algae measurement. Currently, it takes two days from sampling to final results for algae monitoring. ETRI aims to measure in real time using direct-reading probe sensors. We increase measurement density in two ways: fixed sensors mounted on buoys and moving sensors mounted on water drones. The big data on water quality is collected from the fixed and moving sensors and hyperspectral imaging systems. By analyzing this big data, we improve the accuracy of algal bloom prediction. Previously, it was difficult to predict or analyze algae due to the lack of data. Now, we can make accurate predictions of algal blooms using deep learning. At ETRI, we aim for 90% accuracy in predicting algal blooms, above the world's highest levels. This enables early suppression of algae through preemptive measures.
ETRI's sensor-network-based AI prediction technology can be applied to various industries such as environment, weather, transportation, and agriculture. With a total market forecast of $220 billion by 2025, we can expect to gain global market competitiveness. ETRI will provide practical solutions to reduce public anxiety about algal blooms. Through advanced ICT to realize what is imagined, ETRI will open a bright future.

In this section, I'll give a short introduction. Some of the explanation was already covered in the introduction video, so I'll skip the repetition as much as I can. What is an algal bloom? An algal bloom is a rapid increase or accumulation in the population of algae in a water system. It directly affects water quality and puts people in danger with toxic substances. In many countries besides Korea, algal blooms cause problems. This situation comes from climate change, global warming, drought, and so on. It is a critical environmental problem that causes water pollution and economic losses. Our research goal is to predict an algal bloom early and accurately. If we predict an algal bloom earlier, we can suppress the bloom efficiently and at low cost. There are several efficient methods for suppression: for example, algae harvesting ships, water surface aerators, floodgate opening, ultrasonic algae control, yellow soil spraying, and so on. In spite of all these useful methods, a late response to an algal bloom costs a lot of money and time for recovery. Within the total algal bloom control system, my project team is focusing on the part in the red box. We are trying to predict the algal bloom trajectory using deep learning with the various kinds of data collected.

There are two main challenges in forecasting algal blooms. First of all, algal bloom data sets show severe data imbalance. The target event, an algal bloom, is rare, occurring only a few times in Korea. When it happens, it causes severe problems and takes time and money to recover from.
But in any case, it is rare, so a prediction model is difficult to build. To mitigate data imbalance, we adopted several techniques, such as data augmentation, in order to maximize forecasting performance. I will explain in more detail on the next slide. Another challenge is that we must forecast a broad area while only a small portion of the target area is available for deep learning, because of the lack of available environmental data. We built a hybrid model that uses machine learning and simulation techniques together to overcome the lack of data.

Handling extreme data imbalance. Handling data imbalance is very important for accurate prediction because a machine learning algorithm may be biased toward the majority classes, while minority classes are often more important in practice. We use two techniques to handle data imbalance. Since the distribution of chlorophyll concentration is extremely skewed, we applied a log transformation to bring the distribution closer to normality. Log transformation is a simple but useful technique to increase the validity of the applied analysis, as many methods assume data normality. Moreover, log transformation helps the model output only positive values once predictions are transformed back, which suits our case well. In addition, we applied an oversampling technique called SMOTE. One of the most intuitive data-level techniques is to either undersample the majority class or oversample the minority class. We should avoid undersampling because we may lose useful information, especially when the size of the data set is very limited. On the other hand, oversampling preserves all the information currently available. The simplest oversampling method would be replicating or randomly sampling minority instances. However, it may lead to overfitting and potentially create confusion in training. Therefore, we adopted the synthetic minority oversampling technique in order to increase samples in minority regions.
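
The two imbalance-handling steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the function names (`log_transform`, `inverse_log_transform`, `smote_like_oversample`), the rule that a sample is "rare" when its target exceeds a threshold, and the choice to interpolate the target with the same weight as the features are all made up for the example. It relies on the general SMOTE idea of placing a synthetic point on the segment between a minority sample and one of its nearest minority neighbors.

```python
import math
import random


def log_transform(values):
    """Map strictly positive concentrations to log space."""
    return [math.log(v) for v in values]


def inverse_log_transform(values):
    """Map log-space values back; exp() always returns a positive number."""
    return [math.exp(v) for v in values]


def smote_like_oversample(features, targets, threshold, n_new, k=3, seed=0):
    """Create n_new synthetic (features, target) pairs from rare samples.

    Assumption: a sample is "rare" when its target is >= threshold.
    Each synthetic pair lies on the segment between a rare sample and one
    of its k nearest rare neighbours. The target is interpolated with the
    same weight, which is one simple way to carry SMOTE over to regression.
    """
    rng = random.Random(seed)
    rare = [(x, y) for x, y in zip(features, targets) if y >= threshold]
    if len(rare) < 2:
        return []  # nothing to interpolate between
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(rare))
        x, y = rare[i]
        # Rank the other rare samples by Euclidean distance in feature space.
        others = rare[:i] + rare[i + 1:]
        others.sort(key=lambda pair: math.dist(x, pair[0]))
        nx, ny = rng.choice(others[:k])
        w = rng.random()  # interpolation weight in [0, 1)
        new_x = [a + w * (b - a) for a, b in zip(x, nx)]
        new_y = y + w * (ny - y)
        synthetic.append((new_x, new_y))
    return synthetic
```

In a setup like this, the synthetic pairs would be appended to the training set only, and the log transformation would be applied to the target before training and inverted on the model's predictions afterwards.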
The main idea of SMOTE is to generate new synthetic instances from the difference vectors between an instance and its nearest neighbors. It was originally proposed for classification problems, so we adapted it to the regression setting.

Experimental setup. For the experiment, we used daily data collected from the water quality monitoring station at Daecheong Lake from 2012 to 2018. The collected data consists of eight measurements: water temperature, pH, electrical conductivity, dissolved oxygen, total organic carbon, total nitrogen, total phosphorus, and chlorophyll a. As you can see at the bottom, the distribution of the target variable, chlorophyll a, is extremely skewed, meaning that algal bloom events are very rare. The goal is to accurately predict the concentration of chlorophyll a seven days ahead. A convolutional neural network model is well suited for regression-type problems, capturing dependencies in and between time series. Hence, we decided to build a CNN-based model. We conducted four experiments for comparison. The blue line is the baseline with no techniques. The red line uses log transformation, the green line uses oversampling with SMOTE, and the purple line is the hybrid one using both log transformation and SMOTE. In the result graphs, the difference looks small, but the fourth, hybrid model is better in most test cases. As you can see, the mitigating techniques we applied help increase R-squared while reducing mean squared error. The fourth model shows the highest R-squared and the lowest mean squared error, especially when the target chlorophyll a is above 25 milligrams per cubic meter, which is the boundary line for the algae alert.

Let's move on to the second topic, chlorophyll a estimation for a wide target area. Predicting chlorophyll a concentration for a wide region is challenging because we only monitor and collect water quality data at several monitoring stations.
Consequently, we don't have enough data to train a deep learning model and to make predictions for the whole region. Then, how can we collect water quality data for wide regions? To overcome this situation, we make use of a hydrodynamic water quality simulation model driven by the sensor data from the monitoring stations. This simulation model can generate water quality data for wide prediction regions. I will explain the overall prediction scenario in more detail from the next slide.

First, I'd like to explain the Environmental Fluid Dynamics Code, the EFDC simulation model for short. This is a three-dimensional hydrodynamic water quality model that can simulate up to 22 state variables for water systems such as rivers and lakes. The basic structure of the EFDC model is illustrated in Figure 1. It is composed of four main components: hydrodynamics, water quality, sediment transport, and toxics in the water system. To perform simulations, virtual grids are used to reflect the complex shape of the river. Our study area, Daecheong Lake on the Geum River, is divided into about 6,000 virtual grid cells. Among them, we mainly focus on almost 400 cells near the Chuseok River region here. In the modeling phase, the input data is the sensed water quality data and meteorological data monitored at the stations surrounding the prediction region. Once the EFDC simulation model is built, it can simulate the spatio-temporal changes and characteristics of water quality and chlorophyll a concentration in the grid cells. As a result, we can collect time-series water quality data for each grid cell in the Chuseok region. These time-series water quality data sets will be used in the deep learning phase for chlorophyll a concentration prediction. To perform chlorophyll a concentration estimation, we use a variation of AlexNet, a widely used CNN model.
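
One way to picture the data coming out of the simulation is as a table with one row per day, grid cell, and variable. As a rough sketch, such records could be regrouped into one multi-channel grid per day, which is a shape a CNN can consume. The record layout, the variable names, and the `build_daily_grids` helper are hypothetical; they are not taken from EFDC or from the project, and the numbers in the usage lines are toy values, not measurements.

```python
from collections import defaultdict


def build_daily_grids(records, variables, n_rows, n_cols, fill=0.0):
    """Regroup (day, row, col, variable, value) records into per-day grids.

    Returns {day: grid}, where grid[c][r][k] holds the value of
    variables[c] at grid cell (r, k). Cells without a record keep `fill`.
    """
    channel = {name: i for i, name in enumerate(variables)}
    grids = defaultdict(
        lambda: [[[fill] * n_cols for _ in range(n_rows)] for _ in variables]
    )
    for day, row, col, name, value in records:
        grids[day][channel[name]][row][col] = value
    return dict(grids)


# Toy usage: two variables on a 2 x 2 grid over two days (made-up values).
records = [
    ("2018-08-01", 0, 0, "water_temp", 27.1),
    ("2018-08-01", 0, 1, "water_temp", 26.8),
    ("2018-08-01", 0, 0, "chl_a", 12.4),
    ("2018-08-02", 1, 1, "chl_a", 15.0),
]
daily = build_daily_grids(records, ["water_temp", "chl_a"], n_rows=2, n_cols=2)
```

Each entry of `daily` is then a stack of per-variable grids for a single day. A real pipeline would also need a mask or fill rule for cells outside the lake, which this sketch reduces to a constant `fill` value.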
The convolution layers of the CNN can extract spatial associations in water quality data among neighboring cells. The CNN input image is composed of the simulation results and weather data, generated by stacking them as separate dimensions. Consequently, each image represents the information of one day. The detailed setup of the CNN model is described in the figure below.

Experimental result. This slide shows the prediction results of our model in terms of accuracy. Our point of comparison is the EFDC simulation result generated from the real water quality data on the prediction date. For three-day predictions from August to October 2018, the average accuracy over 90 days was as follows: root mean square error is 1.06 milligrams per cubic meter and R-squared is 0.902. From this result, we showed that our model achieves feasible accuracy for chlorophyll a concentration prediction over spatio-temporal regions. By using EFDC simulation, we can perform prediction without the burden of collecting real data for a broad region. Now, I've told you everything I wanted to share today.

Summary. We are trying to predict algal blooms based on machine learning for early algae suppression. We encountered two challenges. Challenge one: how can we train prediction models from imbalanced water quality data? Challenge two: how can we estimate chlorophyll a concentration for a broad target area? Future work. We are collecting more data using these sensor devices, and we hope to get better results with these data. This is the end of my talk. Thank you for listening.

Thank you for your presentation, Dr. Kim. Now, I will ask you two questions on your presentation on behalf of the audience. The first question is: I'm not familiar with the EFDC model, but it looks more accurate because it uses hydrodynamic theory and topographical information. Then, what are the benefits of using machine learning techniques to predict algal blooms?

Right.
The EFDC model produces accurate and explainable results, but various boundary conditions are needed and model parameters must be elaborately calibrated by domain experts. Such conditions and parameters change easily, so it is very difficult, almost impossible, to keep adjusting them. So I think machine-learning-based approaches are more practical for predicting algal blooms.

Thank you for your answer, Dr. Kim. Now, the second question is: you mentioned that you use a convolutional neural network model for the prediction. In my opinion, for time series prediction such as water quality, the LSTM method seems more appropriate. Have you tried using LSTM for the prediction of algal blooms?

Yes. As you mentioned, LSTM achieves good performance for time series estimation. I applied LSTM for the prediction as well, but it showed results similar to the CNN model. Therefore, to simplify the presentation, we only explained the CNN-based model today. These days, we are developing new ideas with various types of prediction models. So I hope to have another chance soon to show the improvement in algae prediction.

Thank you, Dr. Kim, for your detailed explanation. Okay. I hope you all have enjoyed episode one of AI for Good Perspectives, presented by ETRI today. Thank you for your participation, and let me hand it back over to Charlotte for the closing.

Well, thank you, Dr. Choi and Dr. Kim, for the presentations and inspiring discussions. Please join us again on the 15th of October for another fascinating insight into ETRI's advanced work on AI in the Republic of Korea, when the subject will be AI-based language learning technology.