So, welcome everyone to the breakout session Technically Climate Change at useR! 2020. I am very excited that we have this session, and I am positively surprised by how many people showed up. My name is Olga Mierzwa-Sulima and I'm the session organizer. I'm also a data scientist, but today my role is session organizer. I am organizing this session because I hope to increase awareness of the issue of climate change, and especially of how pressing it is. I also hope that we can showcase data science solutions and provide inspiration for the community. The most important part, in my opinion, is to offer networking between tech people, developers, and the scientific community, because I believe there is a gap there. So let's close it.

Why are we even talking about climate change at our conference? I would like to give some context first. We are currently emitting 33 gigatons of CO2 every year, and to limit global warming to below 1.5 degrees, we need massive decarbonization at a scale of 10% a year. This means decommissioning current infrastructure, which is obviously not happening.
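As a back-of-the-envelope illustration of what that 10% figure implies, assuming the reduction compounds annually:

```r
# Emissions falling 10% per year, starting from 33 Gt CO2
emissions <- 33 * 0.9^(0:10)   # gigatons of CO2 in years 0 through 10
round(emissions, 1)            # 33.0, 29.7, 26.7, ... ~11.5 after 10 years
log(0.5) / log(0.9)            # ~6.6 years to halve emissions at this pace
```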
So, is there anything we can do to help? Obviously, machine learning and data science won't solve climate change, but they can help in mitigation and adaptation. Some use cases where they have already proved successful are, for example, forecasting electricity supply and demand to design smarter electricity grids that maximize the use of renewables, and reducing deforestation by analyzing satellite images or audio recordings. An area that I think is a little bit overlooked is building educational materials and tools that can be used to educate the public; one of the great examples is the SENSES platform, which combines data visualization and storytelling to communicate climate change scenarios. I also highly encourage you to check out the "Tackling Climate Change with Machine Learning" paper, where the authors discuss potential applications across 13 chapters.

So, what do we have planned for today? I would like to highlight other climate-related contributions that were submitted to useR! this year. Then we will have a presentation from Filip Stachura, "How to join the AI for Good movement", where he will share lessons learned from starting AI for Good projects. Next, we will have Marcin Dyderski talking about how data scientists can help in climate studies, from his experience working on environmental projects. After we finish with the presentations, we will have an online discussion. The presentation part is scheduled for around 45 minutes, and we will allow for questions after each presentation. Since we have many participants, there is a dedicated application for questions, and Heidi will explain in a second how it works. You can ask questions in the app and upvote them; depending on how many questions there are, we might only ask the ones that get the most votes. For the 45 minutes after that, we will have an online discussion. So Heidi, maybe you can briefly explain the application for asking questions.

Sure. Hi everyone from my side as well. I'm one of the organizers of this year's useR! conference. You can find the link to the question app in the chat: go to the bottom of Zoom, where there is a button that says Chat, and there you can find the link to Slido, which is the app we'll be using for questions. Also, please feel free to turn on captions; we have a captioner here today. If you need captions for any reason, you can turn them on by going to the bottom of Zoom, where there is a button that says Closed Caption. If that is green, you will see captions at the bottom of Zoom, which might help you follow the session better. I think that's everything I needed to say for the technical part. Thanks, Olga.

And now let me move to highlighting the other contributions. I highly encourage you to check out "FastAI in R: preserving wildlife with computer vision", "How green is your portfolio? Tracking the CO2 footprint of the insurance sector", and "Socializing the pixels: spatially explicit discrete choice modelling of land cover change in Europe from 2000 to 2018". These should be available on the conference YouTube channel. I know that we have here today at least Jędrzej, an author of "FastAI in R", so there will for sure be a chance to chat with him during the discussion. If other authors are here as well, please introduce yourselves during the discussion.

I would like to end the introduction with a short call to action: find an application for your technical skills, and cooperate with experts and scientists. They usually need technical skills like yours to move forward with their ideas. And start now, as we are running out of time.

Before I hand over to Filip, there is one more thing that I actually should have mentioned at the beginning: we are recording the session. If you wish not to appear in the recording, you can just turn off your camera, and we can also hide your name. So this is my short announcement. Thank you. And now I give the screen to Filip.

Hello, everyone. I'm very excited that we have this many people in the session. Let me share my screen. Okay. Let me make sure the notifications are off. All good. Here we go. So, I'm very excited to speak to you all. To be completely honest, I'm just a replacement: Tadek, our AI for Good leader, was supposed to deliver this presentation. But I am highly involved in our work and projects, so I hope I can tell our story of contributing ML and data science support, and also answer your questions. We also have Jędrzej here, who is going to answer questions with me.

I am the CEO of Appsilon. We are a commercial company with, I think, a non-obvious purpose: our core purpose is to preserve and improve human life through technology. We are a group of engineers, software developers, and machine learning engineers who feel that we can make a greater impact, and that we are actually responsible for trying to contribute. We recognize that we are not going to make this change on our own, of course. But in late 2018 it became quite obvious to us that we should not only contribute to open source, not only be part of the R and software community, not only speak at conferences and share our experience and knowledge, but also actively invest to try to make a difference. As a group, we decided that climate change and the environment is the problem we want to focus on, because we feel this is where we can have the most impact, and also because these are the changes we are most afraid of. In mid-2019 we hired our own internal AI for Good leader, Tadek. We contribute pro bono work to projects that are best aligned with our purpose and our core competencies, we help others at reduced rates, and we try to secure grants to contribute even more.
One non-obvious effect of that change is that, as an organization, we actually increased the share of commercial projects in our portfolio that are well aligned with our purpose. It also influenced decisions in other areas of our company and our projects.

And now, several lessons from our experience. We made some mistakes, and I want to share what, in our opinion, worked for us in the end, because it might be useful to you as well.

The first is to try not to reinvent the wheel. This seems obvious now, but as an engineer or technologist it's easy to think you have a new, clever way to do something. It's much better to actually speak with people who already work on those problems; they will tell you what they have tried and what didn't work for them.

The second advice is to try to identify the gaps that are closing right now: contribute with your core competencies, and address problems that were very hard several years ago but, due to technological progress, have just become easier or even easy.

Third, keep iterations small. This is important because you are going to learn along the way, but also because you are going to share your results, which matters for the fourth point: when you share your results, you are able to network with more people, get to know people from other organizations, and make yourself easy to find.

The fifth lesson is about continuity of the work. For the first half of 2019, we struggled: we had spikes of intense work on AI for Good, but then we got busy with commercial work until the next spike. Then Tadek joined, and Tadek had a completely different background than us. He had actually worked on the ground: with NGOs, with government administration; he had spent time in rainforests and in Africa. He shared a lot of different experience and perspective with us, and he was committed to making regular, continuous progress in this area. That was very important, and it made a huge difference.

And the sixth is to secure the work, by which I mean secure the funding. Engineers' time, our time, is expensive, and we have limited resources. In our case, we decided to pledge a significant amount of our income and profit to fuel this mission and these engagements. The other way is to try to secure grants, but keep in mind that the competition is extremely hard, so you might need to look for alternatives as well.

Since we started, we've identified two main personas here, and I want to give some advice to both. One is the engineer, developer, machine learning persona: someone from the tech community. The first advice is to network, network, network; this is also what Olga mentioned. Get to know people from administration, from academia, and from NGOs working on other continents as well. Then listen to them: ask about their experience, ask what's hard about the work they do and what they have tried. Educate them about what is possible and what's not; I think it's very important to say what is difficult or impossible with technology. And then, and only then, offer support.

For the sustainability community, I think it is important to explore existing applications in your field, and in similar domains, of how data science and machine learning have already proved useful. Also get to know people from the tech community; maybe go to a tech conference.
And, very importantly, request support: ask for help, ask for explanations of things and solutions. Such a request can be the trigger for many people on the other side to actually get involved; this kind of request can be highly beneficial, and I think many people are going to help you.

The last advice here, and this is advice for both sides actually, is to clearly define the needs. Because if you don't do this right, you will possibly deliver only part of the solution, and even if that part brings some progress, it might not bring any impact. And impact is what we aim for. A model by itself is not going to help if it's not useful, if it's not driving decisions, if it's not helping decision makers. The project needs to be complete.

Over this time, we also engaged with different organizations, and we strongly encourage you to do that as well. These are some examples here: you can get to know other people who also care about and actively work on these problems, and you can team up. So go to those websites, try to get involved, get to know people.

Now some of our project case studies. Everything is nice in theory, but I think learning about projects that are happening, or have happened, might make this discussion more useful to all of us.

Here's one project where we used two of our competencies: one is a deep learning model, delivered in Python, and the other is decision support systems built with Shiny. We've embedded a model that identifies buildings and then classifies post-disaster damage to those buildings, so that within 12 or 24 hours, whenever new satellite imagery is available, people can assess the damage after, for example, a flood, a hurricane, or another natural disaster. This helps direct aid to the right places and act much faster, without needing to work on the ground.

Here is a similar project: we've helped the IASA organization productionize a tool that helps decision makers direct funds after natural disasters in Madagascar, so that people can distribute the resources they have to the right places. We think these types of tools, which help decision makers make better decisions, are going to become more and more important, also in public administration.

Another project involves the same two core competencies. This one is more computer vision work: here you can see a computer vision model identifying wild animals in photos from camera traps. These images come from national parks in Gabon, and the project is a collaboration with researchers from the University of Stirling. Without computer vision, they simply have a vast amount of data that requires manual work; it's so much manual work that most of those images are not used at all. With computer vision, this is automated. So we simplify the work, but we also improve the results, because, as you can see, the models are able to identify animals in night-vision images taken at night, so we get more reliable results that we can trust.

But this project is an example of something that would not be useful as just a model. We actually had to build a tool for people in Gabon that works without an internet connection. So we've embedded the deep learning model in an Electron app, so that they can run the fastai model on legacy Windows hardware, offline.
They can detect animals, but they can also explore the classification results and draw insights from them: which species were identified, how big the populations are, and, importantly, how many rare species were spotted. These types of insights are crucial for securing biodiversity in this region. We hope that once this project proves successful here, we can also implement the solution in other regions.

And a last, recent example. We've all been affected by the COVID pandemic. This is a tool from Epicentre, their COVID dashboard: a Shiny dashboard built in R. Without optimization, this dashboard was very, very slow, so we helped them increase its performance. On the right-hand side you can see the dashboard after the performance optimizations, and on the left, before. We all know that performance is important, because people can get more insights in the same time they spend with the dashboard. But not only that: they also spend more time with the tool, so they get even more insights overall. We've learned that after those changes, the average session duration doubled. So I think even these types of projects are important, and we can collaborate to improve the work we do on both sides.

So this is our experience; this is our story. We encourage everyone to collaborate with people on the ground. We collaborate with NGOs, with government administration, and with academics, and this has proved successful. We encourage you to get in touch with us if you think there is space for us to collaborate. We also hope to inspire others to join this growing global movement. I hope that our story will be useful, and maybe even inspiring, to you all. I would be happy to answer your questions. Thank you.

Thank you, Filip. I know that we have one question already in the application; I just want to mention that you can ask Filip questions through the application, the link to which Heidi posted in the chat. So actually we have two questions now. Filip, can you briefly tell us what you did to optimize the Epicentre Shiny apps? Do you have slides or documents on performance optimization?

This might actually be a good question for Jordan. I know we have blog posts planned, and I'm not sure if they are published already. We actually have a series of blog posts about improving Shiny applications from the very beginning, from cleaning the data up to the infrastructure level, coming in collaboration with RStudio. We'll send those to the organizers and hopefully get them out to you. Can you say where exactly one can find the blog posts? I'm not sure this is clear to everyone. They will go live on RStudio's website. And is there an email list for this group, do we know? No, we are not collecting emails, but maybe we can share some materials through Twitter afterwards. Yeah, they're not live yet; they're going live towards the end of this month, both on RStudio's blog and Appsilon's blog.

So, to at least give some answers here: I know that one part involved replacing dplyr, which is highly useful but not always optimized for performance, with alternatives. That was one. The other part was doing some work on the JavaScript side instead of in R: sometimes it's better to preprocess the data on the client's side, and that proved to be very useful as well.
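As a rough illustration of that kind of swap, here is a minimal Shiny sketch with made-up data and column names (not the actual Epicentre code), where an aggregation is done with data.table instead of dplyr and the conversion happens once, outside any reactive:

```r
library(shiny)
library(data.table)

# Made-up data standing in for a large case table
set.seed(1)
big_df <- data.frame(
  region = sample(c("A", "B", "C"), 1e6, replace = TRUE),
  date   = sample(seq(as.Date("2020-03-01"), as.Date("2020-06-30"), "day"),
                  1e6, replace = TRUE),
  cases  = rpois(1e6, 5)
)
dt <- as.data.table(big_df)  # convert once, outside any reactive

ui <- fluidPage(
  selectInput("region", "Region", c("A", "B", "C")),
  plotOutput("trend")
)

server <- function(input, output, session) {
  # dplyr equivalent: big_df |> filter(region == input$region) |>
  #   group_by(date) |> summarise(cases = sum(cases))
  summary_data <- reactive({
    dt[region == input$region, .(cases = sum(cases)), by = date][order(date)]
  })
  output$trend <- renderPlot({
    plot(summary_data()$date, summary_data()$cases, type = "l",
         xlab = "Date", ylab = "Daily cases")
  })
}

shinyApp(ui, server)
```

On large tables, data.table's in-memory aggregation is typically much faster than the equivalent dplyr pipeline, which is the kind of gain described above.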
But that's just a fraction. We also worked on the Highcharts JS implementation to improve its performance. Sadly, I was not technically involved, so this is only what I learned after the project. The good part is that this project is actually open source: it is on Epicentre's GitHub, so you can go and see our pull requests and contributions. We can share this on Twitter as well.

Thank you, Filip. We will take two more questions, so please upvote the ones you want me to ask. The next one is: how is your work funded? Mostly self-funded. We were successful enough to have enough commercial work that we can reinvest our profits into doing this work. I must admit that getting grants is very hard; we hope it will get easier as our portfolio grows. Up to this point, we were only able to get money for cloud infrastructure, which is a lot as well: cloud infrastructure, if you train ML models, is expensive. But our time investment is currently on us.

Okay, thank you. And the last one: how do you decide to work on a project with a local NGO in a given country? What are the criteria for starting a collaboration? We actually want to simplify the process; we want to be able to help many different organizations, potentially in batches. We also see a large need for education and for closing the gap of knowing what's possible: how to use the data, how to do some easy things with the data and infrastructure they already have. So right now we don't have strict criteria. We strongly prefer projects that require help with R or Shiny, where we are experts, or computer vision, where we are experts as well. For example, if someone asked us for help with NLP right now, we would of course look into the project and how we could help, but it might be a longer process than with computer vision, where we can build a proof of concept within a day or two and know we've actually improved the status quo. Two things, if I can add to that: whether it fits our core mission, meaning what the positive impact of the project would be, and whether it fits our core competencies, because we might not be the right people for a given project. So mainly those two things.

More questions? Olga? I think we lost Olga for a second. Sorry guys, I am back; my daughter actually pulled the internet extender out of the socket, and I lost my connection. Anything that can go wrong will go wrong. Sorry for that. So, Filip, did you manage to hear the third question, or did someone take over and read it? Yeah. So let's move on to Marcin; we can continue with the rest of the questions during the discussion, since we are already a little behind schedule. So now, Marcin, it's your turn.

Okay. Can you hear me? Yes. Hello everyone. My name is Marcin Dyderski. I'm a biologist from the Polish Academy of Sciences, and I am an environmentalist. In this Venn diagram, we can see hacking skills, statistics skills, and substantive expertise. Most environmentalists are not good at data analysis, especially data science, because as biologists and environmentalists we look at the relationships between different parts of the whole environment. You can see this in the scheme here: there are a lot of components, we need to look at each relationship, and we have to explain every pattern. For that reason, we have to be more focused on the patterns, on the observations and on explaining them, than on the predictive power of our models. This is a limitation.
Not many biologists have experience in data science. I would like to show you some of our problems, some existing solutions adopted from data science, and what we can do to expand this. The most specific needs and problems are: accounting for site specificity, context dependence, and local context; making good generalizations, where the generalization cannot be unjust and omit some groups; and more focus on explaining patterns than on the sheer predictive power of the models. And as naturalists, we always have a problem with data: too much data, too little data, or scarce data.

Some examples. We are talking now about an era of big data in biogeography and biodiversity research. There is a lot of data, much more than we are able to analyze, but the data are not perfect. For example, in the global databases of species occurrences, which tell us where on the map a species is, we have an underrepresentation of the Global South: most observations come from the Global North. Yet the Global South holds the most important and most threatened species, the global biodiversity hotspots. So this is one of the problems. But even when we look at Europe, we can mark a line between better and worse data coverage. Here you can see the occurrence records of the English walnut, a very common tree species in Europe, and you can see that in Central and Eastern Europe there is not much data. So we have no clear presence-absence data for analysis, but machine learning provides us a good tool: Maxent, maximum entropy models, developed especially for presence-only data, which generates pseudo-absences; this is used for species distribution modelling.

You might ask what this has in common with climate change. For example, we can predict species distributions using climate data, which are available for the whole world, and predict which tree species are the most threatened in Europe. In our study, we showed that coniferous tree species, especially Norway spruce and Scots pine, which are the basis of forest management and wood production in Central Europe, are threatened. You can see this in the red color: in these areas, within the next 50 years, these species will be in an unsuitable climate, so they will retreat. This works in Europe, but we still have a problem with the Global South, especially with species that are expansive or invasive. There are, however, many possibilities to fill this gap: there is a lot of data in Google Street View, satellite images, and citizen science services like iNaturalist; even YouTube can provide good data about species distributions. However, it is hard to dig this data out, and here data science can help us get more valuable data, validate it, and prepare better risk assessments and mitigation plans.
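To give a flavour of the presence-only approach, here is a toy sketch using the maxnet package, an R implementation of Maxent. The data below are simulated; a real analysis would use actual occurrence records (e.g. from GBIF) and bioclimatic rasters (e.g. WorldClim):

```r
library(maxnet)

# Simulated presence/background data: p = 1 for presence records,
# 0 for background (pseudo-absence) points; covariates mimic bioclim
# variables extracted at each point
set.seed(1)
n <- 500
clim <- data.frame(
  bio1  = rnorm(n, 10, 3),    # mean annual temperature (deg C)
  bio12 = rnorm(n, 700, 150)  # annual precipitation (mm)
)
p <- rbinom(n, 1, plogis(-8 + 0.5 * clim$bio1 + 0.004 * clim$bio12))

mod <- maxnet(p, clim)

# Project habitat suitability under a +2 deg C future climate
future <- transform(clim, bio1 = bio1 + 2)
suit_now    <- predict(mod, clim,   type = "cloglog")
suit_future <- predict(mod, future, type = "cloglog")
mean(suit_future - suit_now)  # average change in predicted suitability
```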
Another problem is data processing. We always joke that 80% of the time is data processing and 20% is complaining about it. We often have a lot of data which is, in fact, not that much data, because many observations describe a single study plot. Sometimes one data point costs hours or days of our work, and since we love spreadsheets, it's hard to cope with this.

For example, we would like to study the carbon cycle, because carbon dioxide from the atmosphere is accumulated in biomass. We can go to a forest, take some simple measurements, and obtain the dimensions of the forest. But to know how much carbon dioxide from the atmosphere is accumulated in such a forest, we need to cut some trees, drag them out, weigh them, and fit a simple power equation describing the relationship between tree dimensions and biomass; then we can simply say that half of the biomass is carbon stored in it. Okay, eight trees is not much; to achieve reliable statistical results, we had to cut 3,500 sample trees and weigh all of them, and this project took three years: 420 study plots and, in every study plot, six separate spreadsheets because of the data format. The data science solution was the tidyverse and R Markdown, because the bureaucracy required reports from us every three months. Filtering, processing, and conditional fitting of particular models allowed us not to drown in the depths of the data, and gave us data quality tracking. This is one of the results of the study: models showing how much carbon is accumulated in a particular forest, according to its dimensions. We can apply these models to, for example, the Forest Data Bank, which for us is big data, because it's three million rows. It's not big data in fact, but for us it is an amount of data that requires a lot of patience, so we call it big data.

How can we expand the application of these tools? We now have the models; we cut a lot of trees to reach good models and to know how much carbon we have in the forest. But we can match this with ground-level measurements, published data, satellite images, and forest inventory data to provide spatially explicit, site-specific models of forest growth, and to optimize forest management to maximize all ecosystem services, especially carbon accumulation: to know how much carbon is stored, and what to do to avoid releasing this carbon dioxide and amplifying climate change effects.

As I said at the beginning, there is also revealing patterns. Sometimes this is impossible with simple models, and machine learning is a very seductive opportunity. However, without explanation tools it is just a black box, so we cannot tell whether the model is right or wrong with respect to its ecological assumptions. For example, we have forest inventory data from five national parks in Poland. We would like to see what matters more for forest biomass and for the occurrence of particular tree species: geomorphology, climate, or forest characteristics. We have a lot of noise in these data, and we also have site-specific patterns, so explaining this with simple models is impossible. Fitting a good model, for example a random forest, allows us to derive patterns, but machine learning on its own is not a good answer for us, because we would like to see what is behind the black box and how each predictor works. We can use DALEX, a great tool for explaining machine learning models, to see how the predicted variable changes while all other predictors are held constant, so we can draw conclusions about the effect of each variable separately. You can see here the DALEX website and its philosophy. It's the tool that changed my work and my working life.
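A minimal sketch of that workflow, with simulated stand data in place of the real inventory (the variable names are invented; the DALEX calls are standard): fit a random forest, wrap it in an explainer, then look at permutation importance and profiles that vary one predictor while holding the others fixed.

```r
library(DALEX)
library(randomForest)

# Simulated stand-in for forest inventory data
set.seed(1)
stands <- data.frame(
  elevation = runif(300, 100, 1500),  # geomorphology proxy
  temp      = rnorm(300, 8, 2),       # climate proxy
  stand_age = runif(300, 10, 150)     # forest characteristic
)
stands$biomass <- 50 + 0.05 * stands$elevation + 10 * stands$temp +
  2 * stands$stand_age + rnorm(300, 0, 30)

rf <- randomForest(biomass ~ ., data = stands)

explainer <- explain(rf,
                     data  = stands[, c("elevation", "temp", "stand_age")],
                     y     = stands$biomass,
                     label = "random forest")

plot(model_parts(explainer))    # permutation variable importance
plot(model_profile(explainer))  # partial-dependence profiles per predictor
```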
Machine learning also helps when we have an unbalanced sample; this is a classic situation in biological studies. For example, we wanted to know the survival rate of an invasive species that is reaching more and more habitats due to climate change. We tracked the fate of 5,600 labelled seedlings, which you can see in the photo, over three years. We expected that maybe 10 or 20% would survive, but only 262 survived. Okay, this is a result, but it's not enough: we would like to know why, and the sample is unbalanced. Most classifiers would be satisfied with an error rate of about 4% and so would show no trends. However, there are data science solutions for this, for example the SMOTE algorithm, which down-samples and up-samples observations and allows for conclusive predictions. Here you can see an example of patterns derived from such biased data that nevertheless lead to a biologically reasonable conclusion.
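Assuming the algorithm meant here is SMOTE (the recording is ambiguous on the name), a toy sketch with the themis package, mimicking the seedling study's class balance, might look like this:

```r
library(themis)

# Simulated seedling data with ~5% survival, echoing 262 of 5,600
set.seed(7)
n <- 5600
seedlings <- data.frame(
  light    = runif(n),
  moisture = runif(n)
)
p_survive <- plogis(-5 + 2 * seedlings$light + 1.5 * seedlings$moisture)
seedlings$survived <- factor(ifelse(runif(n) < p_survive, "yes", "no"))
table(seedlings$survived)  # heavily unbalanced

# SMOTE synthesizes new minority-class rows from nearest neighbours, so a
# classifier can no longer score ~95% accuracy by always predicting "no"
balanced <- smote(seedlings, "survived", over_ratio = 1)
table(balanced$survived)   # classes now roughly equal
```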
Machine learning is good, it's a revolution; however, it requires an integrative approach from us. As Filip said, networking is very important. We now know that we have to increase the range of variables we use: social, economic, and environmental variables together can explain more and help us make better projections and do better work. Another important thing is data science education: we have to put a lot of effort into understanding and being able to use new technologies. Also important is a common language, because the gap between scientists, stakeholders, and data scientists needs to be bridged. The language of DALEX is especially good here, because it overcomes problems with formal statistics, p-values, and other formalities that make communication harder.

Summing up my presentation and my experience as a data scientist and environmentalist: data science has provided us with a lot of tools to solve our problems, and methodological advances bring new solutions and make analyses possible that were impossible before. However, as we have no data science training and lack the time to learn it all, we need to collaborate. Our local meetings and conferences are the best place for networking; I personally learned a lot from people working in the Polish R community. So more cooperation is desired. We also need to stress that diversity of minds brings productivity; this holds not only for biological relationships but also in data science, because I learned the most from geographers, climatologists, and mathematical statisticians, much more than from biologists working in R.

What can you do as data scientists? Support researchers who need help in your nearest vicinity, in your environment, in your local group. A platform for contact between data scientists who would like to help and environmentalists needing help is especially needed. One thing that would increase collaboration is avoiding jargon and using easy language on both sides, because we have a communication problem, and the ideas for using data science in environmental studies connected with climate change are easy to pick up: just start and do something. So I hope we can get to better work through collaboration between data science and environmental science. Thank you for your attention.

Thank you, Marcin. Let me check if there are any questions for you. So, we have two questions. Have you felt some resistance to purely predictive models from the scientific community, which is generally more interested in explanatory models? Yes, of course. In some papers I had to develop the hypotheses behind the variables used in the model more fully, even when predicting the carbon mass accumulated in a forest, where it is obvious that the bigger the tree, the more carbon; the reviewers are very hard on this. Also, in other models using landscape patterns, we had to write more about how we used machine learning. In three papers I had to explain why I used machine learning instead of classical methods: because it is better. Thank you very much.

The last question: you said you use DALEX to interpret machine learning models; is there any functionality missing that would help you understand the models better? For now, DALEX fulfils all of my needs, and it's great. I can't see a gap now, but I think that within the next year one will appear. Thank you very much for your presentation. So we are done with the presentation part, and this is also the point where we can stop recording and move to the discussion.