Welcome, this is the English-language translation of the talk Open Sensor Data by Felix Erdmann, live from the 36th Chaos Communication Congress in Leipzig. All talks at Congress are translated live between German and English and into one additional language. Visit C3lingo.org for more details and information on how to access the streams. We appreciate your feedback. Please use the hashtag C3T. Your translators for this talk are RQV, VH1 and Pete Pioriti.
So please, a round of applause for Felix Erdmann.
Hello everybody. Thank you for coming and thank you for this introduction, which nearly covers my whole first slide. I will be talking about open sensor data for everybody, based on open source, open hardware and open educational resources. This is the senseBox project of the University of Münster. We started it a few years ago with small Arduino kits and then moved into the field of citizen science. First I would like to tell you something about myself. My name is Felix Erdmann. In 2012 I was a student intern at the Institute for Geoinformatics, where I had my first contact with sensor data, Arduino, different kinds of sensors and environmental data, and I put a small module on a drone. That was basically how I got started in this field. After graduating from high school I studied this subject, graduated from university, and now I'm a teaching assistant in the senseBox project at the university.
What is the senseBox? Who has heard of it? The senseBox is a DIY toolkit for stationary and mobile sensor stations based on open hardware, especially on the Arduino platform. We developed and distributed different versions of the senseBox to cover two fields. One is citizen science, so that every citizen can become a scientist, and the other is digital education.
We want to connect with educational institutions such as high schools, show students that coding is not as difficult as it sounds, and establish a basic understanding of coding. The other product is the openSenseMap, a web-based platform where everybody can register his or her sensor station and look at the data collected there. It offers different tools to visualize the data.
One field is citizen science: everybody can become a scientist and participate in science. There are different levels of participation. At the lowest level users simply gather data; at the highest level the user not only gathers data but also interprets it and can publish scientific findings. This is why we developed the senseBox:home. The one shown here is in Sao Paulo. It's a plug-and-play sensor station. You do not have to assemble much: just plug it in via USB and you can gather environmental data, which can be transmitted via Wi-Fi or other means.
The equivalent for education is the senseBox:edu. It has a microcontroller and different sensors, usually for temperature, humidity and pressure, and you can also measure particulate matter. It's also plug and play, but it comes with a few more components, such as buttons and a display, which you connect by hand to build different projects. Then you can program the box. We use Blockly, Google's visual programming tool: as in Scratch, you put different blocks together, and with this interface you can easily assemble your own code. It's meant for younger students who have no experience with coding; more experienced students can simply type code, as we see on the right side here.
All of this is based, as I said before, on open hardware. We've developed our own board based on Arduino, called the senseBox MCU. The reason is that with the Arduino Uno, which we started on, you couldn't use all sensors at the same time.
The code also has to be uploaded, and the memory capacity wasn't high enough, so we built this module and modified it. It has different ports: I2C, analog, digital and serial. They have plugs that only fit one way, so you can't cause a short circuit with the sensors, and it's rather easy to build your own sensor station. It has two XBee ports for modules that transfer the data, be it via Wi-Fi or LoRa, or an SD module on which the data is saved. That way you can measure offline, collect the data on a memory card and look at it at home. The circuit layouts, the Gerber files and the libraries are all open and available on GitHub. Anybody who has the means can solder and assemble the board themselves.
As I said before, the openSenseMap is the backbone of the project. Everybody can register their own box, gets the source code for it and can then upload their measurement data. As you can see here, the boxes are distributed over the whole world. Most of them are in Germany and in Europe, of course, but some are in much crazier locations. All these boxes send their data to the platform, where you can view it. There are also analysis tools for the data. When you click on a station, you see its details and information, and you can draw diagrams from its data. For example, somebody has uploaded a picture here, and you can see the temperature curve of this box.
As I said before, this is all open. Not just senseBoxes can send their data to the openSenseMap, but everybody can. So if you've got something else, a Raspberry Pi or an ESP, you can send its data to the platform. We have an open REST API, so it can accept all kinds of data. Some examples: one box measures the water flow and the direction in which a flood spreads. You can also measure air data rather easily and send it to the platform.
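To make the idea of an open REST API concrete, here is a minimal Python sketch of how a self-built station might prepare one measurement for upload. It is an illustration under assumptions, not the project's reference client: the base URL, the route `/boxes/<box id>/<sensor id>` and the JSON field `value` are assumptions that should be checked against the openSenseMap API documentation, and the IDs are placeholders that a real station would receive when it is registered. The sketch only builds the request and does not send anything.

```python
import json
from urllib import request

# Assumed base URL and route layout; verify against the openSenseMap API docs.
API_BASE = "https://api.opensensemap.org"


def build_measurement_request(box_id: str, sensor_id: str, value: float) -> request.Request:
    """Prepare (but do not send) a POST request carrying one measurement."""
    url = f"{API_BASE}/boxes/{box_id}/{sensor_id}"
    # Assumed payload shape: a JSON object with a single "value" field.
    body = json.dumps({"value": value}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder IDs; real ones come from registering a station.
    req = build_measurement_request("BOX_ID", "SENSOR_ID", 21.4)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually transmit the value: request.urlopen(req, timeout=10)
```

Because the upload is an ordinary HTTP request, the same few lines could run on a Raspberry Pi or any other device that can reach the Internet, which is the point made above about the platform not being limited to senseBoxes.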
There are also more unusual stations, built in people's gardens, which measure wind direction and wind speed. Smart Citizen devices can also send data, as can, obviously, the senseBox and even self-built stations and systems. We also don't limit this to one phenomenon: you can measure water temperature, humidity, sound and all sorts of gases. As long as you can measure it, you can upload it.
The source code is freely available on GitHub: the sketch, which is the code that runs on the Arduino or the senseBox MCU, the front end, the whole API and the services that we developed. You can download them and look at the source code. The website and the teaching materials are also available on GitHub, and the API is open, so everybody can upload their data.
Then there are the open educational resources. They are available to teachers and institutions, because the hurdle to start using the senseBox can be quite high. Many people don't know how to use it or don't have the time to get used to it, and we want to lower the barrier to entry for digital education. So we want to make it easier to develop projects that you can follow along, with theory and practice. For example, here is a traffic counter built with an ultrasonic distance sensor, which you can use to measure distances. The example gives you the theory, shows how to connect the sensor and explains what you can do with the data. It was used in a school, where the students could check the amount of traffic in front of the building: they actually measured how many cars drove past the school in about a quarter of an hour. That way we could put facts to the problem. People say there's too much traffic, but nobody really knows. In this case we can get the data rather easily and then analyze what we can change.
Just a little bit on the timeline.
For me personally, the project started in 2012, when I was a pupil volunteering at the university. The first models were used in teaching, all based on the Arduino Uno in all sorts of combinations with devices. And I put this kit on a drone and flew it around. The real start was in 2016, when we received a grant from the German Ministry of Education. It produced all sorts of results. On the openSenseMap we added various modules, for example interpolation, which estimates values such as temperature between different stations. On the hardware side, we used a 3D printer to build cases that keep direct sun off the sensors. We tested under extreme conditions in the Swiss mountains, with very cold temperatures and lots of snow. That project has finished, and now we have a new one called senseBox Pro, which is also funded by the German Ministry of Education, again for three years. We want to address professional users and use really high-cost sensors. Right now we only have low-cost sensors. They are good and nice, but real professional users can't do much with them, so we want to use different sensors and check with industry whether there is interest in this. On the software side, the openSenseMap will get additional analysis methods so that the data can be compared better.
At some point we asked ourselves: it's a citizen science project, it's called open science, and we want the data to be reproducible, but who really participates? We say everyone can participate and everyone can take on different tasks, but what kind of people are they? So in a master's thesis we ran a survey to figure out who the participants are. It's mostly men, who would have thought. The largest age group is between 30 and 55.
What's interesting is that a lot of participants have an academic background: many hold a bachelor's, a master's or a diploma degree. We had thought that everyone could participate, but apparently a lot of people are not interested, or maybe the hurdles are too high; it's mostly academics who take part in this project. For some background: the users we surveyed are all registered on the openSenseMap, so they have registered their station at some point and uploaded some data. People who only download the data are not included, because we couldn't get data on them.
So what motivates the participants? More or less everyone likes to measure environmental data, collect it and publish it. They don't want to keep it to themselves; they want to share it and maybe even influence political decisions. In the area of communication, people want to support the community, help to solve certain problems and maybe even encourage others to participate, and really meeting other people was a big part, or a goal, for these participants. Many of them thought that more should be done with the data, such as larger analyses, but they did not really want to take part in the scientific process themselves or even write a publication. You can also see this in how the openSenseMap is mainly used. Most people just want to measure data, share it and contribute; in the end they look at the data, create diagrams, add sensors and play with the boxes. Analyzing the data, for example through interpolation, is a very low priority and is not used much.
We collect quite a large amount of data, so we're already in big data territory.
We have about 5,700 registered boxes, five to six thousand measurements per minute, and about 3.9 billion stored measurements. Everyone can download all the data and work with it. That is not really a problem, because we only store the raw data: not validated data, but the data that the users actually measured.
Across the whole project, especially in the infrastructure, we do have some problems. We run quite hungry servers in the cloud on AWS, although they are going to be moved to the OpenStack at the University of Münster. They require a lot of memory and disk space. The database is MongoDB with four collections. Because the database is really huge, indexing and querying statistics take a long time: the whole collection has to be scanned to fetch the data and compute the statistics. One reason is that we kept the original architecture, which originated from a bachelor's thesis. We started with a prototype that just grew and grew as more and more data came in. Now we have these 3.9 billion data points and have to start thinking about a more efficient way to store them.
Also, we only store the raw data, so we don't modify it, and it is not validated either. This causes other problems when you want to analyze the data. This is a screenshot from this morning of the interpolation here in Leipzig: the calculated temperature between different boxes. I chose only boxes that are set up outside. Everything is nice and green, but in the top right there's a box that's orange-red, which means more than 20.7 degrees Celsius, and I thought this can't be quite right.
A box that's outside and measures 20 degrees Celsius? I looked a little closer, and it turns out that it is constantly at around 20 degrees, so you can deduce that it's probably indoors. We can't really do the interpolation here, because this box falsifies the result. Here in Hamburg, in another screenshot, there's a broken sensor somewhere in the middle of the city that constantly measures about minus 150 degrees Celsius, so this interpolation doesn't really work either. Detecting these kinds of exceptional data points and removing them from the data set you want to analyze is another goal for the future.
In the future, development will be done by the company Reedu and by the university, there with the support of the BMBF project. More materials will be added, with more projects for education. The hardware and software will continue to be improved. We are trying to add new sensors and to set up a wiki for all the sensors, so that there is a uniform system and you get a summary of what kinds of sensors there are and what kinds of deviations to expect from them. One of the sensors we use, for example, is strongly affected by humidity, so these are things to keep in mind when you look at your data. Our goal is for this project to be mostly community-driven. The further development of the software happens on an open-source basis, so everyone can participate if they want to, under the umbrella of a joint company and independent support. We have started a Discourse forum where everyone can register, exchange ideas with each other and answer other users' questions.
This is the end of my talk. Thank you very much for your attention. We have a few minutes left, so I can answer questions.
Thank you very much.
Thanks for listening to the English translation of Open Sensor Data for Everybody at the 36th Chaos Communication Congress.
Your translators were VH1 and Queen. We appreciate your feedback. Please email us at hello at C3lingo.org or use the hashtag C3T on Twitter.
Now we've got a question from the audience, asking whether they are afraid that the sensor gets flooded. The answer is yes, that could of course happen. That would be the worst case, but we are of course working on measures to make sure it doesn't happen.
Then we've got a question from the Internet: can you recommend what kind of sensor to use?
I've tried this before. I've found one that has three poles to the power and one that just gives back a resistance. With that one it isn't really that hard: you've got to convert the resistance into a wind direction. But I don't have a clear recommendation for this.
Another question from the Internet: can the sensors do power over Ethernet?
Apart from the motivation of educational and scientific work, are there any other fields of interest? I could imagine, for example, communities where people want to know how their environment is doing, say if they live close to an airport. They notice that it's too loud there, and valid data is needed for the institutions if they go public. So isn't some management necessary to validate all the results?
In the current situation everybody can just put a sensor into their own garden and collect data with it. But of course, if you have a tree and put your sensor halfway below it, then as the sun moves the amount of sunlight suddenly drops during the day when the shadow passes over the sensor, and this is of course the kind of error you should be reducing. If you look at the current set of PVD, those are set up in a very nice open space, so the temperature is measured there without any of these kinds of error sources. So this is something that really needs to be taken into account.
On the other hand, the sensors and components that we're using are quite cheap, so it's relatively easy to put together a measurement setup and get your first data. Later, when you've got first results, you can go to official bodies. In the airport example, you go to the local or county government, give them that data, and then you can take it from there.
Hello, is there a historical reason why MongoDB was used rather than another database?
Well, there's no clear reason for this. During my bachelor's thesis I just used MongoDB as my first database. This wasn't done recently but some time ago, and some of these are quite old versions. A nice InfluxDB or something like that would make much more sense, I agree, rather than MongoDB.
Do we have a question from the Internet? All questions are cleared. A miracle with this Internet: it seems all questions on the Internet were solved. So now I just want to give a big thank you to Felix Erdmann for this talk.
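The talk names detecting exceptional data points as a future goal and gives two cases: a box that sits constantly at around 20 degrees and is probably indoors, and a broken sensor reporting about minus 150 degrees Celsius. The following Python sketch shows one simple way such stations could be flagged before an interpolation. It is not the project's method. The function names and all three thresholds are hypothetical choices made for illustration, and a flat series can also occur outdoors, for example on an overcast day, so a real filter would need more care.

```python
from statistics import pstdev

# Illustrative thresholds only; none of these values come from the project.
MIN_PLAUSIBLE_C = -90.0   # below this, an air temperature reading is treated as a fault
MAX_PLAUSIBLE_C = 60.0    # above this, likewise
MIN_SPREAD_C = 1.0        # an outdoor series flatter than this looks like an indoor box


def classify_station(readings):
    """Label one station's temperature series as 'out_of_range', 'suspiciously_flat' or 'ok'."""
    if any(r < MIN_PLAUSIBLE_C or r > MAX_PLAUSIBLE_C for r in readings):
        return "out_of_range"
    # Population standard deviation as a crude measure of day-to-night variation.
    if len(readings) > 1 and pstdev(readings) < MIN_SPREAD_C:
        return "suspiciously_flat"
    return "ok"


def usable_stations(stations):
    """Keep only the stations whose series passes both checks."""
    return {name: vals for name, vals in stations.items()
            if classify_station(vals) == "ok"}


if __name__ == "__main__":
    # Made-up sample data mirroring the two cases described in the talk.
    stations = {
        "outdoor": [2.0, 3.5, 6.0, 7.5, 5.0, 3.0],
        "indoor": [20.1, 20.3, 19.9, 20.2, 20.0, 20.1],
        "broken": [-150.2, -149.8, -150.0],
    }
    for name, vals in stations.items():
        print(name, classify_station(vals))
    print(sorted(usable_stations(stations)))
```

Checking the range first matters here: the broken sensor's series is also nearly flat, but an impossible value is the stronger signal, so it is reported as out of range.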