So I am Kosab and I'm from Politecnico di Milano in Italy, and I think I am one of the few people from my university attending this event. I'd like to start off with a few words about myself. I'm a master's student, and the work that I'm going to present today is a sneak peek into my master's thesis. This is my first time at an open source event, both as a speaker and an attendee, so I hope you will bear with me through the talk. And I would certainly like to add a few more lines here; it looks like a pretty blank slide at the moment. Going to the topics: by the end of the talk, I would like to answer these questions. What are the requirements? Why do we need cloud platforms for the Internet of Things? What are some of the cloud platforms out there? And given a use case application, how would we use a cloud platform? The million dollar question for today is: do open source platforms really stand out from the closed ones? Since this is going to be a long talk, I have divided it into a few sections so you can keep track of how it's going. I'll start off with a bit of introduction; I don't think it's all that important, but I'll still go through it. As we know, the Internet of Things has applications in e-health, autonomous transport, and robotics, so it's growing pretty big. One of the contributing factors is that the devices themselves are getting smaller: the sensors, the antennas. This is making it more ubiquitous, like it was supposed to be. Now, if you look at the evolution of the Internet of Things, how it all started, there are many contributing technologies that can be termed enablers for the Internet of Things, and wireless sensor networks is one of them. With wireless sensor networks, we had multi-hop devices with limited processing and sensing capacities, and the data was stored on a routing or a gateway node from which it was gathered.
Gradually, as the size of the networks grew, the requirements for storage and processing grew as well. So when we connected the gateway device to the cloud, we had more processing, more storage, and, more importantly, remote access to the data. For example, if I had an elderly person in my house and I wanted to monitor their parameters, I could do so from the comfort of my office in the same city, or from a different part of the world altogether. However, this comes at the cost of latency. You see, when we connect the devices to the cloud, the data actually has to travel from the devices themselves to the data center in which the platform is hosted. For example, if I deploy my sensor network in India and my data center is in Japan, it incurs a lot of latency. And with wireless sensor network applications where the sensor and the actuator are co-located, for example if I take a traffic signal and want to control its period based on the number of cars on the road, this is a latency-sensitive application; if the latency is high, this is quite difficult to do with the cloud. However, there are technologies coming in, like mobile edge computing and fog computing, which try to address this problem. Still, for cloud platforms the advantages outweigh the downsides here. Now, going on to the next part, I would like to talk about the motivation: why I wanted to talk about cloud platforms. If you look at the architecture, we have sensors on one end, and the data is sent to a gateway, which forwards it to the cloud platform where we can process it. Based on this processing, we can act on the data through an actuator, which is situated on the edge as well. Today I am motivated to talk about the cloud platform section of this architecture. Now, how many of you have been to Rome?
Quite a few. And coincidentally, we are in the Rome room as well. So this is Circo Massimo, a heritage site in Rome. And how many of you have watched Ben Hur? The chariot race scene at the end: this is the actual location where it takes place. The place is currently in ruins, as I will show you in more detail, and there are archaeologists from the University of Trieste who are monitoring this site. They came to us to improve their monitoring with the help of the Internet of Things. They had very basic requirements. They wanted us to place sensors there and get the data from the sensors. They wanted basic processing on the data: for example, we were monitoring the vibrations in the site, so they wanted to see how they evolved throughout the day. They wanted to share the data with others who were also monitoring the site. And they wanted visualization of the data because, well, they are non-technical people. Now, if we look at their requirements against what a cloud platform offers, I already stated these three features: platforms offer storage of the data, processing of the data, and remote access to the data. Optionally, we need visualization as well. I cannot just give the archaeologists a raw file with timestamps on the data; I need to help them see what the sensors are doing. On the other hand, we can have triggers, which perform sensing-actuation applications: based on the value that the sensor is generating, I can do something with an actuator. And we need libraries for the Internet of Things devices to actually connect to the cloud platform. Now, if you look at the options out there, there are quite a few. You have the big players like Microsoft and IBM, you have standalone players like Xively and SenseIT, and you have some open source platforms in the form of Phant and Parse.
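The trigger idea just mentioned can be sketched in a few lines. This is a toy version, not any particular platform's API: the threshold value and function names are invented for illustration.

```python
# Toy trigger: when a sensed value crosses a threshold, fire an actuator.
# The threshold and the callback wiring are made up for this sketch.

VIBRATION_THRESHOLD = 0.8  # arbitrary units

def on_new_reading(value, actuate):
    """Called for every incoming reading; invokes the actuator on high values."""
    if value > VIBRATION_THRESHOLD:
        actuate(value)
        return True
    return False

alerts = []
on_new_reading(0.3, alerts.append)   # below threshold: nothing happens
on_new_reading(1.2, alerts.append)   # above threshold: actuator callback fires
```

On a real platform this condition would typically be configured in the cloud (a rule or cloud function) rather than coded on the device.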
So my professor tasked me with the job of finding out how we can choose a particular platform given the application we are building. Now let's talk about some design choices before we move into the cloud platform part. In the first one, we can have a gateway-based architecture, in which the data is generated by the devices and reported to a gateway; the gateway then pushes the data onto the cloud. The primary advantage that this kind of architecture offers is that the cloud platform is agnostic of the technology the devices use to send the data to the gateway. So, for example, I can use Bluetooth or 802.15.4 between the devices and the gateway, and the gateway has to be able to speak both IP, to send the data to the cloud, and the local protocol, to get the data from the things themselves. Secondly, the other advantage this offers is that the gateway is a single device and it hides the number of devices that are actually being used behind it. The cloud only sees one or two devices, in the form of one gateway, or a few if you are using multiple gateways. The disadvantage of this architecture is that the gateway is a single point of failure: even if my devices are working, if my gateway fails, the data cannot reach the cloud platform, so reliability comes into question. In the next one, we can bypass the gateway altogether; devices nowadays are equipped with Wi-Fi shields, so the data can be sent directly to the cloud platform. The advantage this offers is that I can see the data at its full granularity. If my first node is generating, for example, temperature data, I can see from the cloud platform that my node one is generating temperature data. But if I use a gateway in between, this granularity is lost, because only the gateway can see all the devices.
Now, the disadvantage is that since the data is being sent directly from the devices, it takes quite a lot more energy, and there is an issue of latency. If you are deploying the sensors in a sparse environment, there is a chance that a device might be far from the router or the signal strength might be low, which can contribute to different latencies for different devices. Now I would like to come to the parameters which we use to look at these cloud platforms, and I'll go through them one by one. First of all, we look at the protocols used to communicate between the device or the gateway and the cloud platform. The first kind of protocols we can use are request-response protocols. Here we have a particular endpoint which points to a resource on the cloud where we can store our data, so devices which are sensing and creating data can use the endpoint to post it. For example, if I have a device which is monitoring temperature, I would take that cloud endpoint and use the CRUD paradigm, which stands for Create, Read, Update, and Delete: we can update the value on the endpoint, and then some other user, application, or device which wants to know the temperature would do a GET request on the same endpoint and would get the data. Examples of platforms using this are Xively, SenseIT, and a lot of other platforms as well. Now, if we look at another model, the message passing model is one where we have a message producer and a message consumer. In our case, we have sensors which actually produce the data, so they are the message producers, and we have message consumers who want to see what data is being generated. This is done in the form of certain topics which the endpoint in the cloud controls.
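The request-response flow just described can be sketched with a plain HTTP request. The endpoint URL and API key here are placeholders, not any real platform's values; the header name is also an assumption, since each platform defines its own.

```python
import json
import urllib.request

# Hypothetical endpoint and key -- real values depend on the platform.
ENDPOINT = "https://api.example-iot-platform.com/feeds/temperature"
API_KEY = "my-secret-key"

def build_update_request(value, timestamp):
    """Build the HTTP request that updates (the U in CRUD) the resource."""
    body = json.dumps({"value": value, "timestamp": timestamp}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="PUT",  # PUT updates; POST would create, GET would read
        headers={"Content-Type": "application/json", "X-ApiKey": API_KEY},
    )

def build_read_request():
    """Another user or device reads the same endpoint with a GET."""
    return urllib.request.Request(ENDPOINT, method="GET")

req = build_update_request(21.5, "2016-10-01T12:00:00Z")
# urllib.request.urlopen(req)  # would actually send it, given a real endpoint
```

The consumer side is symmetric: it polls the same endpoint with GET requests, which is exactly the pull behaviour contrasted with the push model next.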
So, for example, if we take the same previous example and change the architecture, we have the publishers, the devices that are generating the data, publishing to the topic "temperature in Berlin". And the message broker has some subscribers, in the form of other users or devices, to which the broker forwards the data. This is more of a push interaction model, where the cloud platform is pushing the data onto the subscribers or consumers, while in the request-response model the user or application is pulling the data from the endpoint. Now, if we look at the openness of cloud platforms, we have primarily two flavors. In closed platforms, we have proprietary platforms hosted by the companies themselves, which offer the services as subscriptions for which we need to pay. Examples are mostly the proprietary ones, like Amazon's AWS IoT, relayr, Xively, and many more. If you look at the open source platforms, these are available for cloning or download; you can run them on your own server, and you can also modify them according to your applications. Some examples are SparkFun's Phant, Six Sense, and Parse. We can also classify the platforms based on the service; I'm sorry, my throat has not taken the local weather too nicely. So the service can be offered in various flavors as well. We can have platform as a service, where data is sent to a platform and we can build an application on top of the platform to process the data. Some examples are Amazon's AWS, Azure, Parse, and most of the platforms that I've already talked about. Next, we can have software as a service. Here we have software hosted on a server, and we can treat it as a black box: we send some data in and we get an output from the SaaS software.
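To make the push model above concrete, here is a toy sketch of MQTT-style topic naming and subscription matching. The topic scheme is invented for illustration; a real client would use a library such as paho-mqtt to talk to a broker, but the matching idea is the same.

```python
# Sketch of publish/subscribe topic logic. The topic layout is made up;
# MQTT itself defines '+' (one level) and '#' (remaining levels) wildcards.

def make_topic(quantity, city):
    """Topic the publisher (the sensing device) publishes to."""
    return f"{quantity}/{city}"

def topic_matches(subscription, topic):
    """Minimal MQTT-style matching: '+' matches one level, '#' the rest."""
    sub, top = subscription.split("/"), topic.split("/")
    for i, part in enumerate(sub):
        if part == "#":
            return True
        if i >= len(top) or (part != "+" and part != top[i]):
            return False
    return len(sub) == len(top)

topic = make_topic("temperature", "berlin")   # "temperature/berlin"
# The broker pushes each message to every consumer whose subscription matches:
assert topic_matches("temperature/+", topic)
assert topic_matches("#", topic)
```

The key difference from the request-response sketch is that consumers register their interest once and the broker does the delivery work from then on.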
An example is Element Blue and Deviceify, companies which offer software as a service tailored to different applications: for example, they have different software for monitoring water flow and different software for monitoring smart grid networks. And finally I come to infrastructure as a service, in which the company offering the service provides the hardware, that is the devices we want, the software, and the platform as well. An example is IoTsens, a company in Spain dealing with smart cities. Now, if you look at the cost involved: for the open source platforms, we just have to pay for hosting the platform on our own server; we do not have to pay for the services offered by the platform per se. For the closed platforms, however, we pay based on different models, so I'll go through them one by one. We can pay by the number of messages exchanged between the device and the cloud platform; for example, Amazon, I think, charges per million messages exchanged. You can also be charged based on the amount of data you are storing on the cloud platform, and Azure charges you in this manner. You can also have the cost incurred based on the number of devices. Here the gateway plays an important factor because, as I told you before, with a gateway we can make the platform see that we are using only a single device, instead of the five or ten devices that we are actually using behind it. And finally, we can have a payment structure based on the visualization of the data, that is, the number of variables that we are visualizing.
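The pricing models just listed can be combined into a back-of-the-envelope estimate. All the rates below are made-up numbers purely for illustration; real prices differ per provider and change often. It also shows why the gateway matters under per-device pricing.

```python
# Toy cost comparison across the pricing axes described above.
# Rates are invented; do not treat them as any provider's actual prices.

def monthly_cost(messages, gb_stored, devices,
                 per_million_msgs=5.0, per_gb=0.25, per_device=0.50):
    return (messages / 1_000_000 * per_million_msgs
            + gb_stored * per_gb
            + devices * per_device)

# Ten devices reporting every 10 s for 30 days, direct vs. behind one gateway:
msgs = 10 * 6 * 60 * 24 * 30          # 2,592,000 messages
direct = monthly_cost(msgs, 2.0, 10)  # platform sees all ten devices
gateway = monthly_cost(msgs, 2.0, 1)  # platform sees only the gateway
assert gateway < direct
```

Even in this crude sketch, hiding the devices behind a gateway removes the per-device term, which is the effect described above.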
For example, if we are visualizing temperature, humidity, and other physical quantities from a sensor network, we treat each of these as a variable, and Ubidots as a platform offers visualization for up to five variables; beyond that, you have to pay up. Next I come to the authorization of the data and the resources. With the cloud platforms we have the data in the form of resources, and we do not want everyone to access the data. So how do we keep people from accessing it? The first kind of authorization mechanism is the traditional username and password, in which we hash the username and password and send them along when we communicate with the cloud platform. On the other hand, we can have the cloud platform assigning API keys for reading and writing to various resources. If a device has an API key assigned to it by the cloud platform, we can rule out rogue devices, because the API key has to be attached when communicating with the platform. Thirdly, we can have authorization certificates assigned to a particular device by a certification authority. For example, in AWS IoT we need three device certificates to communicate with the cloud platform, so when we write code to communicate with Amazon's AWS IoT, we need to point to these certificates on the device itself. And finally, we have access control lists, which form a read-write matrix with devices on one axis and resources on the other: only if there is an association between a device and a resource can that device access that particular resource. And finally I come to libraries. Libraries play a major role in making Internet of Things cloud platforms ubiquitous. For example, if I have the endpoint and the documentation for a particular cloud platform, I know how to program, so I can simply write up the code to do that.
But if, for example, a non-technical person wants to do it, they want something they can just plug and play. So they can use these libraries, simply initialize a platform client, and push the data using the functions on offer. Next I would like to talk about some of the cloud platforms that we considered for our use case scenario. First we have Amazon's AWS IoT, which offers multiple protocols in the form of MQTT, HTTP, and WebSockets. With Amazon's AWS IoT, the data goes onto the platform and is then in the hands of the user in terms of what you want to do with it. We can write a script to store the data in a NoSQL database; we can visualize the data using CloudWatch, and also monitor the services we are getting from Amazon with CloudWatch; and we can store the data in the raw form of files in an Amazon S3 bucket. In terms of cost, each of these modules is priced separately, on a pay-as-you-go kind of subscription model: the more of each module you use, the more you pay. Now, if we move on to an open source cloud platform, SparkFun has one called Phant, which is short for "elephant", as an elephant never forgets. With Phant, we can use the hosted instance that SparkFun runs on their website, in the form of data.sparkfun.com, or we can take their repository, replicate it, and host it on our own server. Two things stand out for Phant. It is one of the few platforms which can be hosted on your own server while also having a hosted implementation from SparkFun. And secondly, when we use the website implementation, that is data.sparkfun.com, the data is always public.
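A minimal sketch of what talking to a Phant stream looks like: each stream has a public key and a private key, writes need the private key, and reads need only the public key. The host works for both data.sparkfun.com and a self-hosted server; the keys below are placeholders.

```python
import urllib.parse

# Phant-style stream URLs. Keys are made-up placeholders.
PHANT_HOST = "https://data.sparkfun.com"   # or your own Phant server
PUBLIC_KEY = "myPublicKey"
PRIVATE_KEY = "myPrivateKey"

def build_input_url(fields):
    """URL that logs one row into the stream; fields match the stream schema."""
    query = urllib.parse.urlencode({"private_key": PRIVATE_KEY, **fields})
    return f"{PHANT_HOST}/input/{PUBLIC_KEY}?{query}"

def build_output_url(fmt="json"):
    """Read-only URL built from the public key alone, safe to share."""
    return f"{PHANT_HOST}/output/{PUBLIC_KEY}.{fmt}"

url = build_input_url({"temp": 21.5, "humidity": 40})
# urllib.request.urlopen(url)  # would log the row, given a reachable server
```

The split between the two keys is what makes sharing read-only access easy: you hand out the output URL and keep the private key on the devices.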
So here you need to consider a trade-off. If I want to keep some data private, for example a Boolean monitoring whether my house is locked or not, I do not want that data to be public, so I would not store it on the hosted SparkFun platform; but if I am, for example, monitoring the temperature of my house, I can store that data in a public domain. And finally, there are limits on the number of requests you can make to the SparkFun platform. This limit, unlike the costs I mentioned before, is not about money but a limitation of the platform. If we are hosting the platform on our own server, we can make certain modifications and change this limit the way we want, based on our application. The third platform I want to talk about is Parse. Parse used to be a closed source platform, and since they are closing down their services in 2017, they decided to contribute to the open source community by making their platform open source. With Parse, the data is sent in the form of HTTP requests carrying JSON objects. We can report the data onto the Parse cloud platform and store it, and to process the data we have these modules. With Cloud Code, we can run functions on the cloud which process the data from the cloud itself; these functions can be invoked from the devices and from the cloud platform as well. And we have Live Query, which we can use to subscribe to particular objects. For example, if I have a temperature object, I can subscribe to that particular object, and when it is updated I get a push notification on my device. So, in terms of the interaction model, where we would otherwise have to keep querying to get the data from an object, we can convert it into a push kind of interaction model in which we get notifications once the object is updated.
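Reporting a reading to Parse over its REST interface looks roughly like this. The server URL, application ID, and REST key are placeholders; with the open source Parse Server, the URL would point at your own host, and the class name is just an example.

```python
import json
import urllib.request

# Sketch of a Parse REST call. All identifiers below are placeholders.
PARSE_SERVER = "https://my-parse-host.example.com/parse"
HEADERS = {
    "X-Parse-Application-Id": "myAppId",   # placeholder
    "X-Parse-REST-API-Key": "myRestKey",   # placeholder
    "Content-Type": "application/json",
}

def build_create_request(class_name, reading):
    """POST a JSON object into a Parse class (e.g. 'Temperature')."""
    body = json.dumps(reading).encode("utf-8")
    url = f"{PARSE_SERVER}/classes/{class_name}"
    return urllib.request.Request(url, data=body, headers=HEADERS, method="POST")

req = build_create_request("Temperature", {"node": 6, "value": 21.5})
# urllib.request.urlopen(req)  # would actually create the object
```

A Live Query subscription on the `Temperature` class would then turn every such create or update into a push notification, instead of clients polling the class.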
Now I come back to the use case scenario, and I would like to show you a video of the actual site where we deployed our sensors. This is in the basement of Circo Massimo, where we had to do the monitoring, and as you can see, the place is quite in ruins, and we had to place the sensors in various parts of it. Here we had primarily four problems to consider. First, we were not allowed to drill or modify the structure in any way. So to place the devices we had to find our own mechanisms, and we had to place them in various holes like the one you can see there; and we had problems because we wanted to get data for vibrations, and if we simply place a device in a hole, the coupling is not very significant and the data was not really good. Then we had an internet connectivity issue. Initially we had planned to get a router and place it on site. However, the place is guarded by an iron gate, which acts as a kind of Faraday cage and almost completely blocks out communication from the outside. When we were testing in our lab we had 3G and 4G connectivity, but when we actually went there and tried to deploy, we were getting intermittent 2G connectivity, and it was really, really bad; these are the kinds of situations that are unforeseen when we do real deployments. Thirdly, this place is completely unmanned through most of the week. So if something goes wrong, there is no one there to address the issue: we had to deploy the sensors and come back all the way to Milan, and if something went wrong, we had to make a trip from Milan to Rome just to fix some device. And finally, we had to consider battery, because these devices were running on batteries, and we planned to go down only when the batteries ran out and we needed to replace them with new ones.
Now, looking at the devices and the architecture we used: we used Libelium motes to get the data, with various sensors in various configurations based on the amount of battery they were consuming; the configurations I'm coming to in the next slide. The motes communicated with a Libelium gateway, and the Libelium gateway was connected serially to a Raspberry Pi, which acted as our gateway device between the sensors and the cloud platform. Here there is one issue we had to address. We had a Wi-Fi shield on the Libelium mote, so we could basically have connected the devices directly to the cloud platform, or we could have used something else, like an ESP8266, to get the data to the cloud directly. But the issue was the connectivity: if we didn't use a gateway in between, we had no place to store the data when the internet was down. That is why we had to use a gateway. Another issue is that even with 802.15.4 we were seeing packet losses of up to five or six percent, so we were receiving around 93 to 94 percent of the packets. Since we were monitoring the vibrations in the form of FFT values, that is fast Fourier transform values, losing packets was a real problem: the device computing the FFT was sending the data split across seven packets, and if we lost one, we were missing data from the middle, which was very hard to extrapolate later. Now, considering the floor plan of this place, we had placed the sensors at different locations of the heritage site.
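The store-and-forward role of the Raspberry Pi gateway described above can be sketched as a small buffering loop. This is a toy model, not our actual gateway code: the send function is injected so the logic can be exercised without a network.

```python
from collections import deque

# Toy sketch of why we kept the gateway: it buffers readings locally and
# only flushes them when the (intermittent) uplink is up. Names are made up.

class BufferingGateway:
    def __init__(self, send, maxlen=10000):
        self.send = send                  # callable that raises if uplink is down
        self.buffer = deque(maxlen=maxlen)

    def report(self, reading):
        self.buffer.append(reading)
        self.flush()

    def flush(self):
        while self.buffer:
            try:
                self.send(self.buffer[0])
            except ConnectionError:
                return                    # uplink down: keep data, retry later
            self.buffer.popleft()

sent, up = [], [False]
def fake_send(reading):
    if not up[0]:
        raise ConnectionError("uplink down")
    sent.append(reading)

gw = BufferingGateway(fake_send)
gw.report({"node": 6, "fft": [0.1, 0.2]})   # uplink down: buffered, not lost
up[0] = True
gw.flush()                                   # connectivity back: data delivered
```

This is exactly what the end devices could not afford to do themselves, which is the point made later about not using direct connections.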
Our gateway is not marked here; it was between node 7 and node 5, somewhere between the green and blue dots. We had these various configurations based on the amount of battery the sensors were consuming: for example, on one node we placed only a carbon dioxide sensor and nothing else, because it was quite energy consuming, and nodes 3 and 4 had accelerometers, which we used to get the data in bursts, so that consumed a lot of energy as well. In terms of packet drops, with the gateway between node 7 and node 5, we were getting high packet losses from node 9, which was in the corner of the building, because of the distance; so we had to consider these factors as well when we went back the second time to replace the sensors. These are some pictures from the actual deployment. The sensor in the center is the one we used to get the accelerometer data, so we glued the sensor device onto the wall. On the right we see the Raspberry Pi used as a gateway, connected to the Libelium gateway; the light coming from the right side is from the gate that was there. It was near the gate, and we were getting the best connectivity from that spot, so we decided to place it there. Now, going back to how we were sending the data to the cloud platforms, we go through them one by one. First I would like to ask: have any of you used the Parse platform?
Okay, so we had four requirements from the archaeologists: we wanted to get the data from the devices, visualize the data, share the data, and finally do some processing. How we achieved this with Parse is: we used HTTP to get the data from the devices to the cloud, and then we did some processing when the data reached the cloud by writing code for the platform with the Cloud Code option I was talking about before. Then we used an open source dashboard called Freeboard to plot the data from the devices, and we could use it to share the data among other users as well. The basic statistical operations, getting the mean and the average, and the calculations on the accelerometer values, we did with the functions we wrote in Cloud Code. Next we had SparkFun's Phant, which we used to get the data from the devices to the cloud platform. Here we used HTTP as the protocol as well, and the data was reported to data streams: for example, if we were monitoring temperature, we had a particular data stream for temperature, and if we wanted to monitor humidity, we had a different data stream. The data on these streams is stored in the form of JSON objects. First I deployed this platform on my laptop in a VirtualBox, and gradually I took it to a server and deployed it there, and then I was getting data from the devices on Phant, again using HTTP as the protocol. To visualize the data, the platform offers an API to connect with Google's Charts library, and to access the data I used the public key offered by the Phant platform. The public key can be used only to read the data, so it was well suited for sharing the data with other users. To process the data, we did not have something like Cloud Code running on the cloud, so we had to query the data and process it ourselves. And finally, Amazon was the
last one that I tried out, and I faced quite a few problems with Amazon. First, to get the data into storage, I was using MQTT as the protocol from the devices to the cloud. After I got the data, I had to handle it the way I wanted. First I wanted to get the data into a NoSQL database, in the form of DynamoDB. Amazon offers certain rules based on which you can do something with the data, and I was sending the data using various topics: for example, if I was sending temperature data from node six, I would use a topic like sensor/temperature/6. In this way I was getting the data, and to insert it into DynamoDB there are certain modules, so when I get temperature data I can insert it into a particular table. Now, my data had a timestamp, the values, and the node ID as well, but with the triggers offered by Amazon, I was only able to get the timestamp into the database and not all the values. To do that, I had to use scripts on AWS Lambda, and to process the data I also had to use scripts on AWS Lambda. I wanted to visualize the data with Amazon QuickSight, which is a business intelligence tool Amazon offers, but it requires a business account, which I unfortunately did not have access to, so I had to settle for CloudWatch, which I used to visualize the data. Here the cost became an important factor, because first I was paying for the messages I was sending to the cloud platform, then for the script I wrote to process the data, then for storing the data in the database, and I also had to pay for visualizing the data. These costs are incurred based on the usage you have, so it is difficult to predict how much you will end up paying before actually deploying the application. Now, if you look at these platforms side by side, we have the open source ones, which we could host locally, and
we have more options of protocols in the closed platforms, in the form of MQTT, and we could also use HTTP and other protocols there. Now coming to cost: when we host a cloud platform ourselves, in the form of an open source cloud platform, we pay the hosting cost and we are not paying for the services, so the cost is more or less fixed and we have an estimate beforehand, which is difficult to have with closed platforms like Amazon. I would like to share an anecdote here. I was using these databases on Amazon, and at some point our devices ran out of battery, after about three and a half months of running, so I was not actually sending data to the databases. But since I had provisioned five or six databases for myself, it was beyond the free provisioned throughput that you get, and even though I did not do any read or write during that period, I ended up paying about 15 or 16 dollars to Amazon just because I had the databases provisioned. That was costly, given I am a student. Now, going to the authorization part, we talked about various authorization policies. If we want the data to be private, we can use the closed platforms, which offer stringent authorization mechanisms; but if we want to make it less stringent, we can use the open source platforms and modify them to reduce the amount of security involved. For example, with Parse we can get rid of all the individual keys and just use a master key to reduce the security. Now coming to the conclusion. If we look at the open source platforms, the advantages are that the cost is fixed, coming from hosting the platform on a server, so it is easier to estimate; we can fine-tune various parameters, for example we can set the cache size and choose different security modules; and we have more flexibility in the sense that
we can modify the platform according to our use and also add certain features if we want, based on our application; and finally, we have a simplicity in terms of open source modules, which we can connect and make work together. But closed source platforms are good in some other scenarios. For example, if we are using a sparse amount of data, we can use their free tier and bypass the cost of hosting the cloud platform. They have stringent data authorization mechanisms, so if we want our data to be private and secure, we should use closed platforms. And the last point is that we need some basic expertise to host a cloud platform and keep it running if we are using an open source one; if someone is not willing to do that, it is better to go for closed platforms, where everything is hosted and you can simply plug and play. So, in conclusion, I would like to say that if you have an application, you need to ask yourself certain questions: which protocol would suit my application? What kind of security would I need? Would I want to pay for the closed platforms, given my application? Based on these questions you will have an answer, and maybe one answer is not the right answer; we could have multiple options, so choose the one you find optimal. Thank you. Any questions or suggestions? [Audience question, inaudible.] That's true; that is why we are basically using a gateway device in between. For example, most of these platforms offer SDKs for different gateways, so we can use those to communicate over various protocols. Another thing is that we now have some other protocols as well which are heavier; for example, Microsoft uses the AMQP protocol, which is really big for an ESP8266, so we need devices with the capacity to run AMQP. In the end, I think MQTT is really standing out as the protocol of choice, and with the publish-subscribe paradigm we can reach out to more subscribers; so,
for example, if I'm using HTTP, I have to make the request myself, but if I subscribe using MQTT, I can simply get the data from the platform itself; more of the work is done on the platform than on the device. [Audience question, inaudible.] This is a very interesting point. When I was preparing this with my professor, he asked me: we have proprietary software, for example Microsoft Word, that we use on our computers and pay for; so in the future, could we have proprietary platforms where we would not be able to see the code inside, but we could actually buy that kind of thing and host it on our own servers? Right now there is nothing of that sort, but in the future there could be. Does this address your query? [Audience exchange about IBM Watson, partly inaudible.] That's true, it's kind of converging. Actually, on the point you raise, it has already kind of converged, because if you look at the various SDKs being offered, they are quite similar; so if you start with one or two SDKs, it's very easy to move on to different platforms, so in that sense it's moving in that direction, addressing your first query. On the second one, I think for various applications which are more complex, we already have modules which go beyond telemetry; for example, there are machine learning modules which you can apply to larger data sets. [Audience question, inaudible.] I have looked at that architecture, but we haven't actually tested it; I know what you're talking about, because, for example, we also have the other interaction model, where the cloud can push data onto the device, and we
can have some over-the-air mechanism. The one that you mentioned, we tried to do here, and I would like to elaborate on that. When we were doing it, we were losing a lot of battery on the sensors, so what we planned to do, for example on the accelerometer nodes, was to control the hibernation period using over-the-air communication: we were using the gateway in between and pushing a command onto the gateway, which would then change the hibernation period based on the battery that remained. But because of our poor connectivity, it was very hard to implement. Yeah, I think in our case we tried to use 6LoWPAN in the beginning to get the data directly, but our nodes were dying. And one problem is that without a gateway in between, I am not able to get the data into a sort of buffer: my Pi was programmed to keep checking whether the connection was there, and only if it was there would I push the data, which would be very hard to do on the end devices. If there are no more questions, okay, thank you so much for coming to my talk. I hope you liked it. Thank you.