Hi, everybody. My name is Ipsen, and I'm a senior consultant at Red Hat. Welcome to the Open Source Summit Japan 2020. This presentation is about how to handle telematics data with OpenShift.

What is telematics? Telematics is a simple concept that combines telecommunications and data processing. It relies on wireless devices embedded in your vehicle that transmit data in real time to an organisation. Typical examples include personal and commercial vehicles collecting information as they drive.

So what is the purpose of telematics? We use telematics to increase operational efficiency: for example, when a truck takes a different route, we identify that so we can improve efficiency and find the optimal path. We want to improve the customer experience: when a customer does not receive a delivery on time, we want to tell them exactly where the truck is and when they can expect the delivery. We want to protect the driver: when there is an accident on the freeway, we want to alert the driver and avoid the accident by taking a different route or a different time shift. We also want to comply with safety regulations: when a certain route is not compliant with a specific delivery objective, we can use the data to take safety measures.

Telematics data includes the wireless communication of location, usually based on GPS data, together with data from in-vehicle electronics. It is also used to integrate the automobile with the cloud, and for data analysis. For example, we could use machine learning to make predictions around a sales event and plan ahead for how much truck load we need to deliver from destination one to destination two.

As an example of the telematics data lifecycle: in the first step, the data is generated by the telematics device in the vehicle. The data is transmitted to the cloud, which does the data collection and aggregation, meaning it converts the external data objects into internal objects and aggregates the data based on similarity. After that, the data is analysed, sometimes with machine learning and sometimes with manual analysis of the data's behaviour. At the end, we do some sort of evaluation: the driver's performance, the truck's performance, are we delivering as an organisation? Then the cycle repeats and we go into the next iteration.

Commercial telematics consists of onboard tracking systems that send GPS location, wireless communications and truck data, and it also covers jurisdiction and tax management. The driver has an interface that supports voice communication with a dispatcher as well as text messaging. Onboard sensors embedded in the truck monitor the truck load, the capacity, the temperature of the container and the tires, and the weight of the vehicle. All of this real-time information is sent to the cloud. This is an example of a telematics system integrated into a container truck.
At the rear of the truck, there is a door sensor that keeps track of how many times the container door has been opened and how long it takes the operator to load goods onto and off the truck. The door locks are also monitored, so we can track how many times the lock has been engaged: is the lock secure, is it protecting the items? At the bottom of the truck there are monitors for the tires, including the tire pressure, the tire temperature, how many rotations, the RPM and so on. The truck ID is also integrated as part of the truck, along with other temperature sensors and a weight sensor. This is a complex system, and you can see there is a lot of data we could use from all these different sensors.

Similarly, in a passenger vehicle, we have the emergency call, which can keep track of how many times we call for emergency roadside assistance; the GPS navigation system, tracking our driving behaviour; and device-to-device communication: for instance, when there is an accident ahead, the device can communicate with other devices embedded in the vehicle to send out alerts and notifications. There is remote vehicle access, which allows you, or other people who are permitted to access the vehicle, to get into it. There is vehicle tracking and information, tracking the owner of the vehicle; wireless phone integration, tracking your phone activity and phone number; and the radio system and other user preferences.

This is a picture of a telematics control unit. The control unit has integration points with different ports, and each port receives information from another integrated system. The microcontroller in the middle takes information about the battery, the memory, the GPS and the car's internal systems, and all of that information is consolidated through the telematics control unit. The purpose of this unit is to create and collect vehicle data from the ports. It manages the information from the communication interfaces, it manages memory and battery, and it manages the communication to the cloud and to the dashboard in the device. It is usually embedded in your edge devices; we will talk about what edge devices mean in the following slides.

As an example of telematics data categories, there is the vehicle system category, which focuses on the engine, the transmission, the OBD, the weight, the lights and the tires of the vehicle. On the right-hand side there is the driver category, which usually keeps track of the driver's identity, driving behaviours, hours of service, speed, acceleration and idling time. At the bottom there is the operations category, which is usually used by the organisation to plan and optimise operations, typically to reduce operational cost and increase performance and efficiency. This includes tracking the location and navigation, the dispatcher schedule, roadside assistance and the delivery time.
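To make those three categories a little more concrete, here is a minimal sketch, not taken from the talk, of how a single telematics event could be modelled as a Java record; the field names are illustrative only.

```java
// Illustrative data model for one telematics event, grouped by the three
// categories described above (vehicle system, driver, operations).
import java.time.Instant;

public record TelematicsEvent(
        String vehicleId,
        Instant timestamp,
        double latitude,
        double longitude,
        // vehicle system category: engine, transmission, OBD, weight, lights, tires
        double engineRpm,
        double tirePressurePsi,
        double grossWeightKg,
        // driver category: identity, behaviour, hours of service, speed, idling
        String driverId,
        double speedKph,
        long idlingSeconds,
        // operations category: dispatch, navigation, delivery
        String dispatchRouteId,
        Instant estimatedDeliveryTime) {
}
```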
Now that we have a basic understanding of telematics data, let's look at how OpenShift can help. OpenShift is a distribution of Kubernetes, optimised for continuous application development and multi-tenant deployments. OpenShift adds developer- and operator-centric tools on top of Kubernetes, enabling rapid application development, easy deployment, scaling and lifecycle management.

The architecture of OpenShift provides full-stack, end-to-end automation. At the top, it manages the Git repository, the infrastructure, the databases, the release cycle and the deployment, all through the OpenShift web console. At the bottom, OpenShift supports multi-cloud platforms, the hybrid cloud and edge computing; we will talk about edge computing in more detail in the following slides.

First, before we move to OpenShift, we need to change the telematics system into a cloud-native application. Cloud-native applications build on the concepts of DevOps, continuous delivery, microservices and containers. The benefit of a cloud-native app on OpenShift is that it is scalable and self-healing, based on health checks and monitoring. The DevOps approach is continuous delivery, the application is decomposed into microservices with APIs, and everything leverages the container architecture.

The DevOps process involves continuous integration, continuous testing, continuous delivery and continuous monitoring. As soon as you check your code into Git, it is deployed to the cloud, and using health checks and monitoring we continuously watch the container to make sure it is up and running. As soon as the code is checked in, and as long as all the integration tests and unit tests pass, it is integrated into production and becomes the next release. The Jenkins CI/CD pipeline gets the latest code from the Git repository, containerizes the microservices using the Docker registry and images, and hooks the unit tests, integration tests and automation tests into the pipeline. With a Jenkins gated check-in, you can reject, for example, any code change that fails a test or does not meet a specific code coverage number. The build artifact is pushed to an artifact repository such as JFrog Artifactory. The Jenkins pipeline also supports static analysis and security scanning, so it can catch any security vulnerability introduced by the code change. Jenkins also supports deploying to the different OpenShift environments, and after the deployment it runs a health check and smoke test to validate that the deployment was successful.

The migration to microservices is the idea of breaking the telematics services down into microservices by function. Each microservice contains a logical block of functionality. Each service is self-contained, deploys on its own and works independently. You can think of each service as a set of CRUD operations; for example, we may have a location service that deals with the location data endpoints, a small, standalone microservice. When there is a high peak of traffic going into a specific service, OpenShift will automatically scale it up, and it will also scale your databases based on usage.

Another benefit that comes with OpenShift is operational readiness. We talked about health checks: there are liveness probe and readiness probe health checks. They can ping the service endpoint and verify that the dependent services are running.
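As a sketch of what such a readiness-style check could look like in one of the Spring Boot microservices, assuming Spring Boot Actuator is on the classpath; the LocationServiceClient dependency is hypothetical, not part of the talk.

```java
// Custom health indicator: OpenShift readiness/liveness probes can hit the
// Actuator health endpoint, which aggregates indicators like this one.
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class LocationServiceHealthIndicator implements HealthIndicator {

    /** Hypothetical client for a downstream service this microservice depends on. */
    public interface LocationServiceClient {
        boolean ping();
    }

    private final LocationServiceClient locationService;

    public LocationServiceHealthIndicator(LocationServiceClient locationService) {
        this.locationService = locationService;
    }

    @Override
    public Health health() {
        // Ping the dependent service; report DOWN so the readiness probe fails
        // and OpenShift stops routing traffic to this pod until it recovers.
        if (locationService.ping()) {
            return Health.up().withDetail("location-service", "reachable").build();
        }
        return Health.down().withDetail("location-service", "unreachable").build();
    }
}
```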
Other integrations, such as VictorOps, allow error and alert notifications to be sent out to engineers when there is a production issue, and they also provide integration with Slack and email so that the engineer gets notified when a production issue occurs. Prometheus monitoring is also integrated with OpenShift. It lets us track the API calls, the requests, the response times and the error rates, so that when there is a performance issue we can use it to identify the problem. For example, if a specific error or message occurs too many times, it can flag an alert and send out a notification. OpenShift also has Splunk integration: all the warning and error logs are sent to Splunk so that you can query and inspect the logs in detail. Jaeger tracing is another integration with OpenShift. It allows us to group services together using a common request ID so that we can track the service orchestration and the service call workflow, and use that for debugging and troubleshooting. So health checks and monitoring are all tied to the container, and we can also get metrics, logging and tracing from the container.

Now let's introduce an idea called IFTA, I-F-T-A. IFTA stands for the International Fuel Tax Agreement, and each state has its own IFTA rate. Basically, the idea is that when you are driving a truck through different states, you have to pay tax, and the tax is often based on the mileage or on how much time you have spent in each state. This calculation is very complicated because each state has its own rate.

This is an example microservice architecture where you can see that the telematics service generates the telematics events and feeds the data to the IFTA service, so the IFTA service can use that information to calculate the tax payment for each state. At the same time, there is a vehicle service that keeps track of the other vehicle data. They all feed into different databases at the bottom, and on top of the databases there is a service layer that gets the data from the databases and feeds it back to the UI and the gateway.

The IFTA calculation is complicated. For example, if you are driving from Trinidad, Colorado to Stratford, Texas, you could easily pass through four different states according to the GPS calculation. But the calculation changes in real time: if there is an accident or construction on the freeway, the route from the GPS could change, and any change results in a different IFTA calculation. You may end up not entering a specific state, or you may end up passing through more states. So the calculation is dynamic and relies heavily on real-time telematics data. For the IFTA calculation, we need the real-time data from the telematics service to perform the actual calculation; any change in the GPS route can result in different mileage and time spent in each jurisdiction. A lot of the time, the operations team focuses on optimising operational cost by routing through the states with the lowest fuel tax; a small sketch of this per-state bookkeeping follows below.
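Here is a minimal sketch, not taken from the talk, of how miles per state could be accumulated from GPS segments and combined with per-state rates; the Segment type and the rate table are illustrative assumptions.

```java
// Illustrative per-jurisdiction bookkeeping: accumulate miles per state from
// GPS-derived segments, then apply a per-state rate to estimate the tax.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IftaCalculator {

    /** A GPS-derived segment attributed to one jurisdiction (hypothetical type). */
    public record Segment(String state, double miles) {}

    /** Sum the miles driven in each state for the reporting period. */
    public static Map<String, Double> milesByState(List<Segment> segments) {
        Map<String, Double> totals = new HashMap<>();
        for (Segment s : segments) {
            totals.merge(s.state(), s.miles(), Double::sum);
        }
        return totals;
    }

    /** Apply each jurisdiction's rate (illustrative, per-mile equivalent) to the mileage. */
    public static double estimateTax(Map<String, Double> miles, Map<String, Double> ratePerMile) {
        return miles.entrySet().stream()
                .mapToDouble(e -> e.getValue() * ratePerMile.getOrDefault(e.getKey(), 0.0))
                .sum();
    }
}
```

The important point is that these per-state totals change whenever the GPS route changes, which is why the calculation depends on real-time telematics data.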
In this situation, edge computing can help. Now that we understand the big picture of OpenShift, how does edge computing fit into OpenShift? Edge computing is the idea of placing the workload as close as possible to the device where the data is created and where the action is taken. If you put your calculation, computational power and resources closest to the device, you get a much faster response from the calculation, so you can take action a little bit faster.

The telematics data gathered from the device is then made available to the cloud. How does that work? We leverage the 5G network to extend computing capacity out to the devices. This is a picture of edge computing using a 5G network: at the bottom of the page you see the different edge devices embedded into the trucks that are out driving. As soon as a 5G network is available, the data is transmitted to the edge node. The edge node is a sub-layer of the cloud that is specifically configured to interface with the edge devices, so that you can do the computation at the lowest level and as fast as possible. The edge node also integrates with the rest of the cloud system by submitting the data to the cloud. The cloud backs up the data, does further data analysis, and generates the dashboards and user interfaces for the operators at the organisation to look at the data. This is the high-level architecture; sometimes you have more than one tier of edge nodes, and we will talk about the benefit of that in the following slides.

The edge node is just a piece of IT equipment built for IT workload computation. It could be a four-blade server. Usually it is meant to be lightweight: a small server with limited computational power, so the calculation stays fast and lightweight. The edge devices are the pieces of equipment built for gathering the different telematics data in your vehicle, including the sensors, the GPS navigation, the radio, the camera and the in-car systems with their CPUs. On average there are about 20 to 30 different CPUs in a vehicle, so there is a lot of computational resource we could take advantage of from within the vehicle. The computational capacity of edge devices has increased significantly in the last decade, and many of them run Linux. That means we can deploy containerized workloads onto the edge devices as long as the edge device supports Linux.

The next question is how we manage these different environments and ensure that the right workload is deployed to the edge devices and the edge nodes at a specific time. The answer is to use OpenShift. OpenShift builds the workloads as containers, and containers are deployed for scaling and availability. It enables Kubernetes to run on the cloud, and we use the same concept to manage and deploy workloads onto the edge devices. Some technical problems we have seen: the number of edge devices is very high, with an estimated 50 to 100 billion edge devices in the field, and these devices come in many different forms and configurations. Security is also a concern, because the edge devices are outside the boundary of your data center, so the data associated with the edge devices needs to be protected. And it is not possible to do any manual deployment or configuration on the edge devices, because there are just so many. So we need a way to distribute workloads to billions of edge devices at massive scale.

So how do we do edge deployment with OpenShift? OpenShift ensures the uniformity, consistency, security and reliability of the edge nodes and edge devices. OpenShift defines and provisions standardisation on the edge nodes and edge devices, and we can deploy the appropriate application resources based on their purpose. For example, the GPS has the specific purpose of gathering and tracking the location.
So we would only deploy workloads related to location to the GPS device. When we deploy a new edge location, OpenShift ensures that the new environment is standardised, compliant and secure, so that the cluster stays uniform across the whole business footprint. Uniformity is really the key: you want to make sure that the same containerized application can be deployed to all these different platforms using a containerized architecture.

Earlier we had a diagram about edge deployment, and there are actually more than two layers; in this situation you can have five or six different layers of edge tiers for deployments. On the far left, you have the device layer. This is where you deploy the containerized application to the devices. The thing about the device layer is the really limited computational power; sometimes you don't have any high availability, and you have one instance deployed to the edge device. On top of the edge device, you have the edge node layer, where you can have some simple infrastructure configuration: you can set up high availability, scalability and geo-redundancy at that layer. The provider layer is the layer where you communicate between the edge nodes and your data center. There are different provider layers set up, for example the provider far edge layer and the provider access edge layer: you have different tiers of data and different accessibility, and that is the level at which you set up the access edge. At the end, you have the provider aggregation edge, where you aggregate the different data together into an aggregation layer so that you can do service orchestration. In the core circle in the middle, you have the provider enterprise data center. Your data center could be a regional data center or it could be on the cloud, and this layer is also deployed using OpenShift.

Looking at this picture, you can see that as we move from the edge cluster layer towards the device layer on the right, the configuration changes significantly. At the edge cluster layer, if you deploy with OpenShift 4.5 it automatically comes with three master nodes and three worker nodes. These are set up for high availability: if one of the workers goes down, another worker steps up and continues to support the workload. As you move to the right, the remote worker node is configured a little bit differently: you have one worker with three masters. As you go further to the right, you have a single edge node server, which is basically one worker and one master. On the device, you don't have any more replication: you only have one single container deployed to the device, because of the limited computational resources. From left to right this also depends on the network connection. The edge cluster with three nodes has a more reliable network, while all the way to the right, the edge device, which basically depends on the 5G network, has a really weak connection. The idea is that if the connection is not available, the workload won't be able to pass data on to the left, so some retry logic needs to happen: you cache the data locally on the edge device, and as soon as the 5G network is available, it pushes the data up to the next level.
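A minimal sketch of that cache-and-retry behaviour, assuming a hypothetical EdgeUplink abstraction over the 5G connection; the buffering strategy and types are illustrative, not the project's actual code.

```java
// Cache locally on the edge device and push upstream whenever the 5G uplink
// is available; failed sends stay in the cache for the next attempt.
import java.util.ArrayDeque;
import java.util.Deque;

public class EdgeUploader {

    /** Hypothetical transport abstraction over the 5G uplink to the edge node. */
    public interface EdgeUplink {
        boolean isAvailable();
        void send(byte[] payload) throws Exception;
    }

    private final EdgeUplink uplink;
    private final Deque<byte[]> localCache = new ArrayDeque<>(); // on-device buffer

    public EdgeUploader(EdgeUplink uplink) {
        this.uplink = uplink;
    }

    /** Try to push one payload; on failure keep it locally for the next attempt. */
    public void upload(byte[] payload) {
        localCache.addLast(payload);
        flush();
    }

    /** Drain the cache whenever the network comes back. */
    public void flush() {
        while (uplink.isAvailable() && !localCache.isEmpty()) {
            byte[] next = localCache.peekFirst();
            try {
                uplink.send(next);
                localCache.removeFirst(); // only drop the payload once the send succeeded
            } catch (Exception e) {
                break; // keep the payload cached and retry on the next flush
            }
        }
    }
}
```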
We have an example project that takes the highway road condition and sends out alerts based on telematics data. The truck camera captures the road condition as a video stream, and a video stream is a collection of images. Each image is tagged with the location (latitude and longitude) and the timestamp. When the image is saved and submitted to the edge node, a machine learning program runs on the edge node and flags the images showing hazardous conditions. For the images that were flagged, we trigger an alert and notification based on the image location and timestamp, and send the alert back to the edge devices that are within that location and time window. Once those edge devices get the notification, they trigger the GPS to recalculate the route so that the vehicle can take a different route and avoid the accident. Similarly, the machine learning layer also communicates with the cloud layer and talks to OpenShift. When OpenShift is notified about the images flagged for an accident, it updates and notifies the operator, and the telematics dashboard is also updated, so the operator can go in and manually reconfigure the driver to take a different schedule or go to a different destination. Basically, the alert allows the operator to optimise the activities and minimise the operational cost for the day.

The telematics images in OpenShift come from the edge devices, and as you saw, the road condition is analysed at the edge node and an alert is triggered to notify the other devices. Edge computing addresses the computational resources and the latency. The image is transmitted through the 5G network; when the network is not available, a retry happens, and if it is still not available after a couple of retries, the edge device caches the image and waits for the next time the 5G network is available. The telematics image is backed up in OpenShift when it receives the data from the edge node. On the dashboard, you can see an example where the accident indicator is updated based on the image, the location and the timestamp. From the operator's point of view, the operator can come in, look at all these different incidents on the dashboard, and plan ahead for the day.

Our application is containerized, and OpenShift supports GPUs (graphics processing units). The latency of the image data going from the edge device to the edge node is 400 ms, the latency of the machine learning scan of each image is less than 300 ms, and we have a bandwidth of about 5 Mbps for uploading the telematics images. This is the machine learning configuration we use: basically a layered artificial neural network. For each incoming image, we do image feature extraction, and based on the features we predict true or false, whether the image shows an accident or not. These are some of the factors you could consider in the feature extraction: you can look for traffic cones, specific accident signs, police cars, ambulances and so on. A score of one means the image signifies a prediction of a potential accident, and the alert is triggered based on the image location. A score of zero means the image is not associated with an accident. The cutoff value we set is 70%: if the score is lower than 70%, it is not treated as an accident. Based on the model, the prediction reaches about 97% accuracy.
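Here is a minimal sketch of the scoring cutoff just described: a prediction at or above the 70% threshold is treated as an accident and an alert is published for devices near that location and time. The AlertPublisher interface and the time window are hypothetical, not part of the project's code.

```java
// Apply the 70% cutoff to each image prediction and raise an alert for the
// tagged location and time so nearby edge devices can re-route.
import java.time.Duration;
import java.time.Instant;

public class AccidentAlerter {

    private static final double ACCIDENT_CUTOFF = 0.70; // 70% threshold from the model

    /** Hypothetical sink that notifies edge devices within the affected area. */
    public interface AlertPublisher {
        void publish(double latitude, double longitude, Instant when, Duration window);
    }

    private final AlertPublisher publisher;

    public AccidentAlerter(AlertPublisher publisher) {
        this.publisher = publisher;
    }

    /** Called with the model's prediction score for one tagged image. */
    public void onPrediction(double score, double latitude, double longitude, Instant taggedAt) {
        if (score >= ACCIDENT_CUTOFF) {
            // Notify vehicles around the hazard so their GPS can recalculate the route.
            publisher.publish(latitude, longitude, taggedAt, Duration.ofMinutes(30));
        }
    }
}
```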
The last piece we want to cover is real-time telematics. We want to execute jobs within a minute, or within a mile, of the real-time data coming into the telematics system. How do we do that? The idea is that we can use change data capture (CDC) in the applications. This is a design pattern for event-driven, cloud-native applications with OpenShift. Change data capture continuously identifies incremental data changes. The real-time data replication across databases and replicas is done through the transaction logs. We trigger events based on data changes, and we capture and propagate the data in those events to the microservices within the system. Our change data capture is based on the upstream Debezium project, which integrates really well with Spring Boot and OpenShift. The purpose of using CDC is to improve data collection: querying the entire database becomes really expensive as your data size grows, and backing up the database with a full table scan is also expensive. CDC makes real-time data replication possible.

The idea is that we have the telematics database on the left, with different queries coming in to do inserts, updates and deletes. When a query finishes, we have the transaction log entries. These log entries wait at the event bus and are pushed to the CDC relay, which is Debezium, and Debezium then notifies the Elasticsearch engine on the right-hand side, which contains the telematics index. Once this data is updated, it triggers notifications to the dependent services and updates the telematics dashboard, the operator dashboard, the email notifications and so on. You can see that this is a really effective architecture, because every time a new change comes in, whether it is an insert, an update or a delete, we take some action based on the query type.

To integrate Debezium into Spring Boot, all you need to do is open the project's dependency file, such as the pom.xml, and add the Debezium dependency. The CDC listener is basically a Java class whose constructor loads the configuration and sets up the callback on the handleEvent method. The handleEvent method is then invoked whenever a transaction is performed. The embedded engine is a wrapper around the connector that manages the connector lifecycle. This is an example of a CDC listener that runs the telematics connector and the telematics service and notifies handleEvent when a transaction happens. The start method is called when the Debezium engine is initialised, and the engine is started asynchronously using an executor. The stop method is called when the container is destroyed; it stops Debezium and shuts down the executor. handleEvent is called when a transaction is performed on the telematics table; it identifies which operation took place and calls the corresponding telematics service to perform the create, update or delete on Elasticsearch.

This is an example of the start method on the CDC listener: the telematics connector listens for changes from the telematics table using the PostgresConnector class from Debezium. It has offset storage to keep track of how much data has been processed from the transaction log, so it can resume from the failure point when an error occurs. Usually this type of error means a connection error, for example the database went offline; when it comes back online, processing can resume. The FileOffsetBackingStore is used to store the offset in a local file, so that you have a cached copy of the offset.
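As a sketch of what this CDC listener could look like with Debezium's embedded engine: the class layout, property values and file paths below are illustrative, not the presenter's actual code, and the Maven coordinates are noted in the comment rather than shown as a build-file snippet.

```java
// Requires io.debezium:debezium-api, io.debezium:debezium-embedded and
// io.debezium:debezium-connector-postgres on the classpath (versions omitted).
import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;

import java.io.IOException;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TelematicsCdcListener {

    private final DebeziumEngine<ChangeEvent<String, String>> engine;
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    public TelematicsCdcListener(Properties config) {
        // Build the embedded engine; handleEvent() is invoked for every committed change.
        this.engine = DebeziumEngine.create(Json.class)
                .using(config)
                .notifying(this::handleEvent)
                .build();
    }

    public void start() {
        // Run the engine asynchronously so it does not block the application thread.
        executor.execute(engine);
    }

    public void stop() throws IOException {
        // Called when the container shuts down: stop Debezium, then the executor.
        engine.close();
        executor.shutdown();
    }

    private void handleEvent(ChangeEvent<String, String> event) {
        // Inspect the change payload and propagate it, e.g. index it into Elasticsearch
        // via the telematics service; here we just log the raw key and value.
        System.out.printf("key=%s value=%s%n", event.key(), event.value());
    }

    // Connector configuration along the lines described in the talk;
    // all values below are placeholders, not the presenter's actual settings.
    public static Properties defaultConfig() {
        Properties props = new Properties();
        props.setProperty("name", "telematics-connector");
        props.setProperty("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", "/tmp/telematics-offsets.dat");
        props.setProperty("offset.flush.interval.ms", "60000");
        props.setProperty("database.hostname", "localhost");
        props.setProperty("database.port", "5432");
        props.setProperty("database.user", "postgres");
        props.setProperty("database.password", "postgres");
        props.setProperty("database.dbname", "telematics");
        props.setProperty("table.include.list", "public.telematics");
        props.setProperty("topic.prefix", "telematics");
        return props;
    }
}
```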
The connector records the offset in the file for each new change, and the Debezium engine flushes the offset based on the flush interval. This is a Debezium connector configuration based on the name of the connector, the flush interval, the database name, port number, username and password. The telematics service maintains the read model and handles the updates, inserts and deletes on the telematics data. The telematics repository is an interface to perform the CRUD operations on the telematics database. In the telematics service, if the operation is a delete, it calls the telematics repository and deletes the record based on the telematics ID; if it is not a delete, it saves the telematics object.

We have gone through a lot in this presentation today. You learned about OpenShift, edge computing, telematics data and CDC. You can see that telematics data integration with edge computing and CDC works really well with OpenShift. It enables easy, highly distributed, event-driven, transaction-log-driven, stable microservices development. Edge computing with a 5G network offers a fast and reliable solution. The event stream that is created is based on the log changes, and the application can listen to the events and perform actions based on the data changes, which makes it super useful for dealing with real-time data. It also performs consistent data manipulation using edge computing: the data is processed at the edge layer before it goes to the OpenShift layer. The distributed connector cluster also provides high availability and scalability and improves the performance of the overall system. With edge computing, OpenShift and CDC, the architecture is open source, supported by the open source community, and supports any programming language and development framework. This is really useful.

Thank you again for attending this conference. We enjoyed talking with you. If you have any further questions, please reach out to Red Hat Consulting; we would be more than happy to assist you with any architecture related to OpenShift and edge computing. Thank you again, and I hope you have a great day at the summit.