Thank you very much for your time. This is the very last session of OSSJ, and I know folks are tired, but please give me 40 more minutes. Today I'd like to talk about end-to-end observability for connected vehicle services, including 5G cellular network U-plane troubles. I'm Masanori Ito from Toyota Motor Corporation, and my co-speaker is...

Hello, I am Kota Endo from KDDI Corporation. Thank you for joining this session.

Today's talk is composed like this. First, I will share the overall background, our challenges, and our motivation. Then I will explain our related works and former works. This part is a little long, so I will spend roughly 20 minutes or so up to here. After that, I will define the problem statement and explain what we did this time for the PoC.

First, background. Nowadays, the trend in the automotive industry is called CASE, C-A-S-E. C stands for connected and A stands for autonomous. S can be twofold: one is shared and the other is service. And E is electric. Regarding data, we have to handle data from vehicles, for example sensor data and camera data, and how we process the data, real-time or batch processing, depends on the data and the use case. The overall server-side architecture is, simply speaking, hybrid cloud, and the connectivity is various. Today I'm mainly focusing on the cellular network, not Wi-Fi or V2X. In the bottom half, I put the overall diagram. I think most of you feel that Toyota people work mainly on the vehicle side, so the bottom left side only. But actually what we are doing covers not only the vehicle side and the center side, but also the network side.

Next, challenges and motivations.
As I just stated, as far as the vehicle side or the center side is concerned, we can address issues or troubles by ourselves. But for network troubles, there are not so many things we can do. During such troubles, we receive lots of customer complaint calls, and handling them is hard and painful work. And here is the motivation: if we can work with network service providers, MNOs like the KDDI folks, we can do additional things. This is the primary high-level motivation.

Next, the requirements from our side, the connected vehicle service side. Although I listed many requirement items, they can be summarized simply as 5W1H: where, which, what, who, and when something happened. But as I will explain later, what we actually want to know is which vehicles were affected by the network failure. Later I will focus on the reason and on what we did for the PoC.

As I stated, we are working on a broad technology area, so it's almost impossible to resolve everything only by ourselves. That's why we are carrying out various collaborations, including open collaboration, partner collaboration like this case with the KDDI folks, and also, of course, open source work.

I will walk through three activities picked from these. The first one is AECC. This is actually related work, not former work. AECC stands for Automotive Edge Computing Consortium. At this organization, we are promoting edge computing use cases, focusing on automotive use cases. We are not just publishing documentation; we are also carrying out several PoCs. Please have a look at the AECC website. There are several interesting PoC results.

The next one is partner collaboration. And please.

I want to let you know that I am interested in 5G network automation and optimization.
For example, end-to-end network slice management. To realize it, observability plays an important role. OpenTelemetry has the strong point of correlating telemetry such as logs, metrics, traces, and so on. At last month's KubeCon, we instrumented the control plane of a 5G system with OpenTelemetry and utilized the UE identifier to correlate the telemetry for analysis at UE granularity. Please check the link for more detail. Today's discussion should be useful for user plane observability. That was brief, but I'll hand it over to Ito-san.

Thank you very much, Endo-san. The third-line reference is the URL of Endo-san's session. There you can download Endo-san's presentation and also the recorded video, so please have a look at them. It is a very good summary and presentation.

So far I have explained a bit of the overall activity at AECC and the 5G C-plane activity. As a connected vehicle service provider, we are also doing application-layer observability. This is former work too, number two. My focus point comes a little later, so please wait a little bit.

This is an overview of a PoC system for the connected vehicle services. As you can see, on the on-premises side I put two systems. Picture them in Japan, in Tokyo and Osaka, like that. As you can see on the upper left side, the traffic generated by the vehicle goes through the 5G mobile network, through the edge locations of Tokyo and Osaka, and finally reaches the application system on the public cloud side. This overall picture is very complicated, right? Don't worry, I won't walk through everything. What I'd like to share is the part of the PoC system inside the red dashed line.
As you can see around there, this is actually a camera data processing system for object detection, like this. If I break down these red dashed boxes into functional boxes, the system looks like this. From the left side, there is a vehicle traffic generator. At the front-end subsystem, the mTLS processing is done. Then finally the data is put into the object detection processing pipeline, like this. Please note that these are just functional boxes. In the actual deployment there are multiple Kubernetes pods, and even in one pod there are normally multiple containers. In addition, they run in a distributed manner across multiple availability zones. The point here is that even for this simple service, monitoring is not always so easy.

Then what is the resolution? Distributed tracing. This time we used OpenTelemetry, and here I'd like to share some key points of OpenTelemetry. Regarding the conceptual components, there are not so many: the application itself, the trace data collector, and the aggregator. And there is a pipeline for the data propagation. One more point is the data flow. I wrote north-south traffic here. This is just reporting trace span data from the application to the aggregator. This is normal, I think. But in the case of distributed tracing, we have to take care of one more kind of traffic, so I wrote east-west traffic here. This is required for the calling application to let the callee know who the caller is. This is called context propagation. Without context propagation, the spans cannot be chained into one trace.

This is an example of the distributed tracing of the application that I explained. Now we can see the application call flow like this, like a Gantt chart. And each span is connected in the executed order like this.
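To make the east-west context propagation concrete, here is a minimal sketch in Python using only the standard library. It assumes the W3C Trace Context `traceparent` HTTP header, which OpenTelemetry uses by default for HTTP; the function names (`inject`, `extract`, `start_span`) and the dictionary-based span are hypothetical illustrations, not the OpenTelemetry SDK API, and the sketch skips some validation the specification requires (for example, rejecting all-zero IDs).

```python
import re
import secrets

# W3C Trace Context header: version-traceid-parentid-flags (lowercase hex).
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers, trace_id, span_id, sampled=True):
    """Caller side: write the current trace context into outgoing headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers):
    """Callee side: read (trace_id, parent_span_id) or None if absent/invalid."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_span_id, _flags = match.groups()
    return trace_id, parent_span_id


def start_span(name, incoming_headers):
    """Start a span; chain it to the caller's span when context was propagated."""
    context = extract(incoming_headers)
    if context is None:
        # No propagated context: this span becomes the root of a new trace.
        trace_id, parent_span_id = secrets.token_hex(16), None
    else:
        trace_id, parent_span_id = context
    return {
        "name": name,
        "trace_id": trace_id,
        "span_id": secrets.token_hex(8),
        "parent_span_id": parent_span_id,
    }


# Usage: the front end calls the object detection service.
caller = start_span("frontend", {})
outgoing_headers = {}
inject(outgoing_headers, caller["trace_id"], caller["span_id"])
callee = start_span("object-detection", outgoing_headers)
```

Because the callee reuses the caller's trace ID and records the caller's span ID as its parent, the aggregator can draw the two spans as one chained call flow. If the header is missing, the callee starts an unrelated trace, which is the symptom of broken context propagation.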
Here, the context propagation is required on the callee side in order to chain the spans like this. A more practical design point is how to instrument the applications. In order to output this kind of trace data, we have to add some processing. In our case, at first we modified our applications by using the OpenTelemetry SDK. But this way is not always preferred, right? That's why there is so-called automatic instrumentation. This works like a debugger or a profiler: it captures some of the processing, exports trace data, and adds the data necessary for context propagation to the request to the next callee.

As I explained, this is an example of the distributed tracing result of our application. Beyond just visualizing how the overall application worked, we are working on anomaly detection and anomaly prediction based on the observability data, including this kind of distributed tracing, metrics, and logs. This result should appear sometime next year, maybe at some academic conference.

So far I have walked through our related works and former works. From here I will focus on this time's PoC, the U-plane observability. Let's recap the requirement. The problem to be addressed is like this: application-layer troubles are okay, but we need to address network troubles, including the U-plane. Here I think we can assume two things. First, the network service providers, the MNOs, do health monitoring of the C-plane and U-plane network functions. That's obvious, right? Second, our side, the user side, the connected vehicle services side, knows the identifier of the UE. Here UE means user equipment; simply speaking, it's like a smartphone. In the world of connected vehicles, one vehicle normally has a data communication module, like a smartphone.
Of course we know the identifier of the onboard communication device. We also know each vehicle's current location and route plan, for example the destination, the schedule, or the current status: how fast it is driving, which direction it is driving in, and so on.

Here is the motivation drilled down a bit, and the reason why we want to do this. If we can get U-plane trouble information at UE granularity earlier, where by UE I mean the smartphone or data communication module, we can take proactive actions. Earlier is the point. For example, we can send necessary information or commands to the vehicle, or receive necessary information from it, in advance. Also, even if it's difficult to forecast the failures or troubles themselves, the failure location and the vehicle location are normally apart. So we can predict that a vehicle could be driving into the very place where the network problem is happening. In this way, we can take actions proactively. That's why we want to know, at UE granularity, which UEs are affected by a network-side U-plane trouble.

An important point here: I said identifier, but what about the location, in terms of the tracking area or cell ID? As I stated, we already know the location through the connected vehicle service. That's why the identifier information is the most important one, and it is also sufficient. This is the point.

Before diving into the PoC that we carried out this time, here is a quick overview of the 5G network U-plane. Regarding the details of the 5G and cellular network things, again, please have a look at Endo-san's slides. Endo-san gave a very good summary of the cellular network things.
I have mainly two points here. In the 3GPP 5G network world, many network functions are defined. This time I will refer to SMF, which stands for Session Management Function, if I'm correct, and UPF, and that's all. You don't need to care about the other network functions, at least for my presentation. This is the first point.

The second point: there are two phases until our smartphones are online and can generate traffic and communicate with the remote side. One is UE, user equipment, registration. This is mostly done in the C-plane, and Endo-san's work is actually focusing on this phase. The second phase is so-called PDU session establishment. At this stage, an IP address is assigned to each smartphone or data communication module, and then remote communication is enabled.

So what we have to do at this stage is get lists of PDU sessions per UPF or gNodeB, I mean on the basis of the network functions that got into trouble, and then extract the UE identifier from them: SUPI or IMSI, the 5G or 4G term for the user equipment identifier. Roughly speaking, it's a phone number.

In order to do this, I initially had two ideas. One is, simply speaking, using the REST APIs of the network functions defined in the 3GPP 5G specifications. The second idea, if that doesn't work, is to use OpenTelemetry automatic instrumentation. Regarding the implementation of the tracer, I had two choices: one is from the OpenTelemetry upstream community, and the other is Grafana Labs' Beyla. The reason why I chose automatic instrumentation is that I thought it's better to keep the existing network function implementations, normally commercial ones, intact.
In other words, I didn't want to modify the network functions running in the MNO folks' environment. By the way, in Endo-san's session, the KDDI team took a manual instrumentation approach to address non-HTTP 5GS communications such as NGAP or PFCP. These are not on top of HTTP, so that's why special care was needed.

So: use a standard API, or just a tracing tool, without modifications. Cool, don't you think? But the reality was that neither of them worked as I expected. For the first one, the free5GC network function implementations do not actually return sufficient information. For the second one, the implementation we use is free5GC, which is written in the Go language, and the Go automatic instrumentation has limitations at this moment. Even conventional HTTP context propagation didn't work. This was the reality.

That's why I actually used a kind of workaround: monitoring the log files of the network functions. As the IP address assignment is handled by the SMF, this time I focused on the SMF logs and extracted the necessary information from them in real time. To do this, I wrote a custom log parser for the free5GC SMF and also implemented some logic to export, or ship, the trace information to the aggregation server.

The PoC setup is like this. As I stated, for the 5G stack implementation we use free5GC, and on the UE side, UERANSIM. UERANSIM covers the base station, the gNodeB, too. The base operating system is the latest Ubuntu LTS. Regarding the 5G network deployment, I built two slices per tracking area and, at first, two tracking areas.
In the networking world, there are efforts to deploy these network functions on top of Kubernetes, such as ONAP or Nephio, but this time, for simplicity, I deployed the network functions as plain virtual machines. In other words, no containerization at the moment.

The procedure for setting up the UEs, the virtual vehicles, is like this. Simply speaking, the free5GC and UERANSIM communities provide good documentation, and just by following it you can also set up a working 5G network stack, I think. Here I'd like to share one tip. As I explained, the UE setup procedure can be divided into two phases, UE registration and PDU session establishment. UERANSIM has a CLI called nr-cli, and with some tricks in the configuration and the commands, we can carry out the two phases, UE registration and PDU session establishment, separately.

By the way, this time I have one more colleague from Toyota, Kua-san. She also worked on this PoC and greatly contributed to this project. Thank you very much.

So far, as I said, the to-do is to get the lists of PDU sessions. What I did is extract the necessary information from the SMF log files. This is an example of the SMF log. Don't worry, I won't explain it line by line. This is an actual example of the SMF log, and it looks complicated, but it actually has some structure, and there are some tips for analyzing these logs. For example, one log line can be divided into multiple portions: timestamp, log level, message, category, and source network function. Additionally, some log lines have the PDU session ID or the IMSI or SUPI identifier. That is what I want.
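The idea of splitting one log line into portions can be sketched roughly as follows in Python. The exact free5GC log layout is not reproduced in this talk, so the line format assumed here (`<timestamp> [LEVEL][NF][Category] message`), the `LogRecord` type, and the `parse_line` helper are all illustrative assumptions. The `imsi-<digits>` form of the SUPI is likewise assumed for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed layout: "<timestamp> [LEVEL][NF][Category] free-text message".
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+"
    r"\[(?P<level>[A-Z]+)\]"
    r"\[(?P<nf>\w+)\]"
    r"\[(?P<category>\w+)\]\s*"
    r"(?P<message>.*)$"
)
# Assumed SUPI representation inside the message text.
SUPI_RE = re.compile(r"imsi-\d+")


@dataclass
class LogRecord:
    timestamp: str
    level: str
    nf: str          # source network function, e.g. "SMF"
    category: str
    message: str
    supi: Optional[str] = None  # present only on some lines


def parse_line(line: str) -> Optional[LogRecord]:
    """Split one log line into its portions; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    supi = SUPI_RE.search(match["message"])
    return LogRecord(
        timestamp=match["timestamp"],
        level=match["level"],
        nf=match["nf"],
        category=match["category"],
        message=match["message"],
        supi=supi.group(0) if supi else None,
    )
```

Lines that do not match the assumed layout are skipped rather than raising an error, since a log watcher has to tolerate stack traces and other free-form output mixed into the file.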
By using these tips, these are the very log lines that I focused on. As you can see, the first yellow line says the selected UPF is UPF1-1. This is the identifier of the user plane function. The allocated PDU address is the IP address assigned to the virtual smartphone. And as you can see in the third yellow line, there is the SUPI, which is IMSI in 4G terms, and as I explained, it's like a phone number. By collecting these information items, we can extract the vehicle identifier, that is, the map between the troubled user plane functions and the affected UE information.

This is the overview of the monitoring system prototype. As I explained, at first I tried the API-based and trace-based approaches, but in the end I wrote a log watch tool to find these kinds of lines. This tool also ships the trace data to the trace aggregator here. Then, on U-plane failures, we can look up the trace aggregator and extract the suffering UEs by using the UPF information notified from the network operator side.

This is a sample log showing how the log watch tool portion is working. Some of you may have experience with OpenTelemetry: it can export trace data not only to the remote trace aggregator but also to the console. The OpenTelemetry output looks like this. In the yellow dotted lines, you can find the SUPI, the PDU session ID, the UE IP, and the selected UPF. By looking at the selected UPF, we can map the trouble information from the MNO side to the PDU session information. And on the user side, the connected vehicle service side, we can use this information to carry out necessary actions proactively. This is the point.
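The mapping from a failed UPF to the affected UEs can be sketched as below. This is a simplified, in-memory stand-in for the prototype, which ships the data to a trace aggregator instead. The message wording matched by the three patterns, the assumption that the three lines of one session arrive in order and without interleaving, and the `SessionIndex` class itself are hypothetical and only for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import List

# Hypothetical message wording, modelled on the three highlighted log lines.
UPF_RE = re.compile(r"Selected UPF: (?P<upf>\S+)")
IP_RE = re.compile(r"Allocated PDU address: (?P<ip>\S+)")
SUPI_RE = re.compile(r"SUPI: (?P<supi>imsi-\d+)")


@dataclass(frozen=True)
class PduSession:
    supi: str
    ue_ip: str
    upf: str


class SessionIndex:
    """Collects PDU sessions from log messages and indexes them by UPF."""

    def __init__(self):
        self._by_upf = defaultdict(set)
        self._pending_upf = None
        self._pending_ip = None

    def feed(self, message: str) -> None:
        """Consume one log message; a SUPI line completes the pending session."""
        upf = UPF_RE.search(message)
        ip = IP_RE.search(message)
        supi = SUPI_RE.search(message)
        if upf:
            self._pending_upf = upf["upf"]
        elif ip:
            self._pending_ip = ip["ip"]
        elif supi:
            if self._pending_upf and self._pending_ip:
                session = PduSession(supi["supi"], self._pending_ip, self._pending_upf)
                self._by_upf[self._pending_upf].add(session)
            self._pending_upf = None
            self._pending_ip = None

    def affected_ues(self, failed_upf: str) -> List[str]:
        """Return the SUPIs of UEs whose PDU session uses the failed UPF."""
        return sorted(s.supi for s in self._by_upf.get(failed_upf, ()))
```

When the operator notifies a failure of, say, `UPF1-1`, calling `affected_ues("UPF1-1")` returns the identifiers that the connected vehicle service can match against its own vehicle records. A production version would also need to handle session release and interleaved log lines, which this sketch ignores.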
As for the result, I did some experiments. After receiving a UPF failure notification, I confirmed that it's possible to extract the mapping information that identifies the suffering vehicles within a second. But that's obvious. I think detecting a UPF failure normally takes roughly tens of seconds up to one minute or so, while the extraction takes about a second or so.

As I explained, this is a workaround, and in this sense there are some more limitations at the moment. Because of restrictions on the information available from the free5GC and UERANSIM implementations, at the moment we cannot handle gNodeB, base station, information. Also, in reality UPFs can be tiered, but right now I can't handle this.

Here is a list of work in progress and next steps. First, of course, I have to brush up the design and implementation of the prototype, for example resolving the restrictions or adding a dashboard. Second, a kind of mid-term item: explore a more sophisticated way of retrieving the information, I mean using a standard API or so. Third, as I explained, I only did the experiment on a small system, with a small number of vehicles and network functions, so I have to carry out a scalability PoC. Fourth, and this could be a bit long-term: after establishing this, we can propose this kind of early error detection or forecasting system to telco equipment vendors, integrators, or even MNO people like the KDDI folks.

In summary, as we proved, it's possible to notify users of U-plane failure information at UE granularity. With this notification, at least we can take actions proactively if the failure is ahead of the vehicles.
Another merit is that this works without any modifications to the 3GPP protocols at the moment. In general, as I walked through, observability and distributed tracing are quite useful and important. Also, as I explained, we used open source components, and during our PoC work we have already reported issues and posted fixes. In other words, we contributed to the upstream community. Like this, Toyota and KDDI will keep working with the open source community too. Thank you very much. That's it. Any questions?

Thank you for your talk. It is very impressive to see that you collect the information from the mobile network and convert the raw data into useful information for debugging. I would like to ask: are there any security considerations across the entire system, inside the vehicle and also over the network?

Good point. Right now, security is part of the next steps. But please note that this system would be integrated or installed inside the MNO system. So basically, I think the system is isolated from the outside and, in this sense, secure. But it's a very good point. One consideration would be not to use dangerous information for the context propagation. For example, passing the IMSI, the phone number, is a bit dangerous. Instead, in my opinion, it's better to use the IMEI. Step by step, we have to improve the overall system. That is how we are working on security. Make sense?

Thank you.

Any other questions? Endo-san, do you have any additional comments? No? Okay. If you'd like to ask me in Japanese, I will restate your question in English and answer it. I will be here for a while after this session, so please catch me later. Thank you very much.