Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Officer of DataVersity. We'd like to thank you for joining this DataVersity webinar, How Good Data Quality Enables Panasonic to Keep Roadways Smart, Safe, and Efficient, sponsored today by Soda. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. We will be collecting questions via the Q&A panel, or if you'd like to tweet, we encourage you to share questions via Twitter using hashtag DataVersity. And if you'd like to chat with us or with each other, we certainly encourage you to do so. Just note that the Zoom chat defaults to sending to just the panelists, but you may absolutely change that to network with everyone. To access the Q&A or chat panels, you will find the icons for those features in the bottom middle of your screen. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar. Now, let me introduce our full house of speakers for today: Lauren Cordova, Kell Linstead, Rene Phillip, Kevin Bennett, and Alvin Schling. Lauren is the head of Data Science and Analytics at Panasonic. She established the Data Science and Analytics function for the Panasonic Smart Mobility Office, bringing together the four domains of data engineering, analytics engineering, data science, and quantitative research with the mission of turning data into insights. Kell is a business development executive at Panasonic. He works on the business development team for the Cirrus by Panasonic connected vehicle platform, helping transportation officials navigate a complex and rapidly changing technology ecosystem for connected vehicles.
Rene is currently a senior quantitative researcher on the Data Science and Analytics team, where she is working to quantify the impact of connected vehicle technology on roadway safety and mobility. Kevin has helped to define and build out the data and analytics engineering functions within the Smart Mobility Office and currently serves as the data engineering manager. Alvin, joining us from Soda, is a solutions engineer there. He is part of the team that creates solutions that work for the community and translates the needs of customers and partners into technical solutions. And with that, I will give the floor to our team of speakers to get the webinar started. Hello and welcome. Thank you, Shannon. Thanks to the DataVersity community and Soda for inviting us here today to share about our team and the cool projects we're working on related to data quality and analysis. Again, my name is Lauren Cordova. I lead the Data Science and Analytics, or DSNA, function for the Panasonic Smart Mobility Office. Many of you may be familiar with some of Panasonic's consumer products, but our team works in a different space. The Mobility Office consists of several businesses, including one focused on electric vehicles and charging, one that focuses on fleet vehicle management, and the primary one we'll be discussing today, which has developed our Cirrus by Panasonic product to support departments of transportation and other roadway operators by helping them more efficiently manage roadways through data and analysis. Before getting into the exciting data processing features and benefits that Cirrus provides, I'll give you all some background on our team and the technology we work with, so that you've got good context about the devices and datasets that are feeding the Cirrus product. Our Data Science and Analytics team is core to the development and advancement of Cirrus, serving the business with the functions that you see here.
We've got data engineers building production-grade data pipelines that can ingest hundreds of messages per second and make them available in near real time to our Cirrus product. We also have analytics engineers building a solid foundation of understanding and historical analyses that can support future-looking predictions and models by data science. And we have quantitative researchers supporting analyses to measure value by developing key performance indicators, or KPIs, and designing research studies to understand and report on the efficacy of our solutions. The graphic shown here is an example of the types of projects we work on across DSNA, but it's not meant to be comprehensive or exclusive. There's a lot of collaboration and overlap across our team, as well as with our stakeholder groups like software engineering. Also note, there are positions open on the Panasonic website if anyone's interested in an analytics or software role within the Smart Mobility Office. We are based out of Denver, Colorado, but our team supports remote work nationwide, and there'll be a link in the follow-up email to the Panasonic career portal. So look out for that if you're interested, and we also may be posting some additional roles later this year and next. And with that, I will go into more detail about all the exciting technology we work with in support of the Cirrus by Panasonic product. Many of you may be aware that cars today depend on internal computers and data, and they process hundreds of parameters about vehicle health, status, and operations, such as those shown on this slide and more. Even 10-plus years ago this data processing was present primarily on luxury vehicles, but today almost all new cars come equipped to process much of this data. Furthermore, today's cars are increasingly able to send the data out from the vehicle to let others know what's happening in and around their location.
That communication coming from the vehicle itself is called vehicle-to-everything communication, or V2X for short. We have a number of standards defining message types and the associated data elements to ensure interoperability across auto manufacturers, so that, for example, a Nissan and a Ford and a GM vehicle can all speak the same language and communicate with each other. This slide summarizes the big organizations involved in implementing those standards for connected vehicle communications. There are numerous technical documents detailing requirements for entities to participate in the standards-based communication protocol that I'm describing. If you're interested in learning more, the Society of Automotive Engineers (SAE) J2735 requirements document, which is spelled out in the smallest box there on the right, is a great place to start, because I'll be going into a lot of detail on one of the message types it defines, called the basic safety message, or BSM. That message in particular is a critical component of the data that Panasonic is collecting from vehicles today and leveraging in our Cirrus product. The basic safety message can contain hundreds of data elements and is broken into two parts. The Part I elements are required for any participating vehicle sending a basic safety message to populate in order to be standards compliant, so those are basically the mandatory fields. Part II of the basic safety message contains optional data elements that not all vehicles may be able to send. What's shown in the slide here is just a small subset of the data elements in each part, to give you an idea of what type of information is possible in the basic safety message, but there are many, many more fields in the full BSM.
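To make that two-part structure concrete, here is a minimal sketch of how a BSM could be modeled in Python. The field names are simplified illustrations chosen for this example, not the actual J2735 element names, and only a handful of the hundreds of elements are shown:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class BSMPartI:
    """Part I: mandatory fields every standards-compliant BSM must populate
    (names simplified for illustration; see SAE J2735 for the real schema)."""
    msg_count: int      # rolling message sequence counter
    latitude: float     # degrees (the spec encodes position as integers)
    longitude: float
    speed: float        # meters per second
    heading: float      # degrees clockwise from north

@dataclass
class BSMPartII:
    """Part II: optional elements that not all vehicles are able to send."""
    wiper_status: Optional[str] = None
    headlights_on: Optional[bool] = None
    traction_control_active: Optional[bool] = None
    abs_active: Optional[bool] = None

@dataclass
class BasicSafetyMessage:
    part1: BSMPartI
    part2: BSMPartII = field(default_factory=BSMPartII)

# A vehicle that only sends the mandatory Part I fields:
example_bsm = BasicSafetyMessage(BSMPartI(1, 40.29, -111.69, 13.4, 90.0))
```

The split mirrors what the standard requires: Part I is always present, while every Part II element defaults to "not sent."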
You can see from this sample that the basic safety message provides much richer data about what's happening on the road than traditional technologies such as radar or cameras, because the basic safety message can tell you what's going on inside the vehicle, such as whether the traction control or anti-lock brake system is active, or other information like whether the wipers or the headlights are on. What is specifically not included in the BSM is any identifying information. BSMs are designed to be anonymous, so there is no data on the vehicle's VIN, owner, make, or model, or any persistent IDs that could be tracked over extended periods of time. Furthermore, basic safety messages are intended to be broadcast by vehicles 10 times per second, per the standards, to support vehicle-to-vehicle communication for safety applications. Many newer cars today present driver warnings based on different sensors around the vehicle, but in the future, basic safety messages could be used to provide much more detailed and accurate information for driver warnings. So consider a future when new cars could come equipped with this V2X technology to send BSMs. That is a lot of data being generated. In fact, previous studies have estimated that 105 million connected cars could produce 20 terabytes of data per hour, or 150 petabytes of data per year. With that massive amount of data, there's a massive potential to revolutionize transportation. At Panasonic, we hope to collect and turn this data into insights that will provide better outcomes for all of us, such as shorter freeway and arterial travel times, shorter emergency response times, and an accelerated pace toward zero fatalities on the roads. This is an enormous but important job, and it will involve managing big data sets from multiple sources.
So now that you have a little background on the industry and the problems we're trying to solve, I'll go into more detail about how Panasonic is building out a solution to help capture this data and turn it into actionable insights for our customers at departments of transportation and other roadway operators. We provide end-to-end V2X support through deployment of hardware, as well as our cloud product, Cirrus. We have a deployment team that supports installation of aftermarket onboard equipment in vehicles to augment data collection in critical fleet vehicles now, like emergency vehicles, transit vehicles, or road maintenance vehicles. In addition, we install roadside equipment to collect data from vehicles and send it to our back-end system. Our Cirrus product then collects all of this data from vehicles and analyzes it to identify events and help traffic operators take action on the incoming data. One of the actions Cirrus enables is the ability to send messages to vehicles, which can be presented directly inside equipped vehicles through a visual display or audio outputs. So far, we've implemented these traveler information messages, or TIMs, for weather-related warnings like icy roads, rain, and wind, as well as other road events like construction, sharp curves, or an upcoming crash. Cirrus is a cloud-based connected vehicle platform which integrates data from existing department of transportation sources, in addition to vehicle data like basic safety messages, in our analytics platform. We're also working to make the data available via our open ecosystem to partner organizations and even third parties to develop additional services and applications.
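As a rough illustration of the kind of payload a TIM carries, here is a hypothetical sketch. The field names and structure below are invented for clarity; the real TIM format is defined in SAE J2735 and is considerably richer:

```python
from datetime import datetime, timedelta, timezone

def build_tim(event_type, description, lat, lon, valid_minutes=60):
    """Assemble a simplified traveler information message payload.
    This structure is a made-up stand-in for illustration only; the
    actual TIM encoding follows the SAE J2735 specification."""
    now = datetime.now(timezone.utc)
    return {
        "event_type": event_type,             # e.g. "icy_road", "construction"
        "description": description,           # text shown or spoken in-vehicle
        "location": {"lat": lat, "lon": lon},
        "start_time": now.isoformat(),
        "expiry_time": (now + timedelta(minutes=valid_minutes)).isoformat(),
    }

# Hypothetical icy-road warning near the deployment area:
example_tim = build_tim("icy_road", "Icy conditions ahead, reduce speed",
                        40.29, -111.69)
```

The essentials are the same as in the talk: an event type, a human-readable warning, a location, and a validity window after which the message expires.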
This high-level architecture diagram summarizes our approach: ingest the data from multiple sources, as shown at the bottom; then store and analyze that data; and make it, as well as the insights that we generate, available to others through a data API and through visualizations and features within the Cirrus platform, whether developed by Panasonic, by another organization chosen by our customers, or by one of our partners. Up next, Kevin will break down how we are ingesting and processing this data with the great tools in our tech stack. So with that, I will turn it over to Kevin. Now that you're a little familiar with the infrastructure and technology that supports our Cirrus product, here's an overview of some of the types of data that we're collecting. In order to monitor and manage the devices in vehicles and on the roadside, we collect a lot of device health and alert data. Basic safety messages are collected from vehicles, and they contain a wealth of situational awareness about what's happening on the vehicle. We're building out some exciting new intersection features right now that are bringing online a new data set specific to intersection operations, with data such as lane geometry and intersection signal phase and timing, and Kell and Renee will be describing these in more detail shortly. We also connect to various road weather stations for additional weather data, and we leverage all of these data sources to identify and report on relevant roadway events. This slide gives an example of events Cirrus detected by week through our analysis of basic safety messages from vehicles on an interstate corridor near Salt Lake City this year. With the rich data from vehicles, we were able to alert department of transportation users to hard braking events, vehicles with their hazard lights on, and rain and snow events that were impacting the roadway.
All of these events were detected with very low latency, providing feedback via the Cirrus platform within seconds of vehicles on the roadway experiencing these conditions. Lauren discussed the volume of data that can be generated from V2X at scale. Here's a quick snapshot of where we are today through Panasonic's deployment efforts with our partners in Utah, Georgia, and Colorado. We've deployed 380 roadside devices and collected over 7 billion basic safety messages from our production environments, and we are anticipating significant increases in volume with the new deployments that are in work now. Cirrus has also detected 300,000 roadway events and delivered over 2,000 traveler information messages to warn drivers of conditions on the roadway. Currently, we are in the pilot phase for connected intersections, and you can see just how this data will scale as we reach thousands of intersections. Now that you have a general idea of the scope and scale of data we're working with, I'll share the tools and architecture we're using in the data and analytics engineering functions to process all of this data. This is not inclusive of other functions such as data science and quantitative research. First, we use Prefect to orchestrate all of our data and analytics engineering processes. This allows us to create and manage workflows that ensure seamless processing of data from point of ingestion through transformation. All of our pipeline infrastructure is written in Python, containerized, and executed serverlessly in AWS. For data storage, we use Snowflake, which is a data ecosystem that allows for almost unlimited scale for our data needs now and into the future. Here at Panasonic, we have fully embraced the ELT methodology, meaning all of our transformations occur in Snowflake, utilizing dbt. dbt allows us to write quick and developer-friendly transformation code, which is then compiled into SQL and run against raw data in our Snowflake environment.
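The ELT pattern just described can be sketched in miniature with plain Python. In the real stack, Prefect would orchestrate these steps and dbt would compile the transform SQL; the table names and the in-memory "warehouse" below are stand-ins invented purely for this example:

```python
# ELT sketch: extract and load raw data first, then transform it in place
# inside the warehouse. Table names are hypothetical.
RAW_TABLE = "raw.bsm_messages"
CLEAN_TABLE = "analytics.bsm_clean"

def extract(batch):
    """E of ELT: pull raw records through with no in-flight transformation."""
    return list(batch)

def load(records, warehouse):
    """L of ELT: land raw records in the warehouse (a dict standing in
    for Snowflake) before any cleaning happens."""
    warehouse.setdefault(RAW_TABLE, []).extend(records)
    return len(records)

def transform_sql():
    """T of ELT: a dbt-style transformation, expressed as SQL that runs
    inside the warehouse against the already-landed raw data."""
    return (
        f"CREATE TABLE {CLEAN_TABLE} AS "
        f"SELECT * FROM {RAW_TABLE} WHERE speed >= 0"
    )

warehouse = {}
loaded_count = load(extract([{"speed": 12.0}, {"speed": 7.5}]), warehouse)
```

The key design point is the ordering: raw data lands untouched, and all cleaning logic lives as SQL executed by the warehouse, which is what makes near-unlimited scale practical.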
For data quality, we utilize Soda, which allows us to find, analyze, and resolve data issues. First, we use Soda's anomaly detection feature to monitor data flowing through our data pipelines. Anomaly detection is powered by a machine learning algorithm that works with measured values for a metric over time. The algorithm learns the patterns of our data, its trends and seasonality, to identify and flag anomalies in time series data. For example, if we normally receive 3 million records between the hours of 10 and 11 a.m. for a particular device, and the volume spikes to 4 million, or let's say drops to 2 million, Soda's anomaly detection will identify this and generate an alert. We also use Soda to define data quality tests for our pipelines and transformations, so we're able to identify invalid, missing, or unexpected data in an automated way. For example, if we receive a record from a vehicle where the transmission is in park, but the speed is greater than zero, that record is flagged as invalid. Once our data is landed and transformed, we create and present rich visualizations via Tableau dashboards. Lastly, we use Soda as our data dictionary and knowledge sharing platform. All of the tools you see here fit together to create a seamless, end-to-end data platform that allows us to deliver on our mission of creating safer and more efficient mobility for all. Since the Cirrus platform is responsible for detecting and responding to critical safety and mobility events, any data quality issues can have a huge impact on users, such as our department of transportation customers or drivers like yourselves. One of the data quality concerns we've actively managed and monitored for is data volume changes. We did an evaluation of available tools and selected Soda as a solution to help monitor and alert on any unexpected changes in data volumes. When we evaluate potential tools, there are three key things we look for.
Seamless connectivity and integration with all the other tools in our tech stack, effortless scalability, and ideally fully managed services. Our team's time is best spent developing new features and products, not managing infrastructure. In a V2X deployment like we have in several states, there are numerous reasons that data may stop being received without an actual failure on the device or in the pipeline. For example, a device might not start when a vehicle turns on, or a network issue might prevent messages from being transmitted via roadside units. That's why we partnered with Soda to help us implement data volume alerting using configurable rules, so that we can quickly and easily tailor data volume thresholds based on each unique deployment and data type. And now I'm going to hand it over to Kell to talk about how this data is put to work in real-world applications through our transportation projects. Thanks, Kevin, I appreciate the opportunity. So you've heard a little bit about how vehicles are generating some of the data that we're talking about, and Kevin's explained how our data pipeline ingests and makes use of that information. I want to take us out to the roadway for just a second to talk very briefly about what all this means out in the transportation system. What I've got here is a rudimentary diagram of how this all works on the roadway, in terms of how vehicles fit into the infrastructure that we're talking about today. What we've done is apply this technology to specific, meaningful emergency services today. We all know that congestion has been increasing quite a bit recently, and we know that that has an impact on emergency vehicles and first responders like ambulances and fire trucks, who have to navigate that same congestion just like you and I do. So what we're doing is building a technology in the field that lets those vehicles get priority access to traffic signals, using this V2X infrastructure that's being described to you.
At a very simple level, emergency vehicles that are responding to a call are able to use the messaging structure of the V2X language to actually talk to traffic signals as they're approaching, and request that the traffic signal give them some level of priority, or what we call preemption. Now we're going to change the slide. We're also able to apply that same technology to improve transit and other vehicles, like freight, or even snowplows, as we're doing in Utah. And we have different levels of priority or preemption at a traffic signal. So, for example, a fire truck responding to a call might get a more immediate green, whereas a bus that's approaching with, let's say, a full load of passengers might simply be able to extend a green light for a few seconds to allow it to get through without stopping. This improves their ability to access the roadway infrastructure and improves their response times or their travel reliability. In the case of freight signal priority, it might improve the economics of actually transporting those goods around on the truck. And it's also going to make the roadway safer, because you're going to reduce the congestion that emergency vehicles face at these traffic signals. Not only are we putting these technologies into practice in the field, but we're also making sure that we measure the performance of the functions that we enable, so that we can tell what kind of an impact we're having. To learn a little bit more about these particular applications, you can check out our blog. And now I'm going to hand it over to Renee, who's going to talk about how we measure the efficacy of this work. Thanks so much, Kell. Lauren, if you could provide me control one more time, that would be appreciated. Thank you so much. Just as a quick recap, up to this point our team has provided a great overview of our industry and who we are.
We've covered what V2X is, how we manage these V2X data, and how we can use this technology to improve safety and mobility at traffic intersections. From this point forward, I want to share with you a real-world deployment of this technology that we're currently working on, and then describe how we plan to measure the impacts, or the benefits, of this deployment. Today we're partnered with the Utah Department of Transportation to make connected intersections an even greater reality in their state. To set some context for you, I'm first going to describe what the geographic footprint of the connected intersection deployment in Utah looks like. Then I'll dive into some details about how we plan to research how well our technology is making a difference with regard to roadway safety and mobility as a result of this deployment. And a quick heads up: I'm going to spend a little bit of time going into our methodological approach, so you get an idea of some of the interesting ways our team gets to design valuable research and analytics projects, and also share with you what I think is a fun opportunity to think deeply about methodological rigor and data quality, to really deliver accurate insights from this massive amount of V2X data that we have before us and that we will be receiving in the near future. So with that, let me go ahead and dive right in. You can see here on the map that some of our work for this project is currently situated in a concentrated area of north central Utah, just south of Salt Lake City, in and around an area called Orem. I'll zoom in here on the map so you can get a better perspective of what this deployment footprint looks like. As you see, there's some blue and red iconography that's highlighting and delineating between two focus areas of this project. There in blue is an 11-square-mile area of deployment within Orem city proper.
Extending outside of that, in red, are the key corridors to surrounding cities, which include snow plow routes as well as transit bus routes. Now, within this entire footprint, what we're doing is equipping existing signalized traffic intersections to capture and process vehicles' requests for quicker or extended green traffic signals when certain criteria are met, with the goal, again, of enabling safety and mobility for about 70 public works vehicles that are going to be frequently traveling this area, such as emergency response vehicles like fire trucks and ambulances, as well as transit buses and snowplows. And I do want to mention there are going to be some other equipped vehicles that will be frequently traveling this area, but that's a little bit outside the scope of this presentation today. So with that deployment underway, our research and analytics team has been working hard to identify some key outcomes that we could target and measure to show the benefits of this technology, and what I want to share with you today is just a couple of examples of that: emergency vehicle response time within this project area, as well as some fuel cost savings that we think are associated with this performance. We think that these outcomes are very important, but let me give you a brief example, as it relates to emergency vehicles, of why we think this is so. Some of the research has suggested that in a situation where an individual experiences a sudden cardiac arrest event, his or her odds of survival decrease about 10% for every minute that passes. So getting an emergency vehicle to an incident site to render aid as quickly as possible is something that could mean the difference between life and death, and we think that V2X technology facilitating this fast response time will be really key. Now, we have a couple of approaches for the type of data that we're targeting for this research.
And I think it's worth mentioning that we really see an opportunity to utilize both, because they carry their own unique benefits. The first is primary data. This is data that is collected essentially firsthand by the researcher with the study in mind, but it can be quite resource intensive: there's a lot of stakeholder coordination involved, as well as financial and time costs. A benefit here is that it allows the researcher a ton of control over the study environment, to isolate extraneous factors that could otherwise have an undue influence had they not been considered. But if that's not available, or you don't have the time for it, then an alternative approach would be to utilize secondary data. Secondary data is really just data that already exists. It can be much quicker to act upon, but the study conclusions you reach can at times be limited by the data you have before you. Say in your model you need to include a few really important variables and you don't have access to them. You may decide to proceed, but ultimately that just becomes a limitation of your research, maybe something that you look to incorporate in future iterations. A big benefit here is that with secondary data, if it's coming from an ongoing collection source, as we have with our deployment, this opens an opportunity to operationalize those metrics and evaluate and monitor them in near real time for various use cases. So with that, our team has been working hard to pull together a variety of data that we will need for these studies, and for the sake of this presentation, I'm going to group these into three buckets. The first two are what we're calling V2X data. Bucket one is basic safety messages; I won't repeat this, as Lauren gave a great overview of what that all contains. The second bucket is all about connected intersection data.
This includes MAP messages, which you can think of as information about the intersection geometry. Then we also have SPaT messages, which stands for signal phase and timing data. You can think about this in terms of the traffic signal: is it red, green, or yellow, and what's the duration of that color? We also have a record of a vehicle's request for preferential traffic signal treatment upon entering that intersection, and how that request may have been processed, whether it was granted, denied, or some other outcome. This comes from what's called an SRM, a signal request message, as well as an SSM, a signal status message. We also want to bring some non-V2X data into these studies, including things like vehicle operator demographics and roadway weather conditions at the time of the study, because we think these might assert an influence on the outcome. And then we'll also need some fuel data from the Department of Energy's website, which will give us information about the cost of fuel at the time of the study. Now, a quick mention here: as we're working to pull together all these data sources, we're also going to ensure that the data adhere to specific quality and accuracy standards for our analyses. For instance, we're going to ensure that those data pipelines continue to provide up-to-date information and manage that process. We'll be handling missing data if this becomes an issue, and we'll be ensuring that the data we receive, and the data values across fields, are as expected, by checking for outliers and anomalies using some standard statistical diagnostics, like evaluating skew, kurtosis, etc. We think that, by doing this, we will help to minimize bias from low quality data that could be introduced during the evaluation phase.
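Those shape diagnostics can be computed with nothing beyond the standard library. The sketch below uses population moments, and the cutoff thresholds are purely illustrative placeholders, not the team's actual screening criteria:

```python
import statistics

def skewness(xs):
    """Population skewness: 0 for symmetric data, large when one tail dominates."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    if s == 0:
        return 0.0
    n = len(xs)
    return sum((x - m) ** 3 for x in xs) / (n * s ** 3)

def excess_kurtosis(xs):
    """Excess kurtosis: 0 for a normal distribution, large for heavy tails."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    if s == 0:
        return 0.0
    n = len(xs)
    return sum((x - m) ** 4 for x in xs) / (n * s ** 4) - 3.0

def screen_field(xs, skew_limit=2.0, kurt_limit=7.0):
    """Flag a field for manual review when its shape statistics exceed
    the (hypothetical) limits, suggesting outliers or anomalies."""
    return abs(skewness(xs)) > skew_limit or excess_kurtosis(xs) > kurt_limit
```

A well-behaved field passes quietly, while a field contaminated by even one extreme outlier trips the screen, which is exactly the kind of low-cost automated check that keeps biased records out of the evaluation phase.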
So with that said, let me dive into a couple of example studies for each one of these outcomes. As I mentioned, enabling EVP, which stands for emergency vehicle traffic signal preemption, via V2X technology could have an impact on survival, so we think this is a key and critical outcome to target. Here, our research question is: does EVP improve intersection traversal time, or the time it takes for an emergency vehicle to clear an intersection? Just to reiterate, EVP is making a near instantaneous request to change a red light to green, or extend a green light, for more time to pass through that intersection. We'll first try to answer this using secondary data, and I have a simple illustration here of what that can look like. What we're wanting to do is focus on a micro stage of that emergency route response, where we calculate a vehicle's time to pass through an intersection, in seconds, from the point of entering a MAP ingress lane to the point of reaching the egress lane and clearing that intersection. And I do want to call out here that this is a pretty simple, straightforward approach, and you might be able to tell there's a bit of a limitation: we are looking at time to traverse an intersection, which might not fully speak to emergency vehicle response time. So in an effort to address this, we're proposing an alternative approach using primary data from an experiment, where our question changes from "does EVP improve intersection traversal time?" to "does EVP improve incident response time?" What might this look like? Well, what we're proposing to do is identify a pre-selected pseudo emergency route in this project area that these emergency vehicles can travel. Then what we're going to do is dispatch, separately, an emergency vehicle that has this technology, this prioritization feature, turned on, and have it travel from a pseudo dispatch site to a pseudo incident site.
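The secondary-data traversal metric described above reduces to a simple timestamp difference. In this sketch, matching BSM position fixes against the MAP lane geometry is assumed to have happened upstream, and the tuple format is invented for illustration:

```python
from datetime import datetime

def traversal_seconds(bsm_points):
    """bsm_points: time-ordered (timestamp, lane) tuples, where lane is
    'ingress' or 'egress' per upstream matching against MAP geometry.
    Returns seconds from the first ingress fix to the first egress fix,
    or None if the vehicle was never observed in one of the lanes."""
    t_in = next((t for t, lane in bsm_points if lane == "ingress"), None)
    t_out = next((t for t, lane in bsm_points if lane == "egress"), None)
    if t_in is None or t_out is None:
        return None
    return (t_out - t_in).total_seconds()
```

Returning None for incomplete trajectories matters here: a vehicle that turned off mid-intersection or lost connectivity should be excluded rather than counted as a zero-second traversal.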
It'll be traversing various intersections, automatically making requests for signal preemption along the way. Next, we'll dispatch a control vehicle, another emergency vehicle that has this technology disabled. It'll travel that same route, the same intersections, but it will not be able to request preferential treatment at those signals. We'll continue this dispatching until we reach an appropriate sample size, which we deem at the onset of the study to meet statistical sensitivity criteria, and then we'll evaluate the difference in duration to arrive at that pseudo incident site, to see if there are any statistically significant differences. Now, again, I just want to emphasize that as we move to execution on these studies, we're going to be very cognizant, as a team and as analysts and researchers, about the data that we include and evaluate, so that if there are any influential outliers or anomalies, we handle and remove those. I want to continue to say this because I think it's important to emphasize not only the rigor in our methodology, of which I'm giving a little bit of a preview in this presentation, but also the rigor in our data preparation, so we have reliable outcomes at the end of this. So, with an idea of how we're approaching measuring those benefits of EVP as it relates to emergency vehicles, I'm going to move on to what we're proposing in terms of estimating the fuel costs associated with this performance. Here we're hypothesizing that there will be a fuel cost savings due to EVP, because the vehicle will be spending less time waiting for a green light at an intersection, or it'll be able to move through that intersection a little more quickly. So we ask the question: does EVP reduce vehicle stops and/or idle time at the intersection? To make this evaluation, what we're planning to do is look at two groups of vehicles:
Emergency response vehicles that requested and were granted preemption, versus vehicles that requested but were not granted preemption. We're going to analyze differences in their average stop, idle, and post-stop acceleration events at the intersection, from those BSM data elements I spoke about earlier, like speed, brake status, and transmission state. If our hypothesis is supported and there is a difference here, we look forward to moving to that next phase to ask: okay, can we estimate a fuel cost savings as a result of this? So I'm going to quickly preview the calculations we are proposing and what they might look like. To start, fuel expense as it relates to making a stop at an intersection: we think this is going to be a function of the number of seconds spent in acceleration after that stop to arrive back at cruising speed, multiplied by the gallons of fuel consumed during that period of acceleration, multiplied by the cost of fuel per gallon. Fuel expense related to a period of idle time at the intersection is going to be very similar; we're just swapping out one of those variables, so instead of seconds spent in acceleration, it'll be seconds spent idle at zero miles per hour. As we close, I just want to share our broad, overarching research timelines, so you have an understanding of where we've been and where we're going. With the deployment underway in Utah, our Data Science and Analytics team has been planning studies, as we've shared with you a little bit today. We've been identifying those data sources, building out data pipelines, and actually taking out test vehicles to simulate some of the actions we want to measure, to make sure that we are receiving data in the format we expect and that the values are, again, within range.
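The fuel-expense formulas just described reduce to a short calculation. A minimal sketch follows, with one interpretive assumption: "gallons of fuel consumed during that period" is read as a per-second burn rate, and the rates and fuel price used below are made-up illustration values, not measurements.

```python
def stop_fuel_cost(accel_seconds, gallons_per_second, price_per_gallon):
    # Fuel expense of a full stop: seconds spent re-accelerating back to
    # cruising speed x fuel burned per second of acceleration x price.
    return accel_seconds * gallons_per_second * price_per_gallon

def idle_fuel_cost(idle_seconds, gallons_per_second, price_per_gallon):
    # Same structure, swapping acceleration time for seconds spent
    # idling at zero miles per hour (with an idle burn rate).
    return idle_seconds * gallons_per_second * price_per_gallon

# Hypothetical inputs: 12 s of re-acceleration, 45 s of idling,
# illustrative burn rates, fuel at $3.50/gallon.
print(stop_fuel_cost(12, 0.0009, 3.50))  # 0.0378 dollars per stop
print(idle_fuel_cost(45, 0.0002, 3.50))  # 0.0315 dollars per idle period
```

Per-event costs like these would then be multiplied across the observed stop and idle events in each vehicle group to estimate an aggregate savings.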
Again, just doing some data quality measures there. As certain phases of the deployment in Utah start to wrap up toward the end of the year, we look forward to beginning to execute these studies, analyzing that data, and putting out our findings in a static report. For the questions where we do have data feeding from a secondary, ongoing data source, we look forward to operationalizing those metrics in our connected intersection manager tool for various traffic and transportation entity partners to utilize for a variety of use cases, as well as for our data community to access. With that said, I want to do a quick plug: as Lauren mentioned, the follow-up email you'll receive after this webinar will have links to our career portal for open opportunities across our teams. Lauren also mentioned the white paper that you can access to learn more about this, but we have a research proposal as well that goes into more detail about all of this, including applications to various vehicle types, so if you are eager to read more, that will be available to you. With that, I thank you for your time, and I'll hand it back to Lauren for a quick wrap-up. Thanks, Renee. I'll just do a quick recap of the main takeaways that we hoped to share with you all today, before handing it over to our sponsors at Soda for some final comments and then the Q&A. Really quickly: within our Data Science and Analytics team, we collaborate cross-functionally to deliver data engineering, analytics engineering, data science, and quantitative research insights and features to the Panasonic Smart Mobility Office. With the Cirrus by Panasonic product specifically, we work with very large connected vehicle datasets, as Kevin described, and leverage a suite of tools to help manage and QC that data processing, including Soda for data pipeline and quality monitoring.
Kell explained some important safety and mobility applications, specifically at intersections, where the connected vehicle data supports improving intersection traversal times for critical fleets like emergency vehicles and transit vehicles. And finally, Renee shared our proposal for measuring the value of these applications, and how critical quality data is to being able to deliver these benefits to our customers and to vehicles on the road. So one last time, please check out the links that will be provided in the follow-up email for additional information. And with that, I will pass it back over to Soda for a quick demo and any final remarks. Perfect. Thanks so much, Lauren and the Panasonic team. That was great, really fascinating stuff. I'm just going to spend a couple of minutes providing an overview of Soda, if you're not familiar with the tool: two slides highlighting what we focus on from a product perspective, and then a quick overview of how the data quality metrics are presented and created. We believe there are four pillars to being successful with data quality. Starting from the left, we have finding problems automatically. From Soda's perspective, that means being able to take Soda, point it towards the data source, and then we'll figure out what we can figure out automatically. There are many things, like schema changes, the time series anomaly detection that Kevin mentioned earlier, and also data freshness: features that can be applied very broadly without having to have any knowledge about the actual data. The next one is aligning on data expectations. Here we're trying to empower the data consumers, allowing them to express what good data quality means for them, and doing that in the form of a contract or an agreement, so it's very clear what good data looks like.
The third one is being able to analyze and manage alerts. In its simplest form, that is just being able to notify someone using email, Slack, Teams, or webhooks, but it also means keeping track of the resolution process and providing a workflow there: giving status updates, assigning leads, measuring resolution types. The fourth one is being able to prevent data issues, and this targets more the data engineering side of things. Part of Soda is based on an open source Python application that's really easy to incorporate into existing pipelines, to do checks before and after, for example, or during data production. With that, I'll move over to the last slide before we get to the demo, just to give you an overview of how things work. The basis for most of this is Soda Core, which is that open source Python library I mentioned earlier. The core is responsible for translating the contracts and agreements into optimized SQL queries that execute on the data sources and produce data quality metrics, for example the percentage of missing values or the number of invalid product IDs. That is then compared against thresholds defined in these agreements, and the results get uploaded to Soda Cloud. In Soda Cloud we store this over time, and that allows you to do change-over-time comparisons within Soda, so that if missing values start to increase by, say, 5% between day one and day two, an alert is raised. On the other side here, we have the Soda Agent, which is there to empower more of the analysts and the data consumers. The agent is essentially a wrapper around Soda Core, but it adds two key capabilities. The first one is scheduling: those types of use cases are typically detached from a data pipeline, so it allows you to have an independent schedule.
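As an aside, the day-over-day missing-values comparison mentioned above can be sketched in plain Python. This is a simplified stand-in for what Soda computes via SQL, not Soda's actual API; the function names, the dict-shaped result, and the sample data are all illustrative assumptions.

```python
def missing_pct(values):
    """Percentage of missing (None) values in a column snapshot."""
    return 100.0 * sum(v is None for v in values) / len(values)

def check_missing_jump(day1_pct, day2_pct, threshold_points=5.0):
    """Mimic a change-over-time check: flag an alert when the missing
    percentage grows by more than `threshold_points` percentage points
    between two daily scans."""
    delta = day2_pct - day1_pct
    return {"delta": delta, "alert": delta > threshold_points}

# Hypothetical column snapshots from two consecutive daily scans.
day1 = missing_pct(["a", "b", None, "d"] * 25)   # 25.0% missing
day2 = missing_pct(["a", None, None, "d"] * 25)  # 50.0% missing
print(check_missing_jump(day1, day2))  # {'delta': 25.0, 'alert': True}
```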
It also enables the data consumers to define their data quality checks and their agreements directly within Soda Cloud, and then the agent will pick up these agreements, translate them into SQL queries, and return the results back to Soda Cloud. So Soda Cloud becomes that single pane of glass for data quality across the organization. The upper part here is more around how we fit into an existing ecosystem. In its simplest form, that is being able to send notifications using email and chat, and being able to maintain the resolution process either completely within Soda or extend it to existing ticketing tools like Jira or ServiceNow. We also provide a number of integrations to data catalogs. The thinking there is that we augment the experience for the data steward, so they can not only find where the data is and who's responsible, but also get an indication of how reliable the data is. The last piece is around using BI tools together with Soda. What typically happens there is that we expose a reporting API from Soda Cloud, where the user can slice and dice that data to create custom dashboards and reports. With that, I'll move out of the slides and into the demo, so you get a feel for what this looks like. This is Soda Cloud, and this is what you would see when you log in. You get a list of the different datasets that have been previously onboarded; you can search and sort based on a number of dimensions, so you can look at owners, different data sources, and time-based searches, and also filter using custom attributes like domain, origin, or tags. Then let's take a look at one example here, which is an orders dataset. At the top, you get a few aggregated dashboards. The first one is around coverage: the nine here is the number of checks applied to this dataset, and the circle being green indicates that this is a good value compared to the rest of the datasets that have been onboarded.
So you get a health score displayed over time, where health is defined as the number of checks performed on this dataset that come back green versus the total number, and you can also see the incidents over time. Let me give you one example of a feature that Kevin mentioned as well: the time series anomaly detection on row counts. The way we do this is that we look at historical data in order to calculate an acceptable boundary, which is the white space that you see here. Then, when there's a sudden spike or sudden drop in the amount of data, that can trigger either a warning or an alert, depending on severity. At this point in time, we cannot be sure that this actually was a data quality issue; it could be that you had, in this case, a very successful marketing campaign, and all of a sudden you got a bunch of new orders. So you can provide feedback saying that, okay, this was not classified correctly: this is due to expected monthly seasonality, and the underlying reason was this marketing campaign. When I click save, two things will happen. First, this will be sent back to the underlying machine learning algorithm, which will learn to adapt to it. Second, it's a good way to indicate to the rest of the team that this wasn't actually a data quality issue. The way that you define these data quality checks is in the form of agreements, and I want to show you how to do that as well. If we create a new agreement, we provide a name, say "consumer agreement," for example, and then we select the data source; this has been previously configured just to set up connectivity to the underlying data source. The user is then presented with an editor where the actual agreement is defined. So if I look at a certain table here, we can do retail orders, for example, and then we assist the user by providing examples and snippets.
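The acceptable boundary idea behind that row-count check can be sketched with simple statistics. Soda's actual model is more sophisticated (it learns seasonality and incorporates user feedback); this is only a mean ± k standard deviations illustration with made-up daily counts.

```python
import statistics

def anomaly_boundary(history, k=3.0):
    """Acceptable boundary for daily row counts, derived from
    historical data as mean +/- k standard deviations. A simplified
    stand-in for the learned boundary described in the demo."""
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    return mu - k * sigma, mu + k * sigma

def classify(count, history):
    # A sudden spike or drop outside the boundary is flagged.
    lo, hi = anomaly_boundary(history)
    return "ok" if lo <= count <= hi else "anomaly"

# Hypothetical daily row counts for the past week.
history = [1000, 1020, 985, 1010, 995, 1005, 990]
print(classify(1002, history))  # ok
print(classify(2500, history))  # anomaly: sudden spike in new rows
```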
Let's say, for example, we want to check the row count, we want to compare this table with another table, we want to look at duplicates, and then we might want to look for schema changes. So it makes it really easy to define your contract; the only thing is that we need to provide some additional details, so we fill those in here, referencing the retail customers table, for example. What's really cool is that when you press this test checks button, the agent will parse this agreement, translate it into optimized SQL queries, and then execute that against the data source. This allows you to express fairly complex data quality checks without having to write any SQL; you could write SQL if you wanted to, that's also a possibility. So here we get the results back: the row count compared against the other dataset looks good, and the duplicates seem to be fine. We also provide complete logs from the scan for additional troubleshooting. The next step in an agreement is to identify the stakeholders, so here we can select, let's do Janet here. When I add her, a review will be requested from Janet, just to make sure that I didn't add anything that would take too long to compute or that duplicates another agreement somewhere. The next step is to define notifications: here I have notifications for the stakeholders and a Teams integration set up, and I can add an additional Slack channel for sending failures to. The last step is to select a scan schedule; you can pick a predefined one or arrange your own. So when I click save now, a request will go out to Janet for her to approve this agreement, and once she does that, it will go live, so to speak, and start executing data quality checks. Really briefly, as a last point here:
Let's say we've had this up and running for a couple of days, and I want to take a look at some historic results. If I pick one here, let's do this one: the country code reference data seems to be off. Based on this alert, what I can do is create an incident. I give it a name, provide a description, and set the severity, to major in this case. Here I can also join a number of alerts together into an incident. The thinking is that an alert or a warning is triggered as soon as something happens, but it might take you a day or two before you have time to further troubleshoot it, and that's when you would go and create an incident. So if I save this now, an incident will be generated for me, and I'll assign a user, so I'll be responsible for this one, and I'll change the status to investigating. Under the integration section here, we can see that this will be notified to the Teams channel, that we created a custom Slack channel for this, and that we also push this into Jira. If I open this up in Jira, we can see that we get notified and have a direct link back to Soda. In the progress here, I can mark this as resolved and just note that a fix was deployed. When this is marked as resolved, it will also be sent again to the different channels provided here, so if we go back to Jira now, we should see this one marked as done. Perfect. So that's essentially what I wanted to go over; this is a very brief overview of some of the capabilities of Soda. Please go to soda.io and check out Soda Core, which is the open source part, and you can sign up for a free trial of Soda. And please let us know if you have any questions. Thanks so much. Thank you so much, Alvin, and thank you so much to the Panasonic team, who have given us this great case study and information. Let me dive in here and answer the most commonly asked questions.
Just a reminder: I will send a follow-up email by end of day Monday for this webinar, with links to the slides and to the recording of the session, along with the links Panasonic has provided. First question: how does your team interact with the data governance office, or a similar team responsible for data governance, and what kind of support do you get from it? I can take that one. Great question. Our team engages our customers to varying degrees based on the scope of the agreement. For some of our largest customers, we work collaboratively with their technical teams so that our solutions meet any requirements from, for example, their IT, networking, or data governance organizations. In some cases we might conduct a design or validation review with them to ensure that we've got agreement on and approval of any features or implementations that are provided. At the end of the day, the exact support can vary based on customer needs, and we on the Panasonic side do have the ability to tailor our processes to support various organizations or differences across the jurisdictions that we work with. Hopefully that answers the question. And certainly, if you have additional feedback or follow-ups to that question, feel free to submit them in the Q&A. Earlier in the session, there was a mention of a link to your remote job opportunities; that again will be included in the follow-up email, and I'll put it in the chat too, or if you all have it super handy, we can put it in the chat immediately. So we'll get that to you. And on slide 28, you brought up operator demographics. If BSMs are anonymous, then how do you link vehicle operator demographics to them?
Yeah, I can take that question really quickly, Shannon. At a high level, our BSMs are anonymous, but when a vehicle makes a request via an SRM, or signal request message, we do need to understand whether that's a valid vehicle making the request, so that information will not be anonymized. Hopefully that answers it a little better. Certainly, thank you. And Alvin, can Soda process 100 million records in a batch and validate its data quality? Yes, we can. In terms of Soda and performance, since we're pushing down queries to the underlying data source, Soda is never really the bottleneck; it will depend on the data source that you execute on, but 100 million records should not be an issue. Very cool. And there was a question that came in earlier: how do you ingest data from AWS to Snowflake, and what's your experience with real-time data ingestion into Snowflake using various methods? I can take that one. There are a few approaches. We load in batches, where the data stays in S3 and we load it on a defined schedule using Snowflake's COPY INTO command. We also do streaming using micro-batches, which is a little more real time; Snowflake's feature for that really suits our needs, because Snowflake is meant to power analytics. It's not really used on our end to power real-time features: Snowflake is an analytical warehouse, not a transactional one. Over the summer, Snowflake also announced a new streaming API, over POST requests, that allows you to post data directly into Snowflake without having to stage it anywhere, so that's another option. The last option is that Snowflake has a Kafka connector as well, so if you're using a Kafka queue, Snowflake can ingest data in real time off of that queue. And there are some really good questions that came in the chat, too, that I want to call out.
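As an aside, the scheduled batch option above boils down to issuing a COPY INTO statement against an external stage. A minimal sketch of building such a statement follows; the table, stage, and file format names are placeholders, not Panasonic's actual objects, and real pipelines would execute this through a Snowflake connector or task rather than just building the string.

```python
def copy_into_sql(table, stage_path, file_format):
    """Build a Snowflake COPY INTO statement for a scheduled batch
    load from S3 via an external stage. All object names here are
    hypothetical placeholders."""
    return (
        f"COPY INTO {table}\n"
        f"  FROM @{stage_path}\n"
        f"  FILE_FORMAT = (FORMAT_NAME = '{file_format}')\n"
        f"  ON_ERROR = 'SKIP_FILE';"
    )

sql = copy_into_sql("bsm_events", "raw_stage/bsm/", "json_fmt")
print(sql)
```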
There was a question here: how big is your team, and how did you come to that team size? Yeah, so the Data Science and Analytics team specifically is 10 people today, and we've got about 50 people overall across all of the different teams and functions supporting the Cirrus by Panasonic product. Thank you. And is there any opportunity to utilize the Cirrus platform to optimize fleet operations, school buses for instance? Yes, absolutely. Fleet optimization can be applied to multiple different types of vehicles. Today we discussed emergency vehicles, snow plows, and transit buses, but it absolutely could be applied to school buses, and likewise to freight. So really, we can work with different jurisdictions and different entities on their particular use cases in order to enable priority and preemption services at intersections for various types of fleet vehicles. And in addition to the Cirrus product, which we spoke mostly about today, we have the One Connect platform for fleet management: Cirrus can help with priority and preemption at intersections, while our One Connect platform can help with fleet operations and maintenance, like the health of the vehicles and whether the vehicles encounter any particular events, for identifiable fleet services. Thank you so much. And, you know, everybody, this has been such a great presentation and demonstration of data quality. We really appreciate everybody: the Panasonic team joining us today, and Soda for sponsoring and making this webinar happen. Again, just a reminder: I will send a follow-up email by end of day Monday for this webinar, with links to the slides, links to the recording, links to the career portal and other resources mentioned by Panasonic, as well as links to ensure you get all the information about Soda. Thank you, everybody, and thanks again to all of our presenters. I hope you all have a great day.