Can you all hear me? We can actually get started. So good afternoon, everyone. Thank you all for coming today to attend our session. I know it's right after lunch — I hope you had your caffeine, and hopefully our session is interesting enough to keep you all awake. It's so nice to finally see everyone face-to-face; both of us are very, very happy to be here.

So our session today is about how we built a computer vision-based AI/ML software solution to secure the surroundings using just open source tools. Before I move forward, I need to show this slide — our legal team wants us to show you the notices and disclaimers.

This is the agenda for the talk. We'll do the introduction, I'll go over the project objectives, followed by Sam, who will cover the architecture. Then I'll go through the challenges and learnings we had from this project, and Sam is going to end with the ethical considerations.

Before we actually start, an introduction to our team. This is our team, the Guardians — all of us are from Intel Corporation, mostly based out of Chandler, Arizona. This particular project took a couple of sprints for us to finish, which included architecting, designing, and then building the entire solution. My name is Neethu Elizabeth Simon. I'm a senior software engineer working for Intel Corporation, based in Chandler, Arizona. My partner in crime is Sam, and I'll hand over the mic to her.

Hi, I'm Sam. I'm in the Austin, Texas area, and I am a software engineer here at Intel. While most of our org is in the Arizona office, I am the Texas software engineer. Sorry — I'm trying to figure out how to not echo for everyone here today; hopefully that works. So Neethu and I are both in the IoT group here at Intel, and that's a transition into some of the background concepts related to our project today.

So what is the Internet of Things? IoT is this web of numerous devices that are interacting and interconnected with minimal human intervention. Massive amounts of devices are getting connected to the internet and are collecting and sharing their data. With this rise in the number of things out there, we're seeing a lot of growth in the global IoT market: for 2022, the global estimate for the IoT market is $14.4 trillion. That's trillion with a T, which is a lot of money.

We're seeing some strong drivers for this growth, including new markets — from healthcare and fitness to manufacturing, retail, and government; the list goes on. There's also increasing support from really big-name companies, such as us here at Intel, but also other companies such as Google, Microsoft, and the like. And there are strong drivers thanks to cheaper and faster processors, as well as wireless networks.
With this growth we're seeing in IoT, there's also a lot of growth in computer vision-based applications. Cameras are being deemed the ultimate "thing," because they're generating massive amounts of data — and when you're gathering so much data, it's critical to derive sense and meaning from it. So computer vision plus artificial intelligence is being deemed the eye of IoT, which makes sense when you think about the lens of a camera. Whenever you're developing IoT solutions for smart cities, smart homes, autonomous vehicles, or drones, you do require that smart eye — a solution to detect, track, and analyze objects in real time.

The presentation we're here to share with y'all today is about a smart city solution that we worked on, and there are a few requirements and goals of a smart city solution. Improving city safety is a big one: you want a reduction in crime, and you want appropriate notifications sent out to the appropriate safety officials so they can take action. And lastly, you want some sort of edge analytics, of whatever flavor you might be interested in.

Naturally, with any solution there are pain points, and when it comes to a computer vision application that might get deployed at the edge, there are some interesting ones: performance bottlenecks because of inference, the robustness of your architecture, and some difficulties and considerations around maintenance. And now that I've talked about smart city use cases, I'll pass it off to Neethu for the project objectives.

Thanks, Sam. So before we move into more detail about the architecture of the solution, I'll go over the project objectives. Intel is not in the business of selling software to make money; instead, we build software to enable our ecosystem — to enable our developers to build solutions that are optimized on Intel hardware. Specifically, our team is involved in building reference implementations using open source software. All these reference implementations are distributed through the Intel Edge Software Hub — the link is right here. It's a one-stop resource for all edge computing software, so software developers can customize, validate, and deploy use-case-specific solutions faster and with greater confidence. It provides ready-to-use, use-case-specific reference implementations that developers can take as a reference design and build their custom solutions on top of.

So our goal here was to develop a computer vision-based AI/ML solution for smart cities. It provides a framework and a processing pipeline for deploying an AI-assisted multi-camera solution for vehicular and walkway traffic. This reference implementation enables the installation of an AI-assisted application that can be used as a reference design for achieving situational awareness, property security, and management. Our solution is completely microservices-based, and it was written in Golang. It is a containerized solution — we used Docker for that — and we used the Intel open source toolkit called OpenVINO for inferencing at the edge.

So how did the solution come into existence, and what's the background? We were working with a couple of our partners, or solution integrators.
They had these components which were proprietary and custom-made. Our goal was to pretty much rip out those proprietary components and rebuild them using open source tools, so that we could distribute the entire reference implementation to the open source community.

The goals were basically to provide an easy way to analyze the video coming from different sources, like a security camera; capture the inference data results using AI deep learning models; allow for a web-based user interface to view these video streams; filter and display the inference data results; configure and manage the different video sources; and provide a map — a world map — to locate all these different video sources. We were able to achieve all these goals through just open source tools. So Sam will now cover the architecture in detail.

Thanks, Neethu. So for our architecture, we really wanted to focus on a loosely coupled design, leveraging that microservices-based approach that Neethu mentioned. You can see here we have a lot of open source tools, which is super awesome. On the subsequent slides, I'll have the corresponding tool we're talking about highlighted in yellow, and I'll also have a hyperlink for the tool or repo that we used, so y'all can go back and check it out if you're interested.

The first thing that you need for any computer vision application — and for smart city in particular — is some sort of video source. Some of the developers on our team did not have a camera to start out with, and that's why we leveraged rtsp-simple-server. RTSP stands for Real Time Streaming Protocol; it's a transport — you can think of it much like HTTP and TCP are common transports for the web. So for our solution, we leveraged a Go-based, ready-to-use RTSP, RTMP, and LL-HLS server and proxy, such that you could read, publish, or proxy audio and video streams. This let us look into the scalability of our solution, because we could set up however many RTSP simple servers as we wanted, as well as enable those of us who didn't have a camera to test the end-to-end functionality through our integration tests. So this was us simulating that RTSP server to get video data. One last thing: we did expect the RTSP server to provide H.264-encoded video data. We chose this video codec because it's one of the most efficient, as well as high-definition.

Now that we have some data, how do we work with it? That brings us to GStreamer. GStreamer is a pipeline-based multimedia framework: you can combine multiple components to encode or decode your media stream, and it's a way to create streaming applications. For the purposes of our smart city solution, we actually used an abstraction layer on top of GStreamer called DL Streamer. DL Streamer is the abstraction layer we used to more easily work with GStreamer as well as integrate with the OpenVINO inference engine — it's a GStreamer plugin, again, called DL Streamer.
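To make that concrete, here's a minimal sketch of what kicking off one of these pipelines from Go might look like. The element names (gvadetect, gvametaconvert, gvametapublish) are real DL Streamer plugins, but the camera URI and model path are placeholders, and this is an illustration under those assumptions rather than the actual pipeline our reference implementation builds:

```go
package main

import (
	"os"
	"os/exec"
	"strings"
)

func main() {
	// Hypothetical RTSP source and model path; in the reference
	// implementation these would come from the camera configuration.
	pipeline := strings.Join([]string{
		"rtspsrc location=rtsp://127.0.0.1:8554/cam1 protocols=tcp",
		"rtph264depay", "h264parse", "decodebin", "videoconvert",
		"gvadetect model=person-vehicle-bike-detection.xml device=CPU",
		"gvametaconvert format=json",
		"gvametapublish method=file file-path=/dev/stdout",
		"fakesink sync=false",
	}, " ! ")

	// gst-launch-1.0 takes each element, property, and "!" as its own
	// argument, so a simple split on spaces is enough here.
	cmd := exec.Command("gst-launch-1.0", strings.Split(pipeline, " ")...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}
```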
To quickly touch on the OpenVINO inference engine: it's part of the OpenVINO toolkit. The OpenVINO toolkit is this big open source ecosystem full of tools for machine learning and artificial intelligence from all walks of life. You can work on data annotation with the CVAT tool, you can evaluate your model's accuracy, and you can optimize it for hardware. The OpenVINO inference engine itself is, naturally, optimized for Intel hardware.

You can see there's quite a bit going on in our yellow box here, so I'll separate it out: there's a top flow and a bottom flow. The top flow is the inference pipeline, and the bottom flow is our fragmentation pipeline.

To start with the inference pipeline: this is where you have your decode/pre-process branch, which sends the raw data into DL Streamer. DL Streamer applies the various inference models that we have pre-configured in our DL Streamer pipeline; it's what attaches those regions of interest to your video frames, using the output of the inference models executed by the OpenVINO inference engine. Our DL Streamer pipeline let you add different configurations, such as additional pre- or post-processing steps, in case you wanted to adjust your models — things like input normalization or color space conversions. DL Streamer is what told our OpenVINO inference engine to load pre-trained weights and model definitions from external files formatted in the OpenVINO intermediate representation, or IR format. You can take models from other machine learning frameworks and convert them into this format using the OpenVINO toolkit.

For the purposes of our smart city solution, we used a few different models. These models come from the OpenVINO Model Zoo, and they're all public. The models we used include a person/vehicle/bicycle detection model — you could also get the make and model of the car — and we also had a facial detection model, where you could get information such as age and gender.

On to the fragmentation pipeline, the bottom flow. This is where the pipeline splits the data off, transmuxes it, and fragments your input video into short segments that can be played back via HTTP Live Streaming, or HLS for short. The video fragments can be synced up into a playlist file such that you can view the video live. And lastly, the last step of the yellow box is step four, which is very important: step four is where the pipeline synchronizes those video fragments with your inference results. This matters because you can go back and look at the corresponding inference at the time that it happened — really important for a smart city solution.
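As a rough illustration of that synchronization step, here's a small Go sketch that pairs inference results with the video fragment whose time window they fall into. The types and field names are ours for illustration, not the actual schema of the reference implementation:

```go
package smartcity

import "time"

// Fragment is one short HLS segment produced by the fragmentation
// pipeline; InferenceResult is one detection emitted by the inference
// pipeline.
type Fragment struct {
	CameraID string
	Start    time.Time
	Duration time.Duration
	Data     []byte // the MPEG-TS segment bytes
}

type InferenceResult struct {
	CameraID   string
	Timestamp  time.Time
	Label      string // e.g. "person", "vehicle"
	Confidence float64
}

// ResultsFor returns the inference results whose timestamps fall inside
// a fragment's time window, so playback can show the detections that
// happened during that exact clip.
func ResultsFor(f Fragment, results []InferenceResult) []InferenceResult {
	var matched []InferenceResult
	end := f.Start.Add(f.Duration)
	for _, r := range results {
		if r.CameraID == f.CameraID &&
			!r.Timestamp.Before(f.Start) && r.Timestamp.Before(end) {
			matched = append(matched, r)
		}
	}
	return matched
}
```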
Now that we have those inference results, you can send them off elsewhere, and that brings us to additional listeners. For the purposes of our project, because it's in the space of IoT, we wanted to bring in EdgeX Foundry. EdgeX Foundry is hosted by the Linux Foundation; it's a framework for industrial IoT edge compute. What this means is that you can interact with edge devices such as cameras — send them data, receive the messages they might be giving off — and create things like device services or application services, as well as leverage some of the niceties that come with the EdgeX ecosystem. Some of these include notification services, security features right out of the gate leveraging Vault, and more advanced topics and features such as service metrics. So there's a lot going on with EdgeX, and again, that's why we wanted to enable that adaptability and flexibility, since this is an IoT solution.

Bringing it back to our smart city solution: you could send your inference results off via MQTT. MQTT is that message protocol for IoT — a message queuing transport that lets you decouple your producers from your consumers. So you could send our inference results off via MQTT to the EdgeX MQTT client, to be ingested directly by the EdgeX message bus.
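As a sketch of that producer side, here's roughly what publishing a single inference result over MQTT might look like in Go using the Eclipse Paho client. The broker address, topic name, and JSON shape are all placeholders — the actual EdgeX integration has its own configuration:

```go
package main

import (
	"fmt"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func main() {
	// Placeholder broker and client ID; EdgeX would be configured to
	// subscribe on the matching topic to ingest these messages.
	opts := mqtt.NewClientOptions().
		AddBroker("tcp://localhost:1883").
		SetClientID("inference-publisher")

	client := mqtt.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		panic(token.Error())
	}
	defer client.Disconnect(250)

	// One inference result as JSON; in the real pipeline this would be
	// produced by DL Streamer's metadata publishing step.
	payload := fmt.Sprintf(
		`{"camera":"cam1","label":"person","confidence":0.92,"ts":%d}`,
		time.Now().Unix())

	token := client.Publish("smartcity/inference/cam1", 1, false, payload)
	token.Wait()
}
```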
While it's great to send those results off elsewhere, you probably want to store them somewhere. That brings us to Postgres. Postgres is the big open source object-relational database focusing on SQL compliance and extensibility. For our solution, we wanted to store our video data, as well as the related metadata, in our Postgres instance. This introduces the concept of binary data storage within a database, and depending on which database you choose, that has different implications in terms of storage, retrieval, and access. So I'll go over the three main ways of storing binary data in a database and map those back to Postgres.

One: symlink metadata. This approach is where you store your binary data in regular files and then store the file path in your database. This is great for limited-scope, single one-off use cases, or maybe a single-user instance. However, there's the potential of duplication of your metadata — and if you move your files and don't update the database, y'all know that can cause issues.

The second approach is BLOBs, or binary large objects. Postgres doesn't implement this exactly to a tee, but it does have its large object data type. With this approach, your data gets stored within a special heap, and you have special commands to access your data through the OID, or object ID. While this does help with some of the memory requirements, there's a notoriously difficult API you have to interact with to access your data, and you're also limited to four billion large objects per database. I know four billion sounds like a lot, but when you're working with that ultimate "thing" and you're constantly streaming data, four billion can come and go rather quickly, depending on how you structure your database.

Lastly, the option that we chose is the binary columns approach, implemented in Postgres through bytea data. This is where your data is stored in variable-length table columns while still supporting your typical SQL syntax and security features. This can often use up quite a bit of memory, so you do need to be mindful of the restrictions in terms of column and table size. Thankfully for us developers, we didn't have to worry about that, because Postgres has a TOAST mechanism. When I say TOAST, I don't mean bread that you butter with cinnamon and sugar — not that tasty. I mean The Oversized-Attribute Storage Technique: the mechanism by which you keep your physical row size below your page size, which is typically around eight kilobytes. This applies to those variable-length values and is handled by the Postgres database server transparently and automatically. Individually TOASTed values have to be below one gigabyte in size, and you still have that four billion TOASTed value limitation. Hopefully that sounds familiar from when I mentioned option two — right, because option two has that same restriction. What this means is that option two versus option three — the BLOBs approach versus the bytea TOASTed approach — are essentially identical in terms of storage efficiency and access, but option three, the Postgres bytea TOASTed data, is better than option one because you remove the potential for duplication of your data. So that's why we chose option three and its TOAST mechanism.

Second, Postgres also has a nicety in terms of declarative partitioning. This is where we treat our video data table as if it is one table when in fact it's multiple virtual tables, partitioned out by camera ID and by time. This created some performance benefits for us and also simplified some of the data warehousing tasks, if you wanted to archive or delete some of your old video data. We also allowed the camera partitions to be sub-partitioned, to optimize our database for the in-use data — because we found that most of the time, you want the more recent data for a smart city solution. So there were performance benefits there, and this declarative partitioning allowed us to write our queries against the video data table directly; Postgres would handle the routing of the request to the appropriate partition, pruning off the ones it didn't need to access. Another performance gain.

Lastly on Postgres: we had a database initialization script with certain functions and triggers defined, such that when you added cameras, the appropriate camera ID partition was created along with the sub-partitions — and correspondingly when you removed cameras. This was done with minimal locking and overhead, thanks to the declarative partitioning approach. If you wanted to expand our solution into a proper deployment scenario, you would definitely want to look at automating those data warehousing tasks on a more regular cadence through pg_partman and pg_cron, which are extensions that can do exactly that.
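To give a feel for what that scheme looks like, here's a hypothetical sketch of a schema in that spirit — a bytea column for the segments, list-partitioned by camera and sub-partitioned by time — created from Go. The table and column names are invented for illustration; the actual initialization script defines its own schema, functions, and triggers:

```go
package main

import (
	"database/sql"

	_ "github.com/lib/pq" // Postgres driver
)

// Illustrative schema: video segments stored as bytea (TOASTed
// automatically when large), partitioned by camera ID and then by time.
const schema = `
CREATE TABLE IF NOT EXISTS video_data (
    camera_id  integer     NOT NULL,
    start_time timestamptz NOT NULL,
    segment    bytea       NOT NULL,
    metadata   jsonb
) PARTITION BY LIST (camera_id);

CREATE TABLE IF NOT EXISTS video_data_cam1 PARTITION OF video_data
    FOR VALUES IN (1) PARTITION BY RANGE (start_time);

CREATE TABLE IF NOT EXISTS video_data_cam1_d20220621 PARTITION OF video_data_cam1
    FOR VALUES FROM ('2022-06-21') TO ('2022-06-22');
`

func main() {
	db, err := sql.Open("postgres",
		"postgres://user:pass@localhost/smartcity?sslmode=disable")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Queries against video_data are routed (and pruned) to the right
	// partition by Postgres itself.
	if _, err := db.Exec(schema); err != nil {
		panic(err)
	}
}
```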
Now that we've stored our data, we can work with it through some back-end APIs, which we implemented as Go microservices. Our Go microservices were consumed primarily by our web UI; however, you'll see that section 8 is highlighted to include other clients. We wrote our services such that they could be consumed by additional clients, such as a common streaming application by the name of VLC — that was a nicety of some of the approaches we took for our microservices.

To get into them: we had three main microservices. The first one is the least complex — it's your basic CRUD application, to create, read, update, and delete cameras. You could also get a list of the associated pipelines running for a camera, as well as its URI.

Second is the inference service. This is where you could get a list of models, layers, and labels, and you could also find the sessions assigned to a particular label by an inference model. There were some niceties here that we found out about the Postgres time APIs: whenever you think about your inference results, it's helpful to group them by time. So if you wanted inference results for now, or for yesterday, you could do that through Postgres's time functions, which we leveraged in our inference service.

Lastly, we had our video service, which provides endpoints for our playlists as well as video segment data. This supported HTTP Live Streaming as well as video on demand. It did assume that your recorded data consisted of H.264-encoded video, muxed into MPEG-2 transport streams of consistently timed video segments.
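For a feel of what such a playlist endpoint could look like, here's a deliberately minimal Go sketch that serves a hardcoded HLS playlist. A real implementation would generate the entries from the segment rows in Postgres, and the routes and segment names here are invented:

```go
package main

import (
	"fmt"
	"net/http"
)

// playlist serves a tiny, hardcoded video-on-demand HLS playlist.
func playlist(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/vnd.apple.mpegurl")
	fmt.Fprint(w, "#EXTM3U\n"+
		"#EXT-X-VERSION:3\n"+
		"#EXT-X-TARGETDURATION:4\n"+
		"#EXTINF:4.0,\n/segments/cam1/0001.ts\n"+
		"#EXTINF:4.0,\n/segments/cam1/0002.ts\n"+
		"#EXT-X-ENDLIST\n")
}

func main() {
	http.HandleFunc("/playlists/cam1.m3u8", playlist)
	// A segment handler would look up the bytea column for the requested
	// camera and time window and write it out as video/mp2t.
	if err := http.ListenAndServe(":8080", nil); err != nil {
		panic(err)
	}
}
```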
For the visualizations piece, that brings us to our Angular UI framework. We chose this — as with most of the other tools — because it's open source and kind of the right tool for the right job. We also had some boilerplate code to work with from a past project, which is always nice. For our Angular UI, we wanted to display the live data as well as past recorded data, which brought us to a bit of a conundrum. We looked into the standard protocols and practices for showing video in a browser served by your typical HTTP server and did not find an explicit answer. There were a lot of options, and a lot of them sponsored by some big-name companies: you have HTTP Live Streaming, or HLS, supported by Apple; you have MPEG-DASH, supported by Google and Microsoft; you have WebRTC, which can get a little hairy depending on how you implement it, plus potential security concerns; and then you have your typical WebSocket approach. So, not a clear path forward — until we found out about the HTML5 video tag through Video.js. Video.js is the HTML5 media player framework we used to display our video data in our Angular UI. Our UI also featured a few different configurable dials and knobs to customize your inference results, because that's the key here for a smart city solution: the inference results.

Lastly, our Angular UI had a tab for Grafana. Grafana is that common data visualization tool out there, and for the purposes of our project, we enabled you to see the edge device locations — your cameras — on a world map. We had pre-configured dashboards for this, as well as for your inference results, and we leveraged read-only database credentials to populate our data source configuration files. That's a lot on the architecture, and here's Neethu on the UI.

Thanks, Sam. So I will go over the entire flow of the solution through the UI story here. This is our landing page, and we have four features: first is to live-stream all the cameras that you have configured; second is to see the recordings of those which are doing the inferencing; the third one is the camera configuration; and lastly we have Grafana, which is the dashboard to view all the camera locations.

The solution starts here: basically, you configure the camera, or your video source. We have a form here to add the camera configuration and metadata — the location's latitude and longitude, the RTSP URI — and we also have an option to select the inference pipeline, the particular pipeline where the models need to run. Like Sam mentioned, we had face detection, and vehicle and license plate detection, so the user can select which pipeline, and multiple pipelines can actually be selected here.

Once it's configured, this is the page which shows all the cameras that you have configured. It also provides options to edit these details, delete, or add more cameras if you want. This view displays the live streaming of the data coming in from all the configured cameras. We have the recordings here, and if we go into more detail, this will show you the exact clip where a person is detected, or where the inference data has been detected. There is an option to modify the thresholds, or the confidence level of the models, and view how the model is going to perform, and we have all the inference data depicted here.

The last one here is the Grafana dashboard. We used the world map plugin to display all the locations where the cameras are installed. If we hover over each of them, it shows the metadata, and if you click on one, it goes to the actual recording page where all the inference data is stored. This dashboard shows you the analytics — this is just an example of how it can be used. Here we are detecting the number of males and females for one particular camera, and we have another dashboard showcasing the inference data for all the cameras that we have configured. So that ends the entire architecture in detail.

Some of the challenges and learnings Sam has already covered, so I'll cover a little bit more of what we went through while implementing this solution. Any deployment at the edge has a couple of challenges. The first one, obviously, is resource constraints. Two of the major things here are storage and compute — storage specifically, because we are dealing with live-streaming data, and whatever inference video we're getting, we're actually storing in a database. So there is a very high chance that the edge device is going to run out of space, and there needs to be some kind of frequent backup of your data from the edge, either onto a cloud or to an external storage device. The other thing here is compute, which is the hardware. Before we go in and deploy the application, we have to make sure the hardware can support it, so we have to do some kind of benchmarking to ensure we don't have any performance issues there.

The second important challenge with respect to this use case is data management. These were the four main goals we had with respect to data management: we should be able to reliably and efficiently record the incoming data streams; we should be able to process our video frames for inferencing and store the results;
there should be a way to view all the data that is coming in, along with the data analytics part of it; and we should be able to support the maintenance tasks.

As Sam mentioned earlier, we had two options: either use the Postgres database, or just store all the data on the file system. We went with the Postgres approach mainly because it simplifies the whole data abstraction layer and supports unified security and privilege management. It also provides several admin functions which we would otherwise have had to code ourselves, depending on what file system is underneath. Postgres also supports declarative partitioning, which treats the entire video data set as one large table when in fact it is a virtual table partitioned by camera ID and time. So it provides better performance and also simplifies some of the data warehousing tasks, such as taking backups or deleting old recordings. This approach is also agnostic to the location of the database server: the server can run right next to the video processing software, or it can be anywhere else on the network, even in the cloud.

Although this implementation expects a reliable database connection, we could modify the code in such a way that we store the video fragments and the metadata on the file system, and then upload them to the database only based on certain conditions. But when we do this, we have to take care of the security as well as the data storage on that edge device — and, most importantly, the timestamping, because if the timestamping gets messed up, the data retrieval is going to be wrong.

The last very important thing is security. This is an open source project, so I wouldn't say this is the most secure solution, but we made sure to have the basics in place. We ensured that certain files are inaccessible to unauthorized users. We provided service authentication for all our Golang application services, using authorization policies and tokens. We used TLS encryption for the communication happening between the HTTP APIs and the web UI. We encrypted our RTSP URIs, which can store sensitive data such as passwords and usernames. We used Vault for storing all our sensitive data — EdgeX, which was mentioned earlier, comes with this Vault functionality, and we made use of that. And we also ensured that all our sensitive information is not shown by default on the UI.
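As one small illustration of that TLS point: in Go's standard library, serving the HTTP APIs over TLS is a one-line change. This is only a sketch — the certificate and key paths are placeholders, and in the reference implementation secrets are managed through Vault via EdgeX:

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	http.HandleFunc("/api/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})

	// Placeholder certificate and key; in practice these would come from
	// your secret store rather than sitting on disk unprotected.
	if err := http.ListenAndServeTLS(":8443", "server.crt", "server.key", nil); err != nil {
		panic(err)
	}
}
```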
Next is the ethical considerations, and Sam will end with that.

Thanks, Neethu. So whenever you're developing any software, especially with respect to machine learning and artificial intelligence, you definitely need to have a strong moral compass, and some principles guiding it. It's important to note that your software can be used for good, or for not so good — so how can you safeguard to ensure that it will be used for good?

Here at Intel we abide by six ethics principles. To start out with, you should always respect human rights. This is something to be done on a daily basis, but also something to keep in mind when implementing and developing your solution — something developers should follow, but also any member of the scrum team. Second is to enable human oversight. For this, at Intel we have the AI ethics committee that reviews all project requirements as they're being built out, as well as checks off a project before it gets released, to ensure that we're abiding by agreements, getting proper consent, and so forth. Third is an explainable use of AI. AI software can be increasingly opaque and complex, so you need to make sure you can explain what you're doing and why you're doing it. Fourth is security, safety, and reliability: you should limit the scope of your solution to make sure that what you're building is what you intended to build, and that it's not going beyond its intended use. Fifth is personal privacy. There are privacy-by-design principles you should use when developing a solution, and we also have a privacy impact assessment, or PIA, that we use here at Intel and get checked off before we can release. Lastly is equity and inclusion. This is where you recognize that everyone comes from different flavors of backgrounds and experiences, and you should be mindful of who your AI software directly impacts, as well as those it indirectly impacts. With these principles, we have a strong moral compass to ensure that our security-as-a-service solution gets used for good.

We want to thank everyone for being here today and listening to Neethu and me, as well as the Open Source Summit for allowing us to be here in front of all of you lovely people. Enjoy the rest of your conference, and thank you so much. If you have any questions, we're all ears.

So it's released internally, and we're just waiting for the legal approval to release it through that Edge Software Hub.

"What hardware did you end up using for the experiment?" So our goal was not the benchmarking piece of it — we were to develop the software — but I would say that at least an Intel Core machine at minimum would be required, considering all the open source software that we are using. And the additional complexity here is that if we add more cameras, the GStreamer pipelines are going to get bigger and we might need more compute. We used i5s for some of our development, and it was slow at times — so yeah, definitely something to be mindful of.

There's someone in the back. "At the beginning you had the Intel OpenNESS, or the framework of tools that you have to apply to the edge, but then during the presentation you didn't talk about it — were a lot of those tools inside that framework?" The Intel OpenVINO? "No, not OpenVINO — there was the other one." Not EdgeX? "It was, I think, another thing. I believe at the start there was a toolset you were talking about that was developed inside Intel that brought together edge services." DL Streamer? "No, not GStreamer or DL Streamer or OpenVINO." I'm gonna go back, you can check — sorry, let's see. "Right near the beginning." This one's the first one. "It's earlier than that... that was your first slide. I don't think it was the architecture — sorry, it was before the architecture." Oh, got you, got you.
Okay, yeah, I got you. Oh, there we go — the Intel Edge Software Hub. Okay, so this is the place where we distribute our reference implementations. Any developer can go there, download the tool, play with it, and customize it for their specific use case. "Right, so when your project is approved, it shows up there?" Yeah, it shows up there. Sorry that took a while, but yeah.

Yeah, more questions. "What trade-offs were considered when you wanted to do inference at the edge versus over the web? How did you go about benchmarking and making that judgment call?" So for us, our team works at the edge, and the OpenVINO toolkit here is our inferencing at the edge — that's what our team is promoting. And think about the cameras as your things, right? The inferencing needs to happen at the edge; you can't be sending the whole data to the cloud, doing all the inferencing there, and then sending it back to your system. That's not going to work, right? Even the autonomous car — if you think about the autonomous car, where should the inference happen? It should happen on the car. The data shouldn't be going to the cloud and then coming back to the thing. And yeah, our goals here were not to benchmark our solution or find out what the peak performance is for any part of our project; we just worked on the development of putting all this software together.

"So this was containerized — how did you handle deployment to different hardware? What was it deployed to? Like a desktop CPU, or does Intel have a—?" Okay, so when you release on ESH, there is a part of ESH, the ESDQ — the Edge Software Device Qualification — and that's what will deploy our solution to different types of hardware to test out different things and get some of those benchmarking metrics. Again, we're going through some of the compliance and whatnot for the final release. As for the type of device you asked about: Core is your laptops, right? When you go to the data center side, it's Xeon — but do we need that much compute at the edge? Probably not, right? Maybe we can just use our normal NUCs and laptops with a better Core machine; it might just perform. It's also important to think about the fact that most of the people who are in retail or working on smart city solutions oftentimes already have pre-configured hardware, and so that's also something to keep in mind — it's not going to be the best of the best most of the time. Good question, thank you.

We're only sending up video clips that had a hit on them to the database — only the ones where an inference was detected are actually being stored.

No — because DL Streamer already comes with a couple of those post-processing and pre-processing plugins and things like that, which we just included in our pipeline. Yeah, so DL Streamer is that abstraction layer. Okay, thank you. Thank you all.