So, hopefully by the end of this talk, we'll have you convinced why we need AI on the edge and what we gain out of using open source for that. I am Sujata Tibrewala. I work in the open source program office at ByteDance. I joined ByteDance recently, but I have been working in open source for quite a while now, and I have experience working on computer vision back in 2000; I'll talk about that a little bit. Then I got out of it because I got into networking software, but now I'm back, and that's why we are talking today. Fun fact: I run marathons, and running and art are how I destress from my life in tech. I share all of that on social media, so if you follow me, you can find it. With me today is Tina, my longtime friend and colleague in open source. Tina. Yeah. Good afternoon, everyone. I'm Tina Tsou, and it's my pleasure to share with you today the exciting journey we are on at the intersection of open source AI and edge computing. As LF Edge chair and a director at Arm, I lead a passionate team dedicated to not only advancing technologies, but also cultivating a thriving open source community. This commitment to collaboration is the cornerstone of innovation and progress in our rapidly evolving digital landscape. With over a decade of immersion in the technology industry, my experience has reinforced my belief in the transformative power of open source. It's a domain where shared technology and knowledge become the catalyst for breakthroughs in AI, and where edge computing achieves its true potential. In this endeavor, my focus has been fostering an environment where open source isn't just about access to code, but also about establishing robust, secure, and dynamic ecosystems. Here we are not only developers, but also creators and guardians of a technological future that is inclusive and transparent.
My work at Arm intersects deeply with this vision, directing strategies that shape the very architecture of the devices that will carry us into the next era of computing. At the edge, we are unlocking new capabilities and enabling devices to think, interact, and make decisions in real time. Through LF Edge, we are championing an open, interoperable framework for edge computing that transcends any particular hardware, silicon, cloud, or operating system. It's about creating a unified front where a diversity of thought and technology converge to drive the edge forward. As we explore today's topics, I invite you to engage, question, and collaborate. Together, we'll delve into the remarkable possibilities that lie ahead and how each of us plays a part in this ongoing narrative of innovation. Thank you for joining me on this journey. Let's explore how we shape the future of technology together. Thank you so much, Tina, for that introduction. So here is the roadmap for today. We'll talk about autonomous AI; if you have not been living under a rock, you probably know what it is, right? We'll dive a little bit under the hood to give you an introduction, and that should lead us to why we need AI at the edge, and then to the importance of open source in AI development. Then we'll look at some future outlook and forecasts as per our interpretation, and of course, we'll conclude the talk. So what is autonomous AI? Imagine a world where technology doesn't just assist, but independently operates. That's the realm of autonomous AI: intelligent systems designed to perform tasks without our intervention. They learn from their environment, adapt to change, and make complex decisions using vast data sets and sophisticated algorithms. It's about machines that don't just do, but think: analyzing, evolving, and acting autonomously to enhance our lives, much like an avocado chair might surprise us by offering comfort in an unexpected form.
This is the future we are stepping into: intuitive, intelligent, and entirely autonomous. Yeah, thank you. So now a little bit under the hood. I know AI came to the forefront after ChatGPT was released, and I don't know if you attended any of these talks; we are literally at an AI Dev conference, and I'm sure some of the other speakers spoke about the history, but the concept of AI is not new. Back in 1950, Alan Turing asked why machines can't think the way humans can. But there were two very big limitations. Machines were expensive: it cost at least $200K just to lease a machine. Number two, they couldn't store anything. If they can't store anything, they can't remember anything, they can't analyze, they can't think intelligently. So those were the limitations. But of course, we are not living in the 50s anymore. We are living today, when machines are cheap. The processing power and memory that you have in your pockets is more than what fit into a room this size back then. So that is the power that we have. I'll not go into the whole history because we don't have time, but I'll talk about the paper which came out of Google, "Attention Is All You Need." How many of you have heard of this paper? Oh, cool. There are a few people in the audience. This was a breakthrough paper, not just in my opinion but in many experts' opinion, because the large language models and ChatGPT are literally based on it. So now we come back to my brush with AI, and now I reveal how old I am. This is my paper from 2000 on AI, on computer vision: stereo disparity. I don't know if any of you are familiar with that, but our eyes are nothing but stereo cameras, taking pictures of everything from two different angles. The way we can tell where things are in this 3D world is by the displacement of each corresponding point in those images.
So this water bottle, which is very close to me, will be displaced quite a lot, whereas a point at the back of the audience will be displaced only slightly. This is how our eyes interpret 3D depth from just two two-dimensional images; that's all our brain has. Basically, what I was doing with this algorithm was trying to replicate what our eyes do, with computer stereo images. The task I had was: given two images taken from a stereo camera, can my program establish corresponding points? Once I have the corresponding points, the geometry can tell me the depth, and that can be fed into a robot which wants to navigate the world. The hard problem was to get the corresponding points from just some pixel values. I was using a 9-by-9 window in a 128-by-128 image to compare those points. You can see how small that was. But even with that, the number of operations required was about 10 billion floating-point operations per image. The computer available at that time could do only megaflops, which meant each image took minutes to process, which meant it was impossible for that robot to see things in real time. So now we are today. My knowledge of AI froze at that time, and then I got curious when there were breakthroughs and I was seeing computers generating and interpreting images in real time, and I started researching. So if you think, again, that GenAI is new: it's not new. If you've been using a phone, it has literally been auto-completing. For a sentence like "the cat sat on a blank," the computer or your phone will give you a suggestion for the word based on statistics. So it's not really an algorithmic improvement, but more of a statistical improvement based on how much data the algorithm is fed and trained on, right? So this is the 2017 paper; some of you raised your hands that you've seen it. This is the transformer model. Just to give the rest of you a background, I have some slides. I will breeze past them.
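As an aside, the stereo block matching I just described can be sketched in a few lines. This is a toy illustration, not my original algorithm: the sum-of-squared-differences cost, the search range, and the synthetic image pair are all my assumptions here.

```python
import numpy as np

def disparity_at(left, right, row, col, window=4, max_disp=16):
    """Estimate disparity for one pixel: slide a 9x9 patch (window=4
    gives 9x9) along the same scanline of the right image and pick
    the horizontal shift with the smallest SSD matching error."""
    patch = left[row-window:row+window+1, col-window:col+window+1].astype(float)
    best_d, best_err = 0, np.inf
    for d in range(0, max_disp + 1):
        if col - window - d < 0:        # candidate window would leave the image
            break
        cand = right[row-window:row+window+1,
                     col-window-d:col+window+1-d].astype(float)
        err = np.sum((patch - cand) ** 2)
        if err < best_err:
            best_d, best_err = d, err
    return best_d

# Toy 128x128 pair: the right image is the left one shifted 3 pixels,
# so the recovered disparity at an interior pixel should be 3.
rng = np.random.default_rng(0)
left = rng.random((128, 128))
right = np.roll(left, -3, axis=1)
print(disparity_at(left, right, 64, 64))  # → 3
```

With the recovered disparity and the camera geometry (focal length and baseline), depth follows directly; the closer the point, the larger the disparity, exactly the water-bottle intuition above.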
But if you want more details, there are some tutorials available on the web. I have taken screenshots of those and tried to include links, so come after the presentation; I'll give you the links and I'll be happy to give you a reference. At the basis of everything is tokenization. Computers don't understand words; those words need to be converted into numbers, right? So each word or each sentence has to be converted into tokens. Words can be broken into one token each, or they can be converted into bigrams or trigrams, which means two words or three words make one token, and that's a science in itself. As this table shows, bigrams work well here, because a tokenization is better when it reduces the possibilities for the next token; you don't want infinite possibilities, which add complexity to the model. There's also the question of how the words are encoded. Similar words should be encoded close to each other, and words which are not similar should be far away. Here each word is literally two numbers in this embedding, so strawberry and apple are close to each other, whereas castle is far away from both. I forgot to include positional encoding, but words also take on different meanings based on where they occur in a sentence. Now, another wrinkle in words having meanings is, literally, apple. The world was a simple place when apple was just a fruit, right? But it's not anymore. Based on the sentence, apple could be a fruit, or apple could be an electronic device. So you can think of attention as a kind of gravity mechanism where, based on the sentence, apple is pulled towards being an electronic device or towards being a fruit. The algorithm can interpret where your word falls based on the context. Here, the words which are important are apple and orange, because the other words don't change the context.
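To make the "gravity" intuition concrete, here is a minimal sketch of the scaled dot-product attention from the 2017 paper. The 2-D embeddings are made-up toy numbers, purely for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Each output row is a context-weighted mix of the value vectors."""
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

# Toy 2-D embeddings for "apple and orange". With Q = K = V = emb,
# "apple" attends more to "orange" than to "and", pulling its
# representation toward the fruit sense.
emb = np.array([[1.0, 0.2],   # apple
                [0.1, 0.1],   # and
                [0.9, 0.3]])  # orange
out, w = attention(emb, emb, emb)
print(np.round(w, 2))  # row i = how strongly token i attends to each token
```

In a real transformer, Q, K, and V are produced from the embeddings by learned linear transformations, which is exactly the "linear transformations give you different encodings" point coming up next.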
This is how the other words in the sentence influence the encoding. The embeddings need to be good, or interpreted in a certain way, so that when a word moves close to one meaning or the other, the two can still be differentiated. The left-hand side is a good encoding; the middle one and the right-hand side are bad encodings, because you can't tell the difference between the two contexts: whether apple is a fruit or an electronic device, the encodings move so close to each other that you can't tell. But building all those encodings by hand would be a lot of work, so linear transformations, mathematical transformations, can give you the different encodings. Another concept: when you predict words, you don't want to end up with just one possible output. Suppose you were given the sentence "how are"; you don't want your model to say "you" with 100% probability, right? Rather, "you" has a higher probability than "they," because the phrase "how are you" occurs more often than "how are they" in the language. So, putting it all together, this is what the transformer model is: you have your tokenization, your embedding, positional encoding, attention mechanism, and the probabilistic output. The attention and feed-forward blocks, which form the prediction model, can be cascaded, so the model can be improved; that is where model improvement happens and parallelization can happen. This is the reason why the more data and the more training you give, the better the model becomes. Now there's another aspect to it. This part is just the pre-training; there is a human training aspect as well. After the model is pre-trained with data (and most models, at least the ones whose creators have revealed it, have been trained on the internet, which is not very high-quality training data), the model is improved using fine-tuning, where humans work through questions and answers.
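The "how are you" versus "how are they" point is really just counting. Here is a sketch of a tiny statistical next-word model; the corpus is a made-up toy, and only the standard library is used:

```python
from collections import Counter

# Tiny corpus standing in for "how much data the model is trained on".
corpus = ("how are you . how are you . how are they . "
          "the cat sat on a mat .").split()

# Count which word follows each two-word context (a trigram model).
follows = {}
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    follows.setdefault((a, b), Counter())[c] += 1

# After "how are", the model should prefer "you" (2/3) over "they" (1/3),
# rather than predicting either with 100% probability.
counts = follows[("how", "are")]
total = sum(counts.values())
for word, n in counts.most_common():
    print(word, n / total)
```

A transformer replaces the raw counts with learned parameters, but the output is the same kind of object: a probability distribution over the next token.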
The humans rate the answers as good or bad, or provide good answers, and based on that the model is fine-tuned. I just walked you through all of this so that you have context for this slide. Each of these models takes billions and billions of parameters and a lot of GPU time, literally thousands of GPUs training over months, to produce the model. That means if you ask ChatGPT something, it was trained on data which is at least a year old. If something happened between when it was trained and now, it cannot tell you; it doesn't have the latest data. And that is what actually brings us to the need for more lightweight, real-time training, which brings us to the case for edge AI. Yeah, in this era of unprecedented data generation, the edge is where the action is. As highlighted by Forbes, processing data at the edge is not just a trend; it's a strategic imperative that offers a suite of benefits. Firstly, edge computing takes a stand on security and privacy. By processing data locally, we minimize exposure to vulnerabilities and protect sensitive information right where it is generated. Moreover, efficiency in data processing is vastly improved: by reducing the distance data travels, we accelerate decision-making, enabling real-time insights and actions, with less waiting and more doing. Let's not overlook the significant reduction in network transmission overhead: by keeping data local, we alleviate the burden on our network infrastructure, leading to cost savings and reduced congestion. And importantly, edge computing optimizes overall system performance. It's about creating sleek, responsive systems that are as dynamic as the environments they operate in. So when we say data is being created at the edge, we are acknowledging a paradigm shift.
The edge isn't just a place; it's a frontier of opportunity, a horizon where speed, efficiency, and security intersect to redefine the landscape of modern computing. Let's delve into how edge AI is revolutionizing industries with concrete real-world applications. In smart manufacturing, edge AI drives predictive maintenance, allowing for preemptive action before issues arise. It ensures quality inspection with precision, optimizes production processes, and offers real-time monitoring, enhancing reliability and efficiency on the factory floor. Shifting to urban environments, smart cities benefit from edge AI through improved traffic control, intelligent urban management, and enhanced public safety with real-time face and license plate recognition. These applications are not just about convenience; they are essential for a sustainable smart city security infrastructure. In the medical and health sector, edge AI transforms patient care with advanced medical image analysis and remote monitoring capabilities. Health management becomes proactive, not reactive, with real-time data from wearables leading to personalized healthcare strategies that cater to individual needs. These are just snapshots of edge AI's potential. Each application is a step toward a smarter, more connected world where technology empowers advancement in the fabric of our daily lives. Edge AI is not the future; it's the present, and it's reshaping our world right now. Thank you. Welcome to a glimpse into the future, as envisioned by the LF Edge Akraino AI edge blueprint family. Each blueprint within our family is a testament to the transformative potential of AI at the edge. We have a contributor here. Hi. Firstly, we have the school education video security monitoring blueprint. This isn't just about monitoring and keeping our schools secure; it's also about creating an environment where safety and the future go hand in hand.
It leverages AI to ensure a protected space for learning and growth. Moving on, we see the power of federated machine learning, or federated ML, applications at the edge. This blueprint is about pushing computational boundaries: edge devices collaborate to learn and evolve, safeguarding data privacy while enhancing intelligence. I know that WeBank uses this for fraud detection in online banking, and it is also used for warehouse management. Next, Baidu Robotaxi uses this; it's where the rubber meets the road. Imagine streets moving with intelligence: with autonomous driving, taxis communicate with the infrastructure in real time, and we get personalized urban mobility and safety. In the realm of professional development, the IBL skills platform for engineering education is reshaping how engineers learn. It's about equipping the architects of tomorrow with real-world skills today, using AI to craft a curriculum that's as dynamic as the technology it teaches. Turning our gaze to the environment, we've committed to a blueprint that stands for sustainability and natural environment protection. Here, AI is not just a tool but also a guardian, monitoring ecosystems and ensuring agricultural productivity while preserving our planet's delicate balance. Last, but certainly not least, edge AI virtual agents embody the intersection of interaction and intelligence. This agent, call it a local ChatGPT, empowered by real-time generative AI models at the edge, is redefining customer service, offering personalized and instantaneous responses. Each of these blueprints is a building block for a smarter, more connected, and sustainable world.
They represent the combination of our expertise, our vision, and our commitment to a future where edge AI is not just prevalent, but pivotal. As we continue to expand our Akraino AI edge blueprint family, we invite you to join us. Together, we can harness the power of edge AI to not just imagine the future, but actively shape it. So in addition to these Akraino blueprints, which are all open source, by the way, and there for you to use, there was a recent engine released by MIT called PockEngine. With it you can pick and choose the parts of the model that are trained on the edge, and also the data, which can stay local. So this is, in my view, a path forward to using the edge not just for inference, but also for training. In LF Edge, we have the AI Edge project, and I think we need to speed up. I can go fast, yeah. Welcome to the core of our AI Edge project, where innovation meets execution. Today, I'll walk you through the critical components and workflow that make our project not just a concept, but a reality. Let's start with our current cornerstone, Shifu. Selected for its familiarity to our developers, Shifu serves as a pragmatic starting point, allowing us to create immediate impact while staying agile for future technological shifts. Shifu's deployment is as flexible as it is robust, running natively in a Kubernetes cluster for device connectivity. Yeah, I want to talk about the API gateway. Yeah, we only have a couple of minutes. Okay, five minutes maybe. Yeah, sure. Now let's talk about the data, the lifeblood of AI. The data comes from devices connected through Shifu, and diving into this architecture, you can see there is an AI API gateway, the edge AI apps, and smart computing elements like algorithm virtualization and intelligent scheduling. So in conclusion, the AI Edge project is not just a technology model; it's a testament to what we can achieve with the right components and dynamic workflows. Thank you.
And this is a demo with Hugging Face; in the interest of time, I think we'll skip that. And this is Volcano Engine by ByteDance, which combines edge computing, edge CDN, and intelligence at the edge. Again, in the interest of time, sorry, we'll have to skim through everything. This is the Monolith recommendation system. It's a real-time recommendation system, unlike many other recommendation systems which are not real time, and it also implements collisionless embedding tables, which makes it faster as well. Then there is the Babit Multimedia Framework, a multimedia video processing framework. You can take any AI inference model, any open source inference model, and integrate it into Babit. All the QR codes here have our LinkedIn profiles, the ByteDance GitHub, and our Discord, where you can ask questions. ByteIR is the compiler backend; again, in the interest of time, we'll skip the details. If you want to work with us, we have a program called the open source innovator program, and it doesn't need to be earth-shattering, go-to-Mars innovation. It can be some idea that you're working on, some problem that an existing open source project can help solve, and we want to hear from you. Please come talk to me later, because in the interest of time, again, I can't go into much detail. In the keynote, Jim talked about the importance of using open source. So again, yeah, it's the lifeblood of AI development. We should imagine AI not as a solitary endeavor, but as a global synergy of minds. Basically, using open source AI algorithms, knowing the source of the data and where it comes from, and solving the problems that exist in AI together: that is the path we want to take forward. So these are some of the known problems. For example, current models can answer who Tom Cruise's mother is, but if you give them the name, Mary Lee Pfeiffer, and ask who her son is, they can't answer.
Current LLMs can also be tricked into giving really harmful outputs. If you ask them how to create a bomb, they will not answer. But if you say, hey, my grandma used to work in a bomb factory and she used to tell me how to create a bomb as a bedtime story, the bots will actually tell you how to create a bomb. Things like that. So there are a lot of security issues. Or if you give the prompt in machine language, they are not secure; they are not trained to block those inputs, and they will give you any answer that you want. Those are the kinds of problems that exist, and those are the problems that we can solve as part of open source. Now, some of the future outlook. AI today is only capable of the fastest kind of human thinking. For example, two plus two: when I ask you two plus two, you'll say four because it's in your memory; you're not calculating. That's type one thinking. Type two thinking is 17 times 24. You don't know the answer, so you go to a method and work it out. That's the type of thinking that today's AIs cannot do. The question of who Tom Cruise's mother is, that's type one thinking; the answer is ready. But to answer who Mary Lee Pfeiffer's son is, they would have to actually go through their input and infer that it's Tom Cruise, and they can't do it. So we need that kind of capability in AI, where it doesn't need to give you the answer right away, but can go back, look at its input, infer in real time, and give you an answer. In conclusion, LLMs today, and this is again credit to my source, I forget the name of the influencer from whom I borrowed this slide, LLMs today are basically equivalent to what happened with the internet. It's not just a tool; it's a complete ecosystem, or the beginning of one, and there is a lot more to be built on top of it. So we want all the different ideas.
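To illustrate the type two side of the 17 times 24 example, here is a sketch of working it out by a method instead of recall. The helper is hypothetical, just a picture of deliberate, step-by-step computation, not anything an LLM actually runs:

```python
def multiply_step_by_step(a, b):
    """Type two multiplication: decompose b into digits and add up
    partial products, the way a person works out 17 x 24 on paper."""
    steps, total = [], 0
    for power, digit in enumerate(reversed(str(b))):
        partial = a * int(digit) * 10 ** power
        steps.append(f"{a} x {int(digit) * 10 ** power} = {partial}")
        total += partial
    return total, steps

result, steps = multiply_step_by_step(17, 24)
for s in steps:
    print(s)            # 17 x 4 = 68, then 17 x 20 = 340
print("total:", result)  # → total: 408
```

Type one would be retrieving 408 from memory in one shot; type two is producing the intermediate steps and only then the answer, which is the capability the talk argues today's models still lack.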
And this is a favorite quote from one of the people I admire, the creator of OCP, you know, the Open Compute Project, open hardware basically. What he says is that 99% of talent is outside your organization. So literally, 99% of talent is in open source, and we want you here. So, yeah, as we stand on the cusp of a technological renaissance, the future is clear: autonomous AI systems are no longer distant dreams; they are becoming a common reality now. Okay, so, yeah, as we draw our discussion to a close, let's recap the essential points of today. We've unpacked the concept of autonomous AI and its broadening role across various sectors. And our journey doesn't end here. I invite each of you to become an active participant in the open source community. Yeah, so again, there are different open source projects; I come from ByteDance, so Akraino, LF Edge, and also the ByteDance projects. We'd love to hear from you, and we'll take questions outside of this room. Thank you. Thank you, I appreciate it. Happy holidays.