Hi everyone. My name is Victor, and today I'm really excited to talk to this group about AutoGen, a framework for building multi-agent AI applications. Importantly, AutoGen is the work of contributors across multiple institutions, including Microsoft Research, where I work as a software engineer.

To begin, I'd like to paint a picture. Imagine a future where AI agents help us with increasingly complex tasks. For example, we might ask an agent to write a novel, come up with some sort of plan, or help with a supply chain problem. Interestingly, we probably just give the LLM, or the agent, a high-level task description, and the agent figures out how to accomplish the task. In addition, over time we expect these agents to learn, adapt, become aware of our context, and better help us with tasks. In this future, agents become the new frontier for computing, transforming how we interact with the digital world.

But how might such a complex task be implemented in practice? Take the last example: is it possible for Roastery 1 to be exclusively used by Cafe 2? How will that affect overall cost? A few things need to happen. First, we need some sort of plan. In this case, we might try to understand the current setup, gather some data, clarify assumptions, and then run some financial analysis. Importantly, we also need to act. For example, we might need to read and summarize documentation, search some databases, consult with human experts or subject matter experts, and then run some computation. And after all of that, we need to orchestrate communication across each of these steps and produce the final report.

So a few things are important here. An agent that can help us with these tasks must be able to reason: plan, and deduce next steps. It must be able to act: for example, use external tools and query databases. And finally, it must be able to communicate, whether that means communicating with other agents or communicating with humans to get feedback.

But is a single agent sufficient for these complex tasks? The core argument here is that as we try to address more complex tasks, we may need to shift from single LLMs to groups of LLM-backed agents. The current paradigm is that we can apply LLMs to reason about and solve tasks. However, we're all familiar with the fact that LLMs tend to hallucinate, and they might struggle with tasks that require multi-step reasoning. To address that, we might give the LLM access to memory or external knowledge to reduce hallucination, or give it access to a computer or calculator to help it with things like computation. But we will notice that as tasks get more complex, they may require multiple back-and-forths and long, complex instructions, and at that point a single agent might start to struggle. And so finally, we can start to assemble groups of agents. The idea is that with a multi-agent approach, we get separation of concerns: each agent can address a specific goal, which can improve results. Some recent research suggests that a multi-agent setup can help with divergent thinking, improve factuality and reasoning, and also provide validation. In some cases, some agents address the task, and additional agents verify that the task was addressed correctly.

In theory, the promise of multi-agent systems is very interesting, but the implementation is hard. How do we specify agents? How do we give them access to tools and LLMs? How do we orchestrate communication between agents and humans and get them to work together as a group to solve a task? And how do we make sure all of the participants in this loop are aware of each action? AutoGen was designed to solve this problem.

There are three high-level properties of AutoGen. First, we have a conversational paradigm: we believe the primary way to orchestrate the sharing of state among all of these agents is to maintain a shared message history, represented as a chat. Second, we want to offer a flexible API that is easy to use, enables communication across multiple agents with humans in the loop, and is customizable enough to support both simple and complex workflows. And finally, we aim to have a vibrant, growing ecosystem of integrations; I'll show some examples in a little while.

In terms of the API, each agent has a simple interface. It can send messages, which essentially means appending a message to a shared message list. It can receive messages. And it has an API to act, or generate a reply, based on the messages it receives. A minimal sketch of that pattern follows.
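To make that pattern concrete, here is a minimal, framework-agnostic sketch of the send / receive / reply interface just described. To be clear, these are not AutoGen's actual classes; the names here are hypothetical, and the sketch only illustrates the conversational pattern.

```python
# Illustrative sketch of the agent interface described above: send, receive,
# and generate a reply over a message history. NOT AutoGen's actual classes.

class SimpleAgent:
    def __init__(self, name: str):
        self.name = name
        self.messages = []  # the message history this agent has seen

    def send(self, message: str, recipient: "SimpleAgent") -> None:
        # Sending a message means delivering it for the recipient to record.
        recipient.receive(message, sender=self)

    def receive(self, message: str, sender: "SimpleAgent") -> None:
        # Record the incoming message, then decide whether to reply.
        self.messages.append({"sender": sender.name, "content": message})
        reply = self.generate_reply(self.messages)
        if reply is not None:
            self.send(reply, recipient=sender)

    def generate_reply(self, messages: list):
        # In a real agent this would call an LLM, execute a tool, or ask a
        # human; a real framework also adds termination conditions so two
        # agents don't keep replying to each other forever.
        return None
```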
Next, we have three built-in agents. The first is the user proxy agent, which typically acts on behalf of the user and can use tools like, say, a code executor. We also have an assistant agent, which has an LLM configured and tries to solve problems by writing Python code. And finally, there is a group chat manager that can orchestrate groups of agents to solve problems.

The API is fairly simple. How might we set up a two-agent workflow? With just a few lines of code, we create a user proxy agent and an assistant agent and initiate a chat between them; in this case, we ask them to plot a chart of stock prices. What ends up happening is that the user proxy says, here's what the user asked for. The assistant agent writes some code. The user proxy executes that code. Some error occurs. The assistant deduces that an error has occurred and produces additional code to fix it, say by installing a missing dependency. The plot gets generated, corrections are made, and the final output is given to the user.

That was a simple setup, but we can orchestrate these agents into much more complex configurations. One interesting example is application A4, where we have a group setup: some agents write code, and an additional agent acts as a safeguard, critiquing the code and ensuring that what has been written adheres to security constraints. Sketches of both setups follow below.
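Here is roughly what the two-agent setup looks like in code, following AutoGen's Python API; the model name and configuration details are placeholders you would adapt to your own environment.

```python
import autogen

# LLM configuration; the model name here is a placeholder.
llm_config = {"config_list": [{"model": "gpt-4"}]}

# Assistant agent: LLM-backed, tries to solve the task by writing Python code.
assistant = autogen.AssistantAgent(name="assistant", llm_config=llm_config)

# User proxy agent: acts on behalf of the user and executes the proposed code.
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",  # fully automated; "ALWAYS" asks the human each turn
    code_execution_config={"work_dir": "coding"},
)

# Initiate the chat with the task; the agents then converse until done.
user_proxy.initiate_chat(
    assistant,
    message="Plot a chart showing the year-to-date stock price change for META and TSLA.",
)
```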
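And here is a sketch of the group setup with a safeguard agent, using AutoGen's GroupChat and GroupChatManager. The agent names and system message below are my own illustration, not the exact A4 implementation.

```python
import autogen

llm_config = {"config_list": [{"model": "gpt-4"}]}

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "coding"},
)

# One agent writes code...
coder = autogen.AssistantAgent(name="coder", llm_config=llm_config)

# ...and another critiques it against security constraints before it runs.
safeguard = autogen.AssistantAgent(
    name="safeguard",
    system_message="Review proposed code for unsafe operations and flag any violations.",
    llm_config=llm_config,
)

# The group chat manager orchestrates turn-taking over a shared message list.
groupchat = autogen.GroupChat(
    agents=[user_proxy, coder, safeguard], messages=[], max_round=12
)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

user_proxy.initiate_chat(manager, message="Write a script to summarize our sales database.")
```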
Finally, I'd like to talk about the set of integrations we've seen within the AutoGen ecosystem. For example, we have support for multimodal agents. You can imagine a setup where one agent writes code to generate visualizations, and another agent acts as a visualization critic: it takes the chart that has been generated, evaluates its visual quality, looks for issues like overplotting or problems with the axes, and gives feedback. The two agents then collaborate to get much better outcomes for the user.

We have examples of teachable agents that continuously learn facts, preferences, and skills from user feedback, saving them and reusing them in related tasks. And we're even looking at low-code interfaces that allow users to declaratively specify an agent: set the maximum number of turns and the models to use, then run tasks against these declaratively specified agents and inspect the outputs. In one case, four messages are exchanged by the agents, and they generate a chart that can be reviewed. We can also give these agents access to skills, for example the ability to use APIs or generate images, and we can even tell them to do things like generate an entire book. The agents will interact; in one run, it took 12 messages and about three minutes. We asked the agents to generate a children's book, and they essentially used an image generation API to produce the whole thing. These are some of the example experiences that we are starting to see users and our own teams prototype using a framework like AutoGen.

AutoGen started out in March this year, and we open sourced it in October this year. In the last two months, we've seen quite a bit of growth: about 18,000 stars on GitHub, about 150 active contributors, and 11,000 members on Discord. And frankly, in my opinion, we're just getting started. There are a lot of open problems yet to be addressed. For example, how do we improve communication efficiency across agents? How do we integrate multiple models and optimize for cost? How do we streamline more complex, long-running workflows? We hope that you'll join us. AutoGen is an open source project, MIT licensed, and like I mentioned, it's growing. We're just getting started, and we'll definitely benefit from your help and your contributions. Thank you.