Hi. Good evening, everyone. How's everyone doing? Good? Wonderful. I know this is a bit of a late talk, but I promise it's going to be really interesting and exciting. So good evening again. My name is Victor Dibia. I'm a principal research software engineer at Microsoft Research. And today, my colleague and I are going to talk to you about AutoGen, which is a framework for building multi-agent AI applications. It's a really exciting project that has grown really fast over the last two months since we introduced it, and it's been such a pleasure working with my colleague, Chi Wang, a principal research scientist at Microsoft Research. Without further ado, I'd like to hand the call over to Chi. Chi, are you ready?

Yeah, thank you. Awesome. We can hear you here. Can you hear us? Yes, I can hear you. Lovely. Please take it away.

Thank you. Yeah, thanks for the introduction, Victor. Hi, everyone. I can't see anyone; I'm presenting from my laptop, and right now I'm in a conference room in New Orleans at the NeurIPS conference. I'm happy to have the chance to present AutoGen at this conference. AutoGen is a multi-agent framework for enabling next-generation AI applications.

Before we talk about multi-agent systems, let's first talk about AI agents. You have probably seen the essay from Bill Gates about a month ago that talks about how AI agents are about to completely change how we use computers. AI agents are powerful entities that can sense their environment, take actions, and respond. The GPTs announced at the recent OpenAI Dev Day are a good example of that. By using large language models, these agents are able to perform reasoning, use tools, and carry out different types of tasks. These are examples of single agents, and with the expanding capabilities of these large language models, agent workflows can become more and more complex.

So imagine a future where AI agents can help us solve many complicated tasks, like writing a novel, generating a plan for research analysis, or answering hard questions. Over time, they can learn to become even more personalized and understand more context. In the future, AI agents will be the frontier of computing. But how do we get there? How do we manage the increasing complexity of such AI agent workflows?

In AutoGen, we follow the philosophy that the whole is greater than the sum of its parts, and design a framework that can coordinate multiple agents, which integrate large language models such as GPT, LLaMA, and other models, and can use a variety of tools, with humans included in the multi-agent cooperation. This multi-agent design has roots in theory. Marvin Minsky, in 1986, proposed a theory called the Society of Mind. It asks: what is the trick behind human intelligence and other natural cognitive systems? The answer is that there is no trick. Intelligence comes out of a society of simple processes, which Minsky called agents. Each agent has a role that contributes to the combination, but no single agent exhibits the full range of intelligence. If we compare that to multi-agent AI systems, we can build multiple AI agents that together solve problems, and when they collaborate, they achieve much more than they could alone. This provides a new dimension for scaling the power of large models and can potentially lead to new emergent capabilities. To define this new future, we have been working since last year on this new framework called AutoGen.
And just two months ago, we spun it off into a standalone repo on GitHub. Since then, the response from the community has been overwhelmingly positive. It was a top trending repo on GitHub in October, and within 40 days of release it was selected into Open100 as one of the top 100 open-source achievements, a new award starting this year. We have been building a very large community, with more than 10,000 developers joining us on Discord, and significant interest from the press and social media.

In the remaining time, I will do a deep dive into AutoGen's key concepts and design principles, demonstrate how it enables a wide range of scenarios, and preview new and upcoming features. AutoGen is designed as an infrastructure, or framework, for easily building multi-agent applications, and we want to support a broad range of domains and complexities. Our solution is to use conversation to connect the agents.

Why is conversation so important? Theodore Zeldin, a philosopher and sociologist, has a book about the meaning of conversation. In it, he talks about how, in the meeting of minds, we do not just exchange facts; we transform them, draw new implications from them, and engage in new trains of thought. We observe a similar phenomenon in large language models. From the engineering perspective, we can compare the earlier completion models and the chat models. Both are from OpenAI, but the completion models are optimized for text completion, while the chat models, including GPT-3.5 and GPT-4, are chat-optimized and have been aligned with human conversation data, so they are very capable at carrying out conversations. That advancement suggests that we should move from single-turn completion to multi-turn dialogues, which enable agents to integrate feedback from other agents, refine their reasoning, improve, and make progress. Another observation is that these new models also provide a new design space, such as the system message, which can be used to define the role of the chat model, giving agents the ability to work more autonomously, have specialized capabilities, and work with other agents. So that provides a new design space for agent-based systems.

In a nutshell, we can configure these powerful models to play different roles. For example, we can have a coder that writes code, a critic that suggests improvements to the code, and a unit tester that tests the code. Or we can have a planner that decomposes a complex task, asks specialized agents to solve the subtasks, and recomposes the results. There can be many different ways to make multiple agents work together, and we need a generic abstraction and an effective implementation to satisfy all the different application needs.

In AutoGen, there are two basic steps to build such an application. Step one is to define the agents. Step two is to get them to talk. There are two key concepts here: we need conversable and customizable agents, and we need to be able to program them so that they talk and solve tasks in the way we want, which requires support for flexible conversation patterns. In the next two slides, I will dive into these two concepts.

First, AutoGen has a very generic abstraction of agents. Each agent can be backed by large language models, tools, or human input, or a combination of them. Developers can choose a built-in agent from AutoGen or extend one to define their own agents.
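To make those two steps concrete, here is a minimal sketch in Python assuming the pyautogen 0.2-style API; the model name, API key placeholder, working directory, and task message are illustrative choices, not taken from the talk.

```python
# pip install pyautogen  -- a minimal sketch; adjust the model/config to your setup.
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_OPENAI_API_KEY"}]}

# Step 1: define the agents.
# The assistant is backed by an LLM and proposes plans and Python code.
assistant = AssistantAgent(name="assistant", llm_config=llm_config)

# The user proxy acts on behalf of the human: it can relay human feedback
# and execute the code the assistant suggests.
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",  # fully automatic for this sketch
    is_termination_msg=lambda m: "TERMINATE" in (m.get("content") or ""),
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# Step 2: get them to talk.
user_proxy.initiate_chat(
    assistant,
    message="Write and run Python code to print the 10th Fibonacci number.",
)
```

The conversation loop (assistant proposes code, user proxy executes it and reports results) continues until the assistant signals completion or a reply limit is reached.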
For example, one built-in agent, the user proxy agent, is able to act on behalf of the user to perform tasks such as executing code and using tools. The built-in assistant agent uses large language models to write Python code. And the group chat manager is a special agent that can manage the other agents in a group chat, decide who should speak next, and broadcast messages to everyone; that agent plays an orchestration role. These are just a few specific examples, and we have seen a lot of extensions from developers who build all kinds of specialized agents. But how do we get them to talk?

AutoGen proposes a new programming paradigm called conversation programming. It is centered around inter-agent conversations, and the conversation drives the computation. Each agent, after receiving a message, performs certain actions based on its particular capabilities and then generates a reply. When the conversation finishes, the computation finishes too. We allow control of these agents using a fusion of programming language and natural language, which lets us support many different, even dynamic, workflows among the agents. Once the developer is ready, all they need to do is initiate the chat by sending a message from one agent to another.

I'll use the next example to illustrate how the conversation flow goes. This is a basic two-agent example. The blue box represents a user proxy agent that takes the initial user question, such as "plot a chart of stock price change." The green box represents the built-in assistant agent. In this case, it suggests some Python code to download the data from the web, analyze it, and answer the question by plotting a chart. In the next turn, the human user has a chance to provide feedback before the code is executed. In this case, the human skips the feedback, so the user proxy agent executes the code on behalf of the user and finds an error. The assistant agent suggests a way to fix the error; again the human skips the feedback, so the user proxy agent automatically installs the missing package, runs the code, and gets a result. Next, the human does provide feedback, because the axis is in dollars but they want percentage change instead. The assistant agent takes that feedback, revises the code, and finally the desired output is generated.

This is a simple example of the basic two-agent workflow using the default agents from AutoGen. It already enables an experience similar to ChatGPT plus plugins plus code interpreter, all advanced features that otherwise require a paid subscription. But AutoGen can enable much more beyond that. In our technical report, we have studied a number of different applications spanning a variety of domains, such as solving math problems, retrieval-augmented generation and chat, online decision making, using multiple agents for coding tasks, dynamic group chat, and a conversational chess game. They also demonstrate the different kinds of conversation patterns allowed in AutoGen.

In the first example, we demonstrate how to insert a nested chat into a simple two-agent chat. In this case, there are two agents, a student agent and an assistant agent, talking to each other, but at some point the assistant decides it needs to ask an expert. So it initiates another conversation between another pair of agents, and when that chat finishes, it returns to the previous chat and continues the previous conversation. This can be done recursively and dynamically.
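To make the stock-chart flow concrete, here is a minimal sketch assuming the same pyautogen-style API as above; the tickers, working directory, and feedback text are placeholders, and human_input_mode controls when the human is asked for feedback.

```python
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_OPENAI_API_KEY"}]}

assistant = AssistantAgent(name="assistant", llm_config=llm_config)

# "ALWAYS" pauses for human input after each assistant turn (pressing Enter skips
# the feedback and triggers the auto-reply, e.g. code execution, as in the demo);
# "TERMINATE" only asks at the end; "NEVER" is fully automatic.
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="ALWAYS",
    code_execution_config={"work_dir": "stocks", "use_docker": False},
)

user_proxy.initiate_chat(
    assistant,
    message="Plot a chart of META and TSLA stock price change year to date.",
)
# Typing feedback at the prompt, e.g. "show percentage change instead of dollars",
# sends it back to the assistant, which revises the code and the loop continues.
```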
In the third and fourth examples, we demonstrate a hierarchical chat where a manager agent can manage sub-agents, coordinate between them, and potentially interact with another user. In the fifth example, the dynamic group chat is a special case of hierarchical chat: the manager always broadcasts each message to every other agent, which creates the experience of multiple participants chatting in the same group with a shared message history. And the last example demonstrates how we can join multiple one-on-one chats to build even more complex multi-agent conversations. These are just several examples of building blocks; by combining them, developers can build even more complex applications.

In the next few slides, I will use one of the examples, the fourth one here about multi-agent coding, as a case study to illustrate a few points about what AutoGen can enable. This example comes from another colleague of mine who used AutoGen to solve supply chain optimization problems.

The first point I want to discuss is modularity. For each point, I will discuss both how it works and show some feedback from the community. In this case, the developer decided to have three agents: a commander, a writer, and a safeguard. The writer and the safeguard are just variations of the assistant agent I talked about before. The writer is able to write code, and in this case it has access to some proprietary tools for the optimization tasks. The safeguard is specialized to review code safety. The commander coordinates between the two agents, executes the code after the safety review has passed, and eventually returns the final answer to the user. Because of this modularity, a developer can easily reuse or make small modifications to existing agents to achieve what they want. In this case, they quickly built a prototype and found that it worked almost perfectly on their benchmark. I've seen other people use AutoGen in similar ways, taking just a few minutes to build applications like a music learning syllabus or a German language tutor. They appreciate the efficiency of AutoGen and compare it to the idea of object-oriented programming, saying that AutoGen packages many complexities into agents so that they can try a lot of different ideas and implement them quickly with this intelligent framework.

Another point I want to talk about is the balance of automation and human oversight. This is a very hard problem in general, and AutoGen makes some progress by abstracting the human as one particular backend for agents. No matter how complex the workflow is, a developer can easily configure an agent in the workflow to allow human input and configure the degree of the human's participation. A human can participate in a very natural way, simply as one of the participants in the multi-agent discussion. Or you can make multiple agents able to take human feedback and allow multiple human participants. This easy way of controlling the workflow and running different kinds of experiments is also appreciated by the community and compared favorably to prior frameworks. There are many other interesting features, but I want to mention one last important point: using multiple agents, it is possible to unlock new levels of capability.
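As one way to wire up three such roles, here is a minimal sketch of a group chat in the same pyautogen-style API; the agent names and system messages mirror the commander/writer/safeguard roles described above, but they are simplified placeholders rather than the actual supply chain implementation, which may use a different conversation pattern.

```python
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_OPENAI_API_KEY"}]}

# The commander acts on behalf of the user and executes code once it is approved.
commander = UserProxyAgent(
    name="commander",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "opt", "use_docker": False},
)
# The writer drafts the optimization code.
writer = AssistantAgent(
    name="writer",
    system_message="You write Python code to answer supply chain questions.",
    llm_config=llm_config,
)
# The safeguard reviews the code for safety before it is run.
safeguard = AssistantAgent(
    name="safeguard",
    system_message="You review code for safety and reply SAFE or DANGER with reasons.",
    llm_config=llm_config,
)

# A group chat manager orchestrates who speaks next and broadcasts messages.
groupchat = GroupChat(agents=[commander, writer, safeguard], messages=[], max_round=12)
manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)

commander.initiate_chat(
    manager,
    message="What happens to total cost if we ship everything from supplier A?",
)
```

Because each role is just an agent with its own system message and backend, swapping in a human reviewer or a different model for any one of them is a one-line configuration change, which is the modularity point made above.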
In this particular example, the end user doesn't have any knowledge of optimization or coding, but they are able to ask a simple natural-language question, and the multiple agents work together through a lot of different steps, like writing code, reviewing code safety, executing it, debugging errors, and potentially going back and forth over multiple rounds, until the final answer is generated. At the end, the user gets the answer in a clean way. That lets the human user intervene very little and still accomplish a very complex task they couldn't otherwise. Our framework also enables developers to build such complex applications with very little effort and time; in this particular example, my colleague reduced their coding effort by more than four times. And we also have experiments showing that the multi-agent design can indeed improve performance compared to using a single agent that performs too many tasks. For example, the accuracy of the code safety review improved substantially, by at least 20% for GPT-4 and 40% for GPT-3.5.

Combining all of these features and capabilities, we have seen very high recognition from the community, with some saying it's a new unlock for machine intelligence, and some even saying they have not been this excited since the rise of ChatGPT. That speaks to the high level of recognition. And people are not just talking; they have been building many applications, on cloud and mobile, in many different industries, including finance, product research, blockchain, creative writing, data analytics, education, law, consulting, gaming, and healthcare, even retail, manufacturing, and telecom. Every day we see new use cases from the community; they're very creative, and we learn a lot from them.

In the last part, I want to briefly preview some new features. Let's watch this video. Please let me know if you can't hear the sound. [video plays] These new features come from many contributors. For example, the AutoGen Studio UI you have seen in the video was mainly done by Victor. Thank you very much, Victor, for adding that feature; we are still working on improving it. So that's the end of the talk. I want to thank all the contributors and all the developers who support us. We are very open to collaboration with the open source community. Thank you very much.

So we have a few minutes for questions. Thank you very much. Questions? I was just bringing the mic over.

Thanks for making a great framework. I've been following the Microsoft Research contributions like Semantic Kernel and TaskWeaver as well, and I have a better understanding after seeing the new features. What would you say the future looks like from an ecosystem perspective of Semantic Kernel, AutoGen, and LangChain? What does the perfect world look like for you guys?

Chi, do you want to go first? Yeah, sure. I think there are a lot of interesting frameworks in this space, because AI and large foundation models move very fast and we see a lot of interesting, creative ideas. You mentioned a few of them, and I believe that in the future the best way is a good integration of all of them, whatever works best for developers' applications. For example, even today we have seen examples of people using AutoGen as the multi-agent orchestration framework and using tools from LangChain or Semantic Kernel as a backend for some of the agents. We have also seen examples of using guidance as a kind of language to control the behavior of agents.
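As a concrete illustration of using an external tool as an agent backend, here is a minimal sketch in which an arbitrary Python function (which could wrap a LangChain or Semantic Kernel tool) is exposed to an assistant via function calling; the function name, schema, static data, and task are all hypothetical, and this assumes the function-calling style of the pyautogen 0.2 API.

```python
from autogen import AssistantAgent, UserProxyAgent

def get_exchange_rate(base: str, quote: str) -> str:
    """Stand-in for any external tool, e.g. a LangChain tool wrapped in a function."""
    rates = {("USD", "EUR"): 0.92}  # hypothetical static data for this sketch
    return f"1 {base} = {rates.get((base, quote), 'unknown')} {quote}"

llm_config = {
    "config_list": [{"model": "gpt-4", "api_key": "YOUR_OPENAI_API_KEY"}],
    # Describe the tool so the LLM knows when and how to call it.
    "functions": [{
        "name": "get_exchange_rate",
        "description": "Get the exchange rate between two currencies.",
        "parameters": {
            "type": "object",
            "properties": {
                "base": {"type": "string"},
                "quote": {"type": "string"},
            },
            "required": ["base", "quote"],
        },
    }],
}

assistant = AssistantAgent(name="assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    is_termination_msg=lambda m: "TERMINATE" in (m.get("content") or ""),
    code_execution_config=False,
)
# Map the declared function name to the actual callable that executes it.
user_proxy.register_function(function_map={"get_exchange_rate": get_exchange_rate})

user_proxy.initiate_chat(assistant, message="How many euros is 100 US dollars?")
```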
So I do see a lot of opportunities there for the ecosystem and for integrations. Yes, one minor addition: I think we all acknowledge that the perfect way to orchestrate multiple agents is still something we're learning about. Nobody really knows the perfect solution yet, and that's one of the reasons we're really excited about AutoGen, because there's this huge community of people helping us test it who could develop the emerging standard. Right now, we've seen examples where you can specify Semantic Kernel as a backend for an agent, or LangChain tools as a backend for an agent, so essentially we're all working together, and I think over time the right framework and interaction patterns will emerge.

Any other questions? Yes, we've got one over here. I have to tell you, I was getting goosebumps with that video. Amazing work. So there's observation and monitoring that has to happen, for performance, loops, and things like that. Can you talk about that a little bit, in terms of the maturity of AutoGen, where that is today?

Monitoring. I think this is a very interesting question. It touches on a very important perspective about why we believe multi-agent is critical: a single agent, or a single model, often generates results that are not perfect. They contain all sorts of errors, hallucinations, or simply don't match the user's preference. Today, a lot of human intervention is still needed to provide the right feedback to fix those kinds of issues. Part of the original motivation of AutoGen is to reduce some of the effort that humans need to make by automating some of those things, like monitoring, providing feedback, and doing more validation, using either other models or tools that behave more deterministically and provide stronger guarantees. This is the reason we believe that, in general, critique or monitoring agents are very important to add to the workflow. But it's not clear how many such agents should be added, or where to put them for the most effective result. We have seen some successful examples in particular application domains; how well they generalize to different applications is still a question we need to study a lot. By working with developers building applications, we hope such generalizable patterns can emerge, and we will add corresponding support for those working patterns. But with AutoGen, the main idea is that you have this very generic framework, so you can easily try out and experiment with those different combinations and settings, and accelerate the speed at which we discover optimal workflows.

All right, well, you guys have been amazing. Thanks for all the questions. I'll be at the back to answer more questions. Thank you very much, Victor and Chi. Thanks for having us. Thank you so much.