Thanks, everyone, for coming to this talk. I'll be presenting some highlights of the State of AI Report from Air Street Capital. I'm Karina. Air Street Capital is a firm investing in early-stage, AI-first tech and life science startups. And the State of AI Report is an annual report that analyzes interesting developments in AI to inform conversation. It covers research, industry developments, safety, and policy, and today I'll be giving a high-level overview of what you can find in it. First of all, the general partner of Air Street Capital is Nathan Benaich. He also founded RAAIS and London.AI, AI communities for industry and research, as well as the RAAIS Foundation, which funds open-source AI projects. The team that contributed to the report is us. I'm a venture fellow at Air Street Capital; prior to this, I was an applied scientist at Wayve, an autonomous driving startup in the UK.

So the goal of this report is really to inform conversation about the implications of AI for the future. It has been reviewed by around 30 people to make sure that there are no misrepresentations or omissions. But of course, it's a living document, so we're open to feedback on it.

So let's get started. The first section is research. To kick things off, I'll just mention that when GPT-4 came out, it crushed all other LLMs. It has been generally regarded as the most capable general AI model. This report came out in October, so it doesn't include the recently released Gemini model. And not only did GPT-4 outperform other LLMs, it also outperformed many humans on the bar exam, the GRE, and coding tests. What's interesting here is that this model really showed what happens when LLMs get scaled: there are capabilities that emerge. So although the architecture is potentially similar to GPT-3's, the model had capabilities that were unseen before.
And what was central to this was probably reinforcement learning from human feedback, which showed the importance of human-labeled data. And although GPT-4 suffers from hallucinations, these are currently being worked on; or rather, the discourse has also changed a little to say, well, these models are meant to hallucinate. So it's going to be an interesting space to see what's next.

Of course, the open-source community responded to this, and there's a lot of excitement and a lot of advancements coming from open source: for instance, many smaller models fine-tuned on specialized datasets and applied to lots of downstream applications. A lot of companies are contributing to this; a few of them are MosaicML, Together, and Adept. At the time of writing the report, Mistral AI's 7B model was one of the best small models available. What's interesting here is that this community also makes models more parameter-efficient and contributes to better fine-tuning methods, really helping spread the use of LLMs so that smaller companies can use these models as well.

When Llama 2 came out in July, it was seen as the most capable open-source model. And it pleased almost everyone with its terms for commercial use: as long as your commercial application didn't have more than 700 million users at the time of Llama 2's release, you could use the model. By September, it had 32 million downloads.

We also looked at the popularity of various language models. On the y-axis is the MMLU score of each model, and on the x-axis the time it came out; the size of the dot is proportional to the number of mentions on X. You can see that proprietary models are more popular online, but open-source commercial models come a close second, led by the GPT and Llama families. Historically, people looked at the number of parameters of a model as a proxy for its capabilities.
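The parameter-efficient fine-tuning mentioned above can be made concrete with a LoRA-style low-rank update: instead of retraining a full weight matrix, you freeze it and learn two small factors. This is a minimal numpy sketch of the idea, not any particular library's API; all names, shapes, and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 1024, 8                       # hidden size, low-rank bottleneck
W = rng.normal(size=(d, d))          # frozen pretrained weight

# Trainable low-rank factors: only 2*d*r parameters instead of d*d.
A = rng.normal(scale=0.01, size=(r, d))
B = np.zeros((d, r))                 # zero init: no change at step 0

def lora_forward(x, alpha=16.0):
    """Frozen base path plus a scaled low-rank correction."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(2, d))
full_params = d * d                  # 1,048,576 weights to fine-tune
lora_params = 2 * d * r              # 16,384 trainable weights
print(f"trainable params: {lora_params} vs full fine-tune: {full_params}")

# With B initialized to zero, the adapted model matches the base model.
assert np.allclose(lora_forward(x), x @ W.T)
```

During fine-tuning only `A` and `B` would receive gradients, which is why a small team with modest hardware can adapt a large open-source model.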
Not everyone would agree with this, but generally that was considered true. Lately, though, the conversation has shifted more towards the context length of the model, since capabilities are constrained by the size of the input. So there's been a lot of work on changing architectures to cope with the memory and compute required and to allow for increasing LLM context length. At the time of writing this report, Claude had the longest context, with 100,000 tokens.

There are a few architectural innovations that we picked up on. Probably the most popular one is FlashAttention, which improves the way the attention matrix of the transformer is computed and allows for better parallelism and better work partitioning. Another one is 4-bit precision, which allows the weights of the model to be quantized so that inference is faster. Another is speculative decoding, which allows tokens to be decoded in parallel through multiple heads rather than one forward pass per token. And SWARM parallelism allows billion-scale LLMs to be trained on poorly connected and unreliable devices.

There's also been some conversation around whether smaller language models can compete with large language models. There's a paper from Microsoft showing that very specialized models trained on curated Q&A-style datasets can rival a larger model. This is, of course, a very narrow domain, but it goes to show that the data the models are trained on should not be overlooked.

Something very interesting came from Epoch AI: a study that looked at the amount of data these models use. They estimated that we will have exhausted the low-quality data used to train large language models by 2030 to 2050, high-quality language data by 2026, and vision data by 2030 to 2060. If this were to be true, it's possible that other innovations might help: for instance, speech recognition systems like Whisper could make audio data available for training LLMs going forward.
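To make the FlashAttention idea a bit more concrete, here is a toy numpy sketch of attention computed over key/value tiles with running softmax statistics, so the full N-by-N score matrix is never materialized. This is only the online-softmax core that FlashAttention builds on, not the real fused CUDA kernel; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, tile = 64, 16, 16
Q, K, V = (rng.normal(size=(N, d)) for _ in range(3))

def naive_attention(Q, K, V):
    """Reference: materializes the full N x N score matrix."""
    S = Q @ K.T / np.sqrt(d)
    P = np.exp(S - S.max(axis=1, keepdims=True))
    return (P / P.sum(axis=1, keepdims=True)) @ V

def tiled_attention(Q, K, V):
    """Process keys/values tile by tile, keeping a running row max and
    softmax denominator, so only a tile of scores exists at a time."""
    out = np.zeros_like(Q)
    m = np.full(N, -np.inf)          # running row max
    l = np.zeros(N)                  # running softmax denominator
    for j in range(0, N, tile):
        S = Q @ K[j:j + tile].T / np.sqrt(d)   # scores for this tile only
        m_new = np.maximum(m, S.max(axis=1))
        scale = np.exp(m - m_new)              # rescale old statistics
        P = np.exp(S - m_new[:, None])
        out = out * scale[:, None] + P @ V[j:j + tile]
        l = l * scale + P.sum(axis=1)
        m = m_new
    return out / l[:, None]

# The tiled result matches the naive computation exactly.
assert np.allclose(tiled_attention(Q, K, V), naive_attention(Q, K, V))
```

The memory saving is what lets long-context models fit attention on chip; the real kernel adds the backward pass and hardware-aware tiling on top of this.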
Another way to generate more data to train LLMs is synthetic data. Whether this works is not very clear, because there's research showing both sides of it. For instance, there's one paper from Google showing that if you train a model on ImageNet, and you also train it on a synthetic version of ImageNet, that is, twelve synthetic versions of the dataset generated using Imagen, a text-to-image model, then the model trained on synthetic data outperforms the model trained just on the original data. Of course, this is a classification task, so it might not hold for everything, but it was definitely interesting to see. Other people say that introducing synthetic data actually ends up polluting the training set, and that the way forward is carefully controlled data augmentation. So the case on this is definitely not clear yet.

A very interesting area at the moment is agents: LLMs are learning to use software tools. A few works attracted particular attention. For instance, Toolformer is an LLM from Meta that can decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. And from the open-source community, similar projects were AutoGPT and BabyAGI.

On the same topic of agents, a model from NVIDIA called Voyager showed that, through iterative prompting, LLMs can reason. What they did is use GPT-4 to produce code that calls the Minecraft API, so the LLM could play Minecraft through code generation, and the model learned to acquire new skills in Minecraft. Whenever the code produced an error, they would just reprompt GPT-4 with the error until it produced code that could run. So Voyager learned to reason, explore, and acquire unique items, in particular diamonds, which are quite rare and quite hard to get in Minecraft.
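The Voyager-style loop described above (generate code, run it, feed errors back into the prompt) can be sketched in a few lines. Everything here is a stub: `llm` just replays canned attempts where a real system would call GPT-4, and `craft` stands in for a Minecraft API call.

```python
import traceback

# Canned "model outputs": the first attempt has a syntax error, the
# retry (after the error is fed back) is valid. All illustrative stubs.
ATTEMPTS = iter([
    "result = craft('pickaxe'",    # broken: missing closing parenthesis
    "result = craft('pickaxe')",   # fixed on the retry
])

def llm(prompt):
    """Stub LLM: returns the next canned code attempt."""
    return next(ATTEMPTS)

def craft(item):
    """Stub standing in for a game API call."""
    return f"crafted {item}"

def run_with_retries(task, max_tries=5):
    """Voyager-style loop: execute generated code and, on failure,
    reprompt with the error message until the code runs."""
    prompt = task
    for _ in range(max_tries):
        code = llm(prompt)
        scope = {"craft": craft}
        try:
            exec(code, scope)
            return scope["result"]
        except Exception:
            prompt = f"{task}\nPrevious code failed with:\n{traceback.format_exc()}"
    raise RuntimeError("no working code generated")

outcome = run_with_retries("craft a pickaxe")
print(outcome)  # → crafted pickaxe
```

The first attempt fails at `exec` time, the traceback goes back into the prompt, and the second attempt succeeds; Voyager layers skill libraries and curriculum on top of essentially this loop.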
And it also outperformed previous state-of-the-art models that were trained to play this game. I have to say here that GPT-4 is very likely to have been trained on Minecraft code, so this might not hold for every game. But it was definitely very impressive that an LLM can play a game so well.

The typical way of playing these games has been through reinforcement learning, although RL is quite challenging to train: you have to explore the environment a lot, it has high sample complexity, and it's quite difficult to put prior knowledge into these models. And a model called SPRING was able to outperform an RL model just by reading the game's original academic paper and then playing and exploring the game through an LLM. So this area of research, using LLMs for planning, is very interesting to follow, because it looks like these models are outperforming reinforcement learning, and there have been years of research that went into reinforcement learning.

LLMs are also going into the embodied AI space. Lingo-1 is a vision, language, and action model that provides driving commentary. A lot of driving models are end-to-end, so pixels to actions, and one of the criticisms of these models is that they're not interpretable: you cannot know what the model is doing. One way to address that could be through language. So Lingo-1 explains what a driving model is doing, and it can potentially even be used to improve reasoning and planning.

Going deeper into robotics, PaLM-E is a foundation model for robotics. It's a 562-billion-parameter general-purpose embodied generalist. It's trained on very diverse data, and it can control a manipulator in real time. At the same time, it also gets state of the art on a visual question answering benchmark. And surprisingly, this model is better at text-only tasks than pure language models.
In particular, it's very good at geospatial reasoning, because there was robotics data in its training. Another language model used for robotics is RT-2. In this case, a vision-language model is fine-tuned all the way down to low-level policies, that is, robot actions such as how to control an end effector. And what's interesting is that instead of only training on robot data for the target task, if you start from a model that has more knowledge about the world, from internet-scale training, this enables generalization to novel objects. And not only that, it can also do semantic reasoning. They have an example where, being shown a bunch of objects, the robot can figure out what to use as an improvised hammer. It might never have seen this in the robot training data, but of course this kind of semantic reasoning is present in internet data. These models can't yet run in real time, but I guess it is a matter of time.

A very exciting space is text-to-video generation. I think it was yesterday or today that Imagen 2 was released. The race here is between two different types of models: ones based on video diffusion, and others based on masked transformers. The latter are a little bit like BERT, but instead of tokens you have image patches; they're essentially trained exactly like language models.

A very interesting area of research that came out, or rather resurfaced, this year, since the technique itself is a little older, was 3D Gaussian splatting. This competes with neural rendering models. What's interesting here is that these methods are able to learn a 3D scene from a collection of images, and it is now possible to render that scene from a different camera view at really high speeds. Another very exciting area is combining NeRFs with generative AI.
And I guess there are a lot of applications that not only require the creativity that GenAI can offer, but also the geometry that neural rendering can offer, so combining both worlds can produce really interesting results.

Another model that has been extremely popular this year was Segment Anything. This is a promptable segmentation model that generalizes really well to out-of-domain images. It's been tested on 23 out-of-domain image datasets, and it outperforms the state of the art in more than 70% of cases. It has an Apache 2.0 commercial license and is generally one of the best segmentation models available today.

Another computer vision model released this year was DINOv2. This model produces image features that can be used to perform a wide variety of tasks, from classification, which is an image-level task, like what's in the image, down to pixel-level tasks like segmentation. And it was the first time that a self-supervised model provided features good enough to be comparable with weakly supervised approaches.

Switching domains a little bit, Med-PaLM 2 and Med-PaLM Multimodal are models that achieve a passing score on the US Medical Licensing Examination. And there was a study on 1,000 consumer medical questions where these models' answers were preferred over a physician's answers by a panel of physicians across nine axes.

To wrap up the research part of the presentation, I guess what's really concerning here is that a lot of this work comes from very few places. I think it's worth being aware that this work mostly comes from the US and a couple of big companies, so how to open up research to other places would probably be worth looking into.

This is no surprise to people in this room, but NVIDIA had blowout earnings and entered the $1 trillion market cap club. NVIDIA GPUs are used more than any other alternatives in research papers: 31 times more than FPGAs and 150 times more than TPUs.
And they have a very long lifetime. If you look at the most used GPU, the V100, this GPU was released in 2017, and not only is it the most commonly used chip in research, but its usage peaked in 2022. That means that, for instance, the A100, which was released in 2020, could peak in 2026, and that NVIDIA GPUs will probably be here for the next decade. Air Street also tracks the compute held by private and public companies; this is information that is publicly available.

A lot of companies are seeing a hit from the release of ChatGPT. This is just one example, and there are plenty of others, but on the day ChatGPT launched, Chegg, a company that focuses on improving learning, saw an immediate hit. That's not to say these companies are not adapting: a lot of them are now pivoting to using AI, and for instance Chegg has now partnered with Scale AI to build LLMs. Something worth being aware of is that a lot of these applications, like ChatGPT and Runway, actually suffer from low median retention: 42% median, compared to 63% for other web applications. So how to retain active users, I think, is still an open problem.

Hugging Face has seen significant momentum this year in its attempt to keep AI models and datasets accessible to all. Over 1,300 models have been submitted to their leaderboard, and they had more than 600 million downloads in August 2023, which is probably going to keep increasing. Something that's interesting to see is that a lot of the people who worked on the Transformer paper have now started their own startups, and they have collectively raised a significant amount of capital. GenAI investments are increasing; in fact, without capital pouring into GenAI, AI investments would have suffered a 40% drop compared to last year.
And there's been $16 billion invested in GenAI alone. However, a lot of this could actually be going to compute: these rounds, particularly the ones raised by foundation model or frontier model companies, require significant amounts of compute, so it's possible that they are trading equity for compute.

Finally, the last section of the report is on politics. Although for years there has been a divergence in regulatory approaches, this year these approaches are starting to stabilize and settle into three distinct ways. One is relying on existing laws and regulations. Others are looking into introducing AI-specific legislative frameworks; the big example here is the EU AI Act. And other countries are completely banning specific services.

While governments are still making up their minds on regulation, a lot of the large labs are stepping in: they are proposing regulation, they're meeting with policymakers, and they started a Frontier Model Forum where they figure out what the best approach to regulation is going forward. However, there have been concerns that a lot of these proposals come from large players, and it's possible that they will shape regulation in a way that favors them. So it's important for smaller players and the open-source community to speak up about these issues.

There is an open debate about how to prevent misuse of large language models by bad actors. Open-source LLMs level the playing field for research and enterprises, but they come with the risk of proliferation, while closed-source models offer more security and control but less transparency. I won't have time to go into this, but it is unclear how to prevent malicious use, and it's unclear how to enforce any of this. That said, we haven't yet seen large-scale proliferation of models tuned for misuse.

Something really interesting in this space is that two years ago there were very few people working on safety and AI alignment.
This is research from the State of AI team looking at how many people at each organization focused on AI alignment in 2021; there weren't that many. But fast forward to this year: there's been a lot of change in public discourse and a lot of scary headlines in the media, so the AI safety debate definitely got a lot of attention from figures in government and from lawmakers. I guess this just goes to show that AI is more political than it was in previous years.

We also have some predictions. These are the ones from last year; some of them came true, some did not, and I won't go into them. These are the ones for this year, and some of them already look like they will come true. And I think that's it. This is all I had for you today. Thank you.

Yeah, great report. I appreciate when people put out good knowledge, so maybe the next round of LLMs will read it and give us advice, so you don't have to answer questions for me. There are so many layers to this; there were so many layers before LLMs, and there are so many now. Where would you say, in your experience and perspective, is the highest growth market? Which layer are we talking about? Are we talking about apps, SaaS applications, consumer or prosumer applications? Or is it the engineering frameworks? Or is it down to the infrastructure, vector databases? Or is it TPUs and GPUs?

Yeah, I believe it's at the application layer that we've seen the most growth. I guess that's also the easiest to build. At the infrastructure layer, it's possible that there aren't a lot of people who would be able to do this: building foundation models, or building models that generally need a lot of compute, realistically very few people are able to do that. So I think most of what we're seeing is at the application layer, and this is very exciting as well.

Hello. Thanks a lot for your talk. I have a question.
So you talked about context length being the new race, right? What do you think about retrieval-augmented generation and the future of it? There is this battle, and some speculation, about retrieval augmentation versus fine-tuning: why do we provide all this context? Should we just fine-tune our models with our data and not rely on the context that much? So do you think retrieval-augmented generation is here to stay, or do you think it is actually threatened by everyone just having their own fine-tuned model?

Yes. I'm actually not sure on this topic. I think it's likely that retrieval augmentation is here to stay, but I'm answering this with very low confidence; it's not my area of expertise. And I do think both are here to stay. I mean, we'll see more and more people fine-tuning their models for their particular applications, so that's not going to go away any time soon.

More than an accuracy-versus-speed issue, I think it is a speed issue. It is much quicker to add new knowledge and improve your model with RAG than it is to retrain something like a frontier model with fine-tuning. And yeah, it may be down to $500, but say there are a million customers: if you have to train a million frontier models, one per customer, versus having the data in a vector database, you're always going to choose the vector database. Until fine-tuning gets so cheap that the vector database is the more expensive option, I don't think it's going to be an issue.

Just following up on that: didn't you mention in one of your slides that you get the biggest benefit with fine-tuning? You had two components: you get the biggest benefit with fine-tuning, but fine-tuning is a lot cheaper than using a RAG approach?

No, I don't think I had anything on RAG and fine-tuning, actually, in this presentation.

The other question I had was that you mentioned there's a break in the massive amounts of profits for NVIDIA.
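To ground the RAG side of this exchange, here is a toy sketch of the retrieval step: documents are embedded, stored in a small in-memory "vector database", and the most similar one is pulled in as context for the prompt. The hashed bag-of-words embedding is a deliberate simplification standing in for a learned embedding model; all names and contents are illustrative.

```python
import re
import zlib
import numpy as np

DOCS = [
    "Llama 2 is free for commercial use below 700 million users.",
    "FlashAttention speeds up the transformer attention computation.",
    "CoreWeave raised a 2.3 billion dollar debt facility to buy GPUs.",
]

def embed(text, dim=512):
    """Toy embedding: hash each word into a bucket and L2-normalize.
    A real system would use a learned embedding model here instead."""
    v = np.zeros(dim)
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        v[zlib.crc32(word.encode()) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

# The "vector database": one embedding row per document.
index = np.stack([embed(d) for d in DOCS])

def retrieve(query, k=1):
    """Return the k documents most similar to the query (cosine score)."""
    scores = index @ embed(query)
    return [DOCS[i] for i in np.argsort(scores)[::-1][:k]]

context = retrieve("debt facility to buy GPUs")[0]
print(context)
# The retrieved document would then be prepended to the LLM prompt.
```

The appeal over per-customer fine-tuning is exactly what the questioner describes: adding new knowledge is an index insert, not a training run.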
I didn't quite understand that. The evidence was that one is going towards equity versus debt. What does that mean?

No, so to clarify that slide, it was more to say that a lot of the companies that are raising... I think it was this slide, yes, the one saying the trend might see a break. To put it simply, a lot of the companies that raise a lot of money are paying this money to NVIDIA in order to train their models. Essentially, they just need the compute, so they're selling their equity in order to have access to this compute.

Right, but they're trading off equity for debt. So does that mean debt is cheaper for them than equity?

Oh no, that's a different point. That was the first statement. The second one is that it's possible that that's not the best way to do things, and something else came out this year, a new trend, so the old trend might finally see a break: CoreWeave raised a $2.3 billion debt facility, instead of equity, to buy GPUs. It's unclear which way is better, but the second one is a new way of doing things.

I see. They'd rather borrow the money to pay for the GPUs, which is less risk, because with equity they might get nothing out of it. Yeah.

Thank you for the report, it's very comprehensive. I'm just curious: I know standards bodies such as ISO and IEEE are also working on AI. Do you see the standards bodies playing a role in this, or not at all?

Which standards bodies? Do you mean ISO?

Like SC 42, IEEE, they have all kinds of AI forums as well. I'm just curious if you see them playing a role in AI development?

I'm not sure, to be honest. I don't know the impact of... yeah. I don't want to answer the question incorrectly.