Awesome, everyone. Well, good morning, and thanks very much for coming to our session today. We called the session something we like to say a lot at our company, which is that the future is fine-tuned. We started saying this in February and March of this year, and at the time it was less and less clear whether that was actually going to be the future. But now, as we talk to more organizations and industry leaders, there's real agreement that smaller task-specific models are going to be a really large part of this upcoming AI wave. What I want to talk about today is, first, why we think those smaller fine-tuned models are going to be the compelling way to productionize and realize value from generative AI, and second, if we're going to have a lot of smaller task-specific models, how do we pull that future forward? How do you do the fine-tuning and serving needed to actually use them inside an organization?

I want to start by talking a little bit about how we think about the LLM revolution, which we often consider to really be the revolution of pre-trained deep learning models. What captured the interest of most users and much of the market today was GPT-3 and GPT-4, and the chat interface put in front of both, which showed that if you took model architectures that had existed since 2017 or even earlier and scaled them up by two or three orders of magnitude, you started to see some really interesting new generative capabilities, plus the ability to use these models few-shot or one-shot right out of the box. But the underlying technology that made these models powerful has existed for a little over a decade. And one of the workflows we see as really common is taking a pre-trained deep learning model that's smaller and task-specific and just customizing and fine-tuning it toward your own data.

Whereas a lot of the really large models, the trillion-parameter-and-above ones like GPT-4, have captured interest because of their generalizability and their breadth in what they can accomplish, our favorite customer quote, and at this point many folks have heard variants of this, is: generalized intelligence might be great, but I don't need my point-of-sale system to recite French poetry. The most common sentiment we hear, where people really nod their heads, is that it's amazing that GPT-4 can do everything from write a college essay to generate SQL, but usually I just need a very small sliver of that model to solve my individual task. The question is: how do we get to that little bit of the model that solves the last-mile problem I actually have inside my organization, so I can use AI more effectively? We've started to see this sentiment reflected more and more among the organizations we work with. I'd say 70% of the customer conversations I have today with teams that have used gen AI in some way go something like this:
Yeah, we built an OpenAI prototype, and maybe we've even gone to production with it, but we want to move off of it in the next six months. And there are a few different reasons for that. OpenAI is incredibly easy to get started with. Because of that generalized model, you can use English-language prompts and get back responses; most folks here in the room have played around with GPT-3.5 or GPT-4, and pretty much 100% of folks have now had an opportunity to start using this. But when we talk with organizations about what this is going to look like in production, we hear a few things consistently. It was great for the prototype, but when I actually think about putting this in production at scale, it's going to get really expensive. It was great for the prototype, but that latency and availability are going to kill me. Or finally: hey, our strategy as a business is going to have AI embedded in it, so we're not comfortable deferring all of our IP to a third party and not having control over the model and the weights.

So where we see a lot of the future is a series of smaller task-specific models that are right-sized to the individual thing you're looking to do. If you want to determine customer sentiment, you probably don't want a one-trillion-parameter model to do that; you're better off with a fine-tuned BERT. If you want to do customer support automation, perhaps it's a model from Mistral. You can use any of the different model gardens to solve your individual task at the level and scale that your task requires. The benefits of these smaller task-specific models are, number one, ownership: your model and your weights, so you have end-to-end agency over what you're building. But also efficiency: these models are smaller and faster, and you're better able to control what they generate.

We've seen this empirically time and time again. These are a series of benchmarks that we ran, but increasingly you can validate any of these on your own internal data, especially for a structured generation task. We were working with an organization that was looking to generate JSON, essentially structured configurations they could plug into their organizational system to automate workflows. What we found was that open-source models out of the box didn't do great on this problem; they got about 33% coverage. GPT-4 out of the box got about 66% coverage. Maybe good enough to show your boss that one cherry-picked demo, but never good enough to actually make it into prod. By fine-tuning on a set of examples, we were able to get significantly better performance with a much, much smaller model, Llama 2 7B, and then a little better performance still as we scaled up to Llama 2 70B. The really interesting thing to me was that here we started to get to production-grade quality in the model's output, but at a significantly smaller footprint than what those larger models usually consume.
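As a rough illustration of how a "coverage" number like that might be computed, here's a minimal sketch: the share of model outputs that parse as JSON and contain the fields you asked for. The field names and the exact definition are my assumptions, not the schema from the benchmark in the talk.

```python
# Hedged sketch of a "coverage" metric for structured generation: the share
# of model outputs that parse as JSON and contain the required fields.
# REQUIRED_KEYS is a hypothetical schema, not the one from the benchmark.
import json

REQUIRED_KEYS = {"product", "issue", "response"}

def coverage(outputs: list[str]) -> float:
    ok = 0
    for text in outputs:
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and REQUIRED_KEYS <= obj.keys():
            ok += 1
    return ok / len(outputs)

# one valid output, one invalid -> 0.5
print(coverage(['{"product": "a", "issue": "b", "response": "c"}', "not json"]))
```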
A single Llama 2 7B model can be served on an A10G, or even, quantized, on a T4. So this model is not only performing a lot better from an accuracy standpoint, it's getting you results at lower latency, and with a GPU footprint that costs the organization a lot less. As a result, we've seen a number of these examples where you right-size the model, train it on task-specific data, and it's just significantly better at that individual task.

If you believe me so far that the future is fine-tuned, and you're working at an organization where you think this might be an interesting way to apply AI and machine learning, the next series of challenges is: great, you've told me we need many task-specific fine-tuned machine learning models inside my org, but that's actually pretty hard to pull off in any systematic way. In our experience there are really three key challenges, though maybe you could distill them down into two.

The first is that training is complex. Once you've collected your data, which can be a challenge in itself depending on what you already have, you now get into distributed training across large models that are really GPU-hungry. And I feel like I see users hit every branch on the way down, from out-of-memory errors to poor GPU utilization and everything else in this class of problems.

The second issue is: amazing, we're going to have many task-specific models, each for its own individual task. Now I need to serve all of them. And if each of these models requires an expensive GPU, and each of those GPUs is now serving production traffic, somebody is going to be very unhappy looking at a bill. So how do I address the challenge of serving many deep learning models, and many large language models in particular?

And then finally, and I think everyone in this room is probably at this talk for this reason too: how do you keep up with the latest in the industry? How do you keep up with what research is publishing on the best ways to do training and serving? It seems like a new model and a new approach come out every week, and staying on that bleeding edge is the final challenge we see a lot of organizations face.

So I wanted to start by talking about the reality of fine-tuning and training today, which is that when organizations buy into this idea of fine-tuning, we pretty consistently see them run into a series of repeatable challenges. The very basics might seem easy: you pull a model off Hugging Face and write some PyTorch, or some of the code needed to actually fine-tune the model. But distributed training, especially over large datasets and larger, GPU-hungry models, can be quite complicated.
Number one: who here feels like they have really easy access to a lot of high-end GPUs for LLM training? I know at least one person in the audience is from NVIDIA, so I expected at least one hand to go up, but no hands. So the first problem you're going to run into is: do I even have the high-end GPUs to do all of this model training in the first place? The second problem is that you're going to struggle to get really good utilization, applying all the best practices and frameworks to make the most of that GPU you've paid some black-market provider a lot of money for. And then you're going to hit the constant training failures along the way.

This is an area where our team has invested in open source, with a project that's part of the Linux Foundation, to simplify training with a declarative machine learning framework we call Ludwig. Ludwig came out of my co-founder Piero's experience as an ML researcher at Uber. He was the person tasked with building many different ML models for that organization, from fraud prediction to rideshare ETAs to helping recommend all that good food on Uber Eats. What he found was that he was reproducing his training jobs and his code every time, starting from scratch. My co-founder Piero likes to say he's a lazy data scientist; he doesn't want to reinvent the wheel every single time. So he came up with this declarative abstraction, a simple set of YAML configurations that lets you train any deep learning model and set up your end-to-end pipelines, all the way from data preprocessing through the training loop and postprocessing on the other side.

This made it really easy to start. You could build your first machine learning model in essentially six lines of config: you specify your inputs and your outputs, and you get a prefab model out of the box. But the really cool thing about this prefab model pipeline is that you can customize any part of it you want. When I wrote this slide two years ago, the large language models of the day were BERT and its variants, and you could substitute one of those into the pipeline with a single line of configuration change. You can customize your training loop without having to write any low-level code, just controlling what you want and letting the system automate the rest.

This same framework is now heavily used by people doing generative AI. In particular, one of the most common use cases for Ludwig over the last six months has been fine-tuning open-source large language models. It's as easy as specifying that the model type is an LLM and the name of the model you want to use. And Ludwig packs in a lot of functionality that lets you declaratively fine-tune LLMs: you can choose your base model and your trainer, as you might expect. These are the basic parameters you'd set to kick off your training job; it ends up being about eight to ten lines you might want to specify.
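To make that concrete, here's a minimal sketch of what that declarative flow can look like with Ludwig's Python API. The dataset file and column names are hypothetical, and the exact config schema can vary by Ludwig version, so treat this as an illustration rather than a copy-paste recipe.

```python
# Minimal sketch: declarative LLM fine-tuning with Ludwig.
# Dataset path and column names are hypothetical; the config follows
# Ludwig's LLM fine-tuning schema but may differ across versions.
import yaml
import pandas as pd
from ludwig.api import LudwigModel

config = yaml.safe_load("""
model_type: llm
base_model: meta-llama/Llama-2-7b-hf
input_features:
  - name: complaint          # for a classic supervised model, input and
    type: text               # output features are the whole six-line config
output_features:
  - name: response
    type: text
adapter:
  type: lora                 # parameter-efficient fine-tuning, more below
trainer:
  type: finetune
  learning_rate: 0.0001
  epochs: 3
""")

model = LudwigModel(config)
df = pd.read_csv("consumer_complaints.csv")  # hypothetical dataset
train_stats, _, output_dir = model.train(dataset=df)
```

Swapping the base model, or the adapter type, is the kind of one-line configuration change the framework is built around.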
But you also get a fine-grained level of control, and a lot of the best practices you consistently hear about in industry, without the heavy lifting. At least the way I think about it: I'm able to use DeepSpeed without having to figure out how to use DeepSpeed. And that applies to just about every framework cropping up in this area, whether it's flash attention or any of these other techniques: you get those best practices and advantages out of the box, and you just configure which ones you want to use and which ones you don't. And of course, as model training has adapted to LLMs, no pun intended, parameter-efficient fine-tuning has become a really popular way people run their fine-tuning jobs. So if you ever wanted to use LoRA, AdaLoRA, QLoRA, prompt tuning, any of these techniques, you can do so through a series of small changes in your configuration. What Ludwig, our open-source training package, is really oriented around is making it very easy for anyone to fine-tune their own LLM with just a few lines of configuration. You can think of the headline for Ludwig as: control what you want, and let the system automate the rest.

Now, if we've succeeded with Ludwig, we've made it really easy for you to train a lot of different models. And if it's really easy to train a lot of different models, then somebody inside your organization is probably going to start thinking about the fact that you might end up deploying many models. And there are a lot of use cases where you might want to deploy many LLMs. We work with one organization, for example, that does customer service automation, so they have many customers that they service. They don't want to share data across their customers, so they fine-tune a model per customer. You see the same thing with people doing game development, where they want to fine-tune a model per character. But if you have thousands of customers, and therefore thousands of fine-tuned large language models, somebody in your organization is going to make the face on this slide when they realize you'd need a thousand GPUs to serve each of these models naively, essentially one per GPU.

We ran into the same problem ourselves. We offer a free trial where anyone can start training and serving a large language model, and that was my face when I figured out how expensive it could be if anyone could come in on a free trial and serve their own LLM. Think about how much this costs in the cloud: LLMs in the 7-to-13-billion-parameter range we'll mostly serve off an A10G, and on AWS that might be something like $1.20 an hour. If you look at the cost per month and scale up to a larger number of fine-tuned models, it gets out of hand very, very quickly. And this chart only goes from 1 to 32; we spoke with one organization that has half a million LoRA adapters out there. That would be extremely expensive to serve at any sort of reasonable scale.
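Here's the back-of-the-envelope version of that math, using the numbers quoted above. These are weights-only memory estimates and on-demand prices, so treat them as directional.

```python
# Rough arithmetic behind the serving-cost story, using the numbers quoted
# in the talk (~$1.20/hour for an A10G on AWS); treat these as directional.
A10G_HOURLY_USD = 1.20
HOURS_PER_MONTH = 730

def naive_monthly_cost(num_models: int) -> float:
    # one dedicated A10G per fine-tuned model
    return num_models * A10G_HOURLY_USD * HOURS_PER_MONTH

print(f"${naive_monthly_cost(1):,.0f}/month for 1 model")     # ~$876
print(f"${naive_monthly_cost(32):,.0f}/month for 32 models")  # ~$28,032

# Why a 7B model fits on these GPUs: weights-only memory, ignoring the
# KV cache and activation overhead a real deployment also needs.
params = 7e9
print(f"{params * 2 / 1e9:.1f} GB in fp16")     # ~14 GB -> fits a 24 GB A10G
print(f"{params * 0.5 / 1e9:.1f} GB at 4-bit")  # ~3.5 GB -> fits a 16 GB T4
```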
So what we started to think about was: how do we take advantage of the fact that the way models are fine-tuned today looks a little different from conventional fine-tuning? With LoRA adaptation, and parameter-efficient fine-tuning techniques generally, you're not fine-tuning the whole base model, represented here by the yellow bars; you're fine-tuning a small set of adapter weights that make up much less than 10% of the overall model. So if you fine-tune a model for user one, the base model stays 90%-plus the same, and only a small fraction of the model, the adapter weights, changes. Now, the way most organizations deploy their LLMs is: okay, user one has a fine-tuned model, so I'll deploy a replica for user one; user two has a different fine-tuned model, so I'll deploy a replica for user two. You can see there's a lot of overlap between the fine-tuned models for user one and user two, but if you deploy each of them individually on a new GPU, it gets prohibitively expensive.

So what we came up with, and honestly my favorite part of it is maybe the branding, is something called LoRA Exchange. Our key goal was: how do we let you serve many LLMs, in our benchmarking tests usually about 128, for the cost of a single base model deployment? Based on the color highlighting in the previous slide, you can probably guess at the intuition we applied. The key idea, in the very small text in the bottom right, was called dynamic adapter loading; then our marketing guy said we can't go out with "dynamic adapter loading," we need something more clever, so we call it LoRAX, for LoRA Exchange. But really it's the idea that we substitute in the relevant adapter for each fine-tuned model on top of a single base model deployment. As different users come in, they're all querying the same base model, but we're swapping in the adapter that's relevant to their individual use case.

LoRAX is also open source; we open-sourced it about two or three weeks ago. If anyone wants to try out either of these two projects: you can think of Ludwig as the easiest way to declaratively train any of your deep learning models, and LoRAX as the way to serve many parameter-efficiently fine-tuned models on top of a single instance. Each is open source, and you can try them out directly. And inside our company, Predibase, what we've essentially done is build an end-to-end platform on top of Ludwig and LoRAX. It's free to try out.
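Part of why this works is just how small those adapters are relative to the base model. A hedged back-of-the-envelope estimate, assuming Llama-2-7B-style dimensions and a common LoRA setup (rank 8, adapters on the q and v projections only); the exact fraction depends on which layers you target:

```python
# Hedged estimate of LoRA adapter size relative to the base model, assuming
# Llama-2-7B-style dimensions (hidden size 4096, 32 layers) and a common
# setup: rank 8, adapters on the q and v projections only.
hidden, layers, rank, target_matrices = 4096, 32, 8, 2

# each adapted matrix gains two low-rank factors: A (hidden x rank) and B (rank x hidden)
lora_params = layers * target_matrices * 2 * hidden * rank
print(f"{lora_params / 1e6:.1f}M adapter parameters")   # ~4.2M
print(f"{lora_params / 7e9:.4%} of the 7B base model")  # ~0.06%
```

Even generous configurations stay far under the "10% of the model" ceiling, which is what makes packing many adapters onto one base deployment practical.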
So I'll show you a little bit of how these technologies actually work today through a very quick demo that also shows what fine-tuning looks like and how to get started. Cool. I'm going to go over to the Predibase homepage; you can get access to this if you go to predibase.com and go through the free trial flow. When you get started on the Predibase homepage, you'll see we have a couple of getting-started and quick-start guides right out of the box.

So if you wanted to do something like fine-tune a Llama 2 7B, and you can actually fine-tune Llama 2 13B on the free trial as well, you could start to do that. Or you could learn how to train a much smaller, custom supervised machine learning model. One of the really nice things about being built on top of Ludwig is that you have a single interface to all the different deep learning models that might exist.

The first thing you do, of course, in any good machine learning job is connect some interesting data. The one I'll use for our demo today is called the Consumer Complaints dataset. This dataset is highly exciting: it's collected by the CFPB, and it's essentially massive blocks of text coming from U.S. consumers complaining about different financial products and practices at different financial services companies. In this particular task, there are a few different things we might want to do. The first is to classify what product and issue the consumer is complaining about; for example, the product might be debt collection and the issue communication tactics. This is what the CFPB does: they look at all these incoming complaints and spend time manually curating and classifying what each complaint was about and what product it concerned. But we thought it would be kind of fun if you could not only classify the product and issue, but also get the company's response. What if you could write an email back to this customer? So we prompted GPT-4 to come up with a bunch of sample emails, because the original dataset didn't contain them. And our goal was to see: can we fine-tune a much smaller task-specific model, in this case Llama 2 7B, to do both of these tasks, to come up with both the product and the issue the user is complaining about, as well as a sample email we could use as the template for our reply to this customer? That was the setup for our problem, and one way we thought we could demonstrate the power of fine-tuning.

I want to show you what the results look like by contrasting our fine-tuned and base models. Inside our query tab, which is going to reload, you can query many different open-source models right out of the box. If you want to query Llama 2, CodeLlama, or Mistral, many open-source models are pre-deployed for you, but you can deploy any open-source model you're interested in. And just to save ourselves some time, I have a model here and a query I've written with the prompt for what I want to do. In this prompt I say: you're a support agent for a public financial company, and the customer has raised the complaint below. Generate a structured JSON output with the product, the issue, and the generated company response. Then I give the complaint I want the model to address. Here the complaint is about receiving multiple calls a day regarding a debt. So we prompted Llama 2 7B to address this, and you'll see that Llama 2 7B did not do that well.
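For reference, the prompt being sent in the demo is along these lines. This is a paraphrase of what's shown on screen, and the placeholder syntax is illustrative rather than exact.

```python
# Paraphrase of the demo prompt; the placeholder syntax is illustrative.
PROMPT_TEMPLATE = """You are a support agent for a public financial company,
and a customer has raised the complaint below. Generate a structured JSON
output with the product, the issue, and the generated company response.

Complaint: {complaint}
"""

print(PROMPT_TEMPLATE.format(
    complaint="I receive multiple calls a day regarding a debt..."
))
```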
If anyone's played around with open-source models and tried to get them to do a pretty custom task, like classify and respond to this email, you may have seen some really weird behavior. We see models repeat themselves over and over again. In this particular case, the model correctly identified that it was supposed to come up with these fields, but then just printed out a series of Xs, followed by a whole lot of newline characters, which I've also seen a couple of times. Instead, what I'm going to do is prompt my Llama 2 7B complaints model; I'll show you how I created it in a second. This Llama 2 7B complaints model is fine-tuned on the exact dataset I showed you, about 1,500 rows in total. And what we'll see is that my fine-tuned Llama 2 7B does, well, significantly better. It correctly identifies that the product is debt collection and the issue is attempts to collect debt not owed. We don't need to read this entire email on a small screen, but we can probably tell it's a reasonable start toward an email, at least significantly better than four Xs.

That's a large part of the experience we think about inside Predibase: how do you connect to data, and how do you build these models declaratively with Ludwig? And this model, Llama 2 7B complaints, is just served with LoRAX on top of the original Llama 2 7B, so I didn't have to do two different deployments to have my fine-tuned model and my base model. If I fine-tuned ten different models, I wouldn't have to do ten different deployments either; all of them essentially get served automatically through LoRAX.
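As a sketch of what that looks like from the client side: one shared deployment, and each request either names a fine-tuned adapter or falls back to the base model. The endpoint shape follows LoRAX's TGI-style REST API, but the host and adapter ID here are hypothetical.

```python
# Sketch: one shared LoRAX deployment serving both the base model and a
# fine-tuned adapter. The endpoint shape follows LoRAX's TGI-style REST
# API; the host and adapter ID here are hypothetical.
from typing import Optional
import requests

ENDPOINT = "http://localhost:8080/generate"

def generate(prompt: str, adapter_id: Optional[str] = None) -> str:
    params = {"max_new_tokens": 256}
    if adapter_id is not None:
        # dynamic adapter loading: the request names which fine-tuned
        # adapter to apply on top of the shared base model
        params["adapter_id"] = adapter_id
    resp = requests.post(ENDPOINT, json={"inputs": prompt, "parameters": params})
    resp.raise_for_status()
    return resp.json()["generated_text"]

prompt = "You are a support agent... Complaint: ..."
base_out = generate(prompt)                                          # base Llama 2 7B
tuned_out = generate(prompt, adapter_id="my-org/llama-2-7b-complaints")
```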
Let me show you one last thing, which is how this model got trained, and then I'll pause for any questions. I'll take you a little under the hood in terms of what it actually took to train this model, and luckily it wasn't much. Here we have a concept called a model repository; this is my consumer complaints response generation model repo. A model repository is essentially a collection of all the different model versions I've trained over time, and I can see things like the lineage: who trained a model at what time, which runs succeeded, which were canceled, and which failed. Training and building your first model version is really easy. You just click that you want to create a new model version, say you want to fine-tune a large language model, then connect your dataset and provide a little information, like the prompt template you want to use and which model you want to fine-tune. Out of the box, we let you fine-tune any open-source model, and we have particularly good support for the Llama variants; when I say particularly good support, I just mean heavily tested across all of those. You also have a number of training templates you can use out of the box. If you want to use LoRA or any quantized model training, all this does is populate that underlying open-source config you saw earlier. When I talked about the declarative aspect that lets you control everything through a series of small YAML files, that's really what's happening under the hood here.

What we're doing with the platform is helping you populate that config with what we think are the right best practices, and then we right-size the compute we think is most effective for your task and what's available in your zone. Right now this is just me in the free trial, and we have access to a number of T4s, so this is going to train on four T4s. Once the model trains, you can see how long it took; this one took about an hour and a half overall. You can see all the metrics users love, like loss curves, plus metrics like perplexity, character error rate, and others. And finally, when you're ready to deploy this model with LoRAX, you can simply deploy it, and it'll be available in the query editor for you to query, just like I was doing earlier with the Llama 2 7B complaints model.

So that's a lot of how we think about the future being fine-tuned: what it means to go from a pre-trained deep learning model, make it easy to train that model declaratively, and then, when you have one model or a whole series of models, efficiently serve those fine-tuned models on top of a single base model using LoRAX. If you want to try any of these capabilities directly in the open source, you can: model training with Ludwig on the left, and LoRAX for serving on the right. And if you want to try them all packaged together in one place, with the UI and infrastructure handled for you, you can get started directly in a free trial with Predibase. So that's a little bit of what I wanted to introduce: why we think the future is going to be fine-tuned, and what we're trying to do to pull that future forward a little bit. I think I have time for a few questions, in case any came up throughout the talk. Thank you.

Yeah, there's one over there. Oh yeah, 100%. The main thing I would say we sell is actually model ownership. So yes, you have access to export the models if you'd like; you can take them wherever you want. You can even run Predibase inside of your VPC. So ownership is definitely a key aspect of it.

Totally. You can think of Predibase's primary value as letting you train and serve open-source large language models. RAG is basically a way to augment the knowledge of any LLM, and if you want to use an open-source model in a RAG solution, we have a few tutorials on how to use a Predibase-hosted model with something like LlamaIndex, or a Predibase open-source model with other vector DB solutions. So we provide the model backbones.

Yeah, a quick question about your charging model. For fine-tuned models, it sounds like you actually charge per token, instead of charging for holding onto that GPU, which is obviously a big benefit for those who don't have enough traffic to fully utilize or saturate a GPU. Yeah, exactly. So thanks for the plug. We actually do both.
By default, we offer users the ability to pay per token, and like you said, that's usually the easiest way to get started. We actually have the most cost-effective solution for serving fine-tuned models; it's basically the cheapest place to serve a fine-tuned model, because all of them get served on top of our single base model deployments. If you do want a dedicated deployment, though, where you serve your own individual instance on your own GPU, we support that too, and we have integrations with GPU cloud providers. That's something we support for some of our customers, especially as they get into production.

Yeah, oh yeah, sorry. Yeah, such a good question, and definitely something I was very concerned about. The short answer is: an extremely negligible amount. When we open-sourced LoRAX about three weeks ago, we put together some blog posts about throughput and latency, and about how latency scales with the number of fine-tuned models, that is, the number of fine-tuned adapters you're going to have. It really adds a very negligible amount of latency to any individual model, especially compared to the cost of generation in the first place. So luckily we've seen the adapter-switching cost stay really low, even as you get to some hundred adapters.

There was one question back there: how do you evaluate the quality of the fine-tuned model? Yeah, great question, and one I definitely get asked every single time I give this presentation. From our perspective, what we really look to provide is two things: model training and serving. As part of training, there are two ways you can think about model evaluation. Number one, there's a series of metrics people look at: is my loss curve decreasing, plus generation-specific metrics like character error rate and perplexity. You get all of those reported in a conventional machine learning way, across training and test splits that we create for you. But there's a part of model evaluation today that often requires subjective judgment, either over a held-out set that you have or through human rater evaluations. And there, I think what's required is being able to easily deploy many fine-tuned models, directly see what the outputs are, and test them in batch. Both of those are easily possible through our query playground, where you can deploy a fine-tuned model and then test it. So the way most of our users work is: they first sanity-check the metrics, and then they actually test the models and see how well they do on the traffic they expect to be realistic.

Yeah, there's a question in the back. Yes. Yeah, the very short answer is: it's just us. And do you use any sort of inference server to do the inference calls? If so, what kind? Yeah, we essentially did an early fork of what Hugging Face was doing with text-generation-inference, and we're using our version of that to do this. LoRAX, which we open-sourced, is something we built on top of Hugging Face's TGI. I don't think we've made all components of that fork open source just yet; something to look into, though.
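Since character error rate came up as one of those reported metrics, here's its standard edit-distance formulation as a sketch; I'm not claiming this is the exact implementation used in the platform.

```python
# Character error rate, in its common definition:
# Levenshtein edit distance divided by the reference length.
def cer(pred: str, ref: str) -> float:
    m, n = len(pred), len(ref)
    dp = list(range(n + 1))          # dp[j] = distance(pred[:0], ref[:j])
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i       # prev holds the old dp[j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                            # delete
                        dp[j - 1] + 1,                        # insert
                        prev + (pred[i - 1] != ref[j - 1]))   # substitute
            prev = cur
    return dp[n] / max(n, 1)

print(cer("debt colection", "debt collection"))  # one missing char -> ~0.067
```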
We have time for one more question. Everything in this space is moving at a dizzying rate, so I was wondering: how fast are you going to adapt to that? How are you navigating that space? Because you're building this, but how are you going to keep pace with all the developments?

Yeah, if you've seen that meme of the person laying the train tracks as the train is coming, I'd certainly say it feels a lot like that. The very short answer is that in our configuration you can pass in a path to any open-source model, any Hugging Face model. So if something comes out, you can go train it right away. The main impact on our side is just: have we tested that model before? We keep a whitelist of models where we say we've tested all of these and we know they work, but users in our trial test out models we've never seen before, and they can just pass them in through a single config.

An extension to that: let's say tomorrow there's a Hugging Face alternative, which is also open source. Would you have the capability for us to upload our own model, like the checkpoint files that we have, and then use your service to train it? Because we don't have the GPUs, let's say.

Good question. You can definitely train your own model inside of Predibase; if you don't have a GPU but you want to provide the config for your training job, you can train it there for sure. If you have your own trained model and you want to pull it into Predibase today, we let you pull that model in through Hugging Face Hub. And then the question is: if there were a next Hugging Face, how would we enable that? That's certainly something we'd be open to; we just don't have it natively built in yet, mostly because all of our users come from Hugging Face. But if that next platform got kicked off, and I heard some rumors pretty recently that it might, then I think we'd certainly look at integrating it here. Thank you.

What am I most excited about next in gen AI? Yeah, definitely. The thing I'd say I'm most excited about: it felt like 2023 was the year everyone discovered the GPT-3.5 and GPT-4 APIs and endpoints. What I'm excited about next is people going smaller with models. Right now Predibase offers the fine-tuning solution; what's coming up on our roadmap is letting people pre-train, train from scratch, their own smaller variants of models, and also expand to other modalities. Right now we're very focused on text models, but our open source, Ludwig, has always supported images and other formats, so I'd love to see us do multimodal.

Hey, so, right here. Yeah, oh, sorry. Obviously customers have to follow your best practices to get the most out of your solution. How can you show customers that you've had the effect they want, and that they should keep using you, to make sure they're getting value? Yeah, definitely. I'd say the effect they want comes in two places. One is that there's a quality improvement over their base models.
And there I think we make it very easy for them to compare, like what I was just showing, switching between the two tabs for the base model and the fine-tuned model, so you can see what the impact is. But the second is that, on an ongoing basis, they're getting their money's worth and this is cost-effective. The honest answer is that today, for our customers doing larger dedicated deployments, we just put together a spreadsheet that says: this is what it would cost elsewhere, and this is what it costs on Predibase. We're trying to distill that into an actual pricing calculator for our website, so it's not our poor PM doing it every single day. But that's the way we think about our ongoing benefit: on cost.

Yeah, this is really exciting. On my team we've been fine-tuning GPT-3.5 a lot; we have dozens and dozens of models. This is really exciting because it's an easy way to do that with open-source models. However, I've been doing a little bit of quick math, and it seems like running inference on fine-tuned models like the ones on your website is about 10 times cheaper than GPT-3.5. But what about the accuracy? I mean, the 3.5 models are incredible when fine-tuned. How does that compare with what you can get fine-tuning these?

What kind of task are you fine-tuning for? Programming language creation. Got it, yeah. So I would say the accuracy of these open-source models is really task-dependent. If you're fine-tuning a model that has to be really general purpose, that almost goes against fine-tuning as a pattern: say you wanted something that could write amazing French poetry and German literature. I don't know how well an open-source model will do there out of the box; or, put another way, you'd need a lot of labeled data. But the more granular and specific your task gets, the better the performance we see from fine-tuned models, especially in open source. And I actually think code generation is an excellent example of a task where open source does really well when fine-tuned, and even out of the box. We have two variants in our playground today, CodeLlama 7B and CodeLlama 34B. I bet, with that dataset you've been using on GPT-3.5, you could test them out in a day, an afternoon, and see what those results look like empirically. But I would be very bullish on code generation use cases and ones that are more specific and granular. It really comes down to how much data you need and the data quality. But the nice thing is, for code, there are some really good pre-trained open-source backbone models. Awesome, thank you.

Yes, sorry, I think I missed your question. Well, I know you're specific to LLMs, but have you ever thought about being broader, to transformers in general? Such a good question, because we're actually not specific to LLMs; my talk is just very specific to LLMs because that's what everyone is really interested in right now. Ludwig came out in 2019, and it allows you to train any deep learning model. You can fine-tune BERT, DeBERTa, vision encoders. You can train an RNN, a CNN, an LSTM; many different deep learning architectures are supported out of the box. I don't know if it's a hot take, but one of my takes definitely is that BERT has been the workhorse for NLP tasks for some time.
And I actually think that a lot of the tasks we're doing with large language models can be done by a 100-million or 300-million-parameter model, or just a good old-fashioned deep learning model. This is actually one example of a model pipeline, by the way, that uses more conventional models: this is DeBERTa, a variant of BERT, but you can train many different transformer models here. Cool. Well, thank you very much. Everyone's been a great audience, and those were a lot of excellent questions. If you have any further questions, I'll be sticking around, so feel free to ask. But hopefully you get a chance to start hacking, either in the open source or the free trial. And if anything breaks, please email me and yell at me, and we'll make sure to get it resolved as quickly as possible. See y'all.