Hey there, welcome to this webinar on why the serverless paradigm is a great fit for AI-powered apps. My name is Sohan Maheshwar, I'm a Dev Advocate at Fermyon, and today we have Mikkel. Hey Sohan, my name is Mikkel and I also represent Fermyon. Yeah, so today we're going to be discussing the serverless paradigm and why it's a great fit for AI-powered apps. And we're going to start right off with a pretty cool demo: we've actually built a sentiment analysis app, a sentiment analyzer. Here we go. Mikkel, do you want to talk through what we've built, and then we'll eventually get to how it works and why this is pretty cool? Yeah, definitely. So this sentiment analyzer is a demonstration of how you can use the serverless AI features that we built into Spin. Spin is an open source project for server-side WebAssembly. I'm going to show you a little bit about how these Spin applications work and how the AI features within Spin work. What I'm showing you right now is a web application that can take basically any sentence and analyze whether it's a positive, negative, or neutral statement. So let's go ahead and say, "hey, this is a cool video you guys are recording", and analyze that. I hope it says positive. It says positive. There you go. Okay, "I'm getting bored because I want to see some code." Let's see if I can spell this right; it's probably pretty forgiving, but let's try that. And okay, that's actually only neutral. Okay, "now it's really boring." Maybe we can get a negative sentiment now. It's still neutral. Okay, it's not that bad being bored, I guess. So essentially, I think this is a cool demo where an AI model is actually analyzing the text that Mikkel was inputting. Today we're going to talk about how this was built, using serverless computing, with not too many lines of code. So, as you saw, maybe we want to actually see some code. Should we just dive straight in? Yeah, let's do that. Let me just go ahead and share another window that I have. So what I'm showing you here is the directory with the code behind the application you just saw. Something you'll notice if you know the Spin framework is that we have this spin.toml file in here. That is the manifest we use for a Spin application. I said earlier that Spin is an application framework that uses WebAssembly to implement a serverless style of application. What that means is that when you use Spin, it's really easy to break the functionality of your application up into small, discrete components. And what the spin.toml file does is describe how an application like this works and the components that make it up. So if you take a look down at line 9, for instance, you can see that we have our first component, which is the sentiment analyzer; that's the sentiment-analysis ID here. And what you'll see in line 10 is that the sentiment analyzer is implemented as a WebAssembly module. That is a core feature of how Spin works: all the code you write, and all the components being served, are server-side WebAssembly. Now, we could dive a lot more into what that means and what the Spin framework does around WebAssembly and so on.
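For readers following along without the screen share, here is a minimal sketch of what a manifest like the one being described could look like. It is an illustration, not the exact file from the demo: the component IDs, paths, routes, and the file server URL and digest are assumptions.

```toml
spin_manifest_version = "1"
name = "sentiment-analysis"
version = "0.1.0"
trigger = { type = "http", base = "/" }

# The sentiment analyzer: Rust code compiled to a WebAssembly module.
[[component]]
id = "sentiment-analysis"
source = "target/wasm32-wasi/release/sentiment_analysis.wasm"
# Grant this component access to the Llama 2 chat model and a key-value store.
ai_models = ["llama2-chat"]
key_value_stores = ["default"]
[component.trigger]
route = "/api/..."
[component.build]
command = "cargo build --target wasm32-wasi --release"

# The UI: a reusable static file server component for the HTML and JavaScript.
[[component]]
id = "ui"
source = { url = "https://github.com/fermyon/spin-fileserver/releases/download/v0.0.1/spin_static_fs.wasm", digest = "sha256:..." }
files = [{ source = "assets", destination = "/" }]
[component.trigger]
route = "/..."
```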
There are other videos here on the CNCF channel where we've done deeper dives into Spin as a framework and the WebAssembly side of it. I'm not going to spend too much time on that today, but I just want to call out some of the features you see in here. One thing to notice is that we have multiple components: in line 19 and line 27 you can see that there are other components that make up this application. Line 27 is the component called ui, which is basically a file server that serves static assets, the HTML and JavaScript that make up the front end. What you can also see, in line 12, is the AI models our sentiment analyzer is using. In the Spin framework we support two different AI models today: Meta's Llama 2 chat model is one of them, and the other is Code Llama, which has an understanding of programming languages. I'm going to show a demo a little later where we use that model. Finally, we also support creating embeddings, which we'll also talk about a little later, including what that means in this world. So as part of defining your application, you have to tell the Spin framework which AI models your application can use. There's another thing in here, which is the key-value store declaration: basically a store where I can keep some arbitrary data next to a key. Spin has this concept of a default key-value store, a convenience feature that just magically works for you, so you can start storing data and persist it across the requests that come in. So that's a little bit about the application anatomy here. The last thing people will notice, if they're familiar with Rust, is that in line 16 you can see how this component is built using cargo build. So it is a component written in Rust. With Spin, you can choose to write your components in Rust, TypeScript, JavaScript, Python, or Go, and there are a few other languages where we have good SDK support. Okay, so let's go back and take a look at the actual code in that sentiment analyzer. As I said, this is code written in Rust, and I'll walk you through it step by step so you get an idea of what Spin as a framework provides for you here. First of all, there is an SDK that we use in Spin, and it provides a set of features for dealing with HTTP requests. The model in Spin is the event-driven model you know from other serverless frameworks, which means a trigger is what wakes a component up to evaluate whatever data comes in, run the handler for that trigger, and then potentially return data or do something else. In Spin there are multiple types of triggers you can use. In this scenario, an HTTP request is the trigger type. You can also use, for instance, a Redis queue as a trigger (there's a small sketch of that below), and there are a few other triggers that have been contributed as plugins to Spin for various queues; I think SQS and MQTT are also being worked on right now. So there's a bunch of HTTP functionality in here, because we receive an HTTP request and reply with a response. You can see we have the key-value store that I briefly mentioned before, and then we have the large language model interface where we can go and do our inferencing calls.
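On the trigger point above: as a hedged illustration, a Redis-triggered component in the Rust SDK looks roughly like this. The channel subscription itself is wired up in the manifest; this sketch assumes the v1 SDK shape.

```rust
use anyhow::Result;
use bytes::Bytes;
use spin_sdk::redis_component;

// Instead of an HTTP request, a message arriving on a subscribed Redis
// channel is what wakes this component up.
#[redis_component]
fn on_message(message: Bytes) -> Result<()> {
    println!("received: {}", String::from_utf8_lossy(&message));
    Ok(())
}
```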
So, to give you a quick overview of how the code works: we have this entry function that gets called, annotated with the http_component macro, which means the Spin framework knows that once an HTTP request hits the application, this is the function that will be called. You can see that the request comes in as a parameter to the function, and it returns an HTTP response. There's a router implemented, saying that if we have a POST to a certain URL, we go and call the perform-sentiment-analysis function; anything else falls through to the not-found function, which returns a not-found response. So this is the meat of the component that does the sentiment analysis. Let me just scroll down a bit so we can see more of this function. Again, this function takes in the HTTP request and pulls out the body, which contains the sentence that we want analyzed. Then we open the key-value store. Hopefully you get the idea that when I use the store here, that's a reference to the key-value interface. The reason we open the key-value store is that, if you go down here to line 32, you'll see that we actually try to get the sentence from within the key-value store. This is how we've implemented a caching mechanism: if the given sentence has already been evaluated, there's no need for us to run the large language model inferencing again. We can just look inside the cache and see whether the response to that particular sentence was positive, negative, or neutral. So if we find the sentence in there, we can return it straight away. If we don't find it, we go, let me see, we're down on line 44, and actually make an inferencing call. So we call this infer_with_options function; I actually think it's wrapped in a function implemented in this code. But basically what we do is provide the model we want to use for the inferencing, and then the prompt we want inferenced. There's a predefined prompt here telling the model to return either positive, negative, or neutral for the sentence that's passed in. And just to confirm, the language models we're using here are also open source, correct? Correct, yeah, these are Meta's open source Llama models. The easiest way to get the models is to go to Hugging Face; you'll have to agree to some terms of service for using these models, and you'll eventually get access and can download them. Now, you can run all of this locally with Spin as well, but running a large language model on a machine without a very powerful dedicated GPU just takes a long time. Some of these responses would take 15, maybe 30 seconds, so it's really hard to do that. The examples I've shown here are hosted in our cloud offering, where we have really powerful GPUs available for you to run on. But anywhere you can get to GPUs that can run these large language models, you could basically run this. So yeah, that is really what is done in here.
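Pulling the walkthrough together, here is a minimal sketch of what a handler like this could look like, assuming the v1 Spin Rust SDK (spin_sdk plus the http crate as a dependency). The prompt text, function name, and max_tokens value are illustrative, not the exact demo code.

```rust
use anyhow::Result;
use spin_sdk::{
    http::{Request, Response},
    http_component,
    key_value::Store,
    llm,
};

// An instruction asking the model to answer with a single word.
const PROMPT: &str =
    "Classify the following sentence as positive, negative, or neutral, \
     and reply with only that one word:\n";

// Spin calls this function whenever an HTTP request reaches the component.
#[http_component]
fn perform_sentiment_analysis(req: Request) -> Result<Response> {
    // The sentence to analyze arrives in the request body.
    let sentence = String::from_utf8(req.body().clone().unwrap_or_default().to_vec())?;

    // Check the cache first: a hit means we can skip inferencing entirely.
    let store = Store::open_default()?;
    if let Ok(cached) = store.get(&sentence) {
        return Ok(http::Response::builder().status(200).body(Some(cached.into()))?);
    }

    // Cache miss: run the Llama 2 chat model on the prompt plus the sentence.
    let inference = llm::infer_with_options(
        llm::InferencingModel::Llama2Chat,
        &format!("{PROMPT}{sentence}"),
        llm::InferencingParams { max_tokens: 8, ..Default::default() },
    )?;
    let sentiment = inference.text.trim().to_string();

    // Store the verdict so the next request for this sentence is a cache hit.
    store.set(&sentence, sentiment.as_bytes())?;
    Ok(http::Response::builder().status(200).body(Some(sentiment.into()))?)
}
```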
And I think, with this whole serverless framework we have in Spin, it's fairly easy, or at least I hope you get the idea, that implementing an application like this, where you want to analyze some data, whether it's sentiment or something else, use a large language model to do it, get a response back, and even add caching like we're doing right here, is a really, really simple function to write. So if we just head over to the... let me go back to the application, because I want to show you one more thing: how the caching works. So here I'm back in the application, and we can redo the "now it's really boring" one, and we should see that it answers really quickly. Much faster, yeah. Much faster. And we can try a new one again: "now it's getting better again". And you can see that, oh, it's actually pretty fast with the LLM as well, but still, the caching is working here. And one way of proving... I have a question here: is there some sort of exact text matching, or is it a fuzzy search where what you're saying is matched to something similar for the caching? In this instance, we are taking the exact string and matching it against what's in the key-value store. And we have a way to see what is in our key-value store, so I can just refresh this and we should be able to see some of the stuff we put in down here. This is just a way for us to show the data we have stored in this key-value store. And if we go and look at "it's really boring", we should see that the result here is neutral, right? So this is what we can read back without having to do the LLM inferencing. And we can see some of the other ones, where "happy today" turns out positive, and so on and so forth. Yeah, nice. Yeah, I mean, I hope that excited everyone. So, Mikkel, we have here the architecture of how Spin, the large language model, and the key-value store work with each other, right? And I think it's important to know that. So maybe you can take us through how all of this works, and then we can get to maybe another, cooler demo as well. Yeah, the diagram you're seeing here has the Spin model and the Spin application in the middle. And then we mentioned the key-value store as sort of a... I mean, we call it a cache, but it is really persisted, right? So if you run the default implementation of the key-value store with Spin on your local machine, we store things in a local SQLite file. But you can hook in other implementations of the key-value store. I'm actually not sure whether we call them adapters or providers; I think it's providers. We have a key-value store provider for Redis, so if you want to use Redis as your backing store for key-value data, you can do that. Basically, with your Spin application you provide something called a runtime config, which translates a given key-value store name into a Redis endpoint, and that's how you connect those. But the Spin SDK and the API, where you can open a store, get a key, list keys, delete keys, all of that will work seamlessly if you change the backend to, for instance, Redis.
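As a hedged illustration of that runtime config idea, a file along these lines can remap the store named "default" from the built-in SQLite file to a Redis endpoint; the URL here is a local placeholder.

```toml
# runtime-config.toml: host-level wiring, kept separate from spin.toml.
# The application code still just calls Store::open_default().
[key_value_store.default]
type = "redis"
url = "redis://localhost:6379"
```

You would then point the host at it when starting the app, for example with something like `spin up --runtime-config-file runtime-config.toml`.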
There's another component in here, which is called a NoOps database. It's just SQL, a relational database, which means that for the local developer experience it's still SQLite. And again, I think we have a provider configuration today so you can point it to your own database instance, but it's the same concept that applies for that database. The reason it's called NoOps here is that when you run it in the Fermyon Cloud offering, we named it a NoOps database because it works serverlessly, in the sense that if you describe in your application that you need a database, we will just provide a database for you. And that's really convenient. Because I think as a dev, you don't need to think about the operation of the database, just the code and how it interacts with the data in the database. So in a sense, there are no operations you have to do for the database. Exactly, and I think that is really the serverless paradigm that the Spin framework tries to take to a level where all the instructions needed for whatever host runtime or host implementation you want to run your Spin application in are declared up front. You just describe: here's my application, and these are the databases and the stores this application has access to. And then the host implementation, whether it's Fermyon Cloud or somewhere else you want to run this, has to go and resolve those. There's a pretty important point here: not only do the declarations in the application manifest define what you need, they also define what the individual components can access. So for instance, if I want to serve files, or I want to have a component call a remote endpoint, components can't break out of their little sandbox unless it's specifically declared which endpoints or which files the component can get to. So there's a fairly strong security model behind these WebAssembly components that this whole framework provides. Nice, okay. So I'll take that image off the screen for now. I think the demo sort of spoke for itself, and we have all of these pieces, right? But what are the possibilities this really opens up? For the average developer like you and me, how can I use LLMs, or what can I use LLMs for, basically? Yeah, I mean, the example we showed now was basically text analysis; in this case it was sentiment. I think it's an area where this can lead to augmentation of things that are already happening. I know that we, as the provider of a cloud offering, do a lot of surveying of our users, trying to figure out what works well, what doesn't work well, whether they have problems completing their tasks, and so on and so forth. And it would be awesome for those types of surveys to be augmented with sentiment, so you can have a scenario where someone fills out a form, the form calls into a webhook and provides this information, and we build up a database, using either the SQL option or a key-value store, that basically says: hey, someone is talking about something over here, maybe this feature, and there's a negative sentiment, or there's another feature being talked about with a positive sentiment, and we start using the large language models to help analyze these. And I think the new opportunity that opens up here is that if that's a scenario you want to implement, it's a fairly simple Spin application you would have to write.
You have to write the webhook that takes the data, does the analysis, and stores it in the database, and maybe you have a small UI to get the data out again. But it's also a scenario where you don't need a GPU 24/7, right? Unless you get a lot of surveys, and I know we for sure do not get that many. In those cases, it might be that you have something like 20 or 30 inferencing calls you need to run in a day. And getting the power of an LLM without having to buy hours of GPU time, or wait for a GPU to become available and start up and all of that, a serverless model is a really, really good fit for that type of application. Yeah, and I think with LLMs and generative AI being democratized so much, we're seeing so many more implementations of this in apps, right? And at the same time, doing it serverlessly means you're doing it cheaper, potentially faster, and more sustainably as well. Yeah, exactly. Yeah, go ahead. Oh, okay, yeah. No, I just think there are other scenarios that are interesting to consider and talk about. One thing is definitely productivity. A lot of the sample code I write today, I have an LLM as my constant pairing partner while developing. Any CSS code I write comes from an LLM, because CSS is so hard for me to write. But it's really useful, right? You probably still want to do an accessibility pass, and have an expert, or someone experienced in the programming language you use, go and help review it and make sure the concepts are being used correctly and the code is idiomatic, and so on and so forth. But to get something done and get going, it's really, really helpful. So as a developer, yeah, productivity will definitely increase, or is increasing, with this; I think Copilot is a really, really good example of that. And not necessarily only for developers, but for everyone else who has a lot of information to process or things to create. For example, we have code samples that can summarize blog posts for you, so you don't have to read everything; maybe you get your list of morning readings as five or ten summaries, and you pick the one you actually want to spend time reading, while for the others you just get a bit of a headline and figure out whether they're interesting to you or not. Yeah. So, we can actually show how we built a small code helper with Spin as well. Let me share my screen again. This time we're going to go to this code generator. There actually isn't a lot to it. I think the only thing that's interesting to see in the code here is that, again, this is a simple API implemented as a Spin component, so it gets an HTTP request. The main difference between the code we looked at previously and the code here is that in this case we're using a different model: we use the Code Llama instruct model instead, which means we now have a model that is specifically trained to help programmers and to understand programming languages (roughly the one-line change sketched below). So what I can do with this is: I have a small client here called code chat, and we can choose a few languages.
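As a hedged sketch of that difference, switching the inferencing call over to code generation is mostly a matter of picking the other model variant. The helper below is hypothetical, assuming the same Spin Rust SDK as before.

```rust
use anyhow::Result;
use spin_sdk::llm;

// Hypothetical helper: ask the code-tuned model for a snippet in a given language.
fn generate_code(language: &str, task: &str) -> Result<String> {
    let prompt = format!("Write {language} code that does the following: {task}");
    // Same inferencing API as the sentiment analyzer; only the model changes.
    let result = llm::infer(llm::InferencingModel::CodellamaInstruct, &prompt)?;
    Ok(result.text)
}
```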
So let's see. For instance, I actually needed a SQL statement to clean up one of the other demos. So we can ask: give me a SQL statement to delete all rows from a given table. And I, for one, cannot always remember SQL, so this is one of those cases where this is pretty helpful for me. If I have one of these running... I'm not sure, let me just clean this up and try again. Let's see if a response comes back. So the client I'm running here is interacting with the LLM service, with the backend running inside Fermyon Cloud right now. Right. And this LLM is different from the previous one we ran, but again, it's also an open source LLM, aimed specifically at code generation, correct? Yes, correct. So, the service we have built to help with this... yeah, now you can see we got a timeout here. The service we built for this is a preview service at the moment, so there might be something here that isn't quite working. Let me just check one thing and see if I made a mistake; I'll see if we can get this demo up and running. I just want to make sure I'm hitting the right endpoint. I am hitting the right endpoint. Okay, let's try again; maybe it'll work this time. There you go. Okay, we're back online, so now we got a statement back, right? And since that one stalled at first, let's try something else. Let's say we want a bash script this time, and in the bash script we want to, what do we want to do? We want to search for a file with the text "log" in it, in all subdirectories. I guess a lot of people may know how to do this; I probably would not remember how, and I've always hit Google first for that kind of thing. Hey, there you go. So there you have a bash statement. So again, this is a small example; I mean, you could have put a web UI or anything like that on it, but it shows how you can build something that understands various programming languages and just helps with some of that productivity. Nice. And in our architecture, this doesn't use embeddings or anything like that, right? It's just straight-up inferencing, where you're using code generation. Yes, exactly. Yeah. And if we want to touch a little bit on the infrastructure side of this: when we talked about the Spin framework, I used phrases like "host implementation" or "hosting provider", whatever you want to call it. What we've been showing so far are things that run inside Fermyon Cloud, which is a commercial offering that we have as a company. But Spin applications are able to run, I would say, anywhere; it's just a CLI, so if I run them locally, I use the CLI. There is, however, the caveat around the LLM inferencing that it needs a GPU. There is a project called runwasi, for containerd, which enables you to have various types of WebAssembly frameworks running inside Kubernetes. So what you're able to do now, and they very recently released this particular feature, is create a pod that consists of both containers and WebAssembly. So if you have the scenario we've been running here, where we need a KV store and some LLM functionality, you could create a... I'm not sure if this is what you'd want to do Kubernetes-wise, that's really not my strong area.
But in theory, you could create a pod that has a Redis container and a Spin application running together, so that you're able to have that key-value store through a Redis provider. And however you would provide the GPU option inside a Kubernetes cluster, it's definitely also an option if you have access to that type of hardware. Nice, very cool. Yeah, I assume many people are familiar with Kubernetes, so that could be an entry point for you to try it out as well. Mikkel, one thing that we briefly looked at in the architecture diagram was this phrase that says "generate sentence embeddings". And I personally think the whole idea of creating embeddings with large language models is very powerful, because you can create relationships between sets of data, say data in a documentation page or in a blog, with a language model attached to it. Yeah, let's talk about that. So really what embeddings do is take a given text string and create a vector, a set of numbers. And what you're able to do is compare those vectors, which means that, by extension, you're able to compare similarity between sentences. That can be used in many different scenarios. For instance, we saw the sentiment analysis before, and you asked whether, when we map into the cache in the KV store, we're mapping the exact sentence. We are, in that case. But if we had created embeddings of the questions, we could have said: well, if there's a 95% or a 98% similarity or something like that, we'll call it a match. So even without 100% similarity between the questions being asked, we could use embeddings to infer that it's the same sentence, and thereby just give the same reply back. Another scenario would be search, and let me show a demo we built which I think could be fairly useful for a lot of people. I will go over here; let me just share the right screen. Let me talk about the functionality first, and then we can talk about how this relates to the Spin features we're using. Basically, there's an API behind this, an endpoint where we can post in some data with a reference, and afterwards we can compare a given sentence to the text we pasted in. So one thing we could do is go and look at the documentation we have around Spin. We can say, okay, here's something that is an introduction to Spin. So let's use that as a reference, paste a little bit of this text in here, and submit it as a sample to our database. We might want to talk about installing Spin, so let's add that in here as well; now we've added another set of data. And we can do one last one: building Spin components in Rust would be one of the things we want to provide, so let's take some of this text so we have something to match on. Okay, so now we've submitted some samples into a database here, and we can go and query it. Consider it to be a search index, something like that; that is really the closest comparison here.
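Before the queries, a hedged sketch of the mechanics being described: generating embeddings through the Spin SDK and scoring two vectors with cosine similarity. The model variant and helper names are assumptions for illustration, not the demo's actual library code.

```rust
use anyhow::Result;
use spin_sdk::llm;

// Embed a batch of texts with a sentence-embedding model.
fn embed(texts: &[String]) -> Result<Vec<Vec<f32>>> {
    let result = llm::generate_embeddings(llm::EmbeddingModel::AllMiniLmL6V2, texts)?;
    Ok(result.embeddings)
}

// Cosine similarity between two embedding vectors: 1.0 means a perfect match.
fn similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm(a) * norm(b))
}
```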
And now we can say: "what is Spin?" This is a sentence we want to compare against what's in the database. What we get down here are the references and the actual text we matched against, plus a similarity score, where 1.0 is perfect, meaning it's the same text. And you can see that the introduction to Spin comes up very high, probably because the "what is Spin" question is answered in there somewhere. If we ask "how do I install Spin?", again we do a comparison, and we can see that installing Spin comes up first. And if we ask "can I use Rust with Spin?", we would probably expect that building Spin components in Rust comes up very high. So I think you get an idea of how a Q&A could be mapped onto this, or really just search across a documentation set. I want to show the implementation of this one, because I think it touches on a few things. Sorry, let me just get the right sharing screen here. This blows my mind, because to do something like this even two or three years ago, you'd need a data science team, fancy vector databases, and huge amounts of number crunching. And now, with a few lines of code, you can just create embeddings to find relationships between data; any individual dev can do it. It's that democratized. Yeah, exactly. So basically, the functionality I showed here is all implemented in this API. There's a Rust library here where you can see the functionality to generate these embeddings, and to store or delete them in the database: all the pieces you need, right? And, let me just go to the explorer, what I want to show you is that when you run these things in Spin, even though there's some Rust code written up front, all of this compiles down to WebAssembly. And the WebAssembly file, the Wasm file referenced in line 10, is something we could just distribute. So if I wanted to build my own Spin application with this functionality as part of it, or I just wanted to expose this as an API endpoint I can hit from other applications, anyone can take this WebAssembly file and start building a Spin application around it, using it as one of the components. There's a SQL database being used in here, and an LLM being used in here, and all of that just makes it really, really easy for you to reuse components and start building these things for various use cases. That's amazing, that's very cool. Yeah, just as a quick recap: we started off by showing a sentiment analysis demo. We went into the code and showed how we used this open source framework called Spin to build it. We discussed the architecture behind it and how it works with something like a key-value store. Then we spoke about use cases, things like productivity and augmentation of existing apps. And then Mikkel showed a great demo of embeddings as well: how you can build relationships between existing sets of data, and do that in your code. Mikkel, any last thoughts on the serverless paradigm and building AI-powered apps?
I mean, from personal experience: there's this whole world of AI and large language models, and I guess, like most people, I've played around with ChatGPT and been very impressed by what it can do. But as we've been getting this functionality together around Spin, I've had a chance to actually think it through and start building applications with it, and there are so many opportunities and possibilities that open up. I'm pretty sure some of the stuff we've demoed here is something we will quickly go and implement on our own documentation sites and websites, for instance. And since I'm partly responsible for the projects we have and build, getting that sentiment analysis into our surveys is definitely something I want to have as well. And with the Spin model, it's easy to build and very low-impact to run. So in that sense, it just opens up a lot of great opportunities, and I'm really excited about that. Yeah, same here. And honestly, I think we're just scratching the surface of what is possible in the world of applications, mobile, web, whatever, with things like LLMs. So I'm really excited to see what people go out there and build. If you want access, go to fermyon.com; you'll find all our documentation, how to get access to this, and how to actually get started as well. And I think that's it; that's all we have time for from Mikkel and myself. Thank you so much. And yeah, go try it out. Thanks, bye. Bye.