Hi, everyone, and welcome. Platforms, platforms, platforms. We've been hearing and talking about platforms so much over the past few years, and along the way we've learned a thing or two about them. First of all, we want to focus on why we're building platforms: we build platforms to enable product teams to deliver software better, faster, and safer. We've learned that we want to establish a platform-as-a-product mindset. We want to talk with the end users, understand their needs and requirements, and then expose a platform API, design an API that can fulfill those requests from the end users of the platform. Internally, in the backend, we can do whatever makes sense for us to implement the platform, building the different capabilities to support that functionality. But recently we've been getting more and more requests about a new type of application, and we keep hearing a lot of buzzwords: generative AI, large language models, embeddings, RAG, vector stores. What's that all about? Before we even start extending our platform to support these use cases, I'd really like to know more about this domain. And we have Lisa here, she's an expert in generative AI. So can you tell us a bit more about that?

Yes, Thomas. Generative AI is the latest buzzword. Artificial intelligence has moved from the initial machine learning phase, where it could categorize or analyze data, to a point where it can really generate new things itself: text, images, you name it. For this session, we're going to be talking about large language models and how we integrate them into our programs as developers. There are a couple of categories of models. A language model is a completion model: if you send in "I like", it will probably say "chocolate" or something. Then there are chat models, which are instructed and trained to chat with you: if you say "I like", it's going to say, "Oh, what is it that you like? Can I help you with something?" And then we have a bit of a different beast: embedding models. They turn your text into a numerical vector that encodes its meaning, which gets very interesting, because you can use it, for example, to group texts or fragments of text that have similar meanings just by looking at which vectors are closest together.

With a large language model, you send in text, which we call a prompt. You've probably heard of prompt engineering, which is just a fancy way of saying that you tweak what you send in so you can influence what comes out. All large language models have a context window, so they limit what you can send in, from 200 characters to 20 pages, for example. And then there's a next step we might want to take with a large language model. If we don't want to train or fine-tune it ourselves, we can still make it learn from our documents, for example, our company documents that it didn't know about because they were not in the original training data. We can use these embedding models, and these embedding stores where we store those numerical meaning vectors, to find the parts of our documents that are relevant to the questions we ask. That's called retrieval-augmented generation, or RAG, again, a complicated way of saying "chat with your documents". You can see here the schema of how it works. You split your documents into parts, and you store the segments in an embedding store, or vector store, together with the numerical vectors that represent their meaning.
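As a rough sketch of that ingestion step in LangChain4j (the class names come from the library, but the splitter settings and file path are assumptions for illustration, not the demo's actual code, and exact APIs may vary between versions):

```java
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

import java.nio.file.Path;

public class IngestionExample {

    public static void ingest(EmbeddingModel embeddingModel) {
        // Load a company document the model has never seen (hypothetical path).
        Document document = FileSystemDocumentLoader.loadDocument(Path.of("docs/instruments.txt"));

        // In the demo this would be Weaviate; an in-memory store keeps the sketch self-contained.
        EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();

        // Split the document into segments, embed each segment, and store
        // segment + meaning vector together in the embedding store.
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .documentSplitter(DocumentSplitters.recursive(300, 30))
                .embeddingModel(embeddingModel)
                .embeddingStore(embeddingStore)
                .build();

        ingestor.ingest(document);
    }
}
```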
Then, in a second step, when your user interacts with your large language model, you go and find the relevant segments by also embedding the question and finding the pieces that are most likely relevant, meaning-wise. And then, instead of just sending your question to your large language model, you're going to send: hey, this was the original question, and here are parts of the documents that might be relevant in answering it. So I hope the platform Thomas is going to build will support these basic building blocks of AI-powered applications.

Let me quickly introduce myself. I'm Lisa Rass. I'm an AI transition specialist at OpenTide and also a collaborator on the open source framework LangChain4j. I'm also a committee member of the Cloud Native Hacks that are going on in parallel here. And I love everything AI and open source.

And I'm Thomas Vitale. I work at Systematic, a software company from Denmark. I'm a software engineer and CNCF ambassador, and I'm really passionate about anything Java and cloud native. I combined these passions of mine and recently wrote the book Cloud Native Spring in Action. In general, I'm a big supporter of open source technologies and try to contribute as much as I can across both the cloud native ecosystem and the Java space.

So I want to make sure that I understood the problem correctly. We're talking about applications that are powered by large language models, and these applications are implemented using regular programming languages: Java, Go, Python. But then we have two new types of integration that you're looking for the platform to support. One is towards a model provider. It could be a local model provider like Ollama, which is great for running large language models on your local development environment or for on-premises deployments, or it could be a cloud service, for example Mistral AI or OpenAI. And then we have a second type of integration, towards these databases that don't just store data, but store the meaning of data as numerical vectors. We call them vector stores, and Weaviate would be an open source example, right? Correct.

All right. So I think we have all the ingredients to start designing a new golden path in our platform to support this new type of application. Let's start from how developers will interact with this new golden path. We have a portal where we can extend the set of golden paths with a new one. I have already created a template for a new Java application integrated with a large language model and a vector database, so you can start a new project from here. Let's call it "my app", because I'm really a creative person, and it will be owned by Lisa's team. Then we can provide more information; I'll just use my GitHub repository for now as an example, and give it a name, "my app", super creative. And then let's define some parameters specific to this Java application. We can customize the onboarding procedure here, and the interesting part is that you can choose a model provider, for example Ollama if you're working with a local model or on-premises, or OpenAI or Mistral AI or whatever other service. Let's say Ollama. Then you pick a vector store, and here in the platform we can provide a list of whatever we support. For now we'll use GitHub Actions for the pipeline and create the repository. So now you can go and check out the repository.
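The golden path in the demo is defined as a Backstage software template. Purely as an illustration, the model provider and vector store choices might be exposed as template parameters roughly like this (a hypothetical sketch, not the actual template from the talk):

```yaml
# Hypothetical excerpt of a Backstage software template (template.yaml)
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: java-llm-application
  title: Java application with LLM and vector store
spec:
  type: service
  parameters:
    - title: AI integrations
      properties:
        modelProvider:
          title: Model provider
          type: string
          enum: [ollama, openai, mistral-ai]
          default: ollama
        vectorStore:
          title: Vector store
          type: string
          enum: [weaviate]
          default: weaviate
  steps:
    - id: fetch
      name: Generate project from skeleton
      action: fetch:template
      input:
        url: ./skeleton
        values:
          modelProvider: ${{ parameters.modelProvider }}
          vectorStore: ${{ parameters.vectorStore }}
    # ...followed by publish:github and catalog:register steps in a full template.
```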
But there's an important question we need to ask ourselves when we implement a platform, and it's all about the continuous development loop. Once a developer checks out the Git repository, are we forcing them to run Kubernetes, to have a Kubernetes environment to run the application? The answer is: it depends. We talk with the development teams and find out a strategy that works best for them. In this case, we want to meet developers where they are. Developers across different teams are using, for example, Java, like in Lisa's example, or maybe Python, or Node.js, and they already have a nice developer experience flow from the frameworks they're using. We want to keep that, so as not to introduce additional cognitive load for developers. In parallel, though, we have to solve the problem of how to provide these new integrations that the applications need. How do we provide a model provider and a vector database if the application is running locally on a developer's computer and we are not providing services from a Kubernetes cluster? A great solution is a tool that is really developer-friendly: Testcontainers. With Testcontainers, you can have all your different integrations: for example, you can run Ollama as a local model provider and Weaviate as a vector database. Developers can use it both at development time, when working locally, and to run integration tests. Because testing is also important, we want to test against the real thing, and Testcontainers allows us to do that. It integrates with the application lifecycle, and it supports Java, .NET, Ruby, Go, and more, so it's not Java-specific. In the context of the platform, we also want to provide a polyglot solution. We're seeing an example in Java, but all the tools we're going to implement inside the platform will be flexible enough to support different languages. So maybe we can test it out; I'm curious to get some feedback from you and see if it works.

So this is what came out of your platform, just like that. Let's have a look. I lost the side panel, but it's a great application, so let me try to run it; it's normally functional just like that. And I see I have a bean here with a chat language model, an embedding model, and an embedding store, so this is all set up for me. That's super convenient. What else do I find here? A document loader, so if I want to interact with documents, I know where to store them and how to use them. The ingestion phase is there. The content retriever is there, with some parameters that I might want to tweak for my application. It's great that I don't have to set all of that up. I see a chat memory, and then an AI service here, a chat agent, which in LangChain4j is the workhorse of our application. It gets to use the chat language model and the content retriever to talk with the documents, and it has this memory already there. So that's really a great start for me to work from.

Let me have a look at this chat agent AI service. Imagine I want to make something else. I can now jump directly into the business logic, and here I don't want a chat agent, I want a composer agent, because we're talking about orchestration, and we're both musicians, so let's get some music in here. The nice thing in LangChain4j is these interfaces, these AI services: you just declare what you want to happen there, and you don't have to write much code for that. Ah, wait. Good. Okay, this is not my usual computer or IDE.
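As a side note on that inner development loop: a minimal sketch of how a project like this could spin up Ollama and Weaviate locally with Testcontainers, using plain generic containers with assumed image names, tags, and environment variables rather than the actual wiring generated by the template:

```java
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.utility.DockerImageName;

public class LocalAiEnvironment {

    // Local model provider; image tag is an assumption for illustration.
    static final GenericContainer<?> ollama =
            new GenericContainer<>(DockerImageName.parse("ollama/ollama:latest"))
                    .withExposedPorts(11434);

    // Local vector store; image tag and env vars are assumptions for illustration.
    static final GenericContainer<?> weaviate =
            new GenericContainer<>(DockerImageName.parse("semitechnologies/weaviate:1.24.1"))
                    .withExposedPorts(8080)
                    .withEnv("AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED", "true");

    public static void main(String[] args) {
        ollama.start();
        weaviate.start();

        // The application (or an integration test) points at the mapped ports.
        System.out.println("Ollama:   http://localhost:" + ollama.getMappedPort(11434));
        System.out.println("Weaviate: http://localhost:" + weaviate.getMappedPort(8080));
    }
}
```

Testcontainers also ships dedicated modules for several of these services, which can replace the generic containers where available.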
So I want a method called giveCompositionAdvice. The nice thing is that LLMs are very forgiving when there are typos. And then, if I want to tell it what to do, I can just add a @SystemMessage annotation here, which is the instruction for our LLM, and say: give composition advice for scoring the following movie scene. The movie scene itself is going to be what we send in; it's going to be the prompt here. That's really all we have to do, and then we can just build it like this: AiServices builds our composer agent, and it will know what to do.

Maybe we want to give it a couple of tools, because that's super cool. You can tell LLMs that they can use programmable tools, something I define here that runs in Java and accesses my database or whatever I want to happen. If I describe it well, my LLM will know that it has this tool, will call it when it's needed, and will work with the result. So let's make a tool. I'm also very creative here. Let's see if I can get these curly brackets out. Yes. Yeah, the magic of the Italian keyboard there. We want a method that gives the virtual instrument code, because we want to use Thomas's virtual instruments, not just any instrument, and it takes the normal instrument name as input. And then again, curly braces, yes. In this case, for the demo, we will just return one, two, three, but you can put any Java code here. And then, to let the model know it has this tool, we add a @Tool annotation saying: gives Thomas's virtual instrument code for a normal instrument name. The chat language model will know what to do with this; anything we understand, they usually also understand. And then here we add tools and say we build with these tools. And voilà, that's it. So it knows it can use this instrument retriever, and it will also do so when it's appropriate.

So imagine I'm ready with my application here. Can we now take it into production? Yes, but first, I'm interested in knowing more about how these types of applications work. We talked about Java, but I think it might be nice to know how we can support other teams working with different technologies, and in general what these AI orchestration frameworks are, if you can tell us a bit more. Then we can support different languages in the platform, and I'm sure other teams will be really happy. We would also be happy, yes.

So with this boom of generative AI, there's also a boom in people who want to build applications around these models, maybe with multiple models in there that interact with each other and everything. So there has also been a boom of AI orchestration frameworks. The most famous one, the first one, was LangChain, which was made by the Python folks, of course, because machine learning, training, and data engineering happen mostly in Python, but there are now also versions for Go and for JavaScript. Then there's, for example, Semantic Kernel from Microsoft, supporting .NET and Java, and then LangChain4j, which I'm a collaborator on myself. It's a very popular one for Java at the moment, and it's just great, of course. They all have similar building blocks, so let me dive into the building blocks of LangChain4j just to show you. You see here the language models and image models; more types of models will be supported in the future. The tools, which we just saw, are there too. There's the memory, because models are stateless, and if you add a memory, then they become stateful.
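To tie together the AI service and tool ideas from the live demo, here's a minimal LangChain4j sketch of what that composer agent could look like (the annotations and builder come from the library; the method names, message strings, and wiring are reconstructed from the talk rather than copied from the demo code, and exact APIs may vary between versions):

```java
import dev.langchain4j.agent.tool.Tool;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.SystemMessage;

// Declarative AI service: we only describe what should happen.
interface ComposerAgent {

    @SystemMessage("Give composition advice for scoring the following movie scene.")
    String giveCompositionAdvice(String movieScene);
}

// A programmable tool the model can decide to call when it needs it.
class InstrumentTools {

    @Tool("Gives Thomas's virtual instrument code for a normal instrument name")
    String getVirtualInstrumentCode(String instrumentName) {
        // For the demo we just return a fixed code; any Java code (DB lookup, etc.) could go here.
        return "1-2-3";
    }
}

class ComposerAgentFactory {

    static ComposerAgent create(ChatLanguageModel chatLanguageModel) {
        return AiServices.builder(ComposerAgent.class)
                .chatLanguageModel(chatLanguageModel)
                .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
                .tools(new InstrumentTools())
                .build();
    }
}
```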
Output parsers are an amazing feature there, because the LLM always returns text, or sometimes JSON if you ask it to. What LangChain4j does for you is parse the output into real Java objects, so you just call your AI service and you get a list back, or a plain old Java object, whatever you want, and you can just continue from there; you don't have to parse any strings. On the right, you see all the RAG components to chat with your documents, and then usually all the AI orchestration frameworks have a layer on top to abstract things away even more, like the AI services we just saw, or chains in LangChain4j.

All right, so basically from the platform we can support different programming languages, and we know that for each programming language the development team will use one of these orchestration frameworks. In the platform, though, we need to provide the integrations with the actual model provider and vector database. So let's try to go to production, because we need to complete this golden path. So far we've talked about the development workflow. The first thing we want to do is containerize the application. We could use a Dockerfile, and we could even make it part of the template bootstrapped from Backstage, but using a Dockerfile wouldn't be ideal: for the platform team, you lose control over all the security and performance optimizations, and for developers it means additional complexity and additional responsibilities. So instead, for this example, we chose Cloud Native Buildpacks, where as a platform team we can centralize the logic to build production-ready images without the need for a Dockerfile, and it works across different languages. In this case we're going to use it with Java, but it works with other languages too, and with one command you can get a production-ready image. You can use the CLI that comes with the project, you can make it part of the pipeline, or, if you're working with Kubernetes, there's also a Kubernetes-native implementation called kpack. With kpack, you get a custom resource called Image: you point it to a Git repository, and every time there's a change, it automatically builds an image, signs it with Cosign, and generates a SLSA attestation, so you have a quite secure artifact pushed to the container registry.

Then we need to configure the workload for deployment. At this point, there are many different alternatives, but the key part I want to focus on is that I don't want to expose internal details and internal tools to developers, for example Helm, Kustomize, or Istio. Instead, we define a developer-friendly API. In this case, I'm using Crossplane to define a workload API which only contains the information that the developer needs to provide, just the few things that the platform cannot infer by itself, for example the name of the application and which integrations it needs. In this case, we have OpenAI and Weaviate as the vector database. Internally, we can implement it using all our favorite tools. In this case, I decided to go with a Crossplane composition to implement the API and use Knative to achieve serverless deployments, defining some application conventions along the way. I'd like the platform to be application-aware, instead of having applications that are platform-aware. There are some conventions across different languages on how to run applications in production.
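As an illustration of such a developer-facing workload API, a Crossplane-style claim could look roughly like this (hypothetical group, kind, and field names; the talk doesn't show the actual schema):

```yaml
# Hypothetical developer-facing workload claim backed by a Crossplane composition
apiVersion: platform.example.com/v1alpha1
kind: Workload
metadata:
  name: composer-assistant
spec:
  image: ghcr.io/example/composer-assistant   # built by Buildpacks/kpack
  # Only what the platform cannot infer: which backing services to bind.
  services:
    - name: openai     # model provider
    - name: weaviate   # vector store
  # Everything else (the Knative Service, service bindings, observability
  # conventions, networking) is handled inside the composition.
```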
I would like to codify all those conventions in the platform, so we remove even more cognitive load from the development teams. And then we're using service bindings; this is an API from the Kubernetes project to bind applications to external services in an automatic and autonomous way. But we have a problem here: the platform, as it is now, doesn't have those services. It doesn't have a model provider, it doesn't have a vector database. I need to build that.

For the platform, we're using Carvel, a CNCF sandbox project. Each capability in the platform is bundled as a Carvel package and delivered as an OCI artifact. Because the platform engineer experience also counts, it's not just about the developer experience, I made a template to build a new Carvel package in a nice way. So I can now create a new repository for this new package. I have already pulled the repository down, so what I can do is use the CLI that comes with Carvel: kctrl package init. Then I define the name of the package, in this case Weaviate. What is the foundation of this package? We typically don't start from scratch, so I can point to an existing Helm chart, or an existing GitHub release with some Kubernetes manifests, or some local files if it's a first-party product. Then I confirm all the versions, and now I have my package. At this point, I can configure it any way I want using different tools: I can use Helm, I can use ytt, I can use cue, I can use sops to resolve secrets. Once I'm done with it, I can say kctrl package release; of course, I would do this in a pipeline. I give it a name, and since it's a container artifact, it goes to my container registry on GitHub, and now it's delivered. I can use it directly from my platform thanks to the package management functionality from Carvel, which is Kubernetes-native. What that means is that I get a custom resource called Package, where I have a reference to my package and I can configure how I'm going to customize it. In this case, I'm using Helm, ytt, and kbld in sequence.

All right, I think we have everything in place for the application to be in production, so maybe we could give it a try. What do you think? All right. Okay, this is really amazing. You're making developers lazy, but we love that. Good. Here is our final application. It's a composer assistant. We picked four videos, of which we will orchestrate one at the end, and the large language model is going to tell us how to do it. We will execute this: when we click on compose, we will get the advice here. So I wish to ask you which one you would like. We have a romantic scene that we could orchestrate. Getting ready for the composition. Demo gods, please. Yeah. All right, so we have a romantic scene, we have a stroll in a gloomy forest for those who like that, there's an alien attack over the city, and then there's a sad, melancholic video, and we can make the music for any of those. Who would vote for the romantic scene? I thought we were in the city of love. Thank you, the two of you. Who would go for a mysterious stroll through the forest? Ah, some gloomy developers here. All right, who would go for the alien attack? Ha, surprise. Okay. I think if it really happened, we would have to talk again about whether it's so much fun. Okay, anybody who would like the sad one? Okay, so we will go for the action movie, because that was the majority vote, but anybody who chose sad can come to me after the session and I will make you sad.
That's not the problem. All right. But Thomas. Yeah, but first maybe let's talk more about the application. I'm happy to show you the architecture. Yes, I'm really interested in what's going on there. I will need to do that myself. Okay. Yes. There we go.

All right, so this is how the application is built up. We have this UI. When we click compose, it sends the video description to our composer assistant application, which will first fetch from our Weaviate embedding store any instrument descriptions that are similar to our video description. For example, action, impending, scary, heroic will probably be the kinds of words, or meaning clusters, that are searched for. Once it has these relevant segments, it sends them to our LLM: please give us composing instructions, we want to use Thomas's virtual instruments and we have a singer, here is the video you have to score, this is the scene description, and here are parts of instrument descriptions that you might want to use. Then the model itself will see: oh, if I need Thomas's virtual instrument codes, I don't have those in my training data, so I'm going to call the tool that is there. So for every instrument, it's supposed to call the tool that fetches the virtual instrument code from the database, and then it takes everything and gives us a clean answer back in the application. I did not click yet, so I will do that right now. We choose action, and now it's generating a strategy. While we wait for this, hopefully it comes. I should have prepared some drum rolls, you know.

While we're talking about this, I wonder, everybody is now completely enthusiastic, because composing advice is surely what everybody has been waiting for, right? So now the whole room wants to go to our application right away. Can we visualize the load? Yes, because we can see it fail, and of course we cannot go to production without having observability. The good news is that we have OpenTelemetry configured everywhere in the platform, and on top of that we also added some specific conventions to get information about the models and the interaction with the model provider. We used our own custom ones, but there is a working group currently defining a standard set of semantic conventions in OpenTelemetry to represent data around large language models, so I'm really looking forward to that. So let's see what happens. We can also give this another try, and in the meantime we want to see why it failed. We can check the Composer Assistant, and we have some specific spans for the AI orchestration framework used here, all collected with OpenTelemetry, so we can see the result. The second one succeeded, the first one failed. For the first one, we can see at which point it failed: the application is calling the model multiple times because it needs to integrate with the tools, so we have to debug what's happening in that workflow, that chain of execution. But if we look at the one that succeeded, we can also get additional information about the prompt. You get the whole prompt here, if you'd like to tune it a bit and do some prompt design. You can see the finish reason and also all the tokens, so at the end we can see how much we're spending on this demo, right? That's great. That's perfect. Yeah. That's amazing. That's all I wanted to know. That looks good.

So we got a recommendation here. Let's have a look. We want to score an action scene, and we got three chord progressions. I think they look fine.
So maybe we can go with the first one. What do you think? Looks good to me. Yes. So let's try it out and see what happens. So Thomas, yeah, you're preparing your audio editor. Yes, let's start with something strong. All right. Do we have sound? Okay, we got some percussion. And now I'll cheat and just loop through it, because that's fun, right? All right. Should we add some strings perhaps? We got the percussion, we got some strings. It says use Albion strings and some drones, so we'll follow the recommendation from the model. Nope, not that one, wrong one. All right. So how's it going so far? Alien attack. Should we add some more strings? Yes, the harp. Yeah, that's weird, right? That's not really action with the harp, but we can do something a bit better. Let's see down here. Yes. All right, we're getting there. So how are you feeling about the voice? We're almost there. Should we add some cello? I think it's suggesting a cello, so we could add some cello and then layer up the voice. Yeah, let's add some cello. All right, let's do it here. And like this. Oh, I don't want that. Go away. Like that. Yeah, should we give it a try? Okay. Drum rolls. I think that worked out. I think we survived this, and it was a risky one. What an amazing voice.

Okay, so to recap, we got the AI to help us generate this composition, and it even gave some interesting results, but we could probably improve it, right? Because the model has not been trained specifically for this purpose, we integrated some of our own knowledge about the feelings that each instrument delivers, and about the virtual instruments. And yes, we got the observability. I think we can move on and go very quickly across the journey we've taken so far. The first thing is that we talked with the development team; that's the most important thing. First, we hear what problem they have, and of course generative AI is super hyped right now, but we want to know exactly what the needs are. After understanding the needs better, we defined the project bootstrapping phase using Backstage. The bootstrap also sets up the local project, using Testcontainers for the local developer experience, and next it also sets up a pipeline, in this case GitHub Actions, but you can use whatever pipeline engine you prefer. The other important thing is that we are not exposing internal Kubernetes tools or knowledge to developers: we are using Crossplane compositions to define a developer-friendly API, and we keep all the complexity inside the implementation. The development workflow, again, is based on Testcontainers. We got the build pipeline, in particular using Buildpacks, to help containerize all these applications. The deployment configuration was based on Crossplane, and the actual implementation was using Knative. We want to scale this: when people are relying on generative AI and large language models, we also need to talk about scaling, because there will be many requests, and some requests will be slow, because as you saw, it took some seconds to process each request, and there's a lot of back and forth between the model and the application, since we are integrating knowledge from the database and the tools. We bind services using the Service Binding API from the Kubernetes project, so as part of the golden path we also automatically bind the application to the list of services that the developer mentioned while bootstrapping the project. And we got observability; that's really important.
You saw that things can fail, and when you add large language models, they can fail even more. So we need visibility into the tokens, into the cost, into the prompt, so we can do some prompt design. And finally, of course, we want to go to production, because if we don't get to production in a fast and secure way, our application is useless, right? It's in production that it delivers value. And the important thing, and I'd like to know if you also enjoyed that part, is that developers are only involved in the first two steps, bootstrapping the project and then working on the actual business logic. The platform abstracts everything else away.

Yeah, I mean, as a developer, this is the dream. I didn't have to do anything for my project setup, except for choosing what I actually want and where I want the project to live. And then all the rest, making sure there's observability, going to deployment, I didn't have to do anything. Like the Weaviate setup, nothing. So the moment your platform is out, I'm definitely going to use it, and I'm never going to waste, let's say, the two hours I'd otherwise spend on the whole setup just to even start coding properly, when my business logic is done in another hour. This cuts out all that time. This is just really amazing. Okay, that's great to hear. Thank you.

Yeah, so we made some music together. I mean, we are in Paris, you know, so music was a good fit. But you can reach out to us afterwards, and in particular Lisa is really an expert in generative AI, so if you have questions about use cases and how it can help improve society, she's the person to talk to. So reach out afterwards. Yes. Okay, thank you very much for your attention. We'll be around for questions if you need us, so just come to us. Also, I'm giving another presentation tomorrow, if you'd like to join, with Mauricio. We're going to talk about unlocking new platform experiences with open interfaces, so we'll dive even deeper into how we can improve the developer experience across distributed systems and integrate even more services in a transparent way for developers. Thank you.