With that, I'm Amita Sharma, and I manage the OpenShift AI team at Red Hat; I'm from Pune, India. I have my fantastic colleague with me: Arti Lohanathan, who works as a Principal Quality Engineer on the OpenShift AI team at Red Hat. Today both of us will present this topic: how you can build and operate smarter AI applications with the help of Open Data Hub. The talk will cover why we should care that AI and ML workflows are open source, what challenges exist and how we can solve them, followed by a demo by Arti and then questions and answers. Let's dig right in.

To wake you up a little, I would like to ask you a question. I'm sure all of you are using AI applications in some way to solve your day-to-day problems. May I ask any one of you to give me an example of how you are using AI in your life? It would be great if that example is a little interesting or unique. Anyone, please? Come on, I'm sure somebody can give a very simple example of how we use AI applications in our day-to-day lives. Anyone? You are sitting in the AI track; you can give me one example. Have you ever used ChatGPT? Any of you? Just raise your hand. That's the simplest example of using an AI application.

In our daily lives, we use AI applications to get answers to our questions. We sometimes use Bard to generate images. We use AI systems to generate code, to plan our trips, and sometimes even to draft our mail. We can say that the consumer base of AI applications is growing, and adoption is growing with confidence. Whether it is ordering a coffee from a robot or accepting a driverless car, the market is increasing day by day. To tap these opportunities, every industry wants to build its own unique solutions with artificial intelligence. So if the consumer is ready, the market is ready, and the industry is ready, what are the challenges? Why are we lagging behind? Let's talk about and understand the challenges.

I would like to give you an example, a very woman-centric example: a chef, though it is represented by a male figure here. Imagine you want to set up a restaurant, a food business, and you have the best recipe in the world, all the ingredients, and the best chef in the world. But you don't have the right tools and kitchen to make that recipe. Then your business will not be able to flourish. Similarly, even if you have the best idea in the world for an AI application, if you don't have the right tools to create it, that solution will be of no use. Correct?

Moreover, operationalizing AI is not trivial. There are many stages to it. I'm sure you have been hearing a lot of buzzwords about modeling in the AI track, and they all make sense. Whenever you want to develop an AI application, you need to decide on your goal, your objective. Accordingly, you need a model that can be consumed by your AI application. You need to train that model, and the amount of data required to train such a model is enormous. Then you need to train your model on certain parameters; deciding those parameters, those hyperparameters, so that your model generates accurate output is also not an easy game. And then you need to deploy your model and have your AI application use it. The story doesn't end here.
You need to continuously monitor your model to see whether it is accurate or not. If anybody here has used Bard, you have seen that it generates three or four answers, and it corrects itself, fine-tunes itself, with every question asked. That kind of monitoring and correction is also required in the life cycle of model development and AI applications. So the point here is: it's all very challenging and complicated.

To add to it, when I mentioned data: the complexity and the amount of data needed to train these models is huge, and such data is controlled by a few hyperscalers and tech giants in the world. I don't need to name them; we all know that they capture our data in exchange for free services. That is why they are the central owners of the huge datasets required to train any reasonably good model, right?

To add to the list of challenges, this will be the most interesting one, and I'm pretty sure many of you might not be aware of it. We all know that for developing ML models and workflows we need GPUs, CPUs, and high-powered computing resources. Everybody talks about DEI and compliance in AI/ML, right? And I'm a fan of it, by the way; I'm not against it. I really wanted a women's photo in the slides Harish Pillay presented this morning, which had all men on them. So that is one thing: data filtering and not being biased. But there is also a very important factor when we develop AI/ML, which is the impact on natural resources. Have you ever thought that ChatGPT, which we use in our day-to-day lives, consumes at least one liter of water for every 15 prompts? Every time you ask ChatGPT 15 questions, it consumes one liter of water. So how many gallons of water must the ChatGPT model be consuming overall? Now you must be thinking: how is water consumption relevant? Electricity makes sense, data centers make sense, e-waste and carbon footprint make sense, but water seems irrelevant, right? If you know why, I would be interested to hear from you; if you don't, I'll be happy to answer in the Q&A section, because it is not the topic for today.

Now that I have presented the challenges, the answer lies in open source. I will not leave you alone with the challenges; we will dig into the solution, which is open source, and I have my friend and colleague with me to take us further into Open Data Hub, our solution for these challenges. Over to you.

Thanks, Amita. For the real-world challenges that Amita spoke about in the previous slides, we have a one-stop solution: Open Data Hub. It covers three personas: data scientists, app developers, and AI applications. As a data scientist, what do we want? We want a self-service-like experience for machine learning projects, where you can access a rich set of modeling frameworks and resources, share and collaborate with others, and deliver work into production with speed, agility, and repeatability to drive business value. Some of the challenges data scientists face today: they all work in their own isolated environments, so it is very challenging for them to collaborate on the work they develop, and they work with limited resources. How do we overcome these challenges? ODH allows customers to build, deploy, and monitor AI applications at scale.
What makes it different is that we help customers bridge the gap between the data science initiatives of the MLOps world and the application development world, so the two can go hand in hand to accelerate getting models into production and to train and serve models to AI applications. ODH is an open source platform for foundation models. It supports many infrastructures, physical and virtual environments, private and public clouds, and it is built on top of Kubernetes. As a data scientist, you can take models from Hugging Face, the AI community for tools and models for AI applications using machine learning, then train and deploy them into AI applications at scale.

Now let's see how we can utilize ODH to create, build, and train a model, integrate it into a fraud detection application, and refine the model using automated data pipelines. Let's experience what ODH looks like. You can install ODH from OperatorHub, and once installed, it is visible on the Installed Operators page. The components of ODH, such as workbenches, data science pipelines, and model serving, are listed on the Data Science Cluster details page, and the pods for each of these components run under the open data hub project. Now let's see the ODH dashboard. This is the data science project, where you can organize all your work together and start your experiments. This is the dashboard Amita spoke about in the previous slide, and this is the kitchen in which you can cook all your AI recipes. You have workbenches, you can connect to external object storage using data connections, run your pipelines, and serve the model to AI applications. For this demo, I have used MinIO as our data store and created two buckets: one to store the fraud detection model and another to store the pipeline artifacts.

Now let's see how we can create the notebook image from the ODH dashboard. For this demo, I used the TensorFlow image and connected the object storage we created before. Once the notebook is launched, we get this page, and you can clone your code repositories into it. I will explain a bit about the dataset and model used for this demo. This model flags transactions for potential fraudulent activity. It analyzes transactions based on the geographical distance of the user's transaction, the price of the current transaction compared to the median price of all the user's transactions, and how the user completed the transaction: using a PIN number, a hardware chip, or an online form. Next, we used a Sequential model from the Keras library for the fraud model and added layers on top of it. This is a deep neural network, and it consists of three hidden layers and one output layer. Let's build the model. After training it, as a data scientist you can always test the model: for example, I asked whether a sample transaction is fraudulent or not, and the model gave me 0.0080. Values close to one are considered fraudulent and values close to zero are considered not fraudulent, so this transaction is a trusted one. Once we have tested the model, we can upload it to the MinIO bucket for future use; I have stored it in the ONNX format.
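A minimal sketch of the kind of network just described, three hidden layers and a sigmoid output, exported to ONNX, might look like the following. The feature count, layer sizes, and the tf2onnx converter are assumptions for illustration, not the demo's actual notebook code.

```python
# Sketch: a small fraud-detection network in Keras with three hidden layers
# and one output layer, then exported to ONNX for serving. Sizes are illustrative.
import tensorflow as tf
import tf2onnx  # assumed converter package: pip install tf2onnx

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(5,)),               # e.g. distance, price ratio, payment type
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # near 0 = legitimate, near 1 = fraudulent
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=5)              # training data omitted in this sketch

# Export to ONNX so a model server such as OpenVINO can load it.
spec = (tf.TensorSpec((None, 5), tf.float32, name="input"),)
tf2onnx.convert.from_keras(model, input_signature=spec,
                           output_path="models/fraud/model.onnx")
```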
So now let's see how we can use that model from the data storage and serve it to AI applications using ODH. For that, you use the model serving option: you mention the model name and the serving runtime; I have used the OpenVINO Model Server here. Once that is done, you get the inference endpoint, and using that inference endpoint you can integrate the model into AI applications. For example, here I queried the model with a gRPC request against the inference endpoint, and I got the same output: the transaction is not fraudulent.

We also have the Elyra plugin to integrate Jupyter notebooks with data science pipelines. It enables data scientists to visually create an end-to-end ML workflow using the pipeline visual editor. A pipeline consists of one or more nodes with defined execution dependencies. For this demo, I used two nodes: the first creates and trains the model, the second saves it, and I defined the execution dependencies in the right-side pane. At execution time, you select the data science pipeline, so that while it runs you can analyze the logs in the ODH dashboard using the pipeline run section, and you can also see it in the backend cluster. So all of this with ODH: workbenches, data science pipelines, model serving; different technologies rolled into one nicely deployable package for customers. We continue to release more technologies to round out the end-to-end AI/ML workflow. Thank you.

Yeah, so you have seen that the life cycle I showed for creating an AI application is nicely presented, and all the tools are nicely integrated in Open Data Hub; you can use all of those tools to create your AI applications. We are also working to increase the number of tools in Open Data Hub, for example for distributed workloads: you need to train your model with tons of training jobs, and you need to distribute the workload so that the amount of cloud computing used stays minimal, because cloud costs are not small, right? Hyperparameter tuning is something else we are working on and are determined to deliver. Currently we have around 20-plus open source tools integrated in Open Data Hub and provided to all of you. If you would like to join us on this journey, you are welcome to find us at the GitHub link; the slides will be shared, and there is a Slack channel you can join. If you have any questions, please go ahead. Thank you so much.

So, does everybody know how ChatGPT drinks water when we query it? Can somebody tell me? I am also interested. Yes. No, no, it is open: we have Red Hat OpenShift AI, which is the downstream product of Open Data Hub, but Open Data Hub is its upstream. Yes, it does; I'll let Arti explain. Yeah, we do support LLM models: right now we store them in the S3 bucket, and we will have more support in the future. Any more questions, please? That's all then. Thank you. Thank you so much.

First time at FOSSASIA? First time. Oh, quite a lot, quite a lot. Okay, so I've done FOSSASIA for maybe the last seven to ten years. Just in case you want to catch the presentations: we do record them. They are live-streamed, and after the event they will also post the videos on the YouTube channel.
So if you go to YouTube and look for FOSSASIA you can find the video presentations, but give it some time, because there are a lot of videos and they need some editing. So be patient for a while, and you can watch all the presentations on the YouTube channel; not to worry if you can't catch all of them. Since we have a bit of time, I can introduce the next speaker: Hisham, from Instaclustr, and from France, I get it, yeah.

Yes, thank you very much. I'm really excited to be here; this is my first time at FOSSASIA, first time in Vietnam. Yeah, yeah, of course, yeah. I just want to start with some publicity for our friends at Red Hat, Amita and Arti: go and check OpenShift AI, it's a really amazing tool, and we recently announced joint support for Apache Spark on OpenShift AI, so it's definitely something worth it. Yeah, that's a lot of technical words there; it was just to attract you to my talk, there's nothing consistent in there. About me: I'm an open source product architect at Instaclustr; we are part of NetApp. Previously I've done a lot of things around data; I've held a lot of data positions: data engineer, data scientist, machine learning engineer, a lot of data stuff. So I'm talking as a practitioner; I'm not trying to sell anything here, just talking about a real-life use case.

The PlayStation, yes: we played a lot of games in the office. I was managing an ML engineering team there; we built the platform, which was really amazing. At that time I still had some hair; now that's gone, everything is gone. Yep. And yeah, I did some graph theory at the time; I did a PhD in random graph theory a long time ago. Not everything is in the right place, but just as a disclaimer here: there is no contribution to open source in any of the projects I'm going to mention, though the plan is to contribute to the community at some point. My vision of things is necessarily biased by my background and the companies I have worked for, and so on. And maybe I'll put my timer here, okay? Most of this is based on the principles of OSS, open data, and knowledge sharing, so it's really in the spirit of sharing pretty much everything that you will see here. And, you know, not machine learning but human learning is a lifelong work in progress, so it's always work we need to do better.

Okay, the agenda for today: we will go through a lot of things, but there will be no definitions, no mathematical equations, no code at all. It's just to show how we can solve a real-life problem using open source. Quickly: what is real-time machine learning? Machine learning, you guessed it: smart people sitting behind their MacBooks, writing some Jupyter notebooks, and yeah, that's an amazing model. But what is real time? That's the whole thing, and I'm trying to show it here with a real-life example. A wise person said: the open model of sharing is a foundation for a better future. I'll let you read it; this was from Mario yesterday. I was really lucky; I just took a picture of his slide. You see, we can use technology to produce locally, more environmentally friendly and more healthy. This is what I'm trying to solve here. After COVID, at least in Europe, or in France, a lot of people started cycling to the office.
So, you guessed it: this is where I live, on the other side of Paris, and I go to the office, and every morning I start by trying to cycle. It's nothing like Hanoi; it's way different. And the thing is: how do I get the fastest cycling route to the office? That's just the first question I ask. The least traffic: I want to avoid traffic, okay? So you go out there and you think: this is obvious, it's probably solved already. I went to, well, I don't want to name the app, which is by the way the best we have so far, at least for cycling, and I ended up in parking lots, accidents, and closed roads. Okay, so we have recommendation systems for the next product we buy, the next food we eat, the next movie we want to watch. But having something that impacts me and you, avoiding traffic, getting somewhere in a healthy way, is still challenging. One of the main things is that when you are cycling, you're doing exercise, so you don't want to breathe all these toxic, polluted particles.

So where do we go? I went to the academic world, and by the way, there is some really cutting-edge research there. Especially more recently, a lot of researchers from India and China are doing a really great job, because this is becoming a really serious health issue; that's why researchers from all over the world, especially here in Asia, are trying to tackle this problem. Academia is great, but it's still in the papers, so it doesn't scale. We have some guidelines from it, but not necessarily enough to build a product around.

And here we arrive at the real-time part: I have an hour of cycling from my home to the office, and a lot of things can happen, accidents, just think about all the things that can happen. So I want recommendations in real time, based on the events that are happening. If there is a road closure somewhere, I need to get that information. I don't know if you have heard of, let's name it, the navigation system called Waze, which was acquired by Google: it gives you information about road closures and such, but it's based on crowdsourcing, so users like me and you need to put in the information. We are far from that, at least for cycling and pedestrians; there is no obvious solution for this.

Okay, so the only definition here, no textbook definition, nothing: real-time machine learning, in this case, is the ability to generate predictions, recommendations, or decisions while adapting to a changing environment. This is the key thing; if you want to take one thing away from my talk, this is it, the only definition, as I said. So it's a box somewhere: we put all the information, weather, pollution, road traffic and so on, into that box, hopefully not a black box (we want a white box so we can understand what's happening inside), and it gives you a recommendation, the best route to get from point A to point B. Okay. So this is what the platform looks like, or at least the goal: the minimal stack that will help us tackle this problem based on open source, mainly open source, with different licenses.
We can argue that this is not completely open source, blah blah blah. But the target is to bring the open source tools together and put them to work. Again, as I said in the beginning, I'm biased, so you could probably list tens of other tools; I chose these, and you can choose others and do benchmarking. This is not something-versus-something; this is just how to put tools together to solve a problem. One of the key things here is to use Spark on Kubernetes, hence the title, because it's a building block for different parts of the platform: data ingestion, transformation, machine learning training, and even serving, though that's not something I cover here.

Okay, so what are the challenges for real-time machine learning? The author of Designing Machine Learning Systems, which has been an inspiration for me, is originally from Vietnam; she has also written a lot of books in Vietnamese, though not technical ones, so I'm doing some publicity for her. She listed these challenges, the key things to consider. I think the blue ones are more from the data science side (sorry, I don't want to minimize anything), and the others are more from the platform, the tech stack. And I cannot agree more with her that the real-time machine learning challenge, in our experience and in other practitioners' experience, is mainly an infrastructure problem. It's mainly a tooling problem: how you stitch things together, how you bring them together, how you build your platform. Again, I'm not minimizing all the cutting-edge work happening on the modeling side, but the main problem if you want to build something is tech stack issues.

Okay, so Spark and Kubernetes can help us with stream processing, training, scalability, and latency. Latency is key, and so is resource efficiency, because as Amita was saying, we don't want to burn liters of water and kilowatts of electricity. We want to minimize the work we are doing, because we're thinking about the environment after all, okay?

So, quickly, why Apache Spark? Apache Spark is a big data framework in the Apache Software Foundation. We use it, and it has two things in green: it's versatile and easy. You can use it with different languages and frameworks to do ETL, ELT, real-time machine learning, BI, and it connects to different object stores, Kafka, whatever; the list goes on. "Fast" is in the gray area, because there are other frameworks that may be faster for certain things, but it is fast enough. Spark on Kubernetes has been a long journey, but it's maturing. Kubernetes is in the CNCF, the Cloud Native Computing Foundation, and Spark has been running perfectly on it; this is what we recommend. There are a lot of benefits to running Spark on Kubernetes; one of the key ones is isolation and resource management, which we consider a major thing here. If you are starting to think about operators in the Kubernetes world, there are operators that make these applications run natively on Kubernetes, and there is a Spark operator; we are really happy to contribute to this open source project. So if you want to test it, go ahead with the Spark operator way of running Spark on Kubernetes. And this is not just me talking: our friends from AWS made this study.
It's not just the new hot topic. I know, as engineers we like to use the latest thing, but no, this is really serious, because it brings value: performance and cost. There are a lot of benchmarks out there, so I really encourage you to check this out; again, this is not yet another hot topic.

So what are the challenges? Let's come to the meat of the talk. Running Spark on Kubernetes seems like an ideal, perfect idea, but there are challenges: monitoring, scalability, latency, and model training. I'll just give you some pointers here; as I said, I will avoid anything too technical, it's just to give you ideas, the tip of the iceberg. Kubernetes is complex, Spark is complex too; bring them together and you multiply the complexity.

First things first: logs. There are logs, tons of logs, and one of the key problems is that the key information you're looking for is buried under tons of others. So this is our list of a few things. How do we solve them? Of course, there is a rich open source ecosystem designed for monitoring: use something like Fluentd, Fluent Bit, Logstash, and Prometheus. Prometheus helps a lot here, because Spark has built-in Prometheus support. So collect the logs, and change the paradigm: we used to do monitoring just to find out that bad things had happened, but when you change the paradigm, monitoring becomes proactive; it helps you prevent things from going wrong. We also learned a lot through monitoring: how to optimize our infrastructure, and how to find mis-allocation and under-utilization.

Scalability: again, you don't get scalability for free. It comes back to, I'd say, common sense: how do you size your clusters? How do you choose the VMs, the machines, whether you're running on-prem or in the cloud? Dynamic allocation is a concept in Spark, a bit too technical to go into here, but really key. And shuffle data is the data that moves from one worker to another in a distributed system. So for scalability: right-sizing; again, I'm just listing a few points. One of the key things, which I think Amita also mentioned: for training and inference you need GPUs or CPUs, and there are smart ways to ask our nice data scientists not to burn all the GPUs. Even if you are doing deep learning, it doesn't mean you need only a fleet of GPUs; you can have mixed workloads between CPU and GPU, and this is what's really great with Kubernetes: you have affinity rules that let you mix workloads. This is something to take into consideration; it's not always easy, but it's feasible with some effort.

Yeah, I think I'll skip this, but dynamic allocation allows you to do autoscaling: there is the autoscaling concept in Kubernetes, and there is the autoscaling concept in Spark itself. Anyway, this is really challenging, and I guess it's not solved yet; it's out there in the community, at least on the Apache Spark side.
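As a rough illustration of the dynamic allocation idea (not from the talk's slides): enabling it for Spark on Kubernetes can look like this in PySpark. The master URL and container image are placeholders, and on Kubernetes shuffle tracking stands in for the external shuffle service.

```python
# Sketch: Spark dynamic allocation on Kubernetes. Executors scale between the
# configured bounds; shuffle tracking lets idle executors be reclaimed safely.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("route-recommender")
    .master("k8s://https://kubernetes.default.svc:443")                  # placeholder API server
    .config("spark.kubernetes.container.image", "example/spark:3.5.0")  # placeholder image
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "2")
    .config("spark.dynamicAllocation.maxExecutors", "20")
    .getOrCreate()
)
```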
Dynamic allocation, then, is a work in progress, and the main recommendation is: be careful with it. Shuffle data: the question is how to manage this external shuffle data, and a lot of big tech companies are working on it. You can't see the last one here, it's from AWS, sorry. At the least, we can recommend, in an unbiased way: we've tested IBM's Spark S3 shuffle plugin, and it has worked perfectly so far.

Latency: if you think about it, some of us who did computer science a long time ago at least learned the latency numbers every programmer should know. It goes back to things like milliseconds, network hops, and so on. This really matters; it's old school, but if you have a pattern like Kafka as your data source feeding Spark, it comes back to exactly that: you take pen and paper and figure out how you can reduce the latency and where the bottlenecks are, the old-fashioned way. There is also the concept of stateful processing: we need to keep all the state, which adds complexity, but that was solved recently with RocksDB as a state store provider.

ML training: again, I think Amita and Arti mentioned this. Apache Spark itself is a bit lagging behind here, and after all, if a data scientist asks you for things like scikit-learn, those are mainly built for a single node, so you have to think about ways to distribute that kind of work. I can mention Ray; Ray is doing really great, but at least in our opinion it's not mature and versatile enough. Things like Horovod, TensorFlow, and PyTorch work pretty well with Spark and Kubernetes. And there is Spark RAPIDS, open source from NVIDIA, which is really cool for distributing ETL onto GPUs.

There is another key challenge I'll mention quickly: the default scheduler in Kubernetes has issues where you can reach a point of resource starvation for some of your workloads or jobs. The community has been working on two main solutions. The first is in the CNCF: Volcano, which is pretty great; we have been testing it extensively. The other is in the Apache Foundation: YuniKorn. You can test them if you want to embark on this journey; both of them provide what YARN used to give as a resource manager.

Conclusion and takeaways. From our experience, if you want to do real-time machine learning, Spark on Kubernetes is at least the right choice as the building block for doing all the heavy transformation, with native integration; best practices show that you can reach good latency, you get scalability by default, and you get integration with the rich ecosystem. If you remember where we started: Kubeflow, Feast, and the like were already there. So: use the Spark operator if you embark on this journey; design and build your logging and monitoring, I would say before anything else (which may not make complete sense, but keep in mind that you need to do it); keep adhering to the best practices from both worlds; and use the rich ecosystem.
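To make the Kafka-to-Spark pattern and the RocksDB state store mentioned above concrete, here is a minimal Structured Streaming sketch; the broker address, topic, and aggregation are hypothetical.

```python
# Sketch: consume events from Kafka and keep streaming state in RocksDB.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("traffic-events")
    .config(
        "spark.sql.streaming.stateStore.providerClass",
        "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider",
    )
    .getOrCreate()
)

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")  # placeholder broker
    .option("subscribe", "road-events")               # placeholder topic
    .load()
)

# Stateful aggregation: count events per key (e.g. road segment) over the stream.
counts = events.groupBy(F.col("key")).count()
query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```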
And just one note, by the way; I think it was mentioned: contributing to open source doesn't necessarily mean writing Java code. You can write Python, it's better; I'm just kidding. I mean: share your experience, go out there and write down your thoughts, your ideas, your challenges. Contributing to open source is way larger than just writing code and being a contributor to a codebase. And keep cycling. And this, again, I took from Mario (so there's a lot of copyright here): he said, let's collaborate and solve global problems. Thank you very much.

The second one is a really great question; I was expecting this one. There is a saying, at least in French or English: the king is dead, long live the king. When the king has died, you say there is a new king, so long live the king. So Hadoop is not dead. We don't say that Kubernetes is replacing it; you cannot imagine how many big tech companies are still using Hadoop. We're just trying to use the latest approaches, like cloud native and containers, which Hadoop was lacking and is still lacking; but a lot of this rich ecosystem comes from the Hadoop world. So Hadoop is still there, and there are still some interesting use cases. One of the use cases I was describing is solving this kind of problem, real-time machine learning with Spark on Kubernetes, but a lot of companies, tech companies small and large, and banks, use Spark on Kubernetes. It was mainly built to do ETL and ELT: extract, transform, and load. Just extract your data, transform it, and put it in an S3 bucket or a database, whatever. So this is something commonly used. You're welcome. Any other question? Yeah, of course.

So, oh, sorry. Yeah, the question was (thank you) about how to use frameworks like TensorFlow and PyTorch with Apache Spark, and what the point is of combining them. Think of Apache Spark as distributing the work. If you have a model that doesn't fit on your machine or your VM, you want to distribute the training. This is what we do at scale: you cannot just run your model on one machine, so you need to distribute it. Spark has plugins, I would say, that help you distribute the training across your cluster: you have a big cluster with different machines, and Spark helps you use them. For TensorFlow and PyTorch: there is a PyTorch operator, and you can see the TorchDistributor here, is it this one? Yeah. This is something that helps you take PyTorch and distribute it on your cluster, so you don't have to worry about how it works under the hood. That's the usage in a nutshell. Yep.

Yeah, so that's actually something. I'll repeat the question: it's about Ray. Ray is really amazing: Python-based, aimed mainly at data scientists. It's emerging, it's really fast, it's doing really cool things. But again, in my point of view, which is a biased one, it's not versatile enough. We want one tool; I want a tool that can help me do ETL, ELT, streaming, and machine learning in one thing, in one approach.
I don't want to manage multiple of them. And by the way, there is a way to run Ray on Spark. So you can, yeah, but why add the complexity again? This is a question to ask the community, I guess. Thank you very much.

I can see all the cycling is making you very trim and fit. Next up we have Mr. Koji Anura, CTO of UTI, Inc. His topic will be mastering web application configurations: a journey through NGINX, uWSGI, and Flask with knowledge graphs. Okay, I can see a lot of people changing rooms; if you want to take the opportunity to stretch, feel free to do so, because you've been sitting down for a long time. We'll have a minute or two to swap. Okay, so let's put our hands together to welcome Mr. Koji.

Hi, hello. Let's get started. My session title is here: Mastering Web Application Configurations, blah blah blah, with Knowledge Graphs. Okay. Oops. Well, okay. Who am I? My name is Koji Anura. I live in Fukuoka, Japan. I love Apache Hop. Do you know Apache Hop? Yes? Oh, great. Apache Hop is a data orchestration platform, and I'm a founder of the Apache Hop User Group in Japan. I also love the Neo4j graph database, and I am one of the founders of the Neo4j User Group Tokyo; please join us. Okay. I'm studying English, just on Duolingo. What am I studying on Duolingo? I have learned English and Klingon; you know Klingon, the Star Trek language, yeah. For over three years. Okay.

Today's agenda: first, about web apps; second, let's write a conf file; third, one of the good solutions. And for the last part, about knowledge graphs. Who here uses knowledge graphs? Okay, just one. Thank you.

Oh, this is a big problem: WSGI, the Web Server Gateway Interface. What is the correct pronunciation of uWSGI? One: micro-whiskey. Two: u-whiskey. Three: new-whiskey. The first one? No. The second? Oh, okay. The third, okay. I love "micro-whiskey"; yeah, like this, micro-whiskey. In Japan it's called "Uesugi", but micro-whiskey is normal.

So, first: web apps. Who loves conf or ini files? Who loves them? No, no, not loves: hates. Yes, they are easy to set up, but no, they are difficult to set up well, and no, I'm never sure what the best settings are. It's like an RPG: you cannot achieve the goal on the first attempt; you seek hints from the master. Okay.

Python web apps: which framework do you like? Django or something? FastAPI? Oh, good. This time, I use Flask. Please imagine you are a developer of a Flask Python web API, a web app. And please imagine that you are also working on a Ruby project. Do you like the Ruby language? Yes? Okay, and the most famous Ruby web framework is Ruby on Rails. Okay. And, oops, please imagine that you're also working on a Java project. For Java, which framework do you like: Spring or the Play Framework? I chose Spring Boot. Okay. If you have three web apps in different languages, like this, how do users access the website? Python on port 5000, Ruby on 3000, Java on 8080. Okay, so I chose a web server: I chose NGINX, like this, okay. And in the case of Flask, it's common to use uWSGI, okay? Like that, the red circle: uWSGI connects NGINX and Flask. uWSGI has a lot of options, so it is difficult to write the ini file.
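To make the wiring concrete, here is a minimal sketch (not from the talk's slides), assuming a Flask app object named app in app.py and a Unix socket at /tmp/uwsgi.sock:

```nginx
# nginx.conf: proxy requests to uWSGI over a Unix socket (hypothetical path)
location / {
    include uwsgi_params;
    uwsgi_pass unix:/tmp/uwsgi.sock;
}
```

```ini
; uwsgi.ini: minimal uWSGI config for a Flask app (module app.py, object "app")
[uwsgi]
module = app:app
master = true
processes = 4
socket = /tmp/uwsgi.sock
chmod-socket = 660
vacuum = true
```

The ini file is where the many uWSGI options end up, which is exactly why writing it is hard.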
So, the traditional problem-solving methods: one, seeking advice from experts; two, conducting web searches; three, using AI assistants like ChatGPT. Who here uses ChatGPT 4, or 3.5? 3.5? No, 4? Oh, you're a rich person. Okay, let's write an ini file for uWSGI with ChatGPT. Easy: "Teach me how to connect NGINX and Flask." ChatGPT answers: a simple connection. This is the NGINX conf file, like this, with the location block here. Just a simple connection. So I ask the next question: how about using uWSGI? ChatGPT answers: this line is changed to app.sock, and the ini file for uWSGI is here. Have you ever written an ini file? No? Are you familiar with this configuration, this ini file? No? Okay. ChatGPT's answer is a really simple ini file.

Okay, I ask ChatGPT the next question: how do I automatically reload when the app changes? It answers: please add py-autoreload. Yes, that's correct. The virtual environment part is not correct, though: ChatGPT answers "pythonpath = something". But, okay. Oh, it's hot in here. Okay: what options should I add to my uWSGI setup? I ask ChatGPT, and it answers: add the socket-backlog option. But that's not correct: there is no socket-backlog option. This is a hallucination.

Okay, one of the good solutions. How can one resolve hallucinations? Waiting for the next version, GPT 4.5 or 5.0? No, no. One of the solutions is this: LLM plus knowledge graph. A knowledge graph looks like this: a dog is an animal; an animal is a living thing; a cow is an animal; cows eat herbs. That's a knowledge graph. So I created a knowledge graph. This is an example knowledge graph of movies and persons: The Matrix, and which person? You can easily search it in the Cypher language; it's not SQL. The Matrix movies and persons are here: Keanu Reeves and The Matrix 3. And this is the graph. So I put the uWSGI options into a graph here, organized by category and type; there are 125 types and 1,254 options, a lot of options. You can add a SAME_AS relationship for an option with a different name but the same meaning, and you can add a CHECK_IT relationship: if you set disable-logging, please check the log option here. It's easy to set this up in the graph world. And this solution is the GenAI apps approach from Neo4j; it's easy to write here: MATCH... Okay. This Neo4j setup works with three different AIs. Okay, conclusion, really fast: the advancement of AI is remarkable, but a knowledge graph is particularly effective for representing facts. Okay.

Next, we would like to welcome Ms. Zhang Lirong and Chow Miao from Mercari, who will be speaking on opportunities to address complex challenges in e-commerce search through LLMs.

Hi everyone. Thank you for joining our talk today. We are thrilled to share with you the topic of opportunities to address complex challenges in e-commerce search through LLMs. This is our agenda for today: we will cover our LLM-based approaches, from the construction of an offline evaluation dataset, to innovation in synonym generation, and our pioneering work on image quality assessment for a satisfying user experience. A brief introduction to our company: Mercari is Japan's top flea market application for selling and buying everything. Our monthly active users number over 23 million, and over 3 billion items have been listed in total.
So making sure users can find exactly what they want isn't just nice to have; it's essential at Mercari. For this purpose we are addressing the limitations of what search can do in online shopping by leveraging the extraordinary potential of large language models. I'm Lirong, and I'm here with my team member, Miao. We are both machine learning engineers focused on making search results more relevant to users at Mercari. I will kick things off with the first two topics today, and then Miao will take over to share our journey on image scoring.

So let's head into our first topic: revolutionizing the construction of the offline evaluation dataset with the help of large language models. As many of you may know, there are two critical stages in the search domain, re-ranking and retrieval, and evaluation for each stage has distinct purposes and requirements. For example, in re-ranking the task is to re-order a set of given items: we want clicked items to be ranked higher and non-clicked items to be ranked lower, so the positive and negative signals are both clear. We consider items that got user interactions as relevant, and those that didn't as irrelevant. It's also relatively straightforward to evaluate re-ranked items properly: can anyone tell me which one performs better, the first or the second? Obviously, the second ranking model ranked the relevant items higher than the first one did.

In contrast, the retrieval task aims to improve both recall and precision: it should provide as many relevant items as possible, including previously unseen items, while minimizing the inclusion of irrelevant items. The biggest challenge here, as you may appreciate, is how we evaluate on unseen items. This is particularly complicated in second-hand flea markets like ours, because the inventory is constantly updating: the items in our marketplace are constantly refreshed, and our search systems are designed to prioritize newly listed items to encourage new sellers. So on a real search result page, only a few items have historical information to help us judge relevancy, and the large majority of items are newly shown, without enough historical information to judge relevancy. As you can see, with two retrieval systems it's hard to tell which performs better, because those unseen items could be relevant or irrelevant to users. This introduces a paradox: we want to ascertain an item's relevancy before showing it to users, yet relevance judgment inherently requires showing the item to someone.

Common practice for constructing an offline evaluation dataset is to rely on search logs or to employ human annotators. But methods relying on search logs face the unseen-item issue, and employing human annotators brings high cost and even reliability issues. For example, one of my team members faced a situation where the crowd workers he hired were relying on machine learning models to do the annotation. Besides that, there is perception bias from person to person, and the complexity of providing annotation guidance to the annotators.
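As a side note, the "which ranking performs better" comparison described above is typically quantified with a metric such as NDCG; here is a tiny self-contained sketch (the click labels are hypothetical):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain for a ranked list of relevance labels."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """Normalized DCG: 1.0 means all relevant (clicked) items are on top."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# 1 = clicked (relevant), 0 = not clicked; two re-rankings of the same items.
ranking_a = [0, 1, 0, 1]   # relevant items ranked low
ranking_b = [1, 1, 0, 0]   # relevant items ranked high, so the better model
print(ndcg(ranking_a), ndcg(ranking_b))  # ranking_b scores higher
```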
With the emergence of large language models, we have new opportunities for constructing an offline retrieval dataset. Our approach is inspired by some pioneering studies; one of them is the Microsoft paper "Large language models can accurately predict searcher preferences", which indicates that LLM annotators can outperform human annotators on annotation tasks in terms of cost efficiency and accuracy. Based on this, we developed a system to retrieve potentially relevant and irrelevant items from our search logs and from semantic search. Items provided by semantic search are retrieved by semantic similarity between the query and the item information, so they can differ from lexical search results; these give us an unseen-item dataset. We then applied GPT-4 annotation to those unseen items, based on the user's query and the item information we provided. Those unseen items are valuable for later evaluating a retrieval system that improves recall.

This is what our annotation prompt looks like. At the top of the prompt, we first provide the definition of each relevancy label: exact, substitute, complement, and irrelevant. These are followed by several examples built from real items in our marketplace, which helps the GPT model better understand the real situations in our marketplace. Then, based on the user's input query and the item information, the GPT-4 model generates the relevancy label. The definitions of the relevancy labels come from the ESCI dataset. We later also conducted an evaluation of the accuracy of the annotation prompt, using Amazon's ESCI dataset. Our findings indicate high accuracy across most of the labels, except for the substitute label, an exception that shows the limitation of large language models on complex judgment cases, the kind of situations that even human annotators find challenging.

I would also like to share some real examples from the GPT-4 annotation. For the query "Amazon Fire TV Stick", the GPT-4 model considered the Fire TV Cube, Apple TV, and a PS2 device as irrelevant items; the remote control for the Fire TV Stick as a complement item; and a set with an Echo Show device as a substitute item for this query. So we can see high accuracy from the GPT-4 output, and we were encouraged to further explore the capacities of the GPT-4 model on annotation tasks.

Okay, let's head to our second topic: using LLMs for synonym generation. Synonym expansion is used to solve a customer pain point: the disconnect between the user's keyword and the item they want to find. This hurdle is significantly heightened in a Japanese flea market, mainly because of the lexical gap between foreign languages and Japanese. For example, here a user wants to search for "Louis Vuitton", and an item listed in Japanese katakana as "ヴィトン" (Viton) will not be matched. To address this gap, synonym expansion stands out as one of the most effective solutions: by expanding the synonyms of the user's query, whichever keyword the user chooses, the katakana form or the English "Louis Vuitton", they can all be matched to the correct items.
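As an aside, the expansion step itself can be as simple as looking up registered variants at query time; here is a toy sketch with a hypothetical dictionary:

```python
# Toy illustration of query-time synonym expansion; the dictionary is hypothetical.
SYNONYMS = {
    "louis vuitton": ["ヴィトン", "ルイヴィトン"],  # katakana variants
}

def expand_query(query: str) -> list[str]:
    """Return the original query plus any registered synonyms."""
    return [query] + SYNONYMS.get(query.lower(), [])

print(expand_query("Louis Vuitton"))
# A search backend would then match items listed under any of these variants.
```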
At Mercari, we also tried LLM-based approaches to improve our synonym generation process. In the prompting phase, we provide the item titles, and the GPT-4 model gives us output pairs of source keyword and target synonym. This is what our first version of the generation prompt looks like: at the top, we give a simple instruction on how to generate synonyms, along with a definition of what a synonym is, followed by several examples that make the model understand our expectations for the generated synonyms. Our evaluation outcomes affirm that the GPT-based method outperforms all the other unsupervised methods we previously used, an improvement in precision as well as recall. Let's see a concrete example with the keyword "syn-free": significant improvement can be observed in both scope and precision after applying the GPT-4 model to this keyword. We also measured a significant improvement through our online A/B test.

Encouraged by the promising outcome the GPT-4-generated synonyms delivered to our business metrics, we further explored the capacity of the GPT model with the aim of improving the quality of the generated synonyms. In our second try, we added a layer of filtering by introducing abuse prevention rules into the process. This is what our second version of the prompt looks like: inside the prompt, we define 11 abuse rules, with the goal of making the model mindful of potential misuses or inappropriate associations that could occur during synonym generation. This approach requires the model to perform a dual function: first, to continue its original task of generating high-quality, relevant synonym pairs; and second, to evaluate the generated synonym pairs against the previously defined rules.

After the second iteration, we got higher-quality generated synonyms, but we still found the LLM struggled to generate domain-specific synonyms. For example, it's hard to get the model to generate "bag" for the word "pouch", and such words can be crucial for our search system. So in our third go, to further help the GPT-4 model generate exact synonyms, we used Google search, for example to check whether "pouch" has the same meaning as "bag". We let the model conduct a Google search ("what is a pouch") and then summarize the top results in the prompt; the model can then think it over to come up with the synonyms. Here is an example with the keyword "styling gear": Google says styling gear is a silver accessory brand established in the year 2000, and based on this understanding, the GPT-4 model considered the synonym for "styling gear" valid, but considered "bar accessory brand" an invalid synonym.

Later, we also conducted an evaluation of the quality of the generated synonyms. The negative samples were collected from our first and second iterations, mainly based on human checking and on customer complaints. In our latest experiment, the new dictionary covers only 0.93 percent of those negative samples, while in our very first synonym dictionary, from the unsupervised model, 3.31 percent of entries were found to be false positives. To summarize: we reached better performance and more mature synonym generation with each iteration, and we gained higher-quality synonyms compared to our traditional method.
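A minimal sketch of that generate-then-check dual function, assuming an OpenAI-style chat API; the client, model name, and rule text here are illustrative, not Mercari's actual prompt:

```python
# Sketch of the two-step "generate, then self-check against abuse rules" prompt.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RULES = """1. Do not pair a brand with a competitor brand.
2. Do not pair a generic word with one specific product line."""  # the talk describes 11 rules

def generate_synonyms(item_title: str) -> str:
    prompt = (
        "From the item title below, output a source keyword and a target synonym pair. "
        "Then check the pair against these rules and drop it if any rule is violated.\n"
        f"Rules:\n{RULES}\n\nTitle: {item_title}"
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(generate_synonyms("ルイヴィトン ハンドバッグ"))  # hypothetical item title
```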
And I will hand over to Miao to introduce our image quality project.

Okay, thank you, Lirong. I will try to finish this part in five minutes. Let me first share the background of this project. In our Mercari app, only the price and images are displayed in the search results, and we believe the price and image can affect user clicks, apart from search relevance. For example, if an item has a better image, it may attract more clicks from users. Based on this assumption, we conducted an experiment to take image quality into account in search ranking.

This is the overview of the project. First, for training, we build a training dataset with image pairs from the same search result page and use feedback learning to train a model. We deploy this model at the indexing stage by adding an image score to each item and indexing the extra image-score field into the Elasticsearch engine. At the retrieval stage, we use each item's image score as a factor in the Elasticsearch score; since the original ranking from Elasticsearch is based on this score, the image score can affect the ranking.

Today's topic is mainly the data generation, so let me introduce how we process the data. We use the search logs from user queries to get similar items from the same search results, filter these items by price so we get similar items at similar prices, then filter them by position to get a limited batch, and we send this batch to the GPT-4 Vision API to get an image score. Here is how we designed our prompt: we were inspired by X-IQE, a recent research work on image quality evaluation, and we leveraged its aesthetic-evaluation part because it aligns with our objective. This is the prompt we use: we leverage a chain-of-thought strategy to encourage the LLM to first provide a task-specific analysis before assigning a score. This method not only yields a more accurate score but also provides explainable information. We evaluated the LLM annotation by comparing it with a pretrained model for the image quality evaluation task, and the results show that the LLM annotation has a distribution that better reflects the real-world scenario and also demonstrates relevance to our user clicks.

Here are some visualized examples of low-quality images as judged by GPT-4: you can see dark images, or blank images with only text information. And here are high-quality images as judged by GPT-4. We also checked the image-score model trained on our LLM-annotated dataset, and the results show a good correlation between image quality and the predicted score. So here is a summary of our learnings from these projects: we think the LLM labeling method is a good solution for quick iteration and experimentation. I hope this information is helpful for your tasks. That's our sharing, and we are hiring. Thank you.

Thank you so much. In case you want to learn more, they have a booth downstairs, so feel free to go and approach them. Where is the next speaker? Ah, okay. Cool. Have you tried it already? All right, cool. There are a lot of people coming in to listen to your talk. Oh, you'll do it now? Because that's the coffee break. Okay, thank you.

This is the first time I have come to FOSSASIA. Thank you. Okay, today I will talk about Gemma, a new open source language model provided by Google. Let me introduce myself. My name is John.
I'm a Google Developer Expert in machine learning from Indonesia. I'm also a Google Developer Group organizer, and I'm a lecturer and the director of the research and innovation center at my university, iSTTS, in Indonesia. My research is in natural language processing, machine learning, and knowledge engineering.

Before we start learning about Gemma, let's talk about what an LLM is. In the era of generative AI, LLMs play a central role. An LLM is a model trained on a lot of data that can capture semantic and statistical patterns from words, so it can be used to generate text or produce creative content. Okay, let's look at an example: a simple example of an LLM is autocomplete, filling in the blank in a sentence. Given "Look at that ___", the LLM can say "it's a dog". It can also do reasoning: let's say Paris is to France as Tokyo is to Japan. But not everything is a great surprise for an LLM. The bigger surprise is that if we train our model on poems, it will also generate poetic text to complete a poem; and when we train it on code, the LLM can also provide code completion.

Now let's move to modern LLMs. All modern LLMs are large. Why large? Because of the amount of knowledge that has been trained into the LLM. Look at this example from one of the LLMs, Gemini from Google: if we ask it to explain a joke, it can give us the detailed meaning of the joke we put in the prompt. Second, an LLM can also give us suggestions, a creative way to help us get better ideas. For example, I asked Gemini to give me five neat ideas for a science fair project, and the model gave us the examples here. Due to the popularity of LLMs, especially in the generative AI era, a lot of research has been done on LLMs. In particular, Google released its latest model, Gemini, last December, and last February came Gemma, the latest open model provided by Google.
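As a quick illustration of this kind of completion with an open model, assuming the Hugging Face transformers library and the gated google/gemma-2b-it checkpoint (which requires accepting Google's license on Hugging Face):

```python
# Minimal sketch: local text generation with an open model via transformers.
# Assumes: pip install transformers torch, and access to the gated checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="google/gemma-2b-it")
prompt = "Paris is to France as Tokyo is to"
output = generator(prompt, max_new_tokens=20)
print(output[0]["generated_text"])
```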
Google has a long history with open models and their ecosystem. Starting in 2013, we all know the word2vec model, which was very popular in the field of natural language. Then in 2017 Google released the Transformer, the state-of-the-art architecture still used today for building applications. In 2018 Google released BERT, in 2019 T5, in 2022 T5X, and the latest open model released by Google is Gemma, last February in 2024.

So let's look at what Gemma is. Gemma is a lightweight, state-of-the-art model, and, surprisingly, it shares the recipe of Gemini: Gemma was built using the same technology as Gemini. Gemma comes in two sizes released to the public, 2 billion parameters and 7 billion parameters, and it has a long context of about 8K tokens. There are two versions of each: the base version and the instruction-tuned version. The two sizes have different intended uses: the 7B targets GPUs and TPUs, while the 2B targets CPUs and on-device machine learning.

What is special about the Gemma architecture? Gemma is built on the transformer decoder architecture and is trained with a long context: 8,192 tokens can be provided as input. For efficiency, the 7B model uses multi-head attention, but the 2B uses multi-query attention; this is the table of parameters contained in Gemma. Gemma also does not use absolute position embeddings; it uses rotary position embeddings for more efficiency, and it shares the embeddings between the input and the output to make the model compact. They also replace the standard activation function with GeGLU, and each transformer sub-layer is normalized using RMSNorm. Here are the embedding parameters of the 2B and 7B models as a comparison.

Okay, what are the benefits of Gemma being open? We have a large community: Google has provided it on Hugging Face and on Kaggle, and a lot of people are using Gemma right now. It lowers the barrier for researchers and developers to collaborate and develop more advanced models on top of Gemma, and it drives faster innovation and collaboration between industry and academia. Since it is an open model, we can examine it and contribute to making it better, and we can build trust in the model, since we can look at the code. Because Gemma is an open language model, we can also tailor and customize it to what we need in our company or our research, and explore more innovation based on this LLM.

Now, what is Gemma best at? Gemma has strong results on academic benchmarks and in human evaluation. With the shared Gemini technology it gets the best performance on mathematics and coding benchmarks, and on reasoning, compared with the other open models. It is widely accessible to developers: you can run it on your own laptop, and it can also be used in a cloud environment, especially the Google Cloud environment using Vertex AI. As for efficiency, Gemma doesn't need a lot of resources to run; we can use our own laptop to run Gemma as our personal assistant, if you want to deploy it as a chatbot model on your laptop.
We also get safety and responsibility according to Google's strict AI principles, to ensure the safe use of this model. Still, when we develop with it, we must adjust our model again: even though Google already took care when creating the model, applying data filtering and reinforcement learning from human feedback during training, we still need to adapt it, because we cannot cover every possibility that arises when we use Gemma. This is the benchmark table comparing Gemma with Llama; we can see that Gemma surpasses Llama across these benchmarks.

Building trust with Gemma: how safe is it? Google prioritized its AI principles in Gemma's development to reduce unwanted or unsafe behavior, by applying data filtering and a lot of training; human evaluation has also been done on Gemma, and you can look at the details in the Gemma report. The data filtering includes removing personal information and sensitive content from the training data, and they used human feedback to evaluate the model during training to ensure that negative or unwanted behavior is minimized. There are also risk assessments, and Google has released a Responsible Generative AI Toolkit to support developers in building AI capabilities. You can look at the details in the model card in the documentation; a lot of tests have already been conducted on Gemma.

Now for access: as I mentioned, Gemma has been released on Hugging Face and Kaggle. We can use many popular frameworks, like Keras, PyTorch, JAX, and Hugging Face, to access Gemma and build models for our needs. It can run on laptops, desktops, and mobile devices, and even on IoT devices for on-device ML, because of Gemma's efficiency. Google also partnered with NVIDIA when creating Gemma, and it is already optimized for NVIDIA hardware, from data-center hardware down to local machines, so it runs smoothly in all environments. And if you have a Google Cloud environment, you can use Gemma in Vertex AI seamlessly: you can use the Vertex AI API to deploy, modify, and customize the Gemma model.

Let's look at the example cases I have worked on. The first one: I wanted to build a question-answering system with Gemma using only instructions to the model, with no fine-tuning at all. First I wrote a prompt design to instruct my question-answering model to behave the way I want, then I provided some context, and then we can ask a question that Gemma will answer based on the context we give. We used the Gemma 7B instruction model and its tokenizer. Gemma was created primarily for English, but we tried to push Gemma to another language: I tried using Gemma in Indonesian, the language of my hometown. This is the prompt for the instruction: "You are a chatbot named Talon Tech. You are a chatbot for my campus, my university, iSTTS. Answer only based on this context; if a question is not covered by this context, please don't answer it." And this is the context I gave: information about my campus.
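As a rough illustration of this no-fine-tuning setup, here is a minimal sketch using the instruction-tuned Gemma 7B through Hugging Face transformers. The model id and API calls are real; the system wording and context are placeholders, not the speaker's actual prompt.

```python
# Minimal sketch of context-grounded QA with the Gemma 7B instruction model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

context = "..."  # paste the campus information here
question = "What is this university?"
prompt = (
    "You are a campus chatbot. Answer only from the context below; "
    "if the answer is not in the context, say you cannot answer.\n\n"
    f"Context: {context}\n\nQuestion: {question}"
)

# Gemma's chat template wraps the prompt in <start_of_turn>/<end_of_turn> markers.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True, return_tensors="pt",
).to(model.device)
output = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The refusal behavior he demonstrates ("I'm sorry, I cannot answer that") comes entirely from the instruction in the prompt, not from any training.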
Here are the results. The first one: we say hello, and it answers that it is a chatbot from iSTTS. But when we ask Gemma about another task not related to my university's information, Gemma answers, "I'm sorry, I cannot answer that": it already respects the context. And when we ask about my university, it can answer and give detailed information about it.

The second case: I also did fine-tuning with my team and Patrick. We fine-tuned Gemma using ordinary laptops with standard GPUs, in a dialogue/instruction style, formatting the prompts into templates like this: a user turn and a model turn, question and answer. First we create the context, the information; second, we create question-and-answer pairs based on that context; and third, we format them into the Gemma template so the model can be trained. We fine-tuned Gemma using LoRA and the SFT trainer. This is an example of the contexts I created: "where is my website," the website of my company, and this is the second context; we create a list, and then we format question-and-answer pairs for each context. For each context, say the website of my company, I created two example questions with answers showing how Gemma should respond based on that context. After we create the question-and-answer pairs, we format them into the Gemma template: first the start-of-turn marker for the user with what the user says, then the end-of-turn marker, then a start-of-turn for the model with what it should answer. This template is what Gemma uses to learn during fine-tuning. This is the example for context 0, where I asked about my website: this is the instruction I wrote, and this is what it becomes in the Gemma chat template, with the user instruction and question and the model's answer.

Surprisingly, we used the quantized version of Gemma: only 4-bit quantization with the Gemma 7B instruction version, with LoRA and the SFT trainer. Before fine-tuning, the information it gave was completely wrong; after fine-tuning, it gives good responses based on the context we provided. It's still not 100% correct, because of the small amount of data I provided for this experiment, so the result is not great; but if you have a lot of data, you can get a much better result.
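Here is a rough sketch of the fine-tuning recipe described above: 4-bit quantized Gemma 7B-it, LoRA adapters via peft, and TRL's SFTTrainer on data already formatted with Gemma's chat template. The hyperparameters and dataset are illustrative, and SFTTrainer's argument names have varied across trl versions.

```python
# Schematic QLoRA fine-tuning of Gemma, assuming an older trl API where
# dataset_text_field/max_seq_length are passed directly to SFTTrainer.
import torch
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTTrainer

model_id = "google/gemma-7b-it"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)

# Each example is one full turn pair in Gemma's template.
data = Dataset.from_list([{
    "text": "<start_of_turn>user\nWhere is my company website?<end_of_turn>\n"
            "<start_of_turn>model\nYou can find it at https://example.com<end_of_turn>"
}])

lora = LoraConfig(r=8, lora_alpha=16,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
trainer = SFTTrainer(
    model=model,
    train_dataset=data,
    peft_config=lora,
    dataset_text_field="text",
    max_seq_length=512,
)
trainer.train()
```

With only a couple of hundred question-answer pairs, as in his experiment, this will memorize the contexts rather than generalize, which matches the partially-correct results he reports.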
The third case: I tried to add a modality to Gemma. Gemini is multimodal, but Gemma is not, so I tried to add a modality myself: I wanted Gemma to do show-and-tell from an image. I used the same architecture as MiniGPT-4 with the Gemma 2B instruction model. We use a vision model to create tokens from our image features: this is the image embedding. We put an image into a CLIP vision model, pass the result as image tokens along with the prompt to Gemma, and fine-tune it to generate a description of the image. We use a linear model for the input representation and CLIP ViT for the image embedding, trained on the Flickr8k dataset. Surprisingly, the result is, I think, very good.

This is the image I used, from this URL: I just asked for a short poem from it, and this is the result. For another one, we put in a cute dog photo and asked it to write a poem like this. We also pushed Gemma for a recipe: I wanted to create a recipe from an image, and we got the recipe. And the last thing, I wanted to create a song from an image, and we got the song. We used only one single GPU for all of this; that's the efficiency of Gemma, and that's how powerful Gemma is in supporting open-source language model development. That will be all for my talk, and if you want to learn more about Gemma, please look at these resources for a more detailed explanation. Thank you; if you want to discuss with me, you can find me.

Okay, you asked about Gemma being wrong; it was controversial recently, this has been spread around. My answer is that Gemma is not a multimodal language model; it is a text-only language model. When I extended Gemma by adding a modality here, it may give wrong information, because we cannot guarantee Gemma's performance there: Gemma was not created for multimodality. In my example I pushed Gemma's capability by adding multimodality, and surprisingly I got good results, but if we provide another kind of input, it may give a wrong description. So yes, we need to tune our language model and test several cases to prevent the model from giving wrong answers. Yes, especially when we do not provide a lot of training data: if you only provide a small dataset like in my example (I used only about 100 contexts, and maybe 200 question-answer pairs), that's not enough to get good results. If you give the model a lot of training data for fine-tuning with this template, it will give more accurate information, and you need a lot of time to train it again.
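For readers curious about the third case, here is a schematic of the MiniGPT-4-style setup he describes: freeze a CLIP ViT encoder and the Gemma LM, and train only a projection that maps CLIP image features into the LM's token-embedding space, so image "tokens" can be prepended to the text prompt. The dimensions and token count are assumptions, not his actual configuration.

```python
# Schematic image-to-token projection, MiniGPT-4 style. Only `proj` would be
# trained; CLIP and the language model stay frozen.
import torch
import torch.nn as nn

class ImageToTokens(nn.Module):
    def __init__(self, clip_dim=768, lm_dim=2048, n_image_tokens=8):
        super().__init__()
        # One linear layer producing n_image_tokens pseudo-token embeddings.
        self.proj = nn.Linear(clip_dim, lm_dim * n_image_tokens)
        self.n, self.d = n_image_tokens, lm_dim

    def forward(self, clip_features):          # (batch, clip_dim)
        x = self.proj(clip_features)           # (batch, n * d)
        return x.view(-1, self.n, self.d)      # (batch, n, d) pseudo-tokens

# Training would prepend these pseudo-tokens to the embedded caption prompt
# and optimize self.proj with a captioning loss on a dataset like Flickr8k.
image_tokens = ImageToTokens()(torch.randn(1, 768))
print(image_tokens.shape)  # torch.Size([1, 8, 2048])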
Okay, I guess that concludes this afternoon session; I think there's a coffee break now, and I believe we convene here at 4 p.m. to continue our AI track. Very interesting: I think it's a deep dive into the Stable Diffusion model, by Adobe. So I'll see you back here. Thank you.

Good evening, ladies and gentlemen. My name is Ishan Dutta, and today I'm not here to give a talk; I'm just here to perform a bunch of magic tricks. I'm sorry, organizers, I'm just a magician who was looking for an international stage. You have all been pranked, and this is not an AI talk. Or is it? Am I telling the truth or not? To find that out, we have to go back a couple of years and listen to a story. Four and a half years ago, I was in my sophomore year of college, the second year, and I got a call from someone at a very high rank at Amazon US. That was my day zero in AI. What she explained to me was the kind of algorithms and the kind of problems they were solving at Amazon, at the very core level, with the help of AI. Throughout that two-hour conversation, I was left with one question: is this magic? You're telling me that I give you a bunch of data, you have something called a model, and once that model is trained you can predict things, even things in the future? If that is not magic, what is? For the last four and a half years, I have been on a quest to find the answers: what is this magic, what are the tricks behind it, and who are these magicians?

A brief introduction about myself. As of today, I work as a Machine Learning Engineer 2 in the core generative AI team at Adobe India. Along with that, I have freelance contracts with two very popular startups: Lightning AI, and a vector database startup called LanceDB. I also represent Jarvis Labs AI as a global brand ambassador. Previously I have worked with the likes of NVIDIA and Weights & Biases, as well as research institutes like the Indian Institute of Technology Kanpur. I'm also a Kaggle Notebooks Master, top 17 in the world, and I have a bunch of top ranks in multiple hackathons. If you want to follow me on my socials, feel free to scan the QR code.

Introducing the magician: Diffusion. Think of this example: who is a magician, and what is a magic trick? A magician takes a number of props, learns how to use those props in the best way possible, iterates over the trick many times, and delivers what is called a performance. If you relate that to the world of AI, the props are the dataset you provide to the model; the model is the magician, iterating over this data many times until it understands what to do with it; and the output the model provides is the performance you see. Before we look at the tricks the magician Diffusion uses to produce such beautiful outputs, let's see what those outputs are: what can diffusion models actually do? I have two examples here. The first is the very popular text-to-image generation; we have a bunch of open-source models now, and they are doing really, really well. These images are from DALL·E, by the way. A second very popular use case came out very recently from OpenAI, called Sora; this is one frame generated by that text-to-video model. Now, these tricks are what make the magic work. You can think of every algorithm as a magician: every algorithm has a bunch of particular concepts it uses, which make it unique.
Those concepts are what allow it to produce a particular output. Today, in this talk, I am going to focus on three important tricks, three important concepts behind diffusion models, that make them work and produce the kind of outputs you just saw. Before we get into the tricks, I will give you a very brief overview of diffusion models, what they are and how they are trained; then we will dive a lot deeper, to keep the talk the "deep dive into diffusion models" it was originally billed as.

Think of diffusion models this way. I give you a perfectly well-painted picture of a very beautiful view of Hanoi, and I give you a task: you have 10 seconds, and every second you pick a couple of colors and put a couple of brush strokes on that painting, randomly. Do that for 10 seconds, and at the end you will have a completely random canvas with random colors, because you've been putting random colors all over it. In the first image you see here, you have a perfectly clear image; for now, call that the image at the 0th second. At the first second I add a bit of noise to it; at the second second, again a bit of noise; and we continue through t time steps. For this example, say we run it for 10 seconds; at the 10th second we have something that looks completely like noise. In the painting analogy, you have just random colors on the canvas and cannot make out what it is. In the reverse process, what we have to do is: given this noisy image, figure out a way to come back to the good image. It's as if you have a canvas of random colors and you have to undo the entire process, going from random colors on a canvas back to a perfectly good-looking picture. So it's very straightforward: take an image, add noise to it progressively, and then, in the reverse process, the name itself says it, you reverse the entire thing.

Now a bunch of questions will come to mind. Before that, let me give you a bit of mathematical notation; it will help with the formulas coming up. If you're scared of math, don't worry, I will try my best to keep things as simple as possible. Diffusion models have, I believe, the most math of any AI algorithm: I have over 200 pages of handwritten derived equations just to complete this presentation. I will show you some of those derivations, but not all of that is possible in 25 minutes. The notation is very basic: we represent images with the variable x. Since this is the first image, and we are computer science engineers, we index it 0 and not 1, so x_0 is the first image we have. We go for t time steps; in our example, since we start at 0, capital T will be 9. We go from the 0th second to the 9th second, and at the 9th second we have complete noise. In the reverse process, we again start from the completely noisy image at the 9th second and go back to the original image. That is the whole thing we are doing.

Now, the tricks behind this. There are a bunch of them, and the first question that should come to your mind is: Ishan, I'm adding noise gradually, right? Why not just pick an image, add all the noise to it at once, and then learn a mapping that generates the image back from that noise?
Why go in steps? Why not learn one mathematical function, complex enough to convert any noise into an image? That question of going step by step is the first trick. The second trick is called the noise schedule, which we will come back to, and the third is called the reparameterization trick; we'll get to that later as well. Most of these tricks are relevant to the forward process, and to the reverse process too.

To understand why we take steps, think of this example. Say you've gone on vacation; for me this is also kind of a vacation, my first time in Vietnam, and I'm staying in a hotel. I look for a couple of restaurants nearby to get some good Vietnamese cuisine (some lovely food here, thanks a lot for that), and I find a restaurant on the other side of the street. Now, there are a bunch of ways to reach that restaurant. The first is that I imagine I can somehow reach it directly from where I'm staying; these are all roads, by the way. Is the path I've drawn actually possible? You can just nod your heads if you think it is not. Think about it for a second; this talk is also here to develop your intuition for why these algorithms work. I've seen a lot of blogs and a lot of YouTube videos on this, and very few people try to give the intuition behind why things actually work instead of just putting out the formulas. This path is not possible, and think about what makes you say that: it's your brain telling you it can find no way to calculate a route that takes you directly from that road to here, because there are corners; you cannot jump from here to here. Your brain actually calculated this for you and told you it is not possible. If I change the path a little and give you this one instead, you are here and I give you step-by-step directions: go straight, take a left, go straight again, take a left, and go straight again. Is this possible or not? It is. Why? Because I gave you smaller steps; your brain is now able to calculate it: I'm standing here, I have to go there, then take a left, then a right. It is more calculable by our brain. In machine learning and statistics, we call this tractability.

Tractability means something you can actually calculate with your machines. The last example was intractable: we cannot calculate a way. This one is tractable, because we go in small steps. The same applies to our processes, and that is the main reason we do it: if I turn a perfectly good image into complete noise and try to go back directly, in just one time step, that is an intractable process; there is no way to calculate something capable of doing that. That is the reason we take steps, and that is the first trick the magician Diffusion uses.

The second trick is called the noise schedule. When I tell you that I add noise at each step, you might ask: but how much noise am I adding? In the painting example, if you have a perfectly good painting and I give you one brush and a bunch of random colors, how many brush strokes am I allowed to paint in every time step? Is it 5? Can I do 10 brush strokes? Can I just throw a bucket of paint at the picture? There has to be a way to make this actually calculable.
This comes from the paper "Denoising Diffusion Probabilistic Models." The terms sound heavy, but broken down they are very, very simple: it is "denoising" because you are basically removing noise, and "diffusion" we will come to later. The first method the authors came up with increases things linearly: say for the first time step you are allowed 2 brush strokes, for the second 4 brush strokes, and so on up to 20. Your graph looks something like this: linear, increasing linearly. But there is a very big problem with that. If you look at this image clearly, you have a dog here, a very cute-looking dog, and by the time we are close to 40-50% of the way through, it already looks like complete noise. Is there any point in going from here to here? No. It's like taking a Grab taxi: you have already reached your destination, but the driver wants to increase the fare, so he just drives you round and round and round; the fare increases, but the travel isn't actually taking you anywhere. Because of that problem, the authors at OpenAI came up with a different paper, very creatively titled "Improved Denoising Diffusion Probabilistic Models," where they used a cosine function instead. A cosine function increases the noise more gradually: if you compare picture by picture, you can see the dog stays a little clearer here than in the picture before, and we still go completely to noise at the end. This makes the model's step-by-step thinking even better. The major advantage of this was: the original authors had to take 1,000 time steps to reach that noise, and with this method you can do it in just 50 time steps. That's an amazing efficiency gain. So noise schedules are the second interesting trick that diffusion models use. Remember that we represent the noise schedule with the term beta; that will come up again shortly.
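As a small sketch (not the papers' code), here are the two schedules side by side: the linear beta schedule from DDPM, and the cosine schedule from "Improved Denoising Diffusion Probabilistic Models," which derives the betas from a cosine-shaped cumulative signal level. The default constants follow the published papers.

```python
# Linear vs. cosine noise schedules; the cosine schedule destroys
# information more gradually, so the image stays recognizable longer.
import numpy as np

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    return np.linspace(beta_start, beta_end, T)

def cosine_betas(T, s=0.008):
    # alpha_bar(t) = cos^2(((t/T + s) / (1 + s)) * pi/2); betas are derived
    # from the ratio of consecutive alpha_bar values.
    t = np.arange(T + 1)
    f = np.cos(((t / T + s) / (1 + s)) * np.pi / 2) ** 2
    alpha_bar = f / f[0]
    betas = 1 - alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(betas, 0, 0.999)

T = 1000
print(linear_betas(T)[:3])
print(cosine_betas(T)[:3])
```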
Now, to understand the third and extremely important trick, called the reparameterization trick, we have to understand the math behind diffusion models a little more. I know everyone might not have the same math background, so the formulas will be there, but I will explain them in a way you can understand what each formula means, instead of going line by line. The forward process was this: you just add noise step by step until you get a completely noisy image. At every time step, essentially, I have an image, the process adds some noise to it and gives me another image, and that output becomes the input for the next time step, and so on until the end. It can be written this way: let's say t = 3, so t − 1 = 2. I give the process the output at 2 seconds, whatever image I have after adding noise at 2 seconds, and the forward process, say q, should give me x_3: I give the previous image, I get the current image. The kind of noise we add is called Gaussian noise, a very interesting kind of noise; I will show you the graph and why exactly we are adding it.

To understand the notation: it looks big, but it is extremely simple, and the authors have made a lot of things constant so you don't have to worry about them. The stylish N is the actual symbol for a Gaussian distribution, and x_t is the output: if I am at t = 3 seconds, I am getting the output at t = 3 from the picture I had at t = 2. Inside the Gaussian there are two things, called the mean and the variance. To understand mean and variance, say the average age of people in Vietnam is 33 years old. What is the meaning of average, or mean? Think of a number line from 0 to 100 with divisions of one year each: 0, 1, 2, 3, 4. When I say the average age of people in Vietnam is 33, it means that, in general, across the entire country (since this is a college, the number in this room might differ), if I ask someone, there is a very high chance their age lies somewhere very close to 33. If you imagine that number line, you will see a lot of points concentrating close to 33. That is the mean of a distribution. Everything in the world is a pattern: your age, where you come from, what you are learning, what you are reading, even your daily schedule is a pattern; and machine learning is all about recognizing those patterns. Variance is basically how far the points spread: on that same number line, with the mean at 33, most points lie near it, but how far do they lie? That gives you the variance. If the variance is very small, most points lie close to the mean; if it is very high, most points lie very far from it. I'll show you the graphs as well, but for now: the mean involves beta (the noise schedule I told you about: beta is the amount of noise you add at each time step), and the variance is beta_t times I. That term becomes constant later, so you don't need to worry about it for now.

Now let's think about what is actually happening. We use q to represent one step of the forward process, as defined earlier. x_0 is the first, original image; we pass it through the forward process q1 and get the image at time 1, and we do that n times, like a function of a function of a function, until we reach q at capital T. Written for a single step, say from the first image x_0 to the image at the first second: the output is x_1, the mean is sqrt(1 − beta_1) times x_0, and the variance is beta_1 times I. Notice those two terms very carefully: I is the identity matrix, which you can imagine as a diagonal matrix of all ones, so it's a constant. Going to the next step, from the image at t = 1 to the image at t = 2: the output is x_2, the mean is sqrt(1 − beta_2) times x_1, and the variance is beta_2 times I. There is a very interesting pattern here in the variance terms, beta_1 I and beta_2 I.
You already know I: it's just a matrix of ones arranged diagonally. And the betas are something we have already calculated, because we have a fixed formula for them; so this term becomes a constant, kick it out. We are now only concerned with the mean. The authors then added a bit more notation, and I'll tell you why. If I train for a thousand time steps, then every time I run a training loop, say I'm at step 500, I would have to add noise 500 times to reach that particular output; in the next training step I might be at step 504 and have to add noise that many times again. To solve this, the authors came up with more notation: you can replace alpha_t = 1 − beta_t, and define alpha-bar_t as the product of all the alphas from 1 up to the time t. If you replace alpha_t with 1 − beta_t throughout and keep unrolling the mean of that equation, to t − 2, t − 3, and so on, the final output you get is: x_t equals sqrt(alpha-bar_t) times x_0, plus sqrt(1 − alpha-bar_t) times the noise. The benefit of this equation is that it depends only on x_0 and t, so I can give it any t and get the output directly. If my training loop wants the image at t = 500, I don't need to take 500 steps; I have a formula that gives me the image at the 500th step directly. That makes it super, super efficient.

But there is a very big problem with this: you cannot compute the derivative through it. You will say: wait, man, you've been talking about Gaussian distributions, but if the neural network can't use this to learn, it's like you gave me a path I cannot walk. It's like trying to learn a new recipe: I give you five ingredients from a shelf of a thousand, and you have to learn to make the dish. The way to go about it is to try ingredients, replace one or two of them, and see what works and what doesn't. But if every time I hand you five completely random ingredients, will there ever be a day you reach the correct recipe? There won't, because you'll never know which ingredient put you far from the recipe. The same thing happens here: if you are just pulling random samples from the distributions, you won't actually get anywhere.

So what is the reparameterization trick? Take a standard normal distribution, with mean zero and variance one; I talked about that graph before. If you look at these graphs, all of them have been derived from it: mu represents the mean and sigma squared represents the variance. The mean stays zero but we widen the variance, and the curve gets wider; change the mean from zero to minus two, and the curve moves to the left. So what you can see here is: if you add or subtract something from the mean, you move the graph left or right, and if you multiply something by the spread, you make it narrower or wider. That is the final reparameterization trick: you keep the mean as it is, and instead of sampling from that distribution directly, you sample epsilon from the standard normal you already know and compute mean plus sigma times epsilon.
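Putting the two ideas together, here is a sketch of the closed-form forward jump he derives: with alpha_t = 1 − beta_t and alpha_bar_t the cumulative product, you can noise an image to any timestep t in one shot, x_t = sqrt(alpha_bar_t)·x_0 + sqrt(1 − alpha_bar_t)·eps, which is exactly the reparameterization trick, since the randomness lives entirely in eps ~ N(0, I).

```python
# Closed-form forward sampling with the randomness isolated in eps.
import numpy as np

def forward_sample(x0, t, betas, rng):
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)          # alpha_bar_t for every t
    eps = rng.standard_normal(x0.shape)     # the isolated randomness
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))            # stand-in for an image
betas = np.linspace(1e-4, 0.02, 1000)
x500, eps = forward_sample(x0, 500, betas, rng)
```

Because the deterministic part is an ordinary arithmetic expression in x_0, gradients can flow through it during training; only eps is random.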
Using just that one graph, where you already know the mean and variance, you can write the sample in this product form, and this is computable. Why? Before, at every step you were pulling a random sample, like the random ingredients you were pulling for the recipe, and you cannot backpropagate through that. But here I have kept the randomness out of the deterministic part; that isolated randomness is epsilon, so now the derivative can be computed.

Now, I think we are a little out of time, so I will give you a very brief version of the reverse process; we can go directly to training and sampling. We have two minutes, so no worries. Whatever tricks I told you about will be applied in the reverse process as well; I had a bunch of formulas, and you would need the derivations for them, but the training algorithm is very smart. The training actually happens in two parts. First, you take an image from your dataset; you take any time step from a uniform distribution, any time step at all, because I gave you a formula that produces the image at any time step; and you take epsilon, which represents the noise, from a normal distribution. What the model actually tries to learn is: can it predict the noise that was added at that time step, so that it can remove it? That prediction error is what it is trying to reduce. It does that for n steps, until it learns how to remove noise from any image at any time step. The final piece is called sampling: once you have learned a model capable of removing noise, you sample x_T, just random noise, from a normal distribution, and you go from capital T down toward 1, repeating until you reach 1; at each step you use the formula to compute x_{t−1}. I couldn't cover the derivation because of the time, but that is it: training means you first train a model that learns how to remove noise, and sampling means you use that model to remove noise from pure randomness and make a very clear picture out of it.
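A pseudocode-level sketch of the two procedures just summarized (they correspond to the training and sampling algorithms in the DDPM paper), with a stand-in `model` that predicts the added noise; everything here is schematic, including the index convention.

```python
# Schematic DDPM training step and sampling loop.
import numpy as np

def train_step(model, x0, betas, rng):
    T = len(betas)
    alpha_bar = np.cumprod(1.0 - betas)
    t = rng.integers(1, T)                       # t ~ Uniform({1..T-1})
    eps = rng.standard_normal(x0.shape)          # eps ~ N(0, I)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps
    pred = model(xt, t)                          # model predicts the noise
    loss = np.mean((eps - pred) ** 2)            # simple MSE objective
    return loss                                  # backprop omitted in sketch

def sample(model, shape, betas, rng):
    T = len(betas)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)               # x_T ~ N(0, I)
    for t in range(T - 1, 0, -1):                # go from T back down to 1
        z = rng.standard_normal(shape) if t > 1 else 0.0
        pred = model(x, t)
        # Remove the predicted noise, then add a little fresh noise.
        x = (x - betas[t] / np.sqrt(1 - alpha_bar[t]) * pred) / np.sqrt(alphas[t])
        x = x + np.sqrt(betas[t]) * z
    return x
```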
Thank you, everyone, for listening, and now that you know the magic tricks, you can start learning the magic. If you want to follow me on my socials, this is the QR code, and if anyone has any doubts or questions, you can ask me separately. Okay, thank you, Ishan, for this really technical deep dive. Next we have Rohit from Red Hat.

Thank you, everyone, for staying so late; it's already about 4:30, after the tea break, and everyone is here. Am I audible? All right, thank you to those who are here to attend my talk. I am a quality assurance manager at Red Hat, and this talk is mostly a brief introduction to how QA testing can use an in-house RAG model deployment, and why I specifically use RAG: because training a model is pretty difficult these days, time-consuming as well as expensive. So let's look at how we can use this technology to do software testing more efficiently. This is about QA reasoning and AI in action.

When we talk about artificial intelligence, what is the "intelligence" here? It is about using your knowledge, and the actions you perform through reasoning, to solve a problem; the intelligence lies there. But large language models, which are generally built on transformers, only predict the next token. So how can we use them intelligently to perform our software engineering tasks? We can give them the capability to perform actions: we are not just generating language, we want to perform actions with it. We can use the concept of agents, combined with large language models, by chaining them and giving them the persona of a QA engineer: "you have to think like a QA engineer." You can craft your query templates, or prompts, to give context along with the query and get the resulting output.

When we talk about knowledge, where does the knowledge come from? Either the model needs to be trained, or we need to provide the context to it; and when we say context, we need some kind of input source. That's why we use RAG. RAG stands for Retrieve, Augment, Generate. Retrieve means retrieving relevant documents from your source; the source can be any backend database, PDF documents, or CSV files, but to retrieve from it you need some kind of backend storage. Augment means the query plus context: you have your question, but to get a good answer you provide some context alongside it, and then the large language model's capability of predicting the next token does the rest. Generate refers to the inference from the LLM: when you query a large language model, the process is called inference.

Why do we need RAG? Mostly because of the hallucination problem of large language models. It's not a bug or an issue with the LLM; it's a feature of the LLM, because it is predicting the best possible next token for whatever you ask, and if it doesn't have context it will just make things up. To solve that problem, you can use your in-house RAG to provide the context, which helps the accuracy of the answer to the question you ask. You also get data privacy, because you are not sending your data out onto the internet or training an external model on it; all your data stays in-house, on-premises. You can trust the data source, because it's quite faithful: you know where the data is coming from. You also get fast retrieval and recency: your backend database can be updated, so you get real-time context at query time rather than the old, stale data the LLMs were trained on; retraining for a fast-moving application is time-consuming these days, so it's better to have real-time data that, when put into the query's context, gives you relevant answers and responses. And you can trust the sources: RAG not only retrieves documents, it also helps by providing the source of each document, exactly where the document context is coming from.
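To make the "augment" step concrete, here is a minimal sketch: the retrieved passages are pasted into a prompt template together with the user's question before the LLM is called. The template wording is an assumption, not the speaker's actual prompt.

```python
# Build a query-with-context prompt from retrieved passages.
def build_rag_prompt(question: str, passages: list[str]) -> str:
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below. "
        "Cite the passage numbers you used; if the answer is not in the "
        "context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

print(build_rag_prompt(
    "What does the login service log on failure?",
    ["Passage from an internal runbook...", "Passage from a design doc..."],
))
```

Instructing the model to refuse when the context is missing is what addresses the hallucination problem described above; the citations give you the source attribution.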
When we talk about RAG, we also need to think about the best possible retrieval method; we'll cover that on the next slide. Vector search is one of the key ingredients of RAG models. Before vector search, let's understand what vectors are. Vectors are nothing but large numerical arrays that store information about your text; it's all mathematics. Say you have a point A in space with a three-dimensional representation, x, y, z: those numbers represent the point. In terms of text, when you take a word and build a vector array for it, the array holds information about the characteristics of that word. Three dimensions hold only three specific characteristics, but a multi-dimensional array holds much more information, and we are talking about vectors of 384 dimensions, 768, 1024, 1536, 2048; models like OpenAI's use embeddings with more than 2,000 dimensions, so you can understand how much information is being stored in there. The process of converting your text into these numerical, high-dimensional arrays is called embedding, and the resulting data is also called an embedding; both the process and the result go by the same name.

So now think about what vector search actually is. Here is a plot I created with matplotlib, slicing a high-dimensional array down to a three-dimensional view to represent some words. Say my vector database contains the words "husband," "car," and "shark," and I try to retrieve the value "wife": the most relevant retrieval will be "husband," because it is most directly connected to "wife." When I search for "Tesla," and my database has "car," "Honda," and "bus," it will find the most relevant documents for "Tesla" and show me "car" as the top result, ahead of "Honda" and "bus." This is not a keyword search; it is a similarity search, because it tries to identify which stored item is closest to your query. There are multiple ways to measure that: L2 distance, which computes the minimum distance between the two closest points; inner product, which looks for the best match by dot product; and cosine distance, which is just calculating the angle between vectors, the closest angle winning.

Coming back to vector embeddings, I'll take one more minute on what they contain. They contain syntax: whether the word is a pronoun, a verb, an adjective, and other specific details about it. They contain semantics, what the word actually means: for "Tesla," say, something like magnetic flux density, and many other values; all that information is saved in those numbers. They also capture occurrence, and they capture relationships: say my embedding model, the one I use to create embeddings for my vector store, knows about FOSSASIA; once I search for "FOSSASIA," the next closest thing might be "conference," or probably "open source conference," something like that. That's how relationships are stored. So think about the information packed into a single vector of 2,000 dimensions.
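A tiny illustration of that similarity search: cosine similarity ranks "wife" closest to "husband" without any keyword matching. The 4-dimensional vectors below are made up for the example; real embeddings have hundreds or thousands of dimensions.

```python
# Nearest-neighbour ranking by cosine similarity over toy embeddings.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

db = {
    "husband": np.array([0.9, 0.1, 0.0, 0.2]),
    "car":     np.array([0.1, 0.9, 0.1, 0.0]),
    "shark":   np.array([0.0, 0.1, 0.9, 0.1]),
}
query = np.array([0.8, 0.2, 0.1, 0.2])  # pretend this embeds "wife"

ranked = sorted(db, key=lambda w: cosine(query, db[w]), reverse=True)
print(ranked)  # ['husband', 'car', 'shark'] -- nearest neighbour first
```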
I only knew four dimensions after watching Interstellar, space-time and gravity, but this is 2,000 dimensions, so you can understand how much information a single vector contains. I can also show you a quick pgAdmin screenshot of the database: this is the text, and no matter how long the text is (okay, I'll just make it bigger), you can create a vector embedding for all of it, and it will have a dimension of 2,000 or whatever you set as your embedding requirement and max tokens.

All right, now that we have understood vectors, let's talk a little about the inference flow, which is what matters from the QA point of view. Remember that large language models are just language-generating machines: a text-generation model is just predicting the next most likely token, the next closest-matching word, so when you query it you get an answer built from next-token predictions. When you query ChatGPT or any model directly, the query goes straight to the model for inference. But when you use a RAG model, it first checks the information in your vector DB: you write a query, it gets converted into a vector embedding (because that is what is used to retrieve from the vector database), and from the vector DB you get a list of relevant documents. You can then use a re-ranker, another embedding model, to get the most relevant documents out of that list. The system then assembles the template with the query and the context; that means what you send to your model for inference is a prompt that contains both your query and the context.

Your LLM doesn't understand English; it understands only tokens, and tokens are again a numerical representation in an array (not the embedding numbers, but a different numerical input). All your text is divided into numerical values, which are sent for inference to the LLM. At inference time, we only interact with the embedding layer and the output layer of the LLM; the embedding layer contains the vector dimensions for each token. The attention layers are the hidden layers of the neural network, which help predict the next possible token. Once inference completes, the output layer sends tokens back, and they are decoded to give you the response.
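Schematically, the inference flow just described looks like the sketch below, reusing the `build_rag_prompt` helper from earlier. `embed`, `vector_db.search`, and `rerank` are placeholders for whatever embedding model, vector database, and re-ranker you deploy, not a specific library's API.

```python
# Query -> embedding -> vector retrieval -> re-rank -> prompt -> inference.
def answer(question, embed, vector_db, rerank, llm, k=20, keep=4):
    qvec = embed(question)                        # query -> embedding
    candidates = vector_db.search(qvec, top_k=k)  # nearest-neighbour retrieval
    best = rerank(question, candidates)[:keep]    # keep the most relevant few
    prompt = build_rag_prompt(question, [c.text for c in best])
    return llm(prompt)                            # inference with query+context
```

The re-rank step is there because, as noted above, you want to send the LLM only very selective data, not everything the vector search returns.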
There are a few use cases from the QE point of view. Test case generation and search: test case generation can use the capability of large language models; you provide the context and configure your prompt in a way that generates your test cases. I do have a demo for it, but I don't think time permits. We can also use the re-ranking model's similarity search for test tagging: you keep database embeddings of your test cases, and whenever there is a new PR or feature, it tags the relevant test cases for you. Other use cases include user stories, code coverage, test log analysis, test execution, NLP over code-coverage changes, and using models to test the LLM's responses themselves.

I'll just take one more moment on tools. There are different tools you can use for inference with the models: Hugging Face, which hosts the model repositories; agents, which use inference from the LLM to take the next decision; pgvector with PostgreSQL for storing your vector embeddings; Jupyter notebooks; and LangSmith and Arize AI, which are observability tools for your LLM, where you can get more information on how to debug your LLM's responses. I think I'll have to skip this part, but this was one of our application certifications, where we have a Mistral model deployed and use it to generate test cases. When I request test cases, my vector embeddings are retrieved from the backend: I get a similarity search over my tests based on my documents, and these are the relevant documents. But I need to send my LLM only very selective data for generation, so I use a re-ranker to prioritize which documents to select. I configure my prompt: "write a test case for the KDEM test; this is the sample, and these are the documents you have to use" (this is just for demo purposes), and with the question and the answer I get the final generated test case; this is my final test case. All right, I'll skip the agent part, but these are references you can use: LangChain, Hugging Face, pgvector, Arize AI, LangSmith, and Jina AI, which is an embedding tool. Thank you for staying so long, and stay connected. Thank you.

Hello, everyone, thank you for coming to my session today. My presentation is "Now is the time to start JAX." Let me introduce myself. My name is Hyeonah Yoo, and I make IT seminars and content. I'm a director of Women Who Code Seoul, a global community for women in the tech industry, and I organize JAX KR. I'm interested in AI education and community; if you want to connect with me, follow my LinkedIn. Today's agenda: what is JAX, and JAX in Korea; then, simply, JAX 101; and then the ecosystem and the future.

First, what is JAX? JAX is a Python library for high-performance numerical computing and large-scale machine learning; simply put, JAX is a Python library for AI. The advantages of JAX: first, it has features for machine learning engineers and AI researchers, such as automatic differentiation and parallelization. Second, JAX uses XLA, which saves money and time: the JAX compiler is very fast, so you save both. Last, just joking: JAX is a new technology, so if you can use JAX, you can become an AI hipster. A hard question: is it better than PyTorch or TensorFlow? It is very difficult to answer, but JAX is being used in production now. For example, do you know Grok? Grok is the generative AI chatbot by xAI, and it uses the JAX framework; Hugging Face and Google DeepMind use JAX too. If you search for JAX on Hugging Face, there are more than 10,000 search results, and Hugging Face ran a Flax community week, so many people are using Hugging Face and JAX. If you look at Google DeepMind's AI research, JAX appears in papers such as ViT, NeRF, and PaLM.

Second, let me tell you about JAX in Korea. Many people in Korea are using AI, and the number of people using AI has grown since ChatGPT. There was a survey of programmers in Korea asking: what framework or library do you mainly use?
Most developers answered PyTorch or TensorFlow; JAX was not there. Looking at Korean IT communities on Facebook: TensorFlow KR is the biggest user group, and second is the PyTorch Korea user group, with about 1.5K members, an active and popular Facebook group. Last is JAX: JAX KR has only about 164 people, though there are many important and great people in it. Next, I have a Korean joke about how small the Korean AI community is: everyone is connected within a few steps. So if you can already use PyTorch or TensorFlow, I think you should try JAX.

Next, a simple JAX 101. I don't have enough time for everything, so maybe one page each. First, JAX is Autograd and XLA brought together for high-performance numerical computing, and its public API is built around composable transformations. JAX's most important features are automatic differentiation and parallelization, and the same code runs on CPU, GPU, and TPU, which is very good.

JAX looks like NumPy: if you want to use JAX, `import jax` and `import jax.numpy as jnp`. The two are very similar, but a difference is that modification is difficult: in NumPy you can simply change an array in place, but in JAX you cannot, because JAX follows functional programming, and side effects cause problems; if you want to "change" an array, you have to use other code for functional updates.

Second, JAX uses a compiler. Other frameworks use compilers too, but JAX JIT-compiles code at run time for full speed if you use the `@jax.jit` decorator or `jax.jit`: JAX uses XLA and just-in-time compilation. Look at this code: the first line does not use the just-in-time compiler, the second line does, and the `jax.jit` version is thousands of microseconds faster.

Third is autograd: if you want automatic differentiation, use `jax.grad`; this page goes fast. Next is vmap: vmap is auto-vectorization, used via `jax.vmap`. In the example, the blue box just uses a loop, the green box transposes the weights and uses the jax.numpy API, and the purple box uses vmap; of the three, the purple vmap box is very fast. Next is the low-level API beneath the jax.numpy API, which is fast as well.

If you use JAX, functional programming is very important, so remember pure functions: a pure function always returns the same result given the same input and does not create side effects. This is actually very difficult, because Python iterators, for example, create side effects. If you want to trace what JAX is doing, use `jax.make_jaxpr`. And when using JAX, if there are static arguments, you can use `partial`; it is used a lot, so remember it.

Next, random numbers: in NumPy you simply call random.seed to make random numbers (remember 7 as the seed, 5 lines of code), but in JAX you first make a random key and then split it, because of functional programming: this avoids hidden side effects, and afterwards you can control your testing environment. Next is the stateless class: when you use JAX for machine learning, remember to make your classes stateless; in Python the solution is explicit state, so you manage the state through the inputs and outputs yourself.

Next is the pytree, a structure JAX uses: a pytree is a tree structure built from Python object containers, including lists, dictionaries, and tuples. Here is an example of pytree use: when you do machine learning with JAX, you manage your weights and biases with pytrees. And last, parallel evaluation: if you want to run your JAX code on TPUs, you do not need to change the code; `pmap` compiles the function itself, so there is no need for `jax.jit` or the decorator.
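A compact sketch of the JAX 101 features covered above, all standard JAX API calls: jax.numpy, jit, grad, vmap, and explicit PRNG keys (using her example seed of 7).

```python
# jit, grad, vmap, and functional randomness in a few lines.
import jax
import jax.numpy as jnp

@jax.jit                      # just-in-time compile with XLA
def loss(w, x):
    return jnp.sum((x @ w) ** 2)

grad_loss = jax.grad(loss)    # automatic differentiation w.r.t. w

# vmap: write the function for one example, vectorize over a batch for free.
batched = jax.vmap(lambda v: v * 2.0)

# Randomness is explicit and functional: make a key, then split it.
key = jax.random.PRNGKey(7)
key, subkey = jax.random.split(key)
x = jax.random.normal(subkey, (4, 3))
w = jnp.ones((3,))

print(loss(w, x), grad_loss(w, x), batched(jnp.arange(3.0)))
```

Because `loss` is a pure function, jit, grad, and vmap can all be composed on top of it, which is the point of the "remember pure functions" advice.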
Last, the summary: JAX has an API very similar to NumPy, and JAX's biggest strengths are just-in-time compilation, autograd, pmap, and vmap. If you want to make random numbers, remember to split the key. When using JAX for a neural network, you need to make stateless classes, and when computing gradients, remember pytrees. JAX is functional programming, so remember pure functions and be careful.

The last part is the JAX ecosystem. JAX neural-network frameworks include Flax and Haiku; the important thing to remember is that Haiku is now in maintenance mode, so Google DeepMind recommends Flax instead of Haiku. And if you want optimizers for JAX, use Optax. Actually, today's talk could not go into much detail, so I recommend a YouTube channel: the videos were made in 2022, so the versions have changed since then, be careful, but the explanations are very good, better than mine.

What will happen to JAX in the future? I think there will be more research, papers, and services using JAX, and Keras 3 is multi-backend and can use JAX, so many more people will be able to use JAX easily. I think JAX is a very good API. I am translating the documentation into Korean, organizing JAX KR, and actually writing a book about JAX; that is my plan, and I want you to join us. Everyone can contribute to JAX: let's start with small things. If you are interested in open source or code contribution, JAX has many tasks you can try. Thank you.

Thank you for the presentation. Before we get started, and as we are settling down here, I am going to invite you to join the conversation: scan that code and describe what you think an intelligent assistant is. What is an intelligent assistant? We have been seeing all these talks the whole day: you are in a piece of software and you want to get some assistance, but not from an individual. If you have taken the poll, go ahead and raise your hand so I can see what the responses are. You should be able to go to slido.com and enter the poll, and I am curious to see what you have to say; if not, I should be able to pull it up. We do have around two or three questions; it works in real time, so make sure you hit submit so we can see something. "A tool that automates tedious work": great, time-consuming work you want automated. Another person is at least trying. Is there a lag from when you submit the answer to when you see it on the screen? There is a lag; I think there are four participants, so when the typing stops we will stop there. But let's see what you think an intelligent assistant is. Very good, thank you so much for participating: something that understands ideas and can give you some advice; helpful assistance for mundane tasks, great responses here; a good mentor, something that teaches you, which makes sense; and also a personalized secretary. We had five people; I will definitely put these results up for you, so continue to take the poll.

Then the next question I have for you: we are talking about an intelligent assistant being kind of like a mentor, training you and teaching you things. How important is it for an AI assistant to be actually usable? Let's rate it out of six. I can see at least 15 or 16 people in the room; if I can get more of you to participate, that would be great. We have very few questions; it's not going to be all about taking polls.
about taking questions, so go ahead and scan the code. The responses: it's very important; it needs to be usable. If it's not usable, that makes no sense: it may teach you a lot of stuff, but it essentially needs to be usable, and apart from usability there are other things too that come to mind. So, this is something I thought was kind of cute: AI is getting so intelligent, it was in the news yesterday, we are all going to be replaced by AI very soon, and if it starts doing things like this, that's going to freak you out. That's too intelligent. We don't want it to be so intelligent; we want it to be at the right level of intelligence. In fact, there is this whole concept of being usefully wrong. We saw some talks about hallucination, and some talks about the process of training and inference: when the model hasn't really gotten all the data points to give you accurate results, it will give you results that may not be accurate, but that are just enough to give you some information about how the AI is "thinking", per se. Apart from the training of the model, and apart from limitations in your data, your algorithms, and any kind of training as it's catching up, there is one thing that can actually cause us to lose our trust, and that's poor design. That's what we're going to talk about today as well. So I'm just curious if I'm talking to any designers in the room; I know there are lots of coders in the audience. Anyone who designs a solution, and by that I mean the actual interface? So it's not so much you creating that algorithm, not so much you giving the AI the questions it can answer, but actually designing that interface. Anyone here?
We have 8 people, 4 of you, more people designing solutions, great, around 55-45. The reason I asked that is: tell me if you're a designer or a developer. Are you a developer designing, or a designer writing code, which we do in OpenSearch? Mostly developers, great, so no designers. It's good to know who I'm talking to; we see a lot of good design being generated by developers, and I'm not discriminating here. Anyone else, anyone who's not a developer here? We have 67%, and I think I can scroll down, because the other 30% has to be somewhere. All right, cool, so we have some PMs here and people in community organization; that's good to know who I'm talking to. A predominantly developer community, all right. We have a dedicated design team, and this is what they do. So when you think about our design process per se, we have a lot of solutions we come up with, especially when we're designing for AI. This is our product design life cycle process. I'm a senior researcher at OpenSearch, and we start with a lot of discovery work. With AI, we really got caught in a storm overnight. It's not that OpenSearch cannot solve for AI solutions; in fact, we have different ways in which we solve for it. But given ChatGPT, and the way AI blew up, we started to run really fast. So this discovery process, we did do it, and I'm actually going to sprinkle this talk with a couple of findings from it, but it was not in the true sense of taking that discovery, confirming there is a consumer need, and carrying that into product development; we took what we could and started using it, but we didn't do the classic ideation process. The second bucket here is where you do rapid ideation; if there were designers here, they'd be very familiar with it, and perhaps you are too. Just a show of hands, we didn't do a poll for this: anyone familiar with Figma here? Have you used Figma for your UX work? You would use that, right? Typically you work very closely with users, and now that you have Figma, you use it quite a bit for ideation and for getting the user involved in your design process as well. And this happens side by side: as engineers are testing their models, writing the algorithms, and curating their solutions, on the front end we are doing this work. Finally you have the evaluative phase. We're not there yet, because we're still designing solutions, but evaluative work is more perceptual: asking users whether this solves the problem you wanted it to solve, whether these features are functional and useful, and so on. So for this talk we're going to focus on the second part, which is ideation and ongoing improvements; it involves rapid prototyping and a whole host of mixed methodologies. And when you think about that design element, remember I told you early on: when looking at the front end of AI, we're not talking about actual accuracy or reliability. While that is useful, what we are talking about is when you invoke AI: when do users want to talk to that AI model, versus saying "I know what I'm doing, leave me alone"; persistence: where do you want to see your AI assistants; and then we'll also talk a little bit about icons. So here, remember I told you we did some discovery work; this is something I've spoken about at length at other conferences as well. This survey here is one in which, very early in the process, we asked
them what they wanted: do you want to see the AI persistent all the time, or do you want it to come and go? What was interesting here is that it depends on the user type. Most of you self-identified as developers, but there are different groups of people, so let me also ask this, since I never did: how many of you are familiar with OpenSearch? Okay, cool. Because I'm assuming you're developers and you use OpenSearch, you're doing a lot of what we call the producer persona: you're wiring the data, writing your queries, and then you might create some visuals and put them on a dashboard. When someone just comes in, reads the dashboard, or runs a query, they're consumers; you're more the producer type. When we had these two different personas and asked them when they would invoke AI and why they would use it, they differed in what they indicated to us, and then we had that split in terms of how many of them wanted it to be persistent; it's also really task-based. And then we asked them how they would want to invoke it: a button, a chatbot, and so on. As you can see, it's all over the place here, but predominantly people wanted a button, and they would also like to ask for relevant documentation and run search queries in natural language, things like that. That is all past research. One of the things our designer wanted to test is the icon. I'm going to see if I have volume; if not, I will talk over this. All right, we do not have volume, but I will talk over it. Essentially the user in this clip is testing what we're offering them. There's this little icon on top, and we asked them what they think about it. They'd be curious; they'd expect a tooltip or pop-up to show up. In this particular session we walked him through that icon; he clicks on it and sees a generative AI dialogue show up, not a drop-down. And he didn't expect to see the suggestions our designer offered, which are a very popular format: "other people have asked these questions". It's also training users on how to use your AI assistant. After consideration, the user talks through it and tells us he does find it useful. All right, so now I want you to rank-order these things. We did ask about usability early on; now go ahead and tell me about accuracy, reliability, speed, and meaningful answers as well. As you're doing this, I'll tell you where we're going with it: our engineers and developers focus a lot on accuracy and reliability, and something less thought of is meaningful answers. Some of you talked about inference, the complexity of inference, and how it interferes with the coherence of what the model puts out; that's exactly what we're seeing here. Especially for high-code developers querying in natural language, if the information that comes out is not meaningful, it's not very useful at all. So yes, while we started off seeing reliability, accuracy, and speed as important, eventually we got down to actually having meaningful
solutions, which means our models need to be more complex from a semantic-search perspective. All right, and then I did ask about this; this is more for me to tag who you are. This being an AI track, it just makes sense that most of you are designing for these solutions as well, so thank you for that. And as I said, this is the last question in the series, like I promised. So what I'm going to do next is show you this particular design that we actually showed them. You can see there is a main screen; for those of you familiar with OpenSearch, they are running their queries and looking at the logs here, and then they have this modal in which they're talking to the AI. Again, I don't think we have volume, but essentially this user talked through how the responses would be relevant, and about the primary screen: what they see there needs to be relevant to the actual tasks they're doing. They also said that if they saw something up here and asked a question up there, that would be more of a global search, whereas something in a conversational style would be tied to the particular search they were doing right there. And again, it's very useful to see what the users did. Back to the survey we did initially: you see a lot of anchoring on location, people wanting to customize where the assistant lives, being very firm about having it on the left, right, top, or bottom, and so on; and also on the format of the interface, whether it should be a primary or secondary surface. People who write queries in SQL or PPL actually prefer to keep that as the main window, but then they want to see how the query was updated, to make sure the AI really did make a meaningful change. Other things that were interesting: the nature of the communication. Most people assumed it was text; very few assumed voice. And when we asked about the style of language, most people wanted natural language. Keep in mind these are people who are coding day in and day out, yet rather than formal language they still reach for their own natural language, so that was telling as well. And then this last slide I'm not going to talk through, because it's a summary. Just to summarize: do test iteratively (this is just a sample of what we've done), definitely use rapid prototyping, and build consensus. With that I conclude the talk, but I will leave up a slide on some of the useful tools you can use in testing; take that in as well. Thank you so much for having me. Okay, thank you so much. I think we have come to the end of the AI track. Let's give ourselves a round of applause: you are all now AI experts.