Hi everyone, I am Sachin Punyani. I lead business development for deep tech services at AWS India. Today we are going to talk about GenAI, generative AI, on AWS. We know that GenAI is getting a lot of hype these days. People are talking about it a lot, and there are data points and research to support it. Different data points describe how much it will impact operational efficiency and productivity; McKinsey, for example, estimates an impact of 15 to 40 percent, and there are revenue forecasts as well. But how exactly it is going to create that impact is what we will focus on today.

We will start with one example we have all experienced. When you go to retail websites and search for products, they recommend new products to you. Education websites recommend courses based on the ones you have been taking. Based on your past history, what you have done and browsed before, they give you some level of personalization. Even on Netflix or Prime Video, based on your viewing activity, you are recommended new video content. Personalization is a problem people have long been trying to solve, or implement in their solutions. How this new technology called GenAI impacts it is the idea we are exploring here.

Look at how the classical recommendation systems that have been running until now actually work. They have item data: for an education website or ed-tech organization, that might be the list of courses offered, subjects, and teaching methods; retail sites have all the product descriptions; and on the video side, taking Netflix, YouTube, or Prime as examples, they know what content you have been watching. They also have user data, meaning your data: what you browse and what you do, plus your activity signals. If you like a video, you click on it, you watch it for some time; if it's a 15-minute video, did you watch it for 5 minutes or did you complete it? So they have the item data, they have your data, and they know how you interact with those items. Then they build patterns with different machine learning algorithms, and those actually generate the recommendations.

Now, how can things be improved with GenAI? How can GenAI augment the existing system? That is the example we are taking to understand it. So far, whatever recommendation happens is based on past data. But suppose I have been watching one particular genre of videos, and now suddenly I want to listen to a specific song. That is my immediate requirement, my current wish. How will the system learn this recent thing, or other information that is only in my head right now? You want to capture more user insight, and that might come with a conversational system, the interactive recommendation mechanism we are talking about. There are many more possibilities too, such as personalized descriptions; right now, if you go to retail sites, they have standard descriptions.
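To make that classical setup concrete, here is a minimal sketch, in Python, of an item-similarity recommender driven purely by past interactions. Everything here, the data, the items, and the cosine-similarity approach, is an illustrative assumption, not any particular production system.

```python
# Minimal sketch of a classical recommender: item data plus user
# interaction data, no conversational input. All data is made up.
import numpy as np

# Rows = users, columns = items; values = interaction strength
# (e.g. fraction of a video watched, or 1.0 for a like/purchase).
interactions = np.array([
    [1.0, 0.3, 0.0, 0.0],   # user 0
    [0.9, 0.0, 0.8, 0.0],   # user 1
    [0.0, 0.6, 0.9, 0.2],   # user 2
])
items = ["rock playlist", "jazz course", "pop playlist", "podcast"]

def recommend(user_idx, top_k=2):
    """Score unseen items by their similarity to items the user already used."""
    norms = np.linalg.norm(interactions, axis=0) + 1e-9
    item_sim = (interactions.T @ interactions) / np.outer(norms, norms)
    user = interactions[user_idx]
    scores = item_sim @ user           # weight similarity by past activity
    scores[user > 0] = -np.inf         # do not re-recommend seen items
    return [items[i] for i in np.argsort(scores)[::-1][:top_k]]

print(recommend(0))   # recommendations based only on history, no recency
```

Note the limitation the talk points out: nothing in this pipeline can react to "actually, today I want something different", which is exactly the gap the conversational layer fills.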
Coming back to those descriptions: it might happen that they are changed, with the hyper-personalization we are talking about, to match the user's taste or interest. There are many more impacts, but let's focus on the first one: how GenAI changes recommendations. GenAI will affect all the things we have been discussing, but let's understand it through one use case.

In the classical recommender system, we said there is item data, user data, and interaction data, and the decision is taken by putting them all together. When we talk about GenAI, the terms you will hear very commonly are large language models, LLMs, or foundation models. These are machine learning models that take large inputs, process them fast, and then give output; I am giving a very layman kind of definition here. These large language models can power an interactive recommendation system. For example, if someone says, "I want to listen to a famous rock band," that is good, recent information about what I want right now, and the system can immediately recommend something. This is just a very high-level idea, but the following example shows how it can change personalized recommendation.

Suppose someone writes to a chatbot, "I want to listen to some music." A typical conversational system will ask, "What kind of music do you want? What should I recommend?", and whatever music it suggests depends, again, on the user, item, and interaction data we talked about. In this example, the system recommended one song, Mojito by a singer, but the user immediately replied, "No, I have listened to it multiple times, can you suggest something else?" The existing system tried to recommend based on past data, but the interactive system brought in the additional context of the user's immediate wish, and then the system could suggest a new song. This is hyper-personalization.

Most importantly, this brings better accuracy, because you now know what the user wants right now. My interactions with, say, Netflix might span multiple years of watching movies; if you consider only that history, you miss what I want today, but if you also consider my recent wish, you bring recency into the picture very accurately. Those are the benefits GenAI brings on top of the existing system: you keep leveraging your existing systems, but you improve them and make them more personalized. So one point is better accuracy. The second important point is more persuasion: when you know the user's immediate need and suggest the matching solution or product, it is more likely the person will act on it, be it a video, a course, or a product you want to sell, so the persuasion potential also increases. GenAI uses the existing data and existing solution and builds on top. Most importantly, it also helps you capture more user insight. That is the augmentation, the hyper-level personalization, that GenAI brings in.
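Here is how that conversational re-ranking might look in code: a minimal sketch, assuming access to Amazon Bedrock's Converse API through boto3. The listening history, prompt wording, region, and model ID are all illustrative assumptions; check the Bedrock documentation for the model identifiers available to you.

```python
# Sketch: blend the classical system's history with the user's immediate,
# conversational request by prompting an LLM on Amazon Bedrock.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

history = ["Mojito", "Hotel California", "Bohemian Rhapsody"]  # from the classical system
user_message = "No, I have listened to that many times. Suggest something else."

prompt = (
    f"The user's recent listening history: {', '.join(history)}.\n"
    f"The user just said: '{user_message}'.\n"
    "Recommend one song that fits their taste but honours the immediate request."
)

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",   # illustrative choice
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```

The design point is that the prompt carries both signals at once: the long-term history from the existing recommender and the recency signal from the conversation.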
Now, if you need to build this kind of application, you need a bot engine; of course, there is a chatbot engine you have probably been building already. The second thing is the LLM, the large language model we were talking about, which carries out the conversation, the personalized conversation, I'd say. You are able to pull in fresh external data that was not available to the system earlier and make the decision based on that. Then there is your knowledge base, and a graph DB or relational data, data stored in a relational or related format from which you can extract patterns and suggest recommendations. These are the components you will need to build this newer type of recommendation system that augments the existing one (we will sketch this wiring in code at the end of this part).

So we have the idea from this one use case, but if you look carefully, there is a large number of use cases where GenAI can have an impact; remember the data point from the first slide about a 15 to 40 percent impact. With this high-level building-block view you can see it: you can generate text, answer questions, summarize text, and do many other things, and most importantly you can generate code, images, and so on. If you start combining these GenAI building blocks, you will find a large number of potential use cases. This slide is just indicative; there can be thousands of use cases you can implement. We will cover many of them as we go ahead, along with how different solutions are emerging and how AWS is bringing out its own.

Now let's start looking at how AWS is reinventing here. We saw that you need to build a GenAI-based application; to build it, there are different components, and you need different layers to support them. This is the generative AI stack. At the bottom, as we said, are the large language models: they take large inputs, they require large compute power to be trained and to run, and they generate output fast, so they are heavy on that side too. You will need huge infrastructure for training and inference, the infrastructure layer. Once the infrastructure is available, you will need tools that help you use it to train your models, and then finally you build the application on top. Let's work through how the different toolsets fit in.

Conventionally, you have all been training your machine learning models on GPUs. With large language models you need even more GPU power; sometimes even a single machine with multiple GPUs is not sufficient, and you need large clusters. We use the term UltraClusters here: a huge number of virtual machines, each with multiple GPUs, and when I say multiple, in the thousands, up to 20,000 GPUs coming together across different virtual machines. You can get up to that level of capacity for training huge large language models. When we talk about UltraClusters and machines like these, you certainly need a very fast network, and that is where the EFA component, the Elastic Fabric Adapter, comes in. Conventional TCP/IP stacks have their own limitations in inter-machine communication, so you need a special communication approach, and EFA plays that role.
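Stepping back to the application components listed at the start of this part (bot engine, LLM, fresh external data, knowledge base), here is a purely illustrative Python wiring of how one conversational turn could flow through them. Every helper function is a hypothetical stand-in, not a real service call.

```python
# Illustrative wiring of the recommendation-bot components; every helper
# below is a hypothetical stand-in for a real service.

def fetch_external_data(query: str) -> str:
    return "trending now: indie rock"        # stand-in for a live data feed

def query_knowledge_base(query: str) -> str:
    return "user preference: 2000s rock"     # stand-in for graph/relational lookup

def call_llm(prompt: str) -> str:
    return "Try 'Mr. Brightside' by The Killers."   # stand-in for an LLM call

def bot_engine(user_message: str) -> str:
    """One conversational turn: gather context, then let the LLM respond."""
    context = [
        fetch_external_data(user_message),
        query_knowledge_base(user_message),
    ]
    prompt = f"Context: {context}. User says: {user_message}"
    return call_llm(prompt)

print(bot_engine("Suggest a song, something new for me"))
```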
Coming back to the infra components, a couple of points I want to highlight. For GPUs, we have all been talking about NVIDIA GPUs. But you might also gain benefit from the special silicon chips designed for training and inference: Trainium for training the models and Inferentia for inference. These chips have been designed specifically for training machine learning models and for running inference on them, and they give a much better cost-to-performance ratio compared to GPUs. Then SageMaker is a fully managed service that lets you implement your entire machine learning pipeline: all your machine learning lifecycle steps, your MLOps steps, and many more things, in a much better and more automated way.

Now the second layer; we will go a little deeper into these two layers as we move along. We did talk about SageMaker at the bottom layer. Now one more service: Bedrock. It lets you fine-tune your models, or train and improve them. The very first question people usually ask is: there are so many large language models, which one do I choose? You might want to explore multiple models, or maybe the model that gives you good performance today is replaced five or six months down the line by a new model you want to switch to; you would want to fine-tune and adapt that one too. How will you do that? Bedrock is the service that lets you implement all of that. Plus, as we said, you need to build an application. It's not just about fine-tuning the model; it's also about making an agent or an interface for your application, and maybe you want to access those models through APIs. Bedrock offers all of those things.

Let's go a little deeper here. Bedrock offers multiple models; we will see in a later slide which ones. These can be open models, third-party models, or models from Amazon itself, with Amazon's Titan model also available, so you can explore different models. We saw the building blocks, text summarization, text extraction, and many more use cases; some models are good at many of them, say 5 or 10 use cases, while others might be limited to 2, 3, or 4. You can choose.

Another important point before we go further: when we say large language models, the organizations training these models usually release multiple variants. For example, Llama comes as a 7-billion-parameter model, a 13-billion-parameter model, and larger ones. So you don't always have to choose a 40-billion or 70-billion-parameter model; if your use case is small, you can choose the smaller model, because it requires far fewer resources and trains much faster than a large one, and eventually the commercial viability of the solution matters as well. Bedrock offers multiple models and different variants of those models. So you get access to multiple models, and because it's a fully managed service, it lets you customize them.
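Putting the model-choice discussion in code terms: a small sketch, assuming boto3 credentials and a region where Bedrock is available. Model availability varies by account and region, so treat the Titan model ID here as illustrative.

```python
# Sketch: list the foundation models Bedrock exposes, then invoke one.
import json
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Browse the available models and their providers before choosing one.
for model in bedrock.list_foundation_models()["modelSummaries"]:
    print(model["modelId"], "-", model.get("providerName"))

# Invoke a model through the runtime client (Titan Text as an example).
runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
body = json.dumps({"inputText": "Summarize: GenAI augments recommender systems."})
resp = runtime.invoke_model(modelId="amazon.titan-text-express-v1", body=body)
print(json.loads(resp["body"].read())["results"][0]["outputText"])
```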
Then an important thing. When you are making decisions, say I ask a model, just as an example, "What is the weather today?", the model needs today's information from some data source that provides the current weather; it needs to go and extract the information from that source. RAG, retrieval-augmented generation, as we call it, is the pattern that helps here, and Bedrock gives you features to implement RAG patterns and can connect to knowledge sources as well (a small code sketch follows at the end of this part). Then we did talk about agents. And certainly, when you build any of these solutions, the first priority is ensuring that your data and your models are secure; Bedrock follows the same compliance and security principles that other AWS services follow.

We said it offers multiple models, so just sharing a list here: models from AI21 Labs and Anthropic, each with their strengths; the open models from Meta, with Llama 2 available; the image generation model from Stability AI; and Amazon's own model. You can choose among different models and different variants, because Llama 2 has multiple variants and Claude also comes in multiple models: which model do you want, and at which size?

Effectively, you also want the model to be continuously fine-tuned; it's not a one-time exercise. Of course you connect to knowledge sources through RAG, as we discussed, but you also want continuous fine-tuning to happen, and those kinds of workflows can be implemented too.

Some recently introduced features: when you are comparing two models, say you fine-tuned model version one and then improved it into version two, you certainly want to see how much better version two is; and if you are comparing two different models, model evaluation parameters play a critical role there. Bedrock gives you a model evaluation report: you can generate a custom report or ask Bedrock to generate one automatically. Then there are the agents and guardrails we talked about: security guardrails, and guardrails so the model does not respond with offensive language, all those kinds of controls you can put in. These features have just been announced in preview and are being tested, but this makes Bedrock a single box where you can build many kinds of solutions.

Now coming to the top layer. Bedrock is a very important tool, but as we said, it offers you models: you fine-tune them, and they are made available as APIs or through agents. You would still want to integrate them into an application. Coming back to the personalization example, we were chatting with a bot, so you would want an application, some chatbot kind of thing, that can easily be integrated into your website, for example. On the top layer, the first service is Amazon Q. Amazon Q is the application-layer service, as we call it, which can leverage Bedrock and help you build multiple solutions.
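Back to the RAG pattern mentioned at the start of this part, here is a minimal sketch using Bedrock knowledge bases through boto3. The knowledge base ID and model ARN are placeholders you would replace with your own; verify the request shape against the current SDK documentation.

```python
# Sketch: RAG with a Bedrock knowledge base; IDs below are placeholders.
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.retrieve_and_generate(
    input={"text": "What does today's weather advisory say about outdoor events?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB1234567890",   # placeholder
            "modelArn": (
                "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2"
            ),                                   # placeholder
        },
    },
)
# The generated answer is grounded in documents retrieved from the knowledge base.
print(response["output"]["text"])
```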
Now, when we say connect with the knowledge graph: you may have a database, an RDBMS; you may have unstructured data lying on, say, S3 or some other object store; you may have data you want to upload; or maybe you want to crawl some websites for your data. There are multiple data sources, and you want all of them consumed, the fine-tuning to happen automatically, and everything integrated with your solution, a web application for example, along with other ready-made pieces. Amazon Q is the service that connects to those multiple sources and creates a chatbot-style or web experience for you. You can use it through API calls as well (we will sketch one such call at the end of this part).

AWS itself has leveraged Amazon Q and integrated it into multiple services; we are not listing them all here, we will cover them on the next slide. Coming to the main slide: Q is your generative AI assistant. Take the personalization example where you wanted a chatbot, or a dashboarding application where you create dashboards and charts from natural language. You can integrate Q in multiple scenarios, as we'll see; precisely put, Q is available where you want it.

Just to show how AWS has done it, which gives you an idea of how you can use Q in your own applications: in the AWS Management Console, you go and create a virtual machine through EC2, create a VPC, or go to the Redshift console and create a data warehouse. At that console you might need some help, documentation help, or you might want to troubleshoot something. Q can be the natural-language interface there, deciding at the back end how to troubleshoot different scenarios, and AWS is implementing exactly that: using Q for troubleshooting in the AWS console. For documentation: open any AWS documentation web page and you will find a Q icon; rather than scrolling through the entire document hunting for sections, you can just ask Q and it will answer your questions. The Console Mobile Application works the same way.

CodeWhisperer is a service that lets you write code from comments, through natural language. Q is also integrated with CodeWhisperer, so you can go further on that side: code transformation, say you have written code in one language and want to transform it; or breaking code down into different chunks; the many things you usually do in a coding exercise. Q extends the functionality of CodeWhisperer to do that. CodeCatalyst is a service that lets you run your CI/CD effectively: a single console for managing the different aspects of your project development cycle. Now suppose you want to develop a feature: you simply tell Q, "this is my requirement, build this feature in my code," and that's where Q is integrated. And of course Slack and Teams, as we have been saying, for your conversations; Q can be implemented there too. I am giving a very quick run-through of how AWS has implemented it, for example the conversational Q we talked about.
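Here is the API-style use of Q mentioned above: a sketch assuming the Amazon Q Business ChatSync API as exposed in boto3. The application ID is a placeholder, and the exact field names should be verified against the current SDK documentation, since this service is evolving quickly.

```python
# Sketch: calling an Amazon Q Business application programmatically.
# The application ID is a placeholder; verify chat_sync's request and
# response fields against the current boto3 documentation.
import boto3

q = boto3.client("qbusiness", region_name="us-east-1")

response = q.chat_sync(
    applicationId="app-1234567890",                        # placeholder
    userMessage="Summarize last quarter's onboarding docs.",
)
print(response["systemMessage"])   # Q's answer, grounded in connected sources
```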
So if you look at this screen, you are at the DynamoDB console, and while working there you need some help; Q gets you that help right there, through an icon on the right-hand side. That is how deeply it has been integrated. The second example is troubleshooting. You are trying to do something on the console and you get an error; there is a "Troubleshoot with Q" button, and it helps you troubleshoot the error and tells you what steps to take to correct it. It's not like before, where you copy the error out, find the documentation, and search the net for what to do; you can immediately ask Q, "how do I troubleshoot this error, what steps do I take?", and get help on the spot.

Then the feature development I talked about, integrated with CodeCatalyst. You write your complete set of requirements and simply ask Q to implement that feature for you. Q understands them, knows where your repository is, leverages CodeWhisperer at the back to implement the feature in your code, and then gives it to you for review. It's not just announcing that the feature is implemented; the major heavy lifting is done by Q itself.

One more example: code transformation. We know there are many monolithic legacy applications that you would want to move to a newer version. An example from within AWS: a team of five members converted 1,000 applications from an older version of Java to a newer version, at an average of around 10 minutes per application, and the longest single application took less than one hour. That gives you an idea of the level of productivity this brings.

And the CodeWhisperer service we talked about at the application layer lets you build: you just write a comment and it generates the code for you (an illustrative example follows at the end of this part). It is also, as we said, now integrated with Q, so the functionality extends to restructuring code, finding security vulnerabilities, and many more things. Typically you might think, "I'm writing code and it will generate code based on the general training it has." Q adds more here: it has been trained on Amazon code and, of course, publicly available code, but you can also customize it with your own code, so that what it generates is more aligned with your existing, pre-written code and your own conventions.

So this is how we see GenAI moving ahead: a large bouquet of services to leverage, and it's up to you how you join these blocks and use these applications and tools. We have just seen how the GenAI stack has evolved and how it supports you in building GenAI-enabled applications.
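To illustrate the comment-to-code workflow: the comment below is what a developer would type; the function body is a plausible, hand-written stand-in for what an assistant like CodeWhisperer might suggest, not captured tool output.

```python
import re

# Developer writes only this comment; the rest is the kind of completion
# a coding assistant could propose (hand-written here for illustration).
# Parse an Apache access log line and return (ip, status_code, path).
def parse_access_log_line(line: str):
    pattern = r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:\S+) (\S+) [^"]*" (\d{3})'
    match = re.match(pattern, line)
    if not match:
        return None
    ip, path, status = match.groups()
    return ip, int(status), path

sample = '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
print(parse_access_log_line(sample))   # ('127.0.0.1', 200, '/index.html')
```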
So, as a next step, if you want to build these kinds of applications: first learn and get some level of training; then develop an idea or proof of concept of the solution you want to build; and once the proof of concept is successful, build and scale your generative AI apps using Bedrock, Amazon Q, and services like these, leveraging the wider set of AWS services for your entire solution. I will conclude this session with this small overview. Thank you very much.