Welcome back to SuperCloud Six on theCUBE. I'm Paul Gillin. Walmart, one of the world's largest organizations, is often thought of as a brick-and-mortar retailer, but people sometimes forget that big data has been at the core of Walmart's culture since the beginning. This is a company that really built its success on its ability to gather and understand data about its customers, and it continues to be an innovator in that area. We recently learned about a program underway at Walmart to build a fully fledged AI development platform, and joining us today to talk about that is Hari Vasudev, who is the EVP of Global Tech Platforms at Walmart, joining us from Bentonville, Arkansas. Hari, welcome. Thank you, Paul. Thanks for being here. Delighted to have you. So this is a machine learning platform. It's called Element, and it's a full lifecycle platform. Help me understand what components make up Element. So the architecture of the Element machine learning platform is a classic layered architecture. At the very bottom of the architecture, you have the managed LLMs, whether that's LLMs like OpenAI's models or Google Gemini, or open source LLMs like Code Llama and so on. On the layer above that, we have something called an LLM gateway. What the LLM gateway really allows you to do is large-scale distributed model training and inferencing. It allows you to route requests to any of these LLMs, whether they're managed LLMs or open source LLMs. And we also built intelligence in there for optimizing cost and performance in tuning and inferencing. So we have an LLM router, and we have a GPU recommender that allows us to do automated decisioning across GPUs and models. What that eventually allows us to do is low-cost, high-accuracy prompt tuning and automated prompt engineering. On top of that, we have a governance layer.
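To make the LLM gateway idea concrete, here is a minimal sketch of cost-aware model routing. The model names, per-token costs, and quality scores below are invented for illustration; Element's actual router and decisioning logic are not public.

```python
# Hypothetical LLM gateway router: pick the cheapest model that clears
# a quality floor. All numbers here are illustrative assumptions.
MODELS = {
    "gpt-4":      {"cost_per_1k": 0.03,  "quality": 0.95, "hosted": "managed"},
    "gemini-pro": {"cost_per_1k": 0.01,  "quality": 0.90, "hosted": "managed"},
    "code-llama": {"cost_per_1k": 0.002, "quality": 0.80, "hosted": "open-source"},
}

def route(prompt: str, min_quality: float = 0.0) -> str:
    """Return the name of the cheapest model meeting the quality floor."""
    eligible = {m: v for m, v in MODELS.items() if v["quality"] >= min_quality}
    if not eligible:
        raise ValueError("no model satisfies the quality floor")
    # Cheapest eligible model wins; ties broken by dict order.
    return min(eligible, key=lambda m: eligible[m]["cost_per_1k"])
```

A real gateway would also weigh latency, context-window limits, and live GPU availability, which is presumably what the GPU recommender mentioned above feeds into this kind of decision.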
The governance layer really allows us to manage things like fairness monitoring and to mitigate the effects of hallucination. Above the governance layer, we have the security guardrails. That's all about having keyword blocks, making sure that we are moderating content appropriately, and making sure that we're filtering sensitive content out. So that's the core of the Element platform, if you will. On top of it, we have another layer that allows us to build standardized interfaces. We have a platform called Converse, which is essentially a platform for conversational AI. You can build chatbots and other chat applications on top of Converse very quickly, and Converse does all of the interfacing with Element. And then finally, you can have other similar application platforms that do very specific things beyond conversational intelligence, things like machine vision and so on, all of which will leverage the Element platform. Ultimately, in the layer above, we have all of the applications themselves. Those are applications like the generative search that we recently announced, and the My Assistant application, which many of our associates use. We also have a GenAI playground where our own developers can experiment with different use cases, applications, and pilots built on top of Element. We're hoping that can spark innovation at scale, while also reducing the overall cost and making the process of developing applications a whole lot faster. So it's really a soup-to-nuts approach. It's a very impressive platform that you've built. Why build it yourself? What was missing from the commercial platforms that prompted you to build it from scratch? So at Walmart, we've seen significant growth in the usage of AI and ML over the last few years, right?
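The security guardrails Hari describes above, keyword blocks plus sensitive-content filtering, can be sketched as a simple pre-filter. The blocked-keyword list and the email-redaction rule are hypothetical stand-ins; a production guardrail layer would use trained moderation models rather than regexes.

```python
import re

# Illustrative guardrail pass: hard keyword blocks, then redaction of
# sensitive patterns. Both rules are assumptions, not Element's real ones.
BLOCKED_KEYWORDS = {"ssn", "password"}
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def apply_guardrails(text: str):
    """Return (sanitized_text, status). Blocked requests return (None, 'blocked')."""
    lowered = text.lower()
    if any(kw in lowered for kw in BLOCKED_KEYWORDS):
        return None, "blocked"                    # keyword block: refuse outright
    redacted = EMAIL_RE.sub("[REDACTED]", text)   # filter sensitive content out
    return redacted, "ok"
```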
Even prior to generative AI taking off, we were using predictive AI quite ubiquitously across our enterprise, whether in supply chain, in e-commerce, in stores, in managing our real estate portfolio, or even in internal enterprise applications like financial systems and people systems. More and more, what has happened recently is that we've had hundreds of data scientists and machine learning engineers spread across different geographies. We have teams in multiple locations in the US and in other parts of the world, spread across different business units. What we realized very quickly is that it becomes very hard to develop and execute AI and ML projects with speed and sustainability. And what I mean by speed and sustainability is that typically the way you achieve speed is you leverage, you reuse, and you prevent as much as possible the duplication of effort across different teams in the company. Those are all things that are best addressed by building a platform. One of the things we also realized very quickly is that we want to be fairly cloud vendor agnostic. We want to be agnostic to different kinds of models, including large language models. Over time, we want the ability to fine-tune these models using Walmart context data, so it was very important for us to retain a level of control over this. And of course, at the scale at which we do things, the more we can standardize tools, frameworks, and development processes, the better it's going to be in terms of collaboration, speed of throughput, and reducing time to market. Building our own platform also allows us to adopt best-of-breed technologies from open source as well as from cloud providers. And obviously we can put in our own governance layer.
And then, as I was talking about earlier, it lets us make sure that we're adding that additional level of governance, because as the world's most trusted retailer, we want to make sure that we're being transparent about how we use people's data, and that we're putting appropriate governance and security in place. Ultimately, it allows us to do sustainable development of AI, and also ethical and responsible use of technology and data. All of that is best achieved by building our own platform, and that's why we built Element. As you said rightly, it's a soup-to-nuts, end-to-end platform that provides capabilities for data science teams to iterate very quickly, to reduce the time to innovation, and to scale up their use cases very quickly for mass adoption, keeping in mind the right cost envelope. You've given yourself maximum flexibility to choose clouds, LLMs, and underlying models. Why multi-cloud? Why was it important to make it a multi-cloud model? That's a very good question. Multi-cloud for us predates Element and predates generative AI. We have embraced a hybrid multi-cloud approach. And what I mean by hybrid multi-cloud is a strategy that embraces both private and public cloud, and within public cloud, it embraces multiple vendors. As we embraced cloud technology and experimented with multiple cloud platforms, we aligned on a strategy that we refer to as a triplet strategy. What we decided is to go for a multi-regional approach, with each region having all three legs of the cloud: the private cloud and the public clouds. So today we have a region-based deployment where a triplet is deployed across three regions, in the West, in the East, and in South Central, that allows us to seamlessly integrate and run cloud-agnostic ML workloads across our enterprise, across native clouds and regions. And cloud in many ways today powers every part of our omnichannel experience for our customers and associates.
It powers all of our e-commerce applications. It powers our club and store operations and the experiences our customers have in clubs and stores. It powers all of our internal associate tools, whether in our distribution centers, in our fulfillment centers, or indeed in our stores and clubs. And it powers, as I said, enterprise applications. We use it quite extensively in applications like FinTech, ADE, and even in the way we roll out applications related to compliance and so on. So again, it enables our engineers to innovate by leveraging best-of-breed technology. It allows us to intelligently move data across regions and across clouds. And it effectively allows us to build, through platforms like Element, a really fabulous ML platform that supports multiple use cases and is used by our developers to accelerate their development. Ultimately, at the very core of it, there are some key components. We have a platform called OneOps, which is an open source cloud management platform that allows us to manage virtual machines in a very standardized way. We also have a platform called WCNP, or Walmart Cloud Native Platform, which is an orchestration layer based on Kubernetes that allows us to scale applications across the private and public cloud. And finally, we have a data abstraction layer, and Element is actually a part of that data abstraction layer, which allows us not only to build AI and ML applications at scale, but also to do appropriate workload placement of our large data workloads across the different cloud providers. So it allows us to move data across cloud providers very easily, and therefore to develop applications fairly effortlessly.
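The workload-placement idea in the data abstraction layer could be sketched as follows. The region names echo the triplet strategy described above, but the capacity numbers and the placement heuristic (prefer data locality, fall back to headroom) are purely illustrative assumptions.

```python
# Hypothetical triplet-region placement: each region hosts all three
# "legs" of the cloud (private plus two public providers, unnamed here).
REGIONS = {
    "west":          {"clouds": ["private", "public-a", "public-b"], "free_gpus": 12},
    "east":          {"clouds": ["private", "public-a", "public-b"], "free_gpus": 4},
    "south-central": {"clouds": ["private", "public-a", "public-b"], "free_gpus": 20},
}

def place(workload_gpus: int, data_region: str) -> str:
    """Prefer the region where the data already lives; otherwise pick
    the region with the most spare GPU capacity (minimizes data movement
    first, then contention)."""
    if REGIONS[data_region]["free_gpus"] >= workload_gpus:
        return data_region
    return max(REGIONS, key=lambda r: REGIONS[r]["free_gpus"])
```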
The other really interesting aspect of our multi-cloud approach is that when we think about all of our stores, our distribution centers, our fulfillment centers, and our clubs, effectively what we've done, by leveraging this triplet cloud, is enable nearly 10,000 edge cloud nodes at all of our facilities. So we're bringing computational power and data to innovate for our customers and associates even at the edge. We bring all of this together through this hybrid multi-cloud approach that we're calling the triplet strategy. And then on top of that, we've built platforms like Element which now allow us to accelerate the adoption of machine learning and AI across the enterprise to benefit our customers and associates. So, bottom line, how is this impacting day-to-day operations? Where are you using AI right now, in the stores, in fulfillment, in the supply chain? What are some examples? Yeah, when you look at Element as the platform, what we've done, as I mentioned, is we've customized it to work really well within the Walmart ecosystem. So before I talk about individual applications, I just want to take a moment to talk about how it's changed the way in which we do application development as well. The platform has ready-to-use data science environments which are tightly integrated with the rest of our ecosystem, with our IDEs, with our runtimes, with our distributed frameworks, and so on. That means you can use the MLOps deployment framework and then tap into on-demand infrastructure, whether that's GPU or CPU or TPU, to enable teams of data scientists to very quickly deploy multiple models in parallel on a multi-cloud regional infrastructure in a very short time. To answer your question more specifically with some examples, I mentioned this thing called My Assistant.
A few months ago, we rolled out a feature inside the Me@Campus app called My Assistant, and we have now rolled it out to over 50,000 of our corporate employees. It's a GenAI-powered assistant that helps associates build first drafts of content faster, summarize large documents literally in a matter of seconds, and spark creativity with thought starters and ideas on topics. It can even be used, for example, for people to discover information about health benefits at Walmart. When they have a query and they ask it, rather than giving generic answers about healthcare, it gives very specific answers that highlight the features of the various options available within Walmart benefit plans. So it's really about bringing the power of all of that information to people's fingertips through applications built by leveraging the Element platform. Another example is developer productivity. Industry benchmarks have shown that developers typically spend a lot of time finding information. We built a platform called DX, for developer experience, on top of Element. Developers can now come to DX and very quickly find information that may have been curated and submitted by other developers who came before them. They can then use that to troubleshoot the issues they're seeing in their deployment process, or issues they're seeing in applications in production. So it gives us a single interface to deploy, triage, and monitor software deployments. Those are a couple of internal examples. Externally, of course, the most popular example is generative AI search. We launched that recently. It's a GenAI-powered search experience that enables customers to search for very specific use cases.
The idea is that more and more, search has become a way of doing task completion, and those tasks have gotten more and more complicated. It used to be that people would do things like ask for a recipe, and then we would automatically provide a basket of goods. Now, using generative AI, I can actually give a query that says, help me plan a March Madness watch party. What happens is that the algorithm and the search capability built on top of Element will go through literally hundreds of millions of catalog items and create a basket for you that's very highly customized. It may even have your own personal preferences on brands and so on thrown in, from the knowledge we have gathered about your shopping experiences, and it will allow you to create a very curated, personalized basket that you can then order from Walmart and enjoy at a March Madness watch party with your friends. Similarly, if you're a parent planning a birthday party for your child, you can say, okay, help me plan a birthday party barbecue, and a similar sort of experience will be provided to you. Now, as I talk about all of these experiences, the key thing to remember is that the challenge is how you actually provide very high throughput in training, because 90% of these models in search typically tend to degrade over time. A classic example is in fashion. When I search for something like hats, there is a temporality: searching for hats in the summer should give me a very different set of choices than doing the same search in the winter. So we're continuously testing our algorithms with the goal of removing what we call cognitive dissonance between the search results we provide our customers and what they expect to see.
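The temporality problem Hari describes, the same query deserving different results by season, can be shown with a toy ranker. The catalog entries, scores, and seasonal boost are all made up for illustration; real search ranking would use learned models over many features.

```python
# Toy seasonal re-ranking for the query "hats": the same catalog ranks
# differently depending on the season of the query. Numbers are invented.
CATALOG = [
    {"name": "wool beanie",  "season": "winter", "base": 0.6},
    {"name": "sun hat",      "season": "summer", "base": 0.6},
    {"name": "baseball cap", "season": "all",    "base": 0.5},
]

def rank(query_season: str):
    """Rank items, boosting those matching the query's season."""
    def score(item):
        boost = 0.3 if item["season"] in (query_season, "all") else 0.0
        return item["base"] + boost
    return [i["name"] for i in sorted(CATALOG, key=score, reverse=True)]
```

Without the seasonal feature, both queries would return identical results, which is exactly the "cognitive dissonance" between results and expectations mentioned above.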
This of course means that our scientists have to work with non-linear models, oftentimes with hundreds of features and ever-growing feature sets, which means they have to train across extremely large clusters, involving very large data pipelines. If we didn't use a platform like Element, the experimentation cycle would be very elongated. What Element does is allow the data scientists to focus on just building models. They can test these models iteratively, run multiple iterations in parallel, and compare and contrast the models, because there's a built-in workflow engine in Element that allows you to run multiple models in parallel. Then they can compare those results and decide which version of the algorithm they want to deploy. All of this used to take arguably months of effort. We are hoping to squeeze it into weeks and days of effort now. And over time, we'll get better and better at it and fine-tune it more and more, so that our goal is literally to have our data scientists be able to roll out multiple new models on a daily basis, if you will. Amazing. When you look out, of course, your customer experience is so important, and your gen AI search is an example of that. What other innovations do you see? When you look out a few years, how do you believe AI will change the customer experience? So from a customer experience perspective, we believe generative AI is going to have a transformative effect on our customers. It's going to be totally transformational in many ways. I think the customer and member unlock is huge. It's going to change how we engage with our customers and our members. It's going to enable personalization.
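The parallel experimentation workflow described above, running model variants side by side and comparing results, might be sketched like this. The "models" here are trivial scoring functions and the held-out data is invented; a real workflow engine would orchestrate distributed training jobs, not threads.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(model):
    """Score one model variant on a shared held-out set; return (name, error)."""
    name, fn = model
    holdout = [(1, 1), (2, 4), (3, 9)]            # (x, y) pairs, invented
    err = sum(abs(fn(x) - y) for x, y in holdout)
    return name, err

# Two competing variants, evaluated in parallel against the same data.
variants = [
    ("linear",    lambda x: 3 * x),
    ("quadratic", lambda x: x * x),
]

def best_variant():
    with ThreadPoolExecutor() as pool:
        results = dict(pool.map(evaluate, variants))
    # Compare the parallel runs and pick the lowest-error variant.
    return min(results, key=results.get)
```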
What's going to happen is that at one level, you're going to be able to do personalization at very large scale, meaning personalization that scales across hundreds of millions of customers doing multiple interactions with us each week. So the scale is going to be tremendous. Yet at the same time, we're going to be able to leverage generative AI applications, built using the Element platform and the capabilities it offers, to deliver a literally super-custom, super-personalized experience for you. Now, it's obviously hard to predict exactly how the future is going to play out, and remember, we weren't even talking about generative AI as little as a year ago. But the areas where I see the greatest impact on customer experience are going to be around experiences that are highly personalized, interactive, and voice-based, and that are also going to become more and more media-rich. It's going to be hyper-personalization, where you, Paul, or I, Hari, will get exactly what we want, and we'll get experiences super tailored for us, whether it's in apparel shopping or, say, a use case where I want to shop a room. Based on my purchase patterns from the past, it's going to put things together, and then it's going to allow me to maybe upload visuals of my room, and it'll give me suggestions: look, if you're looking to design a room, these are the kinds of things you ought to buy, and these are the things that go well together. And again, personalized to my own taste, personalized to what I have expressed through my shopping experiences with Walmart. So over time, it's going to allow us and our teams to evolve to the business's needs and to changing customer needs.
And so the idea of building a platform like Element is exactly that. You or I cannot predict what applications and use cases are going to be extremely popular or innovative five years from now, but we do know they're going to be based on ever-increasing capabilities of AI, and that's why building an end-to-end platform allows us to flex to different use cases that we may not have even thought about today but may become the rage six months from now, or three years from now. How about behind the scenes, the back-end operations: logistics, distribution, inventory? There are so many areas where you can achieve efficiencies. What do you see the big impacts being there? So we do use Element and AI quite extensively in all aspects of our supply chain. In our warehouses, we use it to increase the productivity of our associates through intelligent tools that take on tasks that were more mundane in nature, so associates can elevate how and where they spend their time. Last mile is a classic example. You mentioned supply chain, and one of the challenges in last mile is how to build routes for our last-mile drivers in a way that lets us optimize in three dimensions. One is driver preference: as a last-mile driver who's a gig worker, what is your preference in terms of routes? That may vary with time of day, with various considerations, with whether they want to do deliveries on their way home, and so on. The second consideration is cost, of course, and making sure that we're able to do it in the right way within the right cost envelope.
And the third consideration, obviously top of mind, is making sure that we deliver as fast as we can, so that our customers receive their goods and get packages delivered to them as fast as possible. All of this means that we have to do some very fancy routing at the last mile. And the best way to drive cost down is to densify the last mile: the higher the density we can achieve in the last mile, the lower the cost. Which means I have to marry all three of these considerations together to create optimum routes, and I have to do it at the scale of literally hundreds of thousands of drivers every day. So that's again a use case where we have leveraged Element. Typically the problem is solved by a combination of heuristics, classical optimization solutions, and machine learning solutions. For the machine learning portion, we have used Element extensively, and what we've been able to do is build an algorithm that supports all three of these things: it supports driver preferences, it lets us densify appropriately in the last mile to keep cost down, and it lets us deliver orders to our customers on time, as fast as we can, so they in turn have delightful experiences. One logistical complexity is running networks, and of course you run a huge global network. Do you see applications of AI to improving the quality of network operations and network reliability? Where is the opportunity there? Oh, absolutely. And I'm really glad you brought that up, Paul, because my role at Walmart is to lead all of the platforms, the network infrastructure, and the devices that span across all of our facilities, stores, distribution centers, and fulfillment centers, and this is something I personally pay a lot of attention to, in part because many of these things, as you're well aware, have a very high blast radius.
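Stepping back to the last-mile example: the three-way tradeoff described above (driver preference, cost via route density, delivery speed) could be sketched as a weighted score over candidate routes. The weights, fields, and normalization are assumptions for illustration; the real system combines heuristics, classical optimization, and learned models.

```python
# Illustrative scoring for candidate last-mile routes. All weights and
# normalizations are invented, not Walmart's actual optimizer.
def route_score(route, w_pref=0.4, w_cost=0.3, w_speed=0.3):
    density = route["stops"] / route["miles"]     # denser route = cheaper per stop
    return (w_pref  * route["driver_pref"]        # 0..1 match to driver's stated preference
          + w_cost  * min(density / 2.0, 1.0)     # density normalized to 0..1
          + w_speed * 1.0 / route["eta_hours"])   # faster delivery scores higher

def pick_route(candidates):
    """Return the id of the best-scoring candidate route."""
    return max(candidates, key=route_score)["id"]
```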
Meaning that changes are deployed centrally to literally hundreds of thousands of devices spanning multiple nodes. What AI built on top of the Element platform has allowed us to do is locate hairline cracks, for example, in the environment before there is a disruption to network services. Given the scale of our network, we're literally aggregating tens of thousands of signals from applications and infrastructure devices. These devices might be switches or routers, or they could be other kinds of devices like firewalls. We're aggregating tens of thousands of these signals from all of these devices and then putting them through AI algorithms built on top of Element, which then allows us to pinpoint and address issues, or hairline cracks as we call them, before they become major disruptions. The idea is to incorporate AI models in our network operations center and continuously monitor our network. Over time, what we're hoping to do is consolidate and aggregate these signals into a customized model, which then allows us to proactively repair and harmonize the network. At the end of the day, if you think about it, the signals coming out of these networks are data. And as I talked about earlier, with the technology we've built using Element on top of a triplet cloud strategy, we have platforms that allow us to process data at very large scale, very effectively. So by reducing the problem of network operations to yet another data insights and analytics problem, and then asking how to apply AI to solving that kind of problem, we're able to monitor our networks much more effectively, troubleshoot more proactively, and harmonize them over time much more effectively as well.
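The "hairline crack" detection described above, flagging a device whose latest signal deviates sharply from its own recent history, could be sketched with a simple z-score heuristic. The device names, signal values, and threshold are invented; a production system would use learned models over many correlated signals.

```python
import statistics

def hairline_cracks(history, latest, z_threshold=3.0):
    """Flag devices whose latest reading is a statistical outlier versus
    that device's own recent history (z-score heuristic, threshold assumed)."""
    flagged = []
    for device, series in history.items():
        mu = statistics.mean(series)
        sigma = statistics.pstdev(series) or 1e-9   # avoid divide-by-zero
        if abs(latest[device] - mu) / sigma > z_threshold:
            flagged.append(device)
    return flagged
```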
By the way, this is something that we've just started to do, and I expect that we'll accelerate the effort in the coming months. As we learn more and as our data scientists become more and more proficient at iterating fast on top of Element, I expect this process to accelerate. Now, you've taken the bold step of launching a customer-facing large language model, which a lot of companies are nervous about doing because of the risk of hallucinations and errors. What provisions do you have in place to prevent bias and just crazy behavior from infecting these models? Yeah, that's a great question. Early on as we built Element, we were very particular and keen that we do everything we can within the platform to mitigate the impact of biases and hallucinations. It's always a risk, as you're aware, with generative AI. So we've done multiple things. Again, the benefit of having a platform like Element is that we're able to consolidate all of that intelligence into one platform, and then all of our developers and all of our data scientists benefit from it, and therefore all of the applications we build on top of the platform benefit as well. Within Element, we have something called an LLM evaluator microservice. What that does is use open source model evaluation libraries to compute model metrics, and it runs these evaluations against multiple LLMs. The model metrics include typical performance metrics, but also bias metrics like toxicity, regard, and HONEST scores, along with evaluations for hallucinations and trustworthiness. The idea is that you reduce the risk by creating a controlled environment, by creating a consistent platform, and by imposing rigorous quality standards on the data going in and out.
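To illustrate the shape of an evaluator service like the one described above, here is a toy version that scores each model's outputs on a single metric and aggregates per model. The keyword-based "toxicity" function is a trivial stand-in; a real evaluator would call open source metric libraries (toxicity, regard, HONEST, and so on), not this heuristic.

```python
# Toy stand-in for an LLM evaluator microservice. The metric below is a
# keyword heuristic, purely for illustration of the aggregation pattern.
TOXIC_WORDS = {"hate", "stupid"}

def toxicity(text: str) -> float:
    """Fraction of words that appear in the (assumed) toxic-word list."""
    words = text.lower().split()
    return sum(w in TOXIC_WORDS for w in words) / max(len(words), 1)

def evaluate_models(outputs):
    """outputs: {model_name: [generated strings]} -> mean toxicity per model."""
    return {m: sum(map(toxicity, texts)) / len(texts)
            for m, texts in outputs.items()}
```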
And then by leveraging this LLM evaluator microservice, what we do is first run it on a curated set of data for different use cases within Walmart. We also allow teams to bring in their own data sets and run these evaluations against a custom data set. So over time, as the model itself learns, it has the nice effect of mitigating the impact of hallucinations and biases, and that in turn increases the trust in the applications built on top of Element. And so far, so good. By the way, the other thing we do, which I mentioned early on when I talked about Element, is prompt engineering. Prompt engineering, combined with retrieval augmented generation, or RAG, is becoming more and more popular as a way to control hallucinations, because it allows you to ground the results by providing additional context, which in our case is context that is very relevant and apropos to Walmart. I have to ask you this, because it's been a topic of continuing debate. You're one of the largest employers in the world, and the debate is whether AI will ultimately displace jobs or create jobs. What's your opinion? So if you think about our mission, and if you think about who we are as a company, we are an omnichannel retailer. We are people-led and tech-powered, with the mission to save people money so that they can lead better lives. Our mission of serving customers has always started with our own people. That's why we say we're people-led. Ultimately, we believe technology empowers people, including our own associates, so they can evolve physically demanding jobs into higher-skilled, fulfilling jobs. And we do provide our associates with training and pathways to fill in-demand roles that are critical to our business both now and in the future.
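Returning to the retrieval augmented generation technique mentioned above: the grounding idea can be sketched with a minimal retrieve-then-prompt pipeline. The document store, the keyword-overlap retriever, and the prompt template are all illustrative assumptions, not Element's actual implementation.

```python
# Minimal RAG sketch: retrieve the most relevant document by keyword
# overlap, then ground the prompt in that retrieved context.
DOCS = [
    "Walmart benefit plans include a vision option.",
    "Store hours vary by location.",
]

def retrieve(query: str, k: int = 1):
    """Rank documents by word overlap with the query; return the top k."""
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(DOCS, key=overlap, reverse=True)[:k]

def grounded_prompt(query: str) -> str:
    """Build a prompt that constrains the model to the retrieved context,
    which is what keeps answers grounded and reduces hallucination."""
    context = "\n".join(retrieve(query))
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"
```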
So if you look at the history of technology innovation, advances in tech have almost always created more new jobs. The jobs may be different, for sure, compared to what people do today, but in many ways tech has always been a rising tide that lifts all boats. Ultimately, I think AI will also allow associates to focus on engaging with our customers, in our stores and online channels, by focusing on higher-order tasks. And over time, I think we will continue to have the same number of associates or more, because what will end up happening is that we will have more and more applications built using AI on top of Element, which will in turn allow our associates to perform a very different combination of roles than they do today. Ultimately, they'll be serving our customers better. We'll be innovating faster, and therefore serving our customers better, and therefore growing our business. So I really think AI is going to empower our people. We'll continue to be people-led, but we will also continue to use technology, including applications built on platforms like Element, to empower our associates to do more for our customers and for our business. Great story, stirring words. I'm going to go right now and plan my March Madness party using your Gen AI. Awesome. Hari Vasudev, EVP of Global Tech Platforms at Walmart. Terrific story. Thanks for sharing so many details. We will be watching closely to see how this plays out for us as customers, but also to see how it changes your company at Walmart. Thanks so much for joining us today. Thank you, Paul, for having me. Have a great weekend, and enjoy your March Madness party. For SuperCloud Six and theCUBE, I'm Paul Gillin. We'll be right back.