Welcome to SuperCloud 6, AI Founder's Day. I'm Hawei Xu. I'm a serial entrepreneur and have been an executive in the AI and cloud space for a very long time. Today I'm welcoming two distinguished guests, and I want to discuss AI and the cloud. So Amr, you are the founder and CEO of Vectara, and Yangqing, you are the founder and CEO of Lepton AI. I'd like the two of you to first introduce yourselves to the audience. You are very well known in the space, but maybe you can tell us a little bit about why you started these companies, because you have done a lot in the past. Let's start with Amr. Yeah, this is actually my third startup. I had a couple of startups before. I had one startup that I started in 1999 that was acquired by Yahoo in 2000. I then spent eight years at Yahoo, until 2008, and my career shifted more towards big data, machine learning, and data science. Out of Yahoo, I started Cloudera, because I saw an inflection point with new technologies like Hadoop that enabled us to spread computation over many, many servers (we called them pizza boxes back then) and do very high-performance computing tasks leveraging commodity hardware as opposed to building a supercomputer. So that was my second startup, Cloudera. Cloudera went all the way to being a public company, and a couple of years ago it was acquired by a private equity firm. After Cloudera, I joined Google Cloud for two years as vice president of developer relations. While I was there, I got to experience the very first large language models they had internally, a system code-named Meena. I got to have many conversations with Meena, about two years before ChatGPT came out. And it was very clear to me that, one, this is going to change the world. This is truly AI at a level we have never seen before.
But second, it hallucinates and makes up stuff all the time, which means you won't be able to use it in business without a solution that keeps these hallucinations and factual inconsistencies under control. And that was the premise for why Vectara was created. Vectara is about trusted GenAI: how can we have large language models that we can trust to be part of our business? Cool, we'll get more into that, because there are a lot of technical details I would love to dive into, right? Because, like you said, hallucination is a hard problem to solve: how do you solve it, and how big a problem is it? But before that, Yangqing, tell us a little bit about yourself and how you got here. Yeah, thanks for inviting me. I'm Yangqing, and I'm right now running a small startup called Lepton AI, which actually started only last year. I started doing AI in my PhD days back at Berkeley. Back in 2012, Alex Krizhevsky had the famous AlexNet paper come out, and everyone was basically looking for software. Back then we wrote a small piece of software called Caffe, which was relatively popular in those days. Then my career went from doing AI research to doing AI systems research, quote unquote. I've been through Google and Facebook, where I ran Facebook's AI infra, and then in the last four years I ran Alibaba's big data and AI org as a VP and general manager. One thing we saw very clearly is that the AI computation pattern is very different from conventional cloud and data infra. It's more similar to old-school high-performance computing or scientific computing, if you think about it: very highly connected networks, very high performance, high utilization of the machines, and things like that. It's no longer just moving data around, moving videos and images and text around; it's all about crunching floating-point numbers as fast as possible.
In fact, just yesterday NVIDIA talked about their newest, biggest computer, which looks like those supercomputers that used to do weather simulation, physics, and all those kinds of things, right? So at Alibaba Cloud, we started seeing a surge in the new hardware and software architectures that allow us to run large language models and all those other AI models, both training and inference. And we figured, similar to what Cloudera did in the big data world, that AI is very quickly becoming the third pillar of IT infrastructure. Cloud started out basically serving web traffic, and it does that really well. Then data infra deals with the large amounts of data accumulated via web services, and today Snowflake and Databricks are the new generation of that. And then I believe AI is guaranteed to be the third pillar. Definitely, thanks so much. So the third pillar would be the Leptons and Vectaras of the world. Hopefully, yes. By the way, it's very interesting: we actually have a common investor, Fusion Fund. Fusion Fund is great, yes. Fusion Fund is an investor in both of us. Nice, nice. So, you know, the two of you have been PhDs in this space for a very long time, so let's get into the weeds a little bit. Last December, CNBC had an article whose title was something like: 2023, a healthy profit year for NVIDIA, and lofty experiments for the rest. So with that, you have to ask: why so many experiments? People have been talking about GenAI applications for a year and a half or so, but not too many are deployed in production. So there has to be some complexity, some difficulty, right? You mentioned that you started Vectara after seeing hallucination and all those things. Can you give us a rundown?
Why is it so hard to productionize GenAI applications? Yeah. So first I'll note, it's a natural cycle that the hardware vendors lead the curve before the rest of the market catches up and even exceeds them at some point, in many ways. You mean market cap? Yeah, collectively, collectively. Hopefully Vectara alone can reach that level of market cap. But look, we've seen this story before. If you look at the beginnings of the internet, companies like Sun Microsystems and Cisco had valuations that were through the roof, and they were selling left and right. Cisco was the largest market-cap company in 2000. And they're still not back to the level they were at in 2000, which, again, is why we have to be very cautious investing in NVIDIA, I would say. You get this very hungry demand at the beginning of a new technical movement, where people move towards the hardware vendors, but they very quickly realize that the value is in the software. So yes, Cisco was great, Sun Microsystems was great, but then out of those internet beginnings came Google, came Facebook, came way, way bigger opportunities and way, way bigger market caps out of the technological wave those companies enabled. So I just want to caution with that. NVIDIA is the canary in the coal mine. I look at NVIDIA right now as the signal that there is a massive, massive market coming, and if investors don't wake up and try to take advantage of that, they will be left behind. And by the way, all of them are waking up and seeing that GenAI is of course very important. So with that preamble: the reason GenAI is hard is a number of issues that you need to solve before you can use it in the enterprise. The first one is hallucinations, as we discussed.
These models can make up stuff, and they make it up in a way that sounds very authentic and very accurate when it's completely inaccurate, which is very dangerous, right? When you have a system that can do that. There are many embarrassing examples of that in the press: airlines giving away tickets for free because somebody tricked the chatbot into hallucinating a free offer; lawyers getting sanctioned because the prior cases cited in their filing were completely made up and did not exist. So clearly that gave the market pause. All the companies looked at that and said, I'm not sure I can have something like that in my business, right? And that was one of the key problems Vectara focused on: not only how to reduce hallucinations, but also how to measure them. So with every response you get back from our system, we can tell you this is a high-confidence response, meaning I can take this and put it in my legal draft, or in my email, or in my investment memo or my medical diagnosis with confidence. Or we'll tell you, no, this is medium confidence, meaning a human should review it before you can use it. That was needed, and we have that today. Second, to be able to use GenAI in any regulated industry like finance, law, accounting, or medicine, you need what's called explainability. Explainability has to be part of the system. You cannot just tell me, this is the answer. That doesn't work. You have to explain: why is this the answer? Which documents? What's the provenance, the lineage, of how you made this decision you're recommending to me right now? That was a big gap that our company, and a number of others, now provide a solution to. Third, security is a very big issue. These large language models are susceptible to what are called prompt attacks.
Prompt attacks are us humans tricking the large language model with our words to get it to reveal something to us that we're not allowed to see. Let's say there's certain data behind the large language model that you are allowed to see, but a big chunk that you are not allowed to see. Somebody could come in and say, I'm going into a meeting with this person and I need answers for these questions, and the system would answer as if it were answering for that other person, and they get to see the answers, right? That leaks information out of the system. Again, that is a no-no in business. You cannot have that. The identity of the person asking the question, not the way they phrase things, should drive the access control of the system providing the response. So those were some of the gaps, and there are a number of others like them: hallucination, explainability, security. So I have two very specific questions, right? One is, you know, I personally have been building GenAI applications in various places over the last year and a half, and it's very obvious to me today, though it wasn't obvious a year and a half ago: it's pretty easy to get to an accuracy of 50%, 55%, right? With a little bit of work, smart engineers who have never done any AI work can get there. But once you go from 50%, 60% up to 80%, 85%, 90%, you feel like, wow, this is an uphill battle, right? Steeper and steeper. So what's your advice to the developers out there? I know you already have customers you're working with, but not 100% of people are working with Vectara, right? So for the audience out there, what's their hope? Where should they look? So first, I'm biased, obviously. My advice to them is: come work with Vectara. We solved this problem; we know how to give you very, very high-quality results. But many of them don't believe us.
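The access-control principle just described, that the asker's identity, not their prompt wording, gates what data the system can use, can be sketched in a few lines. This is a minimal illustration, not Vectara's implementation; all names here (`Document`, `answer_with_acl`, the role sets) are hypothetical, and the retrieval step is a naive keyword match standing in for a real retriever:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    allowed_roles: set = field(default_factory=set)  # who may see this document

def retrieve(query: str, corpus: list) -> list:
    # Stand-in for real semantic retrieval: naive keyword matching.
    return [d for d in corpus if any(w in d.text.lower() for w in query.lower().split())]

def answer_with_acl(query: str, user_roles: set, corpus: list) -> str:
    # Enforce access control on the user's identity, never on prompt wording:
    # restricted documents are filtered out BEFORE any LLM could be asked about them,
    # so no clever phrasing can leak them into a response.
    candidates = retrieve(query, corpus)
    visible = [d for d in candidates if d.allowed_roles & user_roles]
    if not visible:
        return "No accessible documents match this question."
    # In a real system the visible documents would be handed to a generative model.
    return " / ".join(d.text for d in visible)
```

The point is where the filter sits: a prompt attack can only manipulate what happens after `visible` is computed, and by then the restricted documents are already gone.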
And whoever is watching SuperCloud gets a discount. Ha ha. The problem is, many of them don't believe us when we say that, right? Because there are so many tools out there that make it simple to build a prototype, and they think that because they can build a prototype, they can make it work in production. So we warn them. With many of our customers, the way it works is we have the first meeting and they say, oh no, we can do this ourselves. And we say, sure, go ahead, try to do it yourselves. You're going to come back and complain about many things. First, you're going to complain that the quality is not very good. Garbage in, garbage out, as we say: the results you're getting back are not good results. Second, you're going to have hallucinations. Third, you're not going to be able to explain the response. Fourth, you might have copyrighted material in the response. We highlight all of this to them, and then we let them go fail. So they spend a couple of months building it themselves. They think they're doing a great job. They show it to their business users. Their business users start to use the system. It's a sexy demo. Yeah, but once they use it in production, they're like, what is this? This system is inaccurate. It's giving wrong answers all the time. And that's when they come back and say, okay, can we please work with you? Can you show us a solution that works? The nutshell answer for why it's hard is that across the pipeline of building an efficient, what we call RAG pipeline, a retrieval-augmented generation pipeline, beginning to end, there are many, many models. It's not just one model you need to get working, right? There are many models that you have to get working right, in unison, with proper feedback and proper backpropagation across them.
So it's very, very hard for the average system developer or average engineer to fine-tune a system like that. In fact, even for ML experts it's hard to tune. By the way, just about Vectara, right? You do the RAG system, the vector database, the embedding models. Do you do your own large language model as well, or do you just use the frontier models, the OpenAIs of the world? Excellent question. The answer is yes and no. RAG is retrieval-augmented generation, so there's more than one model, as I said. For the retrieval, we have our own model called Boomerang, and Boomerang is one of the top models in the world at retrieving text based on the meaning behind the text, not keyword matching. But you still need to do some keyword matching as well, so we actually run a hybrid between Boomerang and keyword matching to get the best results. We have another model that does the scoring of the results: once you get the results back, you need to correlate them with the prompt and rank them in the right order, best result first, then second, then third. And you need to calibrate the score, because that's how you're going to decide: no, I should not answer this question, I would be making stuff up, versus yes, this is a high-confidence result, I can use it to answer the question. Then, once you have that, you feed it into the generative model. For that we use other generative models: we start with an existing one like Llama, we fine-tune it, and that's what we use there. It's foolish to go and build a foundation model from scratch that costs $30 million to build, so why not leverage open source when it comes to that? And then the output of that goes to another model that we built ourselves, the hallucination evaluation model. It's the top model on Hugging Face right now if you search for hallucination.
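The multi-stage pipeline just described (retrieve, rank and calibrate, generate, then check consistency) can be sketched as a toy. To be clear, this is not Vectara's code: the bag-of-words "embedding" stands in for a real model like Boomerang, `generate` stands in for a fine-tuned LLM, and the final cosine comparison stands in for a trained hallucination evaluation model.

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words "embedding"; a real system uses a learned semantic model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na, nb = sqrt(sum(v * v for v in a.values())), sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rag_answer(query, corpus, generate, min_confidence=0.5):
    # 1. Retrieval: rank documents by similarity to the query.
    scored = sorted(corpus, key=lambda d: cosine(embed(query), embed(d)), reverse=True)
    facts = scored[:2]
    # 2. Generation grounded in the retrieved facts (generate() is a stand-in LLM).
    draft = generate(query, facts)
    # 3. Consistency check: how close does the draft stay to the retrieved facts?
    confidence = max(cosine(embed(draft), embed(f)) for f in facts)
    if confidence < min_confidence:
        return None, confidence  # abstain rather than risk a hallucination
    return draft, confidence
```

Even in this toy form you can see why tuning is hard: the retriever, the ranking, the generator, and the scorer all interact, and a weakness in any stage degrades the final answer.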
And what that model does is factual consistency checks. That's your model? Yes. And that's also open source? Yes, we open-sourced that model. So that model reads the output of the system and compares the output back with the facts that were retrieved, and that's how it gives you the score saying this is a high-confidence response, this is a medium-confidence response, or this is a low-confidence response. So one more quick question, right? You mentioned you give a confidence score, but in the enterprise world, statistically correct still is not enough, right? Even if something is 98%, 99%, or 99.5% correct, it's not enough for me, because I want this to be 100% right, 100% correct. It depends on the use case. You're right, but it depends on the use case. For example, in a medical kind of situation, the threshold is very high, and that's why we calibrate the thresholds to be very high in that case. But in that case, do you think that's still a GenAI use case? Yes, still a GenAI use case. But you have a trade-off now, because if you raise your bar on the accuracy of the results, there may be fewer questions you're able to answer; you will abstain from answering more questions. So there's a trade-off between these two things, right? If you say, I want 100% accuracy, then out of 100 questions you could answer, maybe you'll only answer 60% of them, because for the remaining 40 you're not sure you're going to achieve the 100%. Versus if you're using it for a marketing use case, publishing marketing copy, well, it's okay to have 5% error; marketing people make up stuff anyway. And hallucination is a feature anyway. Exactly. So there it's okay to let my threshold for accuracy be 80%, so I can answer 100% of the tasks I'm trying to do. So it depends on the use case. The good thing is we give our customers the ability to balance these things, right?
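The trade-off just described, that a higher confidence bar means answering fewer questions, is easy to see with synthetic numbers. The scores below are made up purely for illustration; real per-response confidences would come from a consistency model:

```python
def coverage_at_threshold(confidences, threshold):
    """Fraction of questions the system answers (rather than abstains on)
    when only responses scoring at or above `threshold` are allowed out."""
    answered = [c for c in confidences if c >= threshold]
    return len(answered) / len(confidences)

# Synthetic per-response confidence scores from a consistency check.
scores = [0.99, 0.97, 0.95, 0.90, 0.85, 0.80, 0.72, 0.65, 0.55, 0.40]

# A strict, medical-grade threshold answers far fewer questions than a
# relaxed, marketing-grade one: exactly the trade-off described above.
print(coverage_at_threshold(scores, 0.95))  # prints 0.3
print(coverage_at_threshold(scores, 0.80))  # prints 0.6
```

Calibrating the threshold per use case is how a single system can serve both the "100% or abstain" and the "80% is fine" worlds.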
So they can control them depending on the use case they're after. We'll come back to more details. Yangqing, so Amr just gave us the rundown of the complexity of writing GenAI applications, right? Once you write one, you still have to deploy it, and deploy it at scale, right? There's a lot of complexity over there. Can you give us the pain points on that side of the world? Right. I think the community has actually gone very far in terms of optimizing the runtime and things like that. There are libraries coming from, say, Berkeley, my PhD school, called vLLM and things like that. Basically, what we are seeing is, one, how do we reliably run those models at scale and at high performance? And the other is, how do we establish a way to evaluate what is the best or most economical way of doing it? It's kind of similar to how we could already launch rockets 50 years ago, but only today has SpaceX made it really economical at scale. High scale, right. So yeah, a few challenges there. One is that the hardware has been evolving really quickly, to the extent that the software has yet to catch up, like you said. Yesterday, NVIDIA said they have the new DGX running at one exaFLOP. Now, in 2017 we wrote a paper back at Facebook called Training ImageNet in One Hour; that was one of the fastest approaches at the time. The total compute in that hour is about one exaFLOP, meaning that in theory, just in theory, right? Yeah, in theory you could train that model in one second. Now of course, we know that's probably not possible right now, mainly because smashing all that software, millions or tens of millions of function calls and things like that, into that one second is extremely difficult.
So today, when we look at fast runtimes for LLMs, we actually do quite a lot of work, such as batching requests together, and speculative decoding, meaning using a very small model to predict what the next word should be and having the larger model just do verification instead of generation. We need to pack multiple CUDA calls into one single CUDA call so things get more efficient, and those CUDA kernels, as we call them, don't have to wait for each other inside the driver. All those kinds of things have made us able to run something like 10 times more efficiently than some of the vanilla setups, and maybe three to five times faster than the best open-source solutions. There's also quite a lot of tooling around those runtimes. It's not just one single engine: you could smash this engine into one big box of eight GPUs and get the fastest speed, for example, but then it's going to be a little bit luxurious. If you can even find eight GPUs. Exactly, if you could even find them, because GPUs are in such shortage right now. Imagine driving a Lamborghini to get a nail from Home Depot; it doesn't make sense. And one of the key challenges right now in AI is, like you said, the applications do not yet generate a lot of money. It's not like one LLM question-and-answer is going to bring in $10 of revenue. So basically we have to run it in a very economical way, because a lot of the applications are about productivity, not really generating direct revenue. It's increasing the productivity and efficiency of people who used to be doing those kinds of tedious, repetitive jobs, which is a good thing. So how do we run AI models at scale under the SLA requirements? Not infinitely fast, but fast enough, and then cheap enough.
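Speculative decoding, as mentioned above, can be sketched with toy deterministic "models" (here just Python functions over integer token lists; real systems use a small and a large neural network, and the verification is one batched forward pass). The names and the toy models are illustrative only. A key property the sketch preserves: the output matches what the large model alone would produce, because the draft model only affects speed, never the final tokens:

```python
def speculative_decode(prompt, draft_model, target_model, n_draft=4, max_new=12):
    """Toy speculative decoding: the cheap draft model proposes n_draft tokens,
    the expensive target model verifies them in one pass, and we keep the
    longest agreeing prefix, taking the target's token at the first mismatch."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        proposal = draft_model(out, n_draft)   # cheap guesses
        verified = target_model(out, n_draft)  # one "forward pass" of the big model
        kept = 0
        for p, v in zip(proposal, verified):
            if p != v:
                break
            kept += 1
        out.extend(proposal[:kept])
        if kept < n_draft:                     # mismatch: accept target's correction
            out.append(verified[kept])
    return out[len(prompt):][:max_new]

def target(seq, n):
    # Deterministic stand-in "large model": each next token is previous + 1.
    toks, last = [], seq[-1]
    for _ in range(n):
        last += 1
        toks.append(last)
    return toks

def sloppy_draft(seq, n):
    # Stand-in "small model" that gets its second guess wrong every time.
    toks = target(seq, n)
    toks[1] = -1
    return toks
```

With a perfect draft the target's tokens come out four at a time; with the sloppy draft, progress is slower, but the final sequence is identical to what the target alone would decode.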
That's pretty important. So we built this cloud-native platform allowing people to naturally evaluate and find the best balance. Of course we have the fastest runtime, but what I'm most proud of is that along with the Lamborghini we also have the Camry, in some way, right? People can choose along that spectrum, find the best way to launch their models, and then decide how to do it. So we have the cloud-native platform to let people seamlessly scale things, monitor performance, and make sure all that good old SRE kind of work can be done without too much effort. So one thing I'm curious about: what's the alternative for developers out there, right? In Amr's world, as we discussed, there are plenty of options: there's LangChain, there's LlamaIndex, you can build it yourself, right? In your world, what is the alternative if they don't work with you? Great question, yeah. So there are honestly quite a lot of really good solutions out there. One year ago, what people had to do was basically find a GPU, SSH into it, install PyTorch and all those kinds of things, start running the model themselves, and hope it didn't go down. Today, first of all, there are quite a lot of OpenAI-compatible APIs. If people are already using OpenAI, these APIs have the exact same function signatures; you just need to change the URL and you're good to go. And the price has, interestingly, raced to the bottom, especially for the public models such as Mixtral or Llama. So for experimentation it's really good. A lot of the tools, such as LangChain, LlamaIndex, and the Vercel AI SDK, already have these providers integrated.
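The "just change the URL" point can be illustrated offline. This sketch builds an OpenAI-style chat-completions request without sending it; the second base URL and model name are hypothetical stand-ins for a compatible provider, and the API key is a placeholder:

```python
import json

def chat_completions_request(base_url, model, messages, api_key="sk-..."):
    """Build (but don't send) an OpenAI-compatible chat request. Switching
    from OpenAI to a compatible provider changes only base_url and model;
    the endpoint path and payload shape stay the same."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {"Authorization": f"Bearer {api_key}",
               "Content-Type": "application/json"}
    payload = {"model": model, "messages": messages}
    return url, headers, json.dumps(payload)

messages = [{"role": "user", "content": "Hello!"}]

# Same call shape, two providers: only base_url and model differ.
openai_req = chat_completions_request("https://api.openai.com/v1", "gpt-4", messages)
other_req = chat_completions_request("https://api.example-llm.dev/v1",  # hypothetical
                                     "mixtral-8x7b", messages)
```

In practice you would POST the payload with any HTTP client, or point an existing OpenAI SDK at the alternative base URL; that compatibility is why the price race to the bottom works in developers' favor.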
In addition to that, what we allow people to do is run their dedicated models, either fine-tuned or trained from scratch, efficiently as well, with their own hardware and software strategies. It's kind of similar to how, if people used Cloudera or Hortonworks, they would be able to define their own table architectures and run their own SQL, but with a very worry-free platform to help them do all that experimentation and productization. So I want to add something on this. One of the biggest concerns that many organizations have is lock-in. They're very worried about lock-in right now. With data you have some lock-in, but you can always move your data somewhere else. It's expensive, but it's not impossible. But with fine-tuning, if I'm fine-tuning a model at OpenAI, and OpenAI holds the weights of the model, and I stop working with them, then my business stops, because I have to go fine-tune from scratch somewhere else. So the lock-in factor is huge, and that's concerning a lot of the large enterprises, and that's why they're leaning more towards: no, we want to build and run and train our own models, so we can control the weights of these models and hence control our destiny. So they start with Llama or Mistral. Or Grok. And Google has Gemma as well. Gemma, yeah. Yeah, there are so many good options thanks to all of these companies. So are the two of you seeing people move more and more towards open models now? Yes and no. I mean, GPT-4 is still the best. I'll be honest with you: in terms of hallucination rates, in terms of accuracy, reasoning, and planning, GPT-4 is still the best of the best. So it depends, exactly like the analogy that Yangqing gave: do you want the Ferrari? It depends on the use case.
For some use cases, you really need the Ferrari. Medical, for example. For some other use cases, where you're serving some questions on your website, the Camry is good enough, and the open-source models will work fine. I love the analogy; I'm going to be using it. And by the way, that's actually the big issue right now in the industry: the inference cost of running these models. Many people don't realize that cost is very high. Just running them, forget about training, which is high as well, running is going to be ongoing forever. As Yangqing pointed out, most companies have not collected the new revenue yet, right? So inference cost matters. You have to lower the inference cost as much as you can. At Vectara, we spent a lot of time optimizing our entire stack on the inference side, leveraging certain chips from AWS called Inferentia, which bring down the cost of inference significantly. But that's the Camry option, right? The cheaper option. So you still have to balance these things, and if you're not cognizant of that, the cost of the system is not going to work for the use case you're after. And this is where the solutions you provide are very, very important. We can be a great help there, basically, yeah. So even though the two of you are doing the software stack, you've been computer science PhDs for a long time. Now you see this craziness around NVIDIA, with the stock price as well as, more importantly, the demand for GPUs. Do you see the demand shifting? You kind of mentioned this a little earlier, right? There's a build-out in the early days. But do you see alternatives, whether AMD or other hardware, becoming practical? What do you see? I think there's definitely an opportunity for hardware vendors to diversify.
The reason is, if I were to answer this question 10 years ago, I would say NVIDIA has absolute dominance, mainly because people really don't think about hardware as just hardware. People think about hardware as hardware and software combined. Correct. In the CPU world, there's x86 and so on, so we don't really worry too much about what kind of instruction set is down there; we worry about the C language, or Python, or Java, right? And in the AI world, 15, or actually 20, years ago, NVIDIA started this CUDA library. It's a very C-like language that average software engineers can learn in an hour and then start writing their first parallel program. And that was awesome, right? And it's not about anyone being awesome; it's that there was an excellent software abstraction that allowed people to interface with this hardware really efficiently and effectively. So when you wrote Caffe, it was on top of that. It was completely on CUDA. The whole AI software stack today, for better or worse, is built on CUDA. Yeah, NVIDIA's dominance is because of CUDA. Exactly, yeah. It's not because of the silicon. Well, actually, the silicon wasn't even the fastest at all, but CUDA is the dominant factor. Yeah, exactly. The stickiness comes from CUDA. Exactly. So that's why, when people were writing AI programs and things like that, CUDA was the limiting factor. Well, not limiting factor, sorry: CUDA was the barrier preventing others from coming into the field. Now, today things are slightly different, mainly because people are moving up the stack. Instead of writing, say, convolutions or other components of those models, people say, I just want my model to run.
And inside that, there are quite a lot of optimizations one can do without disturbing the development flow of the algorithm engineers, right? So over the years, people have been catching up with NVIDIA by providing CUDA-compatible drivers, SDKs, and things like that. Notably, AMD has this library called ROCm, with a tool called HIPify: they take the CUDA code and HIPify it so it can run on AMD hardware, right? Now, of course, I think there are still quite a few months, or years, I'm not quite sure, hopefully things go fast, before people are comfortable with the compatibility layer. These kinds of things have happened in the past with varied success. For example, with Java, Oracle and Google had that famous dispute, which effectively made it an open standard. There are also hardware plays that didn't pan out, such as Transmeta, which basically tried to emulate x86, right? So I think it's an open field. There's so much interest today that something good is surely coming out of it. Yeah, so I would say two things. First, the need for semiconductors that can do AI at scale will continue for many, many years to come. That market is absolutely going to be one of the healthiest markets in the future. There's no question about that. But that's different from whether NVIDIA can be the dominant player in that market forever. That's a very big question. The thing holding back the other solutions today is exactly having a good compatibility layer that can span them. You mentioned a couple of good options. Another one is Mojo, right? Mojo is great, yeah. Mojo is going to add an abstraction layer that makes it very easy: I program once, and I can deploy on AMD, and I can deploy on other kinds of architectures as well. There are a number of startups, as you know, building very, very cool stuff in that space.
My prediction, and I could be wrong, is that NVIDIA will get NVIDIA'd. What do I mean by NVIDIA will get NVIDIA'd? There was a company called Silicon Graphics. Do you remember Silicon Graphics? Silicon Graphics was selling these $20,000, $30,000 workstations for doing graphics work. They were very expensive; they were charging money left and right; they were doing very well. And then NVIDIA came in and said, forget about that, here's a very cheap $300 graphics card. And Jensen saw the opportunity 30 years ago. Yeah, yeah. Here's a $300 graphics card that you can plug into a PC, and you have a more powerful workstation than what Silicon Graphics sells you, for a fraction of the price. Essentially NVIDIA killed Silicon Graphics. So that's what I mean when I say NVIDIA will get NVIDIA'd: there will be a company, or a bunch of companies, that come out because they see the opportunity in the silicon, and that create very, very good, if not better, solutions for doing AI on semiconductors. And at that time, NVIDIA's market share will get corrected a bit. Right now, we're at the beginning of the bubble, exactly like what happened back then with the internet, and that's why you're getting these very rich, frothy valuations, but that will converge toward what makes sense in the future. As long as we figure out the abstraction layer that makes it easy to deploy what we're working on across these different architectures, and not have this CUDA lock-in to NVIDIA as the only choice. And both of you think the CUDA lock-in will disappear in the next few years? Because startups will innovate and provide the compatibility layer. People may not be comfortable with it today, but they will be. Yeah, Intel is not standing still. ARM is not standing still. AMD is not standing still. They all know this is where the money is, so they're all building amazing new things on the software side that we'll see very soon.
Like the ones you hinted at about AMD. Absolutely, yeah. I like the analogy. Who is to NVIDIA today what NVIDIA was to SGI back then? So history will always repeat. Like, you see Groq, there's a semiconductor company called Groq. If you're following them, they seem to be doing very well. Pretty good. Absolute fastest speed in the market right now. On the inference side, they're just amazing. And then, what's the name of the company that makes the mega, mega servers? The servers, yeah. So there's a company called Cerebras. They make these chips this big, full of cores, and they seem to be doing very well as well. So I think there's a number of really good solutions coming up which need better programming interfaces against them. Right, right. I think also the hardware landscape has a lot of parameters to explore. So Cerebras is basically saying, instead of building smaller chips, why don't we just go big, really big, right? And Groq is basically saying, why don't I use SRAM and a lot of fast compute? And there's another company called SambaNova, which is basically saying, why don't we just have a large amount of memory alongside fast compute? With all those parameters to think through, one company alone wouldn't be able to explore them all. And surely there's going to be some interesting results coming out of it, right? Yeah, I mean, Jensen actually recently said he thought CUDA would be an overnight success in 2006. But of course, it took a few more years. Those companies you mentioned, right? They may not be that thriving today, but there are enough opportunities for them. Yeah, there's no question that the opportunity is there. The only thing I would not debate is: is the semiconductor market for AI going to be flourishing for many years to come? 100%, 100%, right? So history will repeat. So speaking of history, right? We are kind of in a different phase for the cloud, right?
Yangqing, you and I had a conversation before about what the new cloud is, right? Because it will be quite different from the cloud we knew for the last 10, 15 years. Can you share with the audience your view about what the next-generation cloud is going to look like? Right, to answer this question, we probably need a brief review of what the cloud did, right? So basically in the very old days, compute was high-performance computation. You had data centers for HPC clusters and things like that. The cloud came in the web service, or internet, era, basically saying: instead of having all that crazy compute, what we really need is cheap hardware, and to move data around as fast as possible. If you think about the largest cloud provider today, it's Amazon Web Services, right? So the cloud basically provides two key value propositions. One is supply: I'm going to be able to give you an elastic amount of small CPUs. I'll basically break down those big CPU boxes using virtualization and things like that, and then the supply of CPU machines is very elastic. The second one is that the cloud allows people to install, acquire, and run software as easily as possible. And by software, I mean all that middleware software: load balancers, message queues, small databases, log stores, and things like that. So the cloud is basically saying, hey, web developers, since I'm saving you a lot of money and giving you elasticity, how about I charge a huge premium on top of that? And people are like, that's fine, because it saves me people time, right? Now today, AI is coming back as an HPC kind of world that disrupts these two key value propositions.
On supply, we all know that the GPU supply chain is tricky, and it's not just about buying small pieces of things. GPUs, for lack of a better word, today cannot really be virtualized into small chunks of compute, and you can't really repurpose GPUs for other things like, say, databases. They're so specialized that people are starting to be very careful in the budgeting, planning, and operation of these GPU machines. As a result, elastic supply might come in the future, but it doesn't apply immediately right here, and the cloud surcharge right now would be a little bit tricky to justify. Software-wise, all that middleware is not really much needed. I mean, it's going to be needed down the road, but not much of it is needed on GPUs, because it's JBOG, just a bunch of GPUs, right? The IO isn't too big: you basically have a few kilobytes coming in, tens of seconds of compute, and then a few kilobytes going out. So it's more about optimizing and running the software specifically — PyTorch, Hugging Face Transformers, FasterTransformer, FlashAttention, and all those things — in that box. So the easy way to acquire and operate software isn't a big problem anymore; it becomes a different problem: how to run that software more efficiently. And as a result, what we see is alternative cloud providers coming up, especially on GPUs. There's Lambda Labs, there's CoreWeave, there's ML Foundry, and a lot of others that I forgot to mention. Their challenge, of course, is that they don't have a cloud-native layer, right? It's basically raw machines, similar to the old-school Slurm clusters and things like that. So I think the future of the cloud is going to be a hybrid of both. On supply, more varied suppliers will be able to give you the most cost-effective and easy-to-plan GPU resources. On top of that, the cloud-native layer isn't going away anytime soon, because we don't want to go back to dealing with 64 IP addresses, right?
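The "few kilobytes in, tens of seconds of compute, few kilobytes out" pattern can be made concrete with back-of-envelope arithmetic. A rough sketch, where the model size, memory bandwidth, and bytes-per-token figures are all illustrative assumptions rather than measurements:

```python
# Back-of-envelope sketch of why a GPU inference box is all number
# crunching and barely any IO. All numbers are rough assumptions.
def inference_profile(prompt_tokens, output_tokens, params, mem_bw):
    bytes_in = prompt_tokens * 4            # rough: ~4 bytes of text per token
    bytes_out = output_tokens * 4
    weight_bytes = params * 2               # fp16: 2 bytes per parameter
    # Batch-1 decoding is memory-bound: each generated token re-reads the
    # weights, so latency ~ weight bytes / memory bandwidth, per token.
    seconds = output_tokens * weight_bytes / mem_bw
    return bytes_in, bytes_out, seconds

bi, bo, secs = inference_profile(
    prompt_tokens=1000, output_tokens=500,
    params=70e9,          # a hypothetical 70B-parameter model
    mem_bw=2e12,          # ~2 TB/s of HBM bandwidth, assumed
)
print(f"{bi/1024:.1f} KiB in, {bo/1024:.1f} KiB out, ~{secs:.0f} s of compute")
```

Under these assumptions the job is a few KiB of IO wrapped around roughly half a minute of compute, which is why middleware built for moving data around matters so much less here.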
So that layer is exactly what we have been building, and we see people happily utilizing this — what we call the multi-cloud strategy — to optimize their AI hardware and platform budgeting and operation really well. And of course, a fast runtime and things like that, as we mentioned. So I think the cloud-nativeness of running things is here to stay. And then clients will have more and more flexibility, or sovereignty, to own their own hardware and software and the whole stack. So basically, from your perspective, the workloads are already very different from what we ran on the cloud for the last 15 years. And then the technical demand, right? With GPUs, you really need to schedule things in and out fast. It's different, but it's still needed, though. I mean, I would say, as you correctly said at the beginning, AI is the third pillar. It is the third pillar. So we're just at the beginning of it right now, right? That's why there is still a big learning curve going on, and there are still lots of software services being built that will make it more uniform and more reusable. And the supply problem will be solved over time, right? So I'm a big believer the cloud is here to stay. But hybrid will also be here to stay. In addition to the other reasons that you mentioned, there is also the concern about lock-in, which I mentioned earlier. When it comes to AI and training models, there is a concern that if I'm using that cloud to do my training, I'm locked into that cloud. No, I want to own this myself in my own hybrid environment. So that is a big concern that I see creeping up on us in the future.
And then if you have any specialized AI models being built for certain use cases unique to you, then you might be worried, from a differentiation point of view, about having those done in the cloud. So Amar, you founded Cloudera, right? Cloudera is probably the first Silicon Valley startup with "cloud" in its name. So I'm pretty sure when you started the company, you were thinking about the cloud a lot, right? With a name like Cloudera, I have to believe that's the reason. What is the key difference between the cloud that you imagined back in 2008 versus today? Because you have a lot of perspective, you know? Yeah, I mean, first I would say the genesis of Cloudera was actually on-premise software, not cloud software. But the goal was — Why did you name it Cloudera then? Because the goal was to enable organizations to be cloud native, meaning they have architecture inside their organization, in their own data centers, that scales up and scales down. And the developers are thinking in a serverless way. They don't have to think about how many servers they need to finish a job. They just think about the job, and then the cloud-native infrastructure takes care of handling that for them. So essentially, Cloudera was about bringing the cloud benefits — Cloud benefits, cloud characteristics. — to you on-premise. And you have to recall, Cloudera was starting in 2008, when the public cloud was still not a thing, actually. It was still being formed. Like, it was a toy. Yeah, exactly. It was an experiment back then. Today, clearly the public cloud, for all the great reasons Yangqing mentioned, is winning. There is no question about that. It gives you lots of elasticity. You can go and ask for a thousand servers, and you can get those thousand servers within a few minutes. You can program across many, many different types of microservices available to you. There are so many benefits to working in the cloud.
At Vectara, we are 100% cloud. We don't have any on-premise; all of our infrastructure is running in the cloud. So you have way more conviction in the public cloud this time around. The only caveats I would give: first, as I mentioned, lock-in is a concern. Second, if you have a specialized use case, the cloud might not be ready for it, because the AI market is still in its infancy. And third, unfortunately, de-globalization is affecting the cloud right now. And what I mean by de-globalization is that the lack of trust between different countries is starting to really kick in, especially after the Russia-Ukraine war, where we in the US shut down many services to Russia, right? Including cloud services. So many other countries across the world right now are worried: should we be using US clouds — Amazon and Google and Microsoft — when they could turn that against us if we don't agree with them? Or is it like this? Yeah, yeah. So there's some tension happening in the world right now because of that. There are some countries in the European Union, for example, that are mandating that the big cloud providers build cloud versions that are managed by somebody else in the country, just in case there's a conflict in the future. Then they can tell them, okay, goodbye, we're going to manage this cloud ourselves. Don't give us the new versions of the software, but we can take this now and run with it. So that's my only concern with the health of the cloud in the future: whether we as a world can go back to trusting each other versus distrusting each other. Right, in some ways it's kind of a de-cloud, or a de-public-cloud, sort of wave with that. De-globalization, de-cloud, that's it. Right, yeah, yeah, right. So that's the cloud, right, you know. I'm a believer in the cloud, though. I want to say I'm a believer in the cloud.
You believe in the public cloud way more than a decade ago, but we are in a very different world — geopolitics, all those kinds of things. So, okay, we talked about building AI, we also talked about the cloud, the future of the cloud. So let's talk about open source, right? Both of you have years of experience in open source, right? You know, Hadoop, right? In some ways, Hadoop was such an important stack for the entire Silicon Valley, right? And you are the person who first commercialized it. I'm pretty sure you learned a lot from that journey. Can you share with the audience the kind of learning you had, and how you apply that learning in your new company? This is your third company; how do you apply that learning in this company? Yeah. So it's very simple. Open source is very useful for distribution, meaning getting attention and getting developers using you and leveraging you. It's very good for that, right? It gives you almost free marketing, you can think about it that way. It can be very viral as well, which can allow you to grow very quickly. That's the main benefit of open source, to be honest. You get a secondary benefit, which is that some of these developers might contribute to the code and help you make it better. But that's a distant second. It doesn't always happen, to be very blunt. At Cloudera, we still carried most of the workload of making sure the system evolved in a healthy way. Same thing with Red Hat and Linux: they carry most of the workload for Linux. Now, the problem with open source is two-fold. First, there are some big companies, without mentioning names — big cloud vendors, the name starts with an A — that we call vampires. They just take, take, take. They never give back, right? They just take the open source.
As soon as you put something out in open source, they launch a competing version in their cloud, and you can't compete with them. You're going to say, I'm going to be more secure? No, you're not going to be more secure than them. I'm going to be more reliable? You're not going to be more reliable than them. They own you; they have the underlying infrastructure. So it's very hard to compete with them. And they are very good at scaling it, exactly. So immediately you put yourself at a disadvantage as soon as you do that. The other concern, after this primary one, is that your customers become your competitors with open source. Many times we would have customers deploying tens of thousands of nodes of our Cloudera open source, but they would tell us: instead of paying you $5 million per year to maintain the software, we're just going to hire three or four engineers and have them do that. So now your customer becomes your competitor; they're saying, we'll run your software ourselves. Or, we'll go get Accenture and have Accenture do this for us instead. So the key point is that it's very hard to monetize open source because of these effects. You have to be very careful about what you open source and what you don't. I used to be a fan of what's called open core, an open source model where your core is open, but then you have things around the core, like management and security, that are not. I have now shifted my view on the right model for monetizing open source. By the way, you shifted it because of all the reasons you mentioned. Yes, yes, because of all of my experience with Cloudera, I shifted my view. Now it's: to create a successful business around open source, you need to be open perimeter, not open core. Your core needs to be closed, and then you have things around your perimeter which are open, that help you build the community.
If you open source your core, then that big company that shall not be named, or your customers, will become your competitors very quickly. Very interesting. So as an example, what sort of things did you do? Yeah, so one of the things we open sourced is our hallucination evaluation model, which sits at the periphery of our system. It's not in the core of our system; it's just one of the pieces that make up the entire platform. Now, what's the value for you in this case? The value is the distribution, the marketing. You get attention, you get lead generation. You get developers knowing your brand. As I mentioned earlier, the hallucination model we released is the number one model on Hugging Face right now: if you search for hallucination, it's the number one model. We also released a leaderboard which ranks all of the large language models by their hallucination rates. It's now becoming the industry benchmark for how other companies, when they release their generative models, benchmark their hallucination rates. So that accrues back to you as a company: a reputation that you're good at that thing, right? And developers begin to know that, hey, if I need to reduce hallucinations, I can go grab the model from Hugging Face and try to do it myself, or I can just plug in Vectara and it works out of the box right away. So it brings you the distribution without giving away your core value, which is the end-to-end platform that you have built. So you shifted from "I open source the core and figure out some monetization" to "you start with closed-off things, but selectively, thoughtfully open source certain components to still get attention and marketing value out of it." Two very different models. Exactly. Very interesting. Yeah. Again, that's my opinion. It's an experiment; we'll see if it works. There are many examples of it working.
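A leaderboard that ranks models by hallucination rate is, mechanically, simple to picture. A minimal sketch with made-up model names and judgments, where each judgment marks whether a generated summary was factually consistent with its source document:

```python
# Minimal sketch of a hallucination leaderboard: given per-summary
# judgments (True = factually consistent with the source document),
# compute each model's hallucination rate and rank ascending.
# Model names and judgments below are invented for illustration.
def hallucination_rate(judgments):
    return 1 - sum(judgments) / len(judgments)

judged = {
    "model-a": [True, True, False, True, True],    # 1 of 5 hallucinated
    "model-b": [True, False, False, True, True],   # 2 of 5 hallucinated
    "model-c": [True, True, True, True, True],     # 0 of 5 hallucinated
}

leaderboard = sorted(judged, key=lambda m: hallucination_rate(judged[m]))
for model in leaderboard:
    print(f"{model}: {hallucination_rate(judged[model]):.0%}")
```

The hard part in practice is producing those consistency judgments at scale, which is exactly what an evaluation model automates.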
Again, Cloudera was limited in how much it was able to grow because of this. Cloudera right now is, I think, around two billion in ARR every year. Red Hat, which was pure open source, was also limited in how much it could grow. But look at other companies that did the open perimeter, like Datadog. At Datadog, all of the connectors you build — the libraries to connect and instrument your flow into Datadog — are open source and free; you can go see them on GitHub and grab them. But the core of the Datadog platform is completely proprietary. Look at the valuation of Datadog as a business and how much money they're making. It's way, way higher. So the evidence is leaning toward this thesis I have, that open perimeter is a more successful business model than open core. Nice. Yangqing, you also have a lot of insights here. When Amar mentioned open perimeter, I was thinking about a few things you talked to me about. Can you share them with our audience? I absolutely agree that the open core model is going to play out very differently now. In the past, when we thought about deploying software and making a business around open source software, the idea was: the core is open, the toolchain is open, and then we sell, basically, SREs — operations. Operations is tricky and dirty, and people don't want to do it. What happened in the last 15 years is that cloud providers just made it really easy, and cloud-native software such as Kubernetes made it really easy. Also, companies such as AWS, and the other "A" that I served in the past, were able to reduce the cost drastically because of scale, and because we sell hardware and things like that, we could massively subsidize the software-side cost, right?
And as a result, open core as a business no longer applies, because copycats — whoever is able to just take the open source and reduce the cost of SRE — are going to win that small margin. So open perimeter is definitely one really interesting way to go: building solutions that are still tedious and difficult for people to run themselves, and having special knowledge about, for example, how to make sure hallucination is properly monitored. The other thing, as a runtime-system provider, that I see is that instead of open core, it becomes open standard. Think about Databricks, which is massively successful. They started with Spark, and everyone has Spark, right? EMR and things like that, on every single platform. And Databricks, instead of continuing to go open core, basically said: you know what, the standard, the interface, is the same, but we have a much faster runtime called Photon, and this is our key advantage over the open source versions. The open source version is still great if you want to run it, and at low cost it's still okay, but if you go with my proprietary version, I'm going to be three to five times faster, right? And I think a lot of the open source companies today are starting to do this. In the AI world, Mistral is doing the same as well. There are the open source Mistral models, and then they say, we have a larger, more intelligent model that is proprietary — that's Mistral Medium and Large and things like that, right? Basically, they use open source in the classic way — to attract traffic, marketing, market-leadership positions — and then they say, we have an even better version.
Now, this of course brings a lot of challenge to these companies, because they need to produce an open source version that is at, or hopefully slightly above, the market standard. And on top of that, they need to have a proprietary version that's even better, right? So that is much more difficult than the open core model, but it's good for society, because it pushes things forward. So I think open perimeter is basically the application-side story, and on the system side, the similar story is open standard. I'm going to call that dual core. Dual core. I mean, you have two cores: one core which is open — that's what you give the market — and one core which is way better, that is proprietary, and that's how you pull in the market. So, multi-core with the same interface. Exactly, yeah, right. And with that, you feel like that is the better model. It poses some challenge in terms of — it doesn't pose a challenge, it demands more innovation, right? It demands more innovation, exactly, right. The good thing, the difference between this and the old completely proprietary software, is that the lock-in effect gets smaller, right? It's just a cost play. You are paying intelligent people for their hard work, and if you don't really want to, then you can have a slightly less performant version that is still open. You're not going to kill your business by not going that way. And that gives people peace of mind, and it's more collaborative than the old proprietary software, where if the provider company dies, then all hell breaks loose, right? So how do you apply this philosophy in your own business? Yeah, right. So we've actually built a really fast runtime for LLMs, AIGC, and things like that, and the interface is completely compatible with, say, the OpenAI standard, which arguably is open, right? Even though OpenAI's runtime is closed.
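Being "OpenAI-compatible" concretely means accepting the same chat-completions request shape, so any existing client works by swapping only the base URL. A minimal sketch of such a request body using only the standard library; the endpoint URL and model string below are hypothetical placeholders, not real resources:

```python
import json

# What "OpenAI-compatible" means in practice: the endpoint accepts the
# same chat-completions request body, so clients written against the
# OpenAI API work by just pointing at a different base URL.
base_url = "https://example-endpoint.example.com/api/v1"  # hypothetical

request = {
    "model": "mistralai/Mistral-7B-Instruct-v0.2",  # a Hugging Face model string
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello."},
    ],
    "max_tokens": 64,
}
body = json.dumps(request)

print(f"POST {base_url}/chat/completions")
print(body)
```

Because the body is identical across providers, switching runtimes becomes a configuration change rather than a code change, which is the point being made here.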
And there are also other open source software, such as vLLM, which also provide these interfaces. So the interface is exactly the same. For example, on Lepton, you can just give me the Hugging Face model string, and it automatically gives you an OpenAI-compatible endpoint that you can run against. Now, if you don't really want to go with Lepton, that whole interface is exactly the same elsewhere, right? So we can pull up a GPU, and then we provide value and make a business because we have a high-performance, more stable runtime. So I think that's the way. We're still thinking about whether we want to contribute back some of our pieces, because we love open source — I'm also from Berkeley and things like that. We're trying to find the balance, mainly because it's a people play; as a small startup, we feel it's irresponsible to throw things over the fence and not maintain them, right? So we're working with open source communities such as vLLM to see what the best way is for us to contribute. But at the end of the day, we want to reach a position where we help the open source community — "help" is a bit of a self-regarding word — where we contribute back to the open source community by raising the bar of the technical stack, while at the same time keeping an advantage by always staying ahead. So before we finish, I wanted to ask a few questions that everyone is discussing inside the community, right? AGI: when is AGI going to happen? Any take on that? 2029. 2029 — which month? So why 2029? That's what most of the trends and data are pointing towards.
If you look at the rate of growth, how it's accelerating over the last few years — and also Ray Kurzweil, who predicted when it would happen: I think his original prediction was in the mid-2030s, and he just revised his prediction recently and said 2029. That's what he predicted. Yangqing? I would probably give a slightly controversial answer, which is: never. The reason is, I think intelligence has always implicitly been defined as anything that is not artificial. So in that sense, AGI is never going to be just general intelligence. I've also worked with some Berkeley psychology professors, really smart leaders in this field. And what we found is that there are quite a lot of things that are not just computation. It's more about our experience and interaction with the world and other people, things like that. Emotion. Emotion, and also the social contract that's only built through interactions. Even how we define things: what is a cup? Is a cup a black cup? It's like that old classical question — is a white horse a horse, right? All those kinds of things are built through human interactions. Now, I'm sure in the future there's going to be a robot society where they figure out their own social contracts and things like that. But I kind of feel that with general intelligence, humans always have a little bit that goes beyond pure computation. And yeah, I'm hopeful. So that's AGI. What about — you know, Jensen Huang advised kids not to learn coding the other day. What's your take? Should people still learn coding? I think he wasn't saying that; I think it's a mischaracterization. I listened to that segment of his, and I think his key advice is not "don't learn coding." His key advice is: the new generation needs to learn how to learn.
What's more important is to learn how to learn, because you will need to re-adapt very quickly. We had the luxury in our generation that we could keep the same job for 40 or 50 years. They might not have that; the job might disappear within their lifetime. So they will need the ability to learn something new and use it. I agree. I think his emphasis really is that coding is not the only thing you need to learn. If you only know coding, you can get a job at Google or Facebook — those days are going to be gone soon. I agree with you. But in terms of the advice, in terms of what to learn — anything from your point of view? So first, learning to learn is a skill set for all of our kids; we have to be very focused on giving them that curiosity and hunger to find out how things work, and to learn things for learning's sake. And to really refine that skill, that you're never going to stop learning. Your entire life, you need to be learning, adapting, and evolving. That was true before, and it's going to be way more important in the future. So that's my key advice, the summary advice that I would give. Now, on the coding question: the advanced coders, meaning the system architects, the senior developers — no, we still need those. The technology we have today is not good at that. It's okay, but it's not really good at that. Building a small function here, a small function there, even a group of functions — yes, that's going to go away. Which means that for the younger generation, it will be much harder to find jobs because of that. The more experienced generation will still have jobs for a number of years, because that's where the architecture, the higher-level concepts, come into play, and maybe new algorithm design or whatever. But for the younger generation that was simply doing implementation of functions and small projects, yeah, that is going to go away.
So from that point of view, AI is replacing some developers in the near future. Many developers. Many developers. Many developers. Yangqing, your thoughts? I have mixed feelings about this, and I definitely agree with the learn-to-learn argument. One thing, society-wise, is that in the past, a lot of these job changes happened over multiple generations of people. For example, farming is no longer one of the biggest employment categories in the US anymore, but that happened over a process of 150 years. And even then, we've seen horrible stories in history books about how people lost jobs and things like that. Today, AI is evolving so fast that things we learned 20 years ago are very quickly becoming irrelevant. Now, it's fine to switch job categories over multiple generations of people. But if I know that right after I graduate from college my skill set is already irrelevant, that brings a lot of uneasiness, a lot of uncertainty. And I think as a society, we probably need to think more about this. In general, well, it's a smart society; hopefully we figure it out. Do you see AI replacing developers already, soon, or not anytime soon? I mean, it's replacing half of me for sure. Which, for me, is a good thing, because I have time to think about other things — architecture, applications, and things like that. But yeah, it's definitely something that — But you see that it's already reducing junior developers' jobs today? I think it is. Although we've got smart people, so Lepton isn't endangered. But yeah, definitely as a market it's tricky; it's changing very fast. Well, it's been a fantastic hour. Any last-minute, last-second word that you want to leave with our audience? You want to go first? I don't know, yeah — embrace the future. It's really great. Embrace AI? Embrace AI, yeah.
No, I would say again: if you are trying to implement AI within your organization and you have been relying on your team, but you're now very worried about hallucinations because you're seeing them, you're worried about copyright infringement, you're worried about bias, you're worried about security — then please reach out to Vectara. So thank you, Amar and Yangqing. We had such a wonderful conversation about why it's so complex to write gen AI applications and deploy them at scale, right? What the next generation of the cloud and its workloads will look like. Strategy in open source. And of course, last but not least, AGI, and whether AI is replacing developers or not. Thank you so much. It's been a wonderful hour. Thank you. We really enjoyed the conversation. Thanks so much, that's great. Okay, thank you everyone for watching the SuperCloud 6 AI Founder Panel.