Welcome back to Las Vegas. We're here live in the Sugarcane, and I'm really pleased to have Sanjeev Mohan, who is the founder and principal analyst at SanjMo, his own firm. He's a collaborator of theCUBE Collective. It's great to see you. We're here all week at re:Invent. We're here just today at the Sugarcane, MongoDB at the Emerald Lounge, we're calling it, right? They gave us a nice in-kind contribution so we could do our editorial. And I'm really excited to talk to you about your impressions so far of re:Invent. We're going to get into some of the details around the data platform. We haven't talked much today about zero ETL, something that was announced last year; sounds like it's finally coming to help simplify the pipeline. And I want to talk about Mongo and what they're doing with vector search and what the impact is going to be on the market. So first, your impressions of the overall event. It's very positive. I think this has been an amazing event up to this point. A lot of announcements. In fact, we've been doing a lot of one-on-ones with the product teams. Right. Literally, if I talk about Redshift, just on Redshift there could be a whole keynote. Yeah, that's true. You know, there are things that are not even mentioned. For example, we learned today that on Amazon S3, which was one of the very first services way back in 2006, a new storage class was introduced, S3 Express One Zone, which is like 10X faster than S3 Standard. But what was not mentioned was that S3 has something called Access Grants, which do very low-level, fine-grained access control at the S3 level. Not even mentioned. Nobody's mentioned it because there are so many announcements. It's like a bat phone to S3. Yeah. So if you start looking into the details, Amazon has really, really innovated across the board. Everybody wants to hear GenAI, but it's across the stack. Well, okay. So let's talk about the data platform. Yeah.
I mean, essentially you have, and Amazon's leaning into their strategy of many different tools, the right tool for the right job, many different data stores. Yeah. How do they bring that all together in a way that they have business metadata, operational metadata, technical metadata, so that these copilots, or in their case Q, can operate and take action? Right. So they can make sure that the semantics are coherent and consistent. Right. How do they do that? We haven't seen that. It's not here today. Nope. It's not here. You have to focus on that. I, yes. In fact, we have asked AWS that question, and they don't think that's needed, because they're going through an integration strategy. Let me give you an example. DynamoDB is a hugely successful, extreme-scale key-value data store. DynamoDB is used hugely. Every time you go on Zoom and you log in, that's DynamoDB. We use DynamoDB; CrowdChat runs on DynamoDB. Yeah. You know, amazon.com, there's so much. So, MongoDB has introduced vector embeddings inside the JSON document. DynamoDB does similar stuff, but instead of putting vector embeddings inside DynamoDB, they have actually extended zero ETL to OpenSearch. OpenSearch, as you know, is an open-source fork of Elasticsearch, which already has inverted-index search. Now it has vectors, so you can even do semantic search. Right, so. But it's configuration. You don't have to do anything, because it's zero ETL. Okay, so zero ETL they announced last year for Aurora to Redshift. Aurora MySQL. Aurora is MySQL, right? Well, there's Aurora Postgres and Aurora MySQL. Right, okay. So what they did this year is add Aurora Postgres. And? And DynamoDB. DynamoDB. And RDS. Yes, RDS MySQL. MySQL, yes. Okay, so what does that mean for customers that are running a data pipeline where they're doing a lot of ETLing, a lot of standing around waiting, a lot of wrangling of data, what does that mean for them?
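The DynamoDB-to-OpenSearch pattern described here ends with a k-NN (vector) query against the replicated index; the zero-ETL replication itself is pure configuration, not code. As a minimal sketch of what that semantic-search step looks like, assuming a hypothetical index with an embedding field named `vec`:

```python
# Sketch: a k-NN (semantic) search body for OpenSearch, as you would use it
# once zero ETL has replicated DynamoDB items into a vector-enabled index.
# The field name "vec" and index name "orders" are hypothetical examples.

def knn_query(query_vector, field="vec", k=5):
    """Build an OpenSearch k-NN query body for semantic search."""
    return {
        "size": k,
        "query": {
            "knn": {
                field: {
                    "vector": query_vector,  # embedding of the search text
                    "k": k,                  # number of nearest neighbors
                }
            }
        },
    }

# With the opensearch-py client you would then run something like:
#   client.search(index="orders", body=knn_query(embedding))
body = knn_query([0.1, 0.2, 0.3], k=3)
print(body["query"]["knn"]["vec"]["k"])
```

The point of the conversation holds in the sketch: the replication side needs no code at all, and the query side is ordinary OpenSearch k-NN syntax.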
Yeah, so what it means is, we just talked about DynamoDB. The way you get to DynamoDB is literally by doing gets and puts. So if you want to do a data warehouse kind of an aggregation... It's an object store. Yeah, how do you do that? But now with zero ETL it becomes easier, because now you can replicate into Redshift, no code, just configuration, and it's done for you. And all of a sudden your data is available in Redshift for aggregation. Okay. Let's see, I want to ask you about the criteria that you wrote. You did a Medium post where you were talking about the various criteria to evaluate AI generally, but specifically vector databases. Yeah. And that functionality. Mongo, they haven't announced general availability yet, but they announced vector search. Yes. It was amazing to see the survey from Retool, which showed, toward the end, that 20% of 1,400 respondents actually were deploying vector databases. We were one. I was actually thrilled to see that, such a low number. We didn't take that survey, but maybe we did. Not to my knowledge. But Mongo, along with Pinecone, was the most deployed, and the NPS on Mongo was much, much higher. Right. I was shocked by that. So my question is, first of all, were you surprised by that? Number one. Number two, does it make sense for customers that are Mongo customers to just consolidate onto Mongo's vector search, and what do they lose by doing so? So I'll ask you a rhetorical question. How much do you think the vector search capability of MongoDB costs? I hope it's free. It's free. Why is it free? Because it is not a SKU. Pinecone is an entire product. Many database vendors are selling vector search as a separate capability. And I think this is where MongoDB really shines. We use Milvus; it's open source. Correct. And I don't think we pay for the Zilliz service. Right, right. We probably could or should. But you'd rather use Atlas. Yeah.
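Because the embeddings live inside the document, querying Atlas Vector Search is just another aggregation stage rather than a separate product. A minimal sketch of that pipeline, assuming a hypothetical index named `vector_index` and an `embedding` field in each document:

```python
# Sketch: a MongoDB Atlas Vector Search aggregation pipeline. The index
# name "vector_index" and field names are hypothetical examples; the
# stage shape follows MongoDB's $vectorSearch syntax.

def vector_search_pipeline(query_vector, limit=5):
    """Build a $vectorSearch aggregation pipeline for MongoDB Atlas."""
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",      # Atlas search index (hypothetical name)
                "path": "embedding",          # vector field inside the JSON document
                "queryVector": query_vector,  # embedding of the search text
                "numCandidates": limit * 20,  # candidates considered before ranking
                "limit": limit,               # results returned
            }
        },
        # Keep only the fields we want, plus the similarity score.
        {"$project": {"title": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]

# With pymongo you would then run something like:
#   results = collection.aggregate(vector_search_pipeline(embedding))
pipeline = vector_search_pipeline([0.1] * 4, limit=3)
print(pipeline[0]["$vectorSearch"]["limit"])
```

This is the "not a SKU" point in code form: vector search composes with the same aggregation framework, scaling, and distribution the core database already provides.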
So you know, the beauty of what MongoDB, and I'm not a spokesman for MongoDB, I'm just saying that MongoDB has some very distinct advantages, because their foundational database is really strong. For example, they introduced early on the separation of search nodes. So you can do separation of concerns; you don't overload. So you can scale as well. You can scale, yeah. Now the vector search can sit on a search node. So they're getting a lot of these advantages for free. So whatever goes... Extending the value of the core platform. Correct, yeah. Without having to produce a standalone, separate SKU. And, you know, the multi-region distribution, you get it out of the box. So I think MongoDB has that advantage. Do you lose something? I mean, presumably if I had to make the case for a standalone vector database, I'd say all the committers in the open-source community, all the R&D, goes into just that, so we have a better product. Same thing about time-series databases and graph databases, right? As opposed to making it part of a general-purpose database, not that Mongo's general purpose, but let's call it such. What do I lose by applying that? So I don't think you lose anything. The reason I'm saying that is because you mentioned graph databases. If you have your data in a graph database, how do you move it to a different database? You cannot. One of the reasons why graph databases have struggled is because there is no good graph data-interchange format like JSON. MongoDB uses BSON, which is a binary version of JSON. So if you have vector embeddings in a JSON document, it's not too hard to move them, right, along with the JSON. So why not? It's also hard to query graph databases. But okay, so you're an analyst. Yeah. I would presume you wouldn't be advising investors to invest in a standalone vector database company. Yeah, I think time will tell, but I don't see any reason for standalone vector databases unless you have a very specific use case.
By the way, I do have to say the graph data model has been incorporated into other databases, but you still have Neo4j, which is a market leader. So it's quite possible that in vector databases there might be one or two specialized ones that make it. But for the vast majority of use cases, it may get incorporated. Have you looked at RelationalAI? Yes. Very much. What do you think about what they're doing? So RelationalAI, actually at Snowflake's event, did an amazing demo, running inside Snowflake's container services. Right. They take Snowflake data and convert it into a graph data model with relational access. You just said the big problem with graph... Hard to query. This makes it easier to query. You could have the flexibility of SQL with the expressiveness of a graph database combined. That seems like it's very powerful. Does it work, to your knowledge? Yeah, yeah, definitely. Wow, so that could portend support for future data apps that are intelligent. So that's kind of interesting. Speaking of vector embeddings, Adam Selipsky talked about Q today, which is essentially their copilot, and he said the semantic knowledge, or maybe it was Matt Wood, I think it was Matt Wood, so the semantic knowledge comes via vector embeddings. Yes. Okay, makes sense. Yeah, yeah. But those vector embeddings, how do you take advantage of them in different data stores across Amazon? Are they separate for each? They're separate, yeah. So that's the problem we started to talk about earlier. Do they have to address that problem? You're saying that Amazon doesn't think it has to solve that problem. So this is how they're solving it. It's actually a brilliant use case. What they're saying is, let's say you are in Aurora. Okay. And you want to do any use case with an LLM. It could be summarization, sentiment analysis. So what they're saying is, you create a user-defined function, so the developer will do it; the end user doesn't need to know about it.
The user-defined function is going to make an API call to SageMaker JumpStart, which is a model garden, if you will, of Titan, which is their model, and a bunch of others: Hugging Face, Cohere, Anthropic, all of that. So you make an API call and it returns what you asked for, the sentiment analysis. So now you are saying, here is my user-defined function, do sentiment analysis on this column, and without having to do vector embeddings inside Aurora MySQL, you've done that round trip. So there's a very interesting example I learned today. In a country like India, for instance, there are dozens of languages. Dozens of languages. Yeah, yeah, just inside, right. Yeah, just in one country. So if you're a loan origination officer and you get a document in a local language, you can call an LLM to translate it into English. And they do it out of the box. So I think this is the strategy they are pursuing, through SageMaker, so Redshift ML, Aurora ML, and that way it's a different thing. How many languages do you speak? I think about four or five. Have you tested the efficacy of the translation? Not yet. No, yeah. We play a lot with theCUBE AI and the translations. They're okay. And that's for the major languages; I wonder how they do with the others. Yeah. So we had a panel with Travelers, and Travelers Insurance's CIO was telling us how they've got claims and policies and how they're bringing them together using LLMs. So I asked her two questions. One was, how do you guarantee accuracy? And she said, we have to have a human in the loop right now. The human in the loop will say yes, correct, or not, and if it's not, then they correct it. That's accuracy. My second question to her was, so what is the ROI? Are you seeing any benefit? And she said if it takes, let's say, X amount of time, they're seeing that time come down. Instant ROI, instant. Yeah, and I don't have a problem with human in the loop.
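The Aurora ML round trip described above is declared in SQL (a function aliased to a SageMaker endpoint), but under the hood it comes down to batching column values and invoking an endpoint. A hedged boto3 sketch of that call; the endpoint name and payload shape are hypothetical, not AWS's actual wire format:

```python
import json

# Sketch of the round trip an Aurora ML user-defined function performs:
# the database batches column values and invokes a SageMaker endpoint.
# The endpoint name and JSON payload format are hypothetical examples.

def build_sentiment_request(texts):
    """Serialize a batch of column values for a sentiment-analysis endpoint."""
    return json.dumps({"inputs": texts})

# The actual invocation, via boto3 (requires AWS credentials and a
# deployed endpoint, so it is shown here rather than executed):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   resp = runtime.invoke_endpoint(
#       EndpointName="sentiment-endpoint",  # hypothetical name
#       ContentType="application/json",
#       Body=build_sentiment_request(["great product", "terrible service"]),
#   )
#   predictions = json.loads(resp["Body"].read())

payload = build_sentiment_request(["great product"])
print(json.loads(payload)["inputs"])
```

The appeal of the UDF approach is that this plumbing is hidden: the analyst just calls the function on a column and never sees the API hop.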
As long as you can get that compression in time, which is exactly what you're saying. What do you want to see, last question, because I know you've got to go, what do you want to see from AWS that you haven't seen at this event? I think it's to continue on the simplification. Like, we spent an inordinate amount of time yesterday, when you were there, and this morning on silicon. Inferentia 2, Graviton 4, Trainium 2. And even Adam Selipsky said, we have an advantage over all other cloud providers; they haven't even started. That's not true. Yeah, it's not true. You know, so I think... He said that a lot in the keynote today. Yeah. This is the only place you can get this. It's like, well. We are on our fourth generation in our fifth year and others have not even started. I think there is just too much emphasis on the nuts and bolts and building blocks, and I still don't see enough on the business use case side. Interesting. I mean, but that is their sweet spot, the infrastructure. Right. And did you see the Peter DeSantis keynote last night? No, I... Well, you missed 20 minutes on synchronizing clocks across... No, I heard about it. I mean, Google Spanner solved that 10 years ago, and IBM's Sysplex as well. So... Yeah. 20 minutes. Easily, easily 20 minutes. So there's a lot... It was interesting for those of us who care about such a thing; it was educational, I should say. So talking about that. Too much time. Yeah. We also learned that some low-earth-orbit satellites have been launched, six of them. Yeah. You know, Kuiper, K-U-I-P-E-R. Kuiper. So Kuiper is now live, and the goal is 3,000-plus satellites? Yeah, yeah. So that's kind of cool. It is cool, but today we heard we will get internet into every part of the planet where it's missing. What is Elon Musk doing? Yeah, right, Starlink. Yeah. And OneWeb is another company. So again, it sounds like they are going to compete. They're competing in rockets; they might as well compete in satellites. Yeah.
Thanks so much for coming over. Thank you so much. All right, keep it right there. We'll be back for Supercloud 5. Jerry Chen is in the house; he's coming up shortly with John Furrier and myself. We're here in Las Vegas inside the Sugarcane at the Emerald Lounge. Right back.