Hello, and welcome to theCUBE's presentation of the AWS Startup Showcase: AI and Machine Learning, the top startups building generative AI on AWS. This is season three, episode one of the ongoing series covering the exciting startups in the AWS ecosystem, talking about AI and machine learning. Can't believe it's been three years since season one. I'm your host, John Furrier. We've got a great guest today. We're joined by Joseph Nelson, the co-founder and CEO of Roboflow, doing some cutting-edge stuff around computer vision, really at the front end of this massive wave coming around large language models and computer vision. The next-gen AI is here and it's just getting started. We haven't even scratched the surface. Thanks for joining us today.

Thanks for having me.

So, you've got to love it. The large language model foundation has really been educating the mainstream world. ChatGPT has got everyone in a frenzy. This is educating the world about these next-gen AI capabilities. Enterprise image and video data are all a big part of it. I mean, the edge of the network, Mobile World Congress is happening right now, this month, and it just continues to explode. Video is huge. So take us through the company: a quick explanation of what you guys are doing and when you were founded. Talk about what the company's mission is, and what's your North Star. Why do you exist?

Yeah, Roboflow exists to really kind of make the world programmable. I like to say, make the world read and write access. And our North Star is enabling developers, predominantly, to build that future. If you look around, anything that you see will have software related to it and can kind of be turned into software. The limiting reactant, though, is how to enable computers and machines to understand things as well as people can. And in a lot of ways, computer vision is that missing element that enables anything that you see to become software.
So in the spirit of, if software is eating the world, computer vision makes the aperture infinitely wide. That's the way I like to frame it. And the capabilities are there. The open source models are there. The amount of data is there. The compute capabilities are only improving annually. But there's a pretty big dearth of tooling, and an early but promising sign of the explosion of use cases, models, and datasets that companies, developers, and hobbyists alike will need to bring these capabilities to bear. So Roboflow is in the game of building the community around that capability, building the use cases that allow developers and enterprises to use computer vision, and providing the tooling for companies and developers to be able to add computer vision, create better datasets, and deploy to production quickly, easily, safely, and valuably.

You know, Joseph, the phrase "in production" is actually real now. You're seeing a lot more people doing in-production activities. That's a real hot one, and usually it's slower, but it's gone faster. And I think that's going to be more of the same. And I think there's a parallel between what we're seeing in the large language models and computer vision. And as you mentioned, video is data. I mean, we're doing video right now. We're transcribing it into a transcript, linking up your linguistics to the timestamp. I mean, everything's data, and that really kind of feeds it. So this connection between what we're seeing in the large language models and computer vision, they're coming together, kind of cousins, brothers. I mean, how would you explain it to someone? Because everyone's on this wave of watching people bang out their homework assignments and write some hacks on code with some of the OpenAI technologies. Is there a corollary directly related to the vision side? Can you explain?

Yeah. Right, as large language models are showing what's possible, especially with text.
And I think increasingly they'll get multimodal as images and video become ingested. Though there's still this core missing element of basically understanding. So the rise of large language models created this new area of generative AI. And generative AI in the context of computer vision is a lot of creating video and image assets and content. There's also this whole surface area of understanding what's already created, basically digitizing physical, real-world things. I mean, the metaverse can't be built if we don't know how to mirror or create or identify the objects that we want to interact with in our everyday lives. And that's where computer vision comes into play. And especially what we've seen at Roboflow is, a little over a hundred thousand developers now have built with our tools. That's to the tune of a hundred million labeled open source images, over 10,000 pre-trained models. And they've showcased to us all of the ways that computer vision is impacting and bringing the world to life. And these are things where, even before large language models and generative AI, you had pretty impressive capabilities. And when you add the two together, it actually unlocks these new capabilities. So for example, one of our users actually powers the broadcast feeds at Wimbledon. So here we're talking about video, we're streaming, we're doing things live. We've got folks that are cropping and making sure we look good and the audio-visual is all plugged in correctly. When you broadcast Wimbledon, you'll notice that the camera controllers need to do things like track the ball, which is moving at extremely high speeds, and zoom, crop, pan, tilt, as well as determine if the ball bounced in or out, the very controversial but critical call in a lot of tennis matches. And a lot of that has historically been done with the trained but fallible human eye.
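The in-or-out call described here is, once a detector has localized the bounce point, essentially a point-in-rectangle test against the court lines. A minimal sketch; the coordinates are illustrative and this is not Wimbledon's actual system:

```python
def ball_in_court(bounce_xy, court):
    """Point-in-rectangle test: did the ball bounce inside the court lines?"""
    x, y = bounce_xy
    left, top, right, bottom = court
    return left <= x <= right and top <= y <= bottom

# A doubles court is roughly 10.97 m wide by 23.77 m long.
court = (0.0, 0.0, 10.97, 23.77)
print(ball_in_court((5.2, 11.0), court))   # True: in
print(ball_in_court((11.5, 3.0), court))   # False: out wide
```

The hard part, of course, is the vision model that produces the bounce coordinates at 30-plus frames per second; the rule logic on top is simple.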
And computer vision is well suited for this task: to track, pan, tilt, zoom, and see the tennis ball in real time, run at 30-plus frames per second, and do it all on the edge. And those are capabilities that were kind of like science fiction maybe even a decade ago, and certainly five years ago. Now the interesting thing is that with the advent of generative AI, you can start to do things like create your own training datasets, or create logic around what to do once you have this visual input. And teams at Tesla have actually been speaking about this. Of course, the Autopilot teams are focused on doing vision tasks, but they've combined large language models to add reasoning and logic. So given that you see, let's say, the tennis ball, what do you want to do? And being able to combine the capabilities of what LLMs represent, which is really a lot of core human reasoning and logic, with computer vision for the inputs of what's possible, creates these new capabilities, let alone multimodality, which I'm sure we'll talk more about.

Yeah, I mean, it's almost intoxicating. It's amazing that this is so capable, because the cloud scales here, you've got the edge developing, you can decouple compute power and let Moore's law and all the new silicon and processors and GPUs do their thing, and you've got open source booming. You're kind of getting at the next segment I wanted to get into, which is how people should be thinking about these advances in computer vision. So this is now a next wave. It's here. I mean, I'd love to have that for baseball, because I'm always like, oh, that should have been a strike. I'm sure that's coming soon. But what is computer vision capable of doing today? I guess that's my first question. You hit some of it; unpack that a little bit. What does generative AI mean in computer vision? What's the new thing?
Because there are old technologies that have been around, proprietary, bolted onto hardware, but hardware advances at a different pace. Now you've got new capabilities, generative AI for vision. What does that mean?

Yeah. So computer vision, at its core, is basically enabling machines, computers, to understand, process, and act on visual data as effectively or more effectively than people can. Traditionally, this has been task types like classification, which is identifying if a given image belongs in a certain category of goods on, maybe, a retail site: is it shoes or is it clothing? Or object detection, which is creating bounding boxes, which allows you to do things like count how many things are present, or maybe measure the speed of something, or trigger an alert when something becomes visible in frame that wasn't previously visible in frame. Or instance segmentation, where you're creating pixel-wise segmentations, for both instance and semantic segmentation, where you often see these kind of beautiful visuals of polygons surrounding the objects that you see. Then you have keypoint detection, which is where you see athletes with each of their joints outlined; that's another more traditional type of problem in signal processing and computer vision. With generative AI, you get a whole new class of problem types that are opened up. So in a lot of ways, I think about generative AI in computer vision like this: some of the problems that you aim to tackle might still be better suited for one of the previous task types we were discussing. Some of those problem types may be better suited for a generative technique. And some are problem types that just previously wouldn't have been possible absent generative AI.
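The counting use of object detection mentioned above can be sketched from a detector's typical output shape: a list of (label, confidence, bounding box) predictions, filtered by a confidence threshold. The detections below are invented example values, not real model output:

```python
def count_objects(detections, target_class, min_confidence=0.5):
    """Count detections of a class at or above a confidence threshold."""
    return sum(
        1
        for label, conf, _box in detections
        if label == target_class and conf >= min_confidence
    )

# Invented example detections: (label, confidence, (x, y, width, height)).
detections = [
    ("person", 0.92, (34, 50, 120, 300)),
    ("person", 0.48, (300, 60, 110, 290)),   # below threshold, ignored
    ("tennis ball", 0.88, (210, 15, 12, 12)),
]

print(count_objects(detections, "person"))       # 1
print(count_objects(detections, "tennis ball"))  # 1
```

The same output structure also drives the alert-on-new-object and speed-measurement use cases Joseph lists: both are logic over boxes across frames.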
And so if you make that kind of Venn diagram in your head, you can think about, okay, visual question answering is a task type where, if I give you an image and I say, how many people are in this image, we could either build an object detection model that might count all those people, or maybe a visual question answering system would sufficiently answer this type of problem. Let alone generative AI being able to create new training data for old systems, which is something that we've seen become an increasingly prominent use case for our users, as much as something that we advise our customers and the community writ large to take advantage of. So ultimately, those are kind of the traditional task types. I can give you some insight, maybe, into how I think about what's possible today, or in five years or 10 years.

Yes, let's get into that vision.

So I kind of think about the types of use cases in terms of what's possible. Just imagine a very simple bell curve, your normal distribution. For the longest time, the types of things that are in the center of that bell curve are identifying objects that are very common, or common objects in context. Microsoft published the COCO dataset in 2014, of common objects in context: hundreds of thousands of images of chairs, forks, food, people, these sorts of things. And the challenge of the day had always been, how do you identify just those 80 objects? So if we think about the bell curve, that'd be maybe the dead center of the curve, where there's a lot of those objects present and it's a very common thing that needs to be identified, but it's a very, very, very small sliver of the distribution. Now, if you go way out to the long tail, let's go deep into the tail of this imagined visual normal distribution, you're going to have a problem like one of our customers, Rivian, in tandem with AWS, is tackling: doing visual quality assurance in manufacturing and production processes.
Now, only Rivian knows what a Rivian is supposed to look like. Only they know the imagery of the goods they're going to produce. And then between those long tails of proprietary data, of highly specific things that need to be understood, and the center of the curve, you have a whole kind of messy middle of problem types, I like to say. The way I think about computer vision advancing is basically that you have larger and larger and more capable models that eat from the center out. So if you have a model that understands the 80 classes in COCO, well, pretty soon you have advances like CLIP, which was trained on 400 million image-text pairs and has a greater understanding of a wider array of objects than just 80 classes in context. And over time, you'll get more and more of these larger models that eat outwards from that center of the distribution. And so the question becomes, for companies: when can you rely on a model that already exists? How do you use your data to get what may be capable off the shelf, so to speak, into something that is usable for you? Or, if you're in those long tails and you have proprietary data, how do you take advantage of the greatest asset you have, which is observed visual information that you want to put to work for your customers? You're kind of living in the long tails when you do adapt the state of the art for your capabilities. So my mental model for how computer vision advances is: you have that bell curve, and you have increasingly powerful models that eat outward. Multimodality has a role to play in that. Larger models have a role to play in that. More compute and more data generally have a role to play in that. But it will be a messy and, I think, long period.

Well, first of all, that's a great mental model. I appreciate that, because I think it makes a lot of sense.
The question is, it seems now more than ever, with the scale and compute that's available, that not only can you eat out from the middle, in your example, but there are other models you can integrate with. In the past, these were siloed, static, almost bespoke. Now you're looking at larger models eating into the bell curve, as you said, but also integrating with other stuff. So this seems to be part of that interaction. First of all, is that really happening? Is that true? And then two, what does that mean for companies who want to take advantage of this? Because the old model was operational: I have my cameras, they're watching stuff, whatever. And now you're in more of a distributed-computing, computer-science mindset, not just put the camera on the wall. I'm oversimplifying, but you know what I'm saying. What's your take on that?

Well, to the first point of how these advances are happening: what I was describing was almost unidimensional, in that you're only thinking about vision. But the rise of generative techniques and multimodality, like CLIP, which is a multimodal model trained on 400 million image-text pairs, will advance generalizability at a faster rate than treating everything as only vision. And that's where LLMs and vision will intersect in a really nice and powerful way. Now, in terms of companies, how should they be thinking about and taking advantage of these trends? The biggest thing, and I think it's different, obviously, based on the size of the business, whether you're an enterprise versus a startup. The biggest thing is that, if you're an enterprise and you have an established, scaled business model that is working for your customers, the question becomes: how do you take advantage of that established data moat, potentially resource moats, and certainly, of course, your established way of providing value to an end user?
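The CLIP-style image-text matching described above boils down to comparing an image embedding against text embeddings of candidate labels and picking the most similar, which is what makes zero-shot classification beyond a fixed class list possible. A toy sketch; the three-dimensional vectors are stand-ins for real CLIP embeddings, which are much higher-dimensional:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def zero_shot_classify(image_emb, text_embs):
    """Pick the label whose text embedding is most similar to the image's."""
    return max(text_embs, key=lambda label: cosine(image_emb, text_embs[label]))

# Toy embeddings, invented for illustration.
image_emb = [0.9, 0.1, 0.2]
text_embs = {
    "a photo of a shoe": [0.88, 0.15, 0.18],
    "a photo of clothing": [0.1, 0.9, 0.3],
}
print(zero_shot_classify(image_emb, text_embs))  # a photo of a shoe
```

Because the labels are just text prompts, adding a new class means adding a new string, not retraining a model; that is the sense in which such models "eat outward" from the 80 fixed COCO classes.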
So for example, one of our customers, Walmart, has the advantage of one of the largest inventories and stocks of any company in the world. And they also, of course, have substantial visual data, both from their online catalogs, from understanding what's in stock or out of stock, and from understanding the quality of things as they go from the start of the supply chain to making it inside stores for delivery and fulfillment. All of these are visual challenges. Now, they already have a substantial trove of useful imagery to understand and teach and train large models to recognize each of the individual SKUs and products that are in their stores. And so if I'm Walmart, what I'm thinking is: how do I make sure that my petabytes of visual information are utilized in a way where I capture the proprietary benefits of the models that I can train to do tasks like, what item was this? Or maybe I want to create Amazon Go-like technology, or build delivery robots, or automatically know what's in and out of stock from the visual input feeds that I have across my in-store traffic. And that becomes the question and flavor of the day for enterprises: I've got this large amount of data, I've got an established way that I can provide more value to my end customers, how do I ensure I take advantage of the data advantage I'm already sitting on? If you're a startup, I think it's a pretty different question. And I'm happy to talk about...

Yeah, what's the startup angle on this? Because they're going to want to take advantage. It's like cloud startups, cloud-native startups: they were born in the cloud, they never had an IT department. So if you're a startup, is there a similar role here? If I'm a computer vision startup, what does that mean? Can you share your take on that? Because there'll be a lot of people starting up around this.

So the startup has the opposite advantages and disadvantages, right?
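The in-stock/out-of-stock idea above can be sketched as comparing SKU counts detected on a shelf camera against the expected counts for that shelf. The SKU names and counts are invented for illustration; a real system would source expected counts from inventory data:

```python
def out_of_stock(expected, detected):
    """Return SKUs whose detected shelf count is below the expected count."""
    return sorted(sku for sku, n in expected.items() if detected.get(sku, 0) < n)

expected = {"SKU-001": 4, "SKU-002": 2}   # what the shelf should hold
detected = {"SKU-001": 4, "SKU-002": 0}   # counts from the vision model
print(out_of_stock(expected, detected))   # ['SKU-002']
```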
A startup doesn't have a proven way of delivering repeatable value in the same way that a scaled enterprise does. But it does have the nimbleness to identify and take advantage of new techniques, because you can start from a blank slate. And I think the thing that startups need to be wary of in the generative AI, large language model, and multimodal world is building what I like to call sandcastles. A sandcastle is a business model or capability that's built on top of an assumption that is going to be pretty quickly wiped away by improving underlying model technology. So almost like, if you imagine the ocean, the waves are coming in and they're going to wipe away your progress. You don't want to be in the position of building a sandcastle business, where you're betting that models aren't going to get good enough to solve the task type that you're solving. In other words, don't take a screenshot of what's capable today. Assume that what's capable today is only going to keep expanding. And so for a startup, what you can do, which enterprises are comparatively less good at, is embedding these capabilities deeply within your products and delivering maybe a vertical-based experience where AI exists in the background. We might not even think of those companies as AI companies; it's just so embedded in the experience they provide. That's the vertical application example of taking AI and making it really usable. Or of course, there are tons of picks-and-shovels businesses to be built, like Roboflow, where you're enabling enterprises to take advantage of something they have, whether that's their datasets, their compute, or their intellect. I think that the...

If I hear that right, by the way, I love that. That's horizontally scalable. That's the large language models. Go up and build the apps on them, hence your developer focus.
Sure, that's probably the reason for the tsunami of developer action. So you're saying, build picks-and-shovels tools, don't try to replicate what could be the platform play. Oh, go to a VC, I'm going to build the platform. No, no, no, no. Those are going to get wiped away by the large language models. Is there one large language model that will rule the world, or do you see many coming?

Yeah, so to be clear, I think there will be useful platforms. I just think a lot of people believe they're building, let's say, if we put this in the cloud context, a specific type of EC2 instance. Well, it turns out that Amazon can offer that type of EC2 instance and immediately distribute it to all of their customers. So you don't want to be in the position of providing something that actually ends up looking like a feature, which in the context of AI might be a small incremental improvement on the model. If that's all you're doing, you're a sandcastle business. Now, there are a lot of platform businesses that need to be built that enable businesses to get to value and do things like: how do I monitor my models? How do I create better models with my given datasets? How do I ensure that my models are doing what I want them to do? How do I find the right models to use? There are all these sorts of platform-wide problems that certainly exist for businesses. I just think a lot of startups that I'm seeing right now are making the mistake of assuming the advances we're seeing are not going to accelerate or even continue.

So if I'm a customer, if I'm a company, say I'm a startup or an enterprise, either one, same question: I have developers working on stuff, and I want to stand up an environment to start doing stuff. Is that a service provider? Is that a managed service? Is that you guys? So how do you guys fit in when your customers are leaning in? Is it just for developers?
Are you targeting them with a specific managed service? What's the product consumption? How do you talk to customers when they come to you?

The thing that we do is give developers superpowers: to build automated inventory tracking, self-checkout systems, to identify if this image shows a malignant or benign cancer, to ensure that the products I've produced are correct, to make sure that the defect that might exist on this electric vehicle makes its way back for review. All these sorts of problems are immediately able to be solved and tackled. In terms of the managed services element, we have solutions integrators that will often build on top of our tools, or we'll have companies that look to us for guidance, but ultimately the company is in control of developing, building, and creating these capabilities in-house. I really think the distinction is maybe less about managed service versus tool, and more about ownership in the era of AI. So for example, if I'm using a managed service, and part of that managed service's benefit is that they're learning across their customer set, that's a very different relationship than using a managed service where I'm developing some amount of proprietary advantage from my own datasets. And I think that's a really important thing that companies are becoming attuned to: the value of the data that they have. And so that's what we do. We tell companies: you have this immense, proprietary treasure trove of data; use it to your advantage, and think of us more like a set of tools that enable you to get value from that capability. The HashiCorps and GitLabs of the world have proven what these businesses look like at scale.

And you're targeting developers. When you go into a company, you're targeting developers with freemium. Is there a paid service? Talk about the business model real quick.

Sure, yeah. The tools are free to use and get started with.
When someone signs up for Roboflow, they may elect to make their work open source, in which case we're able to provide even more generous usage limits, to basically move the computer vision community forward. If you elect to make your data private, you can use our hosted dataset management, dataset training, model deployment, and annotation tooling up to certain limits. And then, usually, when someone validates that what they're doing gets them value, they purchase a subscription license to be able to scale up those capabilities. So like most developer-centric products, it's free to get started, free to prove things out, free to poke around and develop what you think is possible. And then, once you're getting to value, we're able to capture the commercial upside of the value that's being provided.

Love the business model. It's right in line with where the market is. There are really no standards bodies these days; the developers are the ones deciding what the standards are by their adoption. I think making it easy for developers to get value is the model, and open source will continue to grow; you can see more of that. Great, great perspective, Joseph. Thanks for sharing that. Put in a plug for the company. What are you guys doing right now? Where are you in your growth? What are you looking for? How should people engage? Give the quick commercial for the company.

So as I mentioned, Roboflow has, I think, one of the largest, if not the largest, collections of computer vision models and datasets that are open source and available on the web today, and a private set of tools that over half the Fortune 100 now rely on. So we're at the stage now where we know people want what we're working on, and we're continuing to drive that type of adoption. So companies that are looking to make better models, improve their datasets, and train and deploy will often get a lot of value from our tools, and should certainly reach out to talk.
I'm sure there are a lot of talented engineers tuning in, too. We're aggressively hiring. So if you're interested in being a part of making the world programmable, and being at the ground floor of a company that's creating these capabilities to be very large, we'd love to hear from you.

Amazing. Joseph, thanks so much for coming on and being part of the AWS Startup Showcase. Man, if I was in my 20s, I'd be knocking on your door, because it's the hottest trend right now. It's super exciting. Generative AI is just the beginning of a massive sea change. Congratulations on all your success, and we'll be following what you guys do. Thanks for spending the time. Really appreciate it.

Thanks for having me.

Okay, this is season three, episode one of the ongoing series covering the exciting startups from the AWS ecosystem, talking about the hottest things in tech. I'm John Furrier, your host. Thanks for watching.