Well, welcome back to theCUBE's coverage of DockerCon 2021. I'm John Furrier, host of theCUBE. Great lineup at this event, got some great guests. Matt Falk, VP of Engineering at Orbital Insight. Matt, great to see you. Great keynote. Thanks for coming on theCUBE. Appreciate it.

Great, thanks for having me, John.

So at Orbital Insight, you guys are doing some cutting-edge work. Geospatial, big data, real-world problems. I mean, it's almost sci-fi to me. I just love how space, cybersecurity, and data are all rolling into this very cool vibe: drones, satellites, all of this happening in the cloud. But there's real action happening, right? Live GPS data is very cool and relevant technology happening right now. Give us your take. What are you guys seeing? How's business? Give us a quick overview of your journey and how you're executing.

Sure, and I think you're right there. It is a little bit like sci-fi, actually, even to me, having been in the industry for a few years at this point. We all think about big data; it's become much more of a thing, especially over the past decade or two. Everything we try to solve is big data. Artificial intelligence and machine learning thrive on it; they need this big data. An untapped area of that big data, though, is geospatial data, and really data that comes from overhead sensors in space. So to me, that feels a little bit like sci-fi, like you're saying, because that time is now. The time for us to be able to harness that data and provide actionable, meaningful insights is here.

We at Orbital Insight got into this about seven, eight years ago, and the tidal wave of this data was just forming. There weren't as many satellite provider companies. There weren't as many different types of disparate geospatial data. And by geospatial data, I mean anything with a latitude and longitude associated with it. The data was there, but it wasn't as abundant, and it wasn't clear how to use it. Over the past few years, so many new use cases have popped out, so many new disparate types of data, and it's really about the fusion of all of them and getting more and more of that data. So right now, the most exciting thing is just how much of that data exists and how much is going to exist in the next few years. And honestly, we want to ride that tidal wave along with our customers. We can deal with many different types of data here: overhead satellite imagery, cell phone pings, AIS (automatic identification system) data from ships, everything you can get your hands on, incorporated into this platform and used to feed the artificial intelligence and machine learning algorithms that derive new insights. So it's sci-fi, but it is here.

Yeah, and there are real computer science problems here too, a lot of networking as well. And it's transitioning. You were out early doing these new use cases. But what's interesting about your journey, and I want to get your thoughts on this, is that you evolved from tackling these first-of-a-kind problems, building solutions around them, to sequencing that into a fully built, scalable insight platform. And this is the pattern we see in cloud native: companies go from one-off projects and POCs to sequencing into either cloud native or a full-blown platform.
You guys have had that journey. Take us through that effort, and what's the result today?

No, that's exactly right. The way we started, just like you mentioned with many other companies, was really around this proof-of-concept idea. It was going out, talking to customers, finding their pain points, and figuring out what we could do to solve them. So it was all about, okay, for this particular customer, take this dataset, take satellite imagery of this location, take cell phone pings for this location, take a digital elevation model from this area of the planet, and fuse them together in some very specific, custom way to try to solve that problem. And that's how we started.

Over the first few years we found that doesn't really scale; we had to keep building new solutions to solve new problems. But what we started to identify was that there's so much commonality, so much overlap, between a lot of these problems and the solutions for them. If we're taking truck counts, if we're able to look at parking lots, detect cars from satellite imagery, and use that to determine trends or sales at a retail store, we can also use that same overhead imagery to detect cars and look at their movement patterns: how they go from a port to a particular warehouse, or how they drive on the road. Same thing with trucks. We started identifying that the value from these different analytics and data sources can be applied to many different problems in different ways, but the underlying core technology is very similar.

So about two, three years ago at this point, we started developing this into a platform that can answer generally any question in geospatial data analytics. Instead of "I have a very specific problem, I want to count the number of trees in this particular area," it's, you know, you do something like land use: you classify different areas of the planet covered by a particular type of land use, and then use that as an estimate for how many trees there are. We started to find these commonalities, and that allowed us to build this generic platform we call GO and use it to start answering many, many more questions.
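To make that land-use idea concrete, here's a minimal sketch of the final estimation step, going from per-pixel land-use labels to a tree count. Everything here is hypothetical: the labels are random stand-ins for a classifier's output, and the resolution and tree-density figures are made-up assumptions, not Orbital Insight's actual numbers.

```python
import numpy as np

# Hypothetical per-pixel land-use labels for a small AOI
# (0 = other, 1 = forest); a real raster would come from a classifier.
land_use = np.random.default_rng(0).integers(0, 2, size=(1000, 1000))

PIXEL_AREA_M2 = 0.25   # assumes 50 cm imagery -> 0.25 m^2 per pixel
TREES_PER_M2 = 0.01    # assumed average density for this forest type

forest_pixels = np.count_nonzero(land_use == 1)
forest_area_m2 = forest_pixels * PIXEL_AREA_M2
estimated_trees = forest_area_m2 * TREES_PER_M2
print(f"~{estimated_trees:,.0f} trees over {forest_area_m2 / 1e6:.2f} km^2")
```

The point is the pattern: classify broadly, then derive the specific answer from the classification, rather than building a bespoke tree detector for every question.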
That's good. So real-world problems are emerging. I want to get your thoughts on what those geospatial problems are, because you can almost compare it to how the traditional distributed computing world looks at the edge, for instance, the intelligent edge or industrial edge, the edge of the network as they call it in the hybrid world. You're not just moving packets; you're bringing in other data media: pictures, images, different data sources. That's a huge aperture that changes the game on analytics. Could you share some of the problems you solve that open up with these new types of sources? It's not just a packet to a device or an edge point in the traditional sense.

Sure, sure, it's a great question. I think a lot of people who haven't dealt with geospatial data before don't understand the nuances that come with this data, and there are quite a few of them, so I'll focus on satellite imagery in particular for the first part here. Even today, when people think about what we can do with artificial intelligence and computer vision on an image, most people go to the images they can see from their cell phones, or they think about self-driving cars and the resolution of those images. When you're dealing with overhead imagery, satellite imagery in particular, the game is completely different. Yes, you're still working with an image, but so many pieces of that image are just different and more complex.

For instance, when you're taking an image from hundreds of miles above the planet, the first thing you have to realize is that your resolution is not quite what people think. If we go outside and stand in a satellite image, with the best commercial sensors today each of us is at most one pixel, and with one pixel you really can't identify people. Even a car or a truck is at most 15 to 20 pixels, and it becomes extremely difficult to classify objects at that resolution, compared to a cell phone picture, for instance. So that's the first major difficulty.

A second one is the temporal aspect: not only spatial resolution, but temporal resolution. You don't get an image every day. Sometimes you don't even get an image every week. So how do you impute different data points to make your overall analysis worthwhile? Again, a slew of additional challenges comes with that.

Then there are things like geo-registration and orthorectification. When you take a satellite image of somewhere on the planet, you actually don't get a very precise location for that image; it could be off by five, 10, 15, even 20 meters, and you have to do work to relocate that image to the right position. Orthorectification refers to the angle at which the image was taken. If you're taking an image from a bird's-eye view, you can see straight down, and buildings look like squares or right-angled polygons. But you can also take an image 30 degrees off, looking at things from the side, and then the satellite image looks completely different. So that's another difficulty we have to combat to make this data usable.

For satellite images, and for other data sources too, there's a slew of additional challenges we have to contend with and correct to make the data usable. And that's what our platform does: it takes this data, fixes all those issues, and lets you compute analytics on top of it.
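As a rough illustration of the temporal problem, here's a minimal sketch of imputing a daily series from irregular revisits using pandas. The dates and values are invented, and time-weighted linear interpolation stands in for whatever far more sophisticated imputation a production system would actually use:

```python
import pandas as pd

# Hypothetical activity measurements from irregular satellite revisits:
# a few days have an image, most don't.
observed = pd.Series(
    [41.0, 44.0, 39.0, 47.0],
    index=pd.to_datetime(["2021-04-01", "2021-04-06", "2021-04-14", "2021-04-21"]),
)

# Reindex onto a daily grid, then fill the gaps with time-weighted
# interpolation so the analysis has a value for every day.
daily = observed.reindex(pd.date_range("2021-04-01", "2021-04-21", freq="D"))
imputed = daily.interpolate(method="time")
print(imputed.head(10))
```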
I love it. As a consumer, I can relate: take Google Maps, where I know there are trees in a spot, but it looks like they just filled trees in because they guessed a tree was there. Similar things going on. All great stuff, I love the tech, and I think this is going to be one of those areas that becomes super valuable as more use cases can get this type of data, not only for insights but maybe for features in other software. Which brings me to the next question: how do people use you? As these use cases emerge, with a lot of disparate data sources and now analytics as a platform, are you selling software, or a service for a fee? Can you explain how I might geek out and integrate you into my product or a feature? How does it work?

No, sure, thank you. There are really two different aspects here: how people actually get access to this data, and the analytics we put on top of it.

For the first piece, a lot of these datasets are extremely expensive. For the average consumer, and for a lot of businesses, it's much too expensive to go buy all the satellite imagery, or all this geolocation ping data, or shipping information; it's just too expensive to buy these disparate data sources if you only have a single, particular need for them. So the first thing our platform does is integrate with many different data providers. Like I said, anything that has a latitude and longitude with it, we try to get into our platform. We become the broker, almost the provider, of all that disparate data in a single unified source. That's the first aspect: people use us to get access to data they otherwise couldn't.

The second piece is the analytics. You really put three things into our platform: where you want to look on the planet, what you want to look for, and when you want to look for it. Our platform takes care of getting all the information it needs to compute that answer, then uses our custom analytics to derive the thing you're actually asking about and produce particular insights for the customer, similar to a data feed but much more custom than that. What comes out of the platform is effectively a time series that you can explore and drill down into further.

A particular example of this is supply chain, a problem we're very passionate about right now, especially after last year and how COVID impacted things. From a supply chain perspective, we can identify locations on the planet that you're interested in, typically operating facilities, and start looking at trends for where people are going to and from each facility. So we can see, oh, a hundred people visited this facility in the last seven days, and produce a time series of how many people were there each day. But we can also then say: of those hundred people, 16 came from this location, 52 came from that location, and 53 visited this other location two days after visiting that one. We can build an entire traceability map around a particular location, which our customers can use to identify patterns and anomalies in their own supply chains, or learn other things about their operating facilities.
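That traceability map is naturally a graph. Here's a minimal sketch of its shape using networkx, with made-up facility names, and counts that echo Matt's hypothetical hundred visitors:

```python
import networkx as nx

# Hypothetical aggregated device flows around a facility of interest,
# derived from anonymized geolocation pings.
flows = [
    ("warehouse_17", "facility_x", 16),    # 16 devices arrived from here
    ("port_delta", "facility_x", 52),
    ("facility_x", "distribution_b", 53),  # 53 went on to this location
]

g = nx.DiGraph()
for src, dst, count in flows:
    g.add_edge(src, dst, weight=count)

# Upstream and downstream neighbors form a simple traceability map.
print("came from:", list(g.predecessors("facility_x")))
print("went to:", list(g.successors("facility_x")))
print("inbound total:", sum(d["weight"] for _, _, d in g.in_edges("facility_x", data=True)))
```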
So it's like graph data, for instance: they get insights into how to restructure their value chains or reconfigure their economics. Something like that would be a use case?

Exactly, exactly. Finding further efficiencies, ways they can optimize their supply chains, or anticipating disruptions. If they know that part of their supply chain depends on a particular facility, a particular location, or a particular region, and they know from other news that something is about to happen to that region, they can proactively change their supply chain to alleviate that pain before it even happens.

Well, real time in the news: just this month, early in the month, we saw that gas stoppage, or shortage slash supply chain disruption, on the East Coast, right? From the pipeline hack, the ransomware attack. That's a good example. Some people don't even know the difference between a supply chain disruption and a shortage; they're two different things, and I saw that big debate happening. This is a real-world example where you could say, okay, we have a potential predictive supply chain disruption, then look at ways to respond. Am I thinking about this the right way?

No, that's exactly the right way to think about it. From that event, if you're operating a facility or a warehouse, you can ask: how is this going to impact my supply chain? The first thing you need to know is whether you're dependent on it, whether it actually plays a part in your supply chain. So you'd use our software, plug in your own operating facility, and start to trace where people are coming from and going to. The first thing you can do is identify whether that location is part of your supply chain. If not, you're potentially in the clear. If so, you can start to identify different locations that might be suitable replacements, so you can proactively avoid the problem and make that move sooner than you otherwise could have.

I love the complexity challenge here. You guys are doing the heavy lifting and offering it as a service, which makes total sense. You're almost democratizing the whole complexity of the data acquisition and then providing value on top of it. The question I have for you is: what learnings have you had? What are some of the difficulties? You mentioned the artifacts earlier: atmosphere, haze, noise, spatial and temporal frequency. What are some of the other things you're seeing, learnings people might not know about, that you've solved in capturing this data from satellites?

That's a great question, and there are plenty of them. A lot of it comes down to how to use this data, or how best to combat some of the challenges that, like we talked about earlier, come with these datasets. In particular, look at the foot traffic data: pings coming from different cell phones, or what we call geolocation pings. By and large, you can think of that as any IoT device pinging its location. We can aggregate that data and start using it within the platform. What we've learned with that data is that everything depends on how well you can normalize it. What I mean by that is, none of this data provides a complete picture of what's actually there. Again, whether you're getting a satellite image or cell phone pings, you're getting at best 15 to 20% of the actual picture, and the challenge is really about going from that 20% view to the full contextual 100% view.
So tactically, here's what that looks like for geolocation pings. We're not getting pings from every person on the planet. We're not getting pings from every IoT device or every cell phone. We're getting a particular, mostly randomized subset of those pings, and they're all anonymized. So how do you go from that to an actual insight, to a full, complete view of what's happening? That's where our normalization algorithms and other capabilities come into play, taking that data and extrapolating what's truly happening. For instance, if you look at a gas station in an area that's not very highly populated, and you're only getting two or three pings a day, and some days none, is that truly a signal that no one is going to that gas station, or are you just missing data? You don't always know. So part of what we've learned is how to take that data and translate it into the complete picture. We have very complex algorithms, constantly being improved, that account for things like people turning their cell phones off, or more than one person being in a car. That's what we've really learned: it's all about taking an incomplete picture and producing the most complete picture, with as much context as possible, to solve problems.
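The simplest possible version of that normalization is a penetration-rate scale-up. This sketch is deliberately naive, with an invented 15% rate, just to show the shape of the problem Matt is describing; the real algorithms model uncertainty, device behavior, and missing data rather than dividing by a constant:

```python
# Observed device pings at a site are only a sample of real visitors.
observed_daily_visits = [3, 0, 2, 5, 1, 0, 4]  # devices seen per day
panel_penetration = 0.15                       # assume ~15% of visitors observed

# Naive extrapolation toward the "100% view".
estimated_visits = [round(v / panel_penetration) for v in observed_daily_visits]
print(estimated_visits)

# Note the zeros: each could mean "no visitors" or "no data". A real
# system has to model that ambiguity instead of scaling it up blindly.
```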
So what's the secret sauce in all this? Is it algorithms? Is it data usage? All of the above? Take me through some of the secret sauce that makes all this work.

Sure, sure, and I'll go into it as much as I can. There are a few different pieces. The first one is the data itself. At the end of the day, no matter how good your machine learning capabilities are, if you don't have the data, you can't do anything. This is true for all types of artificial intelligence and machine learning: if you don't have something the system can learn from, you're at a loss. So the first piece is getting the right data and making sure we have enough disparate data sources to complete that overall picture.

The second piece is allowing our platform to do this at high scale. It's one thing to produce a particular algorithm and run it in a single location one time, but for us it's all about asking the aggregate question. If somebody's asking about a particular gas station or retail store, more often than not they don't care about just one location; they care about the aggregate. They're looking at a country as a whole and seeing what the trends or patterns are. So the second piece of secret sauce is our platform and its ability to scale up dynamically and let you ask any size of question: not just one AOI at a particular location, but thousands of different locations, and how those answers compile together.

The third one is definitely the artificial intelligence and machine learning. For us that is an extremely core competency, something that allows us to take that data and produce the insights. Like I mentioned with the different challenges, part of our secret sauce there is not just the algorithms themselves, but additional techniques and R&D that solve or combat some of the issues that come with overhead sensors. In particular, there's the rare object problem. A lot of times when you're dealing with satellite imagery and trying to find an object, it's very difficult to find satellite images of that object. If you're looking for a particular type of ship, you might only find one or two of them in thousands of images. So how do you build a machine learning algorithm from that small an amount of data? This is where our R&D capabilities come into play, and one to highlight is synthetic data: the ability to produce almost fake, generated satellite images containing the objects you're looking for, so that we can train and learn from them. Things like that build our secret sauce, our R&D core competencies: the ability to produce new and novel techniques to generate data where satellite imagery or other geospatial data has deficiencies.
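Here's a toy sketch of the synthetic-data idea for the rare-object problem: pasting a rare-object chip onto background tiles to manufacture labeled training examples. The arrays are random stand-ins for real imagery, and a naive paste stands in for the much harder work of matching lighting, viewing angle, and sensor noise:

```python
import numpy as np

rng = np.random.default_rng(42)

def synthesize(background: np.ndarray, chip: np.ndarray, n: int):
    """Paste an object chip at random positions on copies of a background
    tile, returning (image, bounding_box) training pairs."""
    h, w = chip.shape[:2]
    samples = []
    for _ in range(n):
        img = background.copy()
        y = int(rng.integers(0, img.shape[0] - h))
        x = int(rng.integers(0, img.shape[1] - w))
        # Naive paste; real synthesis would also blend lighting, angle,
        # shadows, and sensor noise to match the target imagery.
        img[y : y + h, x : x + w] = chip
        samples.append((img, (x, y, w, h)))  # image plus bounding-box label
    return samples

# Hypothetical 8-bit imagery: a 256x256 "ocean" tile and a 12x30 "ship" chip.
ocean = rng.integers(20, 60, size=(256, 256, 3), dtype=np.uint8)
ship = rng.integers(180, 255, size=(12, 30, 3), dtype=np.uint8)
print(len(synthesize(ocean, ship, n=100)), "synthetic training examples")
```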
Yeah, I like that feature, because you're almost saying the ship might look like this depending on where they're looking, and training that in. Good call. I guess the question I have for you is, first of all, great tech, love the story, you're onto some really cool and relevant stuff. The question on the minds of people watching right now is: why is now a critical time for geospatial analytics? What's your answer?

Sure, and the answer is actually great for us as a company too. As I was alluding to at the beginning, there's this tidal wave of geospatial data. If we look at five, 10, 15 years ago, the data itself and the technology were not really there to let us do what we're doing now. If you look back, I think it was 2013, there was a particular computer vision paper that came out that was really the birth of the CNN world. That is a core compute capability that allows us to do the computer vision we need, so it was an extreme catalyst for companies like ours to do this type of data fusion analytics. The second thing is the birth of the new sensors coming up right now. If you look back five years and then ahead five years, it's almost like Moore's law: every year, things are just doubling. There are more and more satellites going up, more and more data coming down, and frankly it's almost too much data at this point. We already have petabytes of satellite imagery in our system and hundreds of terabytes of IoT device data, and every day more of this data is being produced. So now is the perfect time, because the data is finally there, and it's only getting better over time.

Yeah. I know we only have a little time left, but I want to ask because I'm curious, and I'm sure people are too. As the engineering leader at the company, you've got a team working on some pretty cool stuff: a lot of computer science, a lot of new technology opportunities, new problems emerging that are exciting. Everyone likes to solve hard problems, right? And you've got them: synthetic data, massive ingestion pipelines, normalization algorithms, spatial imputation, et cetera, all this good stuff. How do you organize? How do you attract people? Because you have to lean into this; it's not like you're waiting for the market to come to you. You're going out there and making the market, technically as well. So how do you organize? How do you recruit? Take us inside the ropes.

Sure, sure. I'll start with how our engineering team is organized right now and where we try to find people to pull into the team. Right now we're split into four or five different areas. Like most cloud-based platforms, we have an infrastructure team: DevOps, site reliability, IT, everything that goes into that core cloud layer. On top of that, we have our platform engineering team, which largely builds the microservices that play together to produce our external API. On top of that, we have a product engineering team, which develops our UI and UX, makes sure everything on top of the API plays nicely together, and also builds some additional Dockerized computer vision and machine learning models that plug into the platform. Separately, we have our R&D team; like we talked about with synthetic data and the other research areas we get into, they focus there. And then we have our data engineering team, largely focused on pulling in disparate data sources and massaging and cleaning them into the right format so they can be plugged into our platform.

So that's how our team is structured. There's a ton of technical challenge, a lot of fun challenges. We're about 50 engineers right now, and we're actually looking to almost double in size by the end of the year; we'll be bringing on an additional 30 people over the next few months. We're looking for people with a wide range of expertise. People with microservice, core-platform backgrounds, able to build backend systems that handle tons of transactions per second and really let us scale our platform: that's one set of expertise we like. Another is people with geospatial data backgrounds, which, to be honest, is somewhat of a rare niche at this point; it's hard to find people who have worked on a platform and also worked with geospatial data, but we'd love to bring that in to add expertise and eventually get new data sources in. And lastly, it's that core competency of machine learning and artificial intelligence. We'll look for anybody with a machine learning, deep math, or deep computer science background to come in and be part of that team. If they're capable from a research perspective, we can teach them some of the computer vision aspects. If they have a computer vision background, great; if they have a data science and machine learning background, great. We want that diverse set of interests and diverse set of thinking to come in and really build our R&D team.
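To give a feel for what a "Dockerized model that plugs into the platform" might look like behind a common interface, here's a minimal hypothetical service stub. The route, payload shape, and choice of FastAPI are all illustrative, not Orbital Insight's actual API:

```python
# A toy inference service that could be packaged into a Docker image and
# scaled horizontally behind a platform API.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Aoi(BaseModel):
    """A bounding box for the area of interest, in lon/lat degrees."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

@app.post("/detect")
def detect(aoi: Aoi) -> dict:
    # A real container would fetch imagery for the AOI and run a model;
    # here we return a stubbed count.
    return {"aoi": aoi.dict(), "car_count": 0}

# Run with: uvicorn service:app --host 0.0.0.0 --port 8080
```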
Yeah, and obviously DockerCon is all about containers, which leads into Kubernetes, microservices, all kinds of cloud-native technologies. Because what you're doing is taking an old construct, fairly old anyway, data, and leveraging it in new ways. In a way, that's what cloud native is about. How do you see that world evolving? Obviously we're here at DockerCon; containers help, big time. Thoughts on how containerization continues? You've got Kubernetes, more and more cloud native, more SREs being hired, people scaling up. What's your take on what's going on around DockerCon?

No, for us this is really powerful. Whether it's Docker or Kubernetes, the idea of containerization as a whole really allows our platform to get to that next level of scale. Originally we were not a microservice platform; like I said, we were doing more POCs. As we got into this platform play, one of the things we knew we needed was to scale different parts of our system, whether that's scaling to ingest more data, scaling to bring in new algorithms, or scaling to handle the massive computation requests that come from our customers. That requires different pieces of the system to scale independently. If we were a monolithic application, if we were running on premises, that type of scale just wouldn't be possible at the level we need. So for us, the solution is all about Dockerizing different parts of our system, keeping them isolated, keeping them talking to each other through defined interfaces, and then horizontally scaling different pieces of the system as needed. Together, Kubernetes and Docker let our developers focus on the code they need to write rather than the SRE and DevOps side of it, and they let our DevOps team use these tools to make themselves more efficient. We can do that with a smaller team now. You don't need a team of 50 people on DevOps or infrastructure; you can do it with five or six solid engineers who can manage your entire environment.

Yeah, I think having that horizontal scalability is critical, and containerizing gets you there. So many benefits: things become completely portable and integrate really well. Great stuff, unbelievable gems dropped here. My final question while I've got you: as you look at your peers in the marketplace, the people on the right side of history, entrepreneurs and enterprises are waking up and saying, hey, I can really change the game and flip the script with cloud native. People are on similar journeys where product engineering says, we're really more of a platform; I can sequence and build out that platform and then build my infrastructure on the cloud. You're starting to see these point applications turn into platforms. What's your advice to people out there who are going to move from product engineering departments or groups to taking on that platform construction and then building out that infrastructure like you have?

It's a great question, John. The quick advice I'd give to anybody with a product engineering team that's considering moving to a platform is: do it now. There's no time better than right now. And what I mean by that is, the longer you delay, the harder it gets.
You're going to be missing out on a lot of the new technologies that are being solidified as part of the cloud computing world. Yes, there are trade-offs. You might have to go to your exec team or your product team and make those trade-offs, and you won't be able to develop features as quickly while you're spending time porting to a platform play, but the benefits are amazing. Once you actually get there, you'll really be thankful that you took the time to do it. Yes, it's going to be challenging, because it's one of those things that's an engineering benefit at first. It's not something where you can say, yes, product, in two months you're going to get this benefit, or in three months you're going to get that benefit. It's: a year from now, this is how our new product is really going to be able to expand and grow. And the best way to get there is to just do it now. Really start encapsulating your system, break it into different pieces, put it in the cloud, allow it to scale. So yeah, my advice is to just bite the bullet and do it now.

So for people who buy into that notion of moving from monolithic to microservice-based applications, who want that horizontal scalability you mentioned: what are some of the first principles in that platform? What's on the mind of the architect or the leader as they start thinking about first principles for the modern platform?

Sure. The first one is: don't over-design. Some people, when they start thinking about microservices, go straight to micro-microservices, almost nano-services. They start breaking off as many pieces of the code as possible, making them as small as possible. To some extent that's what you want to do with microservices, but you don't want to go too far, and it's easy to go down that rabbit hole. In particular, there are certain microservices you'll find are tightly coupled: they're constantly passing data back and forth, and that's when you realize that passing data back and forth between two logically separated pieces of code takes time. It might make sense for them to be one unified microservice instead of two. So the most important thing to think about is: what pieces really make sense to logically separate, and how does that impact the flow of data or information through your system? If you're adding too many hops between a certain endpoint and the call to the backend system, it might be time to rethink how you're breaking the system down. You really want to start with what can be broken into logically encapsulated pieces, and that's where you pull out microservices.

Highly cohesive, loosely coupled. That's a concept from operating systems. As we say, it's a platform; that's the cloud. It's not new.

That's right. It's been around.

Matt, great interview. Thanks for dropping the gems and sharing your knowledge, and congratulations on the work you're doing at Orbital Insight. Great focus. Love the company, love the excitement. Thanks for coming on.

Perfect. It was a pleasure chatting with you, John. Thanks for having me, and thanks for having me be a part of DockerCon.

Okay, this is theCUBE's coverage of DockerCon 2021. I'm John Furrier, host of theCUBE. Thanks for watching.