Okay, we're now going to try and stretch our minds a little bit and stretch SuperCloud to the edge. SuperCloud, as we've been discussing today and reporting through various Breaking Analyses, is a term we use to describe a continuous experience across clouds, or even on-prem, that adds new value on top of hyperscale infrastructure. Priya Rajagopal is the director of product management at Couchbase. She's a developer, a software architect, a co-creator on a number of patents, as well as an expert on edge, IoT, and mobile computing technologies. And we're going to talk about edge requirements. Priya, you've been around software engineering and mobile and edge technologies your entire career, and now you're responsible for bringing enterprise-class database technology, with synchronization, to edge and IoT environments. So when you think about the edge, the near edge, the far edge, what are the fundamental assumptions that you have to make with regard to things like connectivity, bandwidth, security, and any other technical considerations when you think about software architecture for these environments? Sure, sure. First off, Dave, thanks for having me here. It's really exciting to be here again, my second time, and thank you for that kind introduction. So, quickly to get back to your question: when it comes to architecting for the edge, our principle is prepare for the worst and hope for the best, because really, when it comes to edge computing, it's the edge cases that come back to bite you. So you mentioned connectivity, bandwidth, security; I have a few more. Starting with connectivity, we assume intermittent or no network connectivity. Think offshore oil rigs, cruise ships, or even retail settings where you want business continuity: most of the time you've got an internet connection, but when there is a disruption, you lose business continuity.
Then when it comes to bandwidth, the approach we take is that bandwidth is always limited, or it's at a premium; data plans can go through the roof depending on the volume of data. So think medical clinics in rural areas. When it comes to security, the edge poses unique challenges, because you're moving away from this walled-garden, central, cloud-based environment, and now everything really is accessible over the internet. And the internet is inherently untrustworthy. So every bit of data that is written or read by an application needs to be authenticated and authorized. The entire path needs to be secured end to end: it needs to be encrypted in transit, so that's confidentiality, and the persisted data itself needs to be encrypted on disk. Now, one of the advantages of edge computing, of distributing data, is that an impacted edge environment can be isolated without affecting the other edge locations. Looking at the classic retail architecture: if you've got a retail use case where there's a security breach at one store, you need a provision for isolating that store so that you don't bring down services for the other stores. So when it comes to edge computing, you have to think about those aspects of security: any of these locations could be breached, and if one of them is breached, how do you contain that? So that answers the three key topics that you brought up, but there are other considerations. One is data governance. That's a huge challenge, because we are a database company, and data governance, compliance, privacy, all of that is paramount to our customers. And it's not just about enforcing policies in a central location anymore; you have to do it in a distributed fashion.
Because one of the benefits of edge computing, as you probably very well know, is what it brings when it comes to data privacy and governance policies: you can enforce them at a granular scale, because data never has to leave the edge. But again, as I mentioned in the context of security, there needs to be a way to control this data at the edge; you have to govern the data remotely while it is at the edge. Some of the other challenges when thinking about the edge are, of course, volume and scale. Think IoT and mobile devices, classic far-edge scenarios. And I think the other criterion we have to keep in mind when we are architecting a platform for this computing paradigm is the heterogeneity of the edge itself. It's no longer a uniform set of compute and storage resources at your disposal. You've got a variety of IoT devices, you've got mobile devices, with different processing capabilities, different storage capabilities. And when it comes to edge data centers, they're not uniform in terms of what services are available: do they have a load balancer? Do they have a firewall? Can I deploy a firewall? So these are all key architectural considerations when it comes to actually architecting a solution for the edge. Great, thank you for that awesome setup. Now, we've been talking about stretching to the edge, this idea of SuperCloud, that single logical layer that spans across multiple clouds, and again, it can include on-prem, but a critical criterion is that the developer experience, and of course the user experience, is identical, or substantially similar, let's say identical, irrespective of physical location. Priya, is that vision technically achievable today in the world of database? And if so, can you describe the architectural elements that make it possible to perform well, have low latency, and meet the security and other criteria that you just mentioned?
Is it just, what's the technical enabler? Is it just good software? Is it architecture? Help us understand that. Sure. You brought up two aspects: you mentioned user experience, and then you mentioned the developer standpoint, what does it take? And I'd like to address the two separately. I mean, they are very tightly related, but I'd like to address them separately. So, focusing first on the easier of the two, user experience: what are the factors that impact user experience? You're talking about reliability of service, so always-on, always-available sorts of applications. It doesn't matter where the data is coming from, whether it's sourced from my device, from an on-prem data center, from the edge of the cloud, or from a central cloud data center; from an end user's perspective, all they care about is that their application is available. The next is, of course, responsiveness. Users are getting increasingly impatient. You want to reduce wait times for service; you want something that's just extremely fast. They're looking for immersive applications or immersive experiences, so AR, VR, mixed-reality kinds of use cases. And then something very critical, which you just touched upon, is this seamless experience, like the omnichannel experience we talk about in the context of retail, or what I like to refer to as park-and-pickup. You start a transaction on one device, you park it, and you pick it up on another device; or in the case of retail, you walk into a store and pick it up from there. So this sort of park-and-pickup, seamless mobility of data is extremely critical. So in the context of a database, when we talk about responsiveness, the two key KPIs are latency and bandwidth.
And latency is really the round-trip time from when a request for data is made to when the response comes back. The factors that impact latency are, of course, the type of network itself, but also the proximity of the data source to the point of consumption: the more hops the data packets have to take to get from the source to the destination, the more latency you're going to incur. And when it comes to bandwidth, we're talking about the capacity of the network, how much data can be pushed through the pipe. And with edge computing, with a large number of clients, I talked about scale, the volume of devices, when all of them are concurrently connected, you're going to have network congestion, which impacts bandwidth, which in turn impacts performance. So how do you architect a solution for that? If you remove the reliance on the network to the extent possible, you get the highest guarantees when it comes to responsiveness, availability, and reliability, because the application is always going to be on. And to do that, if you have the database and the data-processing components co-located with the application that needs them, that gives you the best experience. But of course, a lot of times it's not possible to embed the data within the application itself, and that's where you have options: an on-prem data center, the edge of the cloud, and so on. The closer you bring the data, the better the experience you're going to get. Now, that's all great, but then, to achieve the vision of a SuperCloud, where from a developer standpoint I have one API to set up a connection to a server, but behind the scenes my data could be resident anywhere, how do you achieve something like that?
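To make the latency point concrete, here is a toy model in Python. All the numbers are hypothetical illustrative figures, not measurements: round-trip time grows with every network hop between the consumer and the data source, which is why co-locating data with the application cuts latency.

```python
# Toy model: round-trip time (RTT) as a function of network hops.
# Per-hop and processing delays are hypothetical, for illustration only.

def round_trip_ms(hops, per_hop_ms=12.0, processing_ms=2.0):
    """RTT = request path + response path + server processing time."""
    return 2 * hops * per_hop_ms + processing_ms

# Data co-located with the app (0 network hops) vs. a distant cloud region.
local_rtt = round_trip_ms(hops=0)    # only processing time remains
edge_rtt = round_trip_ms(hops=2)     # nearby edge data center
cloud_rtt = round_trip_ms(hops=12)   # central cloud, many hops away

print(f"local: {local_rtt} ms, edge: {edge_rtt} ms, cloud: {cloud_rtt} ms")
```

The exact numbers are invented; the shape of the relationship is the point: each hop added between the application and its data shows up twice in the round trip.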
And so a critical aspect of this solution is data synchronization. I talked about data storage; as a database, storage is a critical aspect, it's really where the data is persisted, along with data processing, the APIs to access and query the data. But another really critical aspect of distributing a database is the data synchronization technology. And so, for all the islands of data, whether on a device, in an on-prem data center, at the edge of the cloud, or really in a regional data center, once all those databases are kept in sync, then it's a question of: when connectivity to one of those data centers goes down, there needs to be a seamless switch to another data center. And today, at least when it comes to Couchbase, a lot of our customers employ global load balancers which can detect this automatically. So from the application's perspective, it's just one URL endpoint, but when one of those services or data centers goes down, we have active failover and standby, and the load balancer automatically redirects all the traffic to the backup data center. And of course, for that to happen, those two data centers need to be in sync, and that's critical. So, did that answer your question, Dave? Yeah, let me jump in here, and thank you again for that, because I want to unpack some of those. And I want to use the example of Couchbase Lite, which, as the name implies, is like a mobile version of Couchbase. A number of things that you said interest me; you talked about, in some cases, wanting to get data from the most proximate location. So is there some kind of metadata intelligence that you have access to? I'm interested in how you do the synchronization, how you deal with conflict resolution, and recovery if something goes wrong.
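The active/standby failover described above can be sketched from the client's point of view. This is an illustrative Python sketch, not a Couchbase API; the endpoint names and `fetch` function are hypothetical stand-ins for whatever the load balancer and data centers actually expose.

```python
# Minimal sketch of active/standby failover from a client's point of view.
# Endpoint names and the fetch callable are hypothetical, not Couchbase APIs.

class FailoverClient:
    def __init__(self, endpoints):
        self.endpoints = endpoints  # ordered: [primary, standby, ...]

    def get(self, key, fetch):
        """Try each data center in order. Both must be kept in sync for
        the switch to be transparent to the application."""
        last_error = None
        for endpoint in self.endpoints:
            try:
                return fetch(endpoint, key)
            except ConnectionError as err:
                last_error = err  # endpoint down: fall through to the next
        raise last_error

# Simulated fetch: the primary data center is unreachable.
def fetch(endpoint, key):
    if endpoint == "dc-primary.example.com":
        raise ConnectionError("primary unreachable")
    return {"key": key, "served_by": endpoint}

client = FailoverClient(["dc-primary.example.com", "dc-standby.example.com"])
print(client.get("order-42", fetch))  # transparently served by the standby
```

In practice this redirect happens inside a global load balancer behind one URL, as described above, rather than in application code; the sketch just shows why the standby must already hold the same data.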
I mean, you're talking about distributed database challenges; how do you approach all that? Wow, great question, and probably one that could occupy the entire session, but I'll try and keep it brief and answer most of the points that you touched upon. So we talked about distributed databases and data sync. But here's the other challenge: a lot of these distributed locations can actually be disconnected. So we've just exacerbated this whole notion of data sync. And that's what's typically referred to as offline-first: the ability for an application to run in a completely disconnected mode, but then, when there is network connectivity, the data is synced back to the backend data servers. And in order for this to happen, you need a sync protocol. Since you asked in the context of Couchbase, our sync protocol is a WebSockets-based, extremely lightweight data synchronization protocol that's resilient to network disruption. What this means is that I could have hundreds of thousands of clients connected to a data center, and they could be at various stages of disconnect. Say you have a field application and you are veering in and out of pockets of network connectivity: the network is disrupted, then connectivity is restored, and our sync protocol has a built-in checkpoint mechanism that allows the two replicating points to have a handshake of "what was the previous sync point?", and only data from that previous sync point onward is sent to that specific client. And in order to achieve that, you mentioned Couchbase Lite, which is of course our embedded database for mobile, desktop, and any embedded platform, but the component that handles the data synchronization is our Sync Gateway.
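The checkpoint handshake just described can be sketched in a few lines of Python. This is an illustrative sketch only, not the actual Couchbase replication protocol; the sequence-log representation and class names are assumptions made for the example.

```python
# Sketch of a checkpoint handshake for resumable sync (illustrative only;
# this is not the actual Couchbase replication protocol).

class SyncServer:
    def __init__(self):
        self.changes = []  # ordered log of (sequence, doc_id) pairs

    def write(self, doc_id):
        self.changes.append((len(self.changes) + 1, doc_id))

    def changes_since(self, checkpoint):
        """Handshake: the client reports its last-known sequence, and only
        later changes are sent, so a reconnect never re-syncs everything."""
        return [c for c in self.changes if c[0] > checkpoint]

class SyncClient:
    def __init__(self):
        self.checkpoint = 0
        self.docs = []

    def pull(self, server):
        for seq, doc_id in server.changes_since(self.checkpoint):
            self.docs.append(doc_id)
            self.checkpoint = seq  # advance the checkpoint per change

server, client = SyncServer(), SyncClient()
server.write("doc-a"); server.write("doc-b")
client.pull(server)        # first sync: pulls doc-a and doc-b
server.write("doc-c")      # written while the client is offline
client.pull(server)        # reconnect: only doc-c crosses the wire
print(client.docs, client.checkpoint)
```

The second `pull` transfers a single change rather than the whole history, which is what makes the protocol cheap for field clients that veer in and out of connectivity.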
So we've got a component, Sync Gateway, that sits with our Couchbase Server, and it's responsible for securely syncing the data and implementing this protocol with Couchbase Lite. And then you talked about conflict resolution, and it's great that you mentioned that, because when it comes to data sync, a lot of times folks think, oh, how hard can that be? You request some data and you pull down the data. And that's great, that's the happy path, when all the clients are connected and there is reliable network connectivity. But we are of course talking about unreliable network connectivity and resiliency to network disruptions, and also the fact that you have lots of concurrently connected clients, all of them potentially updating the same piece of data. That's when you have a conflict. You could have the writes coming in from the clients, you could have the writes coming in from the backend systems; either way, multiple writers to the same piece of data, that's when you have conflicts. Now, to explain a little of how conflict resolution is handled within our data sync protocol in Couchbase, it helps to understand what kind of database we are and how data itself is stored within our database. Couchbase Lite is a NoSQL JSON document store, which means everything is stored as JSON documents. And so every time there is a write, an update to a document, say you start with an initial version of the document when the document is created, every mutation to the document creates a new revision of that document. So as more writes, more mutations, come in for that document, you build out what's called a revision tree. And when does a conflict happen? A conflict happens when there is a branch in the tree.
So if you've got two writers writing to the same revision, you get a branch, and that's a conflict. And we have a way of detecting those conflicts automatically; that's conflict detection. So now we know there's a conflict, but we have to resolve it, and within Couchbase you have two options. You don't have to do anything about it: the system has automatic conflict-resolution heuristics built in. It's going to pick a winning revision based on a set of criteria. So if two writers are updating the same revision of the document, we pick a winner. From our experience, that works for about 80% of use cases. But for the remaining 20%, applications would like more control over how the winner of the conflict is picked, and for that, applications can implement a custom conflict resolver. We'll automatically detect the conflicting revisions and send them over to the application via a callback; the application has access to the entire document bodies of the two revisions and can use whatever criteria it needs to merge them. So that's policy-based, in that example. Yes, yes. You can have user-defined, policy-based resolution, or you can have the automatic one. Okay, I've got to wrap because we're out of time, but I want to run this scenario by you. One of the risks to the SuperCloud nirvana that we always talk about is this notion of a new architecture emerging at the edge, the far edge really, because these are highly distributed environments, they're low power, with tons of data, and this idea of AI inferencing at the edge. A lot of the AI today is done as modeling in the cloud. You think about ARM processors and these new low-cost devices, and massive processing power eventually overwhelming the economics, and then that seeping back into the enterprise and disrupting it.
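The two resolution paths above, an automatic winner-picking heuristic versus an application-supplied callback, can be sketched as follows. This is an illustrative Python sketch; the revision fields, the tie-breaking heuristic, and the function names are hypothetical, not Couchbase's actual algorithm or API.

```python
# Sketch of conflict detection and resolution on a revision tree
# (illustrative; field names and heuristics are hypothetical, not
# Couchbase's actual algorithm).

def default_resolver(rev_a, rev_b):
    """Built-in heuristic: deterministically pick one whole revision,
    here the deeper revision history, with ties broken by digest."""
    return max(rev_a, rev_b, key=lambda r: (r["generation"], r["digest"]))

def resolve_conflict(rev_a, rev_b, custom_resolver=None):
    # Two revisions sharing the same parent form a branch: a conflict.
    if rev_a["parent"] != rev_b["parent"]:
        raise ValueError("not siblings: no conflict to resolve")
    resolver = custom_resolver or default_resolver
    return resolver(rev_a, rev_b)

# Two clients both updated revision 1 of the same document while offline.
rev_a = {"parent": 1, "generation": 2, "digest": "abc",
         "body": {"qty": 3}}
rev_b = {"parent": 1, "generation": 2, "digest": "def",
         "body": {"note": "gift wrap"}}

# Application-supplied resolver: merge both bodies instead of picking one.
def merge_bodies(a, b):
    return {"parent": a["parent"], "generation": a["generation"] + 1,
            "digest": "merged", "body": {**a["body"], **b["body"]}}

winner = resolve_conflict(rev_a, rev_b)                # automatic pick
merged = resolve_conflict(rev_a, rev_b, merge_bodies)  # custom policy
print(winner["digest"], merged["body"])
```

The automatic path discards one sibling wholesale, which is why, as noted above, the remaining cases hand both full document bodies to the application so it can merge on its own criteria.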
Now, you still have the problem of federated governance and security, and that's probably going to be more centralized-slash-federated. But in one minute, do you see that real-time AI inferencing taking off at the edge? Where is that on the S-curve? Oh, absolutely. When it comes to IoT sorts of applications, it's all about massive volumes of data generated at the edge. You talked about how the economics don't add up: the data needs to be acted on at some point, and if you have to transport all of it over the internet for analysis, you're going to lose that real-time responsiveness and availability. And so the edge is the perfect location. A lot of this data is temporal in nature, so you don't want it sent back to the cloud for long-term persistence; instead, you want it acted on as close as possible to the source itself. And there are of course the really small microcontrollers and so on, where even there you can have some local processing done with TinyML models. But mobile devices, when you talk about those, as you're very well aware, are extremely capable; they have neural network processors. So they can do a lot of processing locally, but when you want an aggregated view within the edge, you want to process that data in an IoT gateway and only send the aggregated data back to the cloud for long-term analytics and persistence. Yeah, this is something we're watching, and I think it could be highly disruptive, and it's hard to predict. Priya, I've got to go. Thanks so much for coming on theCUBE. Really appreciate your time. Yeah, thank you. All right, you're watching SuperCloud 22. We'll be right back after this short break.