Give yourself a quick round of applause — you made it to the last session of the conference on a Friday. We're happy you're here. Definitely. This is Cloud-Native Data: Getting to the X Factors, which is not like the TV show The X Factor. It's more of a fill-in-the-blank: we don't know yet what the variable X should be. And I have to quickly mention that in the event of a fire, we should all try to leave the building — out to the public concourse. (Searching for that got me really confused with Concourse, the CI tool.) OK, so we're just going to jump right into it. You can find each of the panelists' information and Twitter handles here, if you're of the spirit to live-tweet these sorts of things. So why don't each of you introduce yourself a little bit, and also the context you're bringing to this question we've been exploring over the course of a number of different conferences. There's this great set of principles and practices — a mix of levels, really — included in the twelve-factor manifesto, if you will. That's provided a great set of concrete ideas that folks can take as they're building applications, and opinions that can be absorbed into technologies like Cloud Foundry, so that we can now assume: if you're building with these, we will make it really easy to run applications. So that's wonderful for the applications. But what about the data? Can we as an industry arrive at a set of principles and concepts such that, if we adhere to them, we can start to build the technologies around them to make it easy to scale your data and solve the kinds of data challenges that Dahlia introduced — she covered quite a few of those, if you were in the previous session.
So that's the context for everyone to say why they're interested in finding what that set of concepts and principles should be. OK. So I'm Stephen O'Grady. I'm the co-founder of RedMonk. RedMonk, if you're not familiar with it, is a developer-focused industry analyst firm — or, probably more accurately for this particular audience, a practitioner-focused industry analyst firm. As far as context: one of the things that I have found interesting, and I've written about this a number of times, is that in the late '90s and early 2000s I was a systems integrator, running around building enterprise systems. And a couple of years ago, I looked around and noticed that the way we developed applications had changed radically. It looked nothing like it did when I was building applications as an SI. But on the data side of the house, things really hadn't changed very much at all. We still had many of the same tools, many of the same processes. It just didn't make a lot of sense to me. So I've written about this a number of times, and we're beginning to have conversations with folks like Pivotal and a number of other interested parties coming at it from different angles — basically just to ask the question: what should data practices look like in terms of building modern, cloud-native applications? So that's me. I'm Paul Puckett. I lead our application development and platform architecture team for federal at Pivotal. I come from ten years working in the federal space. Our focus is how we can guide our federal customers through this maturation process to really leverage all twelve factors of being cloud-native. When you're working with a lot of legacy systems — normally systems that are mission critical or safety-of-life — the ability to just fundamentally change overnight is really tough. So we work, day in and day out, on how we mature along that way.
And some of the challenges are what holds them back from really tapping into some of these great technologies — but there are also great opportunities that, if we boil it down and figure it out with them, we can really leverage. Hi, my name is Brian Dunlap. I'm a solution architect at Southwest Airlines, helping us integrate all the things and all the places better. I have a background with operational data, which is a way of saying: here's the stuff we need to help make decisions during the day. Hi, I'm Mark Ardito, Vice President of Digital Delivery at HCSC — that's Health Care Service Corporation. We are the largest customer-owned health insurer in the United States, and we're the fourth-largest health insurer overall. We operate Blue Cross and Blue Shield of Illinois, Texas, Oklahoma, New Mexico, and Montana. We have a data problem. We have a lot of data: 15 million members, and we process just north of one million claims a day. That's a lot of data, and we need to deal with it. Like they said earlier, we've transformed the way we build apps, but data kind of got left behind in that whole equation. We've been tackling that problem for just over a year now and making headway. Okay, so let's figure out what we need to do to not leave data behind. No data left behind. And, especially since it's the last session, we're gonna have a little bit of fun. If you remember the Linda Richman sketches from Saturday Night Live — you know, "the peanut: is it a pea or a nut? Talk amongst yourselves." So we're gonna rapid-fire through some topics that I'll put out there as: is it a factor? Talk amongst yourselves. The first one we've got is data caching. If you look at some of what Netflix has put out there about caching as the hidden microservice — you know, thou shalt use a cache: is it a factor? Talk amongst yourselves. I guess I got nominated. I would say absolutely.
I mean, I think one of the things that's characteristic of cloud-native applications — again, very different from when I was building applications myself — is that it used to be, basically, if you had a data problem, the answer was a relational database, always. That was basically all you'd use. There were roles for other things — hierarchical databases, Berkeley DB was everywhere — things that would be leveraged in specialized contexts, but by and large, 99% of the time, when we were talking about persistence we were talking about relational databases, and that's it. When you look around today, the challenges that web scale has presented introduce an entirely new class of problems. And caching is certainly a piece — one of many pieces — that has come to be considered essentially mandatory, depending on what you're doing and what your requirements are. Certainly for a class of cloud-native applications, it's going to be really, really difficult to get the performance that you want at a certain scale without having some manner of front-end caching. And that just brings us back to the phrase that everybody's heard: there are only two hard problems in computer science — naming things and cache invalidation. So it's necessary, but it's super, super hard. So, is it a factor? I would say yeah. Yeah, I think it's a factor — it's not the only one, but it's definitely a factor. If you look at data stores like Hadoop, where most organizations are throwing pretty much everything into Hadoop today, it's really tough to get things out of that really big thing, right?
And so, when we think about traditional schemas that people have made — it's: I made the schema, I poured concrete on it, and let me know if you ever need to get that thing open. It's a really big job to open it up and get new things in. But if we deal with those schemas by streaming them to caching layers, we can work with them in cache, mold them the way they need to be molded, and expose them through APIs — and we can get the performance that we need. It's okay if we have a little fun, as long as the chairs stay down. With respect to my fellow panelists: no, it's not a factor. I've got your factor right here. Here's what I see with a lot of cloud talk — Dynamo, Kafka, caching. I'm gonna use a lot of '80s movie references, so: in Ghostbusters, we can strap an unlicensed nuclear accelerator onto our back and just start having at it, trying to capture whatever, right? What I see is people thinking: here's the cloud, sign me up for Dynamo, Kafka, caching. The X factor is the conversation around it, so that caching isn't this magical thing — what are the trade-offs around it? Yeah, I mean, there are definitely some trade-offs. For Netflix, say you cache some user input and it gets lost — I liked this yesterday, I gave it four stars and it says I didn't rate it — that's not that big of a deal. But if you talk about safety-of-life and mission-critical systems, like a lot of our federal customers are dealing with, a stale cache with incorrect information could be a major issue in the world they're living in. So there are some situations where caching makes perfect sense as an opportunity for users, but there are other times where you're gonna have to find some other way around it. I mean, for us, really, it's a complete duplication of a database.
We're not caching anything; we're sharing exact duplications, because we can't afford to be wrong. Okay — would it be fair to say that maybe, with some qualifications around how you handle cache invalidation, it is a factor? Yeah, so I'm right there with you guys. It's a factor, but only a factor for the right scenario, right? I'm not gonna have a massive caching store for a system of record, writing in and out of it — that's just not applicable. But if I'm exposing APIs for an inquiry-only type of approach — I need to expose claims, I need to expose membership data, product data — that's inquiry. Yes, caching makes total sense there, and that is a good factor to have. Okay, so: caching for inquiry. Can we boil it down to that bullet point? Do you guys agree on caching for inquiry? Caching for inquiry? Yeah, I think that's — we'll come back to it. I don't know if I agree with that. Okay, well, we can move on to the next one: data APIs. The last time we had this panel, which was at SpringOne Platform — Brian is the only carryover member; we cycle through a different cast of characters each time — we talked about how the DBA is kind of living on the other side of an API in terms of the relationship with developers. So one of the things I've been noodling on is: should data teams be orienting around building the data API as their product? Taking a product mentality — is that a factor? Could that be a factor? I think the short answer is yes. If you look at some of the very high-growth platforms from a database standpoint — the managed database services from the various hyperscale providers — effectively, that's essentially APIs. All data via an API.
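Before moving on — to make the "caching for inquiry" idea above concrete, here is a minimal sketch of a read-through cache that only ever serves inquiry traffic: writes go straight to the system of record and invalidate the cached copy. This is an illustration, not anyone's production design, and all the names (`InquiryCache`, the claims store) are hypothetical.

```python
import time


class InquiryCache:
    """Read-through cache for inquiry-only endpoints (a sketch).

    Writes go straight to the system of record and invalidate the cache,
    so the cache only ever serves read traffic.
    """

    def __init__(self, backing_store, ttl_seconds=300):
        self.store = backing_store   # system of record, e.g. a dict or DAO
        self.ttl = ttl_seconds
        self._cache = {}             # key -> (value, expires_at)

    def get(self, key):
        entry = self._cache.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]          # cache hit, still fresh
        value = self.store[key]      # miss or stale: read the record
        self._cache[key] = (value, time.monotonic() + self.ttl)
        return value

    def put(self, key, value):
        self.store[key] = value      # write to the system of record...
        self._cache.pop(key, None)   # ...and invalidate, never cache the write


claims = {"c1": {"status": "pending"}}
api = InquiryCache(claims, ttl_seconds=60)
print(api.get("c1"))                 # served from the store, then cached
api.put("c1", {"status": "paid"})
print(api.get("c1"))                 # invalidation means the next read is fresh
```

The point of keeping `put` invalidation-only is exactly the panel's caveat: the cache never sits in front of the write path of the system of record.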
So, taking a step back from that: if we think about how modern, twelve-factor applications are being built, in large part they're composed of services, right? I use this example all the time. Most of us have smartphones. If you think about the applications on your smartphone, what are they? They're essentially wrappers around collections of services — services that might do geo, that might do lookups to internal systems, on and on. Basically, you're taking a bunch of individual services, packaging them up, and boom — here's my application. In many cases, data is just going to be another service that you call on. So in my view — again, there are exceptions — but in my view, yes: to the extent that you can take your data stores, whatever they might be, and expose them as a service, it's something you absolutely should do. I agree. I think that when the person building the thing is also exposing the thing and running the thing, you get very good usage out of that. Back when we had operators and we had developers, operators were standing up middleware and infrastructure, developers were doing engineering, and we kind of had this lack of empathy, right? The operator was like: I don't know, man, I just stood the thing up — you go work on it. And the developer was like: that doesn't work. It's kind of like the old way of data: I made the model — how come you don't use it the way I designed it? Well, because it doesn't actually work. I don't like the phrase "eat your own dog food," but if you drink your own champagne — I love champagne — okay, good — you start to move forward and you get actual solutions that work for people, instead of designing things the way you think they're gonna be used.
And so I think having data operators — DBAs — starting to expose APIs, and/or API people working on the data side, starts to solve that. Well, I mean, in your last talk, what you were poking at a few times is that question of: do we need DBAs anymore, right? Really, product owners should be treating their data like a product. And so your DBA really becomes part of your team, making sure that you've got that stability, that relationship, and that rapport. Like you said in the last talk, they really become your reliability engineer — your DRE, database reliability engineer, I believe it was. It has to be that type of relationship. It's not so much that you've got a DBA who's the owner of the content; they're a bit more the custodian of the house, if you will, and it's really the product owner. So the product owner, with the APIs they're writing, absolutely should be treating it like that. Automation is the thing — including your infrastructure, including your data. Yes, I think it's the thing. Okay, all right. Actually, one quick thing to add: if you haven't read it already, you should all read Steve Yegge — Y-E-G-G-E. If you just Google "Steve Yegge rant," you'll find a very, very entertaining, long rant from him about the importance of services, and how Amazon in particular got services religion. It's as amusing as it is insightful. The basic notion is: treat everything you can as a service — and otherwise, well, in that case, you get fired, but that's a conversation for a different time. But yeah: Steve Yegge, Google, rant — I think you'll find it entertaining. Okay, I'm sure everyone noted that. I can't find a pen in my backpack, so I'm just gonna have to watch the replay. All right: event logs.
So actually, we saw this in the talk right before this one: Dahlia was illustrating how, by using an event log, you can start to handle some of these cross-boundary transactions, et cetera. When done right, high-volume logs give you the flexibility to add new, unexpected services that can later become consumers and catch up by replaying, given that ordering is so strictly handled. So, given some of the challenges Dahlia highlighted — as you move to a microservices architecture and start to have these cross-boundary transactions — is having an event log a factor? Talk amongst yourselves. It doesn't have to start with Stephen, by the way; just putting that out there. I didn't come up with this thought, but I sure like it. So: I have glasses, and I need to get my prescription checked. That's talking point number one. Talking point number two: when you go to the eye doctor, there's an eye chart — big letters at the top, and as you go down, the little letters are kind of hard to see. The analogy is: as we talk about immutability, event sourcing, a distributed immutable log — those are building blocks at certain levels of abstraction on the eye chart. As you go down, you're becoming more concrete, and I love that analogy. So, does event sourcing solve stale reads and stale writes? No — and people often think that it does. Does immutability help an event-based system spread its wings and scale? Yes. So what's tricky is that we have to understand which abstractions we're looking at and building with, as we choose which levels on the eye chart to use to come up with patterns and then concrete solutions. And I just so love that thought. So that's kind of how I would answer it.
Yes, those are important letters; it doesn't mean I use them all the time, and it doesn't mean I use Kafka as a concrete implementation of a distributed immutable log all the time. Brian, you're keeping things so complicated and nuanced here — I'm just trying to come up with twelve things to put on a bullet list. I know. But I do like the eye chart example, because as we go from talk to talk we get clues. In the anynines talk, where he was talking about how to automate data APIs, here's a common letter: am I about to run out of disk space? That letter exists regardless of whether Oracle or MySQL or whatever is the concrete implementation. So I thought that was a good clue. Okay. I yield the floor. Yeah, I take that point. To the question: obviously it's important, obviously it's going to address specific application concerns. I remember, at the height of the financial crisis in 2008, 2009, one of the banks was actually using its raw transaction logs — from everything from ATMs to the web feed — to make essentially real-time decisions: this person may be underwater; hey, we should probably reach out before they take all their money out and the bank crashes. So the net is: yes, we can do a lot with these things. But they have not, at least historically, been universally used. In other words, we still talk to organizations where it's almost exhaust, right? Sure, the system is outputting these event logs, and we just don't do anything with them. So, one way to answer would be: should it be a factor? Probably — because this is valuable information that can be repurposed in a number of different ways. But I think we're quite a long way from everybody getting that religion, as it were. Okay.
So, like, the federal government's been using this for quite some time, because we live with disconnected systems — especially in classified or disconnected areas of the world. For us, it's one of the things we have to rely on. So it's not a factor, if you will, or the factor — but for some situations it's really your only means. You don't always have a persistent connection: you may be geographically separated, and in some of our classified customer situations you sometimes don't even have a network connection. You've got to just do the delta, make sure the ordering's right, and get that done. Now, Mark? I think, inside the twelve factors, there are some things that are just common sense. We look at them today and kind of roll our eyes: we all do those things. In my mind, event logs here are one of those things — why aren't we all just doing this? So I'm not sure it has to be a factor; I think we all should just be doing it. More comments, Brian? Event sourcing is really hard, so go test that — and here are the eighteen bazillion different test preconditions that can get you into, and out of, out-of-order funky state. It's non-trivial. So it's not a golden hammer, and I would say: be careful to only use it where it provides business advantage. But the question was actually event logs, right? In the context of a number of other factors, this isn't saying this is the golden hammer that's gonna solve everything; it's on a list of things you should do that set up an architecture that allows you to flexibly add services down the line. You need to balance it with other practices that help with things like stale data, and with things like the caching situation we talked about before — again, invalidated data in the cache. So you need to balance it.
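The event-log property being discussed — a strictly ordered, append-only log lets a consumer added later catch up by replaying from an offset — can be sketched in a few lines. This is an in-memory toy to illustrate the idea, not a stand-in for a real log like Kafka; the names and event shapes are made up.

```python
class EventLog:
    """Append-only, strictly ordered log (in-memory sketch of the idea)."""

    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)
        return len(self.events) - 1   # offset of the new event

    def read_from(self, offset):
        return self.events[offset:]   # replay everything since `offset`


log = EventLog()
log.append({"type": "ClaimFiled", "id": "c1"})
log.append({"type": "ClaimPaid", "id": "c1"})

# A consumer added later simply replays from offset 0 to catch up,
# then folds the ordered events into whatever view it needs.
status = {}
for event in log.read_from(0):
    status[event["id"]] = event["type"]
print(status)   # {'c1': 'ClaimPaid'}
```

Because replay depends entirely on strict ordering, the panel's caveat holds: this gives you flexible late-joining consumers, but it does nothing by itself about stale reads in the derived views.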
But is this something that, to Mark's point, should just be one of those obvious things — like, obviously we should be using a shared code repository? If you go back twenty years, not everyone was doing that. It seems kind of like "duh" now. So if we fast-forward, should this just be everywhere? It doesn't solve everything; you need to know what it solves for. But it provides a useful construct for solving a number of different challenges that emerge in a cloud-native architecture. I think it's a party trick that can help save you. And a good clue: go find your data warehouse friends. They have party tricks that save them when they need to rebuild tables — they may keep the equivalent of an immutable log alongside their tables, one they don't share with people, so they can recreate the shared tables when things go wrong. These clues and patterns have been out there in the traditional data warehousing environment, and I think those patterns will live on regardless of where we run our stuff. Just a time check — okay. All right: dumb storage. So, I'm seeing folks start to do some interesting things by abstracting the data management away from a really dumb storage layer — moving more of it into a middle layer, a metadata management layer. This might run into limitations in places, but is that something that could be a factor — how you treat your storage? I know this kind of pertains a little bit to — I mean, I take the point, and you certainly do see people doing very creative, and sometimes very dangerous, things with dumb storage. I don't know that I would say it's a factor. As we talked about before, there's such a diversity of data storage now, and such a diversity in the ways those data stores are managed, implemented, and persist the data themselves.
I'm trying to think of cases where I would say, universally or close to universally, this is how you would do it — an important thing that basically must be in almost every cloud-native app. It could be that I've been missing an obvious use case, but apart from that, the things we see are interesting almost because the usage is novel, right? So I don't know — I'd have a tough time making the argument that this should be a legit, near-universal factor. I don't know if anybody disagrees. Seeing some nodding heads on the stage. Okay, so: not a factor. You say no; I'm gonna say no. You wanna say yes? No, I'm gonna say yeah. Oh? All right — so Southwest is talking with AWS, and when they come to town, they show S3 as the middle of their universe, right? Big dumb S3 storage — which is a very important choice if you'd ever like to maybe run not on AWS, right? So yes, it's a huge thought: do you have Hadoop as a unifying, multi-cloud, multi-presence thing, or not? I think it's a huge choice. And the economics of all that too — here's what you're going to live with for a long time — is a significant consideration for cloud-native data. Interesting. Okay: schemas. So — I actually am kind of really excited to answer the schema one. Okay, all right. Let's see — what was your question? If you move any further forward on that chair, you're gonna be sitting on the floor. Okay. So: not defining a schema up front would provide so much flexibility. But is there a scenario where, starting from scratch, you would want to define a schema — or could we say that not defining a schema on write is a factor? This is your hot take. Yeah, go ahead. Okay, I'm gonna just unleash some rage — is that okay? A little bit? I think within reason. Okay, yes.
So, distributed applications are made up of distributed teams. And here's where the right guardrails come in — the right boundaries for security, common practices, and simple things that will make your life way easier. I'm gonna pick on a really simple use case: let's standardize our date-time formats, so that when I go to try to unify domains A, B, C, D, and E, it's not different, different, different, different, different. We're just talking about big dumb date-time formats, right? That's an easy example of: yes, to simplify threading the needle when we join across those domains, I would like some sort of boundary that enforces that as we move onto new turf in cloud-native places. Yeah — I think I'm largely in agreement with that. The interesting thing to me is that if you watch the evolution of the quote-unquote "NoSQL" market — which was a term I thought was dumb at the time, and I think it still is; it's become even dumber, because most of the databases classified that way have actually gone back and added query languages, but we'll leave that aside — a lot of the promise, and a lot of the reason for adoption, for many, many of these data stores was the fact that they were schemaless. You could basically get going and start throwing things in and out. And really, what you see from a number of adopters is that they eventually hit a wall: hey, this seemed like a good idea, but now it's actually causing me more problems than it's solving. So, look — we need some guidance, some rules, some structure, to ensure consistency, particularly across organizational boundaries. So yeah, I think I agree with that. Okay — I think we're almost out of time, but since we got started a minute late, I'm gonna ask my last question. So: test integration. Writing and automating tests is critical for cloud-native speed.
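As a concrete illustration of the date-time standardization point above: one common shape for that enforcing boundary is a normalizer — whatever format a domain team emits, everything crossing the boundary is converted to one canonical representation (ISO 8601, UTC). This is only a sketch; the accepted formats and the assume-UTC rule are hypothetical choices for the example.

```python
from datetime import datetime, timezone

# Hypothetical inbound formats a boundary might tolerate from domains A-E.
ACCEPTED_FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",   # ISO 8601 with UTC offset
    "%m/%d/%Y %H:%M",        # a legacy US format, as an example
    "%Y-%m-%d",              # date only
]


def normalize_timestamp(raw: str) -> str:
    """Parse a known inbound format and emit canonical ISO 8601 in UTC."""
    for fmt in ACCEPTED_FORMATS:
        try:
            dt = datetime.strptime(raw, fmt)
        except ValueError:
            continue                               # try the next known format
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)   # assume UTC when unstated
        return dt.astimezone(timezone.utc).isoformat()
    raise ValueError(f"unrecognized timestamp format: {raw!r}")


print(normalize_timestamp("03/15/2018 09:30"))   # -> 2018-03-15T09:30:00+00:00
```

Rejecting unknown formats loudly, rather than guessing, is the guardrail: joins across domains then only ever see one representation.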
If you've listened to any of the talks here today, there's lots of love for being able to automate those tests. So what does that look like at the database layer? In this case I don't necessarily have an "is it a factor" framing; we'll just close on thoughts and comments. What does it mean, and what does it look like, to make sure we don't slow down those pipelines? What can we be doing at the data layer to keep those pipelines flowing? I think one of the things I've seen, at least in talking to users and to vendors as well, going back to what we talked about at the top, is that the way we manage data within organizations needs to change — and part of that is testing. Because we're beginning to do very, very different things with data sources, you have all sorts of new problems: everything from the cache invalidation we talked about to, in many cases, things like privacy concerns. If you go back — I can't remember what year it was — a couple of the airlines got in trouble because they were supplying the TSA with data to test some TSA systems, and that data was not managed or handled properly; there were essentially leaks and intrusions and so on. So there's a whole range of things, from a test perspective: you want to ensure that the integrated data is sound from process to process and data store to data store, but also, as you move along, that it's being treated and handled properly as well. So yeah, there's a huge spectrum of aspects to testing, but to me this is one of the areas where we really need to see a lot of evolution. It reminds me, honestly, of Molly's keynote about making security everyone's job, and the process of shifting security left.
Here you're really talking about the data governance side of things: how do you make data governance part of everyone's job, so that those tests get written because the development team is taking ownership of governance for all the data their application touches? Exactly right. Any other comments? Yeah — I mean, your test and development can be absolutely anywhere, right? If you're changing a write function, you should be writing a read test first to make sure you're getting what you want — and vice versa. One of our good friends did a compliance-driven development presentation at SpringOne last year where, essentially, he was validating the configuration of his entire Postgres database for security accreditation purposes in the federal government. It's about accelerating confidence in what we're going to be using — again, getting that database reliability engineer right there as part of our team. So, absolutely. I think we need a set of eye chart abstractions — very unclear right now — for concrete things in test data management: PII, who can see what, data governance, data flow, resetting, automation, APIs. How do you automate all of that? There's a whole set of conversations that need to be fleshed out to help with that. Yeah, I agree. Test automation has to be there; it's a factor. Yeah, it has to be. Okay. All right — now I know we've run out of time. Thank you, gentlemen, each of you, for participating in the panel. Thank you, audience, for participating with your presence. Thank you.