So I won't forget: we'll try to keep this pretty efficient. I'm Mike Miller from Cloudant, and I'm going to be talking about whether transactions will kill NoSQL. Before we start, well, you can guess what I'm going to say, maybe, since I'm a NoSQL vendor. But I'm going to give you a little bit of history about how we ended up where we are, some personal projections and also company projections about the direction the market is going, and then overall some reasons to realize that maybe we're not trying to solve exactly the same problems that transactional databases were built to solve. I'd love for this to be as interactive as possible, so if you have questions, just shout them out; if I don't notice you, don't be afraid to interrupt me, and I'll repeat the questions for the camera.

Before we start, can I just get a feel: who's a developer in here? Who writes code on a daily basis and commits more than once a day? Okay. Who's more on the business or management side? I see some hands going up more than once. Those folks who didn't raise their hand for anything: are you a student? Great. Where are you a student? Belgium, cool. Okay, so this is not a bad place to learn. Any other demographics that I failed to represent? How many people use NoSQL on a daily basis, in one form or another? Okay, and how many of you are interested in learning about it? Okay, great.

So I'll try to talk a little bit about some history and how we got here as we go through. The idea is that there's been a lot of activity in the field; I'm going to highlight some of it, and in the places where I really emphasize it, it's worth going to learn for yourself by reading some papers. I'm an academic guy in my own background.
I got to this very pragmatically. Without going into detail, this is a picture of the Large Hadron Collider, where I worked last. I did big science, and we had the world's biggest data problem, and we did not have solutions we could buy from vendors. Point blank, we had to invent our own stuff: for file systems, for distributed computing, for data synchronization across 200 data centers globally. So I'm a pragmatist; I don't come to this steeped in the long philosophy of SQL and relational databases, and I hope you find that approach a little bit fresh.

As an entrepreneur, I came to this, and this is my one big high-level slide, realizing that big data was breaking our model of computing. The way we would build a stack, the way we would architect something, whether it's an application or analytics, those things just weren't scaling to meet the needs. At the same time, the world was getting very distributed: the way people build applications, the way they generate data, the way they consume it, suddenly it's all over the place. And the technology we were using really assumes that you still have a single global write master for your database. That doesn't seem very relevant anymore when you have people distributed all over the globe, and they have a finite 200 milliseconds before they're completely bored and shut off your app if you don't serve a page load for them. So it's clear that these two things have really inspired a lot of innovation over the last five years.
I'm going to talk about that a little bit. But the cool thing is, it's a market, and the market has responses; things are floating to the top as winning ideas that help solve some of these new problems. I think NoSQL is one of the big ones, I think cloud is another big one, and I think there are other innovations to come. But the great thing is that these solve specific problems. It's easy to point at NoSQL and say, okay, people come to it for scale, or speed, or low latency, or durability, or something like that. And cloud, if you listened to the Netflix presentation today: if you are in a land grab, it's a huge advantage that you can just get things going very quickly. So these are a small set of the market responses, and we are actually part of that. This is the company I work for: Cloudant is a distributed database as a service. One neat thing, if I have it on here, is that it's the first database that ships with a mobile strategy. It has thoughts about what types of devices are going to be connecting to it, putting data into it, and getting data out of it. It's built from the ground up to be, though you can't see it here, distributed all over the globe, and that's something we're going to talk about, because those are some of the things that make transactions particularly challenging. So before we jump into the meat of the question of whether transactions will kill NoSQL, I just want to take a run through how we got here. Do you have a question? No?
Okay, so I'm going to show you a picture on the next slide. Go ahead and guess what it is: if I had to show you one picture that would motivate the entire cloud and NoSQL market, any guesses what it would be? Oracle, not a bad one; you mean like the oracle from Delphi, or you mean those guys in red? Three towers, what else do we have? That's not a bad one. I like people who build things and create things, and I like people who are pragmatic. Close, actually, I've got one of those. This is my personal take: these guys. Anybody know who this is? Yeah, this is Jeff Dean from Google, and Sanjay Ghemawat. These guys are foundational; they are all over the place. I started my company out of MIT because I stumbled on some of their work, and it was so incredible. I stumbled on one paper and it solved all the problems I had in building a certain data center, and then I stumbled on the next one, and I kept following the line.
I was like, these guys are just so far ahead of us. They were publishing things in 2004 that were already outdated for Google, because Google had already moved on. There's a great article, which I had the joy to contribute to, about the academic roots, from Bell Labs to Xerox PARC to Google, and some of the work that these fellows did; they continue to not disappoint. And it's really fun, too, because there's some great stuff pulled out of Google, like April Fools' Day jokes: "Jeff Dean once compiled his code just to check for errors in the compiler," things like that, pretty entertaining. Or: "The speed of light used to be three times ten to the twelfth meters per second, until Jeff Dean optimized physics." You see exactly what I mean if you read some of their work.

So this is really, in my mind, the canon of three or four papers that defined the ecosystem that brings us all to this show. I can trace it back to three or four papers, the first of which was really transformative: the Google File System. How do you build a very cheap, highly distributed file system that's pragmatic in its ability to redundantly store your data? If a node dies that stores some data, you just throw it away, instead of trying to keep everything online all the time. This thing was running in production, I'm sure, by 2004; it has already been replaced with something called Colossus, which is not yet fully published. We are just getting a poor man's knockoff of it, called the Hadoop file system, which we're all running now ourselves, and it's inferior to what existed ten years ago within Google. I would stake my name on that. Companies like MapR are starting to bring some of this technology back out. But really, at the core, you've got to find a way to store the data, huge amounts of data, within one data center and between data centers. I've helped build custom distributed file systems that, as long as nothing ever broke, were awesome; but if something broke, they fell over. This is built from a very different mindset.

The next was how you actually analyze data once it's in there, and this is the seminal paper on Google MapReduce. MapReduce is an old algorithm, an old design pattern, but there was a point in this paper which was just the eureka moment for me, where they show sequential C++ code. I had done distributed computing for a decade and a half at that point, in science, where as a first-day grad student you launch a hundred thousand jobs that run on the data computing grid all over the globe, and you analyze petabytes of data and get results back. That part wasn't crazy, but you had to do all the scatter-gather and scripting yourself. We bought really expensive tools, like Platform LSF from the finance world, that would do that, but every single person had to know how to do that scatter-gather computing. And Google MapReduce was like: oh wait, you don't have to. All you have to do is tell me what you want to do with one line of a file, and I'm going to take care of everything else, all the fault tolerance. That's really neat. And it actually shows you something that we still haven't quite gotten back to, which is embedding a whole bunch of distributed computing within a sequential program. They have a sequential C++ program that runs top to bottom, and then you realize that one of those lines, which runs essentially a query, or does something like a word-count example, is actually going out and launching 10,000 jobs to run over a petabyte of data and doing all of the condensing back for you. The idea of bringing together sequential and distributed computing, or procedural and distributed computing, in one thing is something we're still working on.

Next was Google BigTable: how you take these primitives for storing and analyzing data and turn them into something that looks like a database, but at an incredible scale. And finally, one that's not from Google but from Amazon, and which had a ton of operational data in it: Amazon's Dynamo paper. The concepts in it were not themselves new; this stems from work done by Robert Morris, the inventor of the internet worm. At 15 or so, I think, he invented the first internet worm, briefly went on the lam, turned himself in, maybe went to jail, and got tenure at MIT somewhere in the middle.
He started Viaweb with Paul Graham and got very wealthy; he runs distributed systems research now at MIT. The idea is some peer-to-peer work about how you build redundant, robust distributed systems that look semi-transactional, and we're going to get into that in more detail. This one is really cool because it showed that it's not just neat ideas: this thing is running the Amazon shopping cart, when you get down to it, and that operational data was amazing.

So this is what I call the old canon, which I think is really interesting. There's a new canon that goes with it, which I can talk about over beers after this, where you see what they're doing next. But what did this inspire? Well, I stumbled upon this stuff after seeing a BusinessWeek article. My girlfriend's mom, over Christmas when I was back in Michigan, slapped down this magazine for me and said: lots of computers, big data centers all over the place, sounds like what you do. And I read it, and I was like, this is exactly what I do, I just didn't know what it was called. It was called cloud computing; the article was called "The Wisdom of the Clouds." He went on to found Cloudera; they're doing reasonably well. Out of those papers we ended up with Hadoop and Storm, and we ended up with all of NoSQL: Cassandra, CouchDB, MongoDB, Riak, et cetera. I think all of those trace directly out of the problems that we have and the recipes that existed in the literature, especially some formative ones. The people at the root of all of these companies know those papers inside and out. So if you're new to the field of NoSQL, I encourage you to go read them; they're incredibly readable.
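Since the talk keeps coming back to the MapReduce paper's word-count example, here is a minimal single-process Python sketch of that pattern. The function names are illustrative; the real framework distributes the map and reduce tasks over thousands of machines and handles the fault tolerance for you.

```python
from collections import defaultdict

def map_phase(line):
    # Map: for one line of input, emit a (word, 1) pair per word.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # would between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: sum the counts for one word.
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

The point of the paper is that the user writes only the map and reduce functions; everything in between is the framework's problem.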
Readability is another great thing about the work by Jeff and Sanjay. So then a funny thing happened: we created this stuff, Hadoop on the analytics side, the data warehousing and batch analytics side, and then the OLTP side of things, which is the databases. This happened around 2008 and into early 2009; "NoSQL" as a name came out of one of the first conferences, and now it has turned into an actual, serious industry, and there are conferences like this one that go into it in a lot of detail. But the interesting thing is that while these were coming to market, there was another slug of innovation, which goes by the NewSQL name. This really kicked off with another paper by those same gentlemen, called Megastore; that's the SQL-based distributed database that backs a lot of Google App Engine. It was a really interesting paper that made the word Paxos jump out in everybody's mind, which is something I'll talk about. Then there are commercial versions of this: VoltDB came out of the Stonebraker side, there's Jim Starkey's NuoDB, and the FoundationDB folks are doing something very similar. These are all trying to make a SQL engine scale while making sure that you have atomic single-row operations and multi-part transactions. They're neat, but they have their drawbacks. Google Spanner blows everything out of the water, so I'm going to talk about that a little more. It's the one that dropped somewhere in 2012, after being hinted at for three or four years at conferences. Jeff Dean and Sanjay Ghemawat again worked on it, and they basically showed how you could build a SQL database with multi-part transactions that was truly globally distributed.
It's an incredible piece of work that I'll talk about a little bit. But before we get there, when we talk about transactions, I think it's worth taking a look at how, and why, we ended up in exactly the place that we are, which is that we have people writing applications or doing analytics with data stores that are sometimes eventually consistent and often don't support multi-part transactions. So why the heck would you do that in the first place?

Well, here's a kind of picture. One of the things you give up is schemas, but what you gain is flexibility, and that is really the story of the web. Schemas are great if you have the luxury of sitting down and pre-defining everything, but oftentimes you don't. Oftentimes, just like we heard from the Netflix speaker, you've got to move: you have data sources that are new, you have data sources that are changing, and you have to be able to adapt to that. And if you're coming at it from the enterprise side, from a data integration standpoint, schemas are actually the thing that seems to most hinder data integration. You'd think that if everybody just agrees up front, then you can join all your data together, but nobody ever agrees up front. It's just an absolutely unsolved problem. So the incredible thing about being schema-optional, if you will, is that you can put all of your data into a NoSQL database, or into Hadoop and just the HDFS file system, and then you can start to read it and say: hey, give me a document, or give me a row of that file, and if it has these properties, I'm going to treat it like a duck.
And if it has these properties, I'm going to treat it like a dog. I can apply that kind of schema on read. So from an analytics standpoint or a data integration standpoint, that's a huge thing you've gained, at the sacrifice of having declared schemas up front. Schemas are actually very handy, and I'm going to talk about that later, especially with respect to Google Spanner.

The next thing, which people don't actually talk about much because we're starting to take it for granted, is manual sharding. Sharding by itself is not a dirty word; it just means that you're going to take your data and distribute it over more than one machine, over more than one file, and those files can be on different machines. Now, there's a painful way to do it, which is doing it in your application. If I'm writing a yellow-pages application and I say, okay, I'm going to put all the A-through-Ms on this MySQL server and all the N-through-Zs on that MySQL server, that's a logical thing to do. But it puts all the effort of maintaining that sharding onto the developer, into the application tier, and that's tough. And it's actually exactly what Google did. There's a quote from the Spanner paper where they talk about F1, the ad system: it's a relatively small database, a few terabytes, but it is their number one revenue generator; it's how they do ad targeting and ad display. They had a sharded MySQL service, and the trick is, when you want to add more machines, when your data grows too much for the way you sharded it, now you've got to take the A-through-Ms and turn that into something like A-through-F and so on. When they did the last resharding of that F1 system, and these are 30 incredible Google engineers, it took over two years. That is an incredibly risky time to be running your application; if you make mistakes during that period, you've just taken on a huge amount of technical debt.

So manual sharding by itself is not an awesome thing. What you get instead is auto-sharding. I don't have a nice picture to go with it, but if you use Cloudant, or if you use something like Riak or any of these Dynamo-inspired systems, they take care of distributing your data for you, on machines and on disks, so that you don't have to think about how you're actually going to represent it. That's something NoSQL has given developers which is absolutely huge. And if you're a business owner, if you own some portion of a product, it's something you really want to pay attention to: manual sharding is really easy to get off the ground, but it is incredible technical debt going forward.

Something else that we've largely given up in moving to NoSQL is locks. A lot of the systems, with the exception of Mongo, are lockless: Riak is lockless, Cloudant and Apache CouchDB are absolutely lockless, Cassandra is relatively lockless. That means that when you have a large number of concurrent users, you're not going to have somebody doing something outside the scope of your application blocking what you want to do. The classic cases are things like: if I want to update a document or delete a document, that had better not lock the whole database; if it does, you're really in trouble, and if it doesn't, you have this kind of smooth-sailing road. You have to think a little bit differently, and there are some sacrifices; locks make things easy.
That's why they were introduced: they allow you to make sure that everything is fully consistent at every point in time. But I think the generally better approach is to block optionally, to make blocking something a developer can invoke when an operation absolutely has to block everything else, because it's a very special operation. And you can do that on reads, but not on writes. That's a statement about the CAP theorem, about choosing availability over consistency. If you follow a Dynamo-inspired system, that means something very specific in the quorum algebras, which I can talk about over beers at a whiteboard. But I think this is really nice: letting people write data and then apply their own consistency requirements on reads. It's just like we did with schemas: schemas are optional, and we can apply them on read, but we don't have to. That makes it easy to build applications.

We also sacrifice tables, and we've picked up data exchange formats that are much more natural for the web. SQL is great, but it's really poorly matched to JSON. Postgres allows you to store JSON, but it's very tough to interrogate it. JSON is what applications trade on the wire, a lot of that traffic goes over HTTP, and SQL interfaces are just not meant for that. If you ever find yourself trying to write a SQL query, or build a declarative SQL-like language, on top of a highly nested, rich data structure like JSON, it's really gnarly, and eventually you think: man, I would just kill for one line of Python. So the general model is: take your one line of Python or your one line of JavaScript, push it up to the server, and it gives you a document or a row one at a time. It's a much more natural way; it's essentially like a stored procedure.
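To make that push-code-to-the-data model concrete, here is a hedged Python sketch of a map function applying schema on read. CouchDB and Cloudant actually take JavaScript map functions in design documents; the names `map_doc` and `build_index` here are illustrative, not any product's API.

```python
def map_doc(doc, emit):
    # Schema on read: if this document quacks like an authored post,
    # index it by author; anything else is simply skipped.
    if "author" in doc and "title" in doc:
        emit(doc["author"], doc["title"])

def build_index(docs, map_fn):
    # The database would run the map function server-side over every
    # document and maintain the emitted pairs as a secondary index.
    index = []
    def emit(key, value):
        index.append((key, value))
    for doc in docs:
        map_fn(doc, emit)
    return sorted(index)
```

The developer uploads only the small map function; the database handles running it over every document and keeping the index up to date.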
All the NoSQL databases differ in what that stored procedure is going to do, but generally you're going to upload some code that interrogates your rich data structures and then pulls a small portion of the data out, maybe to index, or to feed into an analytics report, or something like that.

And finally, the biggest thing here is that when you're running distributed systems, you can't have single points of failure. That's a very typical problem with traditional relational database management systems; they're getting better, and the NewSQL papers really go into how you can do that. But the huge thing, and this is especially obvious if you read the Google File System paper, or if you go back and look at the original roots of Dynamo, is that these are really self-healing systems, because you're running these things at scale. And how many of you are running distributed systems yourselves, in-house? Okay, one or two. It's hard. There's an awesome article by Jay Kreps, one of the lead architects at LinkedIn, and even though Amazon is our number one competition, he's dead-on when he says: here's why you should use Amazon, because they write the software and they run the software. These systems are too new, the failure modes are too exotic, and the scale we're talking about is too big for the typical operations team to run them successfully. I think that's absolutely true. It's the number one reason we don't package our software and give it to other people to run: it's moving too quickly, the scale is too large, and the failure modes in distributed systems are pretty intense. So the software is written to be self-healing.
But it's still challenging, and I don't want to give any false impressions. All right, so the next question on this path, before I talk about what I think is going to happen: I think a lot of people assume things that are false about systems in the real world that are transactional and/or immediately consistent. So let's take a little step back. Let me also pause here and take a drink of water. Do you have questions on this past stuff? What about the history? No? All right, everybody's tired in here; there was a lot more energy in the room yesterday.

All right, so let's step back and talk about whether or not we need transactions. If I were to choose one example of an eventually consistent system, what would I choose? A bank? All right, that's my number two example. Amazon? Yes, that is eventually consistent; they even talk in the Dynamo paper about how you can end up with duplicates in failure scenarios. Yeah, that's true. No, that actually makes my point even more. All right, here's my example: the truth. This happened the first time I gave this talk; it actually happened on that exact day. Do you remember when the AP Twitter account was hacked and said, you know, bombs explode at the White House, Obama's hurt? It took about 10 minutes for the world to realize that wasn't true. Anybody notice what happened in those 10 minutes? The market dropped something like 10 billion dollars; it took a pretty significant hit, a few percent. So this is the next George Clooney and Matt Damon caper movie.
You hack somebody's Twitter account, you manipulate the truth, and in the time it takes for the truth to anneal, to come back to the real value, the recovery time, you can actually make a lot of money if you want to. Think about it: somebody hacks the account, and that's an impulse into the system. They're ready to trade; they buy at the right time, they sell at the right time, and they make an incredible amount of money. But the trick is, the truth is actually eventually consistent, and if you follow politics, it probably never converges; nobody ever actually agrees on what happened. Bank accounts, at least, seem to converge. So in my mind, what really matters is the convergence time. This one turned out to be more like 10 minutes, and a lot of money was made and lost in those 10 minutes, which almost makes me want to quit and run a hedge fund.

But banks are the next obvious example. When I talk to most developers, they always say: well, I want it to be like my bank. And I say: well, your bank is the ultimate eventually consistent system. If I'm wiring money around between continents, it is not immediately consistent, and the same goes when I'm using an ATM.
It is absolutely not immediately consistent, and you can see the bank's logic in the rules they set up, because they know it's not immediately consistent. When you write a check, you can give it to somebody else; an ATM will immediately give you money back, and the check, it turns out, doesn't even have to be valid. I had a roommate in college: we were headed out for a weekend, and on a Friday he stopped by the ATM and said, I don't have the money, but I'm going to get paid on Monday. So he wrote that on a piece of paper, put it in the envelope, deposited it for 500 bucks, got 500 bucks back out, his daily limit, and went on his happy way. On Monday his account was shut down. He didn't go to jail, but it takes the bank a while to figure it out, because it's not immediately consistent. So what happens? Banks set up penalties, they set up maximum amounts of money you can pull out, and they have a hedging strategy in case somebody actually defrauds the system. You can get out at most five hundred dollars a day, unless you actually hack the ATMs the hard way. So that's five hundred dollars times roughly three hundred days a year; their maximum exposure is actually incredibly small. In building systems that are eventually consistent, these are the types of things you have to think about as a developer: if two people order the same book at the same time, and they're globally distributed, what's the penalty for being wrong? We send an apology email to one customer; it's probably okay.
That's probably okay. And there are solutions: when you build on NoSQL databases, there are data modeling techniques; Nathan talked about this this morning. The number one thing you see people do is move to an immutable data model, and this is exactly what I mean. Imagine you're modeling a bank account, or an account with currency in a game: somebody gives you ten dollars in real money, and you turn that into in-game currency. One way to do it is to have a single document, or a single row in your HBase or Cassandra table, however you want to think about it, that stores the balance, and you just update it from all over the place. That's one way, but then what happens if one of those transactions dies on the wire? Or how do you know the update is immediately reflected throughout the whole system if it's globally distributed? Well, there's another way to do it, which is actually the way bank accounting works: immutable documents that each represent a single transaction at a point in time, not the changing state of account balances. So here's an example where you have a document in JSON. A natural construction is to give things a type, so documents of this type will be treated as transactions. They have a source account, a target account, the amount, say the currency, and the time. So this is a transaction log, if you will, a write-only state machine, and the state is calculated by using secondary indexes; that gives you the account balance. So you can write one of these transaction documents and then do a query at the same time to say: okay, what is the account balance?
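The append-only model just described can be sketched in a few lines of Python. The field names (`type`, `source`, `target`, `amount`, `currency`, `time`) follow the talk's example; a real document store would compute the fold below incrementally with a secondary (map/reduce) index rather than scanning every document.

```python
import json

# Immutable transaction documents: each records one transfer at a
# point in time, and nothing ever updates them in place.
txns = [json.loads(s) for s in [
    '{"type": "txn", "source": "a", "target": "b",'
    ' "amount": 10, "currency": "USD", "time": 1}',
    '{"type": "txn", "source": "b", "target": "c",'
    ' "amount": 4, "currency": "USD", "time": 2}',
]]

def balance(account, docs):
    # What the secondary index would maintain for us: credits into
    # the account minus debits out of it.
    credits = sum(d["amount"] for d in docs
                  if d.get("type") == "txn" and d["target"] == account)
    debits = sum(d["amount"] for d in docs
                 if d.get("type") == "txn" and d["source"] == account)
    return credits - debits
```

Because the documents never change, a retried or delayed write can only add a transaction, never clobber the balance, which is what makes this model robust to partitions.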
If something goes wrong, it's eventually going to percolate through the system, and the timescale of how eventual that is is usually on the order of tens of milliseconds to seconds in most reasonably connected systems. What really happens is network partitions, and this model is very robust to network partitions, because when you reconnect your system, eventually everything percolates through everywhere. But again, you have to think about the cost of being wrong.

Okay, so let's take a look at Google Spanner, because that's another piece of the puzzle we have to get through. I think I'm doing okay on time; we started at 4:15, right? Okay, cool. I just want to draw your attention to a few things from the Google Spanner paper. It's a scalable, multi-version, globally distributed, and synchronously replicated database. How many people have actually been exposed to it, read the paper, or looked at any of the blog write-ups? About half. This is a really cool thing. This is what will be talked about at this conference, in product form, in a few years; that's my personal take, and Google is working on turning it into a product. It's the first system to globally distribute data and give you externally consistent distributed transactions. That means data centers all over the globe, with multi-part transactions. It's really the analog, from very forward-thinking SQL-based engineers, to what's happening in NoSQL.
It's a really great contrast, and I think the truth lies somewhere in the middle. I'm actually going to talk about how it works, because it shows you how hard this is. I sat through the earlier talk from FoundationDB, and I think they glossed over a few of the really hard points, and I hope that looking at what Google had to do helps you understand just how challenging this problem is. The real heart of this is a novel time API that exposes clock uncertainty, and I'm going to talk about that in detail. But the upshot is: it gives you non-blocking reads in the past, lock-free read-only transactions, and atomic schema changes. These are really big things, and they're the types of things that DBAs and developers love, because they make reasoning very simple and straightforward.

Real quick, a little bit about how it works. The idea behind Spanner, or any distributed transactional system, is that you've got to have the ability to put things in a line. If I get a whole bunch of requests, I somehow have to agree upon the order in which they come in, and then be able to repeat that order in other places. That's the really hard part. The natural thing is to just look at the system time, but distributed time turns out to be very hard. In fact, to solve this distributed-time problem, the Google engineers had to use GPS, plus a somewhat orthogonal system, atomic clocks, in their data centers. This is very non-trivial: atomic clocks aren't super expensive, but they're still not cheap, and you can get them in a rack. And if anybody wants to think about it: this is millisecond-scale synchronization. I think I've worked out nanosecond-scale synchronization; all I need is a neutrino beam.
So see me afterwards if you're interested in the patent. But really, the idea is, they said: okay, if distributed time is hard and you can never keep everything perfectly in line, then let's do our best to keep things synchronized, but let's measure how out of sync the system is, expose that as something like a 95-percent confidence interval on the time, and reason a little bit probabilistically. And if the synchronization between everything gets worse, we'll just slow the system down, because you can deal with that. That was very novel but very challenging. They also have a mix of things that make the system unspoofable: you can spoof GPS from the outside, so having atomic clocks inside is much better, but those have low-frequency drift, so you have to kind of AND their signals together. So it's really cool. This is what it looks like: this shows you the uncertainty in time, versus day of the month or so, that the Spanner system exposes. It shows that 90 percent of the system agrees to well below a millisecond, and by the time you get to 99.9 percent, three nines of agreement, you're sitting around a few milliseconds.
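The "slow the system down as uncertainty grows" idea is what the Spanner paper calls commit wait: assign a commit timestamp at the top of the uncertainty window, then block until that timestamp is provably in the past. A self-contained sketch, where the `EPSILON` bound and function names are illustrative, not any real API:

```python
import time

EPSILON = 0.01  # assumed clock-uncertainty bound in seconds (illustrative)

def tt_now():
    """Return (earliest, latest) bounds on the true time."""
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def commit_wait():
    """Assign a commit timestamp at the top of the uncertainty window,
    then block until that timestamp is provably in the past, so any
    transaction that starts later must observe a larger timestamp."""
    _, s = tt_now()              # commit timestamp
    while tt_now()[0] <= s:      # wait until the earliest bound passes s
        time.sleep(EPSILON / 10)
    return s

start = time.time()
commit_wait()
waited = time.time() - start
# each commit pays roughly 2 * EPSILON of wait, so rising skew
# directly throttles the achievable transaction rate
```

With `EPSILON` at 10 ms, every commit waits about 20 ms, which is exactly why it's worth investing in GPS and atomic clocks to keep that bound down to a few milliseconds.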
So it means that over hundreds of thousands of machines, even with the world's best technology, you can actually only synchronize time on the order of milliseconds, and that's pretty interesting. You could probably go farther, but this was good enough for them. And then there are periods where they lose one of the time masters, the uncertainty in this time API goes up, and the system has to naturally slow down. So this basically sets a rate for how many transactions per second the system can do: at a fine level of granularity, as the skew rises, the transaction rate slows. That was one of the interesting things. But this is a very novel and pragmatic approach that allows them to do multi-part transactions and have the system be externally consistent, which is a very big deal when you're writing these applications. So those are the two building blocks, I think, for talking about how we got here and why. Now, just to whet your appetite about whether or not you really need transactions in the first place, let's talk about what reality is really going to hold. Here's my take on how we should view consistency and multi-part transactions. I think strong consistency, say single-document strong consistency on disk, should be like the four-wheel-drive button on your car. The multitude of operations you want to do as a developer don't require it. They just don't. You may think they do because you haven't sat down to think about the problem enough, but they generally come with things like some locking or some extra slowdown. In a Dynamo-based system... well, I'm not going to go into all the details.
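The "four-wheel-drive button" in a Dynamo-style store is usually just the quorum parameters: n replicas, writes acknowledged by w of them, reads served by r. A read quorum is guaranteed to overlap an acknowledged write exactly when r + w > n. A tiny sketch, using the conventional Dynamo parameter names rather than any specific product's API:

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """With n replicas, a write acked by w nodes, and a read served by
    r nodes, every read quorum intersects every write quorum iff
    r + w > n, so the read touches at least one up-to-date replica."""
    return r + w > n

# typical settings on a 3-replica system
eventual  = quorum_overlaps(3, 1, 1)  # fast and available; stale reads possible
strongish = quorum_overlaps(3, 2, 2)  # overlapping majority quorums
read_one  = quorum_overlaps(3, 1, 3)  # w = n: pay at write time, read anywhere
```

This is the "90% of the way there" knob: it makes reads see acknowledged writes, but without a global ordering mechanism, concurrent writers can still race, which is the asterisk that comes up shortly.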
There's an incredible link here from some folks at Berkeley, Hellerstein and others, who asked: what does "eventually consistent" really mean in a system like Riak or Cloudant or Dynamo or Cassandra? Is it minutes? Is it hours? Is it seconds? And the reality is, it's usually milliseconds. So you have to think about your own timescale. If I care about consistency on the order of a second, that's fine for me, because I'm a thousand times slower than a millisecond. If I care about consistency on timescales of less than a millisecond, it's a big deal for me. And there's a way for you to evaluate this: they have an analytic formulation, they have a Monte Carlo model fit from real-world data, and they have some little JavaScript tools for you to reason about it. But the point is, say I have a counter, just as a concrete example, a counter that I want to update from a big, distributed, concurrent system. If I'm updating it once an hour, that's probably going to be fine; I could use a Dynamo-like, eventually consistent system to do that. But if I'm updating it once a microsecond, I'm going to be in a whole world of hurt, because it's never going to catch up. So the question you have to ask yourself is: what's the timescale of my operations?
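The Berkeley work being referenced is Probabilistically Bounded Staleness (Bailis, Hellerstein, and others). The back-of-envelope version of their argument can be simulated directly: sample a replication-lag distribution and ask how often a read arriving some interval after a write still sees the old value. The exponential lag model below is a stand-in assumption for illustration, not their fitted production data:

```python
import random

def stale_read_fraction(read_delay_ms: float, lag_samples_ms: list) -> float:
    """Fraction of reads that are stale, assuming each read arrives
    read_delay_ms after the write, and replication lag is drawn from
    lag_samples_ms. A read is stale if the lag hasn't elapsed yet."""
    stale = sum(1 for lag in lag_samples_ms if lag > read_delay_ms)
    return stale / len(lag_samples_ms)

random.seed(0)
# crude model: replication lag ~ exponential with a 10 ms mean
lags = [random.expovariate(1 / 10.0) for _ in range(10_000)]

hourly_counter = stale_read_fraction(3_600_000, lags)  # updated once an hour
frantic_counter = stale_read_fraction(0.001, lags)     # once a microsecond
```

Eventual consistency is essentially free at the hourly timescale and hopeless at the microsecond one, which is exactly the "what's the timescale of my operations" question above.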
And this is kind of the rule of thumb for using a Dynamo-inspired system, which is the backbone of the majority of NoSQL database technologies. You can already make things immediately consistent on disk within Dynamo-inspired systems, so you can get 90% of the way to strong consistency on disk just using the parameters available within a Dynamo-like API. That's great. But there are some asterisks, and they come from concurrent paths into the system: most NoSQL systems based on Dynamo don't do the Google Spanner thing where you make everything go through one funnel and get in line. You just accept things in a lockless manner. Spanner locks, right? It does fine-granularity locking, row-level locking. And Megastore itself has locking, and it's actually very slow, incredibly slow, but things are immediately consistent, and developers liked that; they could reason about it. That was one of the reasons it was so popular. So there are some asterisks, but a Dynamo system already goes a long way. I think we should treat strong consistency per document as four-wheel drive, and the market knows how to do this. You can do it with generalized Paxos. We're working on it; I know the Riak guys are working on it; I'm not sure what Cassandra is doing. But it's in the product pipeline. Simply put, it's not the most requested feature; that's where it stands. It's like SQL declarative languages: people ask for it, but it's not the most requested feature yet. So it's something that's going to be coming down the path. Now let's look at what the Spanner folks say. They say one of the reasons they did this work was that it was better to have the application programmers deal with performance bottlenecks
on writes or updates, due to the overuse of transactions, instead of always having to code around the lack of transactions. And they say: we know how to do transactions; two-phase commit over Paxos mitigates the availability problem. That just says there's engineering and algorithms you can throw at it; we know how to solve that problem, but there are trade-offs. The main point, coming from the NewSQL side, is: well, maybe we went a little too far; let's go back to transactions being the default. Then developers find it easy to reason, you can go back to writing applications, and we'll just deal with the slowdowns as they happen. And that's actually not bad for the specific problem Spanner was built to solve, which was their ad network: a traditional three-tier application where you've got clients running in a browser someplace, you've got complete control over the app tier, and you've got the database right underneath. But that's not where we are, and I'll get to that. In my mind, if strong consistency of documents is four-wheel drive, transactions are like that optional button, the differential lock button. I recently learned that my car doesn't have it. I have a Honda CRV, and when I high-centered it over a log, 25 miles from the nearest cell phone tower, at a trailhead with just my wife on a very cold and snowy day, she pointed out that it would have been a really good idea to have this button. She's a great woman, though, and helped me stack rocks under the tires so we could get out of there. It was either that or snowshoeing a marathon distance out. So this is something you only use in case of emergency. That's the way I look at it, and I think that's the way, as developers, we should look at multi-part transactions.
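"Two-phase commit over Paxos groups" reduces, at the coordinator level, to the classic 2PC protocol: every participant must vote yes in the prepare phase before anyone commits. A toy in-memory sketch; a real system would persist each state transition to a durable log before replying:

```python
class Participant:
    """One shard (or Paxos group) taking part in a distributed transaction."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy
        self.state = "init"

    def prepare(self) -> bool:
        # phase 1: vote; a real participant would fsync a PREPARED record here
        self.state = "prepared" if self.healthy else "abort-voted"
        return self.healthy

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants) -> bool:
    """Commit iff every participant votes yes; otherwise abort everywhere."""
    if all(p.prepare() for p in participants):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False

ok = two_phase_commit([Participant("users"), Participant("orders")])
bad = two_phase_commit([Participant("users"),
                        Participant("orders", healthy=False)])
```

The availability cost is visible even in the toy: one unhealthy participant blocks the whole transaction, which is exactly the trade-off under discussion.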
These are possible over multiple Paxos groups using two-phase commit, and again, it's one of those things you're going to see coming out of NoSQL vendors, but it's even farther down the line. And developers, even people running huge amounts of real money through these systems, millions of dollars a day, through e-commerce sites, or Amazon, or very aggressive mobile games, have learned to reason in terms of an immutable data model, thinking about what the penalty for being wrong is and how long they can be wrong for, because there are trade-offs in the other direction. And the big reason is this: when you talk about traditional databases, or about Google Spanner, you're talking about systems that are by default highly connected and only occasionally become network-partitioned, where they fall out of sync with each other. But I would argue there's a whole other exploding ecosystem around mobile devices that we all use, and we expect these things to work even when we don't have Wi-Fi. Who uses Expensify? I mean, it's the best app ever, right? Changes your life, and you do not have to be connected to take a picture of a receipt and have it sync later on. How the hell did it do that? It would not work, you could not use Expensify on a plane, if it did immediate, strongly consistent, multi-part transactions. It just wouldn't work: sorry, can't do it, request can't be satisfied. And the reality is that mobile is in a chronic state of network partition. And so the systems we need to solve this problem are very different
than just a scaled-up traditional SQL database. And I know this isn't a mobile conference, but you just can't ignore what's happening over on that side of the market; it's changing everything. So this is not crazy talk. This is exactly what we sell at Cloudant, and our competitors are trying to do similar things. You want to be able to write to your local data store, which might be local storage in the browser, and then synchronize back and forth with the cloud at a later point in time. And it's absolutely real; we have customers that do it. Parse is doing incredible things with it. This idea is here now. It's like HTML5: we used to argue about it in 2008, like it would never come, and now we all just use it. The same type of thing, being able to synchronize data across different device types that may or may not be connected to the internet, is really important. And if you look at the explosion of internet-connected devices and where we are with distributed data... how many of you have a Fitbit? We're going to have all these different things on us, gathering data all the time. We're going to need systems that can deal with partitioned networks. So, just to nail it home: I don't think transactions are going to kill NoSQL, and I think mobile has already killed transactions. That's really my takeaway here. What you need in reality is systems that can be adaptive. The default should not be multi-part transactions. NoSQL will evolve; I think you should expect evolution in late 2013, people bringing strong consistency to market in 2014, and maybe multi-part transactions by the end of the year. But again, they're just not the most demanded feature. These things are going to be options, right?
And there's going to have to be a little bit of different thinking as you move into the bell curve of enterprise developers: maybe I don't always need this; what is this strong consistency, this blocking transaction, really buying me in this case? That's a bit of education that has to happen. And with that, I'll take a couple of questions, and we can all go get a drink. Thanks. [Audience question.] Yeah, so I guess it's more of a comment; I was just asked to repeat it. I think you're right: you need to understand the situation you're in and whether the problem you're solving requires immediate consistency or multi-part transactions. My response would be, again: you were able to sit down, in those situations, and say, what is the timescale to strong consistency? Can I be wrong for a second? A lot of the time, I think the answer is yes. And you're not necessarily wrong; you're just seeing time move a little slower as things propagate through the system. It's not like you're going backwards. One thing I have learned is you've got to be monotonic; data can never go backwards in time. You completely screw up developers if you do that. Monotonicity is actually very challenging in a distributed system, but you don't need serialization to be monotonic, and that's a great thing. So yeah, absolutely, it's a use-case thing. And if nothing else, NoSQL as a movement has raised the question again: okay, maybe I should sit down and think about that before we get rolling. But I think there are a lot of other things you get right by being lockless. It's very challenging to build a system that's fully versioned and has hash histories associated with every document, with all of that overhead, but it
allows you to reconcile two causally disconnected things later on in time. That's very powerful. I mean, that's what the NoSQL vendors are doing; that's what Basho was working on before they were working on transactions. Same thing for us: we put all of our effort into that because we think sync is just more important at this point, and more of a differentiator, more of an enabling technology for people building new applications, than transactions. One more question. [Audience question about where the arbiter lives.] The arbiter isn't on the local device; it's at the place where the transaction is going to occur, which is somewhere in the cloud, and it mediates between competing writes: if you have two people bidding on an auction on eBay, it decides which bid comes in before the auction ends. True, it arbitrates. And first of all, I hope you understand this is meant to be a little bit inflammatory, the use of "kill." I don't think mobile has killed relational databases. And I was using "transaction" in very specific database terminology, meaning multi-part transactions, which usually means two-phase commit. The point is that these transactions are definitely getting more distributed in time: there's a little bit of state change on a device, then another state change in the cloud. And we need to build systems that make that very easy for developers to use, instead of them having to build that synchronization and the storage of local state themselves. I mean, we could build our own computers; we could build our own sync if we wanted to. And some folks like Parse have kind of done that; they've built their own sync. Parse, you know, these back-end providers.
But I think NoSQL database technology is making it far easier, by changing the fundamental way we represent data on disk, to synchronize that data later on, and to do it in a way that enables, I guess, more transactional integrity in the cloud. So I agree with you; it's not as clean as yes-or-no, obviously. Certainly, yeah. Absolutely. And you said something else that I think is really big, which I didn't get a chance to talk about. Very technically, the way we store data on disk to enable that sync is changing. You're going to see all the NoSQL vendors moving away from just Merkle trees and vector clocks toward hash histories, which give you some history of a document's different versions. The phone and the cloud can synchronize and say: I have this portion of the hash history, you have that portion; these three leaves from here need to go over there, and those two leaves need to come over here, and now we're synced. There's a very quantitative measurement of what "synced" means. And if conflicts arise in that process, like two people operating on the same document simultaneously while disconnected, we can't throw that data away; we need to expose it in a consumable way for the developer to resolve the conflict. In Amazon's original example, when you double-order something and two things show up in your shopping cart, they expose that back to the consumer and let them kick out the duplicate. But I think we need to do much better at letting you push your conflict-resolution strategies into the system and have it auto-resolve those for you. That's true. All right, I'm over time, so I'm going to stick around. Thank you, guys.