So, my talk is SQL to NoSQL to NewSQL and the rise of polyglot persistence. Hold on. Let's turn this guy on. All right. I got the slot right before lunch. I'm sure you're all ready for food and you don't want to listen to me speak. Wait, if you hate me, there you go.
Before I get into the talk, I just want to give you a bit of information about me, kind of my perspective, so you can see where I'm coming from. I'm the CTO and co-founder of InfluxData. We make an open source time series database called InfluxDB. It's written in Go. It has a query language that's kind of like SQL. I've been thinking a lot lately about the query language and how I can improve it, and that's kind of the inspiration for this talk. I'm an author. In 2010, I wrote this book, Service-Oriented Design with Ruby and Rails. So even though I haven't been a Ruby programmer for quite a few years now, I want you to know that I'm with you. I'm one of you. So you can embrace me as a brother. As Luke mentioned, I've spoken here at GoRuCo quite a few times. This is actually me 10 years ago at the very first GoRuCo, where I was extraordinarily nervous presenting something that I barely knew anything about.
So, to the talk. Now, the core thesis of my talk is that SQL, and actually the relational data model, its dominance in the database world, like its complete dominance, is over, right? It's still going to be dominant in some way, at least for a while. But for decades, the assumption was that if you were going to use a database, it would be a relational database. And my thesis is that that time is gone. We've actually been in a new time for a while now. When you think about SQL, it's a domain-specific language, right? It's really an API. It's an API for working with data, right? And it's one API, but there are many ways to represent APIs. And in the past 10 years, we've seen the rise of other ways of working with data.
My thesis is that this multi-paradigm approach of NoSQL, in all these different ways, is actually here to stay, right? We have many programming languages for many different kinds of tasks, so we should have many query languages, not just SQL, right?
And the thinking in the database community goes something like this, right? You have SQL, and it was this great thing, and we're going to just use it for everything. And then around 2006, 2007, people were like, oh, wait, SQL won't scale, so we need NoSQL. We'll constrain the problem set, make it not as powerful, but it'll scale. And then they were like, oh, well, wait a second. We don't want to hate on SQL, so "not only SQL." So NoSQL became that, not only SQL. And then NewSQL came about. And the theory was, well, we don't have to make compromises on our query languages and our access patterns. We can actually just layer SQL on top of distributed databases.
So the thinking in the NewSQL crowd, and this is something that I disagree with, and I'm saying that people in the NewSQL crowd think this way today, is that NoSQL approaches are inferior. They're trade-offs you make to try and get distributed systems. They were just an interstitial stage until the technology could catch up so that we could scale SQL. But I think if you're spending your time just obsessing about SQL, it's going to end badly for you. There are other approaches. And that's really what I'm talking about when I say polyglot persistence, right? There are other ways to work with your data other than SQL. SQL is not the end state. It's not the only acceptable way to query your data and work with it.
And the other thing is that in the beginning of the NoSQL movement, the whole thing was tied up with scale, right? People were obsessed with distributed systems and how many requests per second you can handle and all this other stuff. And my contention is that scale is the least interesting part of NoSQL.
Most people don't even have scale problems. NoSQL is really about programmer productivity, right? And for the Ruby community, I think that's something that everybody can connect with on a visceral level, right? When you picked up Rails, for those of you who did, you probably didn't pick it up because it was the fastest thing in the world. You picked it up because you could build an application faster than you could in any other environment. That's why Rails became popular. And that's why I think NoSQL is going to continue to be more and more popular over time. So query languages are APIs for working with data, and we need things that are effective for different use cases and different access patterns.
So this talk is a little bit about database history, and a little bit about query languages and APIs. I'll start with the history. I'll make more hand-wavy arguments, and I'll have some examples of things that I think are actually more effective.
So let's talk about SQL. This is where I begin our database journey, even though there were databases beforehand, hierarchical databases and some other weirdo things. In the SQL journey, I think of it in this time frame, right? 1970 to 1986. Now, 1986 was still a long time ago, but bear with me. I'll talk about this as really the key era of SQL's development. So in 1970, a computer scientist working at IBM, Edgar F. Codd, wrote this paper, A Relational Model of Data for Large Shared Data Banks. And this thing was the precursor to all relational databases. He laid out the relational model, he laid out a query language called Alpha, and he built on this later to create a relational algebra and all the other stuff that SQL databases are based on. Then in the 70s, IBM started writing a prototype of this thing called System R. Interestingly, Edgar Codd wasn't actually on the project. It was a bit of a political thing. So System R was just a prototype.
I think they sold it maybe to a few companies, but it wasn't really a commercially available product. So at this point, relational databases are still just an academic concept. But then in 1979, a company named Relational Software released the first functional relational database with a language called SQL. The language was previously called SEQUEL, but they couldn't use that name because of, like, trademark infringement or something of that sort. The database was called Oracle V2, which some of you may recognize, because the company Relational Software later renamed themselves to Oracle. Now, this is Larry Ellison, the founder and CEO of Oracle. He's on the Forbes richest list. I did an image search for him, and I got an image of him brandishing a pistol, because if you are the lord of databases, apparently that's what you do. Yeah.
So, side note here. Oracle's headquarters is in the Bay Area, right there. Very close to Oracle is an airport called San Carlos Airport. It is a four-minute drive from Oracle's HQ. That airport has the airport code SQL. Before I researched this talk, I had thought it was named that because of its proximity to Oracle, but the information I found said that it got its airport code in 1977. Oracle didn't release the first SQL database until 79, so San Carlos Airport wins that one.
All right. Back to our timeline. Also in 1979, IBM released System/38, which was the first commercially available system from IBM that had a relational database. Interestingly, though, it wasn't a software package that they shipped to people. It was a piece of hardware with a bunch of software on it. It looked like this, this absurd thing. This was IBM's relational database in 1979. Yeah. Then in 81, they actually had a software package called SQL/DS, which was a software database. And then in 82, they came out with IBM DB2, which is a name that still lives on today.
Then finally, in 1986, we got the first ANSI standard of SQL. SQL-86, right? The SQL standards are named after the year in which they were, like, blessed. So that's SQL's development over time until it became a standard.
But the other thing to note is that SQL dominance took time. It didn't just happen overnight. This was a thing that took decades. SQL wasn't just handed down from on high by Lord Ellison to the people, to the plebeians, to use for their applications. One of the competing languages at the time was something called QUEL, and this was developed at Berkeley in the 70s. So QUEL looks like this. Here's a query to get some stuff. It's doing a salary calculation against this Jones character. In SQL, that looks like this. So anyway, it was developed at Berkeley. A guy at Berkeley ended up going on to start the company Ingres, which was doing a commercial implementation of it. Then I think he went back to Berkeley in 85 and started the Postgres project. Now, Postgres used a QUEL-based query language called PostQUEL, whatever, that thing. Yeah. It was that way up until 1994. And then in 1994, they replaced that query language with SQL, because SQL had become so dominant because of Oracle playing in the game, right? Everybody was using Oracle. SQL just became the standard because Oracle owned databases. So in 94, they changed to SQL. In 96, they changed it to the name that we know today. Post... I can never say it.
So the other thing about SQL is that it isn't fixed. You think of it as a static thing, but it's not. It's constantly evolving over time. 86, the first standard. 89, a revision, integrity constraints. 92, another major revision. 99, regexes, triggers. 2003. The hell? What? XML. Oh, God. Honestly, XML is the worst part of the 90s. I don't know why it survived. So yeah, let's get back to this. 2006, more XML. This is like throwing good money after bad. 2008, some more improvements. 2011, we see temporal data, something near and dear to my heart.
In 2016, we see JSON. So the interesting thing about this evolution is that Codd's paper in 1970 didn't allow for any of this crazy stuff going on in the database. It was very strict about what was allowed. It was tables with relations and all this other stuff. Documents weren't a part of it. So current SQL databases are a pretty big departure from that original concept. They built on it over time and they've extended it.
And because it's an evolving thing, SQL isn't really standard, right? You have MySQL, which has a flavor of SQL, and you have Postgres. If you're doing really basic things, the queries all look the same, but as anybody who's worked with both for a period of time knows, there are tons of queries that work in one and not the other, and the usage is completely different. Same thing with Microsoft SQL Server, Oracle, DB2, which is at IBM, and Informix, which IBM acquired. IBM has multiple flavors of SQL within the company. And that's one of the reasons I think ActiveRecord exists. Obviously, it's supposed to help you work faster with your database, but one of the other things it did was create a standard Ruby interface, regardless of which database you were working with, right? SQLite, Postgres, MySQL, Oracle, whatever.
All right, let's talk web scale. Let's get serious about making things big. That's NoSQL. The original impetus for a lot of the NoSQL stuff was really about scale. In 2006, Google published a research paper called Bigtable: A Distributed Storage System for Structured Data. The deal with Bigtable was that it was a very simple model, and there was almost no query language to speak of. There were some basic operations, but there was nothing like SQL on top of it. But it could scale to Google size, which was very interesting. In 2007, Amazon released the Dynamo paper, about a highly available key-value store that would scale infinitely. Again, key-value, there's no query language.
There are three operations: get, put, delete. But people were interested in it because it scaled. In 2008, Facebook released Cassandra as open source, which is kind of their take on Bigtable. They followed up with a paper in 2010. Also in 2008, Basho was founded. They created Riak, which is also just another simple distributed key-value store. They built some other stuff on top of it over time, but it's a very basic model. Also in 2008, HBase was created at Powerset. That's the Hadoop implementation of kind of a Bigtable concept.
And then finally, in 2009, we saw the first event that was actually labeled a NoSQL event. The term had been used in the 90s, but not related to this. This is where the name actually got known as a thing. So three years after these people had been doing all this work, they finally gave it a name, NoSQL. And it was really just about this idea of scale. Because when you think about these implementations, they're only interesting because they run on multiple servers. If you gave somebody these APIs on a single server, they'd be like, why would I use this? This is a piece of crap. I'm gonna use Postgres or MySQL because it can do a hell of a lot more.
But NoSQL also became about other things over time. This is the "not only SQL" movement. This is, hey, man, SQL's good. We can be friends, but we can use this other thing too. And the not only SQL piece is where I think the idea of polyglot persistence really started to take shape. For me, there are two companies, or two projects, that really exemplify this. The first is from 2007: 10gen, founded here in New York City. They started MongoDB. I think that was an important one, and we'll come back to them in a second. The other one that I think is very important is from 2009, when Redis was started. Another NoSQL database. So really, for me, MongoDB and Redis are what NoSQL is all about.
It's all about different access patterns and different ways to work with data that are outside of SQL and that are actually interesting even within the context of a single server. There are reasons you would use Mongo or Redis over a SQL database even if scale wasn't a problem you were looking to solve, because they make some tasks easier.
All right, before we get into more of that, let's cover NewSQL's history. The theory on NewSQL was, you know, the NoSQL data stores were not as feature-rich as a SQL database. So let's figure out how to build a distributed SQL execution engine that can be layered on top of a distributed database. So in 2008, NuoDB. 2009, VoltDB, which was spun out of Vertica. Vertica is a column store that was created by Michael Stonebraker, who's like a famous database dude. 2010, Google published the paper Dremel: Interactive Analysis of Web-Scale Datasets. I can't believe they actually put "web scale" in the title of that thing, but... It's a good paper, and it's basically about having a distributed SQL execution engine on top of Bigtable-like structures. 2010, Citus Data was founded. Citus builds a distributed extension onto Postgres. So if you want Postgres as a distributed database, you can use Citus. 2011, again, we see this pattern. Somebody finally gives it a name. NewSQL, the term, is coined by an analyst. I don't know what analysts do other than coin phrases like that, so good for him. 2012, Cloudera released Impala, which is the Hadoop flavor of a NewSQL database that's super fast and super scalable. 2012, Google's Spanner paper. By the way, as I'm going through all of this very, very quickly, there are definitely other things in NoSQL and NewSQL, but these are just the highlights for me. These are the things I think of.
So with Google's Spanner paper, which is about their globally distributed database that has a SQL engine on top of it, it becomes not just about scale, but about global scale and high availability on a global stage, right? You can have entire regions fail and things will just work. And then 2014, because I love New York City: Cockroach Labs, which builds CockroachDB, was founded here. The project actually predates the formation of the company, but their goal is basically to build an open source Spanner-like database.
So the idea behind NewSQL is that you get scale without making the trade-offs, right? You get the full functionality of SQL. You get the familiarity of SQL. And the familiarity thing is a trap I fell into early on when we started out doing InfluxDB. I thought, okay, we need a query language for this thing. Why don't we make something that looks like this? It looks like SQL. It's friendly. It's a comfort blanket that programmers can wrap themselves in, and it feels good. But it's not actually SQL. And the thing I've realized over the last four years of doing this is that familiarity is not always the best option. Not everything is a set to be queried. Not everything fits into relational algebra. There are other paradigms that might make more sense for different use cases.
So this quote is commonly attributed to Henry Ford: if I had asked people what they wanted, they would have said faster horses. I actually looked this up, and I found an article in the Harvard Business Review that said they can find no actual indication that Henry Ford ever said this, but people continue to use the quote. I think it's good for this case. I'll do one more. So the movie Steve Jobs, the Sorkin-scripted one, not that absolute turd that Ashton Kutcher did.
So Michael Fassbender is talking to John Sculley, who's the CEO that he brought in for Apple, and he says whoever said the customer is always right was, I promise you, a customer. And the idea that I think both of these quotes capture is that, yes, you should listen to your users. You should listen to customers and all that other stuff. But a lot of times they'll only look in the immediate vicinity of what they're doing, right? And that's what happens if you only think in terms of SQL. You look in that immediate vicinity, but you can't break out of it.
Innovation can happen either incrementally or with a significant shift, but not both ways at the same time in the same project. You have to pick your battle. Incremental innovation looks like this, right? Additions to SQL over time, but ultimately the base thing, the core thing, is still SQL, right? And the question is, is SQL the best for all data tasks? That's what the NewSQL crowd might have you believe, because it's familiar, because programmers know it. People don't want to learn new things, so just give the people SQL. That's what they want. And I don't think so. I don't think SQL is the best for all data tasks. I think for some things, we need breaking innovation to give programmers better tools to work faster, to solve their problems faster.
All right, finally, an example. A sorted set. With a sorted set, you have a bunch of members, they have scores, and you want to rank them. You know, this is good if you have a game with a bunch of players and you want to show the top 10. Or you want to rank the application servers in your infrastructure, or rank posts on a message board, whatever. So in Redis, this is great. This is super easy, right? There's a ZADD command. We're calling the sorted set pset. Five is the score. Foo is the member. We add that thing. Let's add bar. Let's add ASDF. And let's get the rank of bar.
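The slide itself isn't captured in this transcript, so here is a rough sketch of what the three commands in this walkthrough (ZADD, ZRANK, ZINCRBY) do. It is a toy in-memory stand-in written in Python, not Redis and not a Redis client. The key name `pset` and the scores for bar and asdf are assumptions; only foo's score of five comes from the talk.

```python
class SortedSet:
    """Toy stand-in for a Redis sorted set: members ranked by score."""

    def __init__(self):
        self._scores = {}  # member -> score

    def zadd(self, score, member):
        # Like ZADD: add the member, or overwrite its score if present.
        self._scores[member] = float(score)

    def zincrby(self, delta, member):
        # Like ZINCRBY: add delta to the score (absent members start at 0).
        self._scores[member] = self._scores.get(member, 0.0) + delta
        return self._scores[member]

    def zrank(self, member):
        # Like ZRANK: zero-based position, lowest score first,
        # ties broken by member name; None if the member is missing.
        if member not in self._scores:
            return None
        ordered = sorted(self._scores, key=lambda m: (self._scores[m], m))
        return ordered.index(member)


pset = SortedSet()
pset.zadd(5, "foo")   # score from the talk
pset.zadd(1, "bar")   # assumed score, chosen so bar ranks first
pset.zadd(6, "asdf")  # assumed score, chosen so asdf ranks last

rank_bar = pset.zrank("bar")            # lowest score, so rank 0
rank_asdf_before = pset.zrank("asdf")   # highest score, so last
pset.zincrby(-2, "asdf")                # reduce asdf's score by two
rank_asdf_after = pset.zrank("asdf")    # asdf moves up one place

print(rank_bar, rank_asdf_before, rank_asdf_after)
```

A real sorted set keeps its members ordered as they change rather than re-sorting on every lookup; the sketch only mirrors the behavior of the commands, not their cost.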
So we get the rank of bar and we see it's zero. And we get the rank of ASDF, and we see it's at the bottom of the set. So now let's change the score of ASDF. Let's reduce it by two. And then we get the rank again and we see it moved. Great. Super simple. I can fit it all on one slide. It's glorious.
Let's do this in Postgres. And the reason I'm doing it in Postgres and not just "SQL" is because this code looks different if I do it in MySQL. So here we're going to create a table. That looks the same for both. But the other stuff looks different. So, the ZADD function. In this case, we want to say insert if it's there or update if it isn't. Sorry. Insert if it's not there, update if it's already there. This is the code to get the rank. We're getting the rank of the ASDF member from pset. You don't really need to read all of this to understand it. Just see that it's bigger than the Redis stuff. There's more stuff there. This is the ZINCRBY function, right? And we're doing this for ASDF in pset. Again, that took me a bunch of slides to do all that stuff. And that's only the tip of the iceberg, right? Redis has a bunch of other functions for working with sorted sets. I view Redis as a data structure database. It provides a bunch of data structure primitives that you can work with to build your things. And the other thing I didn't cover is performance. I'm not even talking about it. If you tried to do this sorted set thing in Postgres, it would probably end in pain for you, because it would be super slow. Redis is super fast. But this is really just about productivity. Getting stuff done quickly.
So, I wanted to fit a Mongo example in here, but I couldn't. Just realize Mongo has a document model, and you can do some things faster with it. The next example is time series, because I have a vested interest. So, the data model looks like this. You have a measurement. You have tags. You have fields. The tags are strings.
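For reference, the SQL side of the sorted-set comparison above can be sketched roughly as follows. The Postgres slides aren't reproduced in this transcript, so this is not the code from the talk: it uses Python's built-in `sqlite3` module as a stand-in (the upsert syntax needs SQLite 3.24 or newer), and the table name, column names, and scores for bar and asdf are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sorted_sets ("
    " set_name TEXT NOT NULL, member TEXT NOT NULL, score REAL NOT NULL,"
    " PRIMARY KEY (set_name, member))"
)


def zadd(name, score, member):
    # Insert if the member is not there, overwrite its score if it is.
    conn.execute(
        "INSERT INTO sorted_sets (set_name, member, score) VALUES (?, ?, ?) "
        "ON CONFLICT (set_name, member) DO UPDATE SET score = excluded.score",
        (name, member, score),
    )


def zincrby(name, delta, member):
    # Insert with the delta as the score, or add the delta to the existing score.
    conn.execute(
        "INSERT INTO sorted_sets (set_name, member, score) VALUES (?, ?, ?) "
        "ON CONFLICT (set_name, member) DO UPDATE "
        "SET score = sorted_sets.score + excluded.score",
        (name, member, delta),
    )


def zrank(name, member):
    # Rank = number of members in the same set that sort before this one
    # (lower score, or equal score and a smaller member name).
    row = conn.execute(
        "SELECT (SELECT COUNT(*) FROM sorted_sets o"
        "        WHERE o.set_name = t.set_name"
        "          AND (o.score < t.score"
        "               OR (o.score = t.score AND o.member < t.member)))"
        " FROM sorted_sets t WHERE t.set_name = ? AND t.member = ?",
        (name, member),
    ).fetchone()
    return None if row is None else row[0]


zadd("pset", 5, "foo")
zadd("pset", 1, "bar")    # assumed score
zadd("pset", 6, "asdf")   # assumed score
sql_rank_bar = zrank("pset", "bar")
sql_rank_asdf_before = zrank("pset", "asdf")
zincrby("pset", -2, "asdf")
sql_rank_asdf_after = zrank("pset", "asdf")
print(sql_rank_bar, sql_rank_asdf_before, sql_rank_asdf_after)
```

Even in this small form, each Redis command turns into a table definition plus a hand-written statement, which is the productivity gap being described rather than a statement about what SQL can or cannot express.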
The fields can have values which are float, integer, boolean or string. And then you have a timestamp. As a table, it would look kind of like this. Each row is a series, and you can see the tag data there: the measurement name, the tags, and the field name. And then the columns are actually timestamps. So you would have a very, very wide table if you were working with time series data.
Now, in InfluxQL as it stands today, grabbing an average would look something like this. This says: for the last 24 hours, give me 10-minute averages of my CPU usage for user on server A. What I'm thinking about doing in InfluxQL 2.0 is like this. Now, it doesn't look more efficient at all, right? It's just a different way of looking at it. And the first one probably actually makes more sense to you, because it's more familiar. Which is the thing that scared me away from trying to think about other ways of doing it. Because when you think about the basic use cases, SQL actually works really, really well, and people know it. But my contention is that for the time series use case, functional is actually a much better way to represent what you want to do than SQL, because we're not working with sets. We're working with streams of data, streams of time series data.
So here's an example. What if we had nulls in this table and we wanted to fill them based on different behavior, either with the previous value or with the mean of the range of data that we're looking at? That could get complicated in SQL. What if we wanted to do something like this? We just grab the data like we had before, and we chain a new function off of it to transform the data frame, the matrix of data, in one fell swoop. And here we're just going to say we're going to fill any nulls with whatever the mean is for that series in the frame of data that we're looking at. What if we wanted to do something like interpolate the data? So we have a bunch of time series, and they have values at a bunch of different times.
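The query slides aren't captured here, so as a rough illustration of the chained, functional style described above for windowed averages and null filling, here is a small Python sketch. It is not InfluxQL and not the proposed InfluxQL 2.0 syntax: the `Series` class, its method names, and the sample data are all hypothetical, and the measurement, tag and field layout is only one plausible reading of "CPU usage for user on server A".

```python
from dataclasses import dataclass, replace
from statistics import mean
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Series:
    measurement: str
    tags: Dict[str, str]   # tag values are strings
    field: str
    points: Tuple[Tuple[int, Optional[float]], ...]  # (unix seconds, value or None)

    def since(self, start):
        # Keep only points at or after the start time.
        return replace(self, points=tuple(p for p in self.points if p[0] >= start))

    def window_mean(self, every):
        # Group points into fixed windows and average each window;
        # a window with no non-null values yields None.
        buckets = {}
        for t, v in self.points:
            buckets.setdefault(t - t % every, []).append(v)
        out = []
        for start in sorted(buckets):
            vals = [v for v in buckets[start] if v is not None]
            out.append((start, mean(vals) if vals else None))
        return replace(self, points=tuple(out))

    def fill_mean(self):
        # Replace nulls with the mean of the non-null values in this series.
        vals = [v for _, v in self.points if v is not None]
        if not vals:
            return self
        m = mean(vals)
        return replace(self, points=tuple((t, m if v is None else v) for t, v in self.points))


cpu = Series(
    measurement="cpu",
    tags={"host": "serverA"},
    field="user",
    points=((0, 10.0), (300, 20.0), (600, None), (900, None), (1200, 30.0), (1500, 50.0)),
)

# Each call returns a new series, so steps chain left to right.
result = cpu.since(0).window_mean(600).fill_mean()
print(result.points)
```

Because every step takes a series and returns a series, adding another transformation is one more call on the end of the chain, which is the property the functional style is being credited with.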
So we want to make sure that before we do some sort of operation on them, we align them so they work on the same times, right? I don't even know how I would represent this in SQL, so I wouldn't even try. But in a functional style, I could just have a function called interpolate. It could optionally take arguments that indicate how it should behave. So I think for time series, going functional, where each function returns a block of series data and they transform each other or do calculations, makes a lot more sense.
I'm actually really happy that the GraphQL talk was before me, because I don't even have to explain it now. You know what GraphQL is. I contend that GraphQL is an example of polyglot persistence, of emergent polyglot persistence. If you've ever written an API, you've arguably written a database, because tons of databases are actually a collection of services, and there's really just an API. And GraphQL is about combining the results of a bunch of APIs into one thing. It's a query language that pulls together a bunch of databases.
So innovation in the database community, I think, happens in waves. For this, I've got 1970, and I've got 2020, because that's kind of my event horizon right now, for reasons. So basically from 70 to 2004, you had the relational revolution, right? This took a little over three decades. Then we had NoSQL, and really the goal initially with NoSQL was scale. How do we achieve scale? And now NewSQL, because hey, let's not make the trade-offs. But I don't think NewSQL is going to dominate the entire scene. I think we've actually been in the age of polyglot persistence since about 2008, and it's going to continue. With these NoSQL projects, it's not like only a few are going to exist, like only Redis and Mongo are going to exist as successful NoSQL projects.
I think we're going to continue to see more of them, and more of them that aren't even necessarily focused on scale as the key to their success. I think we're going to see a lot of things that are focused instead on programmer productivity. Spotting a wave in these things is kind of like spotting a recession, right? You can always see it like four years after it started happening.
But I mean, the point of all this is, you know, is Python the best programming language for any task you could possibly perform? I picked Python because we're at a high level, right? I think we need to think outside the box of SQL for a lot of our tasks, right? Even though incremental innovation is a great thing, and it's something that we want to do in most of our projects most of the time, it's not the only thing. We need to break out of that. And I think polyglot persistence is a breakthrough innovation in programmer productivity, and I think that's more important than scale for other things. The reason why is that I think more and more in the future our problems are actually going to be data challenges, right? How do we work with data? How do we mine insight from data? Because we're going to collect more and more of it, you know, an order of magnitude more than we are now, probably within only a couple of years. So embrace being a polyglot database person. Thank you.