Hi. I seem to be following Doug's example by losing my voice for a moment, but we'll get going. I'm chief application architect for MapR. I think I'm now trapped by having to be near the microphone. That role means I get to talk to a lot of people, which is a great way to learn all kinds of things. I get to talk to people in many industries; we have well over a thousand customers all around the world, some here in Singapore, some in Japan, many in the US, and it's really cool to be able to do that. I'm also a committer and PMC member on a bunch of projects: some active, some less active, some more historical. Currently I serve as VP of the Incubator at Apache, which means I get to participate actively in far fewer projects, because the Incubator is something that just keeps crawling along. People think that software comes from Apache. In fact, that's exactly backwards: software comes to Apache. So for instance the SINGA project, a deep learning package from the National University of Singapore, came to Apache; I was lucky enough to help mentor that. Kylin came out of Shanghai; Zeppelin came out of Korea originally. It comes to Apache, not from Apache, and it comes to Apache from people like the ones in this room. Let's be John Byrd for a moment and take a poll. Who here has contributed to open source in any way: tried it, given feedback, sent patches, written a little documentation, explained it, written a tutorial? There are lots of ways to do it. That's maybe a third or a quarter of the room. Who here has used open source? Raise your hand, because I know you have. You too. Raise your hand. You, I know you've done it. You have an Android phone? Make him raise his hand. Anyway, he's recalcitrant, but we know he has. And as people say, I wear many hats; I do different things, probably too many. I'd like to point out that we have some books available online that you can get.
We gave away all the books today, sorry. Ellen and I were signing them for about four hours; we thought it was going to be during lunch, but that didn't work, so we finished about 4:30. We have six books that Ellen and I have written, and there's a seventh that she's written with Kostas Tzoumas about Apache Flink. There are books written by other people on the MapR site as well. We give away the PDFs; you can buy the physical copies, or the Kindle copies from O'Reilly. Now, my background is also varied: universities and startups is the short answer. I've had five or six startups in the California area. I've worked with big data since well before it was big; in fact, when it was quite small. Big has gotten smaller, or small has gotten bigger, I'm not sure which. And I've been involved in open source since well before the internet. That gives me a perspective on how these things work. It makes me no less impressed, though. So let's talk. I want to talk today about streaming. Doug talked a little bit about the replatforming that's happening, and this is epic; it has never happened before at this pace. There have been replatformings before. There have been revolutions in computing and abstraction before. Think about the invention of accounting in cuneiform writing: that was the first virtualization of commodities. Little marks on a clay tablet stood for things in a warehouse. That was revolutionary, and it made big differences to society at the time, differences that extend all the way to the present. Or look later: there was the first time a nation was destroyed by software. That was in roughly the 1400s, when the Hanseatic League was destroyed by double-entry bookkeeping and letters of credit. They insisted on payment in silver for all of the commodities they sent.
But the commodities they shipped were relatively valuable, so they took up as much room and mass as a load of silver, and so they had to fill their ships in both directions: one direction, say, with furs and amber from Tallinn or Riga, and on the way back, silver. But the Italians adopted the Dutch convention of letters of credit, so they could send cargo in both directions and just make a bookkeeping change on the two sides, and they destroyed the Hansa because of that software change. That's just software. They didn't have computers yet, but they had software. Think about another revolution, the adoption of electronic computing in our daily lives: massive change. The adoption of the internet. The adoption of SQL. These were all comparable replatformings, rebuildings of the way we process information. Right now, what's happening is that we are changing from the idea that state is the primary concept in computing to one where flow is the primary concept. This is changing the shape of the economics. Every data analysis you do has diminishing returns: small data is typically the most valuable data, you analyze that first, and as you scale up, the value curve flattens out; the marginal returns get smaller and smaller. And costs historically went up nonlinearly. That meant we could not escape that corner in the value curve until now: until now, when we have computational structures where cost goes up linearly with a very low coefficient, and suddenly the optimum scale for analysis is thousands of times larger than it was. It's a massive change. And this is a change that applies to all industries at once. None of the previous changes applied everywhere; they applied preferentially to one scale of business, one type of business at a time.
So SQL took 40 years to be adopted, whereas Hadoop-ish things, flow-based computing at large scale, started 10 or 15 years ago at the very beginnings, but most industries have adopted it in the last five years. In one moment in time, this is a complete revolution. And to me, the core of that revolution is in streaming computing. That is really where things are changing, and that's what I want to talk about today. But first I'm going to talk about what we do as a company. We are one of these evil people who make money off of open source. We give back in various ways, but what we do is build a data platform that takes different forms of persistence and puts them in as first-class objects. I could call it a file system, but that's confusing, because it doesn't just have files; it has tables and streams as first-class objects. Now, the evolution of data persistence has run over 40 years or so: Linux, Unix, DEC systems, and so on. We progressively improved the functionality and the interoperability of files, massively, up to this point. Forty years ago, you used to have a committee meeting if you wanted to use a file. You would have to have an after-hours on-call person for your file. You would have to have a contingency plan for expanding your file: exactly how are you going to do that? And you would have to have hardware budget authority for your file. We don't do that with files anymore, and I don't think we should do that with tables or streams. But that progress happened over decades. Then, good and bad, about 10 years ago, Hadoop changed that. In order to get a lot of scale, Hadoop gave up almost all of that compatibility: gave up on standard APIs, gave up on operations on files like appending to them at high rates or changing the middle of them.
Now, that was for good reason; that was how we were able to get scale at two-bit little startups like the one I was working in, and it made a big, big difference. It was a good thing, but it made things much more brittle and specialized. What we did at MapR was add back a lot of that functionality and compatibility while improving scale, and we've continued to do that as we changed logos. The logos are not actually part of the change, but they illustrate it well. And the results are superb performance, massive performance. Here, this is a system with 40 NVMe SSDs, 12 controllers, and dual 100-gigabit network devices, and it's able to sustain non-local reads of 16 gigabytes per second. That's 80 percent occupancy of those dual 100-gigabit networks. So that's cooking: this is 80 percent of what the hardware can do at all, and 40 of the fastest NVMe drives there are is quite something. So it's fast. It's scalable. We have customers with over a trillion files in a modest-size cluster, scalability at levels we have never seen before. We have customers who are verging on exabyte storage. It's big. The way it works, and this will illustrate the use cases I'm going to talk about later, is that we split files into pieces. That's kind of like what a lot of people do. These are pretty big pieces, and they are then stored in a large thing called a container; those are the hexagons here. These containers are replicated across machines. Now, one cool thing here is that we have one extra multiply in the scalability. Hadoop, as it was originally built and as it's basically used now, scales as some number of name nodes times the number of blocks each one can track; that's roughly the limit of what you can do. Here we have two multiplications instead of one: the number of containers we can track, times the size of the containers, times the size of the things tracked by the containers.
By having those factors, we're able to scale to much larger systems. We are also able to delegate certain operations, like the storage of metadata, from the central facility into the containers. That gives us high metadata update rates and much more failure tolerance. Containers implement transactions, file-system-like transactions, and these then also implement micro-snapshots, which we can glue together into transactionally correct snapshots and so on. Now, internally, everything is implemented as B-trees. Previously, file systems implemented things as inode-style threaded data structures, where the first few blocks were accessed nearly directly, the next blocks with two levels of indirection, the next with three levels, and so on. That was a good decision 30 or 40 years ago when these systems were designed, because there was not enough memory to hold the index of where all of the blocks were. That is no longer true, by a long shot, and it's also no longer true that there's a strong preference for reading the first few blocks in a file. It's better now to have a balanced structure like a B-tree, so we have B-trees of multiple levels, but all of the blocks except maybe the first 64k are at the same level of indirection, because all of the interior nodes can be cached, so we only pay one disk seek anyway. Everything in MapR's file system is a B-tree. Directories are B-trees where the keys are hashes of file names. Files are B-trees at the top level, where we have the filelets, and the filelets are also B-trees that give access to the actual 8k blocks. Containers are B-trees that track free inodes and things like that. And we've added, in addition to this, not just files, not just directories, but tables. I mean, we've got B-trees just floating around; why the hell not, to quote a famous philosopher from Texas.
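That extra multiply in scalability can be made concrete with a toy calculation. This is a minimal sketch with made-up round numbers; the specific limits here are illustrative assumptions, not MapR's or Hadoop's actual figures.

```python
# Illustrative capacity arithmetic: one level of indirection vs. two.
# All limits below are invented round numbers for illustration only.

# Classic single-metadata-server design: one name node tracks every block,
# so total objects are bounded by what fits in its memory.
namenode_objects = 100_000_000        # blocks one name node can track

# Container design: the central service tracks containers, and each
# container tracks its own files/chunks itself, giving an extra multiply.
containers_tracked = 100_000_000      # containers the central service tracks
objects_per_container = 100_000       # files/chunks each container tracks

container_capacity = containers_tracked * objects_per_container

# The two-level scheme addresses many more objects for the same
# central-service memory budget.
advantage = container_capacity // namenode_objects
print(advantage)  # -> 100000
```

With these toy numbers the delegated design addresses five orders of magnitude more objects, which is the shape of the argument, even if the real coefficients differ.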
Kinky Friedman is not well known as a philosopher, but he comes up with great phrases like "why the hell not." Anyway, we can build these tables, and we can build them in ways that get some of the advantages of log-structured merge trees without the disadvantages. We have a thing called a twisted B-tree, which only does anything like merging at the very bottom levels, which means there is never a large compaction. That's cool. Then we can capitalize on tables to build streams. These are inherently splittable data structures; they don't have to live on one machine. So, just as in the previous diagram, I can take a table, split it into tablets, and then split those into partitions; again, we have one extra level of indirection compared to, say, HBase. And all of these work the same way: the tablets are stored in containers and distributed across the system. The only thing I've changed in the diagram is that the partitions are little colored cylinders instead of the rectangles for files. A bit more than that changes in the code, but not really that much. This is an exciting capability. We implement the streams in terms of B-trees, and the side effect, and this is where we go in the rest of the talk, this is the key point, is that a stream can exhibit the Kafka API. We are very strong proponents of standard APIs, so we exhibit the Kafka API, but we eliminate entirely the limitation on the number of topics. There are soft limits in Kafka of roughly a thousand topics per broker (partitions, really, but in the limit that bounds the number of topics). It's either a ZooKeeper limit or a file-descriptor limit, and it limits the structures we can build in these streams. So I'm going to talk about several use cases, and first I'll point out some characteristics. One is that you get use cases that are essentially already implemented, without any changes. For instance, if I do ls in my home directory, which is what's showing here, we see files, directories, streams, and tables. The names are uniform across all of these APIs; they contain a reference to a cluster and a reference to a volume mount point. These volumes are the unit of administration. But the fact that all of the file APIs, which are POSIX APIs or HDFS APIs, all of the table APIs, and the stream API (there's only one stream API; there are multiple of everything else) share these same path names means the idea of directories is ubiquitous. And that means that if you have some application that is medieval, old-school, and tied to particular nodes, you can move to a universal namespace below it, and you can containerize it, throwing away the tie to particular nodes, because now you don't really care where things live. They can live together, they can live apart; they just live above some ubiquitous storage. So before you even start, we're already helping you out, and it makes people look good. But here's another use case, and this is where we begin to see something more. The first one is surprising to people just because they assume that big has to be nasty, unfun, and not easy; here's one where we actually use these changes in how streaming can work to eliminate the database entirely from an obvious database application. Here's the idea. We have a dark-pool exchange of some kind, where institutions are bidding to sell or buy stock, and because it's off the exchange, they actually have to say who they want to make the offer or ask to. So there's one sender, on average 10 recipients, a stock symbol, a price, and a time. What we want is to handle a lot of those per second, and we want everybody to be able to ask: what's the last stuff I sent, what's the last stuff I received, basically streaming it to themselves,
and what's the last stuff that occurred for this particular stock. If you think about it, those queries are all "what is the last something"; that's really a lot like streaming, not a lot like databasing. You could do it with a database: you could put a timestamp on everything and say select where timestamp is bigger than the last timestamp I saw and stock symbol equals this. You could do that, but it turns out not to be nice. So let's assume we have about 10,000 senders, 10,000 recipients, and maybe 10,000 stocks. There are about 12,000 stocks in the US, and the number of instruments could be much bigger; there are something like four million commonly traded financial instruments: commodity futures, stock options, stock option futures, all those combinations and derivatives. So you need a big number of those. Now, the customer we were talking to had tried for years, along with one of our competitors, to implement this using HBase. The only viable implementation they had was in a fiendishly expensive in-memory database that's used a lot in New York. Here's the solution we came up with, and the idea is very simple. We had a data generator: we took a sampled day of trades on the New York Stock Exchange and filled in simulated bids and offers before each trade, so we get full bids, offers, trades, tick data, or at least something a lot like it. That's about 300,000 transactions per second. The generator fakes that at realistic rates, rates that go up and down like they did on that storied day in 2008 that we have a sample of, and we push it into a stream we call "transactions". We call it something because a stream lives in the file system: it has a name, a file name or stream name. And this one stream can have lots of topics in it; it's like a whole Kafka cluster by itself. Then we have a worker, the transaction exploder. What it does is read one transaction and write it to many different streams. It writes it to the by-stock stream, where the topic is the stock symbol, and roughly every 10 seconds it writes a little index record that says: at this time, this is the offset in this stream. Then there's a second one, by sender, where the topic is the sender. And in the bottom one, it writes, on average, 10 times, once for each of the recipients. This seems wasteful, but it's an interesting exercise with an interesting result. The result is that for every incoming transaction we wrote the same thing 13 times into different streams. We can adjust the lifetimes so that the penalty is not 13x the storage, because we'll delete some of that stuff pretty quickly, but it means we have to do four million inserts per second to keep up with the 300,000 coming in. That seems extreme, but these are streams, not a database. Streams cheat, and can support very high insert rates. This demo application, including a web server, including queries that run in real time, including all the insertion and the data generator, runs on three small VMs. So the cool thing is that we have an architecture that lets us query this stuff in real time, versus something that costs a hundred thousand dollars per core and will only support a dozen or so stocks per core. If you think about it, 12,000 stocks divided by a dozen is a thousand cores, times a hundred thousand dollars. So the alternative here is something that costs about three or four hundred dollars a month on Azure or AWS or Google, versus something that costs about a hundred million dollars in capital costs. That difference is a big deal, roughly three orders of magnitude. This is not saying it's twice as good; this is not saying it's ten times as good. We have roughly three orders of magnitude in price performance here, with a very simple design.
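The transaction exploder described above can be sketched in a few lines. This is a minimal in-memory stand-in, not the real implementation: the real worker would write through the Kafka API to MapR Streams, and the names `by_stock`, `by_sender`, and `by_receiver` are this sketch's own.

```python
from collections import defaultdict

# In-memory stand-ins for message streams: topic -> ordered list of messages.
by_stock = defaultdict(list)
by_sender = defaultdict(list)
by_receiver = defaultdict(list)

def explode(txn):
    """Write one incoming transaction to every stream a query will need."""
    by_stock[txn["symbol"]].append(txn)      # 1 write, topic = stock symbol
    by_sender[txn["sender"]].append(txn)     # 1 write, topic = sender
    for r in txn["receivers"]:               # ~10 writes, topic = each recipient
        by_receiver[r].append(txn)

txn = {"symbol": "XYZ", "sender": "bankA",
       "receivers": [f"bank{i}" for i in range(10)],
       "price": 101.5, "time": 1200.0}
explode(txn)

# "What's the last thing for stock XYZ?" is just the tail of one stream:
# no index, no scan, no timestamp predicate.
assert by_stock["XYZ"][-1]["price"] == 101.5

# One input became 12 writes here (1 + 1 + 10); counting the raw
# "transactions" stream as well gives the talk's ~13x fan-out.
writes = 1 + 1 + len(txn["receivers"])
print(writes)  # -> 12
```

The design choice is to pay a known, bounded write amplification up front so that every query the customer actually asks becomes a cheap "read the tail of a stream" operation.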
The alternative design is not that simple. So the point here, more than coloring something red or blue, is that we have a revolution. It is mind-boggling that you can do in two or three weeks something that took years to develop, and do it with hardware that I could practically hold in my pocket. We had a cluster in our booth that fits in a tiny briefcase and holds three NUCs; a two-thousand-dollar cluster that could do this, versus a hundred million, is a big deal. The system handles nearly four million inserts on three not-even-real nodes, and it doesn't use a database. Now, that isn't always a good thing, because we can't do arbitrary queries, but we can do the queries the customer needs, and we could put a database off to the side, store into that, and run real queries against it. These are also inherently real-time queries; the delay through the system is a matter of milliseconds. Archiving to compressed column stores is quite doable; we could do that too. We don't have to have a database; we could use files, the other of the three major forms of persistence. And we could write aggregates to a database instead of storing the raw data: pre-aggregates by many different combinations of keys, feeding live dashboards that can be interactively queried. Now, I said stocks, I said senders, I said banks, but it doesn't have to be that. Here is very much the same thing, except now I have machines in data centers sending metrics, sticking them into a stream called "metrics", which is magically replicated, using MapR wonderfulness, to a central place. We can use the same transaction-exploder trick to push each metric to a topic that includes data center, machine, and metric. So all of the work we have gone through over the past years to build time series databases is now done inherently in the stream. The offset in the stream is a proxy for time, and the stream compresses batches of these records, so I can query a bunch of streams at particular offsets and get all of the queries I could get out of, say, OpenTSDB, except I now have none of the mechanism of OpenTSDB in my way: no asynchbase, no serializers, just not most of the mechanism. It all just happens in the platform. The point is that streaming pays off big in some pretty quirky and cool ways. Another example of this is big-time IoT. Suppose we have a hundred kilobytes per minute coming from cars or things. A million cars sounds big, huh? Not big enough. The customer wants a hundred million frigging cars reporting at these rates. They want multiple data centers. They want to know the state of any device at the current time, oh, and its history too. The customer wants all of those cars reporting all of the time, without moving all the data back and forth between data centers. We can do that with streams, using technology just like what we saw, with a little extra mechanism. The car reports into a data center, at the bottom there, and if that's its home data center, the report goes into the uploads stream; otherwise it goes into the forward stream, which gets replicated to wherever it's supposed to go, and a small process puts it in the right place. So we can again use the mechanisms in the platform to build these really cool architectures, and we can focus on the hard parts, the parts that matter to the customer, and not deal with performance, because these things are wide enough, hold enough variety, and are tall enough that we can't touch the walls or the ceiling. We can build new kinds of applications that we could not build five years ago, and these are going to make a huge difference.
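The time series trick, using the stream offset as a proxy for time, can be sketched as follows. This is a toy model under stated assumptions: an append-only list stands in for a stream topic, and the periodic (time, offset) index records are the same idea as the little index records the exploder writes every ten seconds or so.

```python
import bisect

# A stream is append-only, so arrival offset is monotone in time.
# Periodically record (time, offset) index marks; to read a time range,
# binary-search the marks and scan the stream from that offset onward.
stream = []                  # the message log (stand-in for a stream topic)
marks_t, marks_o = [], []    # parallel arrays of index marks

INDEX_EVERY = 100            # write an index mark every 100 records

def append(t, value):
    if len(stream) % INDEX_EVERY == 0:
        marks_t.append(t)
        marks_o.append(len(stream))
    stream.append((t, value))

# Fake metric feed: one point per second for 1000 seconds.
for i in range(1000):
    append(float(i), i * i)

def read_since(t):
    """All points with time >= t, located via the offset index."""
    i = bisect.bisect_right(marks_t, t) - 1   # last mark at or before t
    start = marks_o[max(i, 0)]
    return [(ts, v) for ts, v in stream[start:] if ts >= t]

pts = read_since(950.0)
assert len(pts) == 50 and pts[0] == (950.0, 950 * 950)
```

Nothing here needs a database: the index marks bound the scan to at most one index interval of extra records, which is the sense in which the stream itself is already a time series store.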
One of the key design ideas here is microservices and isolation. This isolation lets us build in small teams at Silicon Valley startup pace; even if we have a thousand developers, if they're in teams of five to ten people that work efficiently, we can go at extraordinary speeds. So suppose we're going to make a decision, a little fraud decision or something like that. It depends on the previous state of a card, and it depends on the current state. The classic way to do that is to have a database, and the classic way to scale it is to have many fraud detectors that share the database. Sharing a database leads to fires one way or another: either the database melts down, or we have committee meetings to decide whether we put an index on it, like I want, or leave it scannable, like you want. These are not reconcilable discussions. We can agree on the general shape of the business events; we can't agree on the details of how the database should work, because that's an optimization that should be local. We should build it like this: share a stream, but inside the box keep a private database, a private updater, private decisions. So, you know, a CEO comes in. Are there any CEOs in the room? Good, we can tell CEO stories. A CEO walks in and says: you've got this cool database keeping the state of everything; I would like a dashboard in the lobby that shows little stars whenever a transaction comes through, wherever it is. CEOs do this kind of thing. And of course the DBAs at that point just go: oh god, it's going to break it, because now we have to have whoever runs the dashboard in the meeting, as well as all the people who actually do work and make money for the company, just because the CEO wanted something glitzy. But with this isolation, with this idea of streams (remember, streams are way fast, so we're not going to saturate them), we can just start adding things like the dashboard process. The producer for the stream will never know that we've added more consumers; it'll just be fine. We're talking four million inserts with real-time queries on three VMs; on decent hardware we will never notice. And this is how we can scale these suckers: we just add another one, and notice that each has its own database. The database is the slowest part there, but now it's scaling. Oh, but the database would be big, right? Take our biggest customer: they're the biggest credit card company in the world, and they have something like a hundred million customers for this sort of thing. You want to keep the last transaction and the last location, maybe a hundred bytes; the database would be 10 gigabytes for the whole world. It's like: oh my god, we would have to store that in memory, or wherever the hell we want, because it's isolated. We could have stored it in a database before; in the next version, and the same diagram tells us how to migrate across versions, we could store it in memory, or in MySQL, or in Mongo, whatever we want. The database is private; that's an implementation decision. So convergence is what we talk about, where we think about all the different forms of persistence as tools that we can use as easily as we use files. Now, the requirements. We need persistence and performance: persistence because that's how we get isolation, performance because that's how we get you to use it. I'm pointing at you, dude. If it's not as fast as anybody can imagine, everybody keeps a workaround in the back of their mind for when it fails tomorrow. But if it's so fast that they can't build something faster, they'll just use it. Who here works around TCP because it's too slow? Give me a break; shades of 1996, when people worried about that kind of thing. We don't anymore, because it's fast enough that nobody cares. Who here implements their own domain name server because they're worried the real one would be too slow?
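The share-a-stream, private-database pattern can be sketched directly. This is a minimal model, not any real system: the shared stream is a list, each "service" replays it into its own private state, and the class names are invented for illustration.

```python
# Each microservice builds its own private state by replaying a shared
# stream; the producer never knows how many consumers exist.

events = []   # the shared event stream (in-memory stand-in)

def produce(card, amount, location):
    """Producer: appends a business event; knows nothing about consumers."""
    events.append({"card": card, "amount": amount, "loc": location})

class LastLocationService:
    """Fraud-ish service: private map of card -> last seen location."""
    def __init__(self):
        self.state = {}               # private database; could be MySQL, Mongo...
    def consume(self, e):
        self.state[e["card"]] = e["loc"]

class DashboardService:
    """The CEO's lobby dashboard, added later with zero changes to the
    producer or to the other consumer: private counts per location."""
    def __init__(self):
        self.counts = {}
    def consume(self, e):
        self.counts[e["loc"]] = self.counts.get(e["loc"], 0) + 1

fraud, dash = LastLocationService(), DashboardService()
produce("card1", 9.99, "SIN")
produce("card1", 120.0, "SFO")
for e in events:                      # each service replays independently
    fraud.consume(e)
    dash.consume(e)

assert fraud.state["card1"] == "SFO"
assert dash.counts == {"SIN": 1, "SFO": 1}
```

Because each consumer owns its state, swapping one service's storage (in-memory dict today, a real database tomorrow) is a local decision that never triggers a committee meeting.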
You can't imagine doing that; it would be stupid. Well, we want it to be like that with streams. We want them to be infrastructural, just like the air. Do I care that those are the right molecules that I should have been breathing? Wait a minute, I didn't reserve those, I reserved these. Files and streams should be like that: who cares, I've got bits, I'm going to arrange them this way, let's go. We have to have platform security built in. You know, when Doug was answering the question about ethics: we have to provide the tools that let people implement ethics. We cannot dictate them, but we have to provide the tools so that people can provide privacy and such. And we have to make this global. Data sovereignty rules, or just building a real business: you've got to scale across legal boundaries, national boundaries, multiple data centers for disaster recovery. Whatever these data structures are, they have to replicate magically, without application code. People try to do that themselves, and if there are any application developers here, I may offend you, but application developers never have the time, the budget, or the skills to do good platform programming. Those are different skills, and the CEO will kill you if you spend time on that instead of the features that add value to the business. So this has to be built in: replication, update, multi-master, global, across wherever. Now, the lessons I've learned. This is what I try to make a vendor-neutral talk, in that I talk about what we do as a vendor, but I'm hoping everybody walks out going: huh, that was interesting. If you walk out saying that, no matter who you work for, then it's vendor neutral. The thing I've learned, especially at MapR, about open source, even though I've been involved in open source a long time, is that APIs matter a whole lot more than implementations. This goes back to Apache mottos that made sense to me but didn't resonate until now. One of the mottos at Apache is that code is not nearly as important as community. You think: wait a minute, code is what Apache produces, right? But code is dead. Semicolons will never spawn users and developers, and developers can reimplement code in a flash; often it's a good idea to do that. So the community is what we have to build, not necessarily the code. It sounds contradictory, because we as coders think of what we do as producing code, but in fact what we should be doing is building community between coders, so that none of us has to stick with it forever. The community carries on beyond the lifespan of interest of any one of us, and it's APIs that make that possible: something standard and relatively stable that users can rely on while the implementation changes, so both sides can be satisfied at the same time. We've definitely seen that through the POSIX API, the HDFS API, HBase, the document-style databases, and things we've developed like OJAI and Drill and Apache Arrow and so on. There's plenty of room to innovate ahead of the community in implementations, but no matter how closed-source you want to be, you absolutely have to stay with open APIs. That's how we build a marketplace, a field of use, together. So that's kind of an emphatic talk about some weird stuff, but I'd love to hear some questions or comments. We have a new book on Flink that Ellen was signing today; we ran out. We have the older book on streaming; we ran out of that. There's a book on Spark; we ran out of that too. But you can get them all online for free. Sorry about that, but it happens every time. Yes? [Audience: let's say the hundred million cars are sending IoT sensor data to their respective individual places; you can get the status of a car quickly enough, but what happens if I have a requirement to know the average speed of those hundred million cars?] So the question is: what happens when he has to do a query over those hundred million cars, other than "what is the state of this car"? What if he needs to know the average RPM over the last day of red cars versus blue cars, and the CEO comes in and asks about ones with chrome as opposed to plastic bumpers, things like that? Of course there will be queries like that. If there is an ongoing need, we can stick the data in a database, and especially, we can stick aggregates in the database so that the update rate is slow enough. Remember, every minute a hundred million cars report. Maybe not all of them are running; call it 10 million. Ten million per minute is about 167,000 per second. That's getting cranky and fast for a database, because there will be secondary tables and things like that. But if we aggregate down by 10x or 100x, it's a yawn for a database. So we can certainly do cubed aggregates in real time and run those queries against a data cube, an OLAP kind of repository, and it will happen so fast you won't be able to blink. I'm saying the streams can build OLAP cubes. With the streams, transport and processing happen in things like Flink or Spark Streaming or Apex; there are a lot of those popping up. The processing engine can certainly do the aggregations, sliding- or tumbling-window aggregations, and drop the aggregates out to a database. Furthermore, it can even keep those aggregates in memory so they can be queried directly at the end of a window. You may have a lot of aggregates that would have to be written, even incrementally, so maybe you don't want to write them; maybe you want to make them directly queryable. That can be done with an in-memory database. But again, we're going to build a service, so the implementation should not matter.
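The tumbling-window aggregation described in that answer can be sketched in a few lines. This is a toy reduction, not Flink or Spark Streaming code; the "color" dimension and the numbers are this sketch's own stand-ins for whatever cube keys the CEO asks about.

```python
from collections import defaultdict

# Tumbling one-minute windows: reduce ~167k raw reports/second to one
# aggregate row per (window, color) that any database can absorb easily.

WINDOW = 60.0   # window length in seconds

def aggregate(reports):
    """reports: iterable of (time, color, rpm) -> {(window, color): mean rpm}."""
    sums = defaultdict(lambda: [0, 0.0])           # (window, color) -> [count, total]
    for t, color, rpm in reports:
        key = (int(t // WINDOW), color)            # tumbling-window bucket
        sums[key][0] += 1
        sums[key][1] += rpm
    return {k: total / n for k, (n, total) in sums.items()}

# Four fake car reports: three in window 0, one in window 1.
reports = [(1.0, "red", 2000), (2.0, "red", 3000),
           (5.0, "blue", 1000), (65.0, "red", 4000)]
cube = aggregate(reports)

assert cube[(0, "red")] == 2500.0   # window 0: mean of 2000 and 3000
assert cube[(1, "red")] == 4000.0
```

Each window emits one row per key combination regardless of how many raw reports arrived, which is exactly the 10x-to-100x reduction that turns a cranky database load into a yawn.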
Today's implementation may break tomorrow, but we will fix it, sometime tonight. Yes, in the back? For what? OData? OK, so the question is about a particular self-describing data standard. There are a lot of those lately, which is unfortunate. We've even contributed one: we have something called OJAI, which is a very efficient binary representation of JSON that can be columnarized and works with Jackson, so it's wonderfully incremental and fast. There are standards for binary interchange, and there are standards for APIs, and it's very frustrating that there is not one standard that is a really good choice in all cases. That's the way I see it right now: we don't have a good answer yet. OData, which I don't know much about, could well be a thing, but there are also formats from the financial community that are binary JSON representations, there's BSON, and there's JSON itself, which is getting faster and faster parsers. The only thing I would say is: let's not use XML again. How about over here, any questions? You lose your deposit. How about over here? Anybody? John Byrd is speechless; this is an epoch in history. Just tired? OK, maybe everybody's tired. I'd be happy to talk after the break, but we're going to have Paco talk first, and he always talks well. OK, thank you.