 All right, everybody, I'm going to get started. Thanks for coming to my talk, Blockchain as a Service, the Building Blocks of Blockchain. I am Gary White, Jr. I work in the Dell EMC Dojo, based out of Cambridge, Massachusetts. We believe in evangelizing DevOps and trying to spread the way that specifically Pivotal works across the Dell EMC organization and throughout the Dell Technologies Foundation. I'm also an active CF Committer. We try to keep active in Cloud Foundry projects and contribute as much as we can. And then on top of that, I am a blockchain enthusiast, and that's how I fell into this talk. You want to tweet at me? You want to tweet at the Dojo? Ask questions after the fact. If you can't get to me, then go ahead. My information is going to be on pretty much all these slides. So whatever picture you take, you'll get it. And this is the agenda. This is what we're going to talk about. So part number one, I'm going to tell everybody what a blockchain is. Because before I can tell you why you would want to use it, you have to have a good grasp on what you're going to be using. Part two is going to be why you should care, how blockchains help you secure, access, and monetize your data in a distributed fashion, which is something that's very difficult to do with current implementations. And then how you can make it a service. Some of the work that we've done and some of the work that you can leverage that has been done by people external to the Cloud Foundry community that help you distribute your applications with a blockchain even in Cloud Foundry. So let's get started. The first part is that deep dive on what a blockchain is. And to answer the question of what a blockchain is, I wanted to spell what a blockchain is not. Blockchain is not Bitcoin. Bitcoin itself is not blockchain. It sits on top of a blockchain the same way that you might have an application sitting on top of a database. Blockchain is just a data structure. It's like a hash table. 
The difference is this: in a database, you make transactions and the database itself holds only the current state that results from all of those transactions. A blockchain holds all of the transactions themselves, so that you can replicate the state anywhere throughout the system. And the upside of that is that you can't delete the transactions, so you can always replicate the state. It is also not a currency. Blockchain holds data the same way that MySQL and Mongo might hold data. So trying to think of a blockchain itself as being a currency is kind of like thinking that the Mongo database behind PayPal is a currency. And the third thing that it is not is some big deity. It's not its own thing. It's not an overlord. There's no such thing as the blockchain. There is such a thing as a blockchain. Like I said, it's just a data structure. You can use a public one, and that's usually what people mean by "the blockchain": one that's maintained by a wide set of people. But it may be that you have your own on-prem blockchain, and that's a solution that you can definitely do. So to talk about blockchain, I wanna cover what that data structure looks like. We covered it a little bit in the sense that it just holds a bunch of transactions that you can replicate state from. And that set of transactions is called a distributed ledger. It's just a bunch of computers throughout a network holding on to the same data structure and coming to consensus on what that data structure looks like. So it's gonna have open read access, in the sense that anybody can come in and see what's on it. That's in most cases; there are some fringe ones nowadays that try to change this, but in pure form distributed ledgers are open read access. It has open write proposals, so that anybody can come in and say, hey, I wanna put this on the distributed ledger, and it can be validated later. 
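To make the "keep the transactions, replay them to get the state" idea concrete, here is a minimal sketch. The tuple layout, the `replay` name, and the convention that a sender of `None` means newly issued funds are all assumptions made for illustration, not something from the talk.

```python
from collections import defaultdict


def replay(ledger):
    """Rebuild account balances by applying every transaction in order."""
    balances = defaultdict(int)
    for sender, receiver, amount in ledger:
        if sender is not None:  # None marks newly issued funds in this toy model
            balances[sender] -= amount
        balances[receiver] += amount
    return dict(balances)


# The ledger only ever grows; nothing in it is edited or deleted.
ledger = [
    (None, "gary", 30),   # gary starts with 30
    ("gary", "tim", 5),   # gary sends 5 to tim
]

print(replay(ledger))  # {'gary': 25, 'tim': 5}
```

Any node holding the same list of transactions arrives at the same balances, which is the property the ledger is there to provide.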
The validation process and the security is where you start to get into blockchain, but you can integrate it into a distributed ledger and still technically not do the same thing. And then something that comes out of this is that instead of linking to some external account like an email, you just get a private key and a public key to identify yourself and sign transactions. We'll get into that a little bit later, but that causes some anonymity that you don't normally get with data structures nowadays. Think of distributed databases. It's almost always written by some user that has to be registered. So talking about databases, didn't we already solve this? Doesn't this already exist? Doesn't distributed data already get security? Can't you just do this somewhere else? Why does blockchain have to come into this? Well, as much as distributed databases have great replication, you can prove to five nines or whatever it is that you need that you can get the same accessibility throughout a system. There's data synchronization in there too and there's familiarity. So people use like MySQL and Postgres and they can plug into solutions like Cassandra and Greenplum that are intended to just be abstraction layers that distribute those Postgres and MySQL databases across wide areas, right? But one of the things that you don't get with just using a distributed database solution is this fault tolerance. There's a very big fault tolerance that it does not address that can be on the level of hardware failures or software failures. And if you don't plan against these particular kinds of faults, then they will always cause system failures. I wanna say right now that this talk is not about why you should use blockchain for everything. This is a talk about some of the advantages of using blockchain to secure your data. There's probably another talk about why big database is something that you should be using to distribute your data, but that's not this talk. 
So let's talk about the problems that you do see with distributed databases. The biggest glaring one is called a Byzantine failure. Byzantine failures come from Byzantine faults. And to tell you about Byzantine faults, I want to describe the Byzantine generals problem. I know that was a lot of Byzantine, but just stay with me. So the Byzantine generals problem is like this. You have six generals surrounding a city and they wanna decide whether or not they're going to attack the city. So they decide if more than half of us want to attack, then we'll attack. But otherwise, all of us will retreat because we don't wanna lose the battle by not having enough manpower, right? So they can send all their data out and maybe they sign it to make sure it's always coming from the right place and they can just propagate it to everybody else. So everybody has a copy of everybody else's message, right? But what can happen is if you get communication bus failures or even worse, if you get people who defect from the system. So you get some kind of integrity fault. Somebody broke in and they want to break your system. They want to cause the generals to attack and a few of them to retreat to cause this failure of there's not enough manpower to win the fight. Then these generals can start sending their own messages out to different people and say different things to each other. And that is not something that Big Table tries to address. And like I said, that doesn't mean that Big Table is not a good solution for a lot of problems. It's just that blockchain addresses this one because in the days that we live in, there's a lot of security breaches that happen and credentials get stolen. And blockchain will prevent against this problem completely annihilating your system and your data. 
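Here is a small sketch of that generals scenario, assuming six generals of whom one is a traitor who tells different peers different things. The names `decide`, `run_round`, and `two_faced` are hypothetical, and the model ignores signatures and message relaying entirely.

```python
def decide(votes):
    """Attack only if strictly more than half of the votes say attack."""
    attackers = sum(vote == "attack" for vote in votes)
    return "attack" if attackers * 2 > len(votes) else "retreat"


def run_round(honest_votes, traitor_message_for):
    """Each honest general tallies the honest votes plus what the traitor told them."""
    decisions = {}
    for name in honest_votes:
        votes = list(honest_votes.values()) + [traitor_message_for(name)]
        decisions[name] = decide(votes)
    return decisions


# Five honest generals: three want to attack, two want to retreat.
honest = {"A": "attack", "B": "attack", "C": "attack", "D": "retreat", "E": "retreat"}


def two_faced(name):
    """The sixth general lies: 'attack' to some peers, 'retreat' to the others."""
    return "attack" if name in ("A", "B", "C") else "retreat"


decisions = run_round(honest, two_faced)
print(decisions)
# {'A': 'attack', 'B': 'attack', 'C': 'attack', 'D': 'retreat', 'E': 'retreat'}
```

Every honest general followed the agreed rule, and they still split three against two, which is exactly the not-enough-manpower outcome the traitor wanted.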
So one of the more realistic examples, one that's closer to reality than that abstraction, is that NASA had a flight control system that depended on accelerometers communicating with a metadata master. So I put my nice little diagram here of all of these accelerometers thinking different directions, and they get it from this data master. The communication bus melted in the space shuttle because it's hot in a space shuttle, right? So when that communication bus melted, some of the accelerometers saw one thing and some of them saw another thing. And so they started getting into disagreement, and once they started disagreeing, they kept disagreeing, and all of them started disagreeing. Instead of each accelerometer checking the integrity of what it received, with its own logic to figure it out, they all depended on this one metadata master saying things that made sense. So that's the problem: you're not verifying at every node, you're verifying at some central authority and you're trusting that that will never ever break. So that single point of failure is not something that we want to continue doing. We don't wanna continue having to trust everything that comes from one source. So that raises the question, who do we trust? We trust nobody. Don't ever trust anybody ever whenever you're designing a system like this, because it's like really bad and you're never gonna get anywhere and you should never trust them ever, seriously. And being serious, like, don't ever trust anybody ever, because as soon as you do, they're gonna break your system, and you need to design a system that's perfect against this, except you can't, right? At some level or another, we have to start trusting the people around us and have some level of validation that we can prove from the data we're getting from a different source. So what's that, like, threshold? How do we start trusting these people? 
We can trust them if we put a set of rules like as long as we can validate that what they're saying makes sense, then we can trust the data. And also, we should probably make sure that it didn't melt in a communication bus, like there's some kind of MD5 hash, if you've ever downloaded something from the internet, your computer does this anyways and maybe they should have to sign it with a private public key encryption. Pretty standard stuff, things that we've heard of before, hopefully in this community. And then this data should have integrity at the end of the day. We shouldn't be allowed to delete anything that's been accepted by the system. We should be able to always go back and recreate the state, but we shouldn't be able to just delete state and then try to bring it back later. And then lastly, data access should be easy because I can think of a scenario that will keep your data secure and here's what it is. I'm going to put all of the application data at the bottom of the ocean and then I'm gonna put it in a safe so that nobody can get into the safe to read my app data. And then if people try to steal the safe from the bottom of the ocean, I'm gonna put sharks around it so that they can't get to the safe and they can't steal my application data. And then if they have shark repellent, then I have lasers on the sharks so that they can get rid of whoever's coming towards my safe with all my application data. It's perfect, I think, right? Everybody, you wanna buy this from me? I think it's good, no? Okay, well obviously not, right? This is kind of like what we're doing with all of these layers of security. You can keep putting more and more stuff in front, but at some level it just gets ridiculous to dive to the bottom of the ocean and fight all these sharks and then get the app data and bring it all the way back. It's not very performant efficient. So we need to use something like blockchain. 
And so I wanna talk about blockchain now and describe exactly what it is. Blockchain is made up of all of these blocks. And I know that everybody understands this block as soon as they look at it and I don't need to cover any of it, right? This makes sense to everybody? No, okay. So let's go piece by piece here, because this is a lot. I'm gonna start in that top left corner in transactions and we'll talk about what those are first. Transactions are just user data. They ask a few questions. What's the plan? What's the thing that you wanna write into the ledger? It's just like think of like the Bitcoin scenario where I say I wanna send five Bitcoin over to my friend, Tim, over there, sitting in the audience, he's right there. You want five Bitcoin? Cool, so the second thing that I'm gonna do is make sure that it's secure by putting some MD5 hash together. Again, this is just something to ensure that when it goes across a communication bus it doesn't get garbled or anything. And then to make sure that it comes from me, I'm going to sign it with my private key and then somebody else looks at my public key and they decrypt it. So it's coming from me. It has integrity when it gets across the bus and I just shout it out to everybody around me. I just say, hey, this is what I wanna do. Everybody makes sure that what I'm doing is something that you agree with. It gets to somebody else and they just say, yeah, you have 30 Bitcoin so you can send five Bitcoin over to your friend, Tim, over there, that's okay with me. And everybody eventually will get to see the transaction and get to consensus, but hold on, let me get there. That doesn't happen immediately. So as we move across this diagram, I wanna go one box down into all of these transactions being put into a data structure called a merkle tree. 
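As a rough sketch of the two checks a peer makes on a transaction (did it arrive intact, and can the sender afford it), here is a small example. It swaps in SHA-256 where the talk mentions MD5, and it leaves out the private-key signature altogether because Python's standard library has no public-key signing; the dictionary layout and the `digest` and `is_acceptable` names are assumptions for illustration.

```python
import hashlib
import json


def digest(tx):
    """Hash a canonical JSON encoding of the transaction."""
    payload = json.dumps(tx, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_acceptable(tx, claimed_digest, balances):
    """A peer's checks: the payload arrived intact and the sender can afford it."""
    intact = digest(tx) == claimed_digest
    affordable = balances.get(tx["from"], 0) >= tx["amount"]
    return intact and affordable


tx = {"from": "gary", "to": "tim", "amount": 5}
sent_digest = digest(tx)          # broadcast alongside the transaction
balances = {"gary": 30}           # what the receiving peer already knows

print(is_acceptable(tx, sent_digest, balances))       # True

garbled = dict(tx, amount=50)     # payload changed somewhere on the bus
print(is_acceptable(garbled, sent_digest, balances))  # False
```

In a real system the signature check would sit next to the digest check, so that a peer also knows who the transaction came from.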
So a very simplified version of this is just that you number all of these transactions so that it's a replicable state tree and you can say, this is transaction one, this is transaction two, and what you do is you take the hash of transaction one, the hash of transaction two, and you concatenate them and then you hash that. So you keep doing this, and you keep concatenating neighboring hashes and then hashing that result over and over again until you get to the top of the tree, where you get just one hash that's going to prove the state of everything in the tree. So when you send this one hash out, if anybody changes any of these transactions, you'll know, because you can't just recreate the hash and pretend that it works. People will try to recreate it from the transactions in the tree, and if not all the transactions are the same, then they're not going to agree that it makes sense. So that was a lot, but I got a lot more, so keep holding on. We're gonna move from that bottom left corner into just saying that that goes into the center of that block. The transactions and hashes are given integrity by all of the hashing that's done within it and the fact that they're signed by the people who are sending them, and then we have our Merkle tree top hash to ensure that all of those transactions are the same transactions, in the same order, that whoever's putting this block together put them in. So they're always in the same order and you're just making sure that they're in the same state. The previous block hash is in here just to, yeah, hey, come on. I think my pointer died. There we go, cool. So the previous block hash up there is just to say this is the starting point that I'm coming from, right? This is just saying this was the state when I started signing all these transactions, so you can validate that all of those transactions make sense in the order that I put them into the block. 
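Here is a minimal sketch of computing that top hash. It follows the usual Merkle construction, where neighboring hashes are concatenated and hashed again. SHA-256 over hex strings, duplicating the last hash when a level has an odd count, and the `merkle_root` name are simplifying assumptions; this is not the exact byte-level encoding any particular blockchain uses.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_root(transactions):
    """Hash each transaction, then pairwise-hash upward until one hash remains."""
    if not transactions:
        raise ValueError("need at least one transaction")
    level = [sha256_hex(tx.encode("utf-8")) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [
            sha256_hex((level[i] + level[i + 1]).encode("ascii"))
            for i in range(0, len(level), 2)
        ]
    return level[0]


root = merkle_root(["tx1", "tx2", "tx3"])
tampered = merkle_root(["tx1", "tx2", "txX"])
reordered = merkle_root(["tx2", "tx1", "tx3"])
print(root != tampered, root != reordered)  # True True
```

Changing any transaction, or just swapping two of them, gives a different top hash, which is why one short value is enough to vouch for the whole ordered set.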
So like, if they happen out of order, they may not work, but since they're happening in this order, they should work. And then we're going to introduce a third thing in this block that's called a nonce value, and this is where it gets a little bit more complicated. A nonce value is in essence just proof that you've spent enough time trying to find the nonce value. It's a bit of a recursion, but it works like this: you take the previous block hash and you concatenate the Merkle tree hash, and that is your base string. So here it looks like this, but to kind of make my point of what the nonce value is supposed to do, I'm just going to pretend that the base string is hello world, because it doesn't really matter what it is. The point is that in a blockchain, I can make validating a block more and more difficult by saying that you need to have a certain number of zeros at the beginning of the hash that you send to me as your validated hash. So I'll say that again. It's that you are trying to find, in this case, four zeros at the beginning of the hash, using this hello world string and some value concatenated onto the end of it. So what we're going to do is we can start from zero as our nonce value with the payload hello world. Okay, that one didn't work, because that doesn't have four zeros at the beginning. So I try one, two, three, four, five, six, 4,000, whatever, however long it takes. It's going to probabilistically take a certain amount of time for me to find this value, because remember that hashing is unpredictable. The same input always gives the same hash, but I don't know what's going to come out of the hashing algorithm until I run it. So finding this one value with this leading number of zeros, there might be another one that starts with, instead of this C, there's a B, instead of this A, there's an F, but it takes a certain amount of time to find something with that number of leading zeros. 
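The search just described can be sketched in a few lines. This assumes SHA-256 as the hash and a nonce appended as decimal text, neither of which is specified in the talk; `mine` and `verify` are hypothetical names.

```python
import hashlib


def mine(base: str, zeros: int = 4):
    """Try nonces 0, 1, 2, ... until the hash starts with `zeros` hex zeros."""
    prefix = "0" * zeros
    nonce = 0
    while True:
        candidate = hashlib.sha256(f"{base}{nonce}".encode("utf-8")).hexdigest()
        if candidate.startswith(prefix):
            return nonce, candidate
        nonce += 1


def verify(base: str, nonce: int, zeros: int = 4) -> bool:
    """Checking a claimed nonce costs one hash, however long the search took."""
    candidate = hashlib.sha256(f"{base}{nonce}".encode("utf-8")).hexdigest()
    return candidate.startswith("0" * zeros)


nonce, h = mine("hello world", 4)
print(nonce, h)
print(verify("hello world", nonce, 4))  # True
```

The asymmetry is the point: finding the nonce takes tens of thousands of guesses on average at four zeros, while anybody can confirm it with a single hash.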
So that's what the nonce value does: you just keep concatenating it onto your base string until you satisfy that condition. That adds a certain amount of difficulty. It takes a reasonable amount of time to do that, and that is the value of forcing people to do this: you know that they've held these transactions and they've put work in to try to get here. And the added value that comes from that is, since everybody's just grabbing in this grab bag of transactions and they're trying to find this one value that works for all of the transactions that they have, then they have a vested interest in just cooperating with the rest of the system, because who knows who's gonna find this nonce value, right? It's like Charlie and the Chocolate Factory. Who knows where the golden ticket's gonna be? Who knows where that number's gonna be? You don't know until you just start looking, you start opening candy bars and you find it eventually. And then good for you, you did it. You contributed to the system, and you can't really guess who's going to find it, and you can't really engineer where it's going to come from or what the value should be. So when you get this base string and then you put this golden nonce value, this magic ticket that you opened enough Wonka bars to find, then you've spent a lot of time validating that you have held these transactions and you have been trying just on these transactions to contribute to the block. So the point of that is that you can't just do it yourself. You can't force that to happen. So it benefits you to try to contribute to the integrity of the system. And that is what a block is supposed to do. It's just supposed to be application data that you're trying to secure and hold down and validate that it has some integrity. I've already looked through all the transactions and I'm spending the CPU cycles to find that nonce value. It's not likely that I'd want it to go through and then somebody else sees that a transaction is invalid. 
That wouldn't be good for me, because I spent a lot of time trying to make it work. So that is the block. And then to kind of solidify this in everybody's minds, I want to go over how Bitcoin uses this to make their cryptocurrency so secure. They just put a payload in, like I said in the example: I send five Bitcoins to whoever. People look around and they say, yeah, that guy has enough Bitcoin, so that makes sense to me. And then people will mine my transactions, in the sense that they're the people putting them in blocks and just guessing all those nonce values. That's what all those big GPUs in those warehouses do. They're just guessing the nonce values until they work. And then they get the incentive: if they find a golden nonce value, everybody just gives them Bitcoin. Whenever you find it, you get rewarded with a certain amount of Bitcoin. And the difficulty can go up or down. If I want the block to take 10 minutes to find, to ensure that everybody has seen the transaction in about 10 minutes, then I can just adjust my difficulty, the number of leading zeros, up and down. I can say, okay, so at the current difficulty it took them five minutes. So I'm gonna raise the difficulty so that it takes about twice as long as it did. Makes sense? That's the variable difficulty. That's being able to adjust how long it should take to validate blocks. The anonymous thing, again, is just kind of a nice thing. I think that a lot of data structures don't do this. Distributed databases require you to have some kind of credentials to log in. Blockchain does not require that of you. If you have credentials in the sense that you just have a hash ID that's registered into the network, then you can do it. And there's a huge adoption of those. People have been building on top of Bitcoin to try to encode some data because of the security that's been coming out of it. It's so hard to break that people, like, shoehorn little tiny data structures in there. 
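A rough sketch of what adjusting the difficulty involves, assuming hashes are written as hex strings: every extra leading zero multiplies the expected number of guesses by 16, so the zero count on its own is a coarse knob. For finer control, this sketch assumes a numeric target that a valid hash must fall below; `expected_attempts` and `retarget` are hypothetical names, and the scaling rule is an illustration rather than any specific network's algorithm.

```python
def expected_attempts(zeros: int) -> int:
    """Average number of hashes tried before one starts with `zeros` hex zeros."""
    return 16 ** zeros


def retarget(old_target: int, observed_seconds: int, desired_seconds: int) -> int:
    """Scale the target: a smaller target means fewer valid hashes, so more work."""
    return max(1, old_target * observed_seconds // desired_seconds)


# One more leading hex zero is sixteen times the work, not a small step.
print(expected_attempts(5))  # 1048576
print(expected_attempts(6) // expected_attempts(5))  # 16

# Blocks came in after 5 minutes but should take 10: halve the target.
old_target = 2 ** 240
new_target = retarget(old_target, observed_seconds=300, desired_seconds=600)
print(new_target == old_target // 2)  # True
```

Halving the target halves the share of hashes that qualify, which on average doubles the time to find one at the same hash rate.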
It's really interesting stuff. So one of the things that I didn't go over is that in this model, there still can be conflict, right? Because let's pretend exactly 50% of the system was working on each of two different blocks at the same time. And they both found this nonce value at exactly the same time, and it does this. There's two blocks, uh-oh. What happens? Well, more likely than not, one of these is going to find the next nonce value quicker, just by chance, right? So I'm just gonna pick that the top one is going to find the nonce value a little bit quicker. And so they do, and then the nodes have a decision to make. The remaining nodes in the system can look at the shorter chain and say, do I wanna waste CPU cycles on something that might not get adopted because it's not as long of a chain, or do I wanna spend CPU cycles on something where my CPU might actually lead to me being rewarded? And people will want to be rewarded. Nodes have an incentive there to not throw away their cycles on a shorter blockchain. So this one goes there and it gets a little party hat because it got adopted by more people. If people have transactions that are built on this version of history, unfortunately, if they're not compatible with the new version of history, they might get deleted. And so there is a level of this where you have to be prepared that you could be on the lower chain. So if you're not playing by the rules, you might not get your transaction validated. If you are playing by the rules and you actually own enough resources, then you don't have to worry about this problem. But this keeps people from spending more resources than they actually have by sending two transactions at the same time. It'll just fork and it won't work for some of the people. So that is how you would resolve conflict in this system. The nodes will eventually come to consensus that one of these versions of history makes more sense than the other. 
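The fork rule can be sketched like this, assuming each chain is a list of blocks and each block is a list of transaction strings. Chain length stands in for accumulated work here, which is a simplification, and `choose_chain` and `dropped_transactions` are hypothetical names.

```python
def choose_chain(chains):
    """Pick the longest candidate chain; on a tie, keep the first one seen."""
    return max(chains, key=len)


def dropped_transactions(winner, loser):
    """Transactions that appear only on the losing fork."""
    kept = {tx for block in winner for tx in block}
    return [tx for block in loser for tx in block if tx not in kept]


shared = [["genesis"], ["a->b:1"]]
top = shared + [["c->d:2"], ["e->f:3"]]   # found the next nonce first
bottom = shared + [["g->h:30"]]           # the competing block

winner = choose_chain([bottom, top])
print(winner is top)                      # True
print(dropped_transactions(top, bottom))  # ['g->h:30']
```

A dropped transaction that still follows the rules can simply be included in a later block on the winning chain; one that conflicts with the winning history cannot.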
So now we understand blocks and everybody in here is a blockchain expert. Congratulations, woohoo! But we still need, I wanna talk about these transactions up in the top because we're just using this model of let's put really easy data in there. And let's just not really worry about what it is. We just make applications, figure it out. And I think that we could put applications themselves in there, right? This is an adopted theory by the Ethereum Foundation and some of the alternate blockchain implementations that they put basically bytecode inside of transactions. And they say this bytecode does this application. And then when it gets mined, then you can just call this bytecode and execute functionality within any node in the blockchain. So you've suddenly started putting your applications in this distributed ledger and you can use that to your advantage. And this is why you should start to care because you can use the distributed security that comes from blockchains and the fact that people constantly contribute to the system in the sense of proof of work and that mining cycle, finding those golden nonce values to ensure that there is integrity within the data. And if you wanna contribute to the system, then you can and you can put certain resources behind it that make it very difficult for other people to try to delete any of the data that you're trying to commit. And even if they break in and they commit stuff they shouldn't, then they can't destroy all of the older versions of history that already exist. So if you put your applications in here and you have application data sitting within the blockchain, then that data has integrity. Even if there's a point that there's a breach, you can always roll back and get back to the state that the application had full control of all of the data within it, right? And so there's a social construct in here that even if they break the current version, you can always just be like lift and shift and put it somewhere else. 
And then they have to break all those private keys. So if we wanted to put this in a cloud because we're at Cloud Foundry Summit, so I might as well talk about clouds, then we could do something like this. Here's one architecture where we don't change that much about what the user sees and we just put a web UI and have a private blockchain sitting behind a firewall. So it's hard enough to get to the blockchain, but even if you get to it, you have to break private and public key encryption, which is like, I'm pretty sure impossible, or you have to break into a VM, which again, private public key encryption, good luck with that. And even if they do all of that, then you can just like boot them out, kill that node, kill that one blockchain ID and say cool, I'm just gonna keep doing what I was doing. So this is one way you could do it. You could treat it kind of like a distributed database or you could open it up and allow users to leverage the distributed applications that you have written into your blockchain and allow your web interface to leverage those things that you've written in there. So why would you ever do this, right? Because you're opening yourself up maybe to a little bit more hackerishness and people are trying to break in, but the reason you might is because you can monetize it. You can charge transaction fees. For every time that they use certain functionality in your application, you can make sure that they've committed the resources that they're able to pay for whatever functionality they want to invoke. And that's some value that you don't get out of a distributed database. So this is where we start to get into the project that the Dojo has been working on, which is Ethereum as a service. Ethereum is an implementation of blockchain. It's an open source project that does distributed applications as well as transactions that just send ether back and forth, which is Ethereum's version of Bitcoin. 
So in this architecture, I have a few Cloud Foundry instances and they're all talking to a deployed or maybe not deployed machine in the back. And they talk to the CF Service Broker API. So we've leveraged the Service Broker API to make sure that these containers running within these clouds can all get access to the same blockchain and the same version of truth as they communicate through their activities of application functionality. We exposed this through a web UI to the users, and if you wanted to, you could make this a private blockchain just by running a boot node in the back. That's Ethereum's version of just saying, this is the server that you wanna run into to make sure that you're talking to the same blockchain. As long as nobody knows where this boot node is, or nobody knows the credentials to get into it, then they can't compromise your system. For some elasticity in this, because I don't wanna sit around and run this boot node and have to worry about it, this could be a BOSH release, where you deploy a boot node and a network with BOSH and then you set up the CF Service Broker API through the CF CLI to point to these BOSH instances that are running this great thing that we made. So on top of this, you might want to leverage some distributed applications in your web UI. So there's a lot of work being done in projects like MetaMask, and projects like alternative browsers whose names I can't remember up here on this stage, where you can call blockchains directly from the browser and you can make transactions directly from the browser and monetize data that way. So this might be an approach that you wanna take, where your users have access to the blockchain and then you have access to the blockchain in the back, so you can both verify that the data went through and that the certain functionality should be happening. But in this case, you may not even have to have the piece in the back, right? 
Maybe, because you can just leverage the transactions that are happening through these distributed applications that you've published on a public blockchain. Your users can interact directly with the blockchain, so you may be able to use this model where you just have a public blockchain that already exists for you, you don't have to maintain the infrastructure, and you leverage it with distributed applications and frameworks that are built by an open source community for just such a thing. You can still monetize your data this way by using transaction fees, and that is the whole talk. That is blockchain as a service. So I will take any questions from anybody. Anybody got anything good? Cool. I answered all of the questions, but okay, all right. No? Okay. What's up? Which project? Blockchain as a service. So we're still developing it, in the sense that there are still bugs that we're trying to work out with getting it to connect to the public blockchain. So no, in short. In the back, in the shorter chain you get cut off and I didn't catch the last part. So you have to watch the state of the chain at that time, but if you're playing by the rules then you don't have to care if you're on the shorter chain or not. So it prevents the problem where if I forked the chain myself by spending 30 Bitcoin on one chain and 25 Bitcoin on another chain, and I only have like 30 Bitcoin to spend, then one of those transactions gets canceled. But if I was doing everything that I should be doing anyways, then I don't have to worry about my transaction being canceled, because I didn't break any rules, so it will get integrated eventually. Yes. No, you wouldn't put the nodes in the containers but you would put the nodes as BOSH releases and different available. So the storage that you'd want is probably through a BOSH release. You could absolutely have nodes running as containers in Cloud Foundry. 
But the problem that you run into there is that these chains get really, really, really, really large. Like the Bitcoin chain I think is up to 70 gigs or something like that. Anybody know? Top they read? 40 gigs? Okay, sorry, I exaggerated. Yes, another question? Sorry. Absolutely. Because the public trust part of it is that you can't pay off every validator in the entire blockchain, right? Because you don't know who's gonna find the nonce value first. So even if you always have a miner running, you would have to have more resources than it would be worth for all of those miners to take control of the chain. Yes. Absolutely. The incentive that's created by doing this mining activity is called proof of work, and it does relate directly to the cost of electricity, because if the cost starts going over the reward, then you have no incentive to do it, because it costs you more to pay for the electricity. But there have been proof of authority protocols, which I think start to kind of break the system, because it's basically just trusting people again. That's distributed databases. And then there are things like proof of stake, which is Ethereum's cool fancy thing they wanna do. That's a really interesting thing and you should look into it. If you just go to the proof of stake wiki, you can just Google that and it'll come up. Yes, we should talk after. Yes. So the user themselves would have to have the private key to a public key that has some amount of ether. So if you go to, like, Coinbase, you can sign up for an account and you can buy some ether, and then if you wanted to use the application, then maybe you could have that brokering service yourself, to say, for this amount of money I will just give you some of my ether and you can run my application that you wanna run. Okay, anything else? Anybody over here? Cool, thanks everybody.