Hello, everybody. My name's Steve Cole. I'm from Adobe Systems, here representing our implementation of Neo4j in the cloud, specifically. This is my first presentation; however, I do believe I'm fully qualified. I'm presenting to you on an Ultrabook, I've got my iPad with my notes, and my phone is going off the hook with birthday congratulations from Facebook, so I'm nerded out. I've got my laser pointer, and I've got a space pen with my name on it. I think I'm qualified to give this, and hopefully we'll see how that goes.

From a history standpoint, I come from a data center background. I've been doing this since '97, and I've been doing financials since 2000. A lot of security audits, a lot of high-end networking, a lot of fun stuff. It's been an adventure, and coming to Adobe has been even more so as we moved into the cloud. One of the projects I took on when we did that was putting Neo4j into the cloud, and not just into the cloud, but into the cloud globally. A rather large undertaking and quite an adventure, and I'd like to tell you how we got there. I'll be talking about what I did, the research around it, and the results. As you may be able to guess, I can't say specifically what the project is, although if you draw conclusions you'll probably figure it out pretty quickly. And of course, everything we did wouldn't have been enough without help from Neo4j, the actual Neo Technology guys. They definitely share responsibility for getting us to where we are today, and thanks to them for that.

So, starting off with cluster models. The first one's not really a cluster: the single server. You're all familiar with it. Those of you who do development have a laptop with an instance of a database on it, whatever that database may be: graph, SQL, NoSQL. With single servers, there are things that need to be done that change quite a bit once you start talking about an actual cluster, especially a highly available cluster. It gets even messier once you start talking about a global cluster, and that's specifically what we're here to talk about today.

So, the layer cake. You're going to get a little more than you bargained for today; we're going to talk about three things. We're talking about Neo4j; that's the rock star of the presentation. However, we couldn't have done this global implementation without a few other components: a load balancing layer, for which we use Riverbed's Stingray product, formerly known as Zeus, and LogMeIn's Hamachi VPN mesh software, which some of you may or may not be familiar with. It's actually quite big in the gaming world, to be honest. So that's our layer cake. Let's see if I missed anything else here. Oh, yeah. As a true SaaS service, there are other layers, and I'm sure each of you knows what these could be: potentially a queuing system, messaging, all sorts of other stuff in front, including probably an actual application layer. You're not going to run that directly inside Neo4j, or you may; it's certainly capable of it. But for the purposes of this presentation, let's just say this is a service within a service. We are not front-ending requests, and that introduces some security issues we're going to talk about as well. So: Stingray; Neo4j, which we've talked about, and we love them;
and LogMeIn Hamachi, which we're going to talk about as well.

So, the single server. This diagram took a very long time to put together, but I think I've done a pretty good job of it. There are a few things we do need to talk about, though. Of course there are server settings that need to be addressed. I think the Neo product does a pretty darn good job of figuring this stuff out for you; it literally leaves stuff commented out, and it will do most of the work for you. Some specific needs may have to be addressed, and they address those themselves with each progressive version, which has been awesome. It's less and less work for me as an ops guy to get this thing up and running. But there are a few things: Java settings, where your plug-ins live, that sort of thing. From a client standpoint, single servers are awesome. It's a single endpoint. All you do is point at localhost, or at the one server it's running on, and you're done. No muss, no fuss. Same with the responses: you don't have to worry about consistency. If you wrote a record 10 seconds ago, it's there right now. It's risky for production, and not many people do it, although I'm sure some have. Obviously it speeds development, as we talked about; this is most often what you've got running on your laptop. And it is, without question, the least complicated to operate. You don't exactly need a DevOps team for it.

The HA cluster. This is a little different from your standard drawing, although at least a number of you have probably been introduced to ZooKeeper at some point. ZooKeeper is a coordination service, and it's used by Neo4j to communicate between nodes. There is a direct connection between the nodes; however, this business of keeping track of who's doing what, what your sync state is, et cetera, is handled, for the most part, through ZooKeeper. There are a couple of Neo guys here, and they can correct me if I'm wrong, but that communication in the beginning was grandiose; there was a lot of it. It's very thin now, so it's getting much more efficient.

So this is a basic diagram, and of course the load balancer up above is how you determine which node you're going to talk to. Even with a basic system of just two servers: are we talking to both at the same time? Are we doing round robin, going back and forth between the two? How exactly do we want to set that up? There are a couple of different ways to do it. You can do it through hardware: great in a data center, not so great in the cloud. Software, which is how we ended up doing it. ELB, the Amazon service; there, I just tipped my hat as to where we might be. And then there's round robin DNS, or as we like to call it, the caveman load balancer. Each of those has its benefits and drawbacks. The true trick to load balancing, however, is determining where you want to send your traffic. The round robin model is good for an app service: you hit every single server, and if one breaks, you've identified it and you can move on. With a database, it's a little different. I think a few of you have seen Facebook progress to the point that you post your status update and it appears immediately. But does anyone remember when you posted your status update and it didn't appear for 30 or so seconds? That's because they were writing to the master and reading from slaves, and that eventual consistency took a little while. You wrote to server A and it finally appeared on server Z, which is what you were reading from.
Neo4j does have that same model, where you can write to one node and it will eventually make it to all of them. So it becomes a decision for your engineering team exactly where you want to write, and there are benefits and drawbacks to each, which I'll talk about right now.

With Neo4j, it's recommended from an HA perspective that you write to the slaves, and this is actually a really good thing from a durability standpoint. You write to the slave, the slave has that record, and it sends it off to the master. The master writes the record, and now it's part of the actual database snapshot and can be distributed to any number of nodes in the cluster. If for some reason that master were to crash, to go down, or, you know, Chaos Monkey was just released and you decided to nuke that instance, your slave still has the record. A reelection occurs, a new master is chosen through the ZooKeeper service, and then that slave says: by the way, I've got this other record in case you don't have it, and you're in sync again. So, a very good durability choice.

We, of course, went the other way, just to make their lives hard. We chose to write to the master, for a couple of reasons. One, we've got a lot of slaves, and if all these different nodes are receiving traffic, our perception was that they would be flooding the master anyway. If that's going to be the case, why not remove the extra step, put all the write load on the master to begin with, and free the slaves up for sync, which is pretty basic and very thin, and for complex queries, which is where we expected the slaves to really be working: a multi-node traversal, some complex query that needs to be executed, or even a batch of multiple queries. So we decided we wanted to leave the slaves free to do their work, and we chose to write to the master.

That did introduce a few problems, though, and the most obvious one is the master crashing while you're doing this, which is part of our acceptance test; we do crash it on purpose. You've got a bit of a problem: you just lost a record. It can manifest in a number of ways, but most immediately the load balancer starts to fail, you start getting application-level errors, and it's really bad. I'm very pleased to say, though, that the newest release of Neo4j, which is on their website, has solved this. They've actually got some durability built in, and I've been talking to the guys at the booth for quite a while today. I don't know how they did it, but they're not only more durable, they're twice as fast. So if you haven't downloaded the latest release, definitely do take a look.

Endpoint decisions, that's another one here. The endpoint decision is the master/slave thing, and it also gets into the concept of a read pool and a write pool. Do you want to separate them or not? Again, these are decisions made per application, and your engineering team is probably best placed to figure them out.
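As a concrete sketch of that read pool versus write pool split: the hostnames below are hypothetical stand-ins for load balancer endpoints, not anything from our actual deployment, and the URL is the 1.x-era Cypher REST endpoint.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Minimal client-side routing sketch: writes go to the write pool's
// endpoint, reads go to the read pool's endpoint. Both hostnames are
// hypothetical placeholders for the load balancer addresses.
public class EndpointRouter {
    private static final String WRITE_ENDPOINT = "http://write.neo.internal:7474"; // hypothetical
    private static final String READ_ENDPOINT  = "http://read.neo.internal:7474";  // hypothetical

    public static int execute(String cypherJson, boolean isWrite) throws Exception {
        String base = isWrite ? WRITE_ENDPOINT : READ_ENDPOINT;
        HttpURLConnection conn =
                (HttpURLConnection) new URL(base + "/db/data/cypher").openConnection();
        // Note: the Cypher endpoint takes POST even for pure reads, which is
        // exactly why routing on HTTP method alone breaks down later in the talk.
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(cypherJson.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }
}
```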
So, global clusters. Global clusters got really interesting for us. We had solved the HA problem: we had this thing running in a single region at Amazon, it was performing quite well, and it was passing our acceptance tests. The Neo guys were working with us to help us understand all the intricacies of master elections and so on, so that we could build around them and know exactly what was going to happen. This needed to be a 24/7, zero-downtime service. Then we took it global, and everything started to break.

When I say global, I mean three regions: the United States, Europe, and Asia. Three Amazon regions. There are some communication problems right off the bat, and those of you who are familiar with this may know about them. If you build your own data centers, it's really easy to build a VPN between all the different data centers, or to use another technology like MPLS to communicate directly. You don't have to worry about whether it's secure: it's either encrypted for you, or it's not going out over the public internet. Well, it does go out over the public internet when you're at Amazon. There are solutions, there are ways around it, but because we're part of a larger project, and we're not the only client of the Amazon service we're using, we had constraints, and we had to operate as if we were, you know, some guy doing a hobby site with three servers. So, permissions were easy enough to do: set it up, open up these ports, the service could connect, and we got everything working just great.

Controlling masters. I've talked about the reelection process a couple of times. It's pretty controllable, pretty predictable: if you launch node 1 in the United States and then launch node 2, node 3, et cetera, node 1 is going to be the master; it's your first node up. Where it gets tricky is when you start intentionally killing the master as part of the durability test, and suddenly Asia becomes the master. Now every record you write in the U.S., which is where the majority of our front-end traffic would appear, is going to Asia, and from there being distributed back to Europe and the U.S. It's functional, it's rock solid, it worked every time, but it started to get a little slow. So what we needed to do was start controlling the master, and that came in another release from Neo Technology, where we could actually say: Europe, Asia, you are not eligible to be masters. If we lose the master, you guys are all just slaves; you can serve data, you can read, but nobody can write. And since we have three nodes in the U.S., the only time that would happen is if AWS lost an entire region. But that's never happened, right? And from a load balancing perspective, I'm going to show you the graph at the end, if you haven't downloaded and looked at it already.

So, I'm sorry, go ahead. I just want to ask you about that last point you made. You elected them to not have a master, meaning nobody can do writes? No, sorry, I may have miscommunicated that. What we did was tell it that Europe and Asia were ineligible to become the master, so only the U.S. could be the master. If we lost a node, if an instance were to go down or some software glitch took down the node that was master in the U.S., then only another U.S. node could become master. Okay, so you'd still operate with a master, and I'm assuming Asia and Europe are slaves after the master in the U.S.? Correct.
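In configuration terms, the master-eligibility control we're describing looks roughly like the sketch below. The exact key names have moved around between Neo4j versions (ha.slave_only is how later releases spell it), so treat this as illustrative, with example addresses, rather than our production config.

```
# neo4j.properties, 1.x-era HA (key names vary by version; addresses are examples)
ha.server_id = 1                                                  # unique per node
ha.server = 10.0.1.10:6001                                        # address advertised for cluster traffic
ha.coordinators = 10.0.1.10:2181,10.0.1.11:2181,10.0.1.12:2181    # the ZooKeeper ensemble

# and on the Europe and Asia nodes only:
ha.slave_only = true       # serve reads and sync, but never win a master election
```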
But if we were to lose an entire region, so that Europe and Asia can't talk to the U.S., can't connect to ZooKeeper, and don't know what's going on, everybody goes into slave-only mode, and they can only read. They can read back the data they contain, but they will not accept writes. And, in a bit more detail than I was planning to get into, we've got a system where we'd actually reconfigure Europe to be the master. It's like: hey, Amazon US East just went down. We pull the trigger, and EU West becomes the master region and takes over all the writes. So is there a risk, if you lose the interconnect in that scenario, of having a split-brain problem? I'm actually going to talk about that. Yes, there most certainly is. Sorry, my notes just went to sleep. And the load balancing, again, I'll talk about a little more.

Global performance, by the way: if you write data to the master at a rate of X and you expand from one region to two, and from two regions to three, there's no difference in performance. The sync happens behind the scenes; it doesn't impact the master at all.

So, LogMeIn Hamachi. First of all, it's VPN mesh software, and I'm sure you know of other solutions; we did indeed check some others out to see what worked best for us. Specifically OpenVPN, which I think a lot of you are familiar with, and another product that works well with the cloud called NeoRouter. We could have had two Neos, but in the end there was some extra layer with NeoRouter that we didn't want to deal with, so we set it aside. Most of all, LogMeIn Hamachi, which is sort of a dead product right now (it's sitting stagnant; they're not actively developing it), brought such great benefits with its command line tools, enabling so much in our command and control service, which I'll talk about at the end, that it was the clear winner. So we negotiated with LogMeIn and got a sweet little deal with them: not price-wise, but at least a commitment to support us, which was great.

It provides encrypted communications over a VPN mesh. With Neo4j, at least in the early versions, if you had three nodes, you had connections A to B, A to C, B to C, and back the other way. A triangle is easy, right? But once you get to a square you start to get the cross connections, and then you go to five and six and nine nodes. That's a lot of connections, too much to actively manage peer-to-peer, so we went with mesh. It also enabled us to do some other things, which I'll talk about on the last slide.

So, encrypted connections. The way it works, it's an adapter that sits on top of your IP address and gives you an effectively static IP. It's a five-dot address; it's technically routable, but unused. In doing this, we could lose an instance, and this has nothing to do with Neo4j, this is an Amazon thing, right? We could lose an instance entirely, have it come back up and assume that IP again, and we're back up and running. Neo didn't even notice; other than the fact that the system went down and came back up, it had no clue it was a new box. So it has worked really well from that perspective. Again, the CLI tools empowered some functionality that we'll cover on the last slide.
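To put numbers on that mesh versus peer-to-peer point: a full mesh needs a link for every pair of nodes, so the count grows quadratically.

\[
\text{links}(n) = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad \text{links}(3) = 3, \quad \text{links}(4) = 6, \quad \text{links}(9) = 36.
\]

The triangle's three links are manageable by hand; by nine nodes you're maintaining 36, which is why letting Hamachi manage the mesh beats configuring peers yourself.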
A tip for anyone who actually does use Hamachi today, or considers it as a result of this presentation, which would flatter me to tears: don't let it make plug-and-play port decisions. Force it to use a static port, and if anybody has questions about how to do that, I'm happy to tell you. The floating ports, at least in Amazon, at least in a public cloud environment, just don't work. You've got to open your security groups to basically anything UDP from any host, and you never know what's going on. With a static port, you can at least limit the port and the protocol, and then you can start limiting down to actual egress IPs when you send traffic out. EIPs, by the way, didn't work very well for us, so we backed out of those. We definitely do stick to forced UDP ports.

And the reason we do this, by the way: the communication between all these nodes within the Neo cluster is unencrypted. Now, at our request, they've added the ability to do encryption, so we can do an HTTPS layer, and authentication was added for us too; we're all in love with that. It's worked wonderfully, and we'll talk about why. But there's other traffic. ZooKeeper's not encrypted. The actual node-to-node communication is not encrypted. It's binary, which is great in that people can't read it as easily, but I'm hazarding a guess that most people in this room could do something with binary data and figure it out. So Hamachi encrypts everything, and it's totally transparent to the system itself. We don't have to tell it, by the way, use this, by the way, use that. We define the IPs, and the IPs are actually the Hamachi IPs, and after that we walk away and everything takes care of itself. All encrypted. All right?

The star of the show, Neo4j, and ZooKeeper, the Apache coordination project. With ZooKeeper there's a concept of quorum, which I'll touch on really quickly for those of you who haven't used it. It requires an odd number of systems to be running ZooKeeper. So if you have an eight-node cluster, kick one of them out of the ZooKeeper pool, run ZooKeeper on other servers, or run two copies of ZooKeeper on one of the servers, but make sure it stays odd. The reason is, if you end up in a situation with an even split, where neither side has a clear majority, it will refuse flat-out to make a decision. And yes, in that situation you could end up with dual heads in the early days. That's also been solved through this process of us butchering these services and making them fail, to the point that Neo could say: wow, that actually is possible, we just hadn't seen it yet. And they moved forward and fixed all that stuff.

ZooKeeper, again, was very chatty in the beginning, with a lot of data going back and forth. Massive log files forced us onto EBS mounts just to be able to store the amount of log data, and then newer versions got faster and faster and more and more streamlined. It seems to us, though they won't tell us for sure, that ZooKeeper's getting a lot quieter. We did have issues with ZooKeeper connections in the beginning, and we don't know if this was Hamachi or what, but early on we couldn't get Asia to connect to Europe, or Europe to connect to the U.S., on occasion; mostly Asia was just troublesome. So we launched nodes for ZooKeeper into these other regions.
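For those who haven't set ZooKeeper up, a minimal three-node ensemble configuration looks like the sketch below; the addresses are illustrative, written as Hamachi-style 5.x IPs so the coordination traffic rides the encrypted mesh, and additional nodes or regions just mean more server.N lines.

```
# zoo.cfg (three-node ensemble; addresses illustrative)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=5.23.10.1:2888:3888
server.2=5.23.10.2:2888:3888
server.3=5.23.10.3:2888:3888
```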
And once the cluster was up and everybody was communicating and happy, we shut those nodes down, and our performance went up about 2x. We don't have to do that anymore; again, ZooKeeper is much, much faster in more recent releases. The coordinator need not be global. This was key for us. As I just mentioned, we shut ZooKeeper down in Asia and Europe once everything was up and running. We didn't need it; in fact, it's way happier just sitting in one place. As you can probably guess, we put it in the U.S. It's our master region; if we lose the U.S., we're in trouble. However, our command and control software can actually launch ZooKeeper in Europe, reconfigure all the nodes on the fly, restart the service, and we're back up and running. But definitely keep in mind, if you approach a project quite like this, keep ZooKeeper as small as possible, but make sure it's still redundant, with at least three nodes.

So, connection issues, performance issues. Ah, performance. Again, they've doubled their speed at least twice in the time we've worked with them, which for me has been about a year. And an amusing story: if you don't keep track of your testing and how you're doing it, you can run into some really interesting situations. We were doing our initial testing, I think this was a ways back, doing our performance runs, just dumping records in, 32 threads simultaneously slamming the master full of records. Wow, how fast can we go? Well, we found out. We found out pretty quickly that the sweet spot seems to be somewhere around 4x: four times as many threads as you have CPUs. So if you've got dual CPUs, you're going to see your best performance around eight threads simultaneously pumping data in. And as you can guess, with the 4, 8, 16 processors, however many Amazon can give you these days, you can get some pretty ridiculous performance. So our sweet spot is taking the number of CPUs and multiplying by four to eight; that's the sweet spot from a performance perspective.

We hit these huge numbers and everything was great. Then two days later, somebody else would run what we referred to as our sync test. The sync test was: okay, I'm going to write a record to you; how long does it take to appear in Europe, and how long in Asia? And those tests went really well. Everything was within five seconds. That's eventual consistency; we were happy with that, we were satisfied with it, and it's actually what we hard-coded our numbers against. We set the sync frequency, check every five seconds, based on these tests of how long it was taking to happen naturally. A week later, somebody was doing a performance test, kicked the thing off, and went to lunch. Somebody else came along and started a sync test. Where are my records? Where are my records? They checked an hour later, and the records still hadn't appeared. So they were basically saying: this build of Neo is broken, and we've got to take care of it. No, it wasn't broken at all. It just turned out that, at that point in time, the sync wasn't quite there yet for global; again, we were pushing Neo to go global, and they hadn't done it yet. Asia was only getting about two records per second. Two. And we had just dumped a quarter of a million records into it. So the person running the test to find out what sync was doing was quite disappointed 28 hours later when their record finally showed up. But again, evolution. These guys moved things forward. I don't know what they do; they keep saying "the Swedes," and that's their answer to everything. But they fixed it, and boy, did they ever fix it. Now we can dump a quarter of a million records and find all quarter million of them within five seconds or so. It's screaming fast now.
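In code, that four-threads-per-core rule of thumb (our empirical number, not anything Neo4j documents) amounts to something like this sketch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sizes a writer pool at roughly four threads per CPU core, the empirical
// sweet spot described above; a dual-core box gets 8 threads, as in the talk.
public class WriterPool {
    public static ExecutorService create() {
        int cores = Runtime.getRuntime().availableProcessors();
        return Executors.newFixedThreadPool(cores * 4);
    }
}
```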
"No way to encrypt all traffic." That sounds redundant at this point, but again, it's not built into Neo. This is a layer they expect to be behind some sort of firewall. So when we're sending it across the globe over public, routable addresses, we encrypt it ourselves; in a data center behind a firewall, you're fine. How am I doing on time? I'm doing all right, good.

Riverbed Stingray. Again, some of you may know this as Zeus. It was formerly Zeus, a software product, and it was purchased by Riverbed, who renamed it Stingray. We've done some interesting things with it. At the time we were doing our research, ELBs were great, but they were missing one kind of important thing. Does anyone know what ELBs can't do that they should do? You can't put security groups on them. So here we were, a back-end service, not exposed to the public, and our front-end service was exposed to the public. And then somebody pointed out that, no, the public can actually talk directly to Neo; they don't have to go through the front end. Why is that? Oh, look at that: security groups on the instances did not apply to ELBs. Things may have changed; I haven't touched this in a few months, and I know they were very well aware of it and looking at ideas for how to solve it. I'm not sure what they've done with it recently, so I'll have to take another look. But at the time, no, you could not secure that traffic, and therefore we had a bit of an issue. At the time they also did not yet have the authentication layer you can now apply to Neo4j, where when somebody requests, say, an entire snapshot of the database, you authenticate them. That exists now; it's a great thing, and we love it.

So on top of that, we put all of our services behind a cloud-based load balancer at Amazon that's not the ELB. The biggest benefit is that you can apply the security groups you want: your port protection, your source protection, all of it, on the instance running the Stingray software. When we first started, this was an Amazon subscription service; I don't know if any of you have used those. Subscription services were something where you select an AMI, say, I want to use this, and then you pay Riverbed through Amazon on a monthly basis for how many you're running, how many hours, et cetera, which is very nice: a really convenient, really cool way to get up and running. But in the end, once we went global, we had 9, 12... we had a lot of these running. So we made a deal with Riverbed to get installable versions instead of the monthly costs, which worked out really well. So the security groups were one thing. Also, because they run on EIPs as an HA service, if load balancer A goes down, load balancer B picks up; they use a floating EIP to do this.
Some of you may have used things like Linux-HA or other solutions to actually move an IP from one server to another, which you can do at Linode and other places, but you can't do that at Amazon. You get one IP address; that's it, that's all you can use. So what we did to solve that was the EIP solution with these guys; they were the only ones that could actually move an EIP back and forth, which was great. It gave us a full HA service that had security groups, because the security groups were applied to the instances. We told the Neo service: you're only allowed to talk to these boxes, meaning the load balancers, and we were up and running. It worked beautifully.

Because it's an EIP, though, you have one address, so you're forced into port overloading, which means that instead of an address for reads and an address for writes, you have one address for everything, and you just use a different port. That works out just fine; you just have to make sure your software can support it. We were writing the software, so it could. We also introduced another layer: not reads, not writes, but redirects, and I'm going to talk about that on the next slide.

Another thing they could do that was really interesting is determine, based on your HTTP method, whether you're doing a POST or a GET: are you doing a read, are you doing a write? Based on that, we could send the traffic where we wanted, and at that point we didn't need multiple ports. However, we started using batch mode, and batch mode, even if it's a read, runs as a POST; it runs as a write. So we then had to split things up to make sure we talked to the right servers in the cluster.

One of the things we've done to facilitate that is write a health check jar. This is probably something you want to investigate if you're using Neo4j behind a load balancer. You can access the state, am-I-master or am-I-slave, directly through REST calls, talking to JMX, and then parse that out to find what you're looking for. For the most part, it's pretty fast. What we found out is that if you're hammering a server, it's not fast at all; you start getting pretty slow responses. And then again, you're pulling back that much report data when all you need is one line. So in the end we wrote a jar to talk to Neo directly through Java, get that one piece of information we needed, and run with that. Ever since, it's a 10 millisecond response time, every single time we hit it, no matter what the load is. It works great. We basically got it answering, when you ask slave-or-master, with a 200 or a 404: either yes, okay, or not found. Based on that, the load balancer puts the node into the pool or pulls it out. So we've got two pools pointing at three nodes: the read pool will only talk to the slaves, and the write pool will only point to the master, and that can switch back and forth based on the health check.
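Here's a minimal sketch of that health check idea: a tiny HTTP endpoint that answers 200 when the node holds the role the pool wants and 404 otherwise, so the load balancer can pull nodes in and out. The port, file path, and class names are all hypothetical, and the role lookup is stubbed; the real jar asks Neo4j for its state directly through Java. The override file is the upgrade trick described next.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.File;
import java.net.InetSocketAddress;

public class RoleHealthCheck {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(9999), 0);
        server.createContext("/master", exchange -> {
            // The override file wins: during a rolling upgrade, dropping this
            // file forces a "master" answer regardless of the node's real state.
            boolean isMaster = new File("/etc/neo4j/force-master").exists()
                    || askNeo4jAmIMaster();
            exchange.sendResponseHeaders(isMaster ? 200 : 404, -1); // -1 = no body
            exchange.close();
        });
        server.start();
    }

    // Placeholder: the real check queries the running database through Java
    // rather than going over REST/JMX, which is what keeps it at ~10 ms.
    private static boolean askNeo4jAmIMaster() {
        return false;
    }
}
```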
One of the most recent things we got from the Neo team was the ability to do live upgrades. Updates were one thing, where we changed configurations and could just restart a server and move on. Upgrades were the big one for us: database format changes with 1.4 to 1.5, 1.5 to 1.6, I think, as well. Those caused us to go down. And yes, we've got a queuing layer, but that queuing layer is only going to run for a little while before it basically says, this failed, and that's not a good thing. So these guys worked really hard to get this for us. 1.8 has the ability to do a live upgrade, and we've actually tested it. You've got a running cluster, you've got a master here. You upgrade all the slaves to the newest version, not just a small update but an actual version with database changes, and then finally you move the master off somewhere else and upgrade that last guy. And you're never down: 24/7 uptime. That's what we were looking for, and these guys delivered it.

However, that alone isn't enough; you have to do something with the load balancer as well. So what we did, with this health check jar we wrote that otherwise just queries the status, are you master or are you slave, was give ourselves the ability to trick it. Pretty basic, right? We just created text files that say: you are a master. And then that node, no matter what its actual state, will respond as a master. Doing that, we were able to trick the slaves into saying they were masters, and the master into saying it was a slave, so we could take it out of the pool, upgrade it, and bring it back in. We were never down, and it worked a treat. It was absolutely brilliant; I never thought I'd say that.

Here we go. This is it. This is what a global Neo4j cluster looks like as implemented by Adobe. Cue my laser pointer. So let's say we're writing in the U.S. We've got a layer up above; that's our application layer, maybe some queuing, et cetera. Again, not too specific, but you can imagine what's probably up there. And down below we've got Hamachi; that's how all these nodes communicate with each other directly. As far as application traffic is concerned, if we've got a front-end app here that says, hey, I need to write a record, it's going to write into this load balancer here. That's the U.S. load balancer. I didn't think I'd specifically have to say that was a load balancer; I guess I've been doing this too long to need to clarify those things. So this is the U.S. load balancer, and this is the write service, which is a distinct port from read. There is a master here, and the write pool is green. So when it receives the write, it sends the traffic in, it goes to the master, and the master distributes it to all the slaves. Same thing with reads, and this is the same for any of the clusters here. The solid line means an active member of the pool; the dotted line means not a member, but capable of becoming one. Same thing here: Asia gets a read, and it can go to any of these nodes and retrieve a record.

So what happens when we write to Europe? Well, we could have told the applications running up here, which could be clients, or another interpreter level for us; we could have told them: here's an address, and here's an address, and here's an address. But that really wasn't the right solution. The right solution was to make this as transparent for every layer as possible. None of these layers knows about the others; they all just assume everything beneath and above them is working properly. So a request comes in here, it goes to Europe, it goes to the write pool. Well, we've got dotted line, dotted line, dotted line; the write pool is out of business. But through this read, write, redirect setup that I mentioned two slides ago, we've actually set up the U.S. endpoint as a node in Europe's write pool.
And it's actually encrypting all this traffic, because this would be going across data centers, or rather, across Amazon regions. The U.S. endpoint is actually a member of the write pool, so when this traffic comes in, it's sent over to the U.S. transparently. The application above Neo4j and above our system has no clue that just happened, which means this could all go entirely down. Not that an entire region of Amazon would ever go down. And we could make either one of these guys the master, which is something we did drill; same thing for Asia. A write comes in; currently it would go to the U.S. If the U.S. were down, it would follow that dotted line over here, find the master, go through the write pool, and be done. This has worked wonderfully.

As you can probably guess, it's kind of spendy. We've got a lot of layers going on, a lot of instances running, and... hey, did that slide skip on me? No, it didn't. We've got a lot of instances running, a lot of software running on top of them, a lot of Neo licenses. Although I don't think we get charged for staging, do we? So we don't do this in all of our environments. However, we do have one environment up and running and ready for any developer to test this kind of stuff, to make sure it really is transparent.

Yes, sir. You mentioned that you want to keep the application transparent, so it doesn't know about the geography. Correct. So how does it discover its endpoint? Well, that's actually configured, right? During deployment, any app in the U.S. region is told that this is its endpoint here in the U.S. So if that endpoint were down, if that region were down, then that front-end app wouldn't be receiving traffic either. Oh, they're dependent. Well, we're assuming that if the region went down, the apps hosted in that same region would also be down. We would assume, and you never know for sure, that that app would be down. Something could go sideways, though. By the way, these are all distributed across zones, right? None of these occupies the same zone; every single one is in a separate zone in each region. Actually, not in Asia; I think there were only two zones in Asia when we deployed. But anyway, yes, we would assume the front-end app servers would be down as well, so they wouldn't be trying to communicate with the back end, although it's possible they could. They just can't see the rest, and in that case the back-end Neo layer has failed as well, and we're going to see in the stats that we've lost the master. Anything else on this? It's a pretty brutal drawing. Yeah, we're good, okay.

So, from a command and control perspective, this is something I'm very actively trying to get Adobe either to open source or to bless Neo to give to all of you guys. It's my pride and joy, my little baby, and we've named it Keanu; you can probably guess the reference. Keanu is our command and control software, and it started off as just a bunch of tools, a handful of shell scripts. You've all done this, I'm sure, or at least most of you: you've got a set of tools, and it grows and it grows and it grows. And the next thing you know, you're giving copies to guys under NDA so they can see how your stuff works, and you build on it, you improve it, you put APIs behind it, you start automating it.
And the next thing you know, you've got this disparate collection of stuff that a lot of people depend on. So we moved it forward. This set of tools is capable of doing deployments, doing DevOps, managing the systems, doing logging, doing monitoring, doing every single thing you can think of from a DevOps perspective, with the exception of actually being the guy who picks up the phone and calls somebody to say, hey, the server's down. It comes close, though. Everything's built into this system we're calling Keanu. It was only possible through what Neo tells us when we ask them, and through some of the stuff we do with Hamachi. I'm trying not to oversell them, but they really did make this work for us from the automation, global deployment, and management perspective. Certainly, Neo put in the features themselves, and we would not be up and running today without those, but these pieces gave us the ability to run this stuff in an automated way, with command line tools, with a GUI, with an API, run it as a service. I'm probably not supposed to mention this here, but our command and control service is a LAMP stack. We are using it, so if anyone could help us with that, we'd like to get off of it.

It does everything for us. It does deploys. It does upgrades, where we need to get from, say, 1.5 to 1.8. It does updates, where we just change a jar file, like the health check jar. If we make a configuration change, if we go from six nodes, say two servers in each of two regions and two in another, and we decide to make that three or four, we can make those changes centrally in one location, and it will intelligently and dynamically figure it out for us, put it together, build the configuration files Neo4j needs to run the way it's capable of running, send those down the pipeline, and restart the services, and the system is up and running with little to no downtime. With configuration changes we may well have some downtime, but with upgrades we're not going down at all, for anything. On the back end, as you can probably guess: Nagios, syslog, Cacti, that kind of stuff.
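To make that central config-generation step concrete, here's a minimal sketch; it is emphatically not Keanu itself, and every name, address, and file path is illustrative. It renders a per-node neo4j.properties from one topology map, with only U.S. nodes eligible to become master and the ZooKeeper ensemble kept U.S.-only, per the discussion above.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class ConfigRenderer {
    public static void main(String[] args) throws Exception {
        // One central topology definition: node name -> Hamachi-style 5.x IP.
        Map<String, String> nodes = new LinkedHashMap<>();
        nodes.put("us-1", "5.23.10.1");
        nodes.put("us-2", "5.23.10.2");
        nodes.put("us-3", "5.23.10.3");
        nodes.put("eu-1", "5.23.20.1");
        nodes.put("ap-1", "5.23.30.1");

        // Keep the coordinator small and in one region: U.S. nodes only.
        String coordinators = nodes.entrySet().stream()
                .filter(e -> e.getKey().startsWith("us-"))
                .map(e -> e.getValue() + ":2181")
                .collect(Collectors.joining(","));

        int id = 1;
        for (Map.Entry<String, String> n : nodes.entrySet()) {
            boolean masterEligible = n.getKey().startsWith("us-");
            String props = "ha.server_id = " + (id++) + "\n"
                    + "ha.server = " + n.getValue() + ":6001\n"
                    + "ha.coordinators = " + coordinators + "\n"
                    + "ha.slave_only = " + !masterEligible + "\n";
            // The real system ships these to each node and restarts services;
            // this sketch just writes them to the working directory.
            Files.write(Paths.get(n.getKey() + ".neo4j.properties"),
                    props.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```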
And that's pretty much it. I went way faster than I practiced this thing. As far as Neo4j goes, they've been phenomenally helpful in the amount of work they've done to get us into a global position, and very receptive to everything we've done in putting this into the cloud and the very unique challenges that entails. As much time as I've spent in data centers, I was stunned at the number of differences you face when deploying something like this in a cloud, and they really stepped up; they made it happen. Adding the layers above and below was also very much required for what we're doing: we had to have that load balancing layer, and we had to have that back-end VPN layer. Again, somebody using MPLS or something similar could very easily replace that Hamachi layer we used; as a public cloud client, we didn't have that option. And yes, there are other tools, again not available to us, that could make this work; either of these layers can be swapped out, the load balancing layer and the back-end VPN. So I am sorry for going so fast, if I spoke too fast or missed material, but by all means, I'm happy to answer any questions anybody might have.

And that would be no questions. All right, well, I'm going to go hang out with those guys at their booth. So if you change your mind, or you're just nervous about asking in front of everybody else, come by and talk to me, and I'll share anything I can that doesn't require an NDA.