All right guys, welcome, and I have good news for you: this is the last session. So welcome to my presentation on architecture for scaling Java applications to multiple servers. In case you're wondering, I'm originally from Russia, so that's where the accent is from. My name is Slava, and I'm a founder and a software engineer at Cacheonix Systems.

Back when I was a boy, my family and I lived in a city called Baikonur, which is home to the engineers who serve Russia's main space launch pad. At the entrance to the city there is a monument that is supposed to be an astronaut welcoming visitors with wide-open arms. We call this guy "the fisherman," because it looks like he is showing how big a fish he caught.

To start, let's look at how a typical application looks on a single server. We have users hitting the application through the network. We have the application, maybe with a local front cache for the data produced by the business logic. We have a business tier that uses hash maps and concurrent locks. We have a data tier with an ORM framework and usually a local level-two cache. And all of this goes to a single source of truth such as a database, a Hadoop cluster, a file system, and so on.

But sooner or later you will exhaust the resources of that single server, and then you have no choice: you have to scale to multiple servers, also known as scaling horizontally. You can scale inside your local data center over your local area network, or you can scale in the Amazon cloud; it doesn't really matter. It's going to happen, because resources are limited. So you go from the standard architecture to a clustered one.

And when you do this, it's actually not that easy, because distributed applications have to deal with a lot of things you never see in applications that sit inside a single JVM. The things you have to take care of are horizontal scalability, reliability, concurrency, state sharing, data consistency, load balancing, and failure management. And you also want to make sure it stays as easy to develop as if you were working inside a single JVM, because you don't usually deal with any of this when you're writing a single-server application.

Horizontal scalability is essentially the ability to handle additional load by adding more servers. It's different from vertical scalability, which is when you try to handle more load by adding more resources to a single server: a faster CPU, more memory, faster network cards, SSDs instead of hard drives. Horizontal scalability gives you a much better path, because to scale you just keep adding servers. With a single server you can only go so far; if you look at the hardware available right now, you can get a 32-core server with 128 gigs of RAM, and that's it. What if you need 500 CPUs? What if you need a terabyte of heap? That's where horizontal scalability helps.

But it's not that easy, because horizontal scalability usually runs into bottlenecks, and that happens pretty much all the time. Those bottlenecks are typically shared resources that require sequential access, which usually means databases, Hadoop clusters, file systems, mainframes, and external web services.
It's not that databases are bad. Databases are important now and they are going to stay important, because they give you a single source of truth and they make sure all your requests are transactional, properly stored, and available for later retrieval. But the result is that they process requests sequentially, and that becomes a problem when you begin to add application servers. Databases are notorious for being hard to scale horizontally, at least within the ACID requirements.

To give you an example of a bottleneck-free system: you have an application server that can process 5,000 requests per second, a database that can process 10,000 requests per second, and users hitting it at the application server's speed. There is no problem here; everything goes straight through. Now a different example: business grows and you have to process 15,000 requests per second. The natural decision is to add more application servers, so you add a couple more. You expect to be able to process 15,000, but it's not going to happen: even though each application server can give you 5,000, which is 15,000 in total, the database can still give you only 10,000. The demand is 15,000, the capacity is 10,000. You expected to triple your capacity, but you only doubled it. You have a bottleneck.

The solution to this problem is what's known as distributed caching. A distributed cache is essentially a large in-memory data store that holds all the frequently read information. With distributed caching you can have caches whose size exceeds a typical JVM heap tens or hundreds of times; you can have a distributed cache a terabyte in size if you want. The state of the art right now is that you can cache everything, any possible combination, as long as you have enough servers in the cluster. Not that you have to, but you can. And the application benefits because instead of going to the database, it reads from the in-memory cache instead of being stuck waiting on the database.

By the way, back when I was living in that city, this is how my backyard looked. It's kind of hard to see, but this is my window, and this is what I had in the backyard. It's called a technological model: essentially a full-blown rocket, a rocket in every respect except one, it cannot fly. Those models are used to test the ground systems, whether you can drive it, whether you can put it into the launch position, and so on. That was the view from my window.

Distributed caches, compared to simple local JVM caches, have distinct requirements. One is data consistency: when someone puts something into the cache, all servers in the cluster should observe it. There is this notion of eventual consistency, which is another word for inconsistent; if you put something into the cache and you don't see it for two hours, that's not consistency, that's inconsistency. Another is load balancing: once new servers join the cluster, the data should automatically move to them so that each server carries a fair share of the load. And high availability: networks break, servers die, and your system has to continue to operate.
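Just to make the application-side change concrete before we go deeper into those requirements: the read path turns from a straight database call into a cache-aside lookup. Here is a minimal sketch; the Product, Database, and ProductDao names are made up for illustration, and the cache could be a plain local map or a distributed map with the same interface:

```java
import java.util.concurrent.ConcurrentMap;

// Placeholder types, for illustration only.
record Product(long id, String name) {}

interface Database {
    Product selectProduct(long id); // stands in for the expensive, sequentially accessed resource
}

class ProductDao {

    private final ConcurrentMap<Long, Product> cache; // local or distributed map with the same interface
    private final Database database;

    ProductDao(ConcurrentMap<Long, Product> cache, Database database) {
        this.cache = cache;
        this.database = database;
    }

    Product findProduct(long id) {
        Product product = cache.get(id);          // a memory read instead of a database round trip
        if (product == null) {
            product = database.selectProduct(id); // cache miss: pay the expensive call once
            cache.put(id, product);               // publish the result so later reads hit memory
        }
        return product;
    }
}
```

With a local ConcurrentHashMap this only helps one JVM; plugging a distributed map in behind the same interface is what lets every application server in the cluster reuse the result.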
And you don't want situations where a server has joined the cluster and, for any period of time, the application sitting on that server doesn't see the cached data from the other servers. That's important. The capabilities a distributed caching system has to have for this to work properly are cache coherency, partitioning, and replication.

The next problem distributed applications have to deal with is reliability. Reliability is the ability of the system to continue to operate in the presence of failures. Even as servers fail or join the cluster, your system has to keep operating without hiccups, delays, or any other problems. And that's really hard, because of cluster reconfiguration. For web applications, the solution is replicated web sessions. It works this way: the web session is automatically replicated across the cluster, so it's available to any server that wants to access it. Even if a cluster node dies, the load balancer will automatically start routing requests through the working nodes, and those working nodes will have access to the user's data. So even though a server is dead, the user won't notice, because the session data is automatically, synchronously, and reliably stored in the cluster.

Another problem that is really hard to deal with is distributed concurrency. In a local environment, all you have to do to synchronize on shared resources is use either the synchronization built into the language, the synchronized keyword, or the concurrent package, which gives you read/write locks. This is as easy as it gets: you take a ReentrantReadWriteLock from the concurrent package and use it to guard access to a shared map, the write lock for write operations and the read lock for read operations.

But in a distributed environment that becomes a problem, because you cannot use shared memory to establish mutual exclusion; there is no shared memory. Your servers are separated by an unreliable network. And there is another problem. Imagine one server, one JVM, holds a lock and then dies instantaneously, and you can kill something instantaneously just by pulling the network plug or pressing the reset button. What happens to the other members of the cluster? Are they ever going to get that lock?

The solution is distributed read/write locks, and for them to work they have to provide several important capabilities: they have to be fault tolerant, reliable, and strictly consistent. Fault tolerance in our case means that if a server dies while holding a lock, the system should automatically release that lock so that other servers in the cluster can acquire it and continue doing their business. Without that, the system blocks and you have to restart the whole cluster. And even as the configuration changes, to any local JVM your distributed locks should look just like local locks. The application should not have to know that it's dealing with a distributed system, because that is pretty much the goal of the distributed architecture: to be able to write your software without caring that there is a network.
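For reference, the local pattern I just described, a ReentrantReadWriteLock from the concurrent package guarding a shared map, looks roughly like this:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Local shared state guarded by a read/write lock from java.util.concurrent.
class SharedRegistry {

    private final Map<String, String> data = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Writers take the exclusive write lock.
    void put(String key, String value) {
        lock.writeLock().lock();
        try {
            data.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers share the read lock, so concurrent reads do not block each other.
    String get(String key) {
        lock.readLock().lock();
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

A distributed read/write lock should let code shaped exactly like this keep working, with the lock instance obtained from the cluster instead of created locally. But the network is still underneath it.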
The network is unreliable and slow, it can break, servers can die, and so on. There is basically a price for all of this, and we'll talk about it later. Speaking of consistency: consistency means slightly different things at different layers. In the case of distributed concurrency, consistency means that even while the cluster is reconfiguring, for example while a new node joins, that node should be aware that locks are already held, and when it tries to acquire those locks locally, it should wait until they are released in the cluster.

Another problem is distributed shared state. For threads to do useful work, they have to access shared resources at some point: shared data, queues, anything. In a local JVM this is really trivial. We saw the example before: if you want a map that contains, say, user names or passwords, stored and accessible to all threads in the system, all you have to do is put your keys and values into the map and you're done. With proper synchronization or locking, of course, but it takes just a few lines of code. That's all. It's not that easy inside a cluster, because your map is local to one JVM, not on the network. How do you make that map accessible over the network? The solution is a distributed hash map, and it must be reliable and strictly consistent. Reliability means that after you put data into the map, even if other nodes die or new nodes join, the data stays available to all members of the cluster, and all members see it consistently and coherently. The same is true for updates.

By the way, when I lived in that city we had our fair share of night launches, and this is how a typical night launch looked: basically a dark sky, because the launch pad was about 40 miles away from the city and you can't really see anything at night. But once in a while you would get something like this. That's a shot from a balcony. It's hard to see, but this is a five-story building, this is the horizon, and what you see here is maybe 30 seconds into the launch: it started beyond the horizon and then flew up and away. This structure is the trail of the first-stage engines, lit up by the sun from below the horizon, and this is where the rocket is flying.

Failure management. It's impossible to develop a distributed application that is not exposed to failures, because, as we mentioned, distributed applications deal with concerns that simply aren't present in local applications. Networks fail, servers fail, latency changes, topology changes. Sooner or later that leads to situations where the guarantees that the distributed-programming APIs make to your software are going to be violated, which means strictly consistent operations are going to fail. There are a few cases when this happens. One of them: imagine you have a cluster of 50 nodes, a switch fails, and the cluster breaks into one cluster of 40 nodes and another of 10 nodes.
Now you have what's called a minority/majority situation: one of the clusters is bigger than the other, but both are still accessible to users. If they continue to operate like nothing happened, they will end up with inconsistent views of the data. For example, one user updates a cache entry inside the small cluster and another user updates it inside the majority cluster. What happens? They will be observing different data. That is unacceptable for mission-critical applications, and it does happen. The only real solution that doesn't violate the consistency requirements is to block the minority cluster until it becomes the majority, or until it rejoins the majority cluster because the network infrastructure has been repaired. But if you are a user, how much do you appreciate clicking a button on, say, Amazon and waiting two minutes before getting a response? It's not good.

Another case: a server was operating inside the minority cluster, then the infrastructure is repaired and that server has to leave the minority cluster and join the majority cluster in order to provide consistent results. What should happen to the strictly consistent operations that were in progress: distributed locks that were held, threads waiting for responses from a put or get on a distributed hash map, reads or writes against the replicated session storage? Those operations must be canceled. In Java, if you use the standard concurrent package API, when you acquire a lock and then call unlock, nothing exceptional ever happens: you own the lock, you unlock it, and it becomes available to everyone else. But in this situation those operations may, and have to, throw exceptions, because whatever you were doing before is no longer valid. Your assumption that you acquired the lock, or that you released it, no longer holds, because the node no longer belongs to the cluster.

To manage those failures, the application must be able to receive reports about cluster state. I think of it as a sort of pattern: if the cluster blocks because it has become a minority cluster, your application must receive a callback saying, hey, we are no longer operational. That gives you a chance to show the user some sort of graceful notice: our system is undergoing reconfiguration, we appreciate your patience, please come back in 30 seconds or two minutes. Because these days, if you click a link and it doesn't respond within 10 seconds, you assume the system is broken. It used to be 30 seconds; now maybe it's even five, especially for the big sites. The same is true for consistent operations: if reconfiguration does happen, the system must break and cancel the operations that were in progress or that it was blocked on.

So when you write a distributed application that uses transparent distributed concurrency primitives or transparent distributed hash maps, you can no longer assume that a put or a get on a map will always come back with a response; it might break. In a normal situation it won't, but it will happen eventually, and the system must be prepared. In a well-designed system this is basically the only price you pay: you don't have to code complex queuing mechanisms, cluster management, or anything like that.
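Here is a minimal sketch of what that looks like in application code. The DataGrid, getMap, getLock, and ClusterReconfigurationException names are hypothetical, made up for illustration rather than taken from any particular product:

```java
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.Lock;

// Hypothetical grid-facing API, for illustration only.
interface DataGrid {
    <K, V> ConcurrentMap<K, V> getMap(String name);
    Lock getLock(String name);
}

// Hypothetical unchecked exception a grid might throw when the cluster reconfigures mid-operation.
class ClusterReconfigurationException extends RuntimeException {}

class SessionReader {

    private final DataGrid grid;

    SessionReader(DataGrid grid) {
        this.grid = grid;
    }

    String readUser(String sessionId) {
        ConcurrentMap<String, String> sessions = grid.getMap("sessions");
        try {
            // The happy path looks exactly like a local map call...
            return sessions.get(sessionId);
        } catch (ClusterReconfigurationException e) {
            // ...but it can fail if this node left the cluster or the cluster blocked.
            // Degrade gracefully: show a "we are reconfiguring" notice instead of a stack trace.
            return null;
        }
    }
}
```

The happy path is just a map call; the only distributed-specific code is the catch block and whatever graceful degradation you decide to show the user.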
You just make a call, obtain a reference to a hash map and use it, or get an instance of a lock and use it. But it might break. For all of this to work, the system must have a cluster management and data distribution protocol, usually a wire-level protocol that enables all the nice things we've talked about: session replication, distributed caching, reliable distributed locks, state sharing, and cluster management.

So we started with a small picture of a typical local JVM architecture, and this is how the distributed architecture looks. We have a web application. A load balancer must be present; how you do it depends. It can be a hardware load balancer such as Cisco or F5 BIG-IP, or you can use Apache, which has built-in modules for load balancing. We have our application, our replicated sessions, our distributed front cache. We have the business logic talking to a distributed hash map and using distributed locks for state sharing and distributed concurrency. We have a distributed level-two cache, and underneath it all the cluster management and data distribution protocol, which, as we mentioned, is a wire-level network protocol. I showed an example of two servers working together, but it can be expanded to any number of servers.

Now, to develop such an architecture, to implement it, would take anyone, even the smart guys, a noticeable amount of time; it takes something like three years just to work out the bugs. The good news is that this effort has already been made. These tools usually go under a common name: the enterprise data grid. An enterprise data grid provides a cluster management and data distribution protocol that works reliably over the network, plus a set of transparent APIs that support reliable distributed caching, session replication, strictly consistent data access, distributed locks, and cluster management. What you should expect from such an API is that it's easy to use: it should not require you to learn network programming or anything else. You would expect transparent implementations of the primitives found in Java, such as locks and maps, and session replication should be invisible. There are several implementations of such enterprise data grids. There are commercial ones, which are of pretty high quality. There are also a couple of open-source ones, and those often don't provide all the capabilities you need and sacrifice consistency; they offer you so-called eventual consistency instead.

Back in those days, when I was in 10th grade, we were taken on a field trip to one of the launch pads; there are several of them, and we were taken to one that launches satellites. Early in the morning we were woken up by the sound of sirens, and we walked out into the desert, about 700 yards from the pad, and sat on a small hill. That's how the launch looked from that distance. It was pretty cool, actually, and I would say it's one of the most profound experiences I've had; I think the only more profound experience was a Pink Floyd concert. Even from that distance, standing on that hill: when you see launches on TV, they give you maybe 5% of what's actually going on. At that distance your whole body vibrates at very low frequencies, and it's basically the full spectrum, from frequencies so low you can't hear them but feel them as vibration, up to really high-pitched noises.
It is just incredibly loud and very intense. And in about 30 seconds, the rocket looks like this. Well, that's it about the architecture. The good news is that it's possible, and it's really easy to do if you choose the right tools. So next time you think about it, go for it. Now we're going to talk about best practices, for about another five minutes I hope, and then I'll be here to answer questions.

Our first best practice: don't do it. Don't go distributed. Most of the time, when people decide to go distributed, they haven't used up the existing capacity of their single-server application. If your server has been sitting in a colocation facility for two years, you can double or triple its performance just by replacing the server: faster CPUs, faster memory, more memory, faster hard drives, faster networking. And it costs almost nothing; you can get a really good server for $3,000 to $5,000, or if you're on a budget, a pretty good server for $1,000 off eBay. It's really easy to scale vertically.

Distributed development is a different story. First of all, distributed applications are slower, because they have to use network I/O and spend CPU cycles on maintaining the cluster and maintaining data consistency. The good news is that as you keep adding servers, your capacity increases close to linearly if you use the proper tools. Distributed systems also require configuration, and that takes your time, and there's no such thing as free: time is essentially the only thing you cannot buy. If you spend a month doing something you could have done in two days by paying 500 bucks, that is the price of your time. And then there's testing: a unit test for a local application is trivial, but a unit test for a clustered application has to start cluster nodes. If you're really serious about it, if you want functional testing, you need a lab in your QA environment, or maybe even on the Amazon cloud, running your cluster with the same versions of the application deployed. It becomes pretty cumbersome. So if you can, stay local, stay on a single server.

Another thing to look at while you're trying to stay local is optimization. If your application has never been optimized, and this is a blanket statement from my experience as a software engineer, a five-times improvement in capacity, performance, or response time is easy if it has never been run under a profiler. Invest in a good profiler; we use JProfiler and it works really well. You also have to have load tests, even if you are doing distributed development. Without load tests, if you just click around a couple of times and see what happens, you won't get a real picture. You can develop synthetic load tests that hit a single point, or there are tools that let you write life-like tests that hit the application the same way a normal user would and exercise the critical path.

There's really only one anti-pattern, but it's a big one, and we call it "cache them all." It starts like this: hmm, I looked in my profiler and it looks like most of the time is spent allocating memory in string buffers; why don't I cache string buffers? Well, here's the thing: caching can be expensive. It makes sense only when the cost of caching is less than the cost of acquiring the object.
Memory allocation in Java is almost free. Compare that with the cost of maintaining caches, especially in a distributed environment, where the cache can, and will, go to the network to maintain coherency, to maintain load balancing, to partition the data, and so on. It can be expensive. So cache only objects that are hard to get: things coming from databases, from external data sources such as mainframes or external web services, or from a Hadoop cluster. Hadoop jobs can take from tens of seconds to minutes and even hours. Imagine you issue the same request to Hadoop twice: you crunch through five terabytes of data, spend 30 minutes in the cluster, and get the result back. Do you want to do the same thing again next time? The results are usually much smaller than the dataset you processed, so it makes sense to cache the data that is hard to get.

One thing I didn't mention, especially for front caching as opposed to the level-two cache maintained by the framework: when you begin to cache, you have to keep in mind that when the data is updated, you have to invalidate those caches appropriately, so the design becomes more complex. So if you cache, cache the right things, and possibly develop a thin layer that is responsible for it. Also, some products, not all, provide the ability to have the cache read the data for you: instead of you reading the data and then putting it into the cache, you ask the cache for the data, and if it's not present, the cache gets it from the data source itself. You know that if you want the list of your employees, you just go to the cache and you'll get it.

Objects that are write-mostly don't make sense to cache: you spend time caching them, but before anything reads the entry it gets updated a thousand times, so you pay the cost of caching without getting any benefit. A classical example these days, with big data being big: you get lots of data coming from instruments, for example energy readings. Do you want to cache them? No, because you receive them once a second, they get stored, and they are retrieved maybe once a year, and that's it.

And as I mentioned, never cache memory allocations. In a profiler, especially when you are running in instrumented mode rather than sampling mode, the instrumentation overhead can make memory allocation look like the most expensive part of the system. It's not true; creating objects is almost free in Java. That doesn't mean you should create and drop them carelessly, but pooling them doesn't make sense. My experience with this goes back to around 2003, when we were using an XSLT processor from IBM, Xalan I think, and it had a string buffer pool where you check out a buffer, clear it, use it, and check it back in. But the thing is, to check a buffer out you have to clear it, which essentially means that every time you check one out of the pool you allocate new memory anyway. So you do the same thing you would have done by just creating a string buffer, except you also spend time on synchronization in the pool. Never cache memory allocations.
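Going back to read-through caching for a second, here is a tiny sketch of the difference: instead of the application doing get-then-put, the cache is handed a loader and fetches from the data source itself on a miss. The ReadThroughCache class and the names in the usage comment are placeholders for illustration, not any specific product's API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Placeholder read-through cache: the loader belongs to the cache, not to the caller.
class ReadThroughCache<K, V> {

    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // e.g. a database query or an external web service call

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // On a miss, the cache calls the loader itself and remembers the result.
        return store.computeIfAbsent(key, loader);
    }
}

// Usage (hypothetical names): the caller only ever asks the cache.
//   ReadThroughCache<Long, Employee> employees = new ReadThroughCache<>(db::findEmployee);
//   Employee e = employees.get(42L);
```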
So, as I mentioned, the best candidates for caching are results of database queries, results of heavy I/O, and results of XML parsing and XSL transformations. Cache objects that are read-mostly.

Another best practice is an infrastructure one. If you do use data grids, if you do go distributed, you will have application traffic that comes from the users through the firewall and the load balancer, but you will also have traffic that serves the data grid: cache coherency traffic, partitioning, lock management, cluster management. That traffic has nothing to do with what your users and your application want, and it doesn't have to share the same network. So the best practice is to move that traffic to the back end. Most of the servers you are running these days come with two network interfaces, so you can use one to serve the application traffic and plug the back-end traffic into a separate switch, keeping them apart. Both sides benefit, because they no longer compete for the limited network bandwidth.

The last best practice we wanted to share: use existing solutions. I must say that as we started developing Cacheonix and as we continue, it's really fun, because distributed computing, and writing this stuff so that it works reliably over an unreliable network, is really fun for engineers. I'm still enjoying it. But if you want to do it right, it takes two or three years to get it right. So ask yourself what your plan is for the next three years: serving your users, or building something fun that is going to take quite some time?

Well, I think we have arrived at the end of our session. If you have any questions, I'll be happy to answer them. And guys, if you'd like the slides from this presentation, shoot me an email, or if you can't read the address from the slide, stop by and grab my business card. I'll be happy to share the slides with you. Thank you.