Okay, so I hope you all had a great lunch break, and welcome to my presentation on silo-based architectures for high availability applications. This is more of an architectural presentation than a real code samples one, so you might be disappointed not to see any PHP code. But I can tell you that we devised this system in a real PHP application; that is how we discovered it. My name is Georgiana Gligor. I come from Romania. I have been developing with PHP for quite a long time, more than 13 years now, and I have a wonderful family waiting for me at home. My specialty is crafting enterprise applications, so I tend to go the extra mile and work on the large portions of the apps. Over time, my role has shifted from junior and mid-level developer to more of an architect, and that is something I really, really enjoy doing, which is why my presentation is also an architectural one. I hope you will get the most out of it. PHP is a community, and in my hometown in Romania we didn't have a PHP user group, so I started one, and it's been the most rewarding thing I've done over the past year. I have learned so many things from the PHP user group, so if you are still not attending one, I strongly encourage you to do so. Eli's keynote from yesterday also talked about community and how information is shared; this is one of the most efficient ways of grasping new knowledge. Because my agenda wasn't full enough, last autumn I also decided to start a PhD in systems engineering, so it's going to be a pretty tough period for me over the next couple of years. That's already too much about myself, so let's move on to the actual app. How many of you are in such a privileged position as to be able to take your website down in the middle of the day, without being able to sell anything, and still not lose customers? Not that many. That's also the situation I often find myself in. 
So I'm mostly working on projects where I have to do something like this, so it might look like a joke, but for us it's normal day-to-day work. And these guys are really awesome; if you go online and search for the videos of guys changing a wheel on a moving car, you will find many more on the same topic. So let's see what we're going to talk about today. First I want to introduce the need for high availability a little, because maybe you don't yet know that you need to address these topics. So we'll step back a little and look at the opportunity for this. Then we'll move on and define high availability, and have a look at what it really is and what it's not. Then we'll discuss the approach that we've been using in a very large and complex system for a US customer. That was the first project we used it on, but since then we've used it on other projects as well. And then, of course, we'll list the advantages, and there are quite a few, but we'll also look at the disadvantages that come with this solution. For learning, I myself go to user groups, but I also read a lot of books. Last year, in an airport, I discovered this one. It's an excellent book that talks about legacy code, and I strongly recommend you read it; David has a very nice way of explaining things. Out of this book I picked up a very interesting idea, not only about legacy, but about how we think about our applications. David says that the software industry is built around anticipating change, and that is something that happens on current projects. But it's not necessarily the ideal thing to do, because we tend to over-engineer our solutions, and maybe we are thinking of problems that don't exist and that our application will never need to handle. Instead of anticipating change at the very beginning, he suggests that we can reduce cost by looking at our application from an accommodation perspective. So how do we accommodate future needs? 
How do we build our application in such a way that it will be easy to accommodate those needs when they arise? So have a little think about this. For me it was kind of mind-changing, because I didn't realize how much over-engineering we were doing in the first place, and it became much easier for us to move away from that. Now, in order to talk about high availability, maybe it's a better idea to start from what a typical application looks like. Over the break I actually talked to a couple of guys about refactoring and re-engineering an existing application while still keeping it really small, because that was the purpose of it. So this is what a simple architecture might look like, for an MVP or for an existing project, it doesn't really matter: a clean separation of layers. You get the front-end layer, you get the business logic separated away from the front end, and then you get the data layer. This approach works very well for MVPs, minimum viable products. It also works when you have to deliver something really fast: you make it work first, then you make it fast. But this is the cheap approach. It gets the job done, it gets your paycheck in, but it's not really enough from the moment you actually start thinking about scaling. If all of a sudden your application becomes popular, which in my case happened on a couple of occasions, this approach is really not enough, and we get back to the wheel-changing attitude that we saw at the beginning of the presentation. So we get into the adjusting mode that we discussed a bit earlier. The first thing we do to adjust is to work at the data layer and spread the load from one single data point to maybe a master-slave approach. Maybe we are lucky enough to get very competent developers on our team, and we do a CQRS approach, and we do fancy stuff. But we work first at the data layer to mitigate that problem. Separating the reads from the writes has certain advantages. 
And it's quite an easy thing to do, but doing it while everything is up and running is already risky. The second approach would be to go and build redundancy at various layers of the application. Maybe in the business logic you discover that you do a lot of processing, so you need to horizontally scale that part and add redundancy there. And of course, building a resilient application is the actual goal. Moving towards an application that doesn't fail is quite a tricky thing to do, but adding a load balancer would be the first thing that comes to mind, right? Now, if you turn the previous diagram upside down, the next thing that you really want to do is add caching and take the stress off the data layer. So instead of going to the database and grabbing all sorts of information, maybe you just grab it from a cache. This is why I depicted the cache as being quite wide in the diagram: you want to grab as much information from the cache as possible. But if that doesn't happen, we fall back to the normal request and response in the application server and then in the data layer. Now, this is only a simple architecture, but over time the architecture of an application might evolve into something more complex, and this is a typical setup that I've encountered in a lot of projects. Having a front-end load balancer allows you to fine-tune your front end, so if you need a couple of machines or five machines on the front end, you can do that. And for the middleware, which is performing the really hard part, you would also have a dedicated load balancer. These are steps we take to move our application to a better state and get it in better shape. And the caching, as mentioned before, must be as close to the user as possible. If the caching is buried down in the data layer, you will have too many layers to traverse to reach it, so it won't be as efficient as you need. 
So it won't really serve its purpose. How many of you are working on an application with a setup about this complex? That's about half of the room. Do you think it's getting you closer to being highly available, or not really? Thank you. So why have all this complexity and still not be able to move towards our highly available goal? That's kind of an interesting question, I'd say. So let's have a look at what high availability is. In order to get there, we need to understand what it is. Well, let's take the "high" word out and just define availability first. An available system is one that gives you the ability to retrieve information; that's the first thing. But you also need to be able to change information in the system and add new data. For example, you're on Facebook, right? And you look at your feed: that's the information retrieval part. But if you're unable to fix a misspelling in your last post because something goes wrong, then all of a sudden it's not fully functional. You can't rely on the entire application; it's not working with all its features. And if you're not able to send new data to it, that's even worse. So even if it's up and running and you can get information out, you may be unable to perform other tasks. So it's not available. I want to give you a very interesting example. It happened about a year ago in a surgery room, where some doctors were performing an operation on a patient's heart. So they had the patient opened up, and they were actually working on his heart, when one of the screens where they could see all the patient information, like pulse and everything, suddenly went completely black. This is one of those systems that you really need to be highly available in any situation, and you can imagine how bad it was for those doctors. Nobody from IT was in the room; they couldn't fix it. So somebody had the brilliant idea to just press the reboot button. 
And it fixed the problem. They could go on; they didn't need anybody from IT to come in or change the device. But it's interesting what really happened, because it's something that can happen to any of us. The trick is that this computer is quite a simple thing: a unit that has only two functions. The first one gets the information from the patient into the computer, and the second displays that information on the screen. Something quite trivial for us. But the problem with that particular computer was that the antivirus started running automatically, and it was using all of the system's resources. This is something we cannot predict, but we can control. So think a little about whether you have something that might be running in your system that you don't really want affecting it in this way. And this brings us to the classic nines example. High availability is defined by numbers, and those numbers are in the nines range. Imagine you're working on an application that has a promised uptime of 90% a year. That allows you 36 days of downtime. Pretty nice. But if you want to move closer to high availability, you get into the area of three nines, which means eight and a half hours of downtime per year. Quite a difference, right? But are any of you working on, let's say, a shopping system where people want to spend their money on your product? Thank you. What happens if those eight hours are continuous, and they happen on Black Friday, when you make half of your income? All of a sudden it becomes a problem, right? So even if you are doing 99.9% uptime, it's still not good if those hours are continuous; you're losing a lot of money. And the holy grail of five nines means just five minutes of downtime a year. That's very complex to achieve. 
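The talk itself has no code, but the nines arithmetic above is easy to check; here is a small Python sketch, using nothing beyond the figures just quoted, that converts an uptime percentage into allowed downtime per year:

```python
# The "nines" from the talk turned into allowed downtime per year.
# 90% allows roughly 36 days, three nines roughly 8.8 hours, and
# five nines only about 5 minutes.

MINUTES_PER_YEAR = 365 * 24 * 60  # ignoring leap years

def allowed_downtime_minutes(availability_percent):
    """Minutes of downtime per year permitted at a given uptime percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for level in (90.0, 99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(level)
    print(f"{level}% uptime -> {minutes / 60:8.2f} hours/year")
```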
And if your application has some external dependencies, maybe you're on Amazon entirely or partly, just read the fine print: the SLA says they only promise 99.95%, and that means around four hours a year. In a shopping scenario, on a very crowded day, that's half a day. It's a lot. So we have to think of all the dependencies that we have in our application, all the systems that we depend on. If you want to move below that line into 99.99% territory, it's going to be quite tricky, and some of you might think quite expensive. I measured the impact for our US customer, which was an airline, an ultra-low-cost airline. I took the entire amount of money coming into the system in one year and discovered that they make, on average, $40 a second. So for our customer, one hour of downtime actually meant a lot of money, about the yearly salary of a developer in the States. One hour of downtime means somebody will not get their paycheck, to translate it like that. Another thing that I always look at when working on an application is what the application is actually doing, the actual user behavior. I picked some of the most well-known websites because I wanted to look at some numbers. In particular on Amazon, the shopping part, a user spends more than 12 minutes daily. Before they make a purchase, they browse a lot on the website to find the right product, and almost 12 pages per visit is a lot. What does this mean, and how does it translate to us as developers? Well, the problem is if some of those 11 or 12 pages per visit are failing: the customer will lose trust in our system, and they will not come back. So it's not just about having a really good system that serves one page; you know what the normal measurements are, how many views you can serve, how many requests, and so on. Yes, but those requests are continuous. 
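The $40-a-second airline figure above translates into a back-of-the-envelope downtime cost; the sketch below only uses the averages quoted in the talk, so treat the numbers as illustrative:

```python
# Rough downtime cost using the airline figure quoted above:
# about $40 of revenue per second, averaged over the year.

REVENUE_PER_SECOND = 40  # USD

def downtime_cost(seconds_down):
    """Revenue lost (USD) during an outage of the given length."""
    return seconds_down * REVENUE_PER_SECOND

print(f"one hour down  = ${downtime_cost(3600):,}")
# A 99.95% SLA allows about 0.05% of the year as downtime:
print(f"99.95% SLA gap = ${downtime_cost(int(365 * 24 * 3600 * 0.0005)):,}")
```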
They have to work for the entire user visit. You cannot just drop them before the user actually presses the pay button in a shopping environment. And Facebook and YouTube are real time drainers; users spend even more time on those websites. So depending on the industry you're working in, the Amazon numbers might look like a joke, and you might see even more time spent on the website and more pages being viewed. Therefore, the user experience needs to be continuous for a longer period of time. Is anybody familiar with the CAP theorem, where your distributed system can only have two of partition tolerance, availability, and consistency? Pick two; you can't have all three of them. For high availability, our triangle relates to cost, complexity, and risk. The more money we throw at it, the more complex it might become, and it will mitigate some of the risks, but it's a very complex game that we have to play, with a lot of variables. And because we talked about availability, if you look at it the other way around, it's all about downtime: how much downtime can I afford? Downtime can be categorized in two ways. The first is scheduled, something that we do ourselves, like maintenance; the second is unscheduled, and maybe it's my fault. Maybe my health check will ping a server that then dies, causing a chain reaction, and some other system will be taken down. That's something we can control. But what if somebody else does something that affects our product? One example: I was trying to search for some travel dates on Expedia, and they were down, but it was quite interesting, because you could see the architecture. They have an API that the website consumes. So while the website was down, they suggested: OK, maybe you want to use the mobile application, because it will work. So at least my experience as a customer wasn't degraded, because I was able to go to the mobile app, and it did work. 
So that's an example of a scheduled downtime for them. But I will give an example of unscheduled downtime that you cannot control. When Michael Jackson died, which happened a few years ago, the news spread very fast. First the TMZ site broke the news, and then it was immediately down, because everybody was rushing to see if it was true or not. Then Perez Hilton's site, which was also heavily used by interested people, went down, because people wanted to confirm the TMZ information while TMZ was down. Then the LA Times wrote an article saying, no, he's not dead, he's in a coma, and everybody went to the LA Times website to see if it was true. And it was completely white, nothing, not even a maintenance page or anything. The problem was so bad that it affected Twitter in such a way that they just couldn't handle things anymore. What they could do to survive was to disable the search results on the home page: we don't let you do searches, but we let you use Twitter in any other way you might want to imagine. So this is something outside of your control, outside of your boundaries, and it's quite difficult to predict these things. Any of them can apply to your situation. Now, if you want to build a highly available system, we need to keep in mind three characteristics that it must have in order to be labeled highly available. The first one is to have no single points of failure. And as you know, by Murphy's Law, if you have a single point of failure, it will eventually fail. So this is the first thing that you need to look at holistically and try to address. The second one is quite interesting, and it relates to microservices in a way: reliable crossover. Just imagine cashiers in a supermarket. It doesn't matter which cashier serves you; it's important that you get your service done right. 
This is the actual goal of microservices, because you can scale each individual piece of functionality up and down, so you might want to look at those as well. The third is detecting failures as they occur. A couple of years back, about four I think, AWS had a very big problem in the East Coast data center, and a lot of websites went down. One of them was Netflix. But Netflix was one of the fortunate customers, because they were able to survive thanks to a system they have called Chaos Monkey. What does Chaos Monkey do? Have you ever heard of it? It randomly goes and takes down your production machines. It forces your system engineers to work on real production problems that are random, just like the Michael Jackson news was: completely random. At the very end of the presentation you will find the link to GitHub, because Chaos Monkey is now open source, so you can actually go and use it today in your application. So how do you get to those things? We have a list of best practices. No single points of failure. Stateless application design, which is the reliable crossover part: if you don't carry the state with you, then any of the endpoints will be able to serve you. Infrastructure automation, which is a must-have in today's applications: if you're still doing things manually on the servers, then you will end up with unique snowflakes in your data center, and those servers are quite a problem because you can't replicate them. Monitoring, which is how you know what's inside your system and get meaningful alerts before somebody else tells you from, I don't know, a Twitter feed; you need to know first and address it. And a very interesting one: geographically distribute your machines. For US customers this is a well-known issue; the East Coast and the West Coast are quite far apart. 
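As an aside, the Chaos Monkey principle mentioned above can be illustrated with a toy sketch. This is not Netflix's actual tool (that one is linked at the end of the talk); the fleet list and the `terminate()` stub here are entirely hypothetical:

```python
# Toy illustration of the Chaos Monkey principle: randomly pick a
# production instance and take it down, so engineers must build systems
# that survive unplanned failure. Everything here is a stand-in.
import random

def pick_victim(instances, rng):
    """Choose one running instance at random to terminate."""
    return rng.choice(instances)

def terminate(instance):
    # A real tool would call the cloud provider's API here.
    print(f"terminating {instance}")

fleet = ["web-01", "web-02", "app-01", "app-02", "cache-01"]
rng = random.Random()  # unseeded on purpose: failures should be unpredictable
terminate(pick_victim(fleet, rng))
```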
So if you only have one data center, it will not be close enough to both of them; it will serve half of your customers very well. In Europe we don't have the same problem, because we are much more crowded, let's say. But this is a lesson that can be learned from websites that need to be globally available. And one more thing, related to the monitoring part: do keep spare capacity, and have your monitoring system tell you that you've reached a certain threshold before that spare capacity is 100% used. These things are very nice, but they look very theoretical; I don't think we can apply all of them in this life. We have to know our limitations, because we work on real-life applications with real-life constraints. And silos might be one way to go through that long list and solve at least a couple of the items. Now, silos in IT usually mean a silo mentality, where all the departments work independently: technical is very separated from marketing, and very separated from system infrastructure, and so on. This is not our definition of a silo. The silos that we're going to talk about are not about being isolated in terms of a company. Let's go through an example: I want you to think about how your system might be upgraded to PHP 7. How difficult will it be? How much time will it take to plan? And how would you effectively do it: one machine at a time, or all of them in a burst? Do you take the system down, do the upgrade, and hope for the best? What if it doesn't work? What if you encounter problems? How do you go back to your previous state? This is the kind of problem that we have. PHP 7 is twice as fast as its previous incarnation, right? So we do want it; it's a scenario that can be applied today in our systems. So what exactly would a silo be, and how will it help us? When we say silo, we mean the entire stack of servers that serves the end-to-end functionality for the user. 
So it's not the authentication service, or the grabbing of offers, or the selling part; it's the entire functionality that is available to the user: front end, back end, and of course the caching part from the previous diagram. If you look at that particular diagram, you will notice that everything on top of the data layer can be considered a silo. And if you think about it, there is exactly one change that you need to make, and that's putting a traffic controller on top of everything. Initially I want 100% of my traffic to go to my current stack, but I want more silos to be possible. So it's quite simple; it will look like this: you take your perfect stack and horizontally scale it. In the upgrade-to-PHP 7 example, I would just raise a new silo with PHP 7 installed on all machines and start directing traffic to it, like 1%, 2%. And if everything goes OK, I just kill the first silo and have the second one serve the purpose. It's kind of risk-free, with a fallback procedure already in place. And it's not very complicated to do: all you need to add is a traffic controller. Hopefully you have automation in your DevOps area; that part you can't escape. But this might look like it's too simple, right? Let's see how we can actually put it to good use. One thing that is very interesting, and we actually used it on two occasions, is the fact that we have an independent static assets cache in each silo. Now imagine our caching system was memcache, and it was almost up to its limits. When we wanted to replace memcache with Redis, what we did was raise another silo that used the new caching system, and it was a very painless experience. Actually, when you look at the application as a whole, the cache may hold not only static stuff but also the user cache. 
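The gradual traffic shift just described, 1% or 2% to the new silo, is essentially weighted random routing at the traffic controller. A minimal sketch, with made-up silo names and weights:

```python
# Weighted routing sketch for the traffic controller: most requests go to
# the current silo, a small canary percentage to the new one. Silo names
# and percentages are illustrative.
import random

def choose_silo(weights, rng):
    """Pick a silo with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

weights = {"silo-php5": 0.98, "silo-php7": 0.02}  # 2% canary traffic
rng = random.Random(7)
sample = [choose_silo(weights, rng) for _ in range(10_000)]
print("requests routed to the new silo:", sample.count("silo-php7"))
```

Killing the old silo is then just a matter of setting its weight to zero and, once it has drained, tearing it down.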
So once a user is performing searches and giving you information, you will want to stick that user to a specific silo and deliver the end-to-end functionality for that user from that silo. Then, when their session is recreated, you might move them to a different version of your caching system. That's one example. Another one is A/B testing. But by A/B testing, don't think of the JavaScript library that you just inject into your website to turn buttons from green to pink, because that is only one part of A/B testing. The really good part is that you can A/B test your infrastructure. If you want to make changes inside your silo, to see how it performs with a different middleware, you just write your middleware in Node.js or whatever, put it in a second silo, and check that it works with production data. That is a very different approach from testing it on a QA environment, where things might look OK but will fail for some reason under a lot of production data. This is where you get to check and validate your architectural decisions in production, which I find quite interesting. From the traffic controller, you just send a certain percentage of your user requests to the new silo. Yeah, exactly. And for the marketing team: when you're back at the office on Monday morning and you go and talk to your marketing department, one thing you can tell them is: I learned something really cool, a new trick that will allow you to segment the users in the emails that you send, or however you might want to reach those users. We can segment them, and we can show them completely different things with the technology we already have today. So if you want the users of your website to get different promotional items, and you put some information in the link that the traffic controller can recognize, it can direct those people to the new silo. That will give you a lot of flexibility. 
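The marketing trick above, routing by a token embedded in the campaign link, could look something like this at the traffic controller. The parameter name `campaign` and the silo names are assumptions for illustration, not the setup from the talk:

```python
# Route users to a silo based on a query parameter placed in the links a
# marketing campaign sends out. Parameter and silo names are made up.
from urllib.parse import urlparse, parse_qs

SILO_BY_CAMPAIGN = {"spring-promo": "silo-b"}  # promo users see silo B
DEFAULT_SILO = "silo-a"

def silo_for_url(url):
    """Return the silo that should serve this request URL."""
    params = parse_qs(urlparse(url).query)
    campaign = params.get("campaign", [None])[0]
    return SILO_BY_CAMPAIGN.get(campaign, DEFAULT_SILO)

print(silo_for_url("https://example.com/?campaign=spring-promo"))  # silo-b
print(silo_for_url("https://example.com/offers"))                  # silo-a
```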
You can actually A/B test a lot of things, not just individual pieces of your page; you can A/B test your system as a whole. And of course there's the geographical distribution that we talked about a bit earlier: the closer the machines serving a request are to the user, the faster the user experience is. So if you set up a dedicated Europe silo, it will be really fast. And please remember that you have the caching of static information inside your silo. So it's not like I'm going to grab all the hotels from the database, which sits who knows where, because that's static information. The dynamic information is only the availability and pricing, and for that I can make my trip to the data layer. But all the static information, the whole page, the HTML itself if you want, can sit in the cache and can be very close to the user's actual location on the globe. And the best use case, my favorite, I've always loved it: if I want to upgrade my system and release a new version of my code, I can raise a silo with the new version of the code and validate that it works. If I want to upgrade from Symfony 2 to 3, or I just add a new piece of functionality that I don't really know works until it's validated with production data, I can do this with silos. I can completely release my version in a silo that is independent from the others. And if I have a problem, that's why I put version numbers in there: if I put up version 17 and my logs start to fill up, or I have some other issue, I just take that silo down. I have production information that will help me understand what the problem was, and I am able to release a new minor version that patches the original one. Whereas in the classic approach, if I upgrade my code base, it's a very painful procedure to go back, and it will take time. 
That is why we talked about downtime previously: if I can't afford downtime, I can't afford those moments when I go back from version 17 to version 16. You might say the big players manage this, but the difference is that we're not Facebook, and we're not sitting on a big pile of cash. Using silos is the method that allows you to use exactly what you have today and just duplicate it, so the increase in hardware cost is minimal, in my opinion. There is an increase, but it's minimal. I don't get the best engineers that Facebook does, I don't get all their infrastructure, and I'm still able to do what they do. Kind of powerful. So, of course, the advantages and disadvantages. I've left quite a bit of time for questions; I hope you don't mind. The advantages: first of all, you get to use familiar technology. You don't have to learn anything new; just put a traffic controller on top of everything, right? It's quite simple. Then there's the real-life testing that we've been discussing; it's very powerful. Do go and talk to your marketing department, because they will love you; I promise, it happened. Another thing is that you don't have big upfront hardware requirements: you don't need to buy anything completely new and expensive on the hardware side. The brand loyalty that we discussed a little earlier comes from the fact that you're not losing customers. Even if variation B of your website is, let's say, unstable, it will be unstable for a minimal number of users, not for all of them. And even if you spend a little more money on hardware because you need to raise another silo, the actual total cost of ownership of your entire product is lower, because you don't have to invest in fancier tools. Also, when you want to scale, it's all of a sudden simpler, because you didn't make changes inside your code base; you stepped one level up and made the changes at the architectural level. So scalability is now very interesting. 
Inside the silo you can fine-tune your configuration and see what different machines you need, and so on, without too much pain. Now, the list of problems is maybe even more important than the list of advantages, because you can't do this without a very good DevOps team. If those guys are not using automation and are not really skilled at this silo game, at raising silos and taking them down quickly, then you will have a problem; you won't be able to apply this approach. Talking about hardware: there is an increase in costs. The increase is minimal, but you still have to budget for it, so we have to mention it. And the real issue, which people only discover after using this for a few weeks, is that the monitoring layer all of a sudden becomes more complex, because I have to be able to see different information for different silos. In my monitoring dashboard I need to see my silos independently, and that requires a bit of extra work on the monitoring side, but it's only done once. Another item you can put on the disadvantages list is traceability: what did the user do, and which silo served that particular user? In order to trace a problem, even if I only divert 2% of the traffic to the new silo and there are issues, I need to be able to trace it individually. So again, DevOps is your best friend. When you work on reproducing a bug and actually hunting down its source, you will probably also want to think a little differently about it, because you have, let's say, doubled your machines. But quickly going through the takeaways: if you leave this room with only one piece of advice, if you only need to remember one thing, it's this one: monitor everything. Even if you don't use the silo approach, do build situational awareness, know where you stand, and add clever monitoring. 
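The per-silo monitoring requirement above boils down to tagging every measurement with the silo that produced it, so the dashboard can break results down per silo. A deliberately tiny in-memory sketch; the metric and silo names are illustrative:

```python
# Tag every measurement with its silo so the dashboard can show each silo
# independently. A real setup would ship these to a metrics backend.
from collections import defaultdict

metrics = defaultdict(list)  # (silo, metric name) -> samples

def record(silo, name, value):
    metrics[(silo, name)].append(value)

record("silo-a", "response_ms", 120)
record("silo-b", "response_ms", 310)  # the canary silo, misbehaving
record("silo-a", "response_ms", 140)

for (silo, name), values in sorted(metrics.items()):
    print(f"{silo} {name}: avg {sum(values) / len(values):.0f} ms")
```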
It may look like a big investment, but it will pay off very, very soon. The other thing that might be of interest is outage detection: all of a sudden you now have two, maybe three silos, so what exactly is an outage, and in which silo did it actually happen? But once you have the monitoring in place and you have the tools, you are able to take one silo down and replace it. The A/B testing part we've already discussed quite a bit. So, I have a list of items that you might want to read. If you're like me, you read a lot of text; I don't like to watch too many videos. The introductory high availability concepts are very well explained on the Wikipedia page. But once you learn the basic vocabulary, you'll want to move on to OpenStack's dedicated high availability section. You will love it; it's very, very good. And I have a link there to the US Presidential Policy Directive 21. Why did I put that link in there? Because it has some definitions. Under the Obama administration, when IT projects were given out to external, third-party companies, they defined a certain vocabulary and certain criteria that they wanted those applications to meet, and resilience and high availability are defined very interestingly by one of the key players in this. It's now archived because, of course, we now have the Trump administration, so it's on archive.org; it's no longer on the official White House website. Also, as promised, the Chaos Monkey source code: just click it, have a read, it's very interesting, and you can just download and install it. And a much better presenter than me is Brian Adler, who has some more insight on high availability with the cloud. But as I said, the cloud part sits somewhere between three and four nines. So if you're using external providers in the cloud, you may be getting a little further away from four nines. 
Thank you. Questions? I was just wondering, what tools have you used to handle actually deploying silos and monitoring them? How have you done that in the real world? You mean the monitoring part? And the deployments. How have you actually managed doing that? Yes. First of all, the deployments: automation, of course. We started with Puppet, but our system engineers ran into Puppet's way of executing things. With Puppet, you give it the rules that you want it to apply, but it decides the order in which to apply them. They therefore decided to move to Ansible, which is much more predictable, because it executes the playbook top-down: the first rule, then the second, so it's much more predictable. I had some problems getting our machines running on Puppet. To give you an order of magnitude, inside one silo we had about 30 machines in total, so without automation you couldn't do it; that's probably why we reached the limits of Puppet. Puppet is an excellent tool, and I strongly recommend it, but in our case Ansible was better. To answer the second part of your question, about monitoring: if you were here last year, or if you search YouTube for my presentation from last year, we wrote a logging system. The problem with looking at the server logs is that they don't tell you what went wrong for the user, and normally the bugs are related to the user experience. So we wrote a logging system that we are able to use in production. One of the cool tricks about it is that you have an endpoint that receives logs: you just give it the piece of information that you want logged, it immediately gives you back a response, and you can continue working, so it doesn't degrade performance. I can tell you a little bit more after the talk. I was just wondering, with users, do you have shared user sessions between all of your silos, or does each user only have a session with one silo at a time?
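[Editorial aside: the fire-and-forget logging endpoint described in that answer can be sketched with a queue and a background worker. This is a hypothetical illustration of the pattern, not the actual system from last year's talk; in production the shipper would be an HTTP call to the logging endpoint.]

```python
import queue
import threading

class AsyncLogger:
    """Fire-and-forget logging: the caller enqueues a payload and gets an
    immediate acknowledgement, while a background worker ships the logs.
    Request handling is therefore never slowed down by log delivery."""

    def __init__(self, shipper):
        self._q = queue.Queue()
        self._shipper = shipper  # e.g. a function that POSTs to the log endpoint
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def log(self, payload):
        self._q.put(payload)
        return "accepted"  # immediate response; the caller continues working

    def _drain(self):
        while True:
            payload = self._q.get()
            self._shipper(payload)
            self._q.task_done()

shipped = []
logger = AsyncLogger(shipped.append)
print(logger.log({"user": 42, "event": "checkout-failed"}))  # accepted
logger._q.join()  # demo only: wait until the worker has shipped everything
```

The key property is that `log()` returns before the payload is delivered, which is what keeps the user-facing code path fast.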
Yeah, thank you very much for the question. We use session stickiness. Our traffic controller is a bit more expensive: we used F5. Of course, in real-life situations when you don't have an F5 available, you move to open source solutions. With F5, we were able to keep users sticky to one silo, and the good thing about it was that the end user experience was delivered by the same stack. That is why I said the actual tracing of logs is quite different: if you allowed users to be served by any silo, we would probably get problems from a second silo, with the new versions and everything, and performance would degrade from the user's perspective. Once a user went to the B version, started with the B version, their entire session would stay on the B version, and we would get the actual measurements from the production environment. Does that answer your question? Yes, thank you. Thank you. I apologize if this is a difficult question to answer, but I don't suppose you could simply explain the difference between what you had right at the top, the thing that deals with the traffic, and a load balancer? If I reformulate the question: why didn't we put a simple load balancer in place of the traffic controller? Yes, yeah. Okay. Well, the answer is two-fold. First, continuing the answer to the previous question: if we gave users the possibility to wander through the silos independently, then we would not be able to do A/B testing. I want one user to be served by the same stack because I want to measure that individual stack, and if I show my user the B version, then the A version, then the B version again or the C version, I will confuse the user. That's the first part of the answer. And the second one is that I want my silo to be completely independent from the others.
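[Editorial aside: the sticky assignment just described, including diverting only a small share of new traffic to the new silo, can be sketched as follows. The silo names, the 98/2 split, and the hashing scheme are illustrative assumptions; F5 implements this with its own persistence rules.]

```python
import hashlib

SILOS = ["silo-a", "silo-b"]  # the A and B versions of the stack

def assign_silo(session_id, assignments, weights=(98, 2)):
    """Sticky assignment: a user stays on the silo that first served them,
    so the whole session sees one stack and A/B measurements stay clean."""
    if session_id in assignments:
        return assignments[session_id]
    # Weighted choice for NEW sessions, e.g. divert only 2% of traffic
    # to the new silo while it is being validated.
    bucket = int(hashlib.sha256(session_id.encode()).hexdigest(), 16) % 100
    silo = SILOS[0] if bucket < weights[0] else SILOS[1]
    assignments[session_id] = silo
    return silo

assignments = {}
first = assign_silo("user-123", assignments)
# Every later request from the same session lands on the same silo.
assert assign_silo("user-123", assignments) == first
```

Hashing the session ID makes the first assignment deterministic, and the stored assignment is what makes the user sticky for the rest of the session.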
So even though it's delivering the end-to-end functionality, I want it to be completely independent. That is why we use the word silo in the first place: each silo only communicates with the traffic controller. And in the traffic controller (F5 is quite smart in this respect) you can build very complex rules as to where a user can go. Okay, thank you, yeah. F5. F5 was our solution for several customers, but we only used it because we had access to it; the customer was paying, so we were able to use the best tool on the market. Beg your pardon? It all depends on what you have as infrastructure and as knowledge, because depending on your technology stack, the traffic controller might be different. When you choose a certain solution for your project, it also depends on the skills people have, on the money on the table, so many factors that if I spit out a product name, maybe it will not fit your situation. So I would rather not give you a very straight answer. What was the scale at which you were able to deploy different silos? Because I assume that at some point you reach a breaking point where having too many silos becomes, or might become, too expensive. We normally used three production silos, because inside they were architected in such a way that each handled a lot of traffic. We had an automation part that would detect the increase in requests and automatically raise a new silo. So we had two production silos with a third one already waiting to be built, and we only needed a fourth silo one time. As I said, it all depends on your particular application, because if your application is small and you have only one or two layers inside, you can really go crazy with the silos. So it depends on your silo size; ours, with 30 machines, was kind of crazy.
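[Editorial aside: the request-driven scaling decision in that answer, with two production silos, a standby third, and a fourth used only once, could be sketched like this. Every number here, the per-silo capacity included, is an illustrative assumption, not a figure from the talk.]

```python
import math

def silos_needed(requests_per_sec, capacity_per_silo=5000, minimum=2, maximum=4):
    """Decide how many silos should be up for the current request rate.
    The automation watches the request rate and raises or retires silos
    to match; minimum keeps two production silos always running."""
    needed = math.ceil(requests_per_sec / capacity_per_silo)
    return max(minimum, min(maximum, needed))

assert silos_needed(3000) == 2    # normal load: the two production silos
assert silos_needed(12000) == 3   # spike detected: raise the standby silo
assert silos_needed(50000) == 4   # hard cap: the fourth silo, needed once
```

The cap matters because, as noted above, each silo here is about 30 machines, so every extra silo is a real cost decision.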
But if you have only a couple of machines, or maybe ten, it's already manageable; you can have more silos and I don't think it will be a problem. All right, thank you. Hi. In terms of high availability, the data layer seems to be left out of the silo approach in this scenario, but do you have any notes on making it highly available, eliminating that single point of failure somehow, or trying to prevent it? Yes, thank you for this question also. Making the data highly available would be the subject of two or three presentations in a row; it's a more complex topic than the coding part. But what I can tell you is that there are some very good solutions today. We discovered that a lot is gained just by taking the static data out of the data layer and putting it in a cache inside the silo. We started with a file system cache, so you had a key-value store: images and all the other static assets were accessed via URL and you grabbed the contents. We started with the file system approach, then moved to Memcached, and then to Redis as a caching mechanism. If all the static information is not living in your database and you only work at the transactional level, getting availability from the database and working with IDs rather than full-fledged data, it will not be such an issue. You will be able to scale your data layer much more easily and it will not quite be a point of failure, if that makes any sense. Another question, I guess. Okay. In this approach, we've been attending conferences and talks about Docker; how would you integrate it into this solution somehow? So instead of using Ansible, the inside of the silo can be composed of Docker containers. If you want to migrate your current infrastructure to Docker, you put a traffic controller on top of it, build a Docker-based silo, and see how well it plays. Yeah, this is quite an interesting use case. I never thought of it. Thank you very much.
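[Editorial aside: the caching progression from the data-layer answer above, file system to Memcached to Redis, works because all three reduce to the same get/set interface. A minimal sketch of that idea, with a dict standing in for the real backends; names and the asset URL are made up.]

```python
class DictBackend:
    """Stand-in for the real backends (file system, Memcached, Redis):
    each of them reduces to get/set on a key."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value):
        self._data[key] = value

class StaticAssetCache:
    """Keep static data inside the silo, off the database, behind a tiny
    key-value interface so the backend can be swapped (file system ->
    Memcached -> Redis) without touching application code."""
    def __init__(self, backend):
        self._backend = backend

    def fetch(self, url, loader):
        cached = self._backend.get(url)
        if cached is not None:
            return cached
        value = loader(url)            # fetch the asset from origin once
        self._backend.set(url, value)  # then serve it from the silo's cache
        return value

cache = StaticAssetCache(DictBackend())
loads = []
def loader(url):
    loads.append(url)
    return b"<image bytes>"

cache.fetch("/img/logo.png", loader)
cache.fetch("/img/logo.png", loader)  # second call hits the cache
assert loads == ["/img/logo.png"]     # loader ran only once
```

With this shape, the database is left holding only transactional data and IDs, which is what makes the data layer easier to scale.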