Okay. Welcome to the second talk in this session, given by Flavio Percoco, on OpenStack. Thank you. Hello. Hello. Does it work? Yeah. Okay. Cool. Hello, everyone. Today's talk, well, my talk is about system integration. It's not actually 100% related to OpenStack; it's mostly about integrating systems with each other, and I'll use OpenStack as an example because most of the methods I will present to you are used by OpenStack itself to integrate all the services it runs. So that's me, and that's my Twitter handle; pretty much everything you want to know about me is out there on the Internet. Something I want you to know: I work for Red Hat and I'm part of the RDO community. RDO is a community of a bunch of really great people working together to make OpenStack amazing on RPM-based distributions. Other things about me I'm not going to go through. The one I will mention is that I'm a Google Summer of Code and OPW mentor, and I wanted to mention this because I really believe it's very important. If you have spare time in your day and you want to mentor people, please sign up; we need more mentors there. So let's get to it. Before we go through the methods you would use to integrate systems, let's first define a little bit what system integration means. System integration is basically what you do when you have a set of subsystems and you want to make them work together towards a common goal or scope, right? All the methods, technologies, and strategies you would use to make those systems work together towards that goal is what we call system integration. That's putting it in a very simple way; there are a bunch of different definitions of system integration. And systems are not necessarily software: a system could be hardware, it could be many other things.
So "system" is a very generic term you would use to say that you have a set of subsystems working together for a single cause, so to speak. There are many generic strategies to integrate systems. These are the three I will present very briefly, and we will dive a little deeper into the last one. Vertical integration basically looks like the small graph up here. You have a set of systems, and the systems above talk to the systems below, based on each subsystem's features and what you need from them. So you have a web service that integrates with your database, and then you have two systems working together. Or you have your authentication service and your other services below it, and you make the service with the real features talk to your authentication service; you are integrating those two services vertically. Star integration is called star integration because it's supposed to look like a star, but it's more like spaghetti integration, because all services know what the other services do and they all talk together, and they do that on a case-by-case basis. So you have service A that needs something from service B, but before doing that it talks to service C, because it needs something from service C before getting to service B. It's quite a mess. There are plenty of use cases for this, but it's very risky and very error-prone, and there's a very high risk of not having a contract when those services talk together. Not having a contract basically means you don't know what you're going to get back, and you don't know when something goes wrong: you get something from service C to talk to service B, but service B is expecting something different, and it turns out service C was updated.
And the method we are going to dive into a little more today is horizontal integration. It's based on a service bus, and I'll call the service bus a communication bus. I don't like to use the term messaging bus here, because it's not about messaging itself but about making those services communicate through the same bus. So you have service A, service B, and service C, and they all communicate through this communication bus, sending either messages or just the data that makes the whole feature work. Diving a little more into horizontal integration from an application's point of view: imagine you have a set of applications that you want to make work together. You need this communication bus, so you have to come up with an idea, with a technology, that you would use to make them talk to each other. I will now present four different methods to make those applications talk together. These are not new methods; they have actually been around for a long time. Many people use them without knowing that they are basically integrating systems. And each one of these methods is good for very specific cases; some are more generic and others are more specific. The first one is files. Files are probably the oldest way to integrate different services. For a long time, people would open a file, get a file descriptor, put something in there, and have another application on the same piece of hardware read data out of it that it would use to do something. So people would use files as a messaging bus, the way you would use RabbitMQ now, a message broker.
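To make the idea concrete, here is a minimal sketch of a file used as a one-way message channel, in the spirit described above. The file layout (one JSON message per line) and the function names are my own illustration, not any particular system's format; note that a real version would also need file locking, which is exactly the reliability risk the talk mentions.

```python
import json
import os
import tempfile

def produce(path, message):
    # Producer side: append one JSON message per line.
    with open(path, "a") as f:
        f.write(json.dumps(message) + "\n")

def consume(path):
    # Consumer side: read every pending message, then truncate the file.
    # No locking here, so concurrent producers/consumers would race.
    with open(path, "r+") as f:
        messages = [json.loads(line) for line in f if line.strip()]
        f.truncate(0)
    return messages

path = os.path.join(tempfile.mkdtemp(), "bus.jsonl")
produce(path, {"event": "boot", "instance": "vm-1"})
produce(path, {"event": "boot", "instance": "vm-2"})
print(consume(path))  # both pending messages, in order
```
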
It's good for very specific cases; try not to use it. It's good for cases like embedded systems. On an embedded system you won't have RabbitMQ running, for sure. So if you have very limited hardware, processor, and memory, you probably want to use something really cheap, and files are cheap. Accessing the file system definitely has a cost, and there is a high risk in terms of security and reliability, but it works very well for embedded systems. We used files in OpenStack to have some kind of inter-server distributed lock for some time. Many things went wrong with that, so don't do it; we're now working on another way to do distributed locks. But that's one of the cases where, in OpenStack, we used files and we moved away from them, because like I said, they're very good for hardware with very limited resources, but if you can afford something more expensive, you probably want to do that. Databases. Databases are probably one of the most common. By the way, all these statements are based on my own experience; I don't have actual data proving that this is the most common or that files are the oldest. This is all based on my own experience and research. Databases are probably the most common way to integrate services. They are asynchronous data-wise. What that means is that when you have a message and you want another service to get it, you just store it in the database and you are done with it. The producer stores the message in the database and is done with it, and then the consumer eventually gets that data out of the database and does something with it. Databases are really great for storing state. And I'm saying this is probably the most common one because most of the web services out there rely on one; I couldn't think of a web service that does not rely on a database.
And if you want to scale your web service, you most probably have a single database for the whole thing, and you have several services talking to that database and getting data out of it, right? They're really great for state. The way we use this in OpenStack is that most of the services, or at least the biggest ones, have been split into several smaller services. Take Nova, for example; how many of you know OpenStack or have heard of it? Awesome. Nova is the service responsible for spawning new instances, virtual machines, so it's roughly the equivalent of EC2 in AWS. Nova has several sub-services; well, it has many more than that, but the main services you need from Nova are three or four. You have the API service, the compute node, the scheduler, and the conductor, which gets messages and stores everything in the database. When a request for a new instance comes into the Nova API service, a new record is created in the database, and then a message is sent to the scheduler, which talks to the Nova compute node to spawn the new virtual machine. What Nova compute does is get the data for the requested instance out of the database and spawn the new virtual machine, and when the virtual machine is running, it updates the state of the instance in the database, saying, hey, the virtual machine is running. So that's system integration at a really small scale, and that's a way you could use databases to integrate systems. That's why I say they're probably the most common way to integrate systems, and probably many people don't know that they're actually integrating systems by using databases. So LibreOffice is... there you go, a new LibreOffice. Do any of you have any questions so far? Feel free to interrupt me if you have questions. LibreOffice is stuck.
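The Nova-style flow just described, where the producer records a request in the database and a consumer later picks it up and writes the new state back, can be sketched like this. This is a toy model using SQLite, with a simplified table and made-up state names, not Nova's real schema.

```python
import sqlite3

# Shared database standing in for the integration point between services.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE instances (id INTEGER PRIMARY KEY, name TEXT, state TEXT)"
)

def api_create_instance(name):
    # Producer side: the API service records the request and is done with it.
    cur = db.execute(
        "INSERT INTO instances (name, state) VALUES (?, 'BUILDING')", (name,)
    )
    db.commit()
    return cur.lastrowid

def compute_run_instance(instance_id):
    # Consumer side: the compute service later does the work and
    # writes the new state back to the shared database.
    db.execute(
        "UPDATE instances SET state = 'ACTIVE' WHERE id = ?", (instance_id,)
    )
    db.commit()

iid = api_create_instance("vm-1")
compute_run_instance(iid)
print(db.execute("SELECT state FROM instances WHERE id = ?", (iid,)).fetchone()[0])  # ACTIVE
```

The key property is the asynchrony the talk mentions: the producer never waits for the consumer, it only writes a row and moves on.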
Okay, finished. So, messaging. What I mean by messaging here is not a broker, it's not AMQP, and it's not a specific technology that allows you to send a message from point A to point B. What I mean by messaging here is the message itself, the message as a unit to send data from point A to point B, whatever method you use to send it. The benefit of messaging is that it's loosely coupled, and it adds way more complexity, because being loosely coupled means you don't have a contract on the message. Service A can send a message to service B, but service B has only a hypothetical idea of what it's going to get and what it wants to do with that message. It adds more complexity because, if you don't know what the message may look like, you will probably have parsing errors, type errors, or whatever, depending on the language and on what you want to do with that message. There are benefits, though: being loosely coupled, you can say, I will send this message, and whoever gets it can do whatever it wants with it. One of the places where we use this kind of messaging, or loosely coupled contract, is in Ceilometer. Ceilometer plugs into the notification stream of OpenStack and gets all the notifications of what's happening in your infrastructure. If you spawn a new virtual machine, a new notification is sent, so Ceilometer gets it, parses it, and does something with it: it creates new events, creates stats, and allows you to bill users based on what has been done.
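The loose coupling just described can be sketched as a consumer that parses notifications defensively, since there is no strict contract on the payload. The event-type string and field names below are illustrative, not the real OpenStack notification schema.

```python
def handle_notification(payload):
    # No contract: the consumer only has a hypothetical idea of the payload,
    # so it inspects it defensively instead of assuming fields exist.
    event = payload.get("event_type", "unknown")
    if event == "compute.instance.create.end":
        # A recognized event: turn it into a stat/billing record.
        return {"metric": "instances", "delta": 1}
    # Unknown or malformed messages are skipped rather than crashing.
    return None

good = {"event_type": "compute.instance.create.end", "instance_id": "vm-1"}
bad = {"something": "else"}
print(handle_notification(good), handle_notification(bad))
```

This is the trade-off from the talk in miniature: the producer never needs to know this consumer exists, but the consumer must tolerate anything it receives.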
One thing about messaging is that it may depend on message routers and transformations. When you use messaging and you want to send a message from point A to point C, but it has to go through point B first, you will need at point B some kind of logic, or a technology, that allows you to route that message to point C. You do that based on the message information itself, so you have to parse it and get information out of it to know where the message has to go. This is something that the Nova scheduler does, for example. It doesn't get a notification, it gets an RPC message, and we'll get to that later; but it gets a new message, parses it, tries to find a Nova compute node that will do the work, and sends the message on to Nova compute, using some filter information. But let's not dive into that. Messages are very easy, they are very cheap, but they add complexity to your system. The last method I want to present today is RPC. RPC stands for remote procedure calls, and it was probably introduced pretty much by the enterprise world, when system integrators wanted to integrate systems for their customers and would use RPC calls to do that. The way RPC works is that you send a formatted message, so you have a contract on that message, to point B, and point B executes it. It's called a remote procedure call because you're basically calling a remote function just by sending a message: call this function, pass these arguments to it, and give me the result back. It's the most used method throughout OpenStack, and I do have numbers for this.
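The "call a remote function by sending a message" idea can be sketched in a few lines: the message carries a method name plus arguments, and the receiving side looks the function up and invokes it. The function name, registry, and message keys below are hypothetical, for illustration only; a real system would send the message over a transport rather than calling a local dictionary.

```python
def spawn_instance(name, flavor="small"):
    # Stands in for a function that exists on the remote side.
    return f"spawned {name} ({flavor})"

# The receiving side's registry of callable functions.
REMOTE_FUNCTIONS = {"spawn_instance": spawn_instance}

def rpc_call(message):
    # The contract: {"method": ..., "args": [...], "kwargs": {...}}.
    # Both sides must agree on this format, which is the tight coupling
    # the talk describes.
    func = REMOTE_FUNCTIONS[message["method"]]
    return func(*message.get("args", []), **message.get("kwargs", {}))

msg = {"method": "spawn_instance", "args": ["vm-1"], "kwargs": {"flavor": "large"}}
print(rpc_call(msg))  # spawned vm-1 (large)
```
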
The message channel may vary; you can use databases or message brokers. Like I said, here I'm not talking about the method you would use to send a message from point A to point B. One of the drawbacks, though it's actually something required for RPC, is that it's tightly coupled. You have a protocol, you have to invent something, you have to agree on a contract when you send a message from point A to point B, because you want to call a function that you know exists at point B, you have to pass some arguments to that function, and you want to get a result back. So you have to know what you are going to get from RPC, and you have to design your own protocol for this. But it's really common, it's very useful for that kind of remote function calling, and it comes with its benefits and its drawbacks. In the OpenStack case, this is pretty much a high-level overview of how OpenStack works in terms of system integration. It's based on a shared-nothing architecture. If you don't know it, it's basically services working together but not sharing anything. By not sharing anything, I mean they don't share memory space on your box, they don't share processes, PIDs, or other resources. They can live together on the same box, but they won't share the same resources; they each have their own space within that box. Every service knows very few things about the other services, and with that, we manage to keep all those services very isolated from each other, which is really good if you want to integrate systems. You want your services to be independent and isolated from each other, and if something happens to one of your services, you definitely want your other services to still be alive and able to keep working.
We use RPC for inter-service communication. Like I said, Nova API will store a new instance record with a booting state, and then Nova compute will update that state. When Nova API gets a new instance request, it sends an RPC message to the scheduler, the scheduler gets that message, and then it sends another RPC message to Nova compute. We use notifications as well, and I already mentioned this: when something happens in OpenStack, services generate notifications and send them to a specific topic in the broker that other services can just plug into, get messages out of, and do something with. Since OpenStack relies a lot on brokers, and that's probably one of the most common tools to integrate services in many deployments right now, I would like to say a few things about brokers, and about how you could do integration based on protocols like AMQP or technologies like message brokers. The first thing I want to say is that scaling brokers is really hard. If you've read or heard something like "broker scaling is already solved and you can scale RabbitMQ", I'm sorry, that's a lie; it doesn't work that way. There's a lot of documentation, yes; there are explanations of how you can do it, yes; there are demos where people have done it. Yet when you get to big scales, it doesn't work that way. Scaling brokers is hard because synchronizing messages between different nodes is really hard. Another thing is that brokers need a lot of memory. It really depends on your use case: if you don't have many messages traveling around your system, you probably won't use a lot of memory, but if you have a big deployment, your broker is definitely going to use a lot of memory, and it really depends on how fast you write to it and how fast you read from it.
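That write/read balance can be illustrated with a tiny backlog simulation: when producers write faster than consumers read, the broker's pending-message queue, and so its memory use, grows linearly. The rates below are made-up numbers, purely for illustration.

```python
from collections import deque

def backlog_after(writes_per_tick, reads_per_tick, ticks):
    # Simulate a broker queue: each tick, producers enqueue some messages
    # and consumers dequeue up to their read rate.
    queue = deque()
    for _ in range(ticks):
        for _ in range(writes_per_tick):
            queue.append("msg")
        for _ in range(min(reads_per_tick, len(queue))):
            queue.popleft()
    return len(queue)

print(backlog_after(10, 10, 100))  # balanced rates: backlog stays at 0
print(backlog_after(10, 7, 100))   # 3 extra writes per tick: backlog of 300
```
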
If you read as fast as you write, your broker will probably use less memory; the memory footprint will be pretty linear and stable. But if you have more writes than reads, your broker will use a lot of memory. Brokers also need a lot of storage. If you want your messages to stick around when something bad happens, you will use durable queues. If you use a durable queue, your broker has to write everything to disk, because if the broker goes down, it has to restart from somewhere, right? It will read all your messages out of whatever database or storage system it is using and make those messages available again. So again, if you have a lot of writes and not as many reads, your broker will use a lot of storage. I was looking at the time and it said nine minutes, because LibreOffice went down, and I thought I was already done. Since I've been ranting about brokers for a bit, I would like to say something about this: if you are going to use brokers or any messaging technology, prefer federation over centralization. What I mean by that is, if you have a centralized broker and that broker goes down, your system is off. You will have HA and all that; you'll want to scale the broker, have it replicated, and all those kinds of things. But if you prefer federation over centralization, you will have a whole bunch of lightweight broker nodes, and if one of them goes down, you will still have another, and you won't rely on a single broker sitting in the middle of your system processing all your messages. One way to do that is to rely on AMQP 1.0. I'm pretty sure most of you are familiar with AMQP itself. The latest version of the AMQP protocol being used by RabbitMQ and most of the brokers is AMQP 0-10.
AMQP 0-10 is not a standard, and many brokers have implemented it in different ways, whereas AMQP 1.0 actually is a standard. It dictates how messages go from point A to point B, so how you can send messages between two peers. AMQP 1.0 is peer-to-peer, on a per-message basis, and it explains how a message travels from one point to another; but in the specification there's also an explanation of how you would do that with an intermediary broker. So it doesn't say you have to have a completely federated system; you could also have a broker in the middle that is capable of speaking AMQP 1.0. AMQP 1.0 is all about messages and how those messages go from point A to point B. And if you want to scale it and have more routing intelligence, so to speak, in your system, you could use something like Qpid Dispatch, which allows you to create rules to send those messages between your services, as you would do with routing keys in AMQP 0-10. In AMQP 1.0 you don't have exchanges, you don't have queues, you don't have binding rules, and you don't have routing keys; you just have messages and links, and every link is basically a connection to one of the peers in your system.

So after having said all that about methods to integrate systems, technologies you could use, protocols, and all that stuff, I would like to give you some tips and tricks about system integration, mostly based on our experience in the OpenStack community. First and foremost, the transmission protocol matters. By transmission protocol I'm not talking about the lowest level, UDP versus TCP; I'm talking about a higher level, like whether you want to use a protocol that sits directly on TCP, or you want to use HTTP, or some other RPC protocol. Transmission protocols matter because, depending on the protocol you choose, you have some extra cost on your messages and on the transmission of your messages, so be aware of that and, depending on your use case, make sure you choose the best protocol for it.

Use versions for your wire protocol. If you choose RPC to integrate your systems, you will probably have to agree on a protocol, and you will probably have to define that protocol yourself. Something that has been around in OpenStack for a long time is versioning those protocols. When you define your protocol, you may say: my protocol is a dictionary that I send between services; that dictionary has a key called "function" whose value is a function name, and it has "args" and "kwargs" keys whose values are the arguments and keyword arguments to pass to that function. But then you want to update that protocol; say you also want to specify the return type you want from that function. If your system is deployed and you want to make a change to your protocol, you can do that, but if you don't have versioning, you will have to tear all your services down and bring them up again once you update the protocol, because if a service gets a message in an RPC format it doesn't recognize, it will probably fail. If you have versioning instead, you can do rolling updates on your system, restarting and updating services one at a time, so you don't have any downtime. Versioning is not just useful for upgrades; it's also useful for backward compatibility. If you make a change and that change turns out to be really bad, you can go back to your previous version, and you still have the services that used to work with that version.

Keep everything explicit. I have a really nice quote that I got from Jeff Hodges' talk at the RICON conference; he basically said that in a distributed system, having implicit things is the best way to fuck yourself. That's really true. If you have implicit things happening in your system, and you send a message, like an RPC message, without agreeing on the contract for that message, you will probably face several issues you didn't expect. So keep everything explicit, even if it's more verbose, even if you need more code, even if you need more nodes running; that's fine, just keep everything explicit. If something bad happens, you will know what it is, and you will know how to debug it and fix it, most of the time. Can you step to the microphone, please? I can repeat it: he's asking for an example of something implicit. I can give one of the OpenStack issues. For a long time, Ceilometer has consumed messages out of the notification stream in OpenStack. There were some implicit fields being sent by some services that weren't sent by other services. Ceilometer didn't know about that, and there was a case where it failed when it got those messages. The good thing is that it was before the release, so it could be fixed. Anyway, something else you want to keep explicit in your system is where your nodes are running and which nodes can run alongside other nodes. You don't want all nodes running on the same server. If you keep your architecture and your distribution very explicit, even in the way you use separate services, it will be easier for you to estimate the scale and how to distribute things. A good example of this is Nova itself, again: Nova has a Nova API service and a Nova scheduler service, so if you are getting a lot of API requests, you will get a lot of messages going to your scheduler, and if your scheduler is under a lot of pressure, you can add more schedulers; you can scale them horizontally very easily. The way you distribute your services in terms of code, with an API service, a scheduler service, a conductor service, and a compute service, is another way to be explicit about how your distributed system should look.

Design by contract, and I've been using the word contract a lot today. If you design by contract, you know what service B is expecting you to send, and service B can run, let's say, a set of assertions before doing anything, and reply back if some of those requirements are not met. When you integrate your system and you want two services to talk to each other, you have a contract between them, pretty much like between your account manager and yourself. You have a contract with him: you know what he expects you to do; when you pay something, he wants you to keep all the receipts, and he will do something with them; and you will pay him for that service, and he will expect you to pay for it, right? The same thing happens with services. When you send a request to service B, service B is expecting something from you; you know what the service is expecting, so you send that. If you don't meet those requirements, it replies back with an error, and if you send all the requirements, you expect something back from it, and if it's not what you expected, you can call back again and say, hey, this was not what I was expecting, so please give me what I want. Design by contract is probably known by most of you; it was introduced by Eiffel, the programming language, and it's basically part of the coding style of the language itself. Keep services isolated as much as possible. Like I said, a shared-nothing architecture is very useful when you want to keep your distributed system safe from failures. It's not completely safe from failures, but if one of your services goes down and it's isolated from all your other services, you can probably just run another one somewhere else and make things talk to it. So keep them isolated. Keep your services very stupid if you can; and I'm not talking about a microservice architecture with thousands and thousands of microservices each doing one little function. Just keep them isolated and very focused, in context, on what they have to do.
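The design-by-contract idea above, a service asserting its preconditions before doing any work and replying with an error when the contract is not met, can be sketched like this. The required fields and function name are hypothetical, chosen only to echo the Nova examples from the talk.

```python
# Hypothetical contract: what service B requires from any incoming request.
REQUIRED = {"name", "image", "flavor"}

def create_instance(request):
    # Precondition check: reject early, with a clear error, instead of
    # failing halfway through on an implicit assumption about the payload.
    missing = REQUIRED - request.keys()
    if missing:
        return {"error": f"missing fields: {sorted(missing)}"}
    # Contract satisfied: do the actual work.
    return {"result": f"building {request['name']}"}

print(create_instance({"name": "vm-1", "image": "fedora", "flavor": "small"}))
print(create_instance({"name": "vm-1"}))  # contract violated: error reply
```

The caller knows exactly which messages will be accepted, and a violation produces an explicit error reply rather than an unexpected failure deep inside the service.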
Avoid dependency cycles between services. I wouldn't recommend using the star integration method; it's really messy, and when something goes wrong it's very difficult to debug. Avoid having dependency cycles between your services if you can: make sure two services don't depend on each other to get something done. Mocking is not testing. If you have a distributed system, you probably want to test it, and you might say, hey, the easiest way to test it is by mocking what I'm expecting from the other service. Yeah, that works, and it will probably succeed every time, but that's not testing. If you want to test your distributed system, get it installed and run everything live. We have mocks in OpenStack, but we also run everything live for every single patch. This is very important: many bugs we have found in OpenStack that are related to how services are distributed were not tested live, and we had mocks for those tests. So mocking is not testing. And before closing, since this is a Python conference, here are three libraries for doing integration. Kombu is for sending messages; Kombu is the library that is actually used by Celery, and it supports transports, where every transport is basically a messaging technology you could use: RabbitMQ, MongoDB, Redis, and some other technologies. Celery is a distributed task manager; there was a presentation about it before mine. It basically allows you to have distributed workers doing something based on messages, and Celery itself uses RPC implicitly to tell workers what they have to do. And oslo.messaging is an RPC library; that's what we use in OpenStack to send RPC messages between services. It has the architecture to support many brokers, but it just supports RabbitMQ and Qpid for now, and we're working on AMQP 1.0 support for it. And here are some messaging technologies you could use; you probably already know them: Kafka, Marconi, ZeroMQ, RabbitMQ, and the Qpid family. In the Qpid family you have qpidd, which is the broker and supports 0-10, and you could use Qpid Dispatch for routing messages throughout your system, which is fully AMQP 1.0. And that's pretty much it. Any questions? Please come to the microphones if you want to ask questions.

Hi, thanks for your talk. I would be quite curious about how you do your system integration testing. Do you also have some automated system integration testing, setting up a cluster with all the services and so on? What tools do you use?

So in OpenStack we use Gerrit for code review. Every time you submit a patch there, there's a tool, our test integration tool, that gets notifications from Gerrit and runs a Jenkins job for every new patch. Those Jenkins jobs install everything on a single node and test it. We have live tests that call APIs and send messages throughout the whole system, simulating a live environment: spawning new virtual machines and taking them down, creating and deleting volumes, creating and deleting images, and all that kind of thing. So it's tested live. We do have automated tools; Jenkins is basically the one that does everything, and we use DevStack to install pretty much all the services in the Jenkins jobs.

Thank you. I have a question: you didn't talk about security. If you run this messaging infrastructure, how do you secure it?
Sure. So right now in OpenStack, security is pretty much done by binding everything to your private network. At this layer, we have some work going on around signing messages and encrypting messages before sending them through the pipe, so to speak. There was a talk about Marconi yesterday, where one of the good things presented about Marconi is that it is good when a message broker is not good enough. One case is especially security: we have guest agents running in virtual machines, and we don't want those guest agents to talk to the central broker. So Marconi would be good for that use case, where you can just set up a new service that doesn't have to take a high load of messages in your infrastructure, and you isolate everything from your message broker. So security, as it's done in OpenStack right now, is just binding everything to the private network, and we don't allow anyone to talk to that except the services running in the OpenStack deployment. And like I said, we have some work going on to sign and encrypt messages before sending them over the wire.

Yes, I have another question. Do you have a way to make the dependencies between your services visible? Because when I see this communication bus, it looks very clear and simple: you just put a message on the bus and somebody else will get it. But in the end, that's just a way for the services to communicate with each other, and you can easily build a spaghetti dependency system by just using a very clean bus. So how do you prevent this?

Logically. We don't have any assertion between services that says, hey, we can't depend on each other. It's done logically, when the design decisions are being taken: we cannot make service A depend on service B and service B depend on service A, so let's try to figure out a way around that, which basically means creating a service C, unfortunately. But yeah, it's done logically. Cyclic dependencies, in my opinion, are bad, but they are not always bad, like everything in software; we try to avoid them as much as possible. It's all done logically. We have everything explicit, so since we know which services depend on each other, logically speaking, or function-wise or feature-wise, we know that we cannot create cycles in some of the services, or we try not to as much as possible.

Can you use the mic? Sorry. And is that explicitly written down somewhere in the code? It is in the code, definitely, but we also have documentation about it, written on wiki pages, in the documentation of each service, and in the operations book, obviously, because you have to know how to deploy the whole thing.

If there are no further questions, I'd like to thank the speaker again, and thanks for attending.