My name is Sergey, and I'm glad to see you at this session. I work as a project manager at Altorus, and today I'm going to share the experience our team gained on one of our projects in the health care industry. The project is about building a highly available cloud solution for customers who operate various medical devices. Today I'm going to focus on some specific technical aspects of connecting medical devices to the cloud using different network protocols, and we'll also talk a bit about porting the cloud solution between OpenStack and AWS infrastructure.

OK, let's look at the implementation requirements. I won't describe in detail the business and legal requirements that initiated the project; instead I'm going to focus on the technical challenges and how we were able to solve them. From the technical point of view, we need to build a cloud solution that connects medical devices and users located at customer sites in a secure and reliable way. There may be thousands of devices at each customer site, and the number of customers can range into the hundreds. Each customer must be provided with secure and isolated connectivity to the cloud, and they also need space to store the data from their devices. So we call this solution an Internet of Things for health care, at least to some extent. It is planned that the cloud implementation should be portable between OpenStack running on our own hardware and a public cloud provider such as Amazon AWS. AWS is our first choice of public cloud provider, but the architecture should be generic enough that it can be deployed on other public cloud infrastructures as well. And even when the cloud is deployed on AWS, access to it should be limited to only the users and customers who subscribe to the service.

OK, let's talk a bit about high availability (HA) and scalability. It is common to think that when we have a cloud infrastructure like AWS, scalability and HA features are available out of the box. But that is true only to some extent, because HA has to be supported on all layers of the solution, from the infrastructure up to the applications. Of course, that includes virtual machines, networks, and most importantly the applications themselves. When we talk about AWS, we probably don't care about the physical networks, servers, switches, and redundant power supplies. But it's a very common case that AWS sends a notification about the retirement of a virtual machine, and we need to at least be able to migrate that virtual machine without downtime for the applications and services.

A few words about security. Because this is health care and security is a central part of the business, the main mode of connecting to the cloud is the VPN mode. There are various devices located at customer sites, and they use WebSocket, TCP, and HTTP protocols. What is interesting is that the old legacy HTTP devices have a bidirectional mode: the device can send a message to the server, and the server can also send a message to the device. That works well in an isolated network where there is direct routing between the device and the server, but it is challenging to solve in the cloud. We will see a bit later how we designed the solution for HTTP devices as well. At the same time, a non-VPN mode should be provided too, at least when the cloud runs in AWS. In that mode we have a public endpoint, which is protected by an allowlist of IP addresses for the customer offices that are allowed to connect to the cloud. So this is a kind of mixed mode of access to the cloud: VPN access to the private IP address space, plus public access.
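Since this bidirectional behavior comes up again later, here is a minimal sketch of what it looks like in the WebSocket case, where one persistent connection carries messages in both directions, using the Python websockets library. The endpoint URL and the message format are made up for illustration; they are not the real ones from the project.

    import asyncio
    import websockets  # pip install websockets

    async def device_client():
        # The device keeps one persistent connection open; over that same
        # socket the cloud can push messages back to the device, which is
        # exactly what plain request/response HTTP cannot do.
        async with websockets.connect("wss://cloud.example.corp/devices") as ws:
            await ws.send('{"device_id": "dev-001", "temperature": 36.6}')
            async for message in ws:  # messages initiated by the cloud side
                print("command from cloud:", message)

    asyncio.run(device_client())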
Since we are talking about moving the cloud implementation between several infrastructures, in our case OpenStack and AWS, Cloud Foundry is one of the best choices of platform for the applications. But it can be challenging to configure a highly available Cloud Foundry deployment, especially on OpenStack. We'll take a closer look on the next slides at how Cloud Foundry components are distributed across OpenStack virtual machines and physical nodes.

Cassandra is the choice for many projects that need to store data from devices, because it is almost unlimitedly scalable. In our case, we ran a lot of benchmarks with Cassandra to measure the performance of a cluster running on OpenStack virtual machines, and we found that virtual machines are enough to process and store around 2,000 to 3,000 device messages per second. According to DataStax, though, running Cassandra on virtual machines is not a best practice, so we additionally evaluated Cassandra on OpenStack bare metal. Bare-metal support is available from OpenStack version 8 through the project called OpenStack Ironic.

For structured data, we use a MariaDB Galera cluster, which is also an open source project used in many solutions; for example, the database that contains all the configuration of OpenStack services is a Galera cluster. We ran benchmarks for the Galera cluster as well, and we found that its performance for DML operations is much lower than Cassandra's. But it was enough for our use case, where most operations are reads: a Galera cluster scales for read operations, but it does not scale well for inserts and updates. In our deployment, we just tuned several configuration parameters that control the behavior of transaction commits.

RabbitMQ is the message queue we use to store data messages from devices for subsequent processing. Because all the messages from devices hit RabbitMQ, its throughput also has to be around several thousand messages per second, both for storing a message in RabbitMQ and for retrieving it.

The ELK stack is our choice for storing application logs, and it is integrated with Cloud Foundry. Kibana is the web interface; those of you who visited our booth have probably seen it. The Kibana web interface gives easy access to the application logs with full-text search. By default it is not protected: the open source Kibana project does not have any authentication. So we use a plugin that protects the Kibana web interface with Cloud Foundry authentication, and it also filters the data stored in the Elasticsearch cluster down to the data that should be visible for the Cloud Foundry organizations and spaces the current user is allowed to see. We also found that the most critical part of the ELK stack is the Logstash process, which is quite slow because of its set of rules; each rule in Logstash is processed sequentially. So if an application generates, say, a million lines of log output in debug mode, that puts a big workload on the Logstash processes. In our development deployment, which we also use for testing, we run from six to ten virtual machines for the Logstash process. But it is flexible: when the ELK deployment is managed by BOSH, we can spin up new Logstash machines dynamically.
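On the Kibana filtering I just mentioned: the plugin effectively restricts queries to the organizations and spaces the Cloud Foundry user is allowed to see. A rough sketch of that kind of filtered Elasticsearch query is below; the index name and the cf_org / cf_space field names are assumptions for illustration, and the real plugin applies the filter inside Kibana rather than through the Python client.

    from elasticsearch import Elasticsearch  # pip install elasticsearch

    # Hypothetical Elasticsearch endpoint inside the cloud.
    es = Elasticsearch("http://elasticsearch.cloud1.cloudprovider.corp:9200")

    # Only return log entries that belong to the org and space the current
    # Cloud Foundry user is entitled to see (field names are assumptions).
    resp = es.search(
        index="logs-*",
        query={"bool": {"filter": [
            {"term": {"cf_org": "customer1"}},
            {"term": {"cf_space": "production"}},
        ]}},
        size=20,
    )
    for hit in resp["hits"]["hits"]:
        print(hit["_source"].get("message"))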
Most of the monitoring and alerting features are provided by Zabbix. We have a separate server for the Zabbix installation, and it contains a database. That database grows very fast if we collect all parameters from the hardware and the OpenStack nodes, and even more so if we also collect parameters from the Cloud Foundry jobs.

So let's look at the deployment diagram, the deployment view, for our OpenStack. When it comes to OpenStack, we of course need to choose the hardware to deploy it on, and one option is to use blade chassis. There are a number of vendors who provide blade chassis: HP, Dell, and Supermicro are at least the vendors we evaluated for our development environment. But the recommended OpenStack deployment is vendor neutral, I would say. To achieve high availability of the OpenStack deployment, we need three physical nodes for the OpenStack management components; they are in yellow on my slide. The OpenStack compute services, the services responsible for running virtual machines, are distributed across three availability zones. In our case, the three availability zones are represented by the nine boxes in gray, and each availability zone is a group of physical nodes, physical blades, in the hardware. So if we need to extend the capacity of the OpenStack deployment, we just add a new blade, or a new chassis with blades, connect it to the networking, and the capacity grows.

OpenStack also has storage services. There are two major ones: Ceph, which provides volumes for virtual machines, and Glance, which stores virtual machine images. Swift is optional; it provides object storage in OpenStack. We also have additional services that we call administrative services: the domain name service, the time service, and the mail service. We deploy them on separate nodes, and at least two machines are dedicated to these services to achieve resiliency. We have network switches, and we have a firewall, the physical hardware that protects our cloud deployment. Everything is installed in a data center cabinet with redundant power supplies.

So let's see how the network is laid out in the OpenStack deployment. For our development cloud, we use Cisco ASA 5545 hardware as the cloud firewall. It supports more than 2,000 concurrent VPN tunnels, and the total bandwidth for encrypted traffic is up to about 400 megabits per second; this is the middle model of Cisco's encryption hardware. If we need to support more VPN tunnels and more bandwidth for encrypted traffic, we can use, say, a Cisco ASA 5585. What is important is that it provides personal user accounts for administrative access to the cloud, and it also provides site-to-site VPN connections between networks. So when we need to connect a remote network to the cloud, we use a site-to-site VPN. The firewall can also be clustered in active-standby mode, which means we can achieve high availability even at the firewall layer; a failure of a Cisco firewall or switch is quite a rare case, but nevertheless there is an option to build HA for the firewall.

As for the networks, we have an administrative network, 10.30.0.0/24; we just designed it that way. It is used to connect to the management interfaces of the physical nodes, and to connect to the switch and the firewall as an administrator to configure the hardware remotely, so there is no need to go to the data center. Then we have what I call the cloud public network. Public means that this is the network which is exposed from the cloud to a client that connects to the cloud through the VPN; that's why we call it public. The one used here is an example, and it actually comes from one of our deployments: 172.30.0.0/24, not a big network, just about 250 addresses, and that's enough. OpenStack also has management and storage networks; they are internal to the OpenStack deployment, and the traffic in them never leaves OpenStack, it goes only between the OpenStack nodes. And we also have, I think, about six subnets for virtual machines; they are represented in the last line.
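A tiny sanity check of that address plan with Python's ipaddress module, using the example prefixes just mentioned (the real deployment may differ):

    import ipaddress

    # Example address plan from the development deployment.
    admin_net = ipaddress.ip_network("10.30.0.0/24")    # management interfaces
    public_net = ipaddress.ip_network("172.30.0.0/24")  # exposed to VPN clients

    # A /24 gives 254 usable host addresses ("about 250, and that's enough").
    print(public_net.num_addresses - 2, "usable addresses per /24")

    # The administrative and public cloud networks must not overlap.
    print("overlap:", admin_net.overlaps(public_net))   # expected: False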
On the next slide, I show the physical diagram for our network and hardware. I mentioned that we have a firewall, a Cisco ASA 5545. It has three interfaces: one external interface with a public IP address provided, for example, by the data center, and two internal interfaces, one for the administrative network, 10.30, and one for the public cloud network. So the firewall is our entry point to the cloud, and it also controls access between these two networks, the internal management network and the external cloud network. Then we have a switch, a regular 48-port Cisco switch. It has just a management address in the management network, and we configure all the VLANs on the switch to provide communication between the OpenStack nodes and between the virtual machines.

In the Supermicro chassis with 16 blades, we have three nodes for the OpenStack management services; in OpenStack the role of these nodes is called controller. And we have 11 nodes for OpenStack compute and storage services. In short, best practice for an OpenStack deployment is to separate storage nodes from compute nodes, but in our case, to save budget on the development deployment and just to evaluate whether it works, we use the OpenStack compute nodes for storage as well. Each node has just two hard drives: one configured as the drive for the OpenStack base system, and the second used specifically for the OpenStack storage service.

We also have two physical nodes for the administrative services; they are on the left side of the diagram. The interesting point is that to run these nodes we use VMware ESXi, the free-licensed hypervisor. The ESXi hypervisor can be used for free as long as you have no more than two CPUs in the hardware node, and probably without a memory limitation. Almost all the nodes have the same configuration: 128 gigabytes of memory and one CPU with eight cores. So once again, in our case the firewall controls access from the outside world to the cloud, and it also controls how any service or virtual machine can access the internet.
So in our case, we just allow access to NTP services outside of the cabinet, and probably the mail service also needs access to the internet. That's all.

[Audience question: What do those administrative nodes run? Do they administer the OpenStack system itself?]

Sure. On VMware, we have several virtual machines. One virtual machine is used for the OpenStack deployment tool: in our deployment we use Mirantis OpenStack, which has the Fuel tool, a GUI-based tool to deploy and manage OpenStack. So yes, the answer is that we spin up OpenStack from the ESXi node, and then we also run virtual machines for the DNS, NTP, and SMTP services on ESXi.

OK, let's look at the distribution of Cloud Foundry components and our backend services across the OpenStack availability zones. As I already mentioned, to be highly available at the level of services on top of the infrastructure, we agreed to have at least three availability zones. These three availability zones are groups of physical nodes in the blade chassis. If we distribute Cloud Foundry jobs across these three availability zones, we can presume that we have HA on the hardware and platform level, and then we can think about HA on the application tier. In Cloud Foundry, almost all components can be deployed with at least two or three instances.

[Audience question: The three zones look fine from my perspective, but the administrative nodes are only in one zone; isn't that a problem?]

It's not a problem, because the administrative nodes are also two physical nodes. For the virtual machines, for example the DNS server, we use the open source Linux tool called BIND; it has a master-slave configuration that lets us put the DNS records on the master server and replicate them to the slave server. For some services, we accepted that we cannot achieve a configuration with two or three virtual machine instances. In Cloud Foundry, for example, that is the database that contains all the user records and application state. We tested it: if we lose that database, we cannot push an application to Cloud Foundry, but we can still access the applications, and that's acceptable.

So for Cloud Foundry, we deploy virtual machines for application containers in all three zones. In our tests we identified that to support about 50,000 concurrent connections from devices (emulated, of course), we need around six VMs for application containers, each with 64 gigabytes of memory and, as far as I remember, 16 virtual CPUs. Another important component that has to be made highly available is the router: it holds the routing table for all applications and maintains all connections between external clients and the applications. WebSocket devices establish persistent connections, and those persistent connections of course consume memory and CPU, so the routers have to be sized accordingly.

We apply the same principle of three availability zones to Cassandra, MariaDB, and RabbitMQ. That works well, and for Cassandra we can additionally extend the cluster by combining nodes into racks ("rack" is Cassandra terminology). So in our test deployment we have, I think, six Cassandra nodes, two in each availability zone.
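For what it's worth, writing device messages into that kind of cluster looks roughly like the sketch below, using the DataStax Python driver; the contact points, keyspace, and table schema are made up for illustration and are not the real ones from the project.

    from datetime import datetime, timezone
    from cassandra.cluster import Cluster  # pip install cassandra-driver

    # Hypothetical contact points, one Cassandra node per availability zone.
    cluster = Cluster(["cassandra-az1", "cassandra-az2", "cassandra-az3"])
    session = cluster.connect("devices")   # hypothetical keyspace

    insert = session.prepare(
        "INSERT INTO messages (device_id, ts, payload) VALUES (?, ?, ?)")

    # The benchmarks mentioned earlier sustained a few thousand writes per
    # second on OpenStack VMs with many clients running a loop like this.
    session.execute(insert, ("dev-001",
                             datetime.now(timezone.utc),
                             '{"temperature": 36.6}'))
    cluster.shutdown()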
OK, let's see what is outside of the cloud; I call these the cloud resources. We have a VPN endpoint, which is our entry point into the cloud, and we have a domain name. Because this is a private cloud, running on our own hardware in a data center and accessible only through the VPN connection, we decided to use a private domain name. For simplicity I call it cloud1.cloudprovider.corp, which follows the pattern of our real deployment. The .corp part means that it is a private cloud, and the domain name does not resolve outside of a connection to the cloud. We have two DNS servers for HA. We also have NTP servers, which are optional to expose to clients. And we have the Cloud Foundry API endpoint. Customers who connect to the cloud don't actually need access to the Cloud Foundry API; they need to work with the applications published in Cloud Foundry. But when you access an application, you still hit a Cloud Foundry endpoint. So there are two addresses used to connect to the Cloud Foundry API, and two endpoints used to connect to applications published in Cloud Foundry; it can be more than two, but we have to publish at least two endpoints.

As I already mentioned, there are two main types of VPN connection. The first is provided by the Cisco AnyConnect VPN client, a small piece of software installed on the computer. It is installed automatically: you just open the VPN endpoint in the browser, and it downloads the AnyConnect client, which is available for Linux, Mac, and Windows, so there is no manual configuration. For whole networks, we set up a site-to-site VPN connection, which is a well-defined process: with the remote network we exchange a VPN template that defines all the VPN connection parameters, and then we exchange the security password used to establish the VPN connection. For every VPN connection, the recommended practice is to use a password of at least 20 characters. So it's a two-step process: exchange the template, configure the firewalls on both sides, exchange the password, and the VPN tunnel is established.

Let's see what we have from the network perspective once the VPN connections are in place. Both VPN types are represented on this slide. The purple circle on the left is the Cisco AnyConnect administrative VPN connection. For this connection, we expose two networks completely from the cloud, the public network and the internal administrative network. That allows us to manage all the physical hardware in the cloud, the virtual machines, and OpenStack itself, so any kind of administrative action. For the customer VPN connections, there are two situations: either the customer network does not overlap with our cloud network, or it potentially does. In the example on this slide, customer one has an internal network, 172.30, which overlaps with our cloud network. That's a problem when you try to access a resource in the cloud whose network address also exists inside the customer's private network. To solve this problem, there is a technology called network address translation, and there is a special RFC that defines a shared address range for it: 100.64.0.0 with, I believe, a /10 network mask, which is a range of about four million addresses. So we can translate the cloud addresses using this technology, and they are presented to the customer as addresses from this special network range. In the case of customer one, we translate the addresses from the cloud, the Cloud Foundry endpoints and the DNS servers, into this special network range.
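Here is a small sketch of that translation logic with Python's ipaddress module; the concrete addresses are just the illustrative ones from this talk, and the real translation is of course done by the firewall, not in application code.

    import ipaddress

    cloud_net = ipaddress.ip_network("172.30.0.0/24")     # cloud public network
    customer_net = ipaddress.ip_network("172.30.0.0/16")  # customer one's own range
    nat_net = ipaddress.ip_network("100.64.0.0/10")       # RFC 6598 shared space

    print("overlap, NAT needed:", cloud_net.overlaps(customer_net))   # True
    print("addresses in 100.64.0.0/10:", nat_net.num_addresses)       # 4194304

    def translate(cloud_addr: str) -> str:
        # Map a cloud address into 100.64.0.0/24, keeping the last octet
        # so the translated address stays easy to recognize.
        offset = int(ipaddress.ip_address(cloud_addr)) - int(cloud_net.network_address)
        return str(ipaddress.ip_address("100.64.0.0") + offset)

    print(translate("172.30.0.53"))   # -> 100.64.0.53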
So let's see how the network addresses are represented for the three major types of VPN connection. For AnyConnect, we expose two networks completely from the cloud, so we can manage the hardware, OpenStack, and the virtual machines; and when we connect using the Cisco AnyConnect VPN client, it automatically sets up access to the DNS servers and the Cloud Foundry endpoints, so all domain name resolution goes through the cloud in the AnyConnect case. For site-to-site VPN connections, we publish on the cloud firewall only the endpoints for DNS and for Cloud Foundry; in our case that's just four addresses. For a site-to-site VPN without address translation, these are the original addresses in 172.30, where the last number in the address is the real address of the server in the cloud. For a site-to-site VPN with address translation, we translate these addresses into 100.64, and the last number stays the same, so it's easy for us to remember and to understand what we expose from the cloud, even with NAT. One additional comment: some customers require an extra address translation on top of these addresses inside their own network, so that users inside the corporate network don't see the original cloud addresses. This is a double address translation, and it works perfectly with the Cisco equipment.

So let's see how to resolve the domain names in the case of a VPN connection. There are two approaches. One is to set up what is called DNS zone forwarding. It can be configured on the customer's DNS server, and it defines that all requests for the zone *.cloudprovider.corp should be forwarded to our cloud DNS servers. There are two cases, a connection without network address translation and a connection with it, and the configuration on the customer side is different for each. But for some customers, it turned out that they don't want to make any changes to their DNS policy. For that case, we decided to use a public domain name. Let's say the owner of this health care cloud (the cloud provider, not the customer) has the domain name cloudprovider.com. To designate that it is cloud one, we create a record, vpn-cloud1, and the addresses in it are private addresses. So the resolution process is served by a public domain name service, for example Route 53, but the addresses are private.

Finally, to work with applications when we have two domains: in Cloud Foundry we have a main domain, in our case cf.cloud1.cloudprovider.corp, and we also add a domain for the public resolution process. All these domains are shared, in the sense that they can be used by applications published in different organizations and spaces. And the final step, to provide access to an application under both domains, is to add a route for the public domain name record.
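As a quick illustration of the two resolution paths, a client could check them with dnspython roughly like this; the record names follow the illustrative naming used in this talk, and the server addresses and the api.cf host name are assumptions.

    import dns.resolver  # pip install dnspython

    # Path 1: a public DNS service (e.g. Route 53) serves the record, but the
    # address it returns is private and only reachable over the VPN.
    for rr in dns.resolver.resolve("vpn-cloud1.cloudprovider.com", "A"):
        print("public record, private address:", rr)

    # Path 2: DNS zone forwarding; the customer's resolver forwards queries
    # for the corp zone to the cloud DNS servers (here the translated
    # 100.64.x endpoints, which are illustrative).
    resolver = dns.resolver.Resolver()
    resolver.nameservers = ["100.64.0.2", "100.64.0.3"]
    answer = resolver.resolve("api.cf.cloud1.cloudprovider.corp", "A")
    print("forwarded lookup:", answer[0])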
In the last part of the presentation, I would like to briefly describe the problem of connectivity for the various device types. In our case, we have to provide connectivity for TCP devices, which establish persistent TCP connections, and for WebSocket devices, which also establish persistent connections at some point in time. We also have legacy devices that operate in a bidirectional HTTP mode, meaning the device can send a message to the cloud, and an application in the cloud should also be able to initiate a message to the device. When we have two customer networks, it is very likely that they will use the same network range, say 192.168. Routing from the customer network to the cloud is transparent and supported by the Cisco hardware, so that's no problem. But routing from the cloud to the customer network is quite complex, and when the network ranges of remote networks overlap, we have to design a way to make the pair of device and service in the cloud unique. We identified that we can do this by creating proxy services in the cloud; on this picture they are mapped by color. For the customer one network, we have a device in red, and in our OpenStack cloud we also have a proxy server in red; for one remote network, a single proxy server is enough. For the customer two network, there is a blue box with the proxy server for that network. This works because the combination of device address, proxy server, and VPN connection is unique across all connections. It is quite a simple implementation, and it works with the standard IP version 4 protocol. To solve this problem we had also evaluated some expensive solutions, such as Cisco Application Centric Infrastructure, but it turned out that they don't work as we expected. As the implementation for the proxy server, we use open source NGINX; it can run on OpenStack or on VMware ESXi, and in our case, to provide HA, we host it inside OpenStack.

The last set of comments is about migrating the platform to AWS. Our first platform was OpenStack, but then we had to migrate to AWS. In AWS, the architecture is pretty similar. We use a Virtual Private Cloud (VPC) in AWS, and for the VPC we decided right from the start to use a network that will not intersect with customer networks; in this example from a real AWS deployment, we use 100.64 as the network. It contains Cloud Foundry, all our backend services, and also the proxies for the old legacy HTTP devices; for the proxy servers, we just allocate a subnet within the VPC network. As the VPN, we use the Cisco ASAv virtual firewall, because we found that the AWS VPN Gateway, which is the standard way to connect to the cloud according to AWS recommendations, does not meet all the parameters required for our VPN connections. That's why we decided to use the Cisco ASAv. One comment about the Cisco ASAv: the price is quite high, about $2 per hour, which over a year adds up to almost the same price as the hardware unit. So this proves that our architecture can be ported between OpenStack and AWS, and it is a fairly infrastructure-agnostic architecture. There are a lot of other challenges that we saw during the project which I would not be able to explain in this short time, but I will be glad to answer your questions after the session, right now. That's all I would like to share. Thank you very much. Any questions?