It's on already, so. Hey, hello. Oh, it's working. Yeah, Muring is actually good. Just a minute. Hey, Muring is good. Just go there, Muring. Thank you, everyone, for coming here today. My name is Siddhartha Mani, as you can see there. I've been working for two years in the container industry. I started out as a senior software engineer at Rancher Labs; I'm a founding engineer there. Since I started working there, I've had the wonderful opportunity to contribute to a bunch of open source projects, including Docker and Kubernetes. Let me start off by introducing some of the work I've done. I added the syslog log driver in Docker, then a bunch of logging flags for it, and I also implemented log rotation there.

Another open source project I've contributed to, and am still contributing to, is Kubernetes, where I'm working on something called the Cloud Provider Enhancements. This is a change in Kubernetes where I'm trying to disentangle Kubernetes code from cloud-provider-specific code. The thing is, Kubernetes today is very tightly coupled with AWS, GCE, and Azure. So if I wanted to add a critical security patch for AWS, I'd have to wait until the next Kubernetes release cycle to get it into Kubernetes. It was proving really hard to be agile with this kind of model. So I started introducing this change, where all the cloud-provider-specific code runs separately from Kubernetes. After I started making this change, I realized it affects a lot of components of Kubernetes, starting with the API server, which is the central management server in Kubernetes; then the controller manager, which does all of the heavy lifting and processing in Kubernetes; and also the kubelet, which is an agent that runs on every node in Kubernetes. What the change does is add a new binary to the existing plethora of binaries in Kubernetes. If you've set it up before, you'll know that Kubernetes is hard to set up. It requires you to set up six different binaries in a specific configuration to work. And now I'm adding a seventh binary. So I was feeling bad, because I'm really making everyone's lives harder. So I figured I'd talk about it and maybe remove some of the pitfalls that people face while setting up Kubernetes.

So let's talk about what Kubernetes is. Kubernetes is a set of microservices that work together to act as a framework for running PaaS platforms. I'm sure you've already read the slide, where I've underlined "set of microservices." Just like I explained a minute ago, Kubernetes is not just one binary; it's many binaries. Kubernetes helps you run microservices, but it itself is designed as a microservice, actually a bunch of microservices. So if Kubernetes is supposed to help you run microservices, who helps you run Kubernetes? Right now, you have to do it yourself. There are tools being developed, for example kubeadm and kops, to help you do it, but they're still not GA or 1.0 yet. So you're kind of left on your own. And the second thing I want to underline in this definition is that it's a framework for running PaaS platforms, in that it's a framework and not a solution. It gives you a set of tools, but it doesn't give you a complete solution. You take Kubernetes, and then you add a bunch of stuff on top of it in order to get a complete solution.
So today, I'm going to talk about what we've done at Rancher (I work for Rancher Labs, by the way), where we set up Kubernetes for you, and we've been running it in production for various customers for more than a year now. I'm going to tell you how we have set it up, what we've learned running it for a year and a half, and what the common pitfalls are. So that's the kind of thing we'll cover: how to set it up, how to do upgrades, some of the common lessons that you might already know but that we've learned the hard way, how we chose a networking provider and how we do networking, and then how we do configuration management in Kubernetes.

So like I said, Kubernetes is actually six binaries combined together. The picture there depicts the binaries on the left and a representation of your cluster on the right. If you have a cluster of machines, you want to figure out how you're going to run Kubernetes in that cluster. You have to figure out where you're going to put these six binaries, how you're going to run them, and how you're going to configure them. The six binaries: starting with etcd, which is the persistent storage layer in Kubernetes. It's a distributed key-value store that implements the Raft consensus algorithm. And etcd is by far the most critical component when you're running Kubernetes, because it has your data. Then there is the API server. The API server is the central management server of Kubernetes. It's the gateway to etcd, and it knows how to create resources, delete them, watch them, subscribe to events, and it knows how to authenticate users, et cetera. Then there's the controller manager. The controller manager is a microservice that does all of the heavy lifting in Kubernetes. It creates services. It's like the central processing unit of Kubernetes, in a nutshell. Then there's the scheduler. As the name implies, it schedules containers on hosts; it figures out which host to put your containers on. It does a simple task. Then there's the proxy. The proxy is responsible for setting up networking on each of the hosts. Then there's the kubelet. The kubelet goes and starts containers on machines, monitors them, and collects stats. Most of the machine-related things are done by the kubelet.

So as you can see, Kubernetes orchestrates a set of microservices, but it itself is designed as a set of microservices. And you can see that it is designed such that each microservice does one particular job really, really well. So once you start understanding what these six microservices do, we can start thinking about how to break them down and how to run them in your cluster. What we do at Rancher is divide these six microservices into three different layers, or three different tiers. The first tier is just the persistent storage, with etcd. The second tier is the API server, controller manager, and scheduler. These are the three heavy lifters in the orchestration part of Kubernetes, the three services that actually manage your containers, so we put them into the orchestration tier. And finally, there's the proxy and the kubelet. They do node-specific tasks, so we call them the third tier, the node agents tier. Once you have this kind of architecture, you can start looking at how you're going to distribute these microservices in the cluster.
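To make the tier layout concrete, here is a rough sketch of how the six binaries might be started on their respective machines. This is not from the talk or from Rancher's actual setup; the host addresses and flag values are hypothetical, the flags are from the Kubernetes 1.5-era binaries, and a real deployment would add TLS and many more options.

    # Storage tier: a machine dedicated to etcd
    etcd --name etcd0 \
         --listen-client-urls http://0.0.0.0:2379 \
         --advertise-client-urls http://10.0.0.10:2379

    # Orchestration tier: API server, controller manager, scheduler
    kube-apiserver --etcd-servers=http://10.0.0.10:2379 \
                   --service-cluster-ip-range=10.43.0.0/16
    kube-controller-manager --master=http://10.0.0.11:8080
    kube-scheduler --master=http://10.0.0.11:8080

    # Node agents tier: every worker node
    kubelet --api-servers=http://10.0.0.11:8080
    kube-proxy --master=http://10.0.0.11:8080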
So if you have a cluster of machines, you definitely want to make sure etcd is safe and running. etcd is a distributed microservice, so you can run multiple instances of it. What we do is dedicate specific machines just for etcd. And similarly, we also dedicate specific machines for orchestration, so a part of the cluster is dedicated just for orchestration as well. Then the rest of the cluster is the nodes which will actually run the tasks you give it: it'll run your containers, it'll run your load balancers. So we leave all the other nodes to run the proxy and the kubelet along with your workloads. Here I've shown a representation of the division between management nodes and worker nodes in the cluster. The brightly colored ones, which run etcd and the orchestration tier, would be your management nodes, and the rest would be your worker nodes. We also do one more thing, where we make sure etcd is running on nodes separate from the orchestration tier. That is, if you want extra resiliency, where you want to completely avoid a single point of failure, we like to run etcd on its own separate nodes.

So that leads to our first lesson: we run Kubernetes, and I would really recommend that everyone run it this way, with core storage, orchestration, and workloads on separate machines, to avoid setups with single points of failure. There are a lot of blogs and a lot of people talking about how you would run it. People just divide it into master versus slave, where the master runs all of your orchestration and storage, and the slaves just run your node agents. There are some people who even run everything on just one machine. But if you want to seriously run Kubernetes in production, this kind of setup has worked for us in the past, and I would recommend it.

So now I actually have a setup. I have six nodes here, and I have dedicated the first node just for etcd. I can just show you a working cluster which is set up this way. Sorry. Clearer? Sorry, I didn't hear that. Clearer, oh. So this node just runs etcd. This is just the storage tier, and I'm not running any other Kubernetes process. In this cluster, this node, which is C-H-J-S, is simply dedicated for etcd. Then I took another node from this cluster of six machines, and I run the orchestration tier here. You can see the kube-scheduler, API server, and controller manager running here. Then I took another node, and here I don't run etcd or the orchestration; there's no etcd and there's nothing else. There's a graph for the kubelet, but. So this is my third node, which I've dedicated just for the workloads. So let me start. kubectl. The Wi-Fi is kind of slow. So I'm starting kube-proxy and the kubelet on this machine, so this becomes a worker node. Then once I have this kind of setup, I can get the IP address of the master, and I can start using this cluster to run containers. So this is N8-X-H, which is here. I just copy that, then go to my shell, and I can replace my config file to have that URL, and then I can start using it. So this is how you would edit the config file to start using the cluster, just doing a quick live edit, and it's communicating. It's gonna return that one node that we just registered. Oh, yeah.
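The config edit itself isn't visible in the transcript, but a rough equivalent of that step with kubectl would look something like this; the cluster and context names are made up, and the master address stands in for the IP copied in the demo.

    kubectl config set-cluster demo --server=http://<master-ip>:8080
    kubectl config set-context demo --cluster=demo
    kubectl config use-context demo
    kubectl get nodes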
So it's returning the other two nodes on which I haven't started the kubelet yet, so it shows the other two as not ready. It shows TPRZ as ready, and if I go back here, this is the node where I just added a kubelet. So once you divide your cluster into these three tiers, you can see that it's easy to get it up and running. Going back, all right. So I talked about the Cloud Controller Manager that I added into Kubernetes, which is an extra binary, another controller manager that only does cloud-specific stuff. The Cloud Controller Manager would also go into the orchestration tier. This is a new binary being introduced in Kubernetes 1.6, which will release March 28th. You would add that into the orchestration tier and run it along with the other orchestration services.

As for how you upgrade Kubernetes, there's only one rule: you always upgrade the master before you upgrade the slaves. The APIs are designed such that the master is expected to be upgraded before the slaves. If you do it any other way, there is a chance that your kubelet and API server go out of sync. So that is our lesson two: always upgrade the master before upgrading the slaves. Another recommendation is that you upgrade etcd before you upgrade the master, and if you are upgrading etcd, always snapshot it and back it up. But these are general guidelines, I guess.

All right, moving on. Once you set up Kubernetes, have it running, and can see the nodes like we just did, the next thing you would do is set up a networking provider. I work with a lot of sysadmins, and this seems to be a huge point of difficulty, because it requires a lot of research and information gathering and has a steep learning curve. So I looked into what different networking providers are out there, and I found there are many, many, many. In this slide I could fit in just eight, but there are a lot more out there, and most of them kind of fit everyone's needs, but there are still questions about how you pick one over the other. So for this presentation, I did a comparative analysis of a few of them, went over their different features, and created a small table which might be useful in helping you choose a networking provider. I chose the top three here, which seem to have the most noise in the community, Flannel, Calico, and Weave, and I analyzed them on seven different parameters. Those are: application isolation, that is, how well they let you isolate the different containers that you run; the networking model, which decides what kinds of protocols you can run on it; whether it comes with a DNS service or you have to set up your own; how hard it is to set up, based on whether distributed storage is required or not; encryption, which some of them support and some do not; and similarly three other parameters I've mentioned there, which are the protocol, whether it supports partially connected hosts, and performance. So if you look at this table, you can see that if you really need encryption within your cluster, you wouldn't go with Calico. So this table kind of helps you quickly decide what you want.
Similarly, if you already have etcd running and you don't mind using that etcd cluster for your networking needs as well, then you could choose Project Calico or Flannel: Calico and Flannel don't really require extra setup. But if you did not want to use your Kubernetes etcd for networking too, then you'd have to set up your own etcd for this, or some other form of distributed storage, and that's extra work. Then, if you also notice, Calico supports something called a profile schema for application isolation; the others only support CIDR. What this means is that Calico lets you configure really complicated network policy based on something called profiles. It lets you do fine-grained control over profiles: you could say allow traffic on this port only from pod X, or allow traffic on this port only from this set of nodes, or from nodes that belong to another specific profile. Then there's also a really important enterprise feature that a lot of our customers ask for, which is air-gap support. Enterprises tend to run all of their infrastructure within their own private data centers rather than on the public cloud, and when they run it that way, some of those clusters are not even connected to the internet. In such cases, Weave would be an easier setup compared to Calico or Flannel, because it can work with partially connected networks. Moving forward. So that was just a quick comparative analysis of the considerations that go into choosing a networking provider.

Finally, now I'll go into the networking model in Kubernetes. If you've already set it up, the next thing you'd want to do is run containers on it. But before you run containers, there are some concepts in Kubernetes that you need to understand. The thing is, Kubernetes is, how do I put this, just a really complicated piece of software to even work with, not just to set up, because it introduces you to a large number of concepts. I'm just gonna go over some of those concepts and some of the lessons that we learned while running Kubernetes with our customers, like how to design your applications to run with Kubernetes. I hope to answer that question at the end of this section.

Okay, so Kubernetes requires that every pod gets its own IP address. If you do not know what a pod is, a pod is a collection of containers. In the VM world, each VM would run a set of processes, and those processes were all co-located in the sense that they all ran inside the VM. A pod is analogous to that: you can run different containers, but they all run on the same machine, inside this abstraction called a pod in Kubernetes. And Kubernetes requires that each pod gets its own IP address: not each container, but each pod. In the VM world, too, the entire VM generally had one IP address per network interface. So I've shown a representation here where there's a completely connected network of different pods. Then, once Kubernetes starts working with pods (it creates pods for you), it abstracts a set of pods together as a service. Going back to the VM world: if you dedicated a bunch of VMs to a particular purpose, like running a MySQL database, the analogous set of pods in Kubernetes would correspond to a service. So these services are a collection of similar pods that work together to provide similar functionality. And Kubernetes automatically decides where your pods run.
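As a rough illustration of the pod idea, several containers sharing one IP and one machine, a minimal pod definition might look like the sketch below. The names and images are hypothetical and not from the talk's slides.

    apiVersion: v1
    kind: Pod
    metadata:
      name: web-with-sidecar
      labels:
        k8s-app: nginx-service
    spec:
      containers:
      - name: web                # first container in the pod
        image: nginx
        ports:
        - containerPort: 80
      - name: log-tailer         # second container, same pod, same IP and network namespace
        image: busybox
        command: ["sh", "-c", "tail -f /dev/null"]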
So you just tell Kubernetes, hey, run a MySQL pod for me. It'll figure out, okay, I'm gonna put it on host one and host three, without requiring operator intervention. When it does that, it automatically figures out which host has the least amount of resources being used currently, and it tries to pack in pods such that maximum resource utilization is obtained. I've heard of instances where people, after switching to containers and Kubernetes, have started using resources more efficiently and their public cloud bills have gone down significantly, like 66%, which was what I heard from one of our customers.

So going back to Kubernetes services, there are a few rules that we follow while using services. One is that it's always important to create the service abstraction in Kubernetes first, before creating the pods. What this does is it lets Kubernetes know that the pods being created belong to a particular service, and it lets Kubernetes distribute them more evenly and spread the pods out. One more thing we do is, when we run pods, we run them in an abstraction called replication controllers. This is another concept in Kubernetes which ensures that you always have a specific number of pods running. So if you always wanted at least three instances of etcd running, to make sure your storage is up, you could create a replication controller and set the replica count to three. And if one of the nodes goes down, the replication controller will detect that and start the pod on a different machine, ensuring there are always three replicas of etcd running for you. We found that it's better to create one replica first and check that it's working before creating three. The reason is that when you start three replicas at once and they're failing, you start putting load on the controller manager, which is the CPU of Kubernetes, because it keeps restarting all of them, and your events get clogged quickly, so debugging gets harder. If you start with one and it's not working, you'll get a clear event stream which says, okay, this pod is not working. You can go debug it, fix it, and then you can start scaling it up.

On the right side here, I have provided an example service file. I told you a service is an abstraction over a set of pods; a service is like an umbrella that lets you talk to several pods of the same type. The way you select pods with a service, the way you tell the service when you create it that these are the pods it covers, is through something called labels. If you look at the service definition file that is there, there is a label which says k8s-app (K8s stands for Kubernetes), so "k8s-app: nginx-service". This is a label that is set on the service. And at the bottom, there's another field called selector, which says "k8s-app = nginx-service", which is a selector for pods that have that label. So in the pod definition, you create a label like the one above, and in the service definition, you select those pods using the label below. Once you select them, the service gets a bunch of properties, which I'm gonna go into in the next slide. As soon as you create a service in Kubernetes and it starts selecting pods, it gets built-in service discovery.
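I can't reproduce the exact slide, but a minimal sketch of the service definition being described, selecting pods that carry the k8s-app: nginx-service label, might look like this; the port numbers are assumptions.

    apiVersion: v1
    kind: Service
    metadata:
      name: nginx-service
      labels:
        k8s-app: nginx-service
    spec:
      selector:
        k8s-app: nginx-service   # matches the label set on the pods
      ports:
      - port: 80
        targetPort: 80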
So if you start a new pod and put that selector label on it, saying k8s-app = nginx-service, Kubernetes will automatically recognize that this pod belongs to the service, and if you hit the service URL, the service will automatically start talking to the new pod as well. The next feature of Kubernetes services (yeah, this is my slide for sure) is that you also get automatic load balancing. That is, if you added a third pod, by default it does round-robin load balancing between the three pods that you've selected with the service. So when you start a service and it selects three pods, the first time you hit it, it'll go to the first pod; the second time, the second pod; and the third time, the third pod: simple round robin. It is possible to configure these options, too, when you set up kube-proxy, which is part of the node agents tier.

Before you set up services, one of the things we learned while working with enterprise customers was that it is important to move all of your apps to a microservice architecture before you can fully leverage Kubernetes. If you have part of your apps running within the Kubernetes cluster, using its own networking and its own private address space, and it's trying to talk to an external service, it's actually hard to configure, and you don't fully leverage all of the automation you can get just by using Kubernetes. We found that you really start seeing the benefits of containers and Kubernetes when all of your apps, including your legacy apps, move to a microservice architecture and are running as containers within the cluster. Another thing we learned is that a lot of people, when they start using Kubernetes, create host-port or node-port services. These are services that bind directly to a certain port on a node in your cluster. When you do this, you're taking on the responsibility of managing the nodes and figuring out which ports are free and which are used on each node. But that's something Kubernetes already does for you, and does really well. So unless you're running a node agent that has to listen on a specific port on every node, it is not advisable to use host-port or node-port services.

So once you've created a service that is selecting a bunch of pods, the next step is to expose it to the real world, expose it out to the internet. Kubernetes has a neat concept for that, called Ingress. And Ingress, as the name suggests, lets traffic into the cluster. On the right side, I've shown an example of an Ingress definition in Kubernetes. Kubernetes Ingress controllers are fairly advanced: they can do TLS termination or pass-through, SNI-based routing, and wildcard-based routing. So they have a lot of the features that people generally require. They are also extensible, in the sense that in your private cluster you can run your own Ingress controllers in Kubernetes. So I'm gonna go into how the Kubernetes Ingress controller works. Going back to this situation of a service selecting a bunch of pods: once you have the service up and running and you can hit it from within the cluster, and once you're ready to expose it to the external world, you would create something called an Ingress object.
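Again as a sketch rather than the actual slide, an Ingress object of the kind being described, pointing at the nginx service on port 80, might look roughly like this. The host name is made up, and the API group shown is the one from the Kubernetes 1.5/1.6 era.

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: nginx-ingress
    spec:
      rules:
      - host: www.example.com          # hypothetical external host
        http:
          paths:
          - path: /
            backend:
              serviceName: nginx-service
              servicePort: 80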
A Kubernetes Ingress object is the file I showed you on the right side of the previous slide. At the bottom, it selects a service again, saying the service name is nginx-service, and it selects a port. What happens when you create this object in Kubernetes is that a controller starts doing the work: it goes and talks to an external load balancer, or a load balancer of your choice (you can configure that), and it configures that load balancer to accept traffic on a certain IP address and port 80, because we mentioned port 80, and to redirect that traffic to the service. Now, the service automatically does load balancing and service discovery for you, like I showed you earlier. So an Ingress controller kind of acts like a service, but for the external world. When you talk to the URL returned by the Ingress controller from the external world, you end up hitting one of those pods.

Using the concept of a service, it is really easy to do zero-downtime upgrades. A lot of our customers ask for either rolling upgrades or blue-green upgrades. If you haven't heard of those concepts: a rolling upgrade is where, let's say, you have three pods of a particular type, like MySQL, and you want to upgrade from MySQL 5 to MySQL 6. If you shut down all three pods and then started a new set of three pods, in the time between the stop and the start you've got downtime; you can't serve traffic because MySQL is not running. A rolling upgrade is where, instead of shutting down all three at once, you shut down one and start a new one, then shut down the second and start a new one, then shut down the third and start another new one. That is a rolling upgrade. But there is certain software where there can be version incompatibility, and it's possible that in the middle of a rolling upgrade, you hit the old version of MySQL the first time and the new version the second time. Sometimes there can be API incompatibility, and in that case it's as good as downtime.

So there is this concept called blue-green upgrades. Let's say you have your Ingress controller serving this Ghost pod, like I've shown here. What you can do is start new replicas, say two replicas of Ghost v2. So you'll have two new Ghost replicas, but they're not selected by any service or Ingress controller at this point. You can label the old pods as green and the new pods as blue, and then just switch the selectors in the service. The Ingress controller remains the same, the new pods are already running, and all I have to do is an atomic switch of the service selectors. As soon as the selector gets switched, the pods are already running, so if someone is hitting your site while the switch is happening, they'll hit either the previous version or the new version, but there won't be the case where they hit v0 first and v1 next. So a blue-green upgrade in Kubernetes is where you just update the service selectors that I showed you earlier, say "select the new set of pods now," and it upgrades.
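A minimal sketch of that atomic selector switch, assuming the pods carry hypothetical labels like k8s-app: ghost plus version: green or version: blue, could be a one-line patch of the service; the service name here is made up.

    # switch the service from the green (old) pods to the blue (new) pods in one atomic update
    kubectl patch service ghost-service \
      -p '{"spec":{"selector":{"k8s-app":"ghost","version":"blue"}}}'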
While doing a lot of blue-green upgrades for our customers, we learned that it's important to use labels with semantic meaning behind them. This is the same as basic programming practice, where you don't use variables named A, B, or C, but give them actual names, like "var testfile = openFile(...)". That basic principle applies here too. It's really easy to get confused with YAML files. So after a lot of pain and trouble, we really recommend that when you use service selectors, you use names that are identifiable and have meaning behind them.

So that was an overview of some of the networking concepts in Kubernetes. The way I structured this presentation was to go over most of the concerns that our customers have brought up, and along the way talk about the lessons we've learned. The third topic that I found people really being concerned about was config management and secret management. Kubernetes does built-in config and secret management too, just like a lot of other features it has. It is really good at holding onto a small amount of sensitive data. There are objects in Kubernetes called secrets; no surprise there. The way secrets work is that you create a Kubernetes resource just like the one shown on the right-hand side, and you can provide it values; you can put in any values you want. There are really no constraints on that: as long as you can serialize it into a string representation, you can put it in there. So you create the secret resource, and the secret resource gets stored in the Kubernetes persistent store, which is etcd. It's stored there in plain text. Then, when you want to use it, you can refer to the secret directly from a pod. When you refer to it from the pod, it gets mounted on the node: it goes from etcd to the node, coordinated by Kubernetes, and on the way to the node it is encrypted by just a TLS connection. Once it reaches the node, the kubelet creates a tmpfs file system and mounts the secret only into the container that is using it. When it does that, you can access it from the container as if it's a local file.

There are two different ways to create secrets in Kubernetes: you can create one from a file, or you can just specify a literal. You can pick any file from your file system and say "create secret from file X," and it'll create an object and put it in etcd for you. Then there are many ways to use secrets in a pod. The most obvious, easy way is that whatever file you put in there can be projected into any pod: all you have to do in your pod definition is say "secret = X" and give a path you want to mount it at, and it'll mount it for you. If you noticed, in the secret definition here there is a data field with a username and a password. The keys of that map, username and password, are keys in Kubernetes too. So in your pod definition, you could say, only put the key "username" in my container, and not the key "password." So you can project only certain keys into the container. It's also designed such that the secret is only visible to the one container in the pod that uses it; it's not visible to all the containers in the pod, even though they're running together in a shared environment. You can easily specify a secret as an environment variable, too.
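As a rough sketch of the two creation paths and the volume mount being described, with a hypothetical secret name, keys, and mount path:

    # create a secret from literals
    kubectl create secret generic db-creds \
      --from-literal=username=admin --from-literal=password=changeme

    # or create it from a file
    kubectl create secret generic db-creds --from-file=./password.txt

    # fragment of a pod spec that mounts it into one container
    spec:
      containers:
      - name: app
        image: nginx
        volumeMounts:
        - name: creds
          mountPath: /etc/creds
          readOnly: true
      volumes:
      - name: creds
        secret:
          secretName: db-creds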
And after specifying it in these ways, you can also control the file modes on it. So you can say it's only accessible by the user with user ID zero, and others can't use it. And the last really slick feature of secrets that most people don't know about is that if you update a secret after it's been mounted, Kubernetes automatically updates it in the pod, so the mounted file system automatically gets the new data.

Once you have secrets up and running for your sensitive data, you might also want to do non-sensitive configuration management. Kubernetes has another feature for that, which is config maps. So you don't have to maintain a lot of configuration data manually; you don't have to have a flat file system where you maintain your config files. It is really easy to use Kubernetes config management, and it is really good at it, so we recommend that you use it rather than doing it yourself. It is very similar to secrets in the way you create and use it. It provides all of the features that secrets provide, except it's meant for non-sensitive information; the distinction is purely semantic, and the features and the implementation internally are the same. So you would create it and use it the same way. And these are three different ways in which you can use Kubernetes config maps.

So that concludes my presentation. I hope I was able to help you in some way today. Thank you. It's gonna be posted afterwards on the SCALE website. Can you come up front if you have questions? I'll come and give you the mic.

I think I have a different question. So one of the lessons you said you learned was not to use node port or host port unless absolutely necessary. Just wondering when it would be absolutely necessary. And I guess one example I'm thinking of is a high-bandwidth application, maybe gaming or video streaming, where you want to avoid the bottleneck of sending everybody through a load balancer. So his question was what kind of applications would be appropriate for node-port or host-port services. If you wanted a service to always bind to the same port, and you wanted to reserve that port for that service, you would use a node-port service. If you want to avoid the extra encapsulation that comes from data going through two network stacks, the container's and the host's, you could choose a different network provider rather than using a node-port service.

Is there a software program, or have you done it yourself, where you can create those passwords dynamically instead of just putting them in text files? Okay, so his question was, is there a program that lets you easily set up secrets, that I might have written. I haven't written anything like that, and I don't know of any. The API is easily automatable; you could use simple shell scripts to automate it. I would just do that to begin with, but if I had a huge need where shell scripts just don't scale, then I might have to sit down and write one myself. So that's the answer. Any other questions?

In the networking section, you said that for Weave as well as for Flannel, you saw performance near native with VXLAN. Were you doing some kind of offload, or was it just a plain software install? Yeah, it was a simple type of test. More questions? Nobody else has one. I'll just grill you. So you mentioned the secrets YAML file and that's great, but where do you store that? You're not gonna wanna check that into Git or anything, so.
So his question was: there's a secret YAML file that you can create, where the secret comes from a file, so where do you store that file? Well, you mentioned Git as an option. So Git is definitely an option, sure. What that gives you is versioning of the secret files, and I don't actually have a better solution. Yeah, so, okay, you're saying that you don't wanna check in a password because it's stored in plain text in etcd. A lot of people have raised this concern, and what Kubernetes says about it is that etcd shouldn't be accessible by any of the nodes other than the orchestration tier. So only if you specifically ask the orchestration tier, through a pod's secret request, can you use the secret. Otherwise, you shouldn't be able to use it, because there's simply no network route to etcd. So is the question, where would you save the YAML file? Is that the question? You don't have to use a YAML file; you can directly create a secret from a file or from a literal. In that case, you're not saving anything on disk or in Git, so that's an option. Any other questions? All right.

I don't know if we're out of time or not, but a bunch of your examples had MySQL in there, and one of the lessons you said you learned was to port all legacy apps. I would assume you're not condoning running MySQL in Kubernetes, or I would assume the data store you'd still want to live outside of the cluster. Is that not the case? So, it's easy to work with a data store that's outside the cluster if you're just referring to it through a URL. But then you have to manage that too, and you'd have two separate ways of managing things: managing MySQL versus managing your Kubernetes apps. You won't get the automation advantages that you get by running it as a container if you still run MySQL the legacy way. Are there any other questions? I guess not.

That was so good, oh my gosh. No, it wasn't, yeah. I'm up next speaking. Oh, hello, nice to meet you. Thank you. This is first off. I know, I'm just joking. Because it was over here. Tell him that. No, no, no, I will. That's why I'm just waiting for him to be like, could they help? Yeah, it takes time to come off that cloud. Yeah. Let's come on up. Are you gonna go get beer? Yeah, babe, let's see what he's gonna do. Let's see what he's gonna do. Okay. Do you guys have more bottled water by any chance? Because my throat's getting dry. But this thing kind of spits on the beard if you have a beard. Then let's, I think I have a beard. Check, check. Yeah, no, I was here for that. The first few were. Yeah. Well, the other one was also, it was a good time. Yeah. It was good material. Well, I'm sorry if I disappoint you on that score, but we'll see. You'll suffer through it. You tell him that. How you been? Yeah, I know. Damn you. I'm working with Mike's son. You guys, they're looking at you, but you guys are on the news. They have them on the news, they're on the news. They're really, really, so awesome. And also to you, and I've got a file on there. Yeah, I got some good stuff. Whatever it is, it can't be hardware, right? It can't be hardware, right? Whatever it is. I mean, I know this hallway is somewhere. I'm saying to all of you guys on memory, I guess I'll say this. That's good. That's correct. Thank you for coming by. Yeah. Dude, I'm sorry, man. And to think that we can do the first talk with you, I really am sorry.
Yeah, a couple minutes, depends on how steady the stream of people is. If it's full, then I'll start. Usually we've had technical difficulties, so I was a couple minutes late. Yeah. You got here super early. Very proactive of you. You work at Ticketmaster? Yeah. Oh, very cool. We used to work with Ticketmaster. I don't know what happened. I don't work on that side of the business. We do staffing. Yeah. They're all here to see you. Yeah, I love you. I actually had one of my meet-ups at Ticketmaster a few years back. There was a recruiter there. Yeah, there was a recruiter there. Bethany or Heather or something. I'm great with names. Paul Blanc. Been there for a long time. Yeah. So I can't get an internal job as a recruiter because I'm not Paul Blanc. Yeah. Probably they're not using agencies. Yeah. Especially depending on the agency. That's not how we usually work. But yeah, there are agencies that just have a bunch of crap people ready to go. We do the filtering for you and then send a couple that are really good. But it takes us a couple weeks usually, you know. Yeah, that's why I picked it. It's always a really tough choice to figure out which one to choose. You don't want to toot your own horn. There's so many. I know. Me too. I think you know that person too. You're short. They're coming, they're coming, they're coming. There's a lot of seats. They're just in the middle. They're all in the middle, yeah. We've got more people coming in, so if everyone can move towards the center of the row so everyone can get a seat, that'd be awesome, thanks. She speaks. She speaks. You see Omar? Yeah, he's in the back. That's Omar. You guys have an IPA? That's Kevin. That's Kevin. Marketing expense. It's a great way to meet people. Oh yeah. Of course, we're running out of cans. That would do. Spassmanics. There were some cops in the meeting. Yeah, they thought there was a different one. Also, there's got to be a different one. And I was like, wow. We really made an impression. No, Spassmanics. We do like that. Comp-site location is good.

Welcome, everybody. I'm Carolyn. I'm from Q. We are the room sponsors. We're an IT recruiting firm here in LA. We specialize in connecting talent to our top clients in the technology industry. Let me know if we can help you out with your job search or your hiring needs. We are here to see Chris Smith. He's going to talk about doing it live: machine learning at scale, and in an instant.

Hello there, folks. Yeah, talking about doing it live. I kind of love Bill O'Reilly. Thank you so much for giving me that line. Yeah, we're talking about what we do at Ticketmaster that involves using machine learning, but doing it in perhaps a more real-time fashion than you may do it yourself. So yes, my name is Chris Smith. As mentioned, I'm VP of engineering and data science at Ticketmaster. One of the core teams that I manage is the fan graph team, which is responsible for providing a single view of the fan across all of Ticketmaster's business. So every way that you engage with Ticketmaster, we want to have all that information in one place. We'll get to the reasons why that makes sense, but it seems pretty obvious, right? So I wanted to share one brief thing that one of my teams does that actually is visible to you folks, because in the machine learning business, a lot of what you do goes on behind closed doors and nobody really sees it or can touch it.
But if you look at the lower left corner of that screenshot there, you'll see a thing that says "be in the know": get a text if more tickets become available. If you sign up there and give your phone number, we will apply a lot of machine learning techniques to make sure that the shows you really want to go to, you get notified about, and you can potentially even purchase tickets without having to go to the website, just purchase them over SMS, with no risk of third parties getting in the way of your tickets. So check it out sometime if you're interested in going to a show.

So if you're not familiar with Ticketmaster. Has anyone here not heard of Ticketmaster? Yeah, okay. That's one of the great things about working at Ticketmaster: I no longer have to explain to my parents what I do for a living. But really, if you get down to it, and this is a great phrase that one of my colleagues dropped the other day that I thought really rang true: we're in the business of selling fun at scale. We're the ones that help you get access to whatever kind of live fun you are looking for, be it a sports game, a concert, a comedy show; we help you get there. And it's a fun job, first of all because everybody's very excited when you do it right and a little upset when you do it wrong, but also because there are a lot of interesting challenges that come from this particular problem of trying to marry people up with the things that they love.

To give you an example: we sell out stadiums, and that seems like a nice little thing, but we sell out stadiums in minutes, and sometimes several stadiums at the same time, multiple times, in minutes. And that creates a little bit of excitement. If any of you have looked at activity graphs at a typical e-commerce site, you're used to seeing these shifts between the normal traffic and the peak traffic that teach you that average traffic really doesn't mean the same thing as peak traffic, because usually peak is like 2x. This is what it looks like at Ticketmaster. You may notice the one on the lower left shows the unique visitors per hour, and that spike doesn't look as sharp as it otherwise might because it's bucketed by hour and lands in the middle of the hour. In reality, that kind of spike effectively shows up in the space of a minute or two sometimes. In the top right, you see all of the requests coming in through our load balancers around one of these moments, and again you can see that pretty massive spike at much finer grain, so it gives you an idea of just how gigantic the spike is. And what makes this particularly troubling is that it isn't all nice people showing up at our doorstep. It's not all people with good intentions who just want to get to a show, and this is a graph that overlays that and shows you a little bit of the underlying reality. The reds are people that we were able to identify right from the get-go as not being up to any good, just visiting the system in order to disrupt it. Then there are people that we had enough reason to believe were causing trouble that it made sense to at least slow them down a little bit. And the little bit at the top there, those are the people who actually wanted to go to the shows. So this answers the question of why you need to apply some data science.
This is one area where we apply data science, not the only one, but it is one of the easiest to understand, and it gives you an idea of why we might want to do this in real time: by the time you build a machine learning model, even if you do it in a completely automated, DevOps-y fashion, the whole show is over if you build that model in a traditional batch fashion. So we have to learn in real time. And this is additionally complicated because our adversaries in these cases are not just random script kiddies; they have really strong intent to disrupt the system. Their strategy is to create an arbitrage opportunity by overwhelming the system, so they are intelligent and constantly adjusting, which means the model that works well for you today might not work well for you 15 minutes from now. So our model has to be constantly evolving and learning in order to prevent everybody from not being able to get their tickets. We know how that plays out: people are not happy.

So now you've got the larger context for Ticketmaster, and we can go back to talking about why there's an interest in having a single view of the fan. What it's really about is captured by this moment. There's a whole bunch of people having a great time. For somebody, this is a great night. For some other people, if they were in there, it's probably a bad night: this is not the band they want to see and listen to; this is not fun for them. And it'd be sad to mix people up and put the wrong people at the wrong show. It would make everybody disappointed. The band wouldn't like it. The audience wouldn't like it. Whoever paid for the tickets wouldn't like it. Nobody would be happy. So we really want to be able to know who our fan is, know what their needs are, and help marry them up, get them into those experiences, and, if possible, make those experiences even better than they expected.

Here's the fun part, though. How do you get an understanding of people? You take advantage of all the data that's out there. The problem is that means you've got a lot of different functions, all of which need data, and if you do little point-to-point interconnects between all the different pieces that want to talk to each other and all the different places where you need to collate data and pull it together, you end up with what I would best describe as a web of mess: data systems that you really can't manage very effectively. You get some other fun things here, because everything is cross-interconnected; there are other things that pop up between these systems that are incredibly complicated and difficult to manage. From an operational standpoint, it becomes quite a bit of fun, even ignoring the fact that every one of those gray lines could potentially represent an additional development effort to get the code connected. We don't want to do this. And one other important detail: this is a very high-level diagram. There may be a lot more lines, and it's also only for one ticketing system. We've got a lot of different ticketing systems, because we operate in a lot of different countries, and we've got a few through acquisitions as well. We have over 20 ticketing systems. Now try to imagine this overlaid with 20 other ones, in three dimensions, all with the same kinds of lines between them, and now you can't manage anything.
This is the problem that we were confronted with: how do we really understand our fans without making such a complicated set of gobbledygook for hooking it all together that it would literally collapse under the weight of its own overhead? So we developed a few core principles for how we wanted to approach solving this problem.

The biggest one was that the source systems own the raw data. So if you buy a ticket, wherever the tickets are managed, the inventory system that manages the tickets is the source of information about inventory for those tickets. Those systems have only one job, which is to publish their data, and we want that job to be as simple as humanly possible, the least amount of development effort, because otherwise it won't get done. If you make it at all difficult, it doesn't happen, because most of the value that comes from publishing that data is not actually to the team that's working on that source system; they already have the data and can already work with it for their needs. Publishing that data is mostly a benefit to everybody else in the ecosystem, so you want to make it really low effort so that it really does get done.

The other thing is that when you've got other systems consuming data from a source, you never want to risk disrupting that source. You can have a system where a thousand different machines are all trying to pull the same information from one database, and that starts to create a lot of load on the database, which is not at all helpful, and it creates this complicated relationship where every source system, every time it takes on a new consumer, is taking on an additional amount of risk. And if it's an operational system managing, I don't know, say an on-sale for Adele, you don't exactly want to risk that thing going over just because somebody wanted to run a data science experiment and model the transaction processing and the visits for it. So we made sure that there's a buffer between the consumers and the producers of the data to protect against that.

We also wanted, as much as possible, to process data one record, one distinct action, at a time, in a very stateless, almost functional-programming-like fashion. The reasons to do that are manifold, and we'll see some examples going forward, but a couple of pretty obvious things come from it. If you're processing only one record at a time, you don't need a huge amount of data in hand at once, you don't need massive disks, and you don't have to worry as much about disk failures; your fault tolerance and failure scenarios become a lot simpler, and your code in general becomes a lot simpler. Now, certain problems do not fit that model, and when that happens, we go the other way, but we try very hard to fit that kind of framework. And in so doing, it's amazing how many things you can get done in that fashion if you actually try, as opposed to just throwing stuff in a database and hoping the query will solve it for you.
The other key thing is that data is available everywhere. If you've ever worked in data science, one of the horrors of it is that you spend all this time learning about math, programming languages, data structures, and algorithms, and then you get on the job and find out that actually most of the job is schlepping data from point A to point B, and if you're really unfortunate, which is most of the time, it's also just figuring out how to interpret and parse that data. It's a colossal waste of time, it doesn't line up with any of your training, so you do a terrible job of it, and you end up with a system that doesn't solve the problem, and you don't know why, because this is not the kind of problem you were trained to solve. So the idea was: if data is published, it should be easy to get to from anywhere, from any system. It shouldn't be that I have to ask permission to get to it; it's just there. Obviously there's PII-type stuff where privacy is a concern, and that information is protected, but for anything that isn't of that nature, we don't want some arbitrary division of labor between one ticketing system and another, between the inventory system and the search system and all these different pieces. We want them all to be able to understand our fans.

And that data is always archived. This is really important if you're in data science, because you want to be able to reproduce your experimental outcomes. Again, it's one of those things they don't train you on, how to archive data, when you're learning your math or learning how to do algorithms and data structures. So if you do it yourself, you do a terrible job of it, and you have that one moment where someone says, hey, let's test if the backups are working, and that will invariably be a fail if it's the first time you've done it. So again, we make sure that that process is handled automatically and in a robust fashion, so that the data is always available to go back to. You can say, hey, that experiment we ran last year, and you can actually grab the data for it, rerun it, and verify that the findings from then apply to now.

And then the last part is a little bit interesting and a bit of a tough concept for some people, because it flies in the face of traditional models: allowing each project to organize the data as it needs. What that effectively means is that each project can potentially have its own database for a given data set. So if you have, say, an inventory system like we were talking about before, which shows all the tickets that are available and which ones are available at what price, et cetera, there's a tendency to say, well, that should be the one source of truth about what tickets are available where. But then you have a problem: any place you can go to get that information, first of all, becomes a scaling pain point, but secondly, it becomes one system that has to serve 20 different masters. There's the actual purchasing process that just wants to record who bought what ticket, but then you've got marketing systems that want to figure out trends across the entire inventory that's out there, and it's really hard to build one thing that does everything. So instead, don't bother trying to do that. You can make everything very easy for everybody if you say, hey, I'm going to give you the data, and you can organize it how you need for your particular problem, and it's fine that nobody else can use it, because I'd rather you have the data the way you need it and not have you spend a lot of time working around some generic interface that wasn't intended
to help you do your job, and I'd rather you go through that process there than, you know, fight that metaphor. So now let's talk tech, right? We've made some choices about how we've been approaching this, and they've been playing out pretty well so far.

The one that I always like to put at the top is using Kafka for our data transit. We'll talk a little bit about the specifics of that component, but it's the way we make sure that once a source system has published data, it's available to everybody. Now, we have other related technologies that we use for specific circumstances, because no matter how hard you try to have a core technology, there are always interesting alternatives that are useful in certain circumstances, but it's fair to say that wherever possible, our preference is to push the data into Kafka first and then work from there.

Another great tool that we use is called CKAN, and we use that for what I call data discovery. For example, say you don't know what data sets are out there; say you came to work for Ticketmaster starting tomorrow, and you walk in knowing nothing about what data is anywhere, and you want to actually accomplish something on your first day because you're ambitious. It would be nice if someone just told you, here's the data set you need. But what would be even nicer is if you didn't need to ask anybody, if you could just go and say, I want to find all of the schemas out there that have tickets and have a price on them, so that I can learn about ticket prices and the overall structures for them. It essentially lets you search on the metadata of the data, so you can find out what data you want to look at. Because it seems like no matter how long you work at a company, there's always that one system you don't know about, that one data source that changes your whole perspective on things, and this is a way to make that discovery happen organically for people.

We've also tried as much as possible to standardize on Avro for serialization, and most importantly, extensible serialization. The biggest mistake you see people make with serialization is a format that works great for the data as it is today. But data is constantly evolving and changing, just as much as systems are. If you don't have a plan for how your data is going to evolve in a way that provides backwards and forwards compatibility, the ability for a system from five years ago to work with the data from today, or a system from today to work with the data from five years ago, you're going to find yourself extremely limited. There are a number of other technologies in this space; they're all designed to be more efficient than, say, JSON, particularly for a machine to process, which, again, at the levels of data we're talking about, matters, because no human is ever going to look at this data, let's not kid ourselves. So the human readability thing that people get all excited about with data: complete waste of time. You can worry about human-readable data after you read the first billion rows of this, then we'll talk. In the meantime, I'm going to let the data be in a format that the machine reads very efficiently, that's compact, that doesn't waste space. And having a schema for it is really critical, because that way you can have code that isn't tightly coupled to the source of the data, that can work with it without making false assumptions about the nature or structure of the data. So you don't get errors because you thought this thing was an integer but it turns out it's actually a string, even though, 10 billion records in, they're all numbers,
but record ten billion and one says Popeye, and suddenly your code blows up. This avoids that problem. We also use a tool called Secor for archiving. Remember I was saying we make sure all this data is archived for all time so you don't lose it; that's the tool we use for it, and it's actually a tool from Pinterest. Then we use Storm and Trident for what I would describe as complex lambda processing. This is actually the heart of the coding part of what we do, and if you're not familiar with Storm and Trident we'll get into a little bit of detail about it, but it lets you work with data in that functional-programming-like fashion while orchestrating very large distributed systems that do that kind of processing, without getting confused in your own code and without accidentally introducing all kinds of problems. It provides a lot of the plumbing for how to glue everything together and execute in a principled fashion.
Another popular tool for us is one called Vowpal Wabbit. That's actually where the machine learning comes in; everything else on the slide is technically just big data, not really data science at all. And again, remember I said that schlepping the data around is most of the job; that's still true even if you do it right. But Vowpal Wabbit is this fantastic tool, and we'll talk a little bit about it. It allows you to do machine learning very, very quickly, executing very fast. That's the trick we use to get millisecond-level responses from machine learning algorithms.
And then lastly, Elasticsearch. There are a lot of ways you can store data and a lot of different engines out there, and we use dozens of them, but it's handy to have a default option that you go to for working with your data. Elasticsearch is a very flexible document-oriented store that is a pretty well-maintained product, and I think the big thing is it gives you a lot of flexibility to work with your data in an ad hoc fashion, even more so than a traditional RDBMS. So it's very helpful for those early exploration cases where you don't really know exactly what you want to do. Sometimes it turns out Elasticsearch is also the right tool for exactly what you want to do; but if it turns out to be wrong, remember the model we're talking about: you can always render the data in a way that's particularly convenient for your needs, and that's a perfectly reasonable and expected solution. Once you understand what you want to do, you spin up whatever you want, DynamoDB, a Postgres database, whatever makes sense for your particular problem, but we like to default to Elasticsearch.
So let me break this down and show you some of these tools. This is CKAN here, and this screenshot is actually from the data.gov site, which is a fantastic place to go for any of you who are even remotely interested in data science. There's just an absurd amount of data sets there, very high quality data, very well organized, with lots of information on how to interpret it, and the backing store for its catalog is CKAN. As you can see here, they've got a hundred and ninety-five thousand different data sets, and it provides a way to help you navigate to the ones you care about, gives you all kinds of information about them, and lets you either download the data sets or shows you how to access them if they're not something that can just be downloaded as a file. There's documentation about each of the data sets, and you can do very helpful keyword searches on them.
And we use this internally; it works basically the same in-house as it does out of house. And this is Kafka. For those of you who aren't familiar with Kafka, you can sort of think of it as an infinitely scalable pub-sub queue, but there are a couple of really interesting properties about how it operates, which is why I wanted to show this slide. To make it infinitely scalable, you want to be able to split the load of the queue, to split the queue itself, which in this case they call a topic, across many machines. That's nice, that sounds great, but there are problems. If I just randomly fired off every record I received to a different machine, then first of all, when you want to consume the data, you wouldn't know where to go to look for the data you're after. You also wouldn't really have any guarantee about the order of events when reading the data; it would be pretty much non-existent. If you read a record from one machine, the only records you know for sure came before it are the ones you already read from that machine; if you go to any other machine, you can't be sure. You can try to use a really carefully managed clock, but anybody who's done software development for a long time will tell you that clocks lie; that's where you get bugs in your code, by pretending you understand how time works, so don't rely on time. So what Kafka does instead is that with every record you publish, you provide a key, and that key defines which of those shards, which of those machines, is going to receive the message. In so doing, you can ensure that all messages with a given key will always go to the same machine, which also means they will be stored in order, at least relative to all the other records that carried that same key. This allows you, for example, if you split up records based on someone's username, to go to exactly one machine, consume the records, and see everything you know about that user in the order it happened, in the order you discovered it. And it really wouldn't be hard to write the code for it; it would just be read, process, read, process, the easiest thing in the world, right?
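To make that keying idea concrete, here is a minimal sketch of publishing keyed records with the standard Java Kafka producer client. The topic name, key, values, and broker address are illustrative assumptions of mine, not anything from the talk; the point is only that records sharing a key hash to the same partition and therefore stay in order relative to each other.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyedPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Both records carry the same key ("fan-123"), so Kafka routes them
            // to the same partition and they are stored in publish order.
            producer.send(new ProducerRecord<>("page-views", "fan-123", "viewed /event/42"));
            producer.send(new ProducerRecord<>("page-views", "fan-123", "added ticket to cart"));
        }
    }
}
```

A consumer that cares about one user can then read a single partition front to back: read, process, read, process, exactly as described above.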
So now we've got a way to scale up really fast; we can literally just keep throwing machines at it as the data comes in, and it's a queue, super simple. Except wait: every queue I've ever used before has scaling problems out the yin-yang when things start to fall behind, and remember, we talked about the fact that we didn't want consumers to be able to impact the producers. What happens is some consumer falls behind, so you've got to keep all its data in the queue and store it, and it swaps out of memory onto disk. Eventually that consumer wakes up and says, finally, I'm ready now. It's all on disk, it starts reading it, the system starts swapping, and then every other consumer starts having problems because their data is getting paged out. Pretty soon, with all the swapping happening in the queue, the producer starts having problems writing data, and before you know it the whole thing has been knocked over. You get this lovely experience where the producer, who is innocently providing data and doing their job well, gets knocked over just because of someone's buggy code, one of ten thousand different people who've written code to consume that data. It's just not fair. And the worst part is, the better the system is and the more valuable the data you're providing, the more likely that is to happen to you. How is that fair? It's not.
So Kafka helps with this, because first of all the data in each of those shards is not stored forever, it's stored only transiently, and the notion of where you are in that queue is really your job to track. They provide some services to make it easy to track where you are, but ultimately it's up to the consumer, so there's no notion of storing all the records just for this one consumer that's slowing down. You just have a fixed amount of space, or a fixed amount of time if you want to do it that way, but a fixed amount of space is the safest. You say, I'm going to store the last 40 gigs of data in this topic, and every time I get a new piece of data I'm going to delete a corresponding amount of data from 40 gigs ago, so I will always have 40 gigs. And if you can't keep up with me within those 40 gigs, I'm really sorry, but you suck; you don't get to keep the data, and that's your problem.
Kafka also has a bunch of other nice attributes in terms of high availability. It's purely peer-based, there's no master node; the brokers effectively self-organize and choose who is going to store which bits of data between them, and of course there's a huge amount of redundancy as well. Kafka tries very hard to make sure there are multiple copies, so you can lose many, many different nodes and never even know it. And that's the important part: we can't make a system that never has failures, computers break, it's just the nature of the beast, but if there's one thing I don't want to worry about, it's whether I'm getting all the data, whether the transit is reliable, whether the data is getting corrupted in flight. That's a whole bunch of bad code pathways and special corner cases I just don't want to think about. And Kafka has been battle-tested by a lot of large organizations, most famously probably LinkedIn, but all kinds of different places, and they've put it through the wringer. In fact they've tweaked it a lot as a result; there have been significant improvements in Kafka from even a few years ago because of what they learned. Sorry to go on about Kafka, but it's a key piece of the plumbing, so I wanted to make sure we went through it in detail.
And here's an example of Avro, an Avro schema. This is a pretty detailed one, because it's about collecting behavioral data on our users: every time you visit a web page on the Ticketmaster site we populate this structure with information about what your experience on that page was like. As you can see, it provides a very detailed representation of what's going on, all of it with very clear descriptions, but it's also very compact. You can't tell that from here, but even though this document is huge, the actual record size can be a matter of 20 to 30 bytes, potentially, depending on what data is there. Well, in this case it's more like 100 bytes typically, but that's because you're only seeing about a quarter of the schema here. And although the schema is defined in JSON, the underlying encoding is a pure binary format, so it reads very efficiently. You can also add new fields, and when you do, it compares the schema you currently have with the prior schema, and assuming you're following the appropriate rules about how you extend your schemas, it will automatically determine how to map the old records into the new space without you having to do any work.
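As a concrete illustration of that forward and backward compatibility, here is a minimal Java sketch using Apache Avro's generic API. The record and field names are invented for the example, not the actual Ticketmaster schema, but the mechanism is Avro's standard schema resolution: write with an old schema, read with a new one that added a defaulted field, and the reader fills in the default with no hand-written migration code.

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.*;
import org.apache.avro.io.*;

public class SchemaEvolutionDemo {
    public static void main(String[] args) throws Exception {
        // v1: the schema the producer was built against (hypothetical fields).
        Schema v1 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"PageView\",\"fields\":[" +
            "{\"name\":\"userId\",\"type\":\"string\"}]}");
        // v2: a later schema that added a field with a default value.
        Schema v2 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"PageView\",\"fields\":[" +
            "{\"name\":\"userId\",\"type\":\"string\"}," +
            "{\"name\":\"durationMs\",\"type\":\"long\",\"default\":0}]}");

        // Serialize a record with the old (v1) schema.
        GenericRecord oldRecord = new GenericData.Record(v1);
        oldRecord.put("userId", "fan-123");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(v1).write(oldRecord, enc);
        enc.flush();

        // Deserialize with the new (v2) schema: writer schema v1, reader schema v2.
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(v1, v2);
        GenericRecord upgraded = reader.read(null,
            DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
        System.out.println(upgraded); // durationMs comes back as the default, 0
    }
}
```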
So then, I mentioned Secor. It's an easy-to-scale system that takes advantage of the way Kafka shards data up by topic and lets you archive that data for posterity in big chunks of time, so you end up with a file somewhere that is, say, all of my data for 12 o'clock yesterday, in one convenient file that I can go grab. It can be sharded if you need it to be, but often you just want to aggregate it in chunks if the topic is small enough. It also supports a lot of different file formats that are designed to be particularly efficient for machine learning tasks, like sequence files, Parquet, and even the dreaded CSV. It does compression automatically on the data for you, and we use it primarily to store this data in S3, where it's so cheap that you really have to ask yourself why you would ever delete it. By that I mean the process of thinking about deleting the data and then actually executing on deleting it might be more expensive than just keeping it. Once you start thinking in those terms you realize: somewhere in those terabytes there's maybe ten cents' worth of value, and if I have to spend a hundred dollars to delete it, and in the process lose that ten cents of value, that's incredibly stupid. I could have done nothing and still had the ten cents. I'd rather do that.
So now let's talk a little bit about Storm and Trident. This is the part that might take a little time to explain, and I tried to find a simple example of using it. The key thing about Storm and Trident is that it lets you define what they call a topology, made up of different components that are, generally speaking, small, simple pieces of software, and it handles how they're wired together and figures out the optimal way to distribute them throughout a large cluster of machines, if need be, in order to perform whatever computational task you want. So here's a very simple example. It's fairly contrived and doesn't involve any machine learning, but it's easy to walk people through, so we're going to walk through it. This is an example of trying to measure, if you had an ad campaign, how effective that campaign is. You'll see that it starts with the spout, the component there on the left. Spouts are basically, literally, like a water fountain: that's where the data gets thrown into the system. So there's some source of the click data for an ad campaign that flows in, and then on the top path you see the data flows to a filter that's checking whether someone actually clicked on the ad or just looked at it and did nothing. That feeds into the distinct operation, which only sees the cases where someone actually clicked, and what distinct is doing is grouping them by distinct people, effectively, or distinct campaigns, but I think in this case it's people. Then it feeds into a group-by operation, which does exactly what a group-by would do in a database, except that normally when you do a group-by in a database you have to go traverse the data and do some clever analysis. This is much simpler: as each record arrives, you just throw it into a bucket. Then it does a count operation while feeding the result into a state. These states are the special objects in Trident that are the only place where you have state, because remember, I said you can't always get away with never having to worry about state; it would be nice if you didn't have to, but it never works out like that. So that state holds all the click-throughs I got. The other path starts from the same stream and just does a straight-up group-by on campaign, so now you've got not just the cases where someone clicked but also the cases where they didn't, and you feed that into a state, and that's your impressions, how many times an ad was presented to somebody. Then you can combine those two: you take how many clicks the campaign got and divide by the number of impressions, and now you have a score for how your campaign is performing.
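In code, that topology is roughly a handful of chained calls. Here is a minimal sketch in Java against the Storm 1.x Trident API; the spout, the stream name, the field names, and the ClickedFilter class are placeholders of mine (the filter itself is sketched a bit further down), not the actual Ticketmaster topology, but the shape mirrors the diagram: branch the stream, filter to clicks, group by campaign, keep a persistent count.

```java
import org.apache.storm.trident.Stream;
import org.apache.storm.trident.TridentState;
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.builtin.Count;
import org.apache.storm.trident.spout.ITridentSpout;
import org.apache.storm.trident.testing.MemoryMapState;
import org.apache.storm.tuple.Fields;

public class CampaignCtrTopology {
    public static TridentTopology build(ITridentSpout<?> adEventSpout) {
        TridentTopology topology = new TridentTopology();
        // One stream of ad events; each tuple carries campaign, user, clicked.
        Stream adEvents = topology.newStream("ad-events", adEventSpout);

        // Impressions: every event, grouped by campaign and counted into a state.
        TridentState impressions = adEvents
            .groupBy(new Fields("campaign"))
            .persistentAggregate(new MemoryMapState.Factory(),
                                 new Count(), new Fields("impressionCount"));

        // Clicks: the same stream, but filtered down to actual clicks first.
        TridentState clicks = adEvents
            .each(new Fields("clicked"), new ClickedFilter())
            .groupBy(new Fields("campaign"))
            .persistentAggregate(new MemoryMapState.Factory(),
                                 new Count(), new Fields("clickCount"));

        // clickCount / impressionCount per campaign gives the score from the talk.
        return topology;
    }
}
```

MemoryMapState here is the in-memory state from Storm's testing package; a production topology would swap in a persistent state backend, which the sketch leaves out.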
Now you might look at that diagram and go, geez, that's a really simple SQL query, I didn't need Storm to do this, and you'd be absolutely right; it's not a hard problem. But this approach has a couple of advantages. When you run that SQL query, after you type it in, you're going to have to scan through all the data before you get an answer back, and all that data will have to have been carefully organized on disk in a bunch of B-tree indexes for you to get the answer efficiently. If you haven't done that, you're going to scan through everything, and even then you're going to do a lot of expensive query operations to turn it into an actually useful report, so you might wait quite some time for that data. There's also the matter of what happens if there's a disk failure in the middle of using my database. In the streaming context, a lot of that is handled for you very cleanly just by how the components are wired together over the network. And doing this with Trident actually addresses the question that's probably most terrifying if you think about this as a completely network-based service: what happens when a packet is lost? That's the part that would keep me up at night, at least. So Trident has a very interesting model. The underlying Storm infrastructure guarantees that you will process every message at every step of this topology, across all the systems, and each of these components might be replicated many, many times if there's a huge amount of data. It makes sure that every single message on every one of those hops gets processed at least once. That sounds pretty useless, because "at least once" means it could be once, it could be a million times, it could be 42 times; what good is that? Well, it's a very simple semantic, and it's very easy to provide that underlying plumbing and make it work: you fire the message and you wait for an acknowledgement, and if you never see an acknowledgement you fire it again, and you keep doing that until you get an acknowledgement, and then you move on to the next step. On its own that's maybe not that useful, but if you do proper functional programming, you can make your operations idempotent, where every time you receive the same message the result in the system doesn't change, because the system has effectively already absorbed the impact of that operation.
So for example, if the operation was "set this value to true", I could tell you to set it to true once or a million times; it's still going to be true, nothing has changed. So at-least-once is useful in those contexts, but of course there are a lot of other contexts where it's not like that, and this is where Trident helps you out. Trident is a layer on top of that at-least-once processing guarantee that provides an exactly-once guarantee. The way it solves this is what we call micro-batching: it groups the messages going through into chunks of, say, a thousand messages, and it processes those pretty much any way it needs to, letting Storm do its usual thing of retrying messages that have failures. When it gets to the final result for that thousand, that's the point where it has a checkpoint, where it asks: have I actually successfully handled all of these thousand messages? If the answer is yes, it says okay, we've hit a checkpoint, I am done with every possible error state for those previous thousand messages, the "what if number two got lost, what if number 99 got lost" kind of thing; I've successfully completed everything, I can forget all that state and move on to the next batch. If there's a failure, it can fall back and replay just that batch, essentially forgetting what it needs to forget from what it has already done within that batch, replay it, and know that as long as it eventually processes everything in that batch successfully, it will get to exactly the same state as if there had never been an error in the first place. And this is all happening under the covers, without you having to do the work. So it takes a huge set of error cases and bad paths off the table and just says: it didn't happen. I love the part where bad things didn't happen; that's what allows me to sleep at night. To give you an idea of what I meant by simple functions, here are the two main primitives in Trident that don't involve state.
The first one is a general function. Its first parameter is a tuple, which is basically just an arbitrary set of objects of certain types, and the second parameter is a TridentCollector, which is essentially a thing for collecting an arbitrary set of output objects of whatever types you define. So between those two things, what I've really defined is just a slightly more principled way of writing any function: every function you've ever written takes inputs and returns outputs, and that's all this is, so you can use it to represent basically any operation. The other nice bit is that, if you think about it, that looks a lot like a typical RESTful interface: you pass in a bunch of parameters and you get back a rendered result. That's REST. So you can take a function like this and, with a very simple bit of generic glue code, make it available as a RESTful service that doesn't run inside Storm, without the logic inside it having to be rewritten; and you're all engineers, so you know how much fun it is to rewrite something that's already been done, so we get to skip that. The other primitive is an even simpler case, which is a filter. Filters realistically aren't any different from functions; they're just such a simple case that you get some very convenient semantics in your topology if you explicitly call out whether something is a function or a filter. Really, a filter is a special case of a function where the only output is a boolean value: yes or no, keep it or not. Those two primitives can be used to build out the processing for a simple topology like this one; if you go back to the diagram, everything that isn't shaped like a cylinder can be represented as either a filter or a function. So there you go: easy processing and reusable components that you can move around from one place to another and scale infinitely well, horizontally at least, because infinite is a lie, but horizontally, and with protection for many failure scenarios.
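Here is roughly what those two primitives look like in Java with the Storm 1.x Trident classes. ClickedFilter is the hypothetical filter referenced in the topology sketch above, and ComputeCtr is an invented function that divides clicks by impressions; the field positions and names are my assumptions, but the BaseFilter and BaseFunction shapes are the standard Trident ones. In a real topology the division would be wired up through a state query, which this sketch leaves out.

```java
import org.apache.storm.trident.operation.BaseFilter;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Values;

/** Filter: keep only tuples whose "clicked" field (position 0 here) is true. */
class ClickedFilter extends BaseFilter {
    @Override
    public boolean isKeep(TridentTuple tuple) {
        return tuple.getBoolean(0);
    }
}

/** Function: emit a click-through rate computed from two count fields. */
class ComputeCtr extends BaseFunction {
    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        long clicks = tuple.getLong(0);
        long impressions = tuple.getLong(1);
        double ctr = impressions == 0 ? 0.0 : (double) clicks / impressions;
        collector.emit(new Values(ctr));
    }
}
```

Because both pieces are stateless and idempotent in the sense described above, Storm is free to replay tuples on failure without corrupting the result.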
Now, the cool thing is that when you combine this with Kafka, you get some really nice abstractions that resemble the models you would see in more traditional, non-streaming processing. You might wonder: if I do everything push-style, how do I organize my code? How do I avoid ending up with a completely complicated system where data is moving from one piece of code to another and it's just a big spaghetti mess? Well, a Kafka topic acts like a logical, high-level facade, a service interface, over an underlying amount of complexity. If you have a topic at the beginning of your pipeline that is "here's everything going on on my website", and the next topic is "here's a summary of sessions and links on my website", you don't have to know what's going on inside the topology that consumes all that raw data, reorganizes it, and summarizes the sessions (this user visited 12 pages, went to the search page twice, was on the site for 13 minutes). It just publishes a summary, and downstream topologies can consume and work off that data without being tightly coupled at all to the upstream component that produced it. So you can break up your code into nice, clean functional units with clearly defined service interfaces, where those interfaces are essentially topics in a Kafka cluster.
So now, a little more detail on Vowpal Wabbit. It's a very nice engine that originally came out of Yahoo Research and then Microsoft Research, or the other way around, I can't remember which right now, and it provides what's known as online learning, or in particular out-of-core learning. What that means is that the engine can constantly adjust its model as each new piece of data arrives. You don't have to have that big batch job where you collect all the data, submit it to a giant cluster, wait three days for the model to finish building, and then work with that model from there on out. As each piece of data comes in, you feed it through the system, the model gets tweaked a little bit to learn from that data, and you continue to operate: no shutdown, messages keep flowing, no restart or reboot or anything like that. The out-of-core part is even more important. One way you could do online learning is, of course, to have something that accepts all the data, shoves it into RAM, keeps it all in memory, and re-computes the statistical model by scanning through all that memory as fast as it possibly can. With an out-of-core model, instead, most of the data is consumed and immediately discarded; all that's kept is the state of the statistical model, updated as each message arrives, and that is a small enough amount of state that you can comfortably fit it on a reasonably beefy machine almost regardless of data set size. In fact Vowpal Wabbit has some very nice upper bounds on the amount of memory it needs for any problem, using the hashing trick to keep the feature space from blowing up. And it does all this extremely quickly; you can imagine that getting to discard most of the data after processing it makes it easy to be fast, to basically push it through the CPU's instruction pipeline and be done. It's particularly fast for problems with large, sparse feature spaces. Imagine there are literally a billion different factors that might go into a problem; say the features are every word that could appear, and for each message you saw on Twitter, how many times each word appeared. Most of the values, most of the time, are "I didn't see this word", and it handles that extremely efficiently. Yes, so, to the question: you don't feed it a script. It's an engine; you just feed it data. You feed it examples of the cases you want answers to. You say, what should I do here, or, true or false, is this a dog, and it will guess. At first, with no data, the first guess is "what do I know?": it will say dog, or not dog, with no real confidence. But then you give it some feedback, by the way, that one you said was a dog was actually a cat, and it adjusts in real time; it immediately gets better at guessing whether it's a cat or a dog. Does that make sense? The other nice thing about it is that it's not just a single algorithm. A lot of the early engines for this kind of work tended to be structured around one specific algorithm, because it's a lot easier to tune code to be super fast and super scalable if it only does exactly one thing. But engineers being engineers, we want to make things generic, and Vowpal Wabbit has actually got a pretty impressive arsenal of algorithms.
You can plug them in to cover a very broad set of problems and use cases: binary classification, with a little bit of work multi-class classification, regression, clustering, feature reduction, all kinds of functions. It's almost a Swiss army knife. There are other frameworks with a slightly broader set of functions, but the reality is we've been able to use it for most of our real-time problems without having to think very hard. It's also very scalable if you use the all-reduce technique. If you're familiar with Hadoop and big data, there's MapReduce; you can think of the map as a special case you need when you have disks. If you're working in a true online learning environment there's no need for the map, the data always just gets reduced, so you can organize Vowpal Wabbit into an all-reduce structure and scale it horizontally as needed. What's really nice about it, what we like about it, is that it has a lot of algorithms for active and reinforcement learning. This is super important at Ticketmaster, because there are really only so many tickets that we sell. It's a lot of tickets, but in big data terms, one event in a big stadium is maybe 60,000 to 100,000 tickets at most. That's not a lot of data; in big data you're used to working with billions of records. What that means is you have very little time to learn. If I start selling out a particular stadium and feeding my model, by the time I get to 50,000 records my algorithm is maybe 10 to 20% accurate, because I haven't really ingested that much data. Great: now I'm halfway through the sale and I barely know anything, I'm slightly more intelligent than a brain-damaged donkey, not really very useful, and the sale is almost over, all the good seats are gone. So you want to use what's known as active or reinforcement learning, where you take advantage of all the knowledge you have but very carefully select the cases where you try to learn more, to balance the explore-versus-exploit tradeoff. You basically choose certain cases where it's like, you know what, I can make a lot of money with this action, I'm going to go with it, I'm going to bet that this person wants to buy this ticket; and in another case you might go, geez, I've never seen a situation like this before, I think I should take a random guess rather than my best guess, because I might learn something more by trying out this scenario. Sorry, do you have a question?
Okay, alright, thanks. One of my favorite algorithms for this, and it's a tough algorithm to work with without the help of a framework, and even with a framework it can be a bit of a challenge, is contextual bandit, which you can think of as a variation on the multi-armed bandit problem. You walk into a Vegas casino and there are, say, four slot machines. You have no idea what the state of any machine is, but you know one of them might be really close to paying out and a bunch of the others might not be close at all. So you start playing this game where you put in a coin, pull the arm, and it either pays out or it doesn't, and technically each time you pull the arm you learn a little bit about the possible state of those machines, and over time you want to optimize what the right thing to do next is. Anybody who's been to Vegas and tried to play this game knows it's not a very good winning game; you're not going to do well. Contextual bandit modifies the problem and says: what if there is a lot of contextual information available that has some correlation to the state of these different bandits, but no clear, obvious way that it's all connected? It might literally be that you toss a bunch of iron filings in the air, they land in a certain pattern around the machine, and that pattern might give you a hint about how the machine is organized; but since you didn't design the machine, you have no idea. Each time you pull, though, you can see how the iron filings move around and start to notice that, hey, maybe there's a pattern here, the payoff comes when they're organized a certain way. We can use this to say: I have no idea how all this information about a fan really tells me whether they're a big fan of Adele or George Michael or whoever, but I can learn over time, in a very smart fashion, when is the right time to pull the trigger on showing them an Adele concert and when is the right time to show them something else.
Sorry, we're running low on time here, so I'm going to skip through the Elasticsearch material; I'm hoping most of you know at least a little about what Elasticsearch is. But I will mention Kibana, which is the secret weapon for getting Elasticsearch into just about any business. It's the cheap and dirty, as in completely free, dashboard tool. You can get a dashboard on metrics; this is an example, not one we use internally, but it lets you represent all kinds of useful metrics and data from your Elasticsearch index without a lot of work. You can really build this stuff with a GUI tool.
So here's the example I want to make sure I show you before we run out of time: how we tackled that original problem of dealing with all those terrible people showing up that we wanted to stop and block. One variation in this model from what we normally use, probably because this is an older system, is that we're using ZeroMQ instead of Kafka for pushing messages between the source system and the target system. ZeroMQ has one tremendous advantage over Kafka, which is that it's super fast, because it only keeps things in memory, and we've held on to it for this particular system because this system has to respond in milliseconds. If you tune Kafka you can certainly make it fast enough for our needs, but this was already working and doing what we wanted. So you can pretend you're looking at Kafka, and the same picture describes how the system works.
So here's what happens. A request comes in and hits our forward proxy, which routes it to whatever particular web app is going to process the request, say someone trying to buy a ticket for a particular show. That information gets published to ZeroMQ and routed down to the actual web application. While the web application is busy unpacking the request and figuring out what it needs to do next, we immediately consume the data into Storm: there's a spout pulling from ZeroMQ, doing feature extraction, pulling out all the information it knows and looking up related information, and then it feeds that into Vowpal Wabbit, this machine learning engine that's constantly learning from all of the behavior going on on the site at the time. That system then scores the session that's coming in, based on how much we like that user, essentially, how much we think that user is abusing the system, and it pushes the score out into memcache, which again we use for speed, and which follows our rule that there should be a choice about where the data gets pushed. That whole process is fast enough that by the time the web application has finished its own processing, it can go check memcache for the score from the machine learning algorithm, with a very high degree of confidence that there will be an answer waiting for it, and that score tells it whether or not it should go forward and process the request. That's the high-level view, because I wanted it to be easy to consume, but it describes how, within a millisecond or two, we can have a sense of whether you're a bad actor or not, and therefore keep out all the people who are trying to get between you and the tickets that you want.
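As a rough illustration of the hand-off at the end of that flow, here is a small Java sketch of the web application side using the spymemcached client. The cache key format, the score threshold, and the idea of failing open when no score is present are all assumptions of mine, not details from the talk; the point is only that the request handler does a single fast cache lookup for a score the Storm and Vowpal Wabbit pipeline has, in all likelihood, already written.

```java
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class AbuseScoreGate {
    private final MemcachedClient cache;

    public AbuseScoreGate() throws Exception {
        // Assumed local memcached; in reality this would point at a shared cluster.
        this.cache = new MemcachedClient(new InetSocketAddress("localhost", 11211));
    }

    /** Returns true if the request should proceed, false if it looks abusive. */
    public boolean allow(String sessionId) {
        Object cached = cache.get("abuse-score:" + sessionId); // hypothetical key scheme
        if (cached == null) {
            // Scoring pipeline hasn't caught up (rare); fail open rather than block fans.
            return true;
        }
        double score = Double.parseDouble(cached.toString());
        return score < 0.8; // hypothetical threshold for "too likely to be a bot"
    }
}
```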
Here's an example of some of the simpler use cases we have. I put them up because one of the interesting side effects we found is that once you have an infrastructure like this, once you have mechanisms for doing machine learning this way, it turns out that a lot of less computationally intensive tasks, comparatively simpler but nonetheless non-trivial, are really easy to fit into the same framework and approach. So you see this interesting thing where you solve the really hard problem, which is great, but you've also made twenty other problems that weren't quite so hard, but were still non-trivial, much less of a headache.
So, takeaways. When you use a lambda architecture, this fundamental idea of functions that do one specific thing, in a stateless fashion, one tuple at a time, you get a lot of reusable code: we can take code from our machine learning pipeline and make sure it's the same code that runs in our service-oriented architecture. You also get a lot of autonomy: remember all those different teams working on different functions; they can pretty much work independently of each other, no problem, and even if they change the data they're pushing out it doesn't break the others. We also have much simplified scenarios for handling failures; it doesn't mean we don't have failures, it just means the scenarios in which they happen become a lot simpler to manage. We also simplify scaling: I can just throw on new nodes, rebalance my topology, and suddenly Storm has made sure everything is routed efficiently to do the processing at scale. You can do very complex computational work with very low latency, which lets you tackle a whole bunch of problems you might not have thought it was even possible to apply machine learning to, or other problems where you just didn't think you had enough time to do the computational work to get the answer you were looking for. That's the key takeaway: even the little, simple tasks become easier. And the end result, of course, is you all get to have more fun, which, let's face it, is what the job's about. So thank you very much. I know I kind of chewed through the clock, and I was very appreciative of getting questions along the way, I prefer to answer questions as I go, but it hasn't left us much time. I'll take one or two and then I'll hang out outside the room afterwards for everyone else. Yes, you in the back.
So the question is: do you have to be really careful about how you do updates and deploys, because the system is so time-critical? The answer is yes, but the interesting thing about this approach is that you don't strictly have to be. We do it out of an abundance of caution, but technically you can have two different topologies consuming from the same topic, each publishing to either the same topic or independent topics, running at the same time. That means you don't have to shut down the old one before you bring up the new one, so it's entirely possible to do a deploy with zero downtime and no increase in latency; everything keeps running. We just don't like to tempt the devil, so we try not to pick a moment where something horrible could happen. One more, the gentleman in the far back there, and then I'll go to the one right in front, sorry.
So the question was: can we expose real-time data and archived data at the same time? The answer is yes. We don't do it a lot, but an easy way to do it is to run a program that reads from the S3 archive and republishes back to the original source topic, so it gets mixed in with the other data. That does create some potential ordering issues, but as long as your Storm topology reorders the records, or your operations are idempotent anyway, which is what you would prefer, it's absolutely no problem. And that's a great advantage of this: I can use one framework for processing both real-time data and archival data in one operation. Sorry, you had a question as well?
So the question was about Kibana: it's very pretty and you can string things together, but the query language for it can be a little challenging at times and there's a bit of a learning curve. I would totally agree. In fact, the most frustrating thing about Kibana for me is that its query language is different from Elasticsearch's, and I kind of know Elasticsearch and kind of don't know Kibana very well, so I have to walk through it a bit. So the question was, how do you get up to speed on this?
The GUI components can help you with the really basic stuff, and I rely on that so much that, like I said, I don't really know the Kibana language very well; every now and then I have to look something up to do what I want to do. A lot of your typical dashboard is fairly simple functions, and you can populate it entirely that way. Beyond that, there are a bunch of examples and good guides provided by the people behind Kibana and Elastic. But I will say there does come a point where you're looking to do something that's maybe a little beyond what a free open source tool can provide, because doing business intelligence properly is a hard problem. Making it easy for people to really understand data any which way they want is hard; the human brain is just not that flexible, and as flexible as the brain is, the eye is a lot less flexible, so you don't have a lot to work with. You pay a lot of money for a BI tool, and that's invariably where you end up. We use Kibana as a stopgap, not necessarily a destination. All right, one more, but then I've got to get out of this room because I know I'm going to get killed if I don't. Sorry, you over there on the side.
So, Heron: Twitter was one of the big consumers of Storm back in the day, and the gentleman was asking about Heron, their own in-house-built replacement. It's fully compatible with Storm, and it had a number of advantages; the biggest thing I love about it is back pressure, which avoids developers being confused about where the real problem in their pipeline is, like you wouldn't believe. But the reality is that when they invented Heron it was head and shoulders above Storm, and by the time they published it, Storm 1.0 was almost out of the gate, and Storm 1.0 and Heron look like they provide very similar features, so it starts to become a bit of a head-scratcher to what extent you're getting an advantage. In fact I've found that Storm 1.0 has a number of other advantages, above and beyond the things that were added in Heron, that make it easier to develop and debug your topology, so we've stuck with Storm for now. Heron is interesting and maybe we'll evaluate it again and see if there are any wins there; we never tested it at large scale for big operations, so there might be some win I'm not aware of, but when we looked at it in the small, it was not tremendously advantageous. Now, before I get into real trouble I'm going to put an end to it, but I'll wait outside the room if anyone has questions.
Can you hear me? Yeah. Okay. We have some minutes, yes. Let me check. Okay, that's better. Can I set it by email? It's yours now, keep it. Would you please pass it around? Some free stickers, for everybody. We're going to start in about 10 minutes. What is your t-shirt size, small, medium? I brought some t-shirts. Last year I was on the booth and I got a bunch of t-shirts, and most people ran away with them, and then I had a lot of ladies asking, hey, don't you have small or medium, only extra large? So for every lady here we have a t-shirt. No, no, we have more, but I want to make sure every lady gets a t-shirt first. We can fix it now. It is a problem: you go to a conference and the ladies always want a good t-shirt, and it's always bad-quality XXL. Let me turn it off.
Just two minutes, we are just on time. People, sit down. Do you want a t-shirt? For people just coming in, we have some stickers too. Before we begin, a couple of questions: who knows Fluentd? One, two, three, four. Good. Who knows Logstash? So you came to the right track. Thank you. I just saw one more lady, so we have three t-shirts for ladies; please come here later.
Okay. Well, my name is Eduardo Silva. There's my handle and email in case you want to send some concern or blame me for something. Today we are going to discuss log forwarding. Maybe the concept is kind of new and you have not heard about it, but everybody has heard about logging in general. A little bit about me: I'm an open source engineer at Treasure Data. Most of my work happens on GitHub, on projects like Fluent Bit, Duda, and the Monkey project; the last two are personal projects, of course.
So when we talk about logging, everybody says that logging is simple. Everybody agrees it's not something fancy; nobody says, hey, let's do some logging. That does not happen. Everybody cares about monitoring, a cool dashboard, some graphics or charts. But logging is not that; it's very, very much at the bottom of the stack, and it's something you want to just put to work and never look at again. But it's not like that. It used to be simple in the past: we used to have just one application creating a log file and writing information into that log file. But why do we do logging? What's the main reason behind it? We had the application, right? But our end goal is not to log something; our goal is to perform some analysis of the application. Logging by itself is not important for most people, but analysis is, because with the analysis of logs you can know if the application is behaving correctly, if it's failing, or maybe it has some warnings that you need to be aware of. But to get from the logs to some platform or backend where you can perform your analysis, somebody needs to do some work. And actually there's nothing strange about that: there are specialized tools, specialized solutions, that allow you to take the logs and put them in some central place, as an example a database; it could be any kind of backend. And once you get that information into some storage, for example Elasticsearch, you're going to run some fancy tool like Kibana in order to query the information. At that moment you can do analysis. But this is the end of the whole pipeline we have: we have an application that needs to write some log, and at some point we need to take these logs and put them in a central place, because maybe you don't have just one application running. Maybe you have 20 servers, 40 servers, a huge cluster, and you would like to unify those logs in a central place for analysis. Unless you like to SSH into each server and query each log, but I'm sure nobody wants to do that anymore. So internally, logging is not so simple; it has a lot of complexity. And in order to scale logging we need to understand how it works, what phases are involved between my application and my endpoint of analysis, because there's something happening in the middle. The goal of this session is that you understand what those phases are. Maybe they are quite boring, and nobody wants to deal with logging, but if you want your application to scale, you need to be able to troubleshoot how the logging scales.
You need to make your logging scale, and you need to understand it. So we introduce a concept called the logging pipeline: the different phases to go from point A to point B. Ideally you want to go from the log messages to storage, whether that's Elasticsearch, InfluxDB, MongoDB, or any database or storage solution you're using, and you want to do that in a few seconds. So what does each phase mean? Think about this: the log messages can come from different inputs. One of the most common inputs is log files, and most companies use log files, because if I have a network outage it doesn't matter; my logs are in the file system, and at some point I can log into the server and consume them. We also have the new journald, the systemd logging solution; it runs locally, but you can also talk to journald from a remote endpoint, because journald has an HTTP server, so you can pull the logs from there, which is quite good. And maybe you receive or consume logs over TCP; maybe you have some really small nodes shipping logs over the network. So the pipeline starts with collection: I have log messages, and these log messages come from different inputs.
Once I get the messages, think about this: the applications that generated them can have different origins, different developers, and likely different formats and different structures, if they have structure at all. One common scenario that everybody is trying to switch to right now is JSON. I'm sure everybody is aware of what JSON is, but what would be the main difference between JSON and a plain raw text file? Structure. That's it, structure. For example, and we're going to see this later, if you look at Apache web server logs you see a whole string, and you know there's a time, you know there's an IP address, you know there's a URL, you know there's a status code; but the application does not know that, it only knows about a string. If we manage to apply or create some structure, things become easier. And why would we like to have structure? Because if you want to perform analysis, you don't want to perform analysis over a raw text message; you want to perform analysis over specific key fields. For example, I want to query all the HTTP status code 500s, the internal server errors. To accomplish that you have two ways: you can play with bash, cat the log, pipe it to grep and awk, and you have a small command for that, or you can try to create something better with a different tool. And in order to convert an unstructured message into a structured format, you most likely want to use a regular expression for that, with what are called named captures.
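As a small illustration of what named captures buy you, here is a Java sketch that parses one Apache or nginx style access-log line into named fields. The pattern is a simplified version of the combined log format and the field names are my own choice, not the exact regex a tool like Fluent Bit or Fluentd ships with, but it shows the jump from "a string" to queryable keys like status.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccessLogParser {
    // Simplified "combined"-style pattern with named capture groups.
    private static final Pattern ACCESS_LINE = Pattern.compile(
        "^(?<remote>\\S+) \\S+ \\S+ \\[(?<time>[^\\]]+)\\] " +
        "\"(?<method>\\S+) (?<path>\\S+)[^\"]*\" (?<status>\\d{3}) (?<size>\\S+)");

    public static void main(String[] args) {
        String line = "192.168.1.10 - - [28/Mar/2017:10:11:12 +0000] "
                    + "\"GET /checkout.php HTTP/1.1\" 500 612";
        Matcher m = ACCESS_LINE.matcher(line);
        if (m.find()) {
            // Now the record has structure: you can filter on status == 500
            // instead of grepping a raw string.
            System.out.println("status=" + m.group("status")
                             + " path=" + m.group("path")
                             + " client=" + m.group("remote"));
        }
    }
}
```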
Okay. So we got the data in from some place, we understand the data, we did some parsing. But we also want to filter the data. I was in the booth here in the exhibit hall talking with people, and someone said, I generate like 1 TB of logs every day; it's true, and it's insane, but it's required. Oh, you were one of them; there were many. And the important question is: do you want to analyze the whole of that data? Think about it: you have developers who create specific custom messages, a printf of "oh, this is working". That is not a useful log message, and if you get a thousand of them every day, you're wasting storage and wasting computing time. So a filter phase lets you take a log record that already has structure and decide: everything that starts with this pattern, please remove it, do not continue processing it, or skip it, or maybe append some metadata. This log is coming from a remote host, and I would like to append the IP address of the originating host; that is quite useful. You need filtering.
Then you get to the next phase: when you're taking your data, you need to buffer it, because eventually you want to send that information to a database or a cloud service, but even in 2017 services go down. Amazon was down three days ago; I think nobody enjoyed that. When you want to send your logs out, you cannot trust that the destination is fully functional. You always need to design as if everything could fail. So the question is: what am I going to do to avoid losing data? You do buffering. Ideally you take the data and keep it in your buffer, either in memory or, better, in the file system, so that if something happens you can try to send that information to the destination later. That is what a good logging solution does; if your logging solution does not have buffering capabilities, you are taking your own risk.
Then we have routing. We buffered the data, and now the data needs to go somewhere, but here's a special case: there are many production scenarios where people say, I have this live information, but I want to perform two kinds of queries. One is real time, and I'm going to use something like Elasticsearch, which is pretty good for that; but I also have a lot of archived information coming in that I'd like to process separately, not in real time, at its own pace, maybe with Hadoop, where I want to run some MapReduce. So the routing phase lets you take the buffered data and send it to multiple destinations, because I'm sure everybody in your company has different needs for the same data: the marketing team needs different things than the product management team, different decisions, maybe different analysis over the same data.
When you understand the logging pipeline, that you get your message, you parse it, you filter it, you buffer it, then you can route the data to one or multiple destinations. That is the logging pipeline, and it's important to understand, because in your environment any phase of it could fail. Now, how can you deal with the logging pipeline? Of course you're not going to write some bash scripts or cron jobs for it. That's where Fluentd comes in. Fluentd is an open source solution that has been around for about five years and was made to solve the whole logging pipeline problem, from log collection to distribution. My goal is not to sell you Fluentd; my goal is that you understand the complexity of logging and how this tool might help you solve one or more specific problems. Because nowadays you don't have just one kind of system or database in your environment. People say, we only use Elasticsearch, or we only use MySQL or something different, but in reality every environment has many different components: they use MongoDB, they use Redis, MySQL, for different things. It's the same with logging: it needs a reliable solution.
Now, Fluentd was created by Treasure Data, the company I work for, because the company does big data analytics, and if you offer a service that does analytics there's something important: you need to collect the data. You cannot do analytics if you don't have the data. So they created Fluentd originally to allow customers to ingest data into the system, and it's the official agent for customers. But here's the difference: we always kept it open source, and what happened is that people started creating plugins, extending it, using it for their own needs, and everybody is happy. And last November Fluentd joined the CNCF, the Cloud Native Computing Foundation. Do you know the CNCF? Just five people. Do you know the Apache Foundation?
Okay, cool. The Apache Foundation is a foundation that hosts projects, protects them, manages how they evolve over time, and makes sure that keeps happening. The CNCF was born out of the Linux Foundation to accomplish the same thing, but for cloud native projects, for example Kubernetes, Prometheus, OpenTracing, and now Fluentd. So Fluentd is now in the CNCF's hands, but the community continues developing Fluentd at its own pace. It's not like the CNCF will tell us, you have to have a release every three or four months, or that when we have a critical fix we want to release now, we cannot. That will not happen; that happens with other foundations, but not here.
There are more than 600 plugins available. It's insane to think we built all 600 at Treasure Data; we have built maybe 20 and we maintain 20, and the others are made by the community. It has a pluggable architecture: when you deal with data, data comes in different formats from different inputs, so you need a pluggable service that can understand different input sources and manage that. It has built-in reliability, a very strong buffering mechanism, so you're not going to lose data. It's fully integrated with Docker and Kubernetes, and it's written in a mix of Ruby and C; the critical parts of the service, like the internal data serialization, are written in C.
Now, when we discuss logging we have two more concepts: one is the log forwarder and the other is the log aggregator. Fluentd is both; it can be a log forwarder or a log aggregator. And what is a log aggregator? It's a forwarder plus strong buffering capabilities. How does this play out in the whole ecosystem? Imagine you have a node with your application and your own database, and you have Fluentd running on that edge node. That Fluentd is working as a forwarder, because it's collecting data locally and sending it out to other Fluentd instances, which are behaving as aggregators: they aggregate the data, buffer it, and try to flush it out to a persistent storage system. This is really good, and most people use it. But everything has a cost: if you have multiple nodes, different machines or virtual machines or anything, you need a Fluentd on each machine, and each Fluentd needs at least 40 megabytes of memory. Now, if we talk about really big deployments, for example, there's a company around here, I cannot say the name, but they have something like 500,000 deployments of Fluentd; one of the big companies, kind of a purple color, you will see. Having Fluentd on every node can be quite expensive. If you work with Amazon AWS, you know that everything you do in AWS costs money, storage, CPU, so consuming a lot of memory on every node is not a good deal. So in some cases you would like to have options. Fluentd requires 40 megabytes as a minimum; of course, if you are filtering data, parsing data, doing more complex things, you will need more resources, and deploying a few hundred of them could be expensive. If the use case is just a forwarder, can we make it cheaper? That's the solution we are providing right now. Instead of having Fluentd as both the forwarder and the aggregator, we have created a new project called Fluent Bit, which aims to behave as a log forwarder only, and we leave the heavy buffering capabilities and the other 600 plugins to Fluentd. Fluent Bit was created almost two years ago, also at Treasure Data; it's also open source under the Apache license, and this one
Fluent Bit also joined the CNCF: when a project joins the CNCF, the other projects under the same GitHub organization join together with it, so Fluent Bit is fully backed, and we are creating this to improve how the logging pipeline can work and make things cheaper for everyone. Now, what's the main difference from Fluentd? It's written fully in C. It has a pluggable architecture, same as Fluentd, and built-in reliability; it also does buffering in memory, it is fully event driven, and it uses asynchronous operations internally. Most people ask why we didn't write it in Go. Some time ago we wanted to write it in Go, and we had a partial solution, but it had a lot of problems on the ARM architecture; the compiler at that moment did not work well in all environments, and we wanted something that works everywhere. Even now it works on Windows; it was a pain, but it works. So what are the features of Fluent Bit? It can take data from input plugins, it can filter the data, and it has output plugins, pretty much the same approach to solve the logging pipeline; it also has built-in parser support, and as a minimum it requires 450 KB, so that is cheap. I'm sure even if we said it took 2 MB it would still be cheap. Now, the approach we try to promote in logging is that instead of having Fluentd on the edge nodes, you have a specialized forwarder there, consuming almost no memory, almost no CPU, doing the same forwarding work that Fluentd does. The thing is, when you build your logging pipeline you need to approach it with the proper tools and not make things too expensive. Imagine this use case, it happened the other day: we have a customer who said, okay, I have my game, my game has like a thousand users, they were logging everything to the file system, and they had Fluentd running on it, right? But I don't know what happened, the game went viral and they got something like 500,000 downloads and people playing it, and guess what. It's a similar story to Pokemon Go: do you remember Pokemon Go? Pokemon Go runs on Google Cloud Platform and they use Fluentd; well, it's quite expensive for them, so they would need to switch to Fluent Bit at some point. So the goal here is that when you build your logging pipeline, you understand which kind of tool goes in which place, and of course this will be many times cheaper: every node gets just a lightweight log forwarder, and the aggregators stay on the other side. So what are the cloud native features of Fluent Bit? It's fully integrated with Docker and Kubernetes, and we are going to do some demos; I know everybody is tired, I saw many faces last night at the party, so we are trying to be more active. Fluent Bit also has a really good feature, which is fully controlled buffering. Do you know what Docker is, what a container is? Please raise your hand. Okay. When you deploy your container, hopefully in production, you say this container is not going to consume more than 100 megabytes of memory. But what would happen if your log collector is consuming logs from a file whose size is 1 gigabyte, and it consumes data faster than it can ship it to the destination? Because, let's be honest, Elasticsearch, InfluxDB, any database is quite slow at receiving and storing data. And it happened to us: I was able to crash Elasticsearch and InfluxDB because I was sending data too fast, and that created a problem, because as the data could not be flushed, the service memory started to increase.
So controlled buffering allows Fluent Bit to take some data from the input, try to flush it, and keep only a fixed amount of data in memory. For example, you can tell Fluent Bit to use just 10 megabytes of memory to consume data: it cannot consume more than 10 megabytes, so it ingests at most 10 megabytes and tries to flush it, and if it cannot flush, it will not read more data, because there are pause and resume controls in the input plugins. But you're not going to lose data, because you're reading from a log file; it's on storage, it's persistent. It's also really easy to containerize. I would like to say there are no dependencies, no language runtime, just one binary; well, it has dependencies, every application has dependencies, but the difference is that they're all built in, so you don't need to deal with shared libraries, because all the dependencies Fluent Bit has are built internally into the same binary at compile time. Whoa, demo. Okay, let's wake up. Can you see my screen? It's a long path, but okay. Demo one is about understanding the difference between unstructured and structured data. I know most of you already get it, but let's see an example. For example, look at this nginx log: is that structured or unstructured? Unstructured. Why? It's a raw string, right? PHP, that's the only example I found on the internet, it's quite old; PHP, everybody did PHP at some point. Okay, so that log is unstructured. Now I'm going to run Fluent Bit from the command line; you could use a configuration file, but I want to explain how it works. I'm going to load the input plugin, which is tail, with a property, which is the path it's going to consume the log file from, in this case nginx.log, and the output will be the standard output, with a flush interval of one second. Okay, what you're seeing on the screen is pretty much the same log entry that we had at the top, but Fluent Bit tried to give it a structure. And what is that structure? It's like an array that starts here and ends here; the first field, this is an internal representation of Fluent Bit, is a Unix timestamp, and then there's a map with the log message, the whole line, from here to the end. So we got a structured representation of the raw message, but we are not taking into account the structure of the real message: if you look here, the value of log, all of this, is just the same string. Did you get it? It's the same one, there's no structure. Fluent Bit said: okay, this doesn't have a structure, so I'm going to create my own structure around your log. Now, what can we do to make this even better? Fluent Bit supports what are called parsers. Do you remember the parsers in the pipeline?
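The command line used for this first demo is roughly the following; the file path is a placeholder, and the memory limit shown below is the option being described above (Mem_Buf_Limit), whose exact name and availability may depend on the Fluent Bit version.

    # tail a local nginx log, print records to stdout, flush every second
    fluent-bit -i tail -p path=/path/to/nginx.log -o stdout -f 1

    # the same input expressed as a config snippet, with a hard cap on in-memory buffering
    [INPUT]
        Name          tail
        Path          /path/to/nginx.log
        Mem_Buf_Limit 10MB    # pause reading when 10 MB of unflushed data is held in memory

    [OUTPUT]
        Name   stdout
        Match  *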
So, parsers are not cute, but they're something you need, and of course you don't aim to write your own parsers, your own regular expressions, because we have templates for the most common use cases. It's basically like this: the nginx parser uses a regular expression to parse every entry, and this is the regular expression. We are adding some names: the first field is remote, the second one is host, the third one user, then time, method, path, code, size, referer, agent. Each of them applies a structure to the content being read. I'm also saying that there's a time key inside the log, a part of it that says at what time the log entry was created, and this is the format to interpret it. So what I'm going to do is just read the same log file and apply the parser. Not a big deal: parser name nginx. So we were here, tailing a file, this is the input plugin, I set the path, and I'm going to set a property, parser equals nginx, do you remember, that was the name, but I need to load the parsers configuration file, otherwise it will not work. Whoa, this is the development version, guys, sorry. I knew that would happen; I said I was going to fix it before the presentation. It happens when you type the wrong file name, it fails, but I said no, it will not happen to me. Shit happens. So here we go. So what's the difference? Now we got a different time, because this time was parsed from the log file; we got a remote key with its value, we got the host, which is empty, the user, the method GET, the path, the code, size, referer, agent. Now we have a structure, and now it makes sense to insert this into a database like Elasticsearch, because I can say: please give me all the records whose agent is a certain browser, or everything with a size higher than 300. Okay, so at this point I think you really understand what structured and unstructured data is. Okay, accomplished. Questions? Okay. So why does this matter? Because having a structure we have a schema, and with a schema we can do a lot of things; we can also convert the data to any kind of representation. If we recall the logging pipeline, we said we want to take some log information and send it to a database, but each database has its own format for inserting data; for example, Elasticsearch is not the same as InfluxDB. So at some point you need to take the data and create a structure, an internal representation, so that the output plugin can take the data and produce its own format for the destination. And of course, when you have a schema, you can filter, which is much better. So what about Docker logs? Everybody uses Docker, but Docker has many logging drivers. I didn't put this in the slides, but Docker has native logging drivers, so you can tell Docker: please log this as JSON, or please log this to Fluentd; when you run your container you can say that your logging driver is fluentd, and Docker will write the logs over the network to Fluentd. But we are going to deal with this from the log files perspective right now: we are going to tail the log files that I have on my computer, and I am going to run with sudo. For example: input tail, the path is /var/log/containers. I am not going to parse everything, because there's a bunch of containers and stuff in there, but we are going to read the ones we need, like etcd and the daemons that we have, and we are going to send that to the standard output. It will take 5 seconds, because I didn't change the flush time. There you go.
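For reference, the parser definitions being loaded here live in a parsers file. The sketch below is a simplified guess at what such a file contains: the stock parsers.conf shipped with Fluent Bit defines nginx and docker parsers along these lines, but the exact regular expression, time formats, and CLI flags can differ by version.

    [PARSER]
        Name        nginx
        Format      regex
        # simplified: remote, host, user, time, method, path, code, size, referer, agent
        Regex       ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+) (?<path>[^ ]*) [^"]*" (?<code>[^ ]*) (?<size>[^ ]*) "(?<referer>[^"]*)" "(?<agent>[^"]*)"$
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        docker
        Format      json           # Docker writes each log line as a JSON map on disk
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    # loading it from the command line (flag names may vary by version):
    #   fluent-bit -R parsers.conf -i tail -p path=/path/to/nginx.log -p parser=nginx -o stdout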
Okay, this is not really a good structure yet. You see, this is the same case as nginx: here is a message, and there is no structure. But if you look carefully at the content of that log value, you will see that it's JSON, because Docker logs to the file system using the JSON format. That means Fluent Bit can say: okay, this is JSON, it also has a structure, but you need to tell it. So let's do the same: load our parsers file, parsers.conf, so it doesn't crash, and what else do we need? To apply the parser name, and the parser is called, guess what, docker. It's 5 seconds again, sorry, I didn't change it. So now you will see that we have some kind of structure. Now it's different: we got the right time and we got the right message. Let me check: this is the log message, it ends here, this is the stream, and this is the time, because we requested to have it there. Okay, this is really good. So as a forwarder we can gather the data from the file system; right now we are just sending the data to the standard output, but we support Elasticsearch, InfluxDB and other destinations inside Fluent Bit, and we can also make it talk to Fluentd. This is really good. So, who else here is using Kubernetes? Ah, you didn't say anything before. So what is the Kubernetes use case? Things now get a little more complex, because the applications run inside containers, not a big deal, but containers run in a pod, and inside a node you can have many pods. For those who are not familiar with Kubernetes: Kubernetes is an orchestrator for containers, it allows you to manage and deploy containers in your cluster. How does Kubernetes work? If I have a physical machine or a virtual machine, that machine is called a node, but when you want to deploy something on your node, maybe your application, imagine it is a web server, the web server is not working alone: maybe the web server needs a database and maybe a third-party service as part of its own life cycle. So Kubernetes says: everything your application depends on will be deployed in containers, and a group of containers is called a pod. And of course a node can have multiple pods. Now, how do you solve logging here? As we said earlier, we can have Fluent Bit running inside each node, but there is something interesting here; I explained it without a slide, but here it's better. When you look at Kubernetes in a very global overview, you have an API server, which is like the master that coordinates everything, and you have the nodes, or minions as they used to be called: node 1, node 2. And what we have here inside the little boxes are pods, for example pod 1, pod 2; they are independent, but both live on the same node. The idea is that we put Fluent Bit inside each node so it can collect the information from the pods. But there is something interesting here: everything that runs in a pod can have metadata. For example, when you parse your log files from Docker, you can get the container name and the container ID, because those are provided by the file name, but how do you get information like: what is my pod, what is my cluster name, what is my pod ID? Those are Kubernetes concepts, and if you are going to do some analysis later, you are not going to query a specific container, you are going to query a specific pod or some labels associated with it, and those labels reside up in the API server. So the log forwarder you have on your node needs to be able to read the logs, understand the pod architecture, and also gather the metadata from the API server, and Fluent Bit does exactly that; it has had that feature for about a year.
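A minimal version of the configuration for that setup looks roughly like the sketch below (close in spirit to the kube.conf used in the next demo; the tag, the proxy address, and the paths are placeholder assumptions, and the kubernetes filter's option names may vary by version).

    [SERVICE]
        Flush        5
        Parsers_File parsers.conf

    [INPUT]
        Name    tail
        Tag     kube.*                       # tag derived from the file name
        Path    /var/log/containers/*.log
        Parser  docker                       # Docker writes JSON lines to disk

    [FILTER]
        Name     kubernetes                  # enrich each record with pod metadata
        Match    kube.*
        Kube_URL http://127.0.0.1:8001       # e.g. a local 'kubectl proxy'; placeholder

    [OUTPUT]
        Name   stdout
        Match  kube.*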
So, well, this is the demo, let's see. Okay, I am not going to run this directly inside a Kubernetes pod because of a connectivity issue that I have; instead, I am going to run against my local Kubernetes cluster, which is my computer, it is not a real cluster, and we are going to talk directly to the API server. So we are going to, sorry, we are going to enable the proxy so my machine can talk to the cluster. It is already running, there you go, let's see. Okay, is it there? Yeah, so I can talk from my machine to the cluster; I have a proxy running locally. So now we are going to use a configuration file. This is a Fluent Bit configuration file, a minimalistic version; this file, which is called kube.conf, loads the parsers and defines the input. What we have in the configuration is pretty much the same as in the logging pipeline: we define the inputs, the filters and the outputs, and the parsing and buffering happen internally. So we are going to tail the container logs, of course I am going to read the Kubernetes logs that I have on my system, and then we are going to apply a filter. What does a filter do? A filter can add data, remove data, or alter data, and what we want to accomplish here is to add the Kubernetes metadata: pod name, pod ID, namespace name. So I just give our filter the address of the Kubernetes API server, the master server, and once the data flows through, it flushes it to the standard output. Sorry, what was the question? A filter is a kind of plugin inside Fluent Bit. There are three types of filters as of now: one is called grep, which you can enable so that you only keep the records that match a regular expression, or use it to exclude some kinds of records; and we created a new filter which is called kubernetes. A filter is like a plugin, it's source code that takes some data and does something with it; this is all in C, it is not an external process, everything is inside Fluent Bit, I am going to show you the structure later. Let's look now at plugins/filter_kubernetes; we are loading this filter, well, it has like 500 or 600 lines, it's not too long, but what this filter does is: every time you get some logs, it talks to the Kubernetes API server, gets the metadata, and appends that metadata to each log. So let's run it; we are going to use the configuration file, remember, kube.conf, it should work, there you go. Now the same log files that we got earlier from plain Docker, we get again, but with some extra metadata from Kubernetes. Okay, this is the tag, which is like the file name; here starts the log: the pod name, the namespace name, everything from Kubernetes; the container name, which comes from Docker, the Docker ID, the pod ID, the labels (component, tier: control-plane), the annotations, and so on. That is the goal of a filter: it got an original record that did not know anything about Kubernetes, it fetched the information and appended it to the log. Because when you do logging in production, you need that information: maybe you have a hundred machines and you need to understand where the application is running, in which pod it failed, and how it works.
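As an aside, the other filter type mentioned above, grep, is configured the same way as the kubernetes filter; a minimal sketch, where the field name and the pattern are purely illustrative:

    [FILTER]
        Name     grep
        Match    kube.*
        Regex    log  error      # keep only records whose 'log' field matches 'error'
        # Exclude log  debug     # or instead drop the records matching a pattern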
Okay, that's a good question. The question was: is there an HTTP request for every log record that we get? Of course not, we have a cache. Why? Because every record we get here carries a key pair: the pod name and the namespace name. Once you start tailing a new log file, there's one HTTP request to the server, because it needs the metadata for that pair, and then it keeps a cache locally, in memory. So we got this for Kubernetes. The metadata slide is explaining pretty much the same: the pod name, namespace, container name, container ID. And if you are not using Kubernetes yet, you will need to start learning it, because this will become the default way to deploy things. About Fluent Bit, I hope you are a little bit convinced by now that it works. Why is it so good? Because internally it works with coroutines. Do you know what a coroutine is? Okay, who is a developer here, raise your hand, okay. When you develop a network service and you try to communicate with somewhere, you tell the operating system: please connect to this server. From the point you invoke the connect system call there are many phases internally, so you have two choices: either you just stand there waiting, or you send the request, continue working, and come back later when the response is ready. Of course the second method is better, but it adds a lot of complexity, because you need to connect, maybe you need to write some information, maybe you then need to read some information, and at each of those phases something can happen, something could fail. The good thing about coroutines is that they let you perform that operation simply from the developer's perspective. For example, we have an HTTP endpoint, a web server; Fluent Bit provides a full API for the developer, where the developer just says: please connect to this endpoint. The developer thinks the code is connecting and just running sequentially, but as soon as the API tries to connect, it yields, goes back to consuming logs, and gets the control back at the next line. So coroutines allow you to keep the state and to solve the asynchronous problem, the asynchronous problem we have when we develop with this model. Coroutines are not magic, they have some complexity, but the goal is that if you are implementing something, for example sending some HTTP data to a server, you just invoke one function and internally let Fluent Bit deal with the whole complexity of networking, errors, buffer problems and all those kinds of things. Because imagine we want to create, I don't know, 30 output plugins for different destinations: is anybody going to handle every single detail about networking issues?
No. Someone who writes an output plugin just wants to learn how to take the internal representation of the data, how to convert it to their format, and how to send it, without having problems when sending it. So that is coroutines: coroutines allow you to suspend execution in a process and take back control later at the same point, and all of this is driven by asynchronous events, with epoll on Linux and kqueue on OS X and BSD. And if you want to deploy Fluent Bit on Kubernetes, you can deploy it as a DaemonSet. A DaemonSet is a special object in Kubernetes: you deploy an application, but if you say it is a DaemonSet, that application is going to be deployed on every node of the cluster. You don't need to take notes, because the slides will be available later, but that part is important. And of course all of this is open source, and now we have contributors around the globe, from the States, from Japan, and, cool. So what is the status? Right now we are releasing version 0.11, in March, before CloudNativeCon in Berlin, which is at the end of March, so in a few weeks we are releasing 0.11, which comes with the parsers, memory optimizations and everything you just saw here; the Kubernetes support is new, it's just landing. For the next version, in May, we plan to add multi-line support for the tail input and add monitoring capabilities, because in the CNCF and Kubernetes ecosystem you want to monitor everything, how each application is behaving, and if this application is part of your stack, of course you want to know how much memory it is consuming, how many lines it is processing per second, and that kind of thing. So here's the final information: there's the project Twitter account and full documentation. One of the good things about Fluent Bit is that it has had documentation since version 0.1, so we have really good documentation, trust me. This is the GitHub repository, and if you want to contact us we have a Slack channel. I know some people are afraid to join Slack because it consumes too much memory; I was invited to 7 Slack teams and it consumes more than one gigabyte of memory, it's crazy, but if you want you can join, or just follow us, fluentbit on Twitter. Thank you. If you have questions, we have time, because there's no talk after this one. Yeah, the tail input plugin optionally supports a database using SQLite, so it can keep track of rotated files and their status; if you were tailing a file and Fluent Bit goes down for any reason, it can resume, so you don't start from the beginning. Yeah, every time we read a chunk of bytes we track what the last position was, so we can retake the context at any point. Yeah, we track rotation; the only problem is when the rotation daemon uses truncation mode, but that is a problem everywhere. Yeah, Fluent Bit to Fluentd: it uses TCP, a protocol which is called forward; it's an internal protocol, and the payload is pretty much the representation that you saw. We use MessagePack, do you know MessagePack?
Okay, the creator of Fluentd created MessagePack years ago. MessagePack is like a binary JSON, a serialization format, so if you sniff the traffic from Fluent Bit to Fluentd, it's MessagePack over TCP, it's basically that. It's different, yeah: the configuration format of Fluentd is similar but different. We have a different format, because if you look at the Fluent Bit configuration, you have section names and you have spaces, and if you break the configuration, if you don't follow the right indentation, for example like in Python, the service will not run; it tries to force the people who use it to write a good configuration. In terms of compatibility, there are many things that are similar, but they are not equal. No, we would have to write the syslog plugins for Fluent Bit, we don't have them yet. About plugins, something that I didn't mention, I don't know why: we have Golang support, so if you want to write your own output plugin you can write it in Golang and link it to Fluent Bit, and that works; we have somebody from Samsung writing a Kafka plugin in Go. Can Fluent Bit read from named pipes? Eh, no, nobody asked for that. It reads from standard input, from TCP, from many things, but not from named pipes; please log an issue on GitHub. Yeah, it's quite simple, the aggregator: think about this, every component in the stack, either Fluent Bit or Fluentd, even if it's an aggregator, is consuming data, so the data goes through the input, the parser, the filter, and the buffering. So yeah, once the data is in the aggregator you can do whatever you want with it: send it to different Fluentds or send it to a destination; it has load balancing too, that's why it's a really good aggregator, and it has some failover modes. Yeah, exactly, it depends on how you, those are called aggregation patterns, how you distribute your Fluentds or logging solutions in your architecture, but yeah, you can make Fluentds talk to each other, Fluentd talk to Fluent Bit, Fluent Bit to Fluentd, do TLS if you want, and it will work. How do they find each other? No, they don't find each other, it's not like a cluster with a master-slave mode where they can look each other up; you have to apply that configuration manually. But if you're working on Kubernetes, that is different, because Kubernetes makes sure to add labels with names to each service, and they have internal DNS for auto-discovery. Yeah, the initial goal of Fluentd is that it does not aim to replace anything: if you're using syslog on your devices, or whatever, and you're happy with it, please keep it. If you have a syslog daemon and you can send the logs over the network, Fluentd can be used, because you run Fluentd and say "please listen for syslog", and you point your own syslog services at Fluentd. That makes a lot of sense, it's a really strong use case; the point is that you don't touch your architecture. If something is working, you don't want to hurt anything: you put Fluentd there, change a plugin, change a configuration, because it can listen to journald, to syslog, to files, to TCP. You can even flush data from, you know Elasticsearch, right, and do you know Filebeat? Filebeat is used to collect logs, send them to Logstash and then to Elasticsearch. Yeah, you can let Fluentd listen for messages from Filebeat and let Fluentd talk to Logstash or to Elasticsearch; it's very pluggable. Why Filebeat versus Fluent Bit? I think there's not a strong reason; I think Filebeat works really well, so honestly, I always say: if it works for you, don't fix it, let it run.
Now, if you need Kubernetes: you cannot run Filebeat on Kubernetes and get the metadata, I don't think so, and that's the reason, that's the thing. So if you want something that can get the metadata, that matters. It also happens that the Elastic components are very tightly coupled to work with each other; they're not generic enough to talk to different solutions, so if you use Filebeat, it's either Elasticsearch or Logstash, maybe they have some other endpoints right now, but it's not as flexible. That is not the philosophy of Fluentd and Fluent Bit. Well, the thing is, you have to be honest: if you want to do parsing, you're going to apply regular expressions, and that will cost you, it is not free; if you add more filters, you have more overhead, and you will have it. Now, of course, you can find ways to say "I'm going to parse just the log files from this directory", and for the files that have a different pattern you add a different input, because you can have many inputs, so you apply the parser only to the inputs you are really interested in. I would say don't touch the syslog side, let Fluentd and Fluent Bit handle that, because parsing is expensive anyway; we always try to optimize for performance, always, but you cannot get it for free. Okay, thank you. Oh sure, yeah, it will be online; there's a lady who will kill me if I don't send her these slides over email. Okay, thank you guys. I have t-shirts, I think enough for everybody; if you can make a line, you can have a t-shirt, please come here. I have a question on large, just saying. Which size are you? Which one? Yeah, I wrote an InfluxDB plugin for Fluent Bit. Large, okay, help me with this, let's put it there, each one try to pick just one. Yeah guys, if you don't want it, just leave it there, somebody will want it tomorrow. Larger here, okay. You asked about InfluxDB? Yeah, we had a customer question, they wanted to flush data to InfluxDB faster, so we wrote the plugin for Fluent Bit, and we actually crashed InfluxDB because it could not handle the load. So yeah, it's fully supported with the right configuration, because InfluxDB is good, but you need to be ready to receive that much data. Fluentd already runs on Windows, we are going to announce that for version 1.0, which is going to be released soon, and Fluent Bit builds on Windows; it builds and runs, basically, yeah. Okay, thank you. You found it? It is large, okay.