So today we are going to talk about a pretty interesting and controversial topic: high-performance metrics with mutable counters. I say it's a bit of a forbidden fruit, because it's something rather unorthodox for Erlang, Elixir, and any language on the BEAM. On the BEAM, one of the main staples is that all variables are immutable, which is very safe for concurrency, and processes communicate mostly through message passing: processes send messages to each other's mailboxes, and that's how they communicate. There is no shared mutable state. We can also share data through ETS tables, but there is no global state that processes mutate the way there is in other languages. So what did we need? I work for a company called Xaptum, and we'll talk about it in just a second. We needed metrics collection for a high-throughput application: picture 10,000 connections, each sending 10 messages per second. We collect a lot of different statistics — how many total requests, how many total sends — and then finer breakdowns, for instance what kinds of errors we've been experiencing. The system carries much less load if we just count the errors of each type and can see "ah, this is happening", instead of logging every one. So we have a lot of metrics and very high throughput, and for us the existing libraries just weren't working. Let me talk a little about Xaptum. At Xaptum we are creating an Internet of Things network. Everybody is familiar with IoT, the Internet of Things: devices communicating over the existing internet.
The internet was designed for people, and it's inherently insecure: all the security is patched on, slapped on top. It was not built with security in mind. That's tolerable, I guess, with people, but with devices it's a little scary. The other thing about the internet not being suitable for devices is that it's been optimized for download, and it has no deterministic response time — you can never really guarantee response time over the internet, and you can't even be sure there isn't bad intent somewhere slowing down your requests. Devices are a very different use case than people. So at Xaptum we created our own network, designed for devices from the start. Its main feature is that it's built with security in mind, and that comes from the closedness of the network: all the devices communicating through it are known up front, as are the subscribers, the applications interested in the devices' data. It's optimized for upload from the start, and it has deterministic message transfer, because if you know all the devices up front, you can allocate just enough resources for them to communicate. So it was designed specifically for devices, it's a very interesting project to work on, and it's tested at very high throughput. The existing metrics libraries didn't work for us at such throughput, and it was extremely important to us that they work. So now let's go back a little, to the problems you can run into in high-throughput applications. Message passing is a great thing: it allows us to do away with mutable variables, and it's very safe for concurrency. The problem is mailbox overflow.
Imagine a metrics-collecting gen_server — a deliberately simple, naive solution, just to understand the problem from this perspective. I assume you're all familiar with gen_servers. So imagine one metrics-collecting gen_server, with thousands, tens of thousands, of processes sending it updates about the metrics they gather. What's going to happen? Very quickly its mailbox fills up, and then its response time becomes crazy slow. One of the first rules you learn in Erlang programming is that you have to be careful not to overrun any process's mailbox; you have to understand what's going to happen there. We could maybe introduce a hierarchy of metrics collectors that trickle up to the main one, so processes send to a pool of, say, 100 collectors, which aggregate and send further up the chain. That was the solution that came to mind first, but eventually you can still have the same problem of a process mailbox being overrun if you miscalculate: you have to be very careful to set up a big enough pool, and if your system comes under load you hadn't expected, suddenly this becomes a problem again. So, as another way to keep metrics updated — to get away from this gen_server approach — let's look at ETS: you can keep your metrics in an ETS table and write updates to it. That's actually a much better approach, and it's what Exometer, one of the popular metrics libraries in Erlang, uses. And Elixir can use all the libraries that Erlang uses, so it's the same.
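To make the mailbox problem concrete, here is a minimal sketch of the naive single-collector approach I described. The module and function names are mine, not from any real library: every worker casts into one gen_server, so every update is one more message in that one mailbox, and under tens of thousands of updates per second the mailbox grows faster than it drains.

```erlang
%% Naive approach: one gen_server that every worker sends its
%% metric updates to. The bottleneck is its single mailbox.
-module(naive_collector).
-behaviour(gen_server).
-export([start_link/0, bump/1, snapshot/0]).
-export([init/1, handle_cast/2, handle_call/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, #{}, []).

%% Every one of the tens of thousands of workers calls this.
%% Each call is one more message in the collector's mailbox.
bump(Name) ->
    gen_server:cast(?MODULE, {bump, Name}).

snapshot() ->
    gen_server:call(?MODULE, snapshot).

init(State) -> {ok, State}.

handle_cast({bump, Name}, Counts) ->
    {noreply, maps:update_with(Name, fun(N) -> N + 1 end, 1, Counts)}.

handle_call(snapshot, _From, Counts) ->
    {reply, Counts, Counts}.
```

This works fine at low volume; the talk's point is that it degrades catastrophically once the arrival rate exceeds what one process can drain.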
When I say Erlang in the context of this presentation, you can pretty much think Elixir as well: all the libraries available in Erlang are also available in Elixir. Now, the problem we faced with ETS is table-level write locks. If a bunch of processes are trying to write to an ETS table, every write locks the whole table, so only one process can write to the table at a time. And even if we create a table for each metric, it's still a problem when too many processes are trying to write to just one metric. Our target, like I said, was Exometer, which is a very popular metrics library in Erlang, and in Elixir as well, and it worked great for us — a very nice library. But when the system load became very high, we started having problems: exactly at the time we needed to see the metrics, the metrics would disappear, they would be very slow, they'd show up once in a while, and we really had no visibility into our system. So while Exometer itself was fine, the ETS write locks were slowing it down to the point that it was useless for us — we didn't see the statistics when we needed them most. When existing solutions don't work, what do we do? We're on our own, and that's where we started looking.
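For reference, the ETS style of metric keeping looks roughly like this sketch (module name and layout are my own). `ets:update_counter/4` is atomic, and the `write_concurrency` table option relaxes the locking, but many writers hammering the same key still serialize on it — which is the contention the talk ran into.

```erlang
-module(ets_metrics).
-export([new/0, bump/2, read/2]).

%% Keep counters in one ETS table, one row per metric name.
new() ->
    ets:new(metrics, [set, public, named_table,
                      {write_concurrency, true}]).

%% Atomically increment; creates the row on first update.
%% Returns the new value.
bump(Tab, Name) ->
    ets:update_counter(Tab, Name, {2, 1}, {Name, 0}).

read(Tab, Name) ->
    case ets:lookup(Tab, Name) of
        [{Name, N}] -> N;
        []          -> 0
    end.
```

Even with `write_concurrency`, a single hot key behaves like a single lock, so the hottest metric becomes the bottleneck.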
So, having explored the possibilities in Erlang: at Xaptum we have both Erlang and C++ developers, and the C++ developers have their own metrics solutions. One of my co-workers, a brilliant engineer named David, said: I know what's great about Erlang, but when it comes to metrics, mutable counters are really important — that's how you get high-performance metrics. So he went looking for something like that in Erlang, and he indeed found something already there. There was a repo created by bet365 — they also have very high throughput; in their case they're a sports-betting company. They created this repo called oneup, and oneup is a very simple repo: just a bunch of NIFs that let you create mutable counters — the forbidden fruit, remember, in Erlang. It's a simple repo, and for our case it was a gem. You can create a new counter, and it's mutable, so many processes can increment it, without message passing or ETS costs. It has increment by one, increment by a given value, and you can also set a specific value. It's just a bunch of very simple NIFs. The one thing to note is that the counter is a reference, so you can't look it up from a central location; I'll talk about the challenge with that a little later. But the cool thing is that with this oneup repo, we don't have to send our updates to a central gen_server — or to any pool of gen_servers — whose mailboxes can fill up, or to an ETS table whose write lock gets very slow when thousands of processes hit it very quickly. So, great: we can get rid of those bottlenecks altogether with these mutable reference counters, and we decided to build our own metrics library on top of the oneup repo. We actually took it a step further: we already had metrics, we already used Exometer, and it's very popular, so we decided to take the best of both worlds, and we created a library called oneup_metrics. The big challenge, as I mentioned, was the reference counter. It's great that we have this mutable counter that all the processes can update from wherever they are — but how do the processes get hold of it? There's no central place to get it from. That's the Erlang challenge here; it's an upside-down-world kind of challenge that Erlang developers usually don't have to deal with. The library itself was easy to make work, and this approach worked great with our existing high-scale tests. Everything was great until we decided to open-source it and make a friendly API — that's when it became a little hairy. It performs great, but the API is a little difficult. The idea we had was to pass the reference counters around to the processes that will be updating them, and to make that a little more organized, this is what we did. Configuration of your metrics is very similar to Exometer's: you give a list of atoms as your metric name — for instance, the number of incoming requests might be named with a list of atoms like "requests received" — and then we convert this configuration into a hierarchical map. The map goes from metric name to metric type, and from metric type to the metric's implementation, which we'll talk about in a second, and it holds the reference counters in it. Every process can be passed this map.
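Here is a sketch of that conversion, from a config of atom-list names into a nested map holding counter references. The config shape and function names are my assumptions, not oneup_metrics' actual API, and I'm using the built-in `atomics` module (OTP 21.2+) as a stand-in for oneup's NIF counters — it gives you the same thing the talk describes: a mutable counter behind a reference, updatable from any process without message passing.

```erlang
-module(metrics_map).
-export([init/1, update/2, value/2]).

%% Config example: [{counter, [requests, received]},
%%                  {counter, [requests, errors]}]
%% Builds a nested map: #{requests => #{received => Ref, errors => Ref}}
init(Config) ->
    lists:foldl(fun({counter, Path}, Acc) ->
                        put_in(Path, atomics:new(1, []), Acc)
                end, #{}, Config).

%% Any process holding (a subset of) the map can update in place --
%% no message to any central process, no ETS write.
update(Path, Map) ->
    atomics:add(get_in(Path, Map), 1, 1).

value(Path, Map) ->
    atomics:get(get_in(Path, Map), 1).

%% Internal helpers for nested map access.
put_in([K], Ref, Map) -> Map#{K => Ref};
put_in([K | Rest], Ref, Map) ->
    Sub = maps:get(K, Map, #{}),
    Map#{K => put_in(Rest, Ref, Sub)}.

get_in([K], Map) -> maps:get(K, Map);
get_in([K | Rest], Map) -> get_in(Rest, maps:get(K, Map)).
```

The map is immutable and cheap to copy around; only the counters behind the references mutate.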
The process that is central — say, the process receiving a specific request — will be handed this map, and it hands the map further down to the processes that actually process the requests. I hope that's clear; it's a bit of a difficult idea for Erlang, or at least for developers who never have to do this. And you don't have to pass the whole map: you can pass the subset that is relevant for a specific process. I made the library such that it's very easy to grab a subset of metrics from the original configuration — basically, "give me the sub-map with this specific prefix" — and hand that to the process that's going to update that set of metrics. There is a module, oneup_metrics, which maintains this configuration and knows about all the metrics we have, and it is also a behaviour. In its init callback, each metric implementation decides what kind of reference counters to create: for a simple counter, all we need is one reference counter, and for a meter and for a gauge as well, just one reference counter. But histograms became a little more complex: we actually need a reference counter for the sum of values, a reference counter for the number of samples, and also ones for min and max. So each metric implementation knows internally how many reference counters it needs to maintain that metric; the init side does just that, and then update, knowing what the reference counters are, knows how to update them. Sorry if I was a little unclear on this point; we'll look at the various metric implementations in just a second. Like I said, we have oneup_counter, oneup_meter — which is based on the Dropwizard formulas — oneup_gauge, and oneup_histogram, and none of these metric implementations requires a pool of values. Our histogram is actually not a full-blown histogram, because you would need the full set of values for that. Instead we created some formulas — a histogram approximation — where you can get away with nothing but reference counters and no pool of values, because we didn't want to deal with a mutable pool of values (that would be more complicated) and we didn't want to store too much. So, to make this clear: metric updates are the intensive path. That's where tens of thousands of processes are updating, and those guys don't communicate with any gen_server and don't hit any ETS tables — they're simply passed the metrics map up front, when they're created, so they have access to these mutable reference counters and know how to update them through the update callbacks. That's what the oneup_metrics behaviour is for: providing the methods for the high-intensity path, the methods that will be used very often and very intensely, and those are not going to become a bottleneck. Mind you, even an atomic counter in C++ can get slightly contended, because only one writer can update it at a time — it's just very fast. The author of oneup has some very cool performance tests comparing the atomic C++ counter against ETS updates, and the atomic counters are much faster. So in any case: we have very cheap methods to handle very intensive updates from thousands of processes, and then we have the much less demanding side of things, which is metrics aggregation and retrieval. Aggregation is done every 5 seconds — nothing is going to get overrun by that. Same with retrieval: we have an HTTP server, and any process that wants to see the metrics can be built on top; if it looks them up every second, that's not going to kill our system. We're talking about tens of thousands of updates per second on the write side.
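The histogram approximation I just described can be sketched with nothing but counters. This is my own sketch, not oneup_metrics' code, and it again uses the built-in `atomics` module as a stand-in for the oneup NIF counters: a running sum, a sample count, and min/max, with the min/max kept correct under concurrency by compare-and-swap retry loops.

```erlang
-module(approx_histogram).
-export([new/0, record/2, summary/1]).

%% Slot indices inside one atomics array.
-define(SUM,   1).
-define(COUNT, 2).
-define(MIN,   3).
-define(MAX,   4).

new() ->
    Ref = atomics:new(4, [{signed, true}]),
    atomics:put(Ref, ?MIN,  9223372036854775807),   % largest int64
    atomics:put(Ref, ?MAX, -9223372036854775808),   % smallest int64
    Ref.

%% The hot path: four cheap atomic operations, no sample pool.
record(Ref, Value) ->
    atomics:add(Ref, ?SUM, Value),
    atomics:add(Ref, ?COUNT, 1),
    update_min(Ref, Value),
    update_max(Ref, Value).

%% Lock-free min via compare-and-swap: retry if another writer won.
update_min(Ref, V) ->
    Cur = atomics:get(Ref, ?MIN),
    case V < Cur of
        false -> ok;
        true ->
            case atomics:compare_exchange(Ref, ?MIN, Cur, V) of
                ok -> ok;
                _Changed -> update_min(Ref, V)
            end
    end.

update_max(Ref, V) ->
    Cur = atomics:get(Ref, ?MAX),
    case V > Cur of
        false -> ok;
        true ->
            case atomics:compare_exchange(Ref, ?MAX, Cur, V) of
                ok -> ok;
                _Changed -> update_max(Ref, V)
            end
    end.

%% The cold path: read the counters and derive a mean.
summary(Ref) ->
    Count = atomics:get(Ref, ?COUNT),
    Sum   = atomics:get(Ref, ?SUM),
    #{count => Count,
      min   => atomics:get(Ref, ?MIN),
      max   => atomics:get(Ref, ?MAX),
      mean  => case Count of 0 -> 0; _ -> Sum / Count end}.
```

You lose percentiles, which need the full sample pool, but count/min/max/mean cover a lot of operational visibility at almost no cost.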
But aggregating or looking the metrics up only happens once a second, or once in 5 seconds, so that's not a problem. For that we have a gen_server side: all of these oneup_metrics implementations we looked at implement the oneup_metrics behaviour, which covers the high-intensity processing, and each is also a gen_server that knows how to deal with its reference counters — how to aggregate them for its specific purpose. Histograms are the interesting case here: every 5 seconds we grab the value counter and the samples counter and average them over time, with older values mattering less and less. So this dual nature is very important to understand about this library: it has the plain functions for the things that are intense, and the gen_server for the things that are not. I hope I've got you interested enough to take a look at the library, and if you work on high-performance, high-throughput applications, I hope you give it a try. Now it's time for the Q&A — I hope I did a more or less good job explaining our library, and I hope you guys have questions.

[Host] Thanks, Irina. Any questions from the community? Irina, I have a question. Let me first summarize your talk. You're saying that you had issues collecting metrics because an enormous number of actors, or gen_servers, were sending messages to a single actor (or a small number of them), and the mailbox was getting full, which is not an ideal situation. Then you evaluated some solutions like ETS tables, and ETS turned out not to have the right locking granularity, I guess.

[Irina] Excuse me, I'm sorry, I can't hear you very clearly.

[Host] Is that better now?
[Irina] I still couldn't hear that very clearly, I'm sorry.

[Host] Okay, I think I'll hand a mic to the community if they have questions. So my question is: you talked about the mailbox getting full because a lot of actors were sending messages to a single actor, you tried ETS and ETS had issues with locking the entire table, and you basically resolved it with native counters — C++ native interfaces, right? So my question is: why can't we model this solution with actors? Actors are supposed to scale. Is it a problem of modeling the actors in such a way as to solve it the actor way, instead of going to a native interface in C++?

[Irina] You're asking whether we could have solved this problem staying within the actor model. I think it's possible. For instance, the one idea I had was that hierarchical approach, where you have a pool of actors handling the metrics: you send to the pool, and the pool hands the metrics further down, so the central collector isn't getting metrics from tens of thousands of processes. I didn't get around to implementing it, first because it's a bit complex, and secondly, like I said: what if your system is suddenly overwhelmed — maybe you're under a denial-of-service attack, or something unpredictable happens? Then your mailboxes get overwhelmed anyway, because your pool size wasn't right. You'd have to make the pool of metrics collectors huge at some point, and if you didn't, you suddenly get the same mailbox problem again. Then there's the ETS side. If I created one table for every metric, it's still a little slower than a mutable counter, but it might be better: Exometer keeps all the metrics in one table, whereas with one table — or one entry — per metric, your processes would only contend on the one metric they're interested in. I haven't tried that, I'm not sure what the drawbacks are, and again, it would still be slower than the reference counters we have. The reason we decided to go this way is that, on the one hand, we are sort of breaking the actor model here, but on the other hand, a reference counter alone is a very simple structure. Yes, we're updating a mutable counter, but it's such a simple thing that it's not likely to cause big trouble for your Erlang or Elixir server. We're trying to avoid big NIFs and their complexity, because once you introduce those, your system becomes as vulnerable as any C++ system — but if it's such a simple use case, and you've tested it enough, you don't have a problem. The real issue is that, like you said, the approach itself is very un-actor-model, because we're passing these counter references around into every process. Actually, I think it's okay — I just find the API a little crunchy. That's partly why I'm sharing this library: to get feedback, so maybe we can do something else and still use the reference counters. I don't want to give up on the mutable counters idea, because it's a very high-performance solution.
In fact, we're looking into taking it even further: rather than having single atomic mutable counters, have per-CPU counters and then aggregate them — I'm not quite sure yet how big a challenge aggregating those atomically will be. Those would be even faster, because you no longer have processes contending on the same counter. But I don't have a better solution at this point; this is the basic library that works for us, and I'm hoping to get feedback, a better solution, or something more digestible for the actor model. If anyone has any ideas or suggestions, I would greatly appreciate it. Next question?

[Audience] Can you hear me? Have you looked into the implementation of oneup and how it does atomic counting? Because if queuing in Erlang is a problem, I would imagine it's a problem in oneup too — since it's atomic, I assume there's some queuing there as well.

[Irina] Atomics are very fast — it's a compare-and-swap approach, so there is no queuing and no heavyweight locking; it's a much lighter way of synchronizing, just because it's a single counter using compare-and-swap rather than locks. But to your question: you're right that in the end it's the same problem. I actually had a situation where one of our metrics was hit so hard that it became slow even with atomic counters. If a bunch of processes keep trying to update the same counter, it is going to be a problem — it's just that the problem arrives much, much later than it does with ETS tables or mailboxes. Mailboxes fill up and slow down much, much faster. I wouldn't be able to give you exact numbers right now, but if I had 10,000 processes writing into one mailbox at the same time, it degrades very fast; if the same 10,000 write to ETS, it takes longer to become a problem; and with atomic counters the problem shows up much, much later still. So it's not that the problem can never exist — it's that it's much smaller with these C-level mutable counters than with a process mailbox. You see what I mean? I totally agree with you: when you try to update the same thing from 10,000 processes at the same time, no matter what it is or how it's constructed, it will still be a problem. It's just that this problem doesn't manifest itself with these mutable counters for a long time, so it's a much more robust solution. But in the end, like I said, we had one metric that was bombarded so hard that even it became slow, even with the mutable counters. For that we have our next development, which has not been done yet but which I'm looking into: instead of one atomic counter, use a counter per CPU. If you have 24 CPUs, or 12, you'd create as many counters as there are CPUs, eventually combined into one — and the challenge is how to do that combination atomically. Every process in Erlang runs on a scheduler, and there's a way to get the scheduler number of the current process, which corresponds to a CPU. Sorry, I don't have slides for this — it's something I haven't developed yet, just an idea we're considering to make our metrics even faster: counters at the CPU level, so every process that updates a counter only accesses something like my_counter_1 or my_counter_2, corresponding to its scheduler number. In Erlang it's very easy to retrieve the process's scheduler id, and it's easy to know how many schedulers are running.
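That per-scheduler idea can be sketched like this — my own code, not something shipped in oneup_metrics: one atomic slot per scheduler, writers pick their slot by scheduler id so concurrent writers rarely touch the same slot, and readers sum the slots. (Modern OTP actually ships this exact trick built in, as `counters:new(N, [write_concurrency])`.)

```erlang
-module(sharded_counter).
-export([new/0, bump/1, read/1]).

%% One atomic slot per scheduler (i.e. roughly per CPU core).
new() ->
    N = erlang:system_info(schedulers),
    atomics:new(N, []).

%% Hot path: each process increments the slot matching the
%% scheduler it is currently running on, so contention is rare.
bump(Ref) ->
    Slot = erlang:system_info(scheduler_id),
    atomics:add(Ref, Slot, 1).

%% Cold path (aggregation): sum across all slots. The sum is not
%% an atomic snapshot, which is fine for metrics.
read(Ref) ->
    #{size := N} = atomics:info(Ref),
    lists:sum([atomics:get(Ref, I) || I <- lists:seq(1, N)]).
```

The read is not a consistent snapshot across slots, but for a metrics counter an approximately-current total is exactly what you want in exchange for near-zero write contention.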
So instead of one metric counter, on 24 CPUs I'd actually create 24 — my_counter_1 through my_counter_24 — and during aggregation I'd have to gather them all, which might be a little tricky. But that's the next step we're looking into: CPU-level counters. That idea, again, came from our C++ colleague, who was working on metrics in C++; he had the idea for C++ but never got around to it — it's actually quite difficult to implement there, and it should be much easier in combination with Erlang. So we're looking at some crazy things, but so far what we've done has really worked for us: those oneup mutable counters are just much faster than ETS or the process mailbox. And yes, in the end we would hit the same problems you're mentioning.

[Host] Thank you. Any questions? Anyone? Then I have one last question on this problem. It's really interesting that actors were actually taking more time — atomic counters were the less expensive solution, and a pretty good one. Again, I haven't done a lot of Erlang, but the question is: could you use some kind of persistent queue, for example Kafka? It's a horizontally scalable message queue. Could the Erlang system drop a message into Kafka — so you don't lose that message — and then some other actors take the counter messages out and do the aggregation? Has that kind of thing been evaluated at your company?

[Irina] You're talking about using a more sophisticated queue mechanism, like Kafka. I could certainly look into that — it's an interesting idea. I just have a suspicion that nothing is going to beat the counter, because of what an atomic counter is: if I start doing more complicated structures, even in C++, I'll probably end up with worse performance, just because those atomic counters, the way they're implemented, are so fast. And it's very cheap to pass these counter references around. In the oneup_metrics library I tried to make that easy: there are functions that, behind the scenes, put the metrics map into the process dictionary for you. A lot of the time people say the process dictionary is not advisable to use, but that's just a myth — you can use it for plenty of things, so I use the process dictionary to stash the map. I guess I'm not answering the question anymore; I'm just trying to say that we don't have to restrict ourselves. Especially for metrics: metrics collection is sort of separate from your main application, so it's probably okay to break the actor rules a little here. But yes, your suggestion — a queue implementation that doesn't choke the way the mailbox does — is something to look into. I'm just suspecting that those atomic counters are the fastest thing you can find, because they're the simplest. I'm sure there are better implementations of other things — an Erlang process mailbox is possibly not the fastest queue in the world — but no matter what, I would say that having an immediate atomic counter at your disposal, from any process, is probably the fastest thing you can do, simply because it's so simple. Sure, you could try queue implementations and other things, but it's going to be a much more complicated solution and probably still not as fast. Definitely something to look into, though. And I guess I'm not an actor-model purist anymore — not since I created this library. I was before that, I swear. Any questions, guys?

[Host] I think we're good here. Thank you so much for really helping the community here, talking from Chicago — and so early for you, especially not being a morning person. Thank you.

[Irina] Thank you for listening. I hope I piqued your interest in our library. Have a wonderful day, guys, enjoy the rest of your meetup. Bye-bye.