Thank you all for being here. Just a quick note: this is my first ever talk, so please be gentle with me. What I'm going to talk about today is building this thing, which is a Nerves-based status monitor.

My name's Andrew. I work for a company called UserTesting, doing mostly Rails. I've been doing it for about 10 years now. My partner in crime on this project, who can't be here today but deserves just as much credit for this as I do, is Jerry. He's Jerry Hoggside. He's also a developer at UserTesting and, as you can see, loves to play around with hardware.

We work for UserTesting. We're based in the Bay Area, and our mission is to help our customers test their user experiences and create great experiences for their customers. We have offices in San Francisco, Mountain View, and Atlanta, and about a third of our engineers are remote. We're always looking for good engineers, so if you're interested, you can check us out at usertesting.com, about us, jobs. And that's as much of that as I'm going to do here.

So we had a problem. A bunch of us are really, really interested in Elixir. We think it's awesome, it's great, but our app is currently a Rails monolith, about 52,000 lines of code, and for the most part it works well. So the bosses say: Elixir is great, but we have a monolith. How are you going to introduce Elixir, if you really want to do that, without causing problems and without introducing operational complexity, having to deploy it, et cetera? So we needed to convince them that Elixir is cool and that it's valuable to the organization.

So a bunch of us came up with this idea: let's make a side project. A project that didn't risk degrading production systems, a project that showed off Elixir and OTP's strengths and some of their unique advantages, and a project that was small enough to get done quickly. I mean, this is a side project. We're not getting paid for this, so we can't spend too much time.
But it also has to be large enough that it delivers a little bit of value.

So here's what we came up with. I had just been to the Nerves workshop in Orlando, taught by our wonderful friend Justin here. Jerry had been playing around with Arduino and LED art, and he had created a build status monitor that used CircleCI's API to display the status of all our builds. He wrote that using Python and NeoPixels. It worked for the most part, but multiprocessing was a pain and it crashed a lot: CircleCI would return some random JSON that didn't quite match what we were expecting, and the whole system would die. And then we'd had a rash of incidents where our site performance had degraded and our site had started throwing errors. How many of you use New Relic or Honeybadger, and then all your messages from those services go straight to a mailbox that you never look at? So those systems were telling us we had problems, but no one noticed.

Here's an example of some of Jerry's LED art, including his version of turning his car into KITT. And then here's his initial build status monitor. He even built a little portable one you can wear on your belt that tethers to Wi-Fi via your iPhone.

So based on all that, we decided to build a Nerves-based visual status monitor that watches four key operational aspects of our website. Was our site up? That's Pingdom. Was our site responding quickly? New Relic Apdex. Were we throwing errors? Honeybadger. And what was the state of our master build from CircleCI? Could we deploy code if we needed to?

We have a big LED strobe for the site being down, which was quite interesting when I was finishing this up last night and Amazon's US West 1 region decided to take a break for a while. And then we use programmable LED strips for the other metrics. So this is what you see here. We have a strip here for CircleCI. Each of these groups of three pixels is one build.
The red ones are builds that failed, the green ones are builds that passed, and that yellow breathing one over there at the end means someone's currently trying to build and deploy our application. We have two different builds, and we use that first pixel's color to indicate which build it is. This middle strip is New Relic Apdex in five-minute increments, going from the most recent, the brightest, down to the dimmest, which is six hours ago. And then the bottom one is Honeybadger. This is error rate per minute, once again dimming down as it gets older, and then, since we had a couple of pixels left over, this is error rate per hour. As you can see, a couple of hours ago we had a little bit of yellow there, so we had a little spike in errors.

The Honeybadger ones are green at zero errors per minute, yellow at three errors per minute, red at nine, and then they start blinking if we hit 27. So it's basically log base three. The New Relic ones are green if we're above .99, red if we're below .91, and they start blinking below .85. And it's actually turned out to be really useful. We model our pixels using HSL, so red, yellow, and green are contiguous, and it's very easy to do a nice linear mapping between red and green.

So that's basically what we built. Now, how did we build it? The core of the system, and I'm going to plug this for a second, is the LinkIt Smart 7688 Duo. It's a system on a chip, and it's got a couple of really nice advantages. It's got both a Linux microprocessor and an Arduino microcontroller on it, and they're wired together with an internal TTY. It's got built-in Wi-Fi, and it's cheap. That chip's about $15, and it's a good thing they're cheap, because it has a little bit of weirdness to it: the bootloader is written to raw flash, which means there's no wear leveling and no error correction. So any time you flash the bootloader, there's a chance that the chip might turn into garbage. So have a spare.
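The color scales described above can be sketched in a few lines. This is an illustrative Python sketch, not the project's Elixir code: the function names, the 120-degree green hue, and the choice to treat anything under one error per minute as level zero are all assumptions made here so that the log-base-three scale lands on green at 0, yellow at 3, and red at 9.

```python
import colorsys
import math

GREEN_HUE = 120.0   # degrees on the HSL wheel (assumed convention)
RED_HUE = 0.0
APDEX_GREEN = 0.99  # thresholds quoted in the talk
APDEX_RED = 0.91
APDEX_BLINK = 0.85
BLINK_ERRORS = 27


def error_rate_to_pixel(errors_per_minute):
    """Map errors/minute to (hue, blink) on a log-base-3 scale."""
    # Rates below 1 are clamped to 1 so that log3 gives level 0 (green).
    level = math.log(max(errors_per_minute, 1), 3)  # 1 -> 0, 3 -> 1, 9 -> 2
    hue = max(RED_HUE, GREEN_HUE - 60.0 * level)    # 60 degrees per level
    return hue, errors_per_minute >= BLINK_ERRORS


def apdex_to_pixel(apdex):
    """Map an Apdex score to (hue, blink), linear between red and green."""
    frac = (apdex - APDEX_RED) / (APDEX_GREEN - APDEX_RED)
    frac = min(1.0, max(0.0, frac))                 # clamp outside the range
    hue = RED_HUE + frac * (GREEN_HUE - RED_HUE)
    return hue, apdex < APDEX_BLINK


def hue_to_rgb(hue_degrees):
    """Convert a hue at full saturation and 50% lightness to 8-bit RGB."""
    r, g, b = colorsys.hls_to_rgb(hue_degrees / 360.0, 0.5, 1.0)
    return round(r * 255), round(g * 255), round(b * 255)
```

Because red, yellow, and green sit next to each other on the hue wheel, a single linear interpolation over hue is enough; doing the same thing directly in RGB would need separate ramps for the red and green channels.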
The LED strips are just your standard WS2812B-compatible LEDs, bought off of Amazon. They come in 144-LED strips. We cut them in half and did some soldering. Then the strobe, once again off of Amazon. It's really bright. If you have one of these and your site goes down, your engineers will know, assuming they're not on the ground twitching. The downside is that it requires its own 12-volt power supply. So with all of this, the parts, et cetera, I built this thing for about $100. The most expensive pieces were the LED strips, but all in all, plus or minus $100.

So the hardware is pretty simple. It's just that LinkIt chip with the LED strips hooked up. Quick note: the LinkIt defaults to 3.3 volts. The LED strips expect 5, and undervoltage will fry those LED strips. I've got a couple of dead ones if anybody wants them. The other piece is that at full brightness, those LED strips take 50 milliamps. Most of these USB power supplies only put out about two, two and a half amps, which means you're limited to about 200 pixels. And then the LED strobe is just hooked in via a standard relay with, like I said, a 12-volt power supply.

So then we have an Elixir app controlling all this. It's an umbrella app. Typically it has the main code library, it has the firmware app, and it also has something I call the Bench app, and we'll talk about that in a second. It's a fairly small app: about 11 GenServers and three supervisors. We're going to skip this slide, since Justin already talked about what Nerves is, and I'm assuming most of you were awake. Blah, blah, blah, blah.

So this is generally the supervision tree of our application. There's the main app. We've got a notification subsystem, we've got a monitoring subsystem, and then we have a separate little GenServer that's controlling the communication to the Arduino. And the message flow is basically this: we have a bunch of GenServers that are polling these various sites. They get the data in.
They send it to a notification engine that decides what we want to do with it, and then that sends it to a hardware dispatcher that figures out which piece of hardware is on what port and all that stuff, and then that goes off and writes it to the Arduino.

One of the things I rapidly ran into doing this is that I'm relatively new to Elixir, so I screwed up a lot. As Justin pointed out, SD card swapping is a pain. And since he didn't have hardware in the loop done a year ago, and I couldn't use it, I needed to burn firmware for every change. And then I was trying to debug over a serial connection. By the way, another thing about the interim Wi-Fi: if you've ever used it and tried to read output, the WPA supplicant spews log output constantly, which means trying to read anything over the serial connection is next to impossible.

Traditionally, when you create a Nerves app, you have an umbrella. You have a firmware app that's pretty minimal. It's basically the Nerves libraries and some hardware initialization. You have a library app that has everything else you want to do. I added a third app called the Bench, as in the old electronics bench. What I did is I moved everything out of the firmware app except literally the Nerves dependencies: just including the libraries and turning on the Wi-Fi on the chip. Everything else moved into the library, including things that called Nerves. And then in the Bench app, instead of having the Nerves libraries as a dependency, I just put in stubbed versions of the Nerves calls I needed.

And this is all the code I needed to stub out Nerves. We need Nerves UART, because that's what we use to talk to the Arduino. And then I needed to be able to ask the Nerves network interface whether it was up or not, because it takes a little while to boot and I didn't want GenServers crashing all over the place waiting for the network to come up.
So I stubbed that out, and now I could do all the development I needed on my local laptop without having to burn firmware, other than for testing at the very end.

So we have a bunch of monitoring processes, and these are basically just GenServers. They call themselves every minute. They use Tesla, which, if you haven't used it and you're from the Rails space, is sort of like Faraday. It's an API client, great for consuming JSON APIs. So these monitoring processes call the API, extract the data, and then forward the results on to the notification engine.

The notification engine is a big singleton. It receives the messages from the monitoring processes and then translates that data into pixel representations if needed. Some of them are binary, like site up or site down, which doesn't have a pixel representation, but all the rest do. And it sends generic commands over to the hardware dispatcher: open this relay, close this relay, push this pixel.

And then we have the hardware dispatcher. It receives messages from the notification engine. On initialization, it starts up processes managing each of the various hardware components: a process managing this relay, a process managing this LED strip. And it holds onto each one by name. Then when it gets a message, it asks, do I know about this hardware? If so, shove it off there. If I don't know about this hardware, you haven't configured it, so just ignore the message. So as I was adding services and changing things, I didn't have to compile the changes into that whole side of the system. If the hardware wasn't hooked up in the config, it just ignored the message.

The hardware controllers are, once again, just GenServers, and they're just holding a reference to the piece of hardware connected. Their job is to translate the generic commands, push a pixel, open a relay, into the specific commands that we need to send over the wire to the Arduino. And they can send those either as single commands or as batches of commands.
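The dispatcher's "ignore what isn't configured" behavior can be sketched as follows. This is a hypothetical Python illustration of the idea, not the project's Elixir GenServers: the class names, the `handle` method, and the command tuples are all invented here.

```python
class RecordingController:
    """Stand-in for a hardware controller; it just records what it is told.

    A real controller would translate each generic command into the
    wire-level commands for its piece of hardware.
    """

    def __init__(self):
        self.received = []

    def handle(self, command):
        self.received.append(command)


class HardwareDispatcher:
    """Routes generic commands to controllers looked up by name."""

    def __init__(self, config):
        # config maps a hardware name to its controller. Anything missing
        # from the mapping is treated as hardware that is not hooked up.
        self.controllers = dict(config)

    def dispatch(self, name, command):
        controller = self.controllers.get(name)
        if controller is None:
            return False  # unconfigured hardware: drop the message
        controller.handle(command)
        return True


# Usage: only the strobe is configured, so strip messages are ignored.
strobe = RecordingController()
dispatcher = HardwareDispatcher({"strobe": strobe})
dispatcher.dispatch("strobe", ("relay", "close"))
dispatcher.dispatch("honeybadger_strip", ("push_pixel", (255, 0, 0)))
```

The design choice being illustrated is that senders never need to know which hardware exists; adding a new monitor upstream does not break anything if its output device has not been wired up yet.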
Once again, you've got to get used to thinking about multiprocessing. I had a lot of weird results when I started because I wasn't batching commands. All these things are running simultaneously, and commands started getting interleaved, which created some very interesting effects on the display.

And then the last piece of the system on the Elixir side is the interface to the Arduino. This is just a single process that holds a reference to the UART process. It takes commands and manages communication: it takes a command, sends it over the wire, and waits for an acknowledgement back from the Arduino before continuing.

Now the Arduino side of things. I didn't write this, so this is just an overview, and if you have any questions, I won't be able to answer them. There are basically four main parts to it: an event loop, a command processor, a bunch of RAM buffers to hold the pixels, and then a third-party driver for the LEDs. We ended up writing our own custom command language. We could have used Firmata, but we just decided to use a custom language. In fact, Jerry, who was writing this, had already started writing his custom firmware with a command language, so we just continued using that.

So here's a quick overview of the system. Drawing operations come in and go out to a display buffer. There's a buffer holding which effects are being applied, system state, timing, things like that. It all gets combined in a renderer, and then the renderer sends it out to a rendering buffer and off down to the strip. It does that about 30 times a second. The commands just come in over serial I/O, and then we just send back the acknowledgement. Inside the Arduino, we represent the color data as three 8-bit values per pixel, plus one 8-bit byte per pixel for effects. Drawing commands modify the display and effect buffers. The renderer builds the output, does it 30 times a second, and sends it on down the strip.
And it's the standard one-wire WS2812B protocol, if that means anything to anybody.

So, like I said, we wrote our own custom command language, and it's an RPN-style language, so arguments get sent before the command. All the commands are three characters, because, as anybody who's ever done any Arduino programming knows, you don't have very much RAM to play with. You're literally back in the bad old days of trying to save every byte you can, so every character you save is worth it. There are roughly 50 different commands in the command set, everything from control commands to colors to effects. Here are a couple of examples of the command language. Simple: red, green, blue, to push a red pixel, a green pixel, and a blue pixel. There are some blanks, and there are some things like copy, so you can repeat sections without having to specify them again.

On the Arduino side, there were some challenges. First of all, the LEDs don't have any concept of effects built in. So if we want effects like breathing when a build is happening, or blinking when some parameter is way out of bounds, we have to do that ourselves. We literally have to keep track of the time and decide what brightness this pixel is: this one's on, this one's off. And as I said, Arduinos have very limited memory. Also, when you're trying to do really time-sensitive stuff like displaying animated effects on LED pixels, you really need to disable interrupts so that the timing holds, which makes it really difficult to send commands to the Arduino while interrupts are disabled. So we had to come up with this whole system of basically sending a wake-up command several times, and most of the time that's enough. Occasionally commands will get missed just because the Arduino didn't hear them. But we've got it to the point where sending three wake-up signals pretty much gets it every time.

That's pretty much the entirety of the software side.
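The point that effects have to be computed from the clock on every frame can be sketched like this. It is an illustrative Python sketch only; the actual firmware runs on the Arduino, and the function names, the sine-shaped breath, the periods, and the brightness floor are all assumptions made here. Only the 30 renders per second figure comes from the talk.

```python
import math

FRAME_RATE = 30  # renders per second, as quoted in the talk


def breathe(t, period=3.0, floor=0.1):
    """Brightness in [floor, 1.0] that rises and falls smoothly over time."""
    phase = (1 - math.cos(2 * math.pi * t / period)) / 2  # 0 -> 1 -> 0
    return floor + (1 - floor) * phase


def blink(t, period=1.0):
    """Full brightness for the first half of each period, off for the rest."""
    return 1.0 if (t % period) < period / 2 else 0.0


def render_frame(pixels, frame_number):
    """Combine stored colors with per-pixel effects for one output frame.

    pixels is a list of ((r, g, b), effect) pairs, where effect is one of
    "none", "breathe", or "blink" (hypothetical effect names).
    """
    t = frame_number / FRAME_RATE
    out = []
    for (r, g, b), effect in pixels:
        if effect == "breathe":
            scale = breathe(t)
        elif effect == "blink":
            scale = blink(t)
        else:
            scale = 1.0
        out.append((round(r * scale), round(g * scale), round(b * scale)))
    return out
```

Keeping the stored color separate from the effect, and applying the effect only at render time, mirrors the split between the display buffer and the effects buffer described earlier.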
So what I really wanted to get into is that we were doing this project as an intro to Elixir for the company. Was this a good project to introduce Elixir? Short story: yes. If you don't want to listen to the rest, you don't have to.

Going back to the criteria we had: would this disable or degrade production systems? No, it's just a monitor. It doesn't have anything to do with our production systems. Would it show off some of Elixir and OTP's strengths? Yes, I think it did. We had supervision trees. We had a bunch of parallel monitors. If we got bad data from one of them, we didn't have to do any error checking. The monitor just died and then started back up again. It was wonderful.

And it turned out that this code base was small enough to walk through with an engineer who was interested. It's about a thousand lines of code, so we could sit down for about 15 or 20 minutes, and they might not get all the details, but they could get a pretty good idea of what's going on and how things were done. It didn't take too long, either. It took a couple of weeks of nights and weekends, probably spread over two or three months. But overall, it was probably 60 to 80 hours to get the whole thing done.

And was it large enough to deliver value? So much so that, just based on this foam-core, taped-together prototype, our customer service department has already claimed the first one of these when we actually put it together for real.

Hardware turned out to be a great idea. Because let's face it, we're engineers. What's more exciting than making a light blink? I mean, seriously: building a website, or making an LED blink. Which one's more exciting? Also, we're a web development shop. Most of us haven't touched hardware in the last ten years, or ever, which means that it was really a foreign domain. So people were curious. And it was also a new area for me. I mean, I was new to Elixir and I was new to hardware.
Which is challenging, because you don't know anything. But it also means you don't come in with any preconceptions or assumptions about how things are done. I could have very easily done a web page. I've done it a billion times, but I can do that already. And also, in a new domain, struggling to debug is a pain in the ass. You want to rip your hair out while you're doing it, but you learn so much from doing it.

So one of the things I ran into I call the great zero data bug. This is going to be a little audience participation, to see if you guys can figure it out. When starting the system, it would boot up, and the first time or two that we asked New Relic what our Apdex values were, we'd get all zero data back. It would say there were zero satisfied requests, zero tolerating requests, and zero frustrated requests, which doesn't really make sense, because the website's getting hit. Also, because I wasn't expecting it, it caused the whole thing to die with a divide by zero.

So, quick hint: when you call the New Relic API, it asks how many samples you want, how big a slice of time you want to cover, and what time you want to start from. Justin, you can't guess, because you probably know the answer. Can anybody figure out, or have an idea of, why we'd get all zero data back? UTC? West Coast? Clock isn't set? And actually, Dave got it right. We're on hardware. We're on a chip this size. It doesn't have a real-time clock built in. So when you boot the system, it thinks it's 1970. And New Relic was being very honest in saying you had no requests back on January 1st, 1970. The NTP application had to get booted, and it was waiting for the Wi-Fi to come up, so it would take a minute or two to finally set the clock, and then things would work fine. Assuming the entire system hadn't died by then because of the divide by zero.

With all apologies to Chris, non-Phoenix was a great idea for a first project.
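Returning briefly to the zero data bug: both halves of it, the divide by zero and the unset clock, can be guarded against in a few lines. This is a hedged Python sketch rather than the project's Elixir code. The score uses the standard Apdex definition (satisfied plus half of tolerating, divided by total samples); the function names and the cutoff date used to decide whether the clock looks plausible are assumptions made here.

```python
from datetime import datetime, timezone

# Any date safely after the device was built would do; this one is arbitrary.
EARLIEST_PLAUSIBLE = datetime(2017, 1, 1, tzinfo=timezone.utc)


def apdex(satisfied, tolerating, frustrated):
    """Standard Apdex score, or None when there are no samples at all."""
    total = satisfied + tolerating + frustrated
    if total == 0:
        return None  # "no data" is not the same as a score of zero
    return (satisfied + tolerating / 2) / total


def clock_is_set(now=None):
    """True once the system clock has moved past the 1970 boot default."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= EARLIEST_PLAUSIBLE
```

A poller could check `clock_is_set()` before asking for a time window at all, and treat a `None` score as "skip this sample" rather than letting the division fail.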
Phoenix is awesome, it's powerful, and it's got lots of great abstractions to do things quickly and powerfully. But all those abstractions hide the really cool guts of Elixir. To me, Elixir is GenServers and supervisors and registries and all that good stuff. You can write a pretty big, pretty powerful Phoenix app without ever writing your own GenServer, without ever writing your own supervision tree. And we're a Rails shop. We all write Rails apps. If I just wrote another blog in five minutes, could you really notice we'd done anything different that we couldn't have done in Rails?

Also, they're very similar, but Phoenix and Rails are different in a lot of ways. So you can bring a lot of bad habits, or not necessarily bad habits, different habits, let's say, from Rails into Phoenix. Obviously, if your customers need a web app, Phoenix is a great choice. But if you're just playing around learning Elixir, I kind of think Phoenix can wait until you have a web project that needs to be done, and in the meantime you can play with the underlying guts of Elixir and Erlang itself. And as I said, showing off a new web app to a bunch of Rails developers because Phoenix is a better Rails is cool, but it's not really a reason to switch unless you have problems with Rails, which we don't right now.

So what's coming next? Obviously, this is a really basic prototype project. I think I'm going to try to refactor it to use Agents or ETS between the monitoring processes and the notification processes. Right now it's very much a push system. Messages come in and they get pushed all the way through to the end. I think I may decouple that, so that we push state into an intermediate buffer and then the display drivers pull from that on whatever cycle they want. That gets rid of that single notification engine and hardware dispatcher combo in the middle. It also allows us to do things like collect data for things we don't have outputs for right now. Maybe you want to hook up another output later.
But right now, none of that data is being stored anywhere. It's just getting pushed to pixels and then forgotten about. Also, if a hardware controller crashes right now, the system keeps running, but that display is reset until new data comes in to fill it up. If we had that data stored somewhere, we'd be able to repopulate that strip as soon as the controller came back up. Also, depending on how things go, I may play with Registry instead of holding the processes by name and deciding whether the hardware was connected or not. Really, that's what Registry is for, so I might as well use it.

And tests. Right now, this thing has essentially zero unit tests. That's probably not a good thing. Hopefully hardware in the loop makes that better, but testing hardware at the unit test level isn't exactly easy, so I skipped it.

We'll probably add some more metrics in there. How much are we spending a day on AWS? What's the state of our production system: CPU load, memory usage? We're on Rails, so memory usage is a thing. Sidekiq queue lengths, and maybe even some business metrics. UserTesting does user tests, so are tests being completed? If that number drops, something's probably wrong in the system. Are customers able to place orders?

Maybe add a Phoenix app in for some dynamic configuration. Right now, everything's hard-coded in that system. Maybe we could have a Phoenix app where we have ten different metric checks defined, and you can go in and dynamically, via the web app, choose which one you want to send to which strip. Or configure your New Relic app ID, so you can change what project you're monitoring without having to re-burn an entire set of firmware. And you could even get away from the hardware side and just make a Phoenix dashboard. Eventually, you're going to have all this data in ETS or Agents or whatever, so it's easy to write a Phoenix dashboard on top of that. I haven't done it yet because our networking people don't allow weird devices on the network, so.
And getting an IP address from that thing isn't the easiest, so actually talking to it is a challenge.

And then we're going to semi-productionize this. We already have requests for four or five of them. We have four different engineering offices, all of whom want one. And like I said, our customer support office wants one. I have a co-worker who does custom woodworking, so we're literally going to take a two by eight, get out the router, route out channels for each of the LED strips, varnish it, and put a nice plexiglass front on it. This is why we have co-workers.

So that's basically all I have for you. I just wanted to share this project we did, how we did it, and why I thought it was a great project to introduce Elixir into an organization that's excited about it but really doesn't have a means to investigate or play with it.

I'd like to give a shout out, first of all, to UserTesting. They paid for all the hardware, and they paid for me to come here. Jerry, obviously. Jerry did all the Arduino-side code for this, and without him I wouldn't have gotten anywhere. Shout out to the Nerves team. It's awesome. And Honeybadger.io needs their own little shout out here, because I was talking about this at RubyConf with Joshua Wood and said, hey, wouldn't it be neat if we could get the error rate, how many errors got thrown in the last minute or hour or whatever, from Honeybadger? And he said, yeah, that would be really cool. This was Saturday afternoon, and I was going to the airport Saturday evening, and I got an email from Josh going, hey, by the way, that new API you asked for, it's in production, give it a shot.

If anybody's interested, all the code for this is MIT licensed and publicly available. The Elixir code's on the UserTesting GitHub. The Arduino side is on Jerry's, J-Hogset's, GitHub, and there's a link to the third-party LED driver. We're Elixir newbies. This is my first real Elixir project, other than basic toys.
So we're new. This code, if you look at it, works, but it's probably not the greatest. We'd love feedback on both the Arduino and the Elixir side. Pull requests are welcome, and if you want to sit down and talk with me about the code, I'd be happy to sit down with anybody and do that. That's about all I have. Does anybody have any questions, comments, heckling?