Check, check. There we go. Yeah, I think so. So don't say anything untoward in the interim, right? Because you're probably recording that. I'd need all the help I could get; I'd be in his talk if I wasn't doing this one. Oh, it'll explain why that makes sense, I hope. Our next speaker is Guy Martin. The name of his presentation is Aim to Be an Open Source Zero. Guy is the director of the Open at Autodesk initiative at Autodesk, and without further ado. Thank you very much. And I definitely appreciate all of you being here, because through the joys of conference scheduling, which I've done myself, I can appreciate it: we have four speakers all doing community topics in this slot. The other three are all friends of mine, and I would probably be there if I wasn't here. So thank you all for attending; I really appreciate it. So I'd like to start with... my pointer is probably going to work here. Good, I have my technical difficulty for the presentation out of the way. I'll start with what I do. I always hesitate to do this in presentations, because we've all sat through the presentations where somebody has to say, here are all of my accomplishments. But I do want to do this so you have some perspective on where I'm coming from; I think it'll make the presentation make a bit more sense as we go along. So I have a computer science degree. I fell in love with computers in high school, showing how old I am, back when it was an Apple IIe in the back of the AV room, and that was the coolest thing ever, with the five-and-a-quarter-inch floppies. Any of the young people in the room: a five-and-a-quarter-inch floppy is not the icon that you use in Word; that's a three-and-a-half-inch floppy. But way back in the day you got to do some fun things, like the PEEK and POKE commands on the Apple.
But I ended up getting a computer science degree and was fortunate enough to work at some really interesting companies, some of which, unfortunately, don't exist anymore, like Sun. I really started out loving engineering; I loved to code, loved to develop. I worked at places like Motorola, developing embedded things. At Sun I worked on everything from enterprise Java to the Java Car project, the embedded work we did there. But an interesting thing happened. As somebody who fell in love with computing and programming and loved hacking, which I still get to do in my spare time, what I found is that I actually enjoyed talking with people, and I enjoyed figuring out the systems-level things: how software is developed at a corporate level, how software is developed with groups of people. And so I went down a path where I started doing more consulting. One of the things I helped build was the Department of Defense's Forge.mil system, which wasn't exactly what we wanted it to be in the end, but we were trying to build an open source and inner-source community within the Department of Defense. And what I found through that whole process was that I really was more of a community guy with an engineering background, as opposed to an engineering guy doing community work. That was, I have to say, a big shift in my mindset, and it was a challenge for somebody who thought he was going to develop software and hack on code for a living for the rest of his life. But I found that it was very interesting and very fulfilling. I then was really fortunate to be able to go to Red Hat and try to encapsulate some of this work: how you get companies to understand the value of open source, and how you get companies to do a better job of open source, both within their enterprise and outside.
And we built a consulting offering at Red Hat that has since morphed and is now, ironically or not, being delivered by the Linux Foundation. So a lot of the work that I built at Red Hat has made its way into the strategic open source consulting that the Linux Foundation does. Right before I got to Autodesk, I was at Samsung, which was a very interesting experience for a lot of reasons, culture being among them. We were trying to take a company that had never had a big presence in open source, and our job was to get developers to want to come to Samsung to work on open source, specifically to work on upstream open source projects. So I took a lot of the learning we put together at Red Hat and tried to get Samsung to understand the value of contributing to open source communities as opposed to just consuming. And for the last year and ten months or so I've been at Autodesk, on a similar journey to the one at Samsung: trying to get a company to understand not just how you consume open source, but where the business value is in contributing back. What's interesting about all of this is that I went from doing engineering for a living to doing community management for a living, which I really enjoy. It's something I'm very passionate about, and I think the other three talks being given in this same time slot are all about the same thing: how do we do a good job, a better job, of community management from the corporate side in open source? So that's what I do now. I'm going to take a little step back into the past, because this also makes some sense. Everybody had dreams growing up. What did I want to do when I was growing up? Other than be an engineer, I wanted to be a firefighter. I mean, I really badly wanted to be a firefighter.
Fortunately, that actually is me, not just wearing a Halloween costume. For the last 10 years or so, I have volunteered with Cal Fire, which is the State of California's fire department, in what's called the Volunteer in Prevention Program. We get to do a lot of the same things that firefighters do, except for actually putting out fires. We're at incident bases, we run communications, and we do things like go to public education events, like you see here with some of my colleagues, working to get folks in the community to understand the value of fire prevention. That's actually me in our mobile command post. My team runs a mobile command post, and I'm also really excited to see the ham radio stuff going on here (K6GWM). We do a lot of work for Cal Fire, both on the ham radio side and on the state radio side. The interesting thing about this, besides the fact that there's a community element within the teams, is that I have a group of about 15 volunteers who work in the mobile command post out of the prevention program, and I learned a lot working with volunteers in this effort as well. I said I had two dreams growing up; actually, it was three: computers, firefighting, and one more thing I really wanted to do. Now, I think that's Buzz Aldrin. I'm pretty sure that's Buzz Aldrin. Showing my age here: right around the time this picture was taken, I figured out what was going on in the space program and became a real space geek. I loved the fact that it wasn't just a solo adventure. It was a team effort, right? It was astronauts who had to learn to work together effectively, and to work effectively in community with their ground support teams and all of their trainers and all of the people who developed everything from spacesuits to spacecraft.
Where we've come to today is the International Space Station, which I think is the most obvious example of community collaboration between whole nations. Now, we can argue that it's a single point of failure right now, in that we only have one nation that can get us up and down from the space station, but the building of it, I think, is a great example of how communities can come together. Communities that probably don't necessarily always like each other; I mean, right now we're in a very interesting political climate in this regard. But I think the International Space Station is a great example of community management, as is the work I'm doing with Cal Fire. So we've talked about what I do now and what I wanted to do growing up. Why was this an inspiration for this talk? Actually, this guy was the inspiration for this talk. Anybody, show of hands, know who this is? Maybe this next picture will be more familiar to those of you who don't know. This is Commander Chris Hadfield, a Canadian astronaut. He was in the Royal Canadian Air Force and in one of the early classes of Canadian astronauts, and he has the distinction, among other things, of being the first Canadian commander of an International Space Station mission, Expedition 35. While he was there, he became famous for a lot of things, not the least of which was recording a version of David Bowie's Space Oddity while playing guitar and floating in the middle of the International Space Station. One of the other things he did, for me at least, was bring space back to being interesting. He was one of the first astronauts to do a lot of live tweeting from space. But what was most interesting to me is what he did when he got back. He wrote this book, An Astronaut's Guide to Life on Earth. And being the space geek that I am, back when I was working at Samsung and traveling to Korea four-plus times a year, it gets kind of boring watching the same movies on flights.
So I picked up a copy of this book for my Kindle app, and I pretty much read it on the entire flight over from the US to Korea one day. The thing that really struck me about it was that all the things I'm going to talk about today, all the life lessons he brings to bear from what he learned, are absolutely applicable to open source communities. And maybe it's just that, as an open source community guy, I'm reading this book thinking, oh my goodness, this is exactly what successful open source communities do. Some of the pitfalls he talks about in the book are things that unsuccessful open source communities have issues with. One of the central premises of his book is this notion of aiming to be a zero. There are some quotes from Commander Hadfield sprinkled throughout the talk, but I think this is one of the key ones: if you come in and you have some skills, but you don't fully understand your environment, there's no way you can be a plus one. At best you can be a zero. The most important thing for me in this, and this is a challenge for somebody who was an engineer and still wants to write code and wants to be the best that I can be, is understanding that a zero is not a bad thing to be in this kind of environment. You come in and you start, basically, at the bottom. And for those of us who have been in successful open source communities, and I'm going to talk a little about some of that experience later, this is pretty consistent. It's really hard to come in, pound your fists on the table, and say, well, I am going to be a plus one, and I'm going to contribute a ton of things right off the bat and make my mark in this community. You have to earn that privilege within the community.
And you have to earn that privilege whether you're in an open source community, in my Cal Fire community, or even within your own communities within your organization. So, some things to think about: who does this apply to? Obviously, as an engineer, it applies to us engineers. When I did consulting at Red Hat, I talked to a lot of engineers, and they were one of my main audiences for explaining some of these things. Unfortunately, I didn't have this book and this notion of aiming to be an open source zero back then, but a lot of the things I'm going to talk about are things I would talk to engineering teams about so they could understand how to work better within open source communities, especially engineering teams that maybe were not as familiar with open source and had worked primarily on proprietary code bases from the beginnings of their careers. However, it's also important for the business people, product managers, and all the other people helping to direct these engineering teams; all the lessons I'm going to talk about are applicable to them too. One of the things I really want to stress, though, is that both of these groups make up companies. We tend to anthropomorphize companies, but companies are made up of individuals. Still, companies have reputations in open source communities, and sometimes companies don't have a great reputation there. I've unfortunately worked with companies that didn't have a great reputation. So I think it's important to consider not just these individual groups, but also how companies are affected by this. One of the things I want to bring in is something Commander Hadfield talks about in his book: expeditionary behavior.
Has anybody here ever been a Boy Scout or Girl Scout, gone camping, whatever? Then you've probably heard this term, expeditionary behavior. The best definition I found is from Paul Petzoldt, who founded the National Outdoor Leadership School (NOLS); coincidentally or not, astronauts headed to the International Space Station are all sent to the NOLS school. Paul Petzoldt described it as being able to live with people 24-7 and get along, even when we have a bad day. Now, in open source communities we don't live and work with people 24-7, though it can seem that way right around release time. But we are interacting with these people, sometimes a lot more than we're interacting with our own families. So I think this is really valuable. And Paul Petzoldt, if you didn't know, was a really interesting character, as I found while doing research for this. He scaled the Grand Teton at 16 and was part of one of the first American expeditions to attempt K2. And I love the fact that it's about helping others in your team make that jump, whether it's a jump across a creek or a jump up a mountain. These are important lessons, and the astronauts are being sent to the school to learn to live and work together, because when you're on an International Space Station mission for six months, you have yourself to rely on, plus the people on your team, and your ground support people are sometimes out of radio contact. So really understanding what expeditionary behavior is and why it matters is valuable for them. It's also valuable for open source communities. Specifically, things like investing in other people's success. This is something I've seen in successful open source communities, and something I've seen not happening in unsuccessful ones: this notion, especially from a company perspective, that my company's perspective and my company's features are the most important thing for this community.
That's really not something that is going to get you traction, as an individual contributor or even as a company. Another important part of expeditionary behavior, and Commander Hadfield talks a lot about this in his book, is sweating the small stuff. Now, as engineers we kind of like this, right? We like to sweat the small stuff; details are something we're really about. But it's not just sweating the small stuff in code; it's sweating the small stuff in processes and social structure. Linus Torvalds actually talked about this at the Open Source Leadership Summit a couple of weeks ago at Squaw Valley, where I was. For the first time that I've ever heard him, he talked about how the social construct of the project and the processes in the project took a while to get right, but they're the reason the Linux kernel continues to scale and continues to work effectively: because they sweat the small stuff in figuring that out. The other thing that being a Boy Scout ("Be Prepared," right?) taught me is that failing to prepare is the same thing as preparing to fail. And sometimes, especially when new open source communities are getting started, this is a challenge. We just want to get the code out. We want to get something going, and that's great; you definitely need to do that, but you should also be thinking about the plan for how you go beyond this great piece of code and this great idea you think you have, to actually building a successful community around it. At Autodesk, this is one of the biggest challenges I have when we have teams that say, hey, we want to open source something. We've created a process now, because we had five or six different processes when I started, where we lead teams through how to do this. And one of the biggest things we make sure they do is have a governance plan.
Have an idea of how people are going to contribute to this project. What does a pull request look like? What kinds of things are you going to accept in a pull request? How do people progress from being simple contributors to maintainers? Those are the kinds of things that need to be thought about for an open source project to be successful long-term. And this is a big one: leading by example. Not only in things like my Cal Fire community, where I basically came in at the bottom and, by trying to be a zero and learning what was going on, eventually became somebody in a leadership role. The same thing happens in open source communities. If you come in and take the position of trying to understand what's going on, and I'll give some guidelines for that a little later, then eventually, if you're there long enough and you gain the trust of the people in the community, your job is to lead by example and to mentor the people coming after you. So, enough about expeditionary behavior for the moment. Some of the things Commander Hadfield talks about as ways you can actually become a plus one, after working to be a zero, are things like doing your homework. And this should seem very, very basic: what are the communication tools and norms in the project, and how does the development process work in that project? The best example I have of this is from when I was at Motorola. How many of you have used GStreamer? It's an open source multimedia framework. At Motorola, we were using an early version of GStreamer in our products, but we had done what every company should not do, but did at the time: fork it internally, build it into our products, and then fix bugs internally without contributing them upstream.
I was running open source at Motorola at the time; that was one of my roles. The internal GStreamer team came to me and said, hey, we've got these bug fixes we want to contribute back. I'm like, great, right? This is progress. And they said, okay, well, what do we do? And I said, okay, have you been following the mailing list? Have you seen what's going on in the bug tracker? Do you know all that? They're like, well, no, we just want to give the code back. I said, okay, first go figure out what's going on on the mailing list. Go check whether these fixes have already been made, and understand how the development process works and how you get these things upstream. They promptly ignored my advice and sent the patches up. Of course, they were rejected, because the bugs had been fixed two or three releases prior. So now we're in an interesting situation. We've spent engineering resources to fix these things. Do we continue forward on our fork, maintaining those bug fixes through however many product releases we have? Or do we say, okay, we'll do the right thing and pull the new version back from mainline, but then we have to rerun all of our regression tests and make sure that nothing from the open source community has broken anything we were depending on? You really never want to get into a situation like that. It just makes life difficult in the end. And it's the kind of rock-and-a-hard-place situation that a lot of the business folks I mentioned earlier just don't want to have to deal with. So that's why I always stress: do your homework with regard to communication and the development process. The other thing to think about is how the project is governed. Again, you want to understand what is likely to be accepted as a pull request, and you want to understand the leadership of the organization. How is the project organized? Is it a more flat structure, like Debian? Is it a benevolent dictator for life?
Is it a group of maintainers? You have to understand where your patch fits in and how you get people to actually accept your contribution. These things are important. We talk about code being king and code being the biggest thing that matters, and it absolutely is; as an engineer, I'm not going to deny that. But there's a certain amount of sales, and we don't like to talk about sales, that has to go on; maybe convincing is a better word. Convincing projects why your pull request, why the feature you want in, is important. So understanding how the project is governed, what the leadership looks like, what they do, what decisions they make, and how all of that fits together is a key part of doing your homework and being prepared to make that first contribution. Okay, so we've done our homework, and now you have to get your hands dirty, right? You have to offer to do the dirty work. In Commander Hadfield's experience, the best way to contribute to a new environment you're coming into is not to say, hey, I'm the most wonderful developer, the most wonderful person to come in and be a part of this community. Your basic goal is to have a neutral impact initially. That's not to say you're going to continue to be a wallflower and always stay neutral, but you have to start there. You have to be at a place where, to the rest of the team, you're not what Hadfield calls a negative one, somebody who's adversely affecting the environment. You're somebody who's basically just trying to get in and understand what's going on. So, let's look at some practical ways of offering to do the dirty work. How do you get in there and get your feet muddy? Hopefully you can read this; I apologize if you can't. I love this cartoon because it's about documentation, right?
My favorite thing, and I'm sure everyone else's favorite thing. I'm sure nobody here has ever worked on an open source project that had too much documentation: "We don't need more documentation. We're done now." There's always documentation that needs to happen. And the interesting thing about documentation is that it can be a very good way for people to become involved. I mentioned that I don't write code for a living anymore; I get to write code for fun now. One of the communities I'm involved in is the SmartThings community, the IoT hub and community from Samsung, if anybody here is using it. I will say that SmartThings, before we acquired them while I was still at Samsung, did a really good job of building a community: hey, here are our APIs, here's the way our development tree works. One of the interesting things is that you can get a sensor for the SmartThings environment that reports something like five different things: temperature, luminosity, motion, relative humidity, and a few others. And a bunch of us engineering geeks said, hey, I want to be able to trigger a light on two conditions. If I walk into a room in daylight, when the luminosity is above a certain level, I don't want the light to turn on. But if I walk into the room, there's motion, and the luminosity is below a certain level, I do want the light to turn on. From an engineering perspective, pretty straightforward, right? This is a rule engine. But SmartThings didn't build a rule engine into their product. They had this notion of applications, like what they call Smart Lighting, which was great but could only trigger on one condition. So the community actually built a rule engine. The challenge was that the guy who built the rule engine wasn't that great at documentation, quite honestly.
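The two-condition trigger described above boils down to a simple predicate. Here's a minimal sketch of that logic; note it's in Python rather than the Groovy the actual rule engine was written in, and the names and the threshold value are made up for illustration, not the real SmartThings API.

```python
# Hypothetical two-condition lighting rule: fire only when BOTH
# conditions hold (motion detected AND the room is dark enough).
LUX_THRESHOLD = 50  # illustrative value: below this, the room counts as "dark"

def should_turn_light_on(motion_detected: bool, luminosity_lux: float) -> bool:
    """Trigger the light only on the combination of both conditions."""
    return motion_detected and luminosity_lux < LUX_THRESHOLD

# Walking into a dark room triggers the light...
assert should_turn_light_on(True, 10) is True
# ...but daylight suppresses it, and no motion means no trigger.
assert should_turn_light_on(True, 300) is False
assert should_turn_light_on(False, 10) is False
```

The point of a rule engine is that users compose predicates like this from the app, instead of a developer hard-coding each single-condition trigger the way Smart Lighting did.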
And a bunch of us came alongside and started working and playing with the code and saying, you know, we really need an FAQ for how this thing works: a set of documentation and a set of examples showing how you write very simple, straightforward rules that you can then extrapolate from. So it was a great way to get in, start doing documentation, and start getting involved in that community. It led me personally to say, oh hey, I've got some code to contribute, and it's kind of interesting because it's written in Groovy. I had a use case where, when I turned on the light manually, I wanted that to override the rule, and there wasn't anything in the rule engine for it. Having done the documentation and been kind of a zero in this community, I understood where I needed to go to actually make my first contribution. The other part of this, obviously, is QA and testing. Again, something we never have too much of in any open source project that I know of. In the SmartThings community, the testing and QA came from those of us who were trying to do all this and finding out that things were broken. And things were broken a lot. But the nice thing is that through the efforts of the community, we've now built a really strong rule engine. And it's something that the SmartThings team has basically said they can't adopt into the SmartThings system per se, but it's something they're now supporting, which they weren't when we started. Another good way to do the dirty work is bug fixing. We all know that fixing bugs is sometimes not fun, especially when the code is really difficult to understand. The other big piece of this is helping triage. Usually somebody who starts a new open source project, and again the SmartThings rule engine author was an example, is very focused on the future direction of what he wants to build, and fixing some of these bugs is not a high priority for him.
So a bunch of us came along and said, okay, this bug is obviously user error, and we'll help that user; this bug is actually something we need to fix. Doing that bug fixing and triage really helped the community out, and it was also a way for people to come on board and start showing their zero-ness, with a path toward plus-one-ness. Then, interestingly enough, again in the SmartThings example, UI and UX was not a big thing. You really had to be an engineer who understood logic to set up all of these conditions. But what we found was that we had users who were somewhat technical but weren't computer science people or engineers, and they wanted to do all the same things we were doing: hey, I want to trigger this light on these two conditions, or maybe these three conditions. And so people within the community came together and said, okay, we know somebody in the community who understands UI and design, and we actually got that person to come in and work on some of the design for how you build rules in this rule engine, because you built the rules in the mobile app. That was really beneficial, and that design person then became a more important part of the community, because they came in and did the dirty work. Some other areas to think about as you're offering to do the dirty work; one of my favorites is answering questions. I can't count the number of times that I and other members of the SmartThings rule engine community sat down and just answered questions. There were a ton of questions in the forum from people who said, hey, this is really great; you guys have built something that the SmartThings team hasn't had the time or inclination to build, but I don't know how to use it, or I'm hitting this bug, or I'm seeing this weird issue. Just answering questions was a huge thing, and it gives you a lot of credibility in the community.
And more importantly, as you're answering these questions, especially if you're also developing code, you start to think: oh, that question is being asked a lot; maybe there's a feature missing there, maybe there's a bug there. So again, it's one of those things that gives you that neutral impact in the community, but it also gives you the ability to start to see how you can become a plus one. And then my favorite, as a community guy: community management and evangelism. Giving talks like this, and talking in other areas of the forums. There's a whole thread on this rule engine, but there are other parts of the SmartThings forum where users are casting about trying to figure out, hey, how do I do this? Community management and evangelism is saying, hey, did you know we've got this whole thing over here that this team of people in the community has developed? Helping to shepherd that has been something I've done because it's a personal passion of mine. The other thing Hadfield talks about a lot in his book is treating everyone with respect. And this is something I found really interesting: he talks about the fact that there were astronauts in the program who were brilliant but never flew on a mission. Why did they never fly? Because they ended up ticking off an executive admin or a trainer, somebody who basically said, this person is not going to be able to exhibit expeditionary behavior, and they're going to be a minus one on any flight they're assigned to. It was a very stark lesson for me: you can spend all that time becoming a plus one in terms of your skills and everything you bring to the table, but if you aren't able to treat everyone with respect, you just never know what's going to torpedo you in any kind of community. A big part of this is professionalism.
And I love the fact that this graphic talks about professionalism being the behavior you bring, the skills you bring, and the knowledge you bring. All three of those things come together to build professionalism. When I was at Samsung, one of the things I would say a lot to our teams was: you never lose points for professionalism in an open source community. Even if there are people in the community who aren't being professional, you never lose points for being professional. Culturally, when I was working at Samsung, we had to say that, because there are certain communities, one in particular, where that is sometimes a challenge. Especially if you're bringing in developers from companies that are very traditionally focused, like Samsung, you have to get them to understand not to take things personally, to understand that there are going to be people in the community who aren't as professional as you are, but that you're never going to lose points for being professional and doing the right thing. A big part of this is the notion of appropriate feedback: are you criticizing the problem, not the person? This is a big challenge in some communities. It's getting better in some, and there are a lot of open source communities that are very, very good at this. Again, from personal experience, the SmartThings community, and especially the rule engine community, struggled at first, and there was a lot of casting about. There were actually two versions of the rule engine; short story, there was a fork for a variety of reasons, and there was a bit of a problem where people were starting to criticize each other, as opposed to the problems we were facing in the platform.
Once we got that off to a different fork and started working with this new version, there's been a lot more of people saying, hey, this is broken, and a lot more recognition that the team that's building this doesn't have to build it. We're doing it because it's a passion of ours, and so the rest of the community was starting to understand: oh yeah, you're not a paid support person for SmartThings. So yeah, the community got a little bit better about criticizing the problem rather than attacking people personally. Some other things in terms of treating everyone with respect. My personal favorite: your reputation as an overinflated ego bag precedes you. I will talk about specific companies, well, I won't talk about specific companies, but this one happens to make my watch and my phone. This was a challenge, and not because the engineers involved were egocentric in any way, but because their management was telling them, you need to go do X, Y, and Z in this community. It happened to be the Blink community. For those familiar with it, when Apple and Google divorced over WebKit, Blink split off and forked from WebKit. And there was an edict given from this company that certain engineers at corporate headquarters were supposed to land 5,000 patches a year in Blink. So they basically started putting in, for lack of a better word, crappy patches. And the Blink team said, no, we're not going to take this. And the universal response that my team got back was, but we're this particular company, which I won't mention, don't they know who we are? Yeah, they know who you are. They don't care, right? All they care about is doing something that's valuable and building something that's really cool. So your ego and company reputation mean nothing to most open source projects. It's something you have to earn.
Again, you have to be that open source zero and earn the right to come in and provide strategic direction. A big part of this is human dynamics, right? I jokingly said to a friend of mine the other day that I should just go back and get a psychology undergraduate degree, because about two thirds of my job is psychology now, not technology. There are challenges in how the human dynamics work in both companies and open source projects, and in figuring out how both of those things make sense. But a big part of that for me is just understanding that people are going to have different opinions. And if you understand how to treat them with respect, and again, criticize the problem, work the problem, and don't worry about trying to win over any particular individual in the community, you're generally going to have a lot more success. Excuse me. So we've talked about all these things; let's try to bring it all together in a couple of interesting points. For me, there's no such thing as altruism, right? Companies aren't doing this out of altruism. Individuals aren't necessarily doing this out of altruism either; yes, some of us are, as a personal passion. But as an engineer, for me, this is about removing friction. This is about being more effective in how we build things. If we're dealing with all of that interpersonal junk instead of trying to work together effectively using some of the things we've just talked about, that is what keeps friction in the system. And so as an engineer, I'm about removing friction. So I always like to say there really isn't any such thing as altruism, even in open source. And especially from a company perspective: companies most of the time get a little PR bump for being a good open source citizen, but they're doing it because there's value. And the value is in actually making code and making projects work better. And okay, I apologize, one joke for the presentation.
None of this is rocket science, guys. Everything we've talked about here you've probably heard in other contexts. I know I've heard it in other contexts. But the thing that's interesting to me is that what Hadfield learned 255 miles up in space and how we work effectively in teams and communities are exactly the same. There's nothing here that I think should come as a surprise to anybody. So it's not rocket science, but it's also not easy. It's not easy to sometimes admit, and I say that as an engineer, it's not easy to sometimes admit I'm wrong. I have a really hard time admitting I'm wrong; ask my girlfriend. But there's value in looking at some of these things and really trying to figure out how you more effectively work at removing this friction and building great projects. So with that, I'm going to do one final thought from Commander Hadfield to close this out, and then we're going to have a lot of time for questions, which actually works out well. If you take nothing else from this presentation, take this slide: proclaiming your plus-oneness at the outset pretty much guarantees that you're going to be a minus one in that community. It doesn't matter what skills you're bringing, and it doesn't matter what you have or how you're actually performing. It's going to be a very detrimental thing for you. So with that, we're done. I'm happy to take questions or have a discussion on this. The reason I came is because there's an article you should look at in the latest Harvard Business Review, it came to me on Facebook through a friend, but it's about precisely how, in the corporate world, people who have tried to climb the ladder of success in management actually end up having bad personality skills for getting work done, because they're stabbing people in the back and promoting themselves.
So the skills it takes to promote yourself in a corporate hierarchy like that tend to be bad for business. There's a woman named Carly, I guess, who, well, I won't go there. Open source projects require a little more humility in terms of management, like you said, reducing friction and getting people to work together. It's kind of like a scientific community. You submit things, people get back to you and say, well, this is a better way to do it. You can't have a gigantic ego and run a scientific experiment, because other people are involved in it. So anyway, you might want to look at that article, because it explains how somebody who's our president actually happened to become president, and he probably doesn't even know how to do the job, but he's up there. Anyway, forget it. No, you're not saying anything I haven't already thought of. What's funny is this talk is actually based on a blog post that I wrote two years ago, before said individual was our president. But yeah, you're right. I think that's the thing that so often gets overlooked: this is, again, not about altruism, not about feel-good, let's all get around the campfire and sing Kumbaya. There are real, valid strategic reasons why you need to think about this as an individual and as people that are part of a team, because the honest truth is that none of us like to work with jerks. At least I don't. Yeah, and if you want to create a piece of software that's actually good, you should take some pride in having contributed to it, in having brought other people into the process, in the fact that the team actually managed to produce it. And it wasn't you, even if you're the leader of the project. You're the ostensible leader of the project; other people built it, and your accomplishment is getting those other people on board to do the project.
So open source is kind of about that shared gratification. Yeah, I mean, as an engineer, I don't know about the rest of the engineers in the room, but as an engineer I feel a lot of pride when something is done, regardless of whether I did the whole thing or whether I was part of the team that did it. Yeah, anyway, the reason I came is because I read that article. And I figured, the topic said open source and zero. I thought that's interesting, because he didn't say hero, he said zero. See, my title worked. The title worked, thank you. Anybody else? Yeah, right here. Since we're recording, we want to make sure we get you. Okay, so in the slide, I think it was a slide before this, you mentioned that this is not motivated by altruism. No, actually the one before that. The one about altruism. That it's not altruism at all, it's about removing friction. So there's no such thing as altruism. I was wondering, what was the purpose of highlighting that point? I haven't been involved in open source communities yet; it's something I'd like to get involved in. So I'm thinking that the motivation behind that might have been something you or others have experienced, or maybe a misunderstanding. I was wondering if you could talk a little bit more about why you put that up there. Yeah, so maybe not so much in 2017, but in previous years there's been this notion from companies: why are we doing open source? Are we doing it for the PR bump, to be good open source citizens? Sure, there's part of that, right? There's some value you gain from that. But the vast majority, and this kind of goes to something that, at this conference I have to be really careful, it's more of a free software conference as opposed to an open source conference, to a degree. The number of people contributing to a lot of these projects who are individual contributors not being paid by companies is fairly small. And that's great.
I love those volunteer contributors. They're awesome. They're the kind of volunteer contributors that are in this SmartThings project I'm talking about. But companies are participating because there's value, right? That's what I'm saying: they're not participating just because they want to be perceived as good citizens. There's business value, there's strategic value for them in participating and having influence on these open source projects, and that has to be balanced so that one or two companies don't take over an open source project. That has happened in the past. I won't mention examples, but several people here probably know which ones I'm talking about. You don't want that either. So that's why I'm saying you have to think about the fact that participating is valuable, but especially for engineers who are being paid by companies, there has to be an ROI associated with it for companies to want to pay developers to contribute to these things. So I saw on your slide that you're a Sun person, and I used to use a lot of Sun software and a lot of Sun products until the evil empire bought it. Just so you know, I was part of reduction in force number two, so I was gone long before the evil empire. I figured. So the question is, and this is something that I still don't quite understand, open source has been around for quite a long time. Richard Stallman has been around for a very, very long time. And yet we still have these empires, right, not contributing, actually stopping contribution, like Oracle did. Where have we failed? Where has open source failed? What is the weakness? I would not say open source has failed. I would say companies have failed to understand the strategic value. Yeah, but Oracle is doing very well, and all the other empires are doing very well. I was hoping they would be dead by now.
No, so this actually goes to an interesting point. Somebody asked me a question about the balance of proprietary versus open source software; it was for an engineering.com article I interviewed for, a piece on 3D printing at Autodesk. And the author of the article kind of tried to make the point of, especially in 3D printing and a lot of these areas, where is the value in open source? Aren't we losing all of this IP? And I said, no, there's a place for both open source software and proprietary software. By the way, don't shoot me, I know I'm at a free software conference. There's a place for proprietary software and for open source software. So for companies like Oracle, you know what, businesses are willing to buy that. But then you look at a company like Red Hat. Red Hat is the counterexample: businesses are willing to buy things from Red Hat, but Red Hat does a ton of contributions into the upstream. So I think it's just a question of the evolution of companies and where they started. Look at companies built in the dot-com era: Facebook, Google, Twitter. Those companies have all contributed a lot to open source. Then there are the kinds of companies you mentioned, right? Sun was in that era where, right before I left, they were just starting to figure out what to do with open source, and my colleague Danese Cooper, who spoke earlier, was their lead person for the open source program office, before Simon Phipps had that role. So Sun kind of got it at the end, but it was a little too late. I figured you would have a comment. I didn't put a ringer in the audience, I promise, but I know this guy. Hey, thanks for the talk. So much of this seems to be about culture, about passing culture on. Are you seeing any of the tools that we're using start to try to build this in?
If I go to GitHub, it's pretty easy to file a bug or add code, but even then that's still kind of difficult. It still requires a lot of knowledge. So you're giving me the lead-in to my favorite line: culture first, tools last. And I say that somewhat facetiously, right? Because you can't always do that. But this is what I saw continually in my career as somebody who worked for tools vendors, and I left one of them off my slide, which was CollabNet. You have to think about your existing culture and how you do a better job of moving it toward all of these things we talked about in open source, and then figure out which tools make sense for that. To give you a very concrete example: at Autodesk, before I got there, there was a mandate that came down from the CTO and the CEO that our code move to GitHub Enterprise. Now, for those of you who may not know, the AutoCAD code base is 30-plus years old, and it's, I'm not telling you anything that isn't public knowledge, multiple hundreds of gigabytes in a TFS (Team Foundation Server) repository. That team basically said, no, we're not moving to Git, because of Git LFS not being ready and some other things. And there was a ton of pushback, right? We're still fighting it. There was a lot of pushback along the lines of, yeah, this is this newfangled tool, but it's crap and we're not going to do this. Whereas if we had said, okay, culturally we know this set of projects is not going to be ready for a tool to help us change these cultural behaviors, so we'll focus over here on projects that are more greenfield, newer, and ready to accept this, we could have had a little bit more success. And by the way, our CTO knows this. I'm not telling you anything I haven't told him a million times, right? It's something I think we learned a lesson from. So to your point: yeah, there are tools, GitHub and some of these other tools we have now, that help us reinforce these cultural norms.
But you can't just bang your fist on the table and say, we're going to use this tool because it's the magic silver bullet that fixes this problem. You actually have to think about these cultural things as they apply to your company, and figure out how you move the needle on those before you bring in a tool to try to solve the problem. Okay, anybody else, or do you guys get 10 minutes back in your day? Yeah. Where are we heading? What do you see as the future of open source and this community? Wow, that's not a broad question at all. You know, I think we're headed for more of the same, I hope, and hopefully to a point where, and this is my ultimate goal, I say this all the time, my ultimate goal is to work myself out of a job. Honestly, if we get to a point where you don't have to have an open source program office in your company, because your company inherently understands how to do this and how to work effectively with open source communities, that would be awesome. I'd have to go find something else to do, but that would be awesome. I still think we're a ways off from that. We jokingly said, myself and the other people giving talks, and Danese Cooper, who gave a talk earlier: if you actually sat down and tried to calculate this, and I did, there are about 40 of us on the planet who do what I do. And there are tons of companies opening open source program offices and trying to understand how to do this. Job security for us, which is awesome, but we need to grow more people like me. And it's really difficult to recruit them out of college to do this, because like I said, when I got out of college, I thought, I'm going to be an engineer for the rest of my career. It was only through experience in industry that I figured out I'm actually better at the human dynamics piece than the engineering piece, though having an engineering background is helpful.
So I think we've got a bit of a dearth of talent there that we've got to address to be able to increase the kinds of things we're talking about here. I guess you were talking about culture first, tools last, and being a good participant in something that already exists. But I'm wondering if you've seen cases where, from your perspective, maybe your company or you are doing all the right things and you really like the thing you're working on, but you just can't deal with the culture that's there. How did you reconcile that? Because you're talking about being a good citizen and doing all the right things, but what if the thing that you like is being run by a group of people that don't necessarily get it? You have two choices: vote with your feet and leave, or the ultimate poison pill, fork the project. It happens. I generally suggest that you first try to see if you can garner enough support among the people who believe, like you, that the thing isn't running the way it should and ought to change direction. But sometimes you just have to recognize that you either have to leave and not be involved anymore, or leave and fork and do something else. And that happens all the time, but I think it's not necessarily a bad thing. It is and it isn't, right? If you fork something and it becomes successful, and a significant portion of the community follows you, then clearly there were more people who saw the problem than were willing to admit it. But if nothing else, you get to work on something that you think is interesting, and you have a clear conscience about it. Yeah. Can I add just one more piece to that? Sure. So forking also gives you an additional opportunity to create something new or to focus on something new, and there I'm specifically thinking of the BSD projects.
You know, they forked for various reasons, but the forks also gave them the license to be more creative and define a new niche for their products. More of a comment, I guess: some of the guidelines that you have put up there sound like pretty good guidelines for individual contributors, too. This whole philosophy of being a zero; sometimes I try to contribute to open source projects which I really like, and that's one of the things that can become a hurdle: I want to contribute something really meaningful, and I really don't want to do the zero thing first. I think this sounds like really good advice for that individual contributor. Keep in mind that most open source projects that are successful will want small patches and small changes. So if you have something huge that you think is going to be really valuable, I advise two things. Be open and transparent from the beginning: here's what I'm thinking of doing. That makes sure that the rest of the community, and especially the maintainers, aren't planning something that's going to go counter to what you're trying to do. You've signaled your intent that you're going to do this. And even if you're not at that plus-one stage yet, you can say, okay, let's say you think the thing is going to take 15 patches to make the feature happen, I'm picking a number, you can say: hey, I'm going to do these first five patches, I'm going to build the basis of this, so all of you in the community can understand what I'm trying to do. As opposed to saying: here's 10,000 lines of code, boom, here's a feature, integrate the whole thing. That's just good software hygiene in any project, but especially in open source projects, where you want smaller changes that are easier to test and check for integration issues.
Okay, anybody else? We're almost done; we're ending like four minutes early. Awesome. Thank you guys for coming. I appreciate it. Welcome everyone. Our next speaker is Federico Lucifredi, and he's going to be speaking with us today about Hardware Hacking 101: There's Plenty of Room at the Bottom. Federico comes to us from Red Hat, where he works on storage. Thank you. Yes, as the track chair was saying, I work at Red Hat. I work on Ceph storage. In fact, I've had the privilege of spending nearly my entire career in free software. I work on Ceph at Red Hat; I was the product manager for Ubuntu Server 14.04, and before that for more Linux distributions than you can count. None of that has to do with this talk. Well, I guess it's storage, in a way. But when you're a manager in your day job and you used to be an embedded developer, this is what counts as fun. So, obligatory disclaimer: most likely we'll break some hardware doing this, and it will all come out of your pocket. If we bring about the end of the world or something of the sort, it's your fault. Most likely all that will happen, however, is that you'll just break some device, and in this particular talk recovery from breakage is quite easy, so we're fairly safe. But you've been warned. Now, we have about 60 slides and 60 minutes, so we'll try to hustle a bit without melting anybody's brain. For those of you that haven't read the abstract carefully enough: the idea is that there is a processor in some SD cards out there, an ARM processor specifically, and we want to get at it to run Linux on the SD card. Not as a file system: using the SD card as a computer. This first surfaced on somebody's blog, this is a while ago now, and from there it pretty much went viral, with what you would call sort of a word-of-web process, blogs blooming all over the place once it was clear that there was Linux in some of these SD cards. Lots of blogs out there.
SD card sized: effectively, the computer looks like a Raspberry Pi. So can we do something else with it? The community jumped straight on it, and the big question was: can we do something more interesting than what these cards are meant for, which is copying pictures from your camera to your laptop transparently? I noticed that this was going on and I wanted a project for Christmas, and this was two years ago now, so it has taken me on quite a ride. Let me repeat that one more time, because every time I give this talk there is somebody that's lost about what's going on. Here is the concept: in there, there is a processor. We are going to run Linux inside of the card, not just use the card. The cards are Wi-Fi enabled, so your Linux system is also wireless. These cards are meant to automatically download your pictures, and there are multiple models. This is not the most popular model in the United States, but the majority of the non-US models all derive from the same original hardware design. It is like the Edison in the original SD-card format that Intel promised us but never delivered; so it's for real this time. A lot of these blogs are not in English, so it's been quite an international effort. During this time I had to read stuff in French, Japanese, German, Korean, and of course English. Google Translate is your best friend here, and there is copious hilarity as you run into the limits of automated translation, but it's usually clear enough. So the card that I'm looking at today is the Wi-Fi SD from Transcend. This comes in sizes from 8 to 32 GB. I prefer this particular model, as they are more open than other cards, which I won't name, that are very popular in the US. There is no attempt at a platform strategy: no evil MBA trying to create a platform play out of it by adding software you don't want in the middle. Or maybe they were just trying to make it easier; it doesn't have to be a nefarious reason. But the Transcend cards are very straightforward.
They are selling you the hardware; the software is there to make it so that you buy the hardware. It's not trying to be the other way around, where the hardware exists to sell you a software solution. So the lack of these stepping stones sitting between the card and you makes them much more interesting embedded devices. The software on the Transcend card is simple and it's not perfect, and that's actually an asset, as we will see. So where do we get these cards? You could go to Amazon; you could also go to eBay. It's going to be between $40 and $60 depending on the capacity, about the price of a Raspberry Pi. There are others besides the Transcend, like the PQI Air Card, that are functionally the same. The PQI Air Card is interesting because it's actually an adapter card, so you can put any storage you want in the microSD format in there, and the adapter itself holds the embedded system. Toshiba makes another one, and I think there is another one called FluCard. And you can carry binaries between them. As a matter of fact, when we log in to the system, you may notice that there is a command that says BUZZ. But there is no piezoelectric transducer in this card, so what is that for? Then if you look at another card, you find that one of these cards does have a piezoelectric transducer. So the board support package for all these cards is the same, and you can carry binaries between them, not that you would, because they pretty much have the same stuff. But it's an interesting thing, because you can discover properties of one of the cards and figure out if they apply to the others, and they usually do. In any case, I'm getting derailed here. Back to the point: this is what the postman will deliver. Inside the package, and there is a bit of an unboxing here since we cannot do it live later, there is a card, an adapter, the inevitable legal disclaimer, if there are still some lawyers around from the track yesterday you can blame them, and a microscopic manual, which is not bad.
The adapter is the interesting part. The SD card, aside from the fact that it has Wi-Fi in it, externally and functionally is just an SD card. The adapter is there because you can write to your SD card directly by mounting it in your Mac, for example, or over the radio. When you have a file system mounted in two places and you're writing to it from both at the same time, bad things happen. So the adapter is there so that when you want to physically insert the card somewhere, the adapter tells the card to switch off the radio, so that writes come from only one place. Of course, we are going to get around that as the first thing we do, but we think about what we're doing before doing it. In general, you should be using the adapter. We get around it because we want to use the card as a standalone device, so we assemble things so that the radio is not turned off, because the radio is the only thing we want to use. This is an SDHC class 10 card, 16 gigs in the case of the ones we're using; half and double that size exist. This is a full-size SD card. The pictured model is TS16GWSDHC10, but you will have the slides at the end, so if you want to search for the cards you have all the model numbers. So, physically breaking into the device voids the warranty. I bet you did not see that one coming. X-Acto-knifing our way to success, we get this. On the left side is the case, nothing too strange there, but it may be helpful to guide your own dissection if you need to open yours. On the right is the card, of course. That little yellow bit in the middle is the write-protect toggle that SD cards have. And there is one interesting technical fact here: there is nothing on the card that reads the state of that toggle. The reason for this is that SD cards work like floppy drives. On floppy drives, for those of you that still remember those things, write protect is enforced by the drive, not by the disk. SD cards are exactly the same. They have no idea what's going on.
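That host-enforced write protect is visible from the Linux side: the reader senses the switch position and the kernel records the result in the block layer's read-only attribute. Here is a minimal sketch, assuming a Linux host; the device name `mmcblk0` is a typical example, and the helper function is my own illustration, not something from the talk.

```python
from pathlib import Path

def sdcard_is_write_protected(device: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True if the *host* reports the block device read-only.

    The SD card itself never sees the toggle: the reader senses the
    switch and the kernel exposes the result in the block layer's
    'ro' attribute, exactly like floppy drives enforced write protect
    in the drive rather than on the disk.
    """
    ro_flag = Path(sysfs_root) / device / "ro"
    return ro_flag.read_text().strip() == "1"
```

If a write bypasses the host-side check (say, over the radio, with the adapter out of the picture), the card happily accepts it, which is exactly the point the talk is making.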
If a write comes in, they take it. But the drive is not going to write to the card if the toggle is in the protected position. The card doesn't know and doesn't care; it doesn't have to. So let's look at the components inside, from the top side. There is a trace antenna. That's definitely not a 3 dBi antenna, and it's not 100 milliwatts of broadcast power, because the card as a whole consumes less than 100 milliwatts. So we know we're not going to get full-spec Wi-Fi. However, that's okay, because we're still beating the pants off of Bluetooth. There is no antenna connector. You could try to solder one on if that's your thing; if you don't, you're not going to get full Wi-Fi range, and I think that's entirely fine. The SoC with the ARM CPU we are so interested in is right below the radio. The radio is on the top left, right next to the antenna, which kind of makes sense, and the SoC we're interested in is right underneath it. The 16 gigabytes of flash are in the big chip, and the flash controller is the last thing on the right. Up top you can see the notch that the write-protect toggle slides in, but as I said before, the card doesn't care. It's just making room for the toggle to exist; there is no sensing of any kind. Besides the standard SD pads, some of the smaller pads are TX and RX for a 3.3-volt, 38400-baud serial console. Annoyingly, really annoyingly, they are not labeled. They labeled every single resistor, which we couldn't care less about, but the pads for the serial, which are the useful ones, are not labeled. On the PQI Air Card, the pads are labeled, so bonus points to those guys for that. If you do wire up serial, this is actually interesting: you get access to a full U-Boot console. U-Boot is the boot loader of choice for embedded devices. And because the U-Boot console on this card is actually configured better than the kernel is, it's quite nice to have access to it.
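As a rough sketch of what wiring up those pads buys you from a host machine: the electrical parameters below (3.3 V logic, 38400 baud) come from the talk; everything else, the port name, the 8N1 framing, and the pyserial dependency, is an assumption of mine rather than anything the speaker specified.

```python
# Talking to the card's serial pads from a host. Use a 3.3 V
# USB-serial adapter (NOT 5 V, which could damage the card).
SERIAL_SETTINGS = {
    "port": "/dev/ttyUSB0",  # hypothetical adapter device node
    "baudrate": 38400,       # rate quoted in the talk
    "bytesize": 8,           # 8N1 framing is an assumption
    "parity": "N",
    "stopbits": 1,
    "timeout": 2,
}

def open_uboot_console(settings=SERIAL_SETTINGS):
    """Open the serial link; a keypress during autoboot usually
    drops you to the U-Boot prompt on boards configured this way."""
    import serial  # pyserial, assumed installed: pip install pyserial
    link = serial.Serial(**settings)
    link.write(b"\n")
    return link
```

From the U-Boot prompt you can then poke at the hardware directly, which, as noted above, is especially handy here because the console is configured better than the stripped-down kernel.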
U-Boot was powerful back then, and it is even more so these days. So you can do a lot of things to the hardware straight from the boot loader. And because the kernel has been stripped down to the bone, due to constraints that you will see in a second, having the boot loader as a tool is quite cool. So, incidentally, I've been collecting all these details for a survey article, so that people can eventually get all the details from one single point. It's been in wait-state for a while, but I'm hoping to get it finished this summer. If you are going to hold your hacking until then, you'll get all the information from my article. If not, and you go hacking and publish something new, please write to me and send me the address of your blog so I can add it to the survey. Let's see what's in there. As we said, there are four chips: CPU, Wi-Fi module, flash controller and storage. The SoC's 32 megabytes of RAM are the weakest spot of the board. For those of you used to gigabytes of RAM, fear not: for an embedded device, 32 megabytes of RAM are plenty. Plus, for those of you that remember RHEL 6, this is the same kernel as RHEL 6, 2.6.32. You did a lot of things with that kernel. It probably had more than 32 megs of RAM available to it back then, but it's a lightweight enough thing. RAM is not going to be the problem, because you're not driving any UI; you're just driving some very simple programs. And I've worked on embedded systems that had orders of magnitude less. No problem there, but surely a difference from a Raspberry Pi, where you have lots of RAM. You do have lots of storage, and it's flash storage, so it's the perfect use case for swap, if the kernel were built with the swap option. In the SoC, there are eight megabytes of NOR flash, and now you'll understand why the kernel is so stripped down: these eight megabytes of NOR flash are where the entire system lives. They stripped down the kernel and compressed it, with an initramfs.
They had to make lots of compromises, removing things that I had not seen removed before, options that usually one doesn't even think of taking out. But they managed to fit everything in eight megabytes. Now, you could get clever and do this: the system boots and brings up Linux, and once you have that, you have the 16-gig storage over there. So you could have a file that is an image of the system you really want to run, not the piddly eight-megabyte system, the real system. You loop-back mount that, you chroot into it, and you have a better system. However, I'm sorry to say, there is no loop mount in this kernel's configuration. You can see the pattern there. But there is enough slack space that we can throw away some other options and add back the ones we need, so that is fine. The extreme of what was stripped is that the kernel config isn't even there, so you can't see how the kernel was configured. They stripped that, too, to save space. So a member of the community, Dmitry Grinberg if I remember correctly, worked from the shipped kernel binary and recompiled, tweaking options until his build matched it. So now we know what the options to rebuild the kernel are, and he wrote a very nice article on how to do this. We're going to be able to build kernels with the options that we want. Now, what is it we're getting for all this effort? The processor is a 200 MHz CPU. There are some articles that report 400 MHz, and they are wrong: they are confused by the fact that the card reports 426 BogoMIPS at startup, which is a different thing. It's an ARM926EJ-S core. This is an ARM9 processor, not a Cortex, not an A9; an ARM9, a much older thing. About five billion of these have been built over the years, according to ARM. This is a very low-end, very low-cost processor.
It has interesting extensions for Java, that's the J in the name of the processor, but the Jazelle runtime is not part of the BSP, and you would need to sign an NDA with ARM to get it. So forget Java; we have something better than Java here, so worry not. So, J is Jazelle for Java, and E is the enhanced DSP instructions. This processor is old enough that it was designed to be a processor for feature phones, and those instructions are for DSP processing of voice. Probably not very interesting nowadays. The Java extension is cute in a technical sense: it allows the processor to run bytecode natively. It actually has an instruction that jumps into Java bytecode, and it can execute, I think, something like 130 of the 200-odd bytecodes in the JVM specification. It can execute the bytecodes; the rest trap and are done in software. Pretty cool if you think this is roughly 2005 technology. But let's leave back then and look at what matters now. This is actually a very similar processor to the BCM2835, the Broadcom chip that's in the Raspberry Pi. That one is an ARM11, but it is quite close; it's an ARMv6K, I believe. But most distributions don't compile for ARMv6; they compile for ARMv5, so effectively you would get the same distribution for both. Unless it's a specific Raspberry Pi distribution that targets that hardware exactly and goes for ARMv6, a distribution for this class of hardware tends to be ARMv5. So the processor itself is a close cousin of the Raspberry Pi's processor, and the idea is that instead of the chip in the Pi, you can use the chip in this card. You have some different trade-offs: you get a built-in radio, you lose all the interfaces, and the price stays about the same. The radio is a separate chip; the branding on it doesn't matter at all. It comes with an Atheros AR6003 Wi-Fi core, which speaks 802.11 b/g/n, which is interesting.
We don't need to do anything for it; it just works. It has a really low power draw; it's marketed as the smallest Wi-Fi silicon you can get. And if you feel like going turtles all the way down: incidentally, there is another CPU in this chip, a MIPS core used to control the baseband radio. I haven't touched it. I think it has 256K of RAM and 256K of ROM. Maybe that's your next hacking project, I don't know. It's a 32-bit MIPS CPU, anyway. The flip side of being very low power is that it's also very low power in terms of throughput: do not plan to saturate anything with the links coming out of this. Pretty much, it won't go much beyond one megabyte per second. But that can still be adequate for many uses. Storage: 16 gigs of flash, so you will not be wanting for storage space on this device, that's for sure. That is not the problem this system has. And then there is a flash controller made by Silicon Motion. Nothing of interest to us here, except, since this is a community conference, we can talk about the fact that the community got upset at the manufacturer, because the initial firmware releases did not include the source code for the kernel modules. Sources were requested, angry emails were exchanged, and then things got sorted out: the manufacturer very nicely published the sources of their modules. The only sources we're still missing are the ones for this piece of hardware, the flash controller. Usually flash controllers are protected by strict IP agreements, similar to what we see around video cards, so that's probably the reason. It doesn't really matter; that's not the chip we're interested in. So that one is a proprietary kernel module; the others are all standard kernel modules, though they probably are not in-tree. Let's look at what is going on here. The board, or rather the SD card, clocks in between 0.9 and 1.1 watts when it's idle in my office.
It peaks at 1.5 watts when wireless clients are connected. This is measured at the 110-volt wall, of course, through an Apple switched-mode power supply, so it is efficient, but there is still a conversion factor in there. Presumably you would get higher efficiency if you were supplying power without the need for a voltage change. Presumably, also, you would get higher consumption if you tried to peg the CPU and max out writes to flash all at the same time while flooding the radio. I didn't do any of that; I tried to be realistic about how the card would actually be used, and these are the readings I got. Interestingly, I checked whether I would get better efficiency by supplying power at the target voltage directly: 220 milliamps at 5 volts is about 1.1 watts, the same as the measurement at 120 volts at idle. In other words, Apple really knows how to design switched-mode power supplies; the iPhone power supply is really efficient. Let's start looking at the software side. We're going to do a few things. First I'm going to take a breath. Then I'm going to walk you through what you do to set up the card. And then we're going to look at how we hack it, so that when I do the hacking, you know how the system works; it's not hanging out in the void, unclear how the card was set up in the first place. So this is how it works. First you set up an app on your phone. That gets you access to basic setup and core functionality: setting up network mode, names, passwords. Shoot-and-view is in that menu, too: it's a mode in which the phone operates as a monitor for anything being shot by your camera. You take a picture, and it shows up on your phone immediately. This is cute if you're a photographer, but the reason I'm pointing it out is that there are some technical provisions in the design of the card that you'll need to remember this to understand. It will be important later. The app will also offer to upgrade the firmware.
Usually this is a big concern, because when you upgrade the firmware, the precious security hole that you were going to use for rooting disappears. But not so in this case: it does not matter what version of the firmware you have, they are all equally easy to get around, so do not worry about that. The latest version is as open as the first. I suppose you could think of that as a good or a bad thing depending on your perspective; I say it's good, because it lets us hack the card. Really, the security issues this card brings up do not come from the fact that we can hack the firmware. By default, the system wants to be its own access point. That is great, and it's very convenient in development, because you just connect your laptop to the card and you're done; very easy to develop with. But this is not convenient for photo download, for example, where you would not want to switch connections all the time to get the downloads, or for your Internet of Things application, or any realistic application where you want the card to connect to another access point. So the fact that this card has a seamless client mode is a great strength, the greatest strength of this card, because others put their own software in the middle. Here, you just configure that you want client mode, and you get it. It makes things really simple. So when the card comes up, it shows up; here it is from another machine in the room: your SD card has now become its own access point. Then you connect to it, and then you can ping it, just like any other host. Let's wrap our heads around how it works before we get into it. Think about it: the app needs to find where the heck the card is. It just knows that it's out there on the access point, but at what IP address? So the card broadcasts on UDP port 55777, saying I'm here, I'm here, I'm here, until the application hears it and talks back. Other cards are somewhat more aggressive: there is one that ARPs the entire /24 subnet.
Then, once it has found every node that's on the net, it sends HTTP requests to every single one of them, and keeps doing it until it gets an answer from somebody. Then, hopefully, it stops, most of the time. You can fix this, no worries. But it's interesting to document, because it makes it easier for you to find the card when you're starting out. Once you've found it the first time, you're done: you can use the tools that come with the card and not have to find it again, or you can configure it to be in some predefined place, which in some cases will work. If you don't want to use the tools, the card comes with a default setup of user admin and the password 12345678. That's not even the part that is necessarily funny. If we port scan it, we see what's going on in here. This one has telnet open, because we already cracked it. The other port, port 80, is a web server. The architecture of the application is very simple: the app gets pictures from the web server, and it can also trigger actions by executing CGI scripts that live in another directory. The additional port is connected to the shoot-and-view mode I was describing earlier. When you're in shoot-and-view mode, if the app connects to that port, it gets pushed the URL of every new picture as it gets taken, so it knows what pictures to download. That's how the shoot-and-view system works. The architecture is very simple; not bad. A few quirks, like the fact that it downloads every picture twice: once to give you a thumbnail, and once to actually show you the picture. But there isn't really anything else going on; probably they preferred not to generate thumbnails on the card. It also assumes that you're not downloading two pictures at once, which you are not, so that's fine. It's not overly complicated, and that's nice: it's simple and effective. When there is no need for more, simplicity prevails. You can actually browse the card's HTTP server directly, if you know the paths.
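To make the discovery part concrete: here is a minimal sketch of what the vendor app must be doing to hear the card's beacon. The port number (55777) comes from the talk; everything else, including the function name and the assumption that the payload can simply be returned raw, is my own illustration, not the card's documented protocol.

```python
import socket

def wait_for_beacon(port=55777, timeout=5.0):
    """Listen for the card's "I'm here" UDP broadcast.

    Returns (payload, sender_address) or None on timeout. The sender
    address is what you really want: it's the card's IP. We don't try
    to parse the payload, since its format belongs to the vendor app.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.settimeout(timeout)
    sock.bind(("", port))  # bind to all interfaces so broadcasts arrive
    try:
        payload, sender = sock.recvfrom(1024)
        return payload, sender
    except socket.timeout:
        return None
    finally:
        sock.close()
```

Once this returns, you can talk to the card at the sender address like any other host, exactly as in the ping demo above.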
If you browse it directly, you get the Boa web server, and it exposes directory listings. In theory, it should only show you the directory listing of the data, not of its secret bits. There are CGI scripts in cgi-bin to set configuration, and they are in Perl. The app simply downloads files when it wants the pictures and calls the CGI scripts to configure the system, and that's how you get in. So this is the way the card was first broken. Another community member, called Pablo, thought, I believe around 2013, that there must be a CPU in these Wi-Fi cards, and obviously it turns out he was right. He went out and started poking at the software, looking for security holes, and found several the size of a barn. Like the fact that you can look at the web server, as I was doing a second ago, and visit interesting paths like dot-dot-slash, dot-dot-slash. These path-traversal tricks allowed him to access parts of the system he shouldn't see; he finds the cgi-bin scripts, finds weaknesses in the Perl CGI scripts, and breaks in the old-fashioned way, with an exploit against those: extracts the files, exploits the scripts. The first time, I went this way myself, but there is a better way. There is a file called autorun.sh. I think you know where this is going. You write a file to the SD card named autorun.sh, and the system will read it and execute it with root privileges. So what you really do is write a short shell script that starts telnetd, and telnet access arrives, no password required. That's so much nicer than poking at the Perl flaws and all that. Now, the security freaks among you are obviously horrified, but in terms of actually working with the card, this is fabulous. It makes things so nice. There is another mechanism, which is humorously named; I know what you're thinking, but that's not it, it stands for firmware update. It's a clever mechanism: you supply the update script and three files, an initrd, a kernel, and program.bin. The system loads program.bin into memory and runs it.
It turns out that program.bin is actually a copy of the card's U-Boot loader with a default script, and that default script loads the new kernel and the new initrd, and when it's done loading them, restarts the card, and you have the card back. So it's a very nice protection against bricking: you can come back from the dead if you did something very bad to the card in software. In hardware, obviously, you can still break it, but in software you can come back very easily. It's also a clever way to go back to defaults: you don't need to have bricked it, you can just bring it back to the manufacturing state for whatever reason you want. So let's try to do this live. I would usually unbox one of these, but in a room of this size that makes no sense, so I showed you the pictures instead. An interesting detail is that the contraption that brings power to the thing is actually bigger than the system. I had to find an adapter that would not interfere with the radio, and after going to my local computer store, buying every single adapter, and returning nine out of ten of them, I found this one, which I will also document. This actually boots very fast. So much so that if I were connected to the serial port, I would not be able to attach a terminal to it fast enough; it would already be up. The first thing I need to do is connect to the Wi-Fi. Now I need to wait for... well, I got an IP address. And there is the kernel: 2.6.32, the most popular Linux kernel of that era. RHEL 6 used it, SUSE Linux Enterprise 11 SP1 used it, and another one I can't remember used it. A very well-known kernel. BusyBox is there, but it's a stripped-down version of BusyBox, which is annoying. So one of the first things to do is replace that version of BusyBox with one that actually has all its applets. You can see the big 16-gig partition where the storage lives, and you can see that I have the BusyBox that I want, not the one the system shipped, sitting in there, so I can go grab it.
We don't have the time to do all of that, but basically that's what you do: you just mount it. I want to show you the system the way it is, not the way it could be. Once you have a real BusyBox, you get things like NTP, and that means it's not 2010 anymore on the card, and that means you can actually do things like SSH. So BusyBox first, then you get SSH, because the time starts making sense and your files are not in the future anymore. The first thing to notice here is the mount for the 8 megs: there is the root file system, and it's JFFS2. That's a very popular file system for embedded, or at least it used to be; I'm not sure whether it still is. But this is a processor of the last decade, so it was back then. And there is Perl, and not even a very old Perl. Pretty cool: that saves me from cross-compiling, and I can just write whatever I want in Perl. If you are unhappy with Perl and you're a Python person, you can go through the pain of installing that. But if you want to be lazy, you write a little bit of Perl; you won't be able to read it after you wrote it, but it's okay, and it still beats cross-compiling. The important things running are telnetd, which we know everything about, because we started it; Boa, the web server that exposes the directory listings we were seeing; and, if I scroll back, udpsvd, two of them, which are the ones that take care of the beaconing over the network. And DNS-SD, an Avahi-style advertisement. That's interesting; I didn't remember it was doing that. I had actually seen that discovery when I connected to it over the radio. Wake on Wi-Fi, which is kind of cool if you have the kind of application where being off most of the time is nice; and you can also turn the radio off altogether. The question was, can you manage the radio from software? Yes. The write-protect toggle, remember, is only honored by the host's adapter; it doesn't shut anything off on the card itself. What else do we have? How am I doing with time? 13 minutes.
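Since we have a moment: the autorun.sh trick from earlier is short enough to show in full. This is a sketch, not the vendor's documented interface; it assumes the card's init runs a file named autorun.sh from the card as root (as the community discovered) and that the stripped BusyBox includes telnetd. The helper function and the mount-point argument are my own framing for illustration.

```python
from pathlib import Path

# Contents of the autorun.sh we drop onto the card. The card's init
# scripts execute this file as root at boot. busybox telnetd with
# "-l /bin/sh" hands out a root shell: no login, no password.
AUTORUN = """#!/bin/sh
# Executed as root by the card's init at boot.
telnetd -l /bin/sh
"""

def install_autorun(card_mount):
    """Write autorun.sh to the card's mount point (e.g. wherever your
    desktop mounted the VFAT partition) and return its path."""
    path = Path(card_mount) / "autorun.sh"
    path.write_text(AUTORUN)
    return path
```

After writing the file, eject the card cleanly, power-cycle it, and telnet to the card's IP: that is the whole break-in.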
I don't think there is anything too interesting in cpuinfo, except that it says 426 BogoMIPS, and that I did remember the architecture correctly: ARMv5TEJ, really one of the most common ARM architectures of the last decade. So that's easy. It also means that if you don't know how to cross-compile, or don't want to, you can steal binaries from other devices of the same architecture, like certain network hard drives of about seven years ago, or routers, or a lot of other devices. Recompiling and replacing the kernel will be necessary, because we need swap and loop in here; those are really strange things to remove from a kernel, so you get an idea of how strapped for space they were. I suppose it's kind of strange to look at the network interfaces of an SD card, but there they are, and there is nothing really strange about them. So, remember, the flash partition is under /mnt/sd. You're SSH'd in there; you can do things to /mnt/sd, and it's your big storage. When you're doing things there, if something else is manipulating that storage at the same time, say, instead of having it connected this way, I have it mounted on the Mac, and the Mac is indexing it or whatever, that is bad. So think about this. You can repair it very easily, you just remake the file system, but that is also a very easy way to lose everything on your VFAT partition. Arbitrate between accessing the card from its own Linux and from any external host, to avoid corruption. What do you have in software? Well, a few things. Kernel 2.6.32.28, greatest kernel ever; well, not anymore, but back then. Three modules, which we discussed already. BusyBox 1.18.5, with most of the applets missing, but the telnet applet is there, which is what we use to break in. FTP is there, and dhcpd. It also has DNS, because, remember, this card has to work as an access point: when it's sitting there as a proper citizen of the network, advertising itself as an access point, it's actually handing out IP addresses to whatever connects to it.
So it has, actually, not DNS, but DHCP. This is version 1.7 of the firmware, which I think is the latest I've seen. We already discussed the modules. So it's the kernel, with no swap, no loopback, no nothing. Going back to what I was saying: Dmitry Grinberg documented how to replace that kernel, and I'll package that up when I finish my survey article. The strategy to use this as a better system than it is today: we would create, in the large partition, an image as a single file; we would loop-mount that, chroot into it, and that would become our main system. Someone has taken this to the extreme: they downloaded an Ubuntu 9.something image compiled for the right architecture, did exactly what I described, put the image in there, loop-mounted it, chrooted into it, then forwarded X, started Firefox, and then complained about the speed. It's quite amazing that they got that far. Not very interesting to run Ubuntu from 2009, but it's nice that you can get that far. And we have a replacement for the kernel config, since the original config doesn't ship with the image. As I said, we have Perl, so we have everything we need. You can bring in your own language, because you can. But if you want to be lazy, you have a perfect way to build things without cross-compiling and without a package manager. There are versions of ipkg for this platform; you could bring one in and have the joy of packages, but why? Don't make things complicated if you don't have to. Now, the big question for embedded devices of this class is: how do you find them? You cannot use a device like this if you don't know where it is on the network. There are three approaches that are common for this. One is known config: it's always at the same IP address, so just go there. That's what I did right now, because when it is in access point mode, it's always at the same IP address.
Then there is what the card does natively: broadcast; it advertises itself. Besides the horror that I described earlier, there are proper advertisement protocols, like mDNS; multicast DNS is a perfectly fine way to discover things. The third one is announce: the system itself, once it has a valid network configuration, sends out an announcement. My favorite is sending an XMPP announcement, but you can use any protocol you want. Note this is a unicast announcement, not a broadcast one: it goes to a known endpoint. I'm not going to go into all of these; there is an article I wrote in Linux Journal years ago, just search for my name. It has scripts, and it documents how to do all of this on this architecture; you can just grab the scripts from that Linux Journal article. You could also do it right, and the article shows how to do that too. Right, in the standards sense, would be to send a DNS update and say: contact me by my DNS name, and I'll keep the record updated. In Perl, you can write a pretty simple DNS update script to do this kind of updating, and my article includes, I think, a 10-15 line script that works on this platform to do just that. So let's go back to the security aspect. We were joking about how easy it is to break in, but realistically, this is physical access: we have the card in hand, and we're doing things to the card. Sure, we could also do it without touching the card, over Wi-Fi, but this is not really the kind of threat that is interesting with this card. What is interesting is that this brings the attack vector of the lost USB key to a new level. If you read spy novels, and I think Tom Clancy was even using this recently, an old trick to penetrate the network of someone who's a target of industrial espionage, or worse, is to lose USB keys in their parking lot. Obviously, someone will find them and decide to look at what's on them.
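Stepping back to the announce approach for a second, here is the shape of it in a few lines. The talk's own version uses XMPP (and Perl, in the Linux Journal article); this sketch swaps in a bare UDP datagram with a small JSON payload, which is my invention purely to show the idea: a one-shot unicast "here I am" to a known endpoint.

```python
import json
import socket

def announce(endpoint_host, endpoint_port):
    """Send a one-shot unicast announcement to a known endpoint.

    Connecting the UDP socket first lets us ask the kernel which local
    address it would use to reach the endpoint; that address is a good
    guess at where the endpoint can reach us back.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((endpoint_host, endpoint_port))
        my_ip = s.getsockname()[0]
        payload = {"host": socket.gethostname(), "ip": my_ip}
        s.send(json.dumps(payload).encode())
    finally:
        s.close()
```

The endpoint just listens on that port and records who announced; unlike the broadcast beacon, nothing is sprayed across the whole subnet.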
And that way, they load some malware onto the target's network. In a real security situation, like a military base, people are trained to shoot on sight when they see a USB key; but in a normal company they're not, they do exactly that, and there goes your industrial espionage protection. The difference here is that what gets "lost" in the parking lot even has a radio, if it's this kind of SD card, instead of just carrying some malware. But the countermeasures are not significantly different. If you have countermeasures for rogue access points, as you would in that kind of security situation, like my hypothetical nuclear reactor, you would have a system that shuts down rogue access points by actively killing any connection that doesn't originate from the approved access points, and it would do the same here. And if you have security against lost USB keys, you would do the same for SD cards; the security in that case is usually pouring glue into every single port that people are not supposed to use, so they don't get ideas. It doesn't change the attack vector in any way. So how much should you actually be worrying? Sure, the card is not super secure, but you can plug these holes yourself and run a more proper Linux system. And then there is the other thing, which is that threat assessment needs to be commensurate with your security posture. If you're not a nuclear reactor or a military base, do people really want to get at your Facebook pictures? If they do, maybe you need a little more security, but for most people this is not an issue. On the opportunity side, this is a low-power system with crazy amounts of storage and wireless connectivity. It's such a low-power system that you could power it with a solar panel and just put it in a tree if you wanted. So what can you do with this? Well, you could build a PirateBox-style data exchange, if you don't know what PirateBox means:
PirateBox is the term of art for a system that people connect to anonymously when they are within Wi-Fi range of it, where they can drop files for other people to find. Or a spy's dead drop, if you want to go back to the intelligence community example. Less ominously, you could make a geocache that's solar-powered. More seriously, with this thing you can put a server in your wallet. That is the part I find fascinating: it's really tiny, it's really low-power. You could make a server as throwaway as an LED. You could actually build a Beowulf cluster in a shoebox, although I'd recommend you really don't do that. One thing someone has suggested: you could put this in the SD card slot of your car stereo, if you have one, and when you park in your driveway, your car connects to your network and downloads your latest songs; you don't have to do anything. There are a bunch of things you can do; if you have ideas, tweet them to me, I'm always interested. There is one limitation: SD cards usually have SPI interfaces, and that would be awesome, because an SPI interface means Arduino. This card doesn't have it. So no, that won't work. But this is still a very interesting platform to experiment with. Cross-compiling is an option, but you can take the lazy way when you're prototyping and just use Perl. I solved the discovery problem for you; you have all the scripts. And it's very low cost. It's about the same cost as a Raspberry Pi, so you have to decide: are you willing to trade RAM and interfaces for really low power and the small form factor? It's a fair question. It's a very nice hacking platform. And Transcend has been very nice to the community: we broke this thing years ago, and they haven't complained, they haven't sued anybody, they haven't done anything untoward. It's a good thing: we get hardware to play with, they sell more hardware, so it's great for everybody.
The way we use to break in is actually something that comes out of the SDK for this hardware, but nobody knew that, because the SDK is secret, so we had to discover it ourselves. We need a better distro image than the crufty ones we have seen; that's relatively easy to do, and there are some out there, but for someone who has worked on as many embedded distributions as I have, they are not acceptable, and I think we can do better. Still, there are adequate things to at least experiment with. So if you come up with new ideas on how to use it, please send them to me. Here is my contact info: you can find me on Twitter, or send me an email. Remember that speakers are Pavlovian devices: if you like the talk, please let us know. Rate the talk, send us comments, submit feedback, all that. And if you do something with this, let us know what you did, so we can spread the word about it. The slides will be made available momentarily, so you'll have all of that. And I'm available for any questions, if you have any. Okay. Welcome, everyone. Thank you for staying so late. I'm here to introduce Jason Yee, who is a technical writer and evangelist at Datadog. He was previously at O'Reilly and before that at MongoDB. He also enjoys international whiskey, so if you enjoy this talk, feel free to find him afterward. Thanks. Thank you. Yeah, so thanks for the intro. I'm a technical writer and evangelist at Datadog. I used to say that a bunch, and it's really long, so now I like to just say that I do docs and talks, which is pretty apt for what I do. I'm also an organizer of DevOpsDays Portland and on the DevOpsDays global team. So if you are into DevOps and that type of stuff, definitely check out your local regional DevOpsDays event. They're all fantastic, all run by local people who are just passionate about it. Derek mentioned that I'm a whiskey hunter. I do enjoy whiskey.
I'm also a travel hacker, so I fly just to get miles, which is a crazy thing, but you can talk to me afterward if you're into things like that as well. On Twitter I'm @gitbisect, and by email I'm jason.yee at datadoghq.com. If you have any questions during this talk, we'll have some time afterward; but if you think of something and don't get to ask it, or you go home and think, oh, I thought of something, feel free to shoot it at me on Twitter or by email. Totally happy to answer your questions. So, as mentioned, I work for Datadog. This is not going to be an infomercial, but I do have to say a little bit. If you're not familiar with Datadog, we're a SaaS-based monitoring solution. To give you an idea of the scale of what we handle: roughly 15 million data points per second, which comes out to over a trillion data points per day. And obviously, we're here at an open source conference, so I'll say a little bit about open source: we do have open source clients, and the agent, if you're running any of your own on-prem stuff, is open source as well. We widely accept pull requests; that's all on GitHub. And for anybody who's looking to work on interesting creative challenges, we are hiring, for a ton of jobs; we're looking to double in size over the next year. On Twitter, it's @DatadogHQ. Don't tweet at @datadog without the HQ: that's actually a black Labrador retriever, and if you tweet at him, he tends to make fun of people. So, Datadog: we monitor stuff, and there is this huge universe of things we monitor. Even working there, it's sometimes overwhelming, all the things that you have to monitor now, or that you want to integrate into your monitoring system. But what's even more overwhelming, I feel, is this explosion of monitoring projects and tools that we have today. You constantly hear people that are using, obviously, Datadog, but also New Relic, a lot of people using them, or Dynatrace.
People ask, how does that fit with things like, I'm running Nagios, or I want to move to Prometheus, or Sensu, where does this all fit? Oftentimes I talk to people and they say, I've got this thing, and I want to implement this other thing, and you go, why? They're the same thing. If you love what you have, just keep running with it. So that's where this talk stemmed from: trying to help you make sense of this explosion of projects and services that are all competing for your money. And they don't care if you double-buy; they don't care if you implement something and never use it, as long as you're paying them. Before I dive in and give you a framework for understanding these different monitoring tools, it's helpful to understand how we should evaluate monitoring, and that really comes down to the four qualities of good metrics. When we're gathering data, what should we actually be considering? The first quality of good metrics is that they have to be well understood. Essentially, if you're gathering data and you understand what it means, but nobody else on your team does, or nobody else in your organization does, that metric is completely useless. A good case in point is a classic: anybody familiar with the Mars Climate Orbiter? Yes, I see some nods. For those who don't know about the Mars Climate Orbiter, clue number one is that it is not orbiting Mars. The story goes that NASA was working in a partnership with Lockheed Martin. NASA traditionally uses metric measurements, things like kilograms and meters, and Lockheed Martin, being an American company, traditionally uses imperial measurements, things like pounds, miles, feet, inches. Everybody thought they were on the same page; they were just looking at numbers. And everyone was calculating the trajectory using the wrong units. So the Mars Climate Orbiter has now crashed into Mars.
Quality number two of good metrics: they should be sufficiently granular. So last year we had the Olympics down in Brazil. And one of my favorite events is the men's 50 meter freestyle in swimming. It's an extremely fast event. In fact, you can see from these times, this is the final medal race, and some of these times are roughly 21 seconds. But we understand that when we're measuring things, particularly things like the Olympics, granularity matters, right? If we measured at per-second granularity, well, congratulations, everybody here except the guy that came in last would have won gold, because it's all 21 seconds. It's like the Special Olympics, where everyone's a winner. But we understand that that's not the case. And so for this event, we've come to learn that we need to measure in hundredths of a second. And sometimes we still get ties, right? So maybe they'll move to thousandths of a second. But we also understand that with our systems. When we're computing, oftentimes we're talking about things in milliseconds, sometimes even nanoseconds. And so we need to evaluate our monitoring solutions on that, right? What sort of granularity can they give us? The third quality of good metrics is that they need to be tagged and filterable. We've grown into a world where we have vast scale and distribution. That's the whole point of the cloud, right? That's why the cloud is so popular. We can easily spin up, tear down, and distribute. And that makes the amount of data that we're generating huge. As I said, at Datadog we're doing over a trillion points of data a day. How do you make sense of that? Well, you need to be able to tag and filter, right? You need tags to know what sort of data it is and where it came from, and then you can start to run queries against it. And the final key quality of good metrics is that they've got to be long-lived, right?
You need to keep your data around so that you can see trends and patterns and actually start to analyze them. With those four key qualities of metrics, let's try to make some sense of this explosion of tools. And the way that I like to do that is to take a look at our application stack, right? We need to understand what our systems are doing and how our users are interacting with them. So traditionally, that starts with a front-end client, where your users are: whatever they're interacting with, whether it's a web application or a mobile application. How are they using it? What sort of metrics can we gather from that? Then traditionally we have the application bucket, right? This is usually what people call the back end. It's where a lot of times your databases are, where your custom code or your web code is running. And finally, we transition to the infrastructure, the thing that runs it all. But one of the tricky things is that this tends to leave gaps in the middle. You know, if you're running in Amazon, is that your infrastructure? Well, it could be if it's something like ECS, but Amazon also has DynamoDB. That's a database. Is that now application or infrastructure? So we need to think of things less as an application stack in these traditional buckets and more as a spectrum: following the lifecycle of the requests that users are performing, all the way down to the infrastructure that we're running. Hopefully that allows us to cover all of the gaps. And it also lets us see some of the overlaps, because as we start to evaluate monitoring systems, a lot of them have overlaps where they don't fit cleanly into the buckets we think we have. And oftentimes, marketers like to use this to get us to buy more, or to believe that their products can do more than they actually can.
So understanding these overlaps helps us to understand where specific products may cover and where that coverage may end. On the user side, this is traditionally called performance monitoring. And there are two types, synthetic and RUM (real user monitoring); we'll dive into these later. But that also has a little bit of overlap with the next type, which is application monitoring. Application monitoring similarly has two traditional types: APM, which is application performance monitoring, and plain application monitoring. Application monitoring means getting the metrics directly from those applications. And we see that there's some overlap here, because what happens with a mobile application? Your application is no longer running on what's traditionally the back end; it's running on somebody's actual device. So is that now the front end? Well, maybe, right? So we'll get into that when we talk about APM and being able to maybe run APM on the client side. And as mentioned, when we're talking about things like AWS, you have things like databases, which are traditionally thought of as applications, but now they've sort of become part of your infrastructure. So there's some overlap there. Infrastructure is the next bucket of monitoring tools. Overall, just keep these in mind: these are the three areas that we'll be talking about for monitoring. So let's jump into the first, which is infrastructure monitoring, and build from the bottom up. Why is infrastructure monitoring important? Well, obviously it's important, as I said, because this is what runs all of your applications, right? So you need it for that. But more importantly, infrastructure monitoring is important because downtime costs money. Simply put, if your application isn't running, you're losing money. Just some quick statistics about how much money you might be losing. About a year ago, Amazon.com went down for 20 minutes.
They lost $3.75 million from that. And that's just the retail side of Amazon. We're all familiar with S3 going down just recently. The number for S3 that I saw was that the outage cost $15 million. I haven't seen how that number was generated, but that's the number that's getting thrown around: $15 million for the S3 outage. I'm pretty sure it's more. And if you were running on S3, you could get a credit back; Corey Quinn, who's speaking next door, reminded me of that. Other money lost: IDC did a report where they surveyed the Fortune 1000, and they estimate the average cost of infrastructure downtime at roughly $200,000 per hour. And then, as I mentioned before, I'm a travel hacker, so I'm really interested in the airline industry. In January, Delta had an outage that forced them to cancel 170 flights. The estimated loss on that was $8.5 million, and that was just for a couple hours of downtime, which then cascaded. But what I thought was super interesting was that it happened on a Sunday, and when I came back on Monday and took a look at their stock price, it had actually dropped almost 4%, because Delta has been experiencing a number of somewhat frequent outages. So when we think of the cost of outages and why we ought to monitor, we often think only of, well, what's my lost revenue? We miss, what's my lost opportunity from having engineers work on the outage instead? And the big thing people always miss is, what's the loss to my brand, right? What's that lost trust? So, the benefits of monitoring. Why monitor? As G.I. Joe says, knowing is half the battle. Monitoring gets us a decreased mean time to detection: we need to know about things so that we can fix them. And it also gives us the information we need to actually resolve those issues, decreasing our mean time to resolution. So let's use those four qualities of key metrics to start to evaluate: what should we be looking at in our infrastructure monitoring?
Well, the first key quality: well understood. Simply put, it has to be easy to use. Everybody in your organization has to be able to use your monitoring tool to get the information that they need. This means, is it easily deployed? Can people generate dashboards? Don't be the dog; know what you're doing, right? But more than just using it, does it integrate? Is it able to monitor the things you need to monitor, whether that's your cloud services, your on-prem stuff, or all of your servers and containers if you're moving to containers? Can it monitor ephemeral containers as you're constantly spinning them up and tearing them down? And you want this all in one place, right? The one ring to rule them all, a single pane of glass, because if you have separate infrastructure monitoring tools, it inevitably means that you're not going to be looking at one of them. Nobody wants to look at a ton of monitoring tools; nobody really wants to look at monitoring at all. Hopefully we have automated notifications. But pulling it all into one centralized place means that you're not going to miss anything. Quality number two was sufficiently granular. So you need that granularity on your services, and I like to point out the granularity that the major cloud providers give you. If you're running in AWS, you're getting metrics at one-minute granularity. Similar with Google. Azure does a really interesting thing: they do a roll-up. So hopefully you're not running in Azure, or if you are, hopefully you have another monitoring tool, because Azure gives you one-minute granularity only up to 24 hours. Okay, so if I have an outage today and I see it today and I'm out for a minute, no big deal, I can see that. But beyond one day, they roll up, right? They roll up to one hour. So if I have an outage on Sunday and I come in on Monday, yeah, that one-minute outage is now averaged out into an hour.
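That averaging effect is easy to demonstrate. A toy sketch, not Azure's actual roll-up code:

```python
# A one-minute total outage inside an otherwise healthy hour.
per_minute_availability = [1.0] * 60
per_minute_availability[30] = 0.0  # one minute fully down

# Rolling up to one hour averages the outage away.
hourly_rollup = sum(per_minute_availability) / len(per_minute_availability)

print(f"worst minute:   {min(per_minute_availability):.0%}")  # 0%
print(f"hourly roll-up: {hourly_rollup:.2%}")                 # 98.33%
```

At one-minute granularity the outage is unmistakable; after the roll-up, the hour still reads as 98%+ available, which is why retention granularity matters as much as retention length.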
So essentially I never went down. I guess that's good, depending on my SLA, right? You're just like, look, I didn't go down. Third key quality: tagged and filterable. As I mentioned before, the main advantage, the reason that we're all moving to the cloud, is the ease of scalability and distribution. I can spin things up whenever I need them. I can put them in different availability zones or geographic regions to make them faster and more resilient. That generates a lot of data. So you want to be able to take a metric point and not only say what it is and how much it is, and obviously when it happened, because we're looking at time series data, but also where it came from and what application it was involved in. So rather than just looking at dashboards, I can start to query against them, right? I can say, show me everything from my availability zone or my geographic region, everything from the eastern U.S. that is running a certain size of server, say a t2.small, that's also part of my database system. I can start to drill in and make sense of all of the data that I'm getting. And then finally, number four: they have to be long-lived. Thankfully, if you're running AWS, they bumped up their retention, so they're now storing for 15 months. So 15 months essentially gives you one year plus a quarter, right? You can now see those annual cycles: how did I do this quarter versus the same quarter last year? But they do a roll-up, because they don't want to store all of that data. Google Stackdriver only keeps things around for six weeks, and Azure only 90 days. At Datadog, we match AWS, so we keep things for 15 months, but we don't roll them up. So, we've talked about infrastructure; let's move up to application monitoring. We talked about what we're running our applications on; what do we know about monitoring the applications that we're actually building, or the open-source applications that we're actually running?
Well, obviously, they're important because these are critical to our business. And as mentioned before, it's important to monitor them because we want to avoid downtime. But this is where performance actually starts to come in. Slow performance costs us money, in addition to downtime. So a few stats about that. The graphic here is because this is often how I feel when things are slow, and it's actually how most people feel when things are slow. One of the interesting stats from Tammy Everts, who wrote a book for O'Reilly on this, is that there's actually a greater bounce rate for slowness than for outages. There's this interesting phenomenon that when people go to a site and it's down, only 9% of them won't come back. Everybody else just understands: the site's down, I'll come back later. But when a site is slow, that reflects on your brand. People just think your brand's crap and they'll go to your competitor. So 28% of people will not return to a slow site. Other stats on this: Walmart.com did an overhaul of their website, and the team found that for every 100 milliseconds of speed improvement on the site, they grew their revenue by 1%. Google and Bing did a similar study, but in reverse: they took their existing sites and slowed them down. And they found that just a 1 second delay cost them 3% in revenue and actually increased the time to click by 3 times. People just lost interest and went away, and it took a lot more energy to get those people back. So, thinking about performance, let's talk about the first type of application monitoring: APM. Obviously the benefits, as we discussed: performance impacts revenue. And performance also impacts costs, right? Especially when you're not running your own on-prem stuff; if you're running in the cloud, you're paying for compute resources and storage resources. So having slow code, having bloated code, is costing you money.
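Putting a number on that starts with instrumenting your own code. Here's a minimal sketch of timing a block and emitting the result as a tagged metric; `emit()`, the metric name, and the tag values are hypothetical stand-ins for whatever StatsD-style client and naming scheme you actually use:

```python
import time
from contextlib import contextmanager

EMITTED = []  # stands in for a real metrics backend

def emit(name, value, tags):
    """Hypothetical stand-in for a StatsD/DogStatsD-style client call."""
    EMITTED.append((name, value, tags))
    print(f"{name}: {value:.1f}ms {tags}")

@contextmanager
def timed(metric, **tags):
    """Time a block of code and emit the elapsed time as a tagged metric."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        emit(metric, elapsed_ms, tags)

# Tags let you slice later: e.g. everything in us-east-1 on the database tier.
with timed("checkout.latency", region="us-east-1", tier="db"):
    time.sleep(0.05)  # stand-in for real work
```

Because every point carries tags, you can later filter by region, tier, or host instead of staring at one global average, which connects directly back to the tagged-and-filterable quality.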
And then beyond that, optimized code actually correlates with fewer bugs and defects. There's a term called cyclomatic complexity. Cyclomatic complexity measures the number of paths that code can take, and the more paths your code can take, the more paths you will essentially be unaware of, right? The more options you have, the more chances there are for edge cases that end up leading to unexpected outcomes. And so you get more bugs and defects. This is one reason why things like microservices are super popular: as we streamline things down to smaller code bases, we can actually control them. So let's take the four qualities again and run them against APM. We want it to be well understood; that was the first quality. As we consider APM tools, well understood means: does it integrate with the languages that we use? It's not very useful if my APM doesn't support the languages that my applications are written in, because then I can't actually use it. But more than that, we're in a polyglot world. Most of you are probably working in organizations where you're not coding in just one language. You've probably got two, maybe three. And you're probably looking at Go and Rust, like, why are Go and Rust so hot? Maybe we should get into those. So start to think about what languages your APM can support. Will it accommodate your growth? And then finally, does your APM actually integrate with the other applications that you're using? Can your APM look at your data stores and see how those interact with your custom applications? The second quality of metrics: sufficiently granular. With APM, granularity is generally not an issue. They're all running sub-second; most are running millisecond or nanosecond. But with APMs, you have to understand that you should be running them in production. And that impacts your end customers, right?
If you're spending too much time measuring, and that's causing a slowdown in your application, well, slow applications cost you money. So your APM should be doing some sampling. And I love the title of this book, How to Lie with Statistics. If you're sampling, ensure that your sampling is statistically sound. The third quality: tagged and filterable. As mentioned before, similar to our infrastructure being highly distributed and easily scalable, our software is now distributed too. So can your APM actually trace your code when requests are bouncing from server to server? Can it follow a user's path? And then finally, quality four: long-lived. You need to be able to correlate your code deploys or code changes to actual performance changes. Oftentimes people don't think they need something long-lived. They're like, well, as long as I can see when my code deploys and any performance change around it, that's good enough; I can keep it around just a couple of days. But you want to actually have some history on this data, right? Because hopefully we're deploying fairly often; we have continuous delivery systems, so we're deploying small bits of code often. It's hard to see how much performance change you're going to get from one small bit of code, but over a longer term, it is easier to see these things. So you want to capture and save this information to start to see trends. Let's jump to the other side: plain application monitoring. We want to be able to get metrics out of our applications. So obviously the benefit: for our custom applications, we want custom metrics. And these custom metrics are really important because ideally we're focused on the business goals, right? We're all part of the business. Even though our job might be to set up servers or to write code, we want to be in service of the business.
So we want to be able to emit business metrics that correlate with that. But we also want application monitoring to monitor the applications that we don't write, because most of us are not writing our own data stores, right? We're using MySQL or Postgres, or we've gone to MongoDB or, you know, an open-source NoSQL store. We want to know how well those are performing. So, the four key qualities. Well understood: how well documented is your application monitoring? Is there an API, an SDK? Or better yet, are there integrations? Because even if there's an API, I don't want to write an integration if there's already one. How many integrations can you get out of the box? How quickly can you get up and running? And then the rest of the key qualities: is it sufficiently granular? Again, as we talked about, you want granularity hopefully down to the second. You want metrics tagged and filterable, because you do have distributed apps. And you want them long-lived, because you want to see those trends in what you're doing. So we've covered infrastructure and application, both application metrics and application performance. Let's move up into what's traditionally called performance monitoring. There are two types, as I mentioned before: real users and synthetics; humans and robots. The benefits? A lot of people think, if I need performance monitoring, I'll just get one of them; why do I need both? Well, they actually work really well together, and you should have both synthetic and real user monitoring. Synthetic monitoring is essentially robots. It allows you to write tests and run them whenever you want; it's independent of user activity. The nice thing about robots is that you don't need a human clicking on things, moving a mouse, and looking at pages. You can actually test things that most humans wouldn't interact with. So if you have software like API endpoints, REST endpoints, things like that, you can actually write tests to measure the performance of those.
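A synthetic check really is just a scripted robot. Here's a minimal sketch; the probe is any zero-argument callable that raises on failure, and in practice it would wrap an HTTP GET against your real endpoint (the URL in the comment is hypothetical):

```python
import time

def synthetic_check(probe, timeout_ms=2000.0):
    """Run one synthetic probe and report latency plus pass/fail.

    `probe` is any zero-argument callable that raises on failure,
    e.g. a function wrapping an HTTP GET against your endpoint.
    """
    start = time.perf_counter()
    try:
        probe()
        ok = True
    except Exception:
        ok = False
    latency_ms = (time.perf_counter() - start) * 1000
    return {"ok": ok and latency_ms <= timeout_ms, "latency_ms": latency_ms}

# Stand-in probe; a real one might be:
#   urllib.request.urlopen("https://api.example.com/health", timeout=2)
result = synthetic_check(lambda: time.sleep(0.01))
print(result)
```

Because the probe is just code, you can schedule it on whatever interval you like and run it from multiple regions, independent of whether any real user is awake.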
On the flip side, real user monitoring: the advantage is you're monitoring real users. You get real-world data out of that. And along with that real-world data, you're getting the diversity of your customers, a bunch of people around the world who are giving you data, and the diversity of the devices and browsers that they're using, whether that's mobile or tablets or desktop computers. And the great thing about this, and the reason that synthetic and RUM work so well together, is that you can take this information and feed it back into your synthetic tests to make those synthetic tests better. So, the four qualities of metrics. Well understood, right? Can you understand what's being measured? One of the problems when you have robots running tests is that you don't necessarily always know what they're measuring. But more than that, well understood means: can you update those tests easily? A lot of synthetic testing tools allow you to essentially record macros, almost like when we used Microsoft Word a decade ago, where you just click record and do things. That's really useful and easy, but you also want to be able to script those tests, right? Nobody wants to just click record and then have to constantly re-record. So being able to script your synthetic tests is really useful. On the flip side, with RUM, well understood means: can we actually understand what our users are doing? Are we able to see their full session? Can we make sense of a drop-off? Did the user just stop? Did they go have a sandwich? Or did they actually get lost and not understand what the UI was asking them to do? Was it an abandoned cart? Are they going to come back? Things like that. Hopefully your RUM system will give you some insight into your user activity. Quality number two, sufficiently granular. For synthetics, how frequently are your tests running? Again, they're robots. You can tell them to run tests whenever you want.
It's a balancing act, though. You could stress your own system by running tests every second, so probably not that often, but you want to run them often enough to get information you can actually rely on. And then there's granularity in terms of what we're looking at in the system. With synthetics, it's really easy to test the specific things we know about, but it's important to use them to get a more comprehensive, holistic view of our system. For RUM, similar to what I said about APM, we're running this against real users, and we don't want a Heisenberg situation where our measuring actually influences the outcome. So we want low overhead with RUM. We want to be able to measure users on their actual devices without impacting them and causing poor performance. Third key quality: tagged and filterable. For synthetics, we want a variety of test locations. It's really easy to set up synthetic monitoring on one server, but then that's one location and generally one type of browser. So we want a system that's geo-distributed, so we can run synthetic tests from multiple endpoints, and we want those tests to have some variety, so that we can test various things that hopefully reflect what our real users are experiencing. Similarly, on the RUM side, again, we're distributed. We want to be able to tag and filter our RUM data, to know when users on mobile are experiencing one thing versus users on desktop experiencing another, or when users in one geographic region are experiencing something different from another region. And then finally, long-lived. With synthetics, can we see trends? Keeping your data around hopefully allows you to see trends so you can improve your performance.
With RUM, it's really interesting, because a lot of companies that do RUM are actually starting to let us correlate it against business value metrics. So having long-lived RUM metrics allows you to see business trends, which is really useful. So again, like KITT and Michael Knight: humans and robots working together. You should have both. A fourth type of monitoring is what I call other, or specialized, monitoring. One of the nice things about thinking of this as a spectrum is that we can see all the little bits that fall in between, and hopefully avoid those cracks. So what falls into specialized monitoring? Well, network monitoring, right? As we think of the request lifecycle, a user doing something that happens on a front end and goes to a back end, well, there's a network in between. So if you're running your own systems, are you monitoring your networks? Other types of specialized monitoring? Security monitoring, right? Are you watching the completely other end of the spectrum, the people coming in your back door? There's other monitoring out there, things like configuration monitoring. If you're running your own data centers, there's physical monitoring; hopefully you have some DCIM so you know the environmental conditions of your data centers. And then there are specialized tools, things like Runscope, which is built specifically for monitoring APIs. So if you're running any sort of special applications, you might want to look into monitoring that fits those, which may not fit into one of the other traditional types of monitoring. And then finally, a fifth area, which actually isn't monitoring: logging. A lot of people like to think, well, I use logs, and I'm monitoring via logs, right? Why is that so bad? Logs aren't an optimal way to monitor.
The whole point of a log is to output as much information as you can, hopefully to debug things. That means you have computational overhead: if you're trying to monitor off of logs, you're essentially running a globally distributed grep to parse out that information. So you have a computational overhead, and because you're outputting all that information, you have a storage overhead as well. As I said, logs are excellent for providing additional information. You want logs, you want really verbose logs, you want logs that give you as much information as they can. So hopefully your logging tool is a log management tool, not a log monitoring tool. The idea is that if you have proper monitoring, that monitoring should be alerting you to issues, and then you can take a look at your logs. Your logs will be extremely useful, if you have a management tool, for finding the information that helps you resolve those issues. Other tools that often come up when we talk about monitoring but aren't actually monitoring themselves: things like on-call management tools. As I said, nobody wants to look at monitoring all day. Automated alerts, and getting them to the right people, are the key part. So oftentimes you want on-call management tools, things like PagerDuty or VictorOps or OpsGenie, that will allow you to reach people in the best manner possible. Other tools that often come into play: things like error tracking. Similar to logging tools, you want to track your errors, manage them easily, and get to that information quickly, especially when you have an outage. And then anomaly detection. Sometimes anomaly detection is built into these tools; at Datadog, we have anomaly detection in our infrastructure monitoring. But if you don't have it, if you're running Nagios, which doesn't have anomaly detection, look into anomaly detection. Humans are really great at seeing patterns.
But that assumes that you have a dashboard set up to display those patterns. Robots are really great at seeing patterns that humans can't. And so anomaly detection is extremely useful. So just to boil everything down, it really comes down to this: observability is about following your application spectrum. Think about things not as the buckets that they've traditionally been put in, but as the complete life cycle. And then remember the four qualities of good metrics when you're evaluating: things need to be well understood, they need to be sufficiently granular, they need to be tagged and filterable, and they need to be long-lived so you can see those trends. If you want more info, we have a fantastic blog post up on the Datadog blog; there's a link to it. And if you would like to grab these slides, there's a link for that too: bit.ly slash scale dash 100. I'm open to questions now. I'm on Twitter, and that's my email address. If you think of anything afterward that we don't talk about here, feel free to shoot me an email or send me a message on Twitter. And with that, any questions? This one's for you, Lon, because you snuck in. We've got game night at SCALE tonight, so I'll be hanging out; if you have a question or comment, come up. Other than that, you're free to go. Go enjoy the evening. Thank you.