[Pre-talk setup in Ballroom H: mic check, projector in mirrored mode, headset fitting.] Okay, let's give it two more minutes, because there's going to be another 300 people coming in off the bus. They're waiting for the bus, the bus is going to come in, and there'll be 300 people. And then you'll be able to sell your seat on a derivative basis. You can have an auction, right? Instead of an ad auction, it'll be a seat auction. Okay, we're good. It's 11:30, so I'm going to start on time, because I do have a lot of stuff, and maybe the bus is late. There are seven rules for world-class technical documentation. Okay, in case you don't know it, this is an exclusive club, so we're going to close the door in a minute and nobody else will be able to come in. And that'll be it. This is Jules. He's the production manager. Okay, this is a job you have to work your way up to. So let's do a little run here. Let's hit the space bar. Boom, all right. So here's where we are. Well, there you go. Jules rocks, right?
Okay, so here's what you'll be able to do at the end of this. We're going to cover: who are you, who am I, why am I here? The big existential question of all time, why am I here? Then I'm going to talk about defining world-class technical documentation, the seven rules, and real-world examples, and I'm going to spend a lot of time on video. Now, I've done this a lot. How many of you have seen this show before? Okay, we have one person. He's the plant at the back. He's actually the reviewer from The Hollywood Reporter. All right, so here's what I'm not here to tell you. I'm not here to tell you how to put a technical documentation infrastructure into your organization. That's another talk. I'm going to spend a lot of time talking about content and content creation at a very granular level, because I've learned some things over the years that I want to share with you. So, Jules, there you go. Okay, when this is over, you're going to be able to make documentation that's easier to understand, more engaging, becomes more accurate over time, and is clear in purpose. Which is what good technical documentation is. Jules. Boom. Okay, so who are you? How many people here are writing technical documentation as part of their job? Creating technical documentation? None at all? Okay, remember the first slide? Oh, did you forget the first slide? Jules, go back. All right, let's try again. How many of you are doing technical documentation as a way of life? All right, there's the firing squad. Okay, keep going down. All right, go back. Okay, how many people like it? Like, this is really cool, I can't wait to do this. Nobody likes it. All right, how many people hate it? I wish I never had to do this again. All right, all right. I'm sorry. What can I say? Nobody liked it until everybody started using it, I guess. All right, next one.
All right, that's who I am. This is actually an older shot, from when I was really adorable, before I became moderately adorable. So hit the spacebar, please. Okay, that's Buddha up there. My Buddha tells me, you know, you are not your things. Right. And then hit the spacebar. All right. And then Santa Claus tells me I am my things. Hit the spacebar. All right, and my dog is in a gang. This is my dog, H.E.D.Dog, and H.E.D.Dog goes with me everywhere I go. And this is my toothbrush. It's very important that you know about my toothbrush, because if you're going to write good technical documentation, you need to have clean teeth. Okay, I'm going to take it out right now. Next. All right, hit it. Oh, that's a shot. I've been writing a lot. I've been writing professionally since 1994, so that makes me almost old. And I've been doing this stuff, and I didn't make this up. I didn't have some epiphany about how to write. People taught me, and they were a breed of person that no longer exists in the world. I need to share this with you. The breed of person that no longer exists is called a technical editor. They've been pretty much eliminated from our landscape. That, and pretty soon, driving cars. And we're paying a price. I'm going to talk about the price we're paying in a minute. I learned a lot of stuff from these people. They were ruthless with me. They were just brutal. But they taught me how to write technical documentation. Next. All right, so here's a dirty little secret. Hit the dirty little secret. Go on. It's all hard. Even the easy stuff. Even the easy stuff is hard. What do I mean by that? Hit it again. All right, let's take a look at this piece of technical documentation. I have to read it from the screen.
And it says: throughout most of the years when silver dollars and smaller denominations were minted in actual silver, the value of the silver was substantially less than their face value. As a result, their monetary value is based on government fiat rather than the commodity value of their contents. This became especially true following the huge silver strikes in the West, which further depressed the silver price. From that time on until the early 1960s, the silver in United States dimes, quarters, halves, and silver dollars was worth only a fraction of their face value. Boom. That's a lot of language. That's a lot of language, isn't it? So what have you got? You've got value. You've got all this stuff talking about value. Hit it again, Jules. You've got a relationship you've got to know to play along. Once more. Okay, hit it again. Go ahead. He's doing a great job as production manager, isn't he? All right, go ahead. So here's the deal. We've read this big pile of words. There were probably about 120 words in there, and there are four different definitions of value in there, explicit and intrinsic. You've got to know what those mean. If you don't know what those mean, you're not going to understand what's going on. Hit it again. Now you have a relationship that you have to understand: the price of silver and the huge silver strikes. You've got to understand that. If you don't, you don't even get the passage. Next one. All right, and government fiat. Here's what government fiat means. I will tell you this right now. Hit it, Jules. Boom. It's a car. All right. Because that's what the reader thinks: this is a car about the government. Go on. Next slide. All right, so what is world-class technical documentation? It is increasingly accurate. It is engaging, purposeful. And here's the kicker, friends: it's easy to understand.
If you don't understand what you're reading, I'll let you in on it right now: it's not your fault. It's not your fault. Remember Good Will Hunting? It's not your fault. What do you take? I'll take the wrench. Oh, I usually take the belt. That's what most technical documentation is: do I get the wrench, or do I get the belt? All right. Next. So here are the seven rules. For those of you that have seen this before, I have changed it. It's not what's in the article. I've seen the light. Rule number one: if you don't want to read it, don't write it. If you don't want to read it, don't write it. Rule number two: don't confuse and don't abuse. Rule number three: before you start, be very clear about what you want your reader to do when you end. Rule number four: write to an outline. Always. Not sometimes. Always. Rule number five: clarity equals illustrations plus words. Six: watch the pronouns. Seven: embrace revision. Okay, everyone go home now. Nope. All right, keep going. Next. All right, if you don't want to read it. This used to be, I'll let you in on a secret, I changed this one this week. It used to be "dry socks," but I changed it. I've come to understand there really is no individual prescription for how to write interesting technical documentation. Every human being is different; everybody thinks differently and all that good stuff. But what is important is that it has meaning to the reader and the writer. So here's a trick I've learned. If I had to give one prescription for how to write better technical documentation, it's this: after you write something, read it. If you don't like what you've written, rewrite it. It sounds sort of trivial, but it really is important. If you don't want to read it, nobody else will. Or you will become either the wrench or the belt. How many people here have seen Good Will Hunting? Okay.
For those of you who haven't, there's a part where the Matt Damon character is talking about the abuse he suffered in childhood, and he and the Robin Williams character compare notes on it. They're looking at the abuse and they say, well, which one did you take? They had a choice between being beaten with a wrench and being beaten with a belt. Matt Damon says, well, I took the belt. And Robin Williams says, no, I took the wrench. So we don't want to be the wrenches and the belts. All right, but that's it: if you don't want to read it, don't write it. Next: don't confuse and don't abuse. Here's the deal in life, and I'm even struggling with it now. In your pocket, you have this thing that is probably more important than I will ever be. It's called your cell phone, and it has a connection to the web. So the minute I become boring to you, guess what's going to happen? Boom, boom, boom, and you're on Facebook saying this guy's boring, or you're on WhatsApp messaging about it. This is our landscape. This is our life. And we're constantly battling it. So the minute you confuse a reader, and by reader I don't mean only a textual reader, because I'm going to talk more about media in a bit, the minute you confuse somebody, they're gone. They're never coming back. So you have to be very clear to keep their attention, and the best way to lose their attention is to confuse them. Do not confuse. Do not abuse. Next. All right, so what does that mean? Use graphics, examples, analogies. Here's the biggest one, the biggest confusion point I've seen in my years of doing this: using the same word with different meanings. Going back to that passage about value: we had four different meanings for value, and the reader is left asking, what is this guy talking about? So: "use the jack to jack up your car." You got me, right? The noun doing double duty as the verb. Instead: "use the jack to elevate the car."
If you find you've done it, the thesaurus really is your friend. If you find you're using the same word twice with different meanings in a sentence, don't. Don't. But that's English 101. All right. Next. The other one: when you're writing, be very clear about what you want your reader to do when you end. This is not a novel. There's no mystery here. You want to be very clear about the subsequent behavior. You're creating a contract with your reader, and when you create a contract with your reader, you're making a promise, and you need to fulfill that promise. So let them know. Like: I promise I will teach you how to make chocolate chip cookies. The cookbook really is our friend. It's probably the best technical documentation around, because they've got to sell them, and they do tend to sell a lot of them. And cookbooks that don't work don't get sold. All right, so the big one is to be very clear about what you want your reader to do at the end, and state it up front. Next. All right, the other one I call the three-rep rule. If you're doing particularly complex documentation, one rep doesn't work. I used to do work with the Iowa school system. They did a study, because they're very concerned, you know, the Iowa basic skills test, remember that thing? The destruction of mankind. Anyway, they found out that really, really bright kids needed a minimum of three repetitions to understand what was going on. At the other end of the spectrum, it was 11. And they designed the curriculum and the classes accordingly. So, as anybody that's ever done any sales knows: tell them what you're going to tell them, tell them, and tell them what you told them. That really does hold true. It's really hard to pull off writing a piece of documentation that has meaning and value unless you do the three reps. And that can take many forms.
I'm going to show you this, particularly around a methodology I use. Next. Okay: write to an outline, always. How many people remember how to write an outline? Oh, good. Okay, that's really, really good. This is an outline. I'm going to discuss the rules of outlining very briefly, for those of you who need a refresher. It's an outline. Okay, hit it again, Jules. All right. You might not see the outline, but you can see that the format, the rendering, has changed, while the outline remains the same. The rendering has changed, but the outline remains the same. Next. Okay, hit it again. There you go. If you look, there's the outline. There's the outline for the US Constitution. Now, what's important to understand here are the basic rules of outlining, and most people don't know them. I've seen a lot of documentation where people will put "network," and then underneath it "network connectivity" and "network security," and you'll have some copy under network connectivity and some copy under network security and nothing under network. That violates the two-sentence rule, which I'm going to talk about in a minute. But also: you never have a node with only one child, and there's a logical reason for that. I'll go into it in a minute. The rules of outlining are: you have a root, then you have subordinate children, and underneath each subordinate child you have at least two entries. So if you have a root with a single entry, that's illogical and illegal; you'd really just have the root. Now, I'm going to go over it again. Next. All right, so, again, the two-sentence rule, which is very important and was taught to me at Macmillan, is that if you have a node in your outline, you have to have at least two sentences in that section. If you don't have two sentences, you don't have an outline node. That's standard publishing stuff.
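The no-single-child rule above is mechanical enough to check automatically. Here's a minimal sketch of my own (the node layout and function names are assumptions, not anything the speaker showed): walk an outline and flag every node that has exactly one child, i.e. every "orphan" that should be folded into its parent.

```python
# Hypothetical sketch: enforce the outlining rule that no node may have
# exactly one child.  A node is a (title, children) pair.

def find_orphans(node, path=()):
    """Return the paths of nodes that have exactly one child."""
    title, children = node
    here = path + (title,)
    orphans = []
    if len(children) == 1:          # the illegal "single entry" case
        orphans.append(" > ".join(here))
    for child in children:
        orphans.extend(find_orphans(child, here))
    return orphans

# "Network" has two children, which is legal; "Network security" has
# only one, which violates the rule and gets flagged.
outline = ("Network", [
    ("Network connectivity", []),
    ("Network security", [("Firewalls", [])]),
])

print(find_orphans(outline))  # → ['Network > Network security']
```

A check like this is easy to run over any outline you keep in a structured form before you start writing copy into it.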
Now, you might say, and this is anecdotal, but I'll share it with you nonetheless: why do we care about this stuff? Well, in the old days, when books were published on paper, and even though we're seeing a resurgence of paper, it's nothing like what's happening online, the way you determined the price you were going to sell the book at was based on what's called its paper count. Okay? If a book had 500 pages, that paper cost money, plus the production costs. And what you do as an author, when you prepare a book, and very few technical writers get asked to write books, is make a pitch. The pitch is an outline, the typical rule at that time being an outline to the third level, and you provide a page count at every leaf node. So that would mean: my book title, Chapter 1, page count 30; Chapter 2, page count 40; Chapter 3, page count 50. At the end you say, oh, this is a 360-page book, I can now charge $49.95 for it and make a profit. Okay, those days have gone away, and what we have now is unbridled publishing, which we're paying a price for, and I'm going to talk about that in a minute too. So remember the two-sentence rule: every entry point has at least two sentences. Next. All right, let's talk about the good dog and the bad dog. On the left here, you can see we have a bad dog. There's a bad dog. You see him? You see it's a single entry point. Okay. That's a bad dog. So hit it. All right, over here we have the good dog. And if you look closely, it says "know about the file system of the web app." What this is really about, what they're saying, is that because there's only one node, "know about the file system of the web app" is really a child of the entry point above it. Okay. It takes a little finesse to get used to, but the kicker is: if you ever find you have what's called an orphan, you're creating a little havoc for yourself.
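The arithmetic behind that pitch is just summing the leaf-node page counts. A tiny sketch (the three chapter numbers are the speaker's illustrative ones; a real pitch would list every chapter to reach the full count):

```python
# Page counts from the pitch outline's leaf nodes (illustrative numbers
# from the talk; a real pitch lists every chapter).
chapters = {"Chapter 1": 30, "Chapter 2": 40, "Chapter 3": 50}

# The total paper count is what the publisher prices the book from.
page_count = sum(chapters.values())
print(page_count)  # → 120
```

With all the chapters listed, the same sum is where a figure like "this is a 360-page book" comes from.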
How many people here are writing documentation that runs more than, say, 3,000 words? Okay, that's a lot. Generally, okay, 3,000-word people, this is going to hit you. Just to give you some sense of scale: a New York Times Sunday magazine piece is 10,000 words; a typical newspaper article is probably 150 words. So be aware. Next. Okay: clarity equals illustrations. You've got to be very careful with your production. Production manager, this is very important. Okay, hit it. Illustrations describe the feeling and words describe the context. Okay. And now I'm going to go into illustration. What I've found is that people, for better or worse, have stopped reading text. The way it works is that most people, when they come to a piece of difficult documentation, will first look at the diagrams, listings, or code. Typically, developers hit the code. I'll be presumptuous and say engineers probably hit the diagrams; systems people hit the diagrams. Their eyes hit those points, and then they start reading around them. They start reading around them. So you need to be, well, you don't need to do anything, but I've become very clear about the immense power of pictures. The immense power of pictures. And getting a sense of how to make pictures is a very useful and desirable skill. So let's talk about this. Hit it, production manager. This is the "tastes great" picture. How many agonies do we have out there? Is this guy an agony? Okay, how many ecstasies? How many people just don't care? Okay. So we've got a lot of agonies and some ecstasies, right? A lot of agonies, some ecstasies. Okay, hit it. Boom. All right, now there's a couple of points here, but do you see how powerful that caption becomes? And I'll let you in on something. I was writing for a publication.
And generally, when I'm writing for a publication, I don't fight back anymore. I used to fight back. Now I don't. I don't care. Anyway, the publication said, no, it's all right, we don't need captions; we'll just tell them in the copy to go look at the picture. Well, you're assuming they're reading the copy. And guess what? They're not. They're not. The caption is a critical piece of technical documentation. People look at the pictures, the code, and the listings, and then they read the caption. That's been my observation. So next, let's do another experiment. Ready? Pay attention. Hit it. Hit it again. Quick. Okay, what was that about? Once more. Go up. Once more. Okay, hit it again. Come on, this is a good review, don't worry. Hit it again. Okay, what's it about now? Yeah. That's big news. That's big news. Oh, missed that one. Imagine if somebody from the New York Times in 1860 came into our world today. They'd probably jump: what do you mean, you don't read? Okay. Next. Remember Ken Burns's The Civil War? The very eloquent, beautiful language people used back then. Just some soldier in the street, describing the horror and the misery around them in very inspiring ways. And now it's like: he got shot. Anyway, okay. Hit it. Hit it again. What was that about? Okay. Next. Okay, this is what I've been talking about. So if you want to take a picture of it, fine, and I send the slides in so they're immediately available. But that's the deal: pictures, pictures, pictures, and more pictures. All right. Now, again, Jules is the production manager, so I have no idea what's coming. I have no idea what's happening. But here's the deal.
What I do, my writing style, is: I do my outline first, and then I drop in my captions, my code, and my listings. I drop in anything that's not body text: pictures, captions, code, listings, or lists, if I have listed things. Then I write around that. It's very rare for me to just write. I don't do it. But that's me. All right, let's move on. So let's talk about creating context. And here's my commercial announcement; we're all friends now, nobody leaves immediately. I publish a cartoon. It's called TheWorldInWhichWeLive.com. Want to help a brother out? Go to the website, TheWorldInWhichWeLive.com, get on the list, get a cartoon every week. Your life will be better. All right. We're writing close to two or three cartoons a day now, and we've really had to come to understand visual communication, particularly if you want to keep the customers and keep the readership. What I've come to discover is that the wonderful thing about cartooning is that you have to give a lot of information and a lot of context very, very quickly. Because don't-confuse-and-don't-abuse is really on the line: if they don't get it, they move on. So let's take a look at this, and this is relevant to what we're talking about. Hit it. This is some guy. This is some guy. All right, some guy. And what do we know about this guy? Well, let me hear: how many people think he's happy? Because, remember, pictures transmit feelings; they don't transmit context. And this is about the detail of illustration, and I'm going to talk more about this. How many happy? Nobody? Okay, how many think he's mad? Anybody? Okay. Anybody care? All right. Okay. I'll give you one more: how many think he's frustrated? Yeah. There's the frustration.
And I will bet, I have a suspicion about why your brain thinks he's frustrated. I'm not going to tell you right away; I'll tell you in a minute. But now we have some feeling, frustration, and no context. Right? Right? We have no context. We don't know what he's frustrated about. No idea whatsoever. Hit it, Jules. Boom. Okay, am I resonating for you all? I want to make sure. Am I resonating? All right, cool, I'm resonating. It's very important to me that I resonate. All right. Very important. But now let's change the context. All right, hit it. Go. Well, sort of. But is he really frustrated about ruling the world? Is frustration an appropriate emotion for that statement? Is it? Frustration? I mean, frustration's not really right for that. All right, hit it once more. Did you see what happened? Right? That one little graphic gesture changed everything. Is it resonating? All right? One little piece. So if you decide to drink the Kool-Aid, and you really do decide that text is gone, and now the only thing you really have are pictures, now every brush stroke means something. Because if you write something that's gobbledygook, nobody's going to read it, and you're not going to be happy about it. And if you draw a picture that's sort of thrown together, you're going to lose your audience. Miles, there are 34 more out there, right? No, don't worry. All right, I want to be clear: there are more copies for you. Just hang tight; we'll get it at the end. All right. And you'll also get your bonus sign-up fee. We're doing time shares, I think, in some place. Anyway, all right. So: "I will rule the world." And now we caption it up a bit. We play with our captions, but that one little gesture changes everything. Okay, let's see how we're doing on time.
So this is really a poster child for what I consider a nice piece of well-constructed documentation. Okay, we don't have the outlining here, but you can see it's figured and captioned. And there's reinforcement. And the other thing, let's go back to don't abuse, don't confuse. Here's what you don't ever want to do, people: you don't want the reader's mind to wander. Wandering turns into confusion. How many times have you read something like, "figure G will show you how to plug in the amplifier"? Well, where's G? Right, so I'm a big proponent of putting in direction: "figure G, below," and also numbering and lettering. Because if you just write "figure," or you caption without direction, the mind starts wandering, and the reader isn't thinking about your content; they're trying to navigate your document. And you don't want that. You want them to be very clear about what they're doing all the time. Don't confuse, don't abuse. All right, hit it again. What have we got coming up next? Right, so here's some more. This is a bad dog on the left. You see, here's this thing, it's sort of just there. Granted, I gave you an outline, but there's no lead-in. And just about everything I'm writing these days has a leading graphic. Everything I'm writing has a leading graphic. But you can see here, the purpose is to show what knowledge we'll cover. You got me. Next. All right, this is a bad dog. Okay, so we're trying to do a little thing here, and we've got the numbering going. So now we know there's sequence to the diagram. There's sequence. That's good. One, two, three, four. I'm making an Ajax call, there's a response from the server, and the Ajax callback, and I've got some stuff going on. But it's still pretty loose. Okay, hit it. Boom. Okay, now we're better. Okay, now we're better. You got me: there's 10% more effort there.
But the rewards far exceed the expense. Far exceed the expense. All right, next. All right, this one here, and this is new. You can see we can get the pictures to start telling a story. We get the pictures to start telling a story. So in the mind, if we look at the pictures, it says go to one, okay, and go to two, and then go to three, and then you've got your story. Granted, do you feel the absence of captioning there? I do. I feel that the caption isn't there. The caption is what pulls it all together, and there's no caption, and now I'm lost. Now I'm lost. All right, next. All right, this is where I'm at these days. This is where I'm at these days. This is about a REST API test suite. You can see it's one, two, three, four, five, six. There's something in there. You got me. What is this about? I don't know, but it gives me some indication about the image, and now I have the text up front, starting with figure 12, that shows me how to get through. All right. But again, the process is the same: do the figure, do the caption, do the copy. Figure, caption, copy. Okay, let's go. All right, the pronouns. This is a big one. Good. Are we doing okay here? I mean, I'll sing if you need me to. All right. Sure. Whatever. Okay, here's what I was taught, and I come from the old school: what's called value-added captioning. You should never put in the caption what is stated in the copy; there's no value added to the caption. But this is hard. This is one of the hardest, again, what's your name, sir? Jason. It's a good question. One of the hardest jobs in journalism is writing headlines. It's genuinely difficult; most people can't do it. And it goes back to captioning, right? So here we're caught with an obstacle: I need to write a caption that has meaning for the bigger picture, has meaning for the illustration, but is not redundant with the existing copy.
And still keep the reader's attention. Am I answering your question? So we want to create enough that they'll get it if they go to the picture. Now, the problem with this is that it's a fairly detailed illustration, and it's sequential, and there are things going on in there. This is where video comes in better. But videos create more crimes against humanity, and I'll go into that. All right. So we pick it up at the top. If indeed you find yourself with copy saying "this is how to plug the amplifier in" and a caption saying "this is how to plug the amplifier in," it's going to knock the reader out of their sense of what's going on. Why? Why are you doing this? And it's a hard skill. It really is. I'm not going to minimize it. Writing captions is hard. Good answer? Okay. Now, man, the pronouns. Oh, the pronouns. Oh, we hate the pronouns. If I never saw these again in my life, I would be a happy man. That's how much I hate them. I hate them in my sleep. This, that, these: enough. All right, let's go. Hit it. Okay. If this, that, these, then enough. Wow. Oh, I hate them. I hate them. All right. Okay, next. Okay, watch. Okay, hit it. All right: Trafalobors. Let me go back here so I'm not doing the evil thing of looking at the screen while I'm talking. Okay. "Trafalobors are the fundamental component of the Weebie Todds framework. This screencast shows what they are about and how to use them." Okay? Right? That's really pretty cool. Okay, hit it, Jules. Who are "they"? Are we talking about the Trafalobors? Are you sure? Hit it. Are we talking about the Weebie Todds? Well, first of all, you have no idea what a Trafalobor is, and you have no idea what a Weebie Todd is. So you're confused the minute you walk through the door. I've already used language that has no meaning to you at all.
I didn't say, by the way, "Trafalobor (that thing you read about)," which would give you some way to hold on. Remember that first example, four different meanings of value? Right? You have no way to hold on. I'm losing you. I'm losing you. All right, hit it. Good. One little word changes the whole show. One word changes the whole show. Is it clear? Yes. "Trafalobors are a fundamental component of the Weebie Todds framework. This screencast shows you what Trafalobors are about." Do not confuse. Do not abuse. Don't let the mind wander. Right? Other than "this is the most boring thing I've read," your mind has not a lot of opportunity to wander. Okay, next example: "I'm doing a presentation in PowerPoint 2007. In slideshow mode, every time I move between slides, hitting the space bar, I get an awful click sound. Does anybody know how to make that go away?" Hit it. Did PowerPoint go away? Hit it. Did slideshow mode go away? Hit it. The click sound? Which one do you want to go away? All of them? Make it all go away. Make it all go away. This is a bad dream. Hit it, Jules. There you go: just make that sound go away. So when I do use a pronoun, or "that" and "this," well, I try not to use pronouns. And it gets tricky, because now you've got redundancy issues and language issues, and you still have to write interesting stuff. If you use too many nouns repetitively, you hit that first rule and it becomes a drag. It's a drag. You don't want to do that. But: make that sound go away. Next. Oh, that was pretty good. You did that. That was really good. There you go: tell them three times. Keep going. All right, now: embrace revision. I'll share another story. Actually, this one comes with a commercial interruption. We have time, right? Hang tight. So I know I'm doing this presentation on Sunday and I'm really psyched. Actually, I'm psyched. I thought we'd have six people and my dog; the room is full. And stuff. And I'm getting ready Friday.
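The pronoun problem above is another one a revision pass can at least surface mechanically. Here's a rough heuristic of my own (not the speaker's tooling): collect every occurrence of the usual vague words so you can check each one has an unambiguous referent nearby.

```python
import re

# Rough sketch: find the pronouns and demonstratives the rule warns
# about.  A human still has to judge whether each referent is clear.
PRONOUNS = re.compile(r"\b(?:this|that|these|those|it|they|them)\b",
                      re.IGNORECASE)

def flag_pronouns(text):
    """Return the vague words found, in order of appearance."""
    return PRONOUNS.findall(text)

caption = ("Trafalobors are the fundamental component of the "
           "Weebie Todds framework. This screencast shows what "
           "they are about and how to use them.")

print(flag_pronouns(caption))  # → ['This', 'they', 'them']
```

Running it over the slide's own caption flags exactly the words the talk calls out: "they" and "them" are the ones the fixed version replaces with "Trafalobors."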
I'm getting ready Friday, saying, this is great. And they want the slides. And I'm revising it and revising it and revising it. And my hard drive crashes. Bang! Right out. Done. Never coming back. And I'm sitting there Friday night, trying to stay rational. I'm talking to a guy I'm writing an article for, saying, "I need you to verify this illustration, make sure it's accurate," and the machine's not coming up, and my whole life is on that drive. So I go home. I have to reformat the drive. Do I have a recent backup image or any of that? Of course not. Well, luckily, the important stuff was backed up. So after you're through here, if you want, take my wife's email address and thank her for providing the laptop I'm showing you this on, because my machine is completely fried. But that's not my tale of woe. This is my tale of woe. So I'm writing for developer.com, and I'm full of myself, and I'm writing this thing about down-and-dirty async and await in C#, this really complex technical topic. And I'm doing all the right stuff. I'm putting in the code examples. I've got the sample code. Everything's working. I'm a superstar. And I got 101 likes, which in tech-journalism land means I'm now God, right? I'm now God. And 37 shares, and the tweets, probably two billion tweets. Meanwhile, you know, I'm still not seeing any increase in income, but I'm getting a lot of attention. So this is really, really cool. And then, hit it: the code example was wrong. Luckily, it didn't affect my like count at all. And luckily, nobody tweeted about it. So here's the thing I'm sharing with you: even with the best of intentions, the best information you have at the time, the best expertise, your writing will become obsolete, if not an outright error. All right? And that's okay.
Now, "it's okay" is a very broad statement. I mean, we don't want to produce information that's just blatantly wrong. We don't want to do that. But consider the speed of change, and the speed at which we have to take all this in and get it into our heads. Any Ray Kurzweil fans here? Ray Kurzweil, the Singularity. This is important, really important, particularly for those of us in the tech game. Ray Kurzweil, the guy behind the Kurzweil reading machines, now a senior engineering guy at Google, wrote The Singularity Is Near. He has said he eliminated type 2 diabetes from his body with vitamin regimens. He's a really, really, really bright guy. And he made a list of predictions spanning the next 50 years, and you'd be surprised: he's something like 90% right on. One of the things he's saying is that the amount of information that needs to be absorbed for a human being to participate in the modern world will exceed the physical capacity of that human being to acquire the knowledge. In other words, we won't be able to take it in fast enough. It just won't happen. So his projection is, first, in education: you will no longer go study French. You will eat it. You will have a little pill that alters your brain chemistry to allow you to speak and understand French. Okay? That's what Kurzweil is saying. The other thing Kurzweil says, which is pretty dramatic, and we're seeing it, is that the human body has outlived its usefulness. It is no longer adequate to absorb the information we need. So we're going to have to start becoming partially robotic. We're literally going to make part of ourselves digital-mechanical, digital-biological. We're going to become partially robots.
So what that means is that, going back to French, you'll either change your biology to understand things, or you'll have enhanced sensors. Now, there's a good argument to be made: what's so different? Before eyeglasses came along, a lot of people couldn't participate in the information structures. Now most of us wear eyeglasses or contacts. When you couldn't hear Beethoven, you just missed out; now we have hearing aids. So that trend has always been there. It's not dramatically new. But here's my prediction: within five years... how many people are actually talking to their cars every day now, as a way of life? Okay. As the fleet starts to exhaust itself, and fleets do exhaust themselves, people will start talking to their cars more. That's just the way it is. Then you'll see driverless cars. The other thing that's going to happen, and Kurzweil talks about miniaturization a lot, is that the world is miniaturizing; we have nano-machines now. Your cell phone is going to go away, and you're going to have a cochlear implant or an optical implant. We saw the first attempt at this with Google Glass, right? You might call that fantastic, or say it's never going to happen. What does this have to do with technical documentation? Well, it does, because as information creators we're going to have to consider what that next generation of information acquisition looks like, particularly at the educational level. So that's what embracing revision means: the revision cycle, getting to and understanding that next platform. Think about what hypertext alone did: no more footnotes, that whole idea, gone. Anyway, that's embrace revision. Next. All right, and I just pretty much talked about that one. Next. Okay.
This is a big one. This is my little diatribe. How many people work in companies with at least 500 employees? All right. And of those, how many are at roughly 60% engineering staff? 50%? All right. Okay, maybe in a larger room we'd see a shift in the numbers. Now: how many of you have a tech editor on staff? Good. You're my heroes. You get a parade. A tech editor, not a tech writer. All right, here's the deal. I wrote my first book in 1996. The technology is completely obsolete and worthless now, but it gave me a royalty check big enough to buy a car. The royalty check that came in last month got me a candy bar. But that's all right. On that book, I had an acquisitions editor, a copy editor, a technical editor, a graphics editor, and me: five people dedicated to one book. Now, my last article, the one that got popped with a minor inaccuracy, had pretty much me and some copy formatting. We've said, in effect, "we'll crowdsource technical editing." And one of the things we haven't been very good at in crowdsourced technical editing is what I call generous accommodation: "it's really good, Bob, but you don't..." I was lucky on that one. I was really lucky. I've seen feedback that's pretty vicious. So when people start thinking about this, the knee-jerk reaction is "we'll hire a tech writer, anything to make it better." No. Technical documentation needs to stay as close to the subject matter expert as possible. But if you hire a tech editor, that will give you and your company the ability to produce quality documentation throughout the culture.
Good tech editors are good teachers. They have a mastery of written and visual language. They see the bigger picture of how it all fits into a taxonomy; we can talk about taxonomies if you want. Yes? Sure. First of all, a technical editor never creates content. Ever. They don't. They are given content, like any editor. At the development level they go through it and say: this doesn't make sense, this is not needed, cut, you're gone, we don't need this chapter. That's the fundamental mechanics of documentation management. The other thing they'll do is catch things like the pronoun problems you saw; that's copy editing slash technical editing. And the last thing they do, and this is where I started as a technical editor, is actually do everything the document says to do, to make sure it's accurate. So say I was doing a programming book. I'd get a programming sample, go through it, do the setup, run the code, make sure it worked, and make sure everything that was said about it was indeed accurate. That's what technical editors do. What? Why can't the writer do that? The writer is contaminated. Any content creator is, by nature. You just don't see it. That's why one of the fundamental rules of publishing is that you always have a second set of eyeballs, no matter what. I've done it myself. "The, the." Ever see two "the"s in a row, or "this this"? You just don't see it. You're contaminated. There's a fantastic story about this: when Columbus showed up off the shores of Santo Domingo, the Indians couldn't see him. They had no concept for Columbus, so they just didn't see the ships; Jared Diamond talks about this. But what they did see was the alteration in the waves.
They couldn't see the boats on the horizon; there was no "boat" in their understanding. But they saw that the waves were coming in differently. They were contaminated; they were a part of their culture. In the same way, the writer is completely contaminated, at the copy level and at the technology level. How many of you have done this? I do it all the time. You write something, or make a picture, and you bring it to a subject matter expert or somebody else in the know, and ask: do you understand what I'm talking about? And if they say no, that's all right. That's good. Now the trick, and most people can't do this, is asking: where am I confusing you? That's the next step: how is my writing confusing you? Tell me what you aren't getting. Most people can't articulate that. I wish they could. Did I answer your question? Okay, and I'm happy to go into it afterwards. Do you have a follow-up? Looks like you do. Okay, happy to do it. All right. We could make a t-shirt out of that. Bring back the template. Next. Okay. Video. Let's talk about video. All right, Michael Jackson, right? I'm Michael Jackson now; I'll do my moonwalk. Video. Video to the masses, YouTube: on YouTube, everybody's a star, right? So what's the deal? Here's the deal. When I was back in college, being a film student in 1973 meant you had to be prepared to pay a price. And that price was $1,000 a minute. That's what it cost to shoot film in 1973: $1,000 a minute. So as a result, everybody really thought about their minutes. Everybody really thought about their page count when page count cost money, right? Everybody thought about their photo resolution, and they still do, even in the digital age, with digital compression, when bandwidth costs money.
People thought about it, and there was a wonderful efficiency and a wonderful creativity that grew around that lack of resources, right? When you run out of stuff, you become very creative very quickly. But we've lost that, okay? The other thing I was taught is that one word is one second. So if you plan a 60-second commercial, and this is interesting, go look at commercial word counts sometime. Really, those guys have it down, because they understand the scope of human attention. But if you're going to do a video and you have 600 words in your copy, well, that's 600 seconds, which is a 10-minute video, and a 10-minute technical video is a lot. You've got to be prepared to pay a big price. But it's free! It's free, so we don't care, because all I've got to do is fire up my screen-capture software and talk. And I'll get a lot of attention and all that. And so we're losing this elegance of really good didactics, really good educational-quality material. We're losing it, and I'm going to demonstrate what I mean by that in a minute. We're also losing production quality. The big players get it; Microsoft gets it, and spends tens of millions of dollars. But those budgets are shrinking. Everybody is saying: if this doesn't produce revenue immediately, I'm done. All right, next. This is my closing. Let's do Good Dog, Bad Dog. Jules, hit Bad Dog for a minute. I want you to look; we'll do this one again. I'm going to show you Good Dog, Bad Dog, and then ask Jules to go back up and we'll look at it again. No, no: literally go back. Can you see the Bad Dog circle? Hit Bad Dog. Yeah, just watch. Okay, stop. All right, let that go. Now go back to the PowerPoint. I think you can tab through.
Now just hit tab. Yeah, up top. Here. This one's new; this is technical. Thank you, you're doing a great job. Okay. Now watch Good Dog. Did you see that? Let's do it again, because this is critical. This is critical information. Okay, now we're going to do Bad Dog. One, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve. Twelve seconds. Right? What value did you get from those 12 seconds? What? "Cool." Right. Absolutely. There's absolutely no educational value, no reader value, in those 12 seconds. Now, 12 seconds, going back to old film-school time, is a fifth of a minute: a $200 expense. Right? What did it get you? So the point is, we let the music do the talking. We've become indulgent. It's become an indulgence. And when you want to do video... I'm a big video fan. I like video. I like video for teaching, I like video for documentation. But be very aware: "excuse me while I take 12 seconds of your life and give you nothing in return." All right. Beautiful. That's the first time that demo has landed in this presentation; that one hit it right out, boom. And if people want to take me outside afterward and talk about video production, how to do didactics around it, how to do documentation around video production, I'm happy to. All right. Next. So let's wrap it up. The seven rules. Number one: if you don't want to read it, don't write it. Number two: don't confuse and don't abuse. Number three: before you start, be clear about what you want the reader to be able to do when you end. It's a contract. You're making an agreement; keep to your agreement. And you can state it right in your document: "at the end of this document, you will be able to do A, B, C, D, E, F, G."
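The word-count arithmetic from the video discussion can be sketched as a quick calculator. This is only an illustration of the speaker's rules of thumb: one scripted word is roughly one second of video, and 1973 film stock cost about $1,000 per finished minute.

```python
# Rough video budgeting, using the speaker's rules of thumb:
# one scripted word ~= one second of narration, and 1973 film
# stock at ~$1,000 per finished minute.

def video_seconds(word_count, seconds_per_word=1.0):
    """Estimated running time of a narrated video, in seconds."""
    return word_count * seconds_per_word

def film_school_cost(seconds, dollars_per_minute=1000.0):
    """What that running time would have cost to shoot in 1973."""
    return seconds / 60.0 * dollars_per_minute

script_words = 600                     # a 600-word script...
runtime = video_seconds(script_words)  # ...runs about 600 seconds
print(runtime / 60)                    # -> 10.0 (a ten-minute video)
print(film_school_cost(12))            # -> 200.0 (the 12-second logo intro)
```

The point of the exercise is the constraint, not the dollar figure: when every second has a price, every second has to earn its place.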
And if you're writing and you find out you're not keeping your agreement, nothing says you can't go back up to the top and change the agreement. But make an agreement. Next. Number four: write to an outline. Always. Next. Number five: clarity equals illustrations plus words. You can't have one without the other. Next. Number six: watch the pronouns. They suck. Right? Next. Number seven: embrace revision. Okay, any questions? We have a little time. Of course, that's it; that's the show. Go to The World in Which We Live, sign up, get on the list, and I'll send you a cartoon every Monday. It'll make your day brighter. Question? No. I didn't say that. No. Go ahead, what was your question? Yeah. The point is, if you have a root, you never have an orphan. Let's scroll back up; do you mind? I want to take a minute. If you've got to go, you've got to go, I get it. It's a good question. We'll scroll up... up... there. Agreed, but is it really a root, or a route to something else? And my position is: the goal is to know how to write web apps. And part of knowing how to write web apps is that you need to know about the file system, and you need to know about the relationship of WEB-INF to the web app. So the remedy was really to push your root down into a peer. Does that make sense? So when you find orphans, there's not a lot of logical choice: either you need a peer, or you become the root. Here's my orphan, so I need to do one of those: create another peer for myself, or become a root. Any other questions? Hey, you've been wonderful... oh wait, okay, sorry. Well, it depends on your delivery mechanism. If you're doing inline... Video is a strange beast, because it can be innately textual.
All right, the answer is I don't have a fast answer. Video production is much more elaborate; it has to be much more well thought out, understanding your didactic points. That's another talk; I've done video. It's the script. It really is the full Hollywood-version shooting script. You need to understand what your language is supporting, what your voiceover is supporting, what your visuals are supporting, and then your overall approach; even the zoom-in, zoom-out, all that stuff really counts. That's why bad video is cheap and good video is expensive. Should you learn how to do good video? Yes. We can talk about it offline if you want. It's hard, but you've got to start someplace, and what most people do is too much. So start with something really simple; and no video should really be more than five minutes. If you want to stick around during the break, I'll talk about it. I don't want to hold up people who have places to go, but I'm happy to do that. Any other questions? Sure, I'll give you my email address. Documentation has become a stepchild. The only place it's not a stepchild... let me do a fast one. Here is the continuum of software development. Over here you've got your porn: put the picture up and get the money, right? And over here you've got your Department of Defense: we're going to blow stuff up and kill people, right? And medical. And the interesting thing is, over here there are not a lot of documentation requirements. You can make a boatload of money and never document a thing, right? And nobody's going to die, and if they do, it's probably nothing you did directly, unless you remember how Nelson Rockefeller went. You know how Nelson Rockefeller went?
He died in bed with his secretary. Anyway, nobody's going to die over here, okay? And over here, DOD and medical, people really do die, and it's usually regulated, and then you've got the ISO 9000 levels, one through five; all that stuff sits in here. Over here the documentation is really good. It's dry as all hell, but it's good, because it's got to be. Most companies, in my experience... all right, let's do a real-life example before you go. I'm going to start here, and as I pass your company's place on the spectrum, raise your hand, okay? Porn... DOD... there you are. Figured it'd be that way. Now, the good news is there's a lot of autogenerated documentation, and we've seen Swagger, SwaggerHub, swagger.io. The Swagger people are doing an excellent job with autogeneration. You want to get it to the SME, the subject matter expert, level as quickly as possible. So you can use Swagger annotations, you can use Javadoc; at the code level it's really good. At the systems level it's a little more difficult, but you can still do that kind of thing in your Jenkins plugins. And yeah, I'll give you my email address, I'll come to your company and set them straight; we'll talk about porn and DOD, it'll be a great time for all. All right, any other questions? Okay, people who want to know the particulars of video, we can meet out front so we don't tie up the room. Thank you for coming, I know it's Sunday. Don't forget The World in Which We Live; I have three cartoons to give out, come see me. Thank you, we couldn't have done it without you. It's been very enjoyable. One more: be sure to see the movie Finding Forrester, with Sean Connery.
And what he says in it is that the way to be a writer is to write: just write, whatever comes out. So my recommendation, and don't get all caught up in good writer, bad writer: write, put some structure on your writing, get other people to read it, and get that interaction going with a good editor. Otherwise you'll drive yourself nuts. And the other thing: my style is outline, image, code listing, copy. And then you've got to write. I'm on the high-volume-writer side of things. And you don't have to perfect everything. Just get it out there and get people to review it. And if the vicious people show up, well, we'll have to take care of that somehow.

G'day, my name is Brendan, and we're here for Broken Linux Performance Tools. Previously at SCALE I covered working Linux performance tools, and I also drew this diagram showing where the tools provide observability, which became quite popular. This is a complementary talk. This is about broken Linux performance tools, particularly observability and benchmarking. My objective is to bust assumptions about tools and metrics. The prior talk, about working performance tools, is the sort of talk that's fun to put together, as I get to talk about things that are exciting and things that work.
But when you do a lot of performance analysis, you realize that's not what the landscape looks like. Some of the tools are exciting and work well. Many of the tools don't work well. And understanding what's good and what's bad is important for developing performance expertise, and that's what this talk will help you accomplish. I'll embrace the bad and talk about it, and we can see the sorts of solutions that help us navigate the minefield that is performance. So you'll learn how to verify metrics, find missing metrics, and avoid the common mistakes of benchmarking. I'm going to discuss current software, and as I talk about various bugs, you might think, "well, I can fix that." Yes, please do. In a couple of years, when I give this talk again, it might be shorter. Maybe we'll get it down to a 30-minute talk. That'd be great. So, I work at Netflix. We've just launched worldwide, and I have this great map of where Netflix is available: that's right, it's almost everywhere. We're doing great. We have Linux in the cloud, on AWS, tens of thousands of instances, and we have FreeBSD running our CDN. It's also an awesome place to work. You might have seen the earlier talk, From Sysadmin to SRE, because we're hiring SREs, and there are many of us here today you can talk to about that. Please do. The first section I'd like to talk about is observability, and I'll start with load averages: something straightforward that we should all be familiar with. Load averages still get used at Netflix; they're one of the signals we use for auto scaling groups, that is, whether to scale up a cluster based on load. There are two problems with load averages. One is the word "load," and the other is the word "average." Load on many other operating systems means CPU demand. But on Linux, load means CPU plus uninterruptible disk I/O, which can be a bit confusing.
So you have that NFS server with a load average of a thousand, because you have all these threads doing uninterruptible I/O. It's a bit tricky to understand. The word "average" as well: it's actually an exponentially damped moving sum. When you see the three numbers, the one, five, and fifteen minute load averages, it's not really the average over one minute. To explain that a bit better, I took a workload, a single hot thread, and plotted the one, five, and fifteen minute load averages over time. You can see the one minute load average eventually settles at 1.0. That's what it should be: the load average reflects how many threads were in the runnable state. But at the one minute mark of having begun this load, the one minute load average is only 0.62. That's because it's an exponentially damped moving sum, and it reflects prior history. So what do one, five, and fifteen minutes actually mean? They're the constants used in the damping equation. Or, to put it another way: don't spend longer than one minute trying to understand this. It's just a heartbeat signal to see if a server is healthy or a little bit busy. Yes? He said it's an infinite impulse response filter. That's right, thank you: an infinite impulse response filter, feeding back the original signal. That's why it's actually useful to plot things. When you plot things, you recognize patterns that aren't so obvious from numbers. Quite often, if I'm trying to understand a workload or a metric, which is related to this talk, I just plot it, and then based on the profile, like you've just demonstrated, I can say: I've seen that before, and I understand it now. There's a whole science in performance engineering of modeling workloads and coming up with equations for them. I often find that if you just plot them, you'll recognize the signals before you get that far.
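The damping described here can be simulated directly. A minimal sketch, assuming the usual behavior of the kernel's load accounting: the averages are recomputed every five seconds, with a decay constant of exp(-5/60) for the one-minute figure (the real kernel does this in fixed-point arithmetic; the 0.62 the speaker measured differs slightly from this idealized float model).

```python
import math

def simulate_load_avg(runnable, minutes, period=5.0, window=60.0):
    """'One-minute' load average after `minutes` of a constant number
    of runnable threads, starting from an idle system. Models the
    kernel's exponentially damped moving sum (EXP_1 ~= 1884/2048)."""
    decay = math.exp(-period / window)
    load = 0.0
    for _ in range(int(minutes * 60.0 / period)):
        # each tick: keep a decayed share of history, blend in the present
        load = load * decay + runnable * (1.0 - decay)
    return load

# A single hot thread: after one minute, the "one minute" load average
# is only ~0.63 in this model (~0.62 as measured in the talk), not 1.0,
# because the damped sum still carries the idle history.
print(round(simulate_load_avg(1, 1), 2))
# After ten minutes it has effectively converged to 1.0.
print(round(simulate_load_avg(1, 10), 2))
```

This is why the three numbers are better read as constants in the damping equation than as literal one, five, and fifteen minute averages.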
Top's %CPU is the next metric I'd like to talk about. This seems fairly straightforward. If I run top on a Linux system and it tells me Java is eating 935% CPU in total (this is from one of our production instances), then that's fine: I know what's consuming CPU, it's Java. Of course this is an instance with many CPUs, which is how it can get that high. So this shouldn't be misleading or broken. But it can be, and I've heard of this being posed as an interview question: short-lived processes can be missing from top. If you've ever done a software build and run top, all those short GCC processes the build spawns come and go before a screen refresh, so you don't see them in top's output. You can use things like atop, which uses event-based statistics, and perf, to get around that. One thing I particularly don't like is that short-lived processes vanish between screen updates. You see something and go, "wow, that's the thing causing the problem," and then the screen's redrawn and it's gone. So I like to run pidstat instead, as it doesn't clear the screen but keeps printing at each interval, and I can scroll back. But that's not so bad. Okay, so we can miss %CPU in those situations. Next: misinterpreting %CPU. I want to drill in on %CPU because it's one of the most common metrics we use to understand server behavior. %CPU can mean different things on different systems. My example was 935% CPU; that's obviously not out of 100, so it's summing across multiple CPUs. Tools can sum across the CPUs, show a percentage of total CPU capacity, or damp the value historically, using an exponentially damped function. It depends on the operating system, so check the man page. On Linux, top is summing, so we can see the total consumed. Another problem: when I was creating these slides, I was looking for screenshots to put in, and this one is particularly interesting.
The top of top has the system summary, where %CPU is broken down into user, system, nice, idle, and so on, and then we've got the per-process percentages. What often happens is that you add up the per-process percentages and they don't quite reach the summary line, because you've missed some short-lived processes. Okay, we know about that. But in this case, the per-process percentages added up to more than the system summary. We see more down here than we have up there. That doesn't sound right. Hands up if you think the system-wide summary is correct. Okay, I've got about four people. Hands up if you think the per-process summary is correct. Okay, about three people. Everyone else isn't sure. Let's see what's happening. This sounds like a pretty good interview question. Actually, it's just trivia, not a fair interview question: go read the source code and tell me which one is right. I've got the quote there: a man with one watch knows the time; with two, he's never sure. Unless you're Mark Shuttleworth and you have that GPS clock in your house. So, from cpu-load.txt in the Linux source documentation, on /proc/stat, which is what's giving us that summary: sometimes it cannot be trusted at all. That's in the kernel's own documentation. It's like, well, that's interesting. Good thing we don't use it for anything. Except, actually, we use it for everything: all the performance monitoring products read /proc/stat and get their information from there. Fortunately, it's not that broken, but it can sometimes be a little misleading, or just flat-out wrong. Both of those values can't be right. And it does get worse. So that was top's %CPU, and then what the kernel exposes as %CPU. But look at %CPU as a metric itself. What is it, anyway? A nice way to describe it is that there is good %CPU and bad %CPU. Good %CPU is when we're retiring instructions.
We're making forward progress on our program, unless those are spin-lock instructions, in which case it's bad CPU, because you're just wasting instructions. But generally, if I'm retiring instructions, I'm making forward progress. Bad %CPU is where the CPU is busy but the cycles are not actually doing work. They're stalled: waiting on memory I/O, waiting on other resources, thermal throttling, or some other event. We can tell the difference between good CPU and bad CPU in lots of ways. Performance monitoring counters inside the processor will tell us whether we're stalled on cache misses, main memory I/O, the LLC, or other resources. But a much simpler metric, similar to load averages in terms of simplicity, is instructions per cycle, IPC. It's like miles per gallon, but for CPUs. A high IPC means that for each CPU cycle we're able to retire many instructions; CPUs can do this because they execute instructions in parallel and out of order, breaking them into micro-operations. To give you some numbers: a high IPC might be 2.0; a low IPC might be 0.5, meaning that on average, every cycle we only make forward progress on half an instruction. So IPC, as a miles-per-gallon style metric, helps us understand whether we've got good %CPU or bad %CPU. And that's really important, because if I'm making decisions based on %CPU, like "we should buy faster CPUs because we're CPU bound," it may make no difference: if you're memory-I/O bound, faster CPUs just mean you stall faster; you're not making forward progress faster. So %CPU alone is ambiguous. It's actually a widespread problem, not just in Linux but in all operating systems' tops: they don't split %CPU into retiring versus stalled, which is what they should do. And it does get worse.
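The IPC arithmetic can be sketched in a few lines. This is an illustration only: the counter values below are made up, and in practice the instructions and cycles counts would come from the processor's performance monitoring counters, e.g. via `perf stat`.

```python
def ipc(instructions, cycles):
    """Instructions per cycle: the 'miles per gallon' of %CPU.
    On Linux, the raw counts come from PMCs, e.g. `perf stat -a sleep 10`."""
    return instructions / cycles

# Made-up counter readings for illustration:
workload_a = ipc(instructions=2.0e9, cycles=1.0e9)  # IPC 2.0: mostly retiring
workload_b = ipc(instructions=0.5e9, cycles=1.0e9)  # IPC 0.5: mostly stalled

for label, value in [("workload A", workload_a), ("workload B", workload_b)]:
    # Rough reading: high IPC suggests "good" %CPU (instructions retiring);
    # low IPC suggests the busy cycles are stalled on memory or resources.
    verdict = "likely retiring" if value >= 1.0 else "likely stalled"
    print(f"{label}: IPC {value:.2f} ({verdict})")
```

The 1.0 threshold here is a crude illustrative cut-off, not a standard; what matters is that the same 90% CPU reading means very different things at IPC 2.0 versus IPC 0.5.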
To describe this with a story, because things get even stranger, and this surprised me: I was doing performance analysis of RxNetty versus Tomcat, and one of the metrics I like to use to quantify performance is CPU cycles per operation, to understand these frameworks. And I found that the CPU cycles per operation for Tomcat was fairly high when I began, when the CPUs were only 10, 20 percent busy. As I ramped up to 80% CPU, 90% CPU, just by throwing more load at it, the CPU cycles per operation went down. So the processor became more efficient at executing operations the more load you threw at it. Can anyone tell me why? What's that? SpeedStep. So SpeedStep is one answer. What else is there? So things get faster as you throw more load at them. Filling up caches, so CPU hardware caches: the more load you're throwing at it, the more you're lighting up the caches, and you may just cache better. And yes, the JIT could be doing it. You're fundamentally executing different code: at 20% CPU utilization versus 80% or much higher load, the instructions that get executed are different, because the JIT chose to do something different, and so it's apples and oranges, you can't compare the two. The CPUs could also clock faster due to turbo boost. There are actually lots of reasons for this. And as I went through them: I checked the hardware caches, and it wasn't hardware caches. They were basically the same; they explained a tiny bit. I should give you the number: it was 1.8 times more efficient at high load than at low load, so I was trying to explain a 1.8x. It wasn't hardware caches. It wasn't turbo boost. I did a flame graph just to see if the functions or the overall code changed from low to high utilization. That was the same. Yes? Was it real or perceived?
Well, that's a good question, because I'm trusting various metrics. How do we even know I'm measuring %CPU correctly? I did know the actual delivered throughput of the system, but when you're trying to understand things in terms of CPU metrics like %CPU, I would use other ways to double-check that they were accurate, which is an important lesson of this talk. You have one more idea, yes? Right, yeah, as you drive load up, the scheduler and the network stack can behave differently. So it can just be different code, and a flame graph will identify that. Wow, people are really thinking about this. This is great. Branch prediction. Branch prediction in the pipeline, as you load it up, can be different, and that should ultimately change IPC, because you're taking the wrong branch, depending on how that's calculated. So that's another possibility. So what was it? I went through a heap of possibilities and it wasn't them, and I'll have to move on. It was SpeedStep, which was the first answer, so thanks. SpeedStep really broke my mind in this case, in that at low load, at 20% CPU, the kernel (SpeedStep is driven by the kernel) decided to run the CPUs at 1600 megahertz. And when I ramped up to high load, the kernel said, I should just run the CPUs faster, and the CPUs were now going at 3000 megahertz. So I was trying to compare cycles per operation across my %CPU utilization range, and I was comparing apples and oranges. It's not the same thing. This was a hardware box I set up, and I'd forgotten to set the Linux governor to performance, which pins you at the 3000 or whatever. So what does this mean? It means %CPU, as a lot of people have helped point out, can itself be ambiguous, because you need to know a lot of things. What is the IPC, instructions per cycle? What was the actual clock rate you were running at?
Because it could be SpeedStep or turbo boost clocking things differently, and you need to know that in order to comprehend it. How did I figure this out? It was a process of elimination. IPC should have covered many of the bases: it should have covered cache misses and branch prediction, if it's measured right, and IPC was fairly static across the range. That got me thinking about the actual cycles executed, so I started using those counters, and could pretty quickly figure out that we were running at completely different clock rates. I also used CPU flame graphs to see that the code was the same, and that we weren't doing polling versus interrupts, and things like that. So %CPU itself can be a complicated metric, and it gets more complicated still, because it doesn't account for whether cycles are stalled or retiring when you get down to the micro-operation level. At the micro-operation level we have lots of different functional units in a processor, and they can execute things in parallel. It's a simplification to say the CPU was on or off from one moment to the next, when internally some functional units are running and some are off, and there may be more internal headroom for more processing. So it actually turns out to be a really complicated metric. Another metric we like to use on Linux that turns out to be fairly complicated is I/O wait. I/O wait suggests the system is disk bound, but it is often misleading. The problem is: say I'm comparing an option, trying a new configuration for my database, and I/O wait goes up. I may say that's bad, because I've got more blocking, and I shouldn't make that configuration change. But lower I/O wait might actually be bad too. To really try and understand this I've drawn a Venn diagram: I/O wait is an idle state. If I am idle but there is pending disk I/O, then we call it I/O wait. But I could have pending disk I/O that's just covered up by CPU.
So if you have an I/O wait problem on your system, just run SETI@home and burn up all those CPUs, and you've solved the I/O wait problem, because you've overlapped it with CPU cycles. That's what actually makes interpreting this complicated. But so long as we understand it, if not at first, then eventually. Free memory has been another confusing metric on Linux, and there's even a website, linuxatemyram.com, which has a picture of a penguin with RAM in its mouth. A lot of these things are counterintuitive, and that's what makes them hard: we need to learn them. When the free column goes down, in vmstat or in whatever monitoring tool you're using, it may not be bad. It depends how it's calculated. Most operating systems use the principle that if there is free memory available, use it for something useful. So it's being used for the page cache, the file system cache. And that's why we use free -m, which gives us a clue there. So initially I might have thought I had 2.6 gigabytes free; I've actually got 3.3 if I include the file system cache. Okay, so that sounds good. Now run ZFS. ZFS hasn't hooked into this stuff yet; free should be updated to handle ZFS. So we get back to the problem where your free memory goes down to basically zero on a system with ZFS, and the answer is: oh, don't worry about that, it's part of the ZFS ARC, you need to use arcstat to figure that one out. So again, free memory can actually be complicated: you need to figure out how it's calculated. Yeah, it shows up as used instead of cached for ZFS. And in some ways you could call that a software bug, in that free is trying to give us a notion of what's in the file system cache and it hasn't been updated to handle ZFS. So again, it's important to understand the source of the metrics and how they're calculated. vmstat: classic Unix tool, also difficult to comprehend on Linux.
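Going back to free memory for a moment, the free -m arithmetic can be sketched with hypothetical /proc/meminfo-style numbers (chosen to match the 2.6 GB versus 3.3 GB figures above, but otherwise made up; note that a ZFS ARC would not appear in Buffers or Cached, which is exactly the problem just described):

```python
# Hypothetical /proc/meminfo-style values in MB (not from a real system).
meminfo = {"MemFree": 2600, "Buffers": 100, "Cached": 600}

# What free shows as "-/+ buffers/cache" free: memory that is nominally
# in use, but reclaimable because it's just file system cache.
available = meminfo["MemFree"] + meminfo["Buffers"] + meminfo["Cached"]
print(available)  # 3300 -- the "3.3 GB" figure, not the raw 2.6 GB MemFree
```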
The first line has some summary-since-boot values, but not all of them. In fact, I can't even remember which are which; I generally have to run a test workload to figure it out. On other Unixes, the first line is entirely the summary since boot, so there is a difference on Linux. It's also useful as a system-wide summary, which is pretty good, except we're missing networking. But that's okay: we can run netstat -s, and then drown under all of the metrics it gives us. It's been getting better; Linux has been adding more and more metrics, like SYN retransmits, which I really like. But there are a lot of metrics to wade through: I think it can be over 200. One of the problems is that it still doesn't include everything. I don't have really decent metrics to know, say, TCP queue utilization, how much is queued from one moment to the next. And there's other stuff I want that's actually not part of netstat -s, which you might assume is there when you see so many statistics. There are also minor things like typos and inconsistencies, and there's often no documentation outside of the kernel source code. So it does require some expertise to comprehend. Disk metrics are worse, because many of them are misleading, and you'll understand this if you've worked in the storage industry. Take percent utilization, or percent busy. The problem is we often have logical devices that may be backed by multiple disks, and a percent-busy calculation just tells us that something was active during that window of time. We don't know if half of the spindles or flash drives were busy, or all of them. You don't really know the headroom, which is kind of the whole point of measuring percent busy or percent utilization. Disk IOPS can be misleading too. If we're dwelling on that, using, say, iostat: is high or low bad? It really depends on what you're trying to do. And disk latency.
As people get better at understanding disk performance, they start using benchmarking tools and looking at latency histograms of disks. But even then, it may not be what you think it is. There have been decades of work to make disk I/O asynchronous to the application: let's have write-back buffers, let's have read caches, at different levels throughout the stack. In the real world, so much engineering work has gone into keeping your application from ever going near the disks that if you're dwelling on disk latency, you have to ask why it matters that much. File system latency matters more, because the file system is what your application actually talks to, and so I find it much more useful to measure latency there. The file system has a big cache, but we don't have any file system cache hit or miss metrics in Linux. This is just completely missing; we do need a hit/miss ratio. It's something I hacked up with ftrace, and another version has been hacked up with eBPF, so that at least we can get the hit/miss ratio out of the file system cache. So far I've covered many metrics that are misleading, some that are wrong, and some that are missing. Just to pause and say what you can do about this: you need to verify and understand the metrics, if they're important. Realistically, you don't have time to verify and understand the 50 or 60 metrics your monitoring tool may be giving you, and that's okay, so long as you're aware of which metrics you have verified and which you haven't. Then, when you're using them to solve issues, you know: these three are known to be good, because I spent half a day reading the Linux kernel source and testing them, and they work. All the other metrics, I don't know that they're good, but I can use them as clues.
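One way to do that verification is with a known workload: apply a load you control, then check the tool's number against it. A minimal sketch; the function name, the 5% tolerance, and the numbers are all illustrative:

```python
def verify_metric(reported, expected, tolerance=0.05):
    """True if a reported metric matches a known applied workload within tolerance."""
    return abs(reported - expected) <= tolerance * expected

# e.g. drive a known 1000 ops/s with a load generator, then read the
# monitoring tool (numbers invented for illustration):
print(verify_metric(reported=1023, expected=1000))  # True: trust this metric
print(verify_metric(reported=2100, expected=1000))  # False: go read the source
```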
And so that's a mindset that's really helpful: knowing what is proven to have worked, and knowing what you've yet to have had the time to prove and verify. You can cross-check with other tools; dynamic tracing is great for that. Test with known workloads: benchmark tools can actually be useful for testing observability metrics, because I know what my throughput is supposed to be, so what does the observability tool tell me? Read the source, obviously, and use known-to-be-good metrics. Finding a missing metric is even harder. Methodologies like the USE method pose questions for metrics to answer; that's a great way to discover that you're missing metrics. Also, you can draw a functional diagram of the environment, or Linux, or the application, to figure out which parts you don't have observability for. Sometimes it gets a bit depressing, and you think: can't we just burn it all down and start from scratch? But there's been so much work in systems metrics, and in understanding and documenting them, that it's hard to do. Just as wild speculation, there is one environment where they already have burnt all the metrics down and kind of do need to invent them from scratch, and that's some of the unikernels I've looked at. You can't log in and run vmstat and iostat and ps and top and get confused by all these metrics we've had for decades. And some of those engineers are already thinking about the problem: if we have to reinvent it all, what should we do? I'm not saying that will be the solution, but it's a surprising opportunity: maybe someone will figure out what metrics we should have, given a clean slate. Profilers. Linux perf is a great profiler, and it's another observability tool I use when trying to understand how the CPUs are being used, by code path. It has this nice hierarchical tree view. Unfortunately, it's very verbose: 13,000 lines.
And that's kind of the problem with today's profilers: the output can be onerous to read through. That's the full output in one image. This is why I came up with flame graphs, which use a hierarchical visualization, an icicle plot that's upside down. It shows the same data we saw in that impossible slide of text, and I can much more easily navigate it. Great, sounds like profilers are a solved problem: we can just use flame graphs and everything works. Unfortunately, there are lots of issues with profilers. Yes, you had a question? If you just Google, or DuckDuckGo, "flame graphs," you'll find all the steps online. Visibility: when we tried to do this, we found you can use Java profilers (Netflix runs a lot of Java), and they typically show you the Java code, which is colored green, but they can't see outside of the JVM, so I can't see the kernel or native libraries. I have some visibility of GC. And there are various other problems with those profilers as well: inaccurate or incomplete profiles. System profilers, like what Linux has with perf_events, are great because I can see kernel activity, but I'm missing stacks and symbols, so I can't see the Java methods. This is one of the examples I wanted to include in a broken-performance-tools talk, because that's the real world: you often find the profilers don't work, and it requires some engineering time to fix. We did this work at Netflix last year, and now JDK 8 update 60 has the preserve-frame-pointer option, so we can do proper system profiling. But don't assume profilers will work out of the box; it may be a few weeks of work to get them to see your code. The particular problem here was compiler optimizations: the kernel, or profilers generally, like to walk stack traces using the frame pointer register, and compilers have reused that as a general-purpose register for years.
So that's why GCC has -fno-omit-frame-pointer, which you should always use because it helps debuggers and profilers, and it's why Netflix helped get the preserve-frame-pointer option put into Java for the same purpose. Missing symbols, just to mention it: if you're getting into profilers, you'll often find you can profile stack traces, but you don't have symbols, just hexadecimal addresses. That's just another problem to work through. Linux does have some solutions: perf will look for supplemental symbol files, and there are ways to create them for Java and Node.js. If you try to do instruction profiling, you'll find that instruction profiling has actually been kind of broken for many years. Here I wrote an assembly program that just does NOPs in a loop, and then I profiled it, and I found that somehow the samples were jumping from one NOP to the next. Those are the percentages of samples, and you could never see the instructions in between. There are various reasons instruction profiling has been broken for many years, such as skid, out-of-order execution, and sampling the resumption instruction. This is why, more recently, Intel has come up with things like PEBS, which Linux supports, so that you can get precise event-based sampling. So if you're going to get into profilers, some happier things, what you can do to address these problems: do get stack trace profiling working, get stack traces and symbols to work. It's worth it, because you can then do things like create flame graphs. Also, for observability tools, it's important to understand overhead. tcpdump seems to be the genesis of many tools, and people build GUIs on top of tcpdump. It has solved lots of issues, it's great, but you do need to understand its overhead. And I'm always wary of anything that does per-event dumping.
Try tcpdump on a 10 or 100 gigabit Ethernet system, and you'll see that dumping all of the packets, even just the headers, starts to incur high overhead. The screenshot I've got there was dropping packets; it couldn't keep up. So there are various overhead costs to doing this. What I try to do instead is use dynamic tracing, go into the kernel, and do frequency counts of TCP events. If you have no other tool and you use tcpdump, you pay a price, but you ultimately solve the problem, and that's okay. The most important thing is to understand that there is a price to pay. strace is a much bigger price to pay. An example here, a worst-case example: dd copying one byte at a time from /dev/zero to /dev/null runs 400 times slower if I run it through strace. What's interesting here is that I'm stracing the accept syscall, which dd is not doing at all. It's not even making the accept syscall, but it's still that much slower. That's because of the way strace does its instrumentation: it has to instrument all the syscalls before it can apply the filter. And the way strace currently works, with ptrace, is like putting breakpoints on your application: it sets a breakpoint when you enter a syscall and when you exit, with context switches for every system call. So we should use more modern tracers instead, like perf_events. They're much better, but there are hidden dangers with things like perf_events and SystemTap too. To give an example: I said, well, with perf_events I can record the sched:sched_switch event, that's pretty useful, and I captured call stacks as well, and it wrote a 100-megabyte perf.data file. That's a lot of data, and it's only for one second. Imagine if I traced 100 seconds: I'd have a 10-gigabyte perf.data trace file. My problem is that I've traced something that's very frequent, the sched:sched_switch event. When you use tracers, the same is true for all of them, including, say, eBPF and DTrace.
You have to be aware of the overhead of what you're doing, and there are various costs. Dumping every event, to a tcpdump capture or a perf.data file, is high cost; doing internal aggregations is much better. And then there are the frequencies of events: you need to develop some idea of how frequent the thing you're trying to trace is. Scheduler events are really frequent, so I'm always careful if I go near the scheduler, but there are lower-frequency events; process creation and destruction, for example, is generally low frequency. Of course, some tools make it pretty clear that they're going to slow the application down: Valgrind says in its documentation that it will make your application run 20 to 30 times slower, so at least in that case it warns the end user. Java profilers can be even worse, depending on how they work when you run them on Linux. They often have two modes: sampling stack traces at, say, 100 hertz, or tracing method calls. And by now you should be able to identify that tracing methods could be the really expensive one, because of the frequency of events: an application can do millions of method calls a second, and tracing them can really slow the target. What I find weird is that the documentation for these can describe method timing as highly accurate, even though your app runs a thousand times slower. How can it be highly accurate? If my app is running a thousand times slower, everything's running a thousand times slower: race conditions could be different, networking events are all different. I did see a good talk, "Profilers are lying hobbitses," that went into more detail on this. So, for profiler overhead, you need to understand how the profiler works: how is it instrumenting what it's telling us, and what's the frequency of events? Generally, if I'm doing internal summaries and it's fewer than 10,000 events a second, the overhead should be negligible. If it's more than 100,000 events per second, you start to be able to measure it.
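Those rule-of-thumb thresholds can be turned into a quick back-of-the-envelope calculation. The one-microsecond per-event cost below is an assumption for illustration; real per-event costs vary by tracer and should be measured:

```python
def tracing_overhead_pct(events_per_sec, cost_us_per_event=1.0):
    """Rough % of one CPU consumed by instrumenting each event.
    cost_us_per_event is an assumed per-event cost; measure your own."""
    return events_per_sec * cost_us_per_event / 1e6 * 100

print(tracing_overhead_pct(10_000))     # 1.0   -> likely negligible
print(tracing_overhead_pct(100_000))    # 10.0  -> measurable
print(tracing_overhead_pct(1_000_000))  # 100.0 -> method-tracing territory
```

Even a cheap per-event cost becomes ruinous once the event rate reaches millions per second, which is why event frequency matters more than the tracer's elegance.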
Monitoring I can mention quickly, because a lot of the risks for Linux monitoring are the same ones we saw for tools and metrics. The tools can be quite misleading in the metrics they show you, and they can be missing metrics, like the file system cache hit/miss ratio and some of the network statistics. The problem I see with many monitoring tools is that they assume the system metrics are perfect, and then the only problem left is: let's just plot them, let's just graph them. And whenever I talk to vendors: that's not the problem. The problem is that the metrics are broken and misleading, and we're missing all these metrics. I don't need someone to graph this stuff; I need someone to fix this stuff. Another issue with monitoring is when it's built on event tracing, where we trace every event and then post-process, which can incur a massive amount of overhead. And doing this cloud-wide makes things all the worse: if your monitoring product is supposed to work on the Netflix Linux cloud of tens of thousands of instances, how much data do you need to push around to the monitoring servers for it to work? Statistics themselves can be quite misleading. Averages can be misleading because they hide latency outliers, and per-minute averages can hide multi-second issues. I had a customer with an issue in a prior job where I knew they had a CPU issue, but they were convinced they didn't, because their monitoring system said they peaked at 80% CPU. That 80% was the one-minute average. When we looked at per-second averages, it would flatline, then go idle, then flatline again. So always understand how the statistics are calculated from the metrics. Percentiles can be misleading as well.
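That one-minute-average story can be reproduced in a couple of lines with made-up per-second %CPU samples (48 seconds pegged, 12 seconds idle, invented to land on the 80% figure):

```python
from statistics import mean

# Hypothetical per-second %CPU: flatlined at 100%, then idle, within one minute.
per_second = [100] * 48 + [0] * 12
print(mean(per_second))  # 80  -- the "peaks at 80%" the monitoring tool showed
print(max(per_second))   # 100 -- the saturation the per-second data reveals
```

The same data, summarized at two resolutions, tells two different stories; only the finer one shows the saturation.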
If I hit my 99.9th percentile latency, intuitively that sounds pretty rare, but if I'm processing millions of events a second, like we are at Netflix, it happens all the time, and so we need to look more closely at those. What we like to do is look at the distribution, and there are lots of different ways to do that. In this example I've plotted the latency of disk I/O: latency on the x-axis, frequency of disk I/O events on the y-axis, and it's multimodal. There's a low-latency mode at around half a millisecond, then a higher mode at 1.2 milliseconds. The average sits in between them, which is just an interesting real-world example: the average is supposed to be an index of central tendency, and in this case it's not. It falls between the modes, so it's quite misleading. If you're looking at averages, you also want to look at the distribution, so that you know whether you have a multimodal distribution or a long tail and the average is not what you think. Speaking of misleading things, visualizations can be misleading as well. If you've seen my talks before, you know I'm not a big fan of traffic lights, of people putting red and green colors on things. On Linux I've got tools like htop, and I like those tools; they're innovative in all sorts of ways. Personally, I'm not a big fan of the colors. If you like the colors and find them useful, that's good. But when I see something like 80% system time colored green, that's good, and 20% user time colored red, that's bad, I'd color them the other way around. 80% system time? It kind of sounds like the Linux kernel is doing NUMA rebalancing, which we've seen, just burning CPU in the kernel. And here htop has its own color highlighting of the same workload, which can be more misleading than useful.
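Going back to the bimodal distribution for a moment, the point is easy to demonstrate with synthetic data (latencies invented to match the two modes described above):

```python
from statistics import mean

# Synthetic disk I/O latencies (ms): two modes, ~0.5 ms and ~1.2 ms.
latencies = [0.5] * 500 + [1.2] * 500
avg = mean(latencies)
print(round(avg, 2))         # 0.85 -- sits between the modes
print(latencies.count(avg))  # 0   -- no I/O ever took the "average" latency
```

The reported average describes a latency that literally never happened, which is why you want the histogram, not just the mean.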
So I find traffic lights are good for objective metrics, like actual failures; they can be misleading for subjective metrics, like latency or IOPS, where it's not clear, based on your business requirements, what's good or bad. I'm not a big fan of tachometers either, and there's at least one Linux monitoring product that likes to use them, especially with arbitrary color highlighting, or of pie charts for real-time metrics. So what you can do for this section: for monitoring, it's the same as for the observability tools. You need to verify metrics and understand overhead. For statistics, ask: how is this calculated? Is it an average over an interval? What's the calculation? And look at the full distribution. Just looking at it, you might see that you've got multiple modes, or what shape it is. And for visualizations, you do want to use histograms, heat maps, and flame graphs, which add value. The last section I've got is benchmarking. Benchmarking is extremely error-prone, to the point where I would say that almost 100% of benchmarks are wrong. There's a nice quote I've taken from a paper called "A Nine Year Study of File System and Storage Benchmarking." That sounds like hell, who'd do that for nine years? It said most popular benchmarks are flawed. You can see how restrained they were when they wrote that. And, and this is what catches people, all the alternatives can be flawed as well. So people tell me: Brendan, I need to benchmark this, I'm going to use Bonnie++, or whatever it is. And I say: no, no, don't use that. It's misleading, there are bugs, there's overhead; you'll shoot yourself in the foot. And they say: okay, what's the alternative? What should I use? And sometimes it's like: there's nothing. There is absolutely no benchmark for this area that you won't hurt yourself with. You're going to have to create it yourself. And some people really struggle to get their head around this. It's like: but this one is really popular.
There are like 20 benchmark alternatives on the internet; can't one of them be good? No, actually, none of them may be; they can all be wrong. You need to understand this. Some common mistakes. Testing the wrong target, very common: you download some benchmark, it seems like a good idea, but you're testing something that doesn't matter for your actual workload. Say I think I'm evaluating a storage product and I think I'm testing the disks, but I'm hitting the file system cache. And so it's like: well, my disks are doing 10 gigabytes a second from one spindle. No, that's your file system cache; you're testing the wrong target. Then there's choosing the wrong target. At that point you say: okay, I'll do direct I/O, or disable the file system cache, and I'll test the disks. Why? That's not actually real-world either, as I said earlier, because in the real world you will have the file system cache, so you may as well test through it; otherwise, who knows what you're testing. Invalid results: sometimes benchmarks have bugs, just like observability tools and metrics. Ignoring errors: a great one, and I've had this myself, is where I get a great result out of a web server. I think: wow, look at this, I did a million events per second. Then it's like: wait a minute, these are HTTP 500s, and they're all broken. 100% of them are broken. And you discover that the error path is faster than the success path. So if you were wicked, you'd use that in your benchmarking results: look how I broke a million events a second. Yes, and they all fail. It's because in the function, you do the error checking at the top and return the error early, and only then do the actual work, so it's no surprise the error path can be quicker. Ignoring variance and perturbations is another common mistake: the real-world workload is not steady or consistent, unlike your test.
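Returning to the ignoring-errors mistake for a second, it suggests a simple guard any benchmark harness can apply: refuse to report a throughput number when responses failed. A sketch, with the function name and numbers invented for illustration:

```python
def benchmark_result(throughput_per_sec, responses):
    """Refuse to report a throughput number if any responses were errors."""
    errors = sum(1 for status in responses if status >= 400)
    if errors:
        return f"INVALID: {errors}/{len(responses)} requests failed"
    return f"{throughput_per_sec} req/s"

# "A million events per second!" ... and they were all HTTP 500s:
print(benchmark_result(1_000_000, [500] * 10))
print(benchmark_result(120_000, [200] * 10))  # only now is the number meaningful
```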
We call it sunny-day performance testing versus rainy-day performance testing. With sunny-day testing, you measure maximum throughput under a perfect workload. But the real world is more like a rainy day, where there are variations and perturbations, and you just don't have any insight into that, because you've never tested it. And of course, misleading results: if you're not paying close attention, it's easy to benchmark A, actually measure B, and conclude you've measured C. To go through the types of benchmarks quickly: micro-benchmarks are the ones that test a specific function in isolation, like maximum file system cache read operations per second, or maximum network throughput. These are pretty useful, because they're easy to debug: if you're trying to understand why a micro-benchmark regressed, there's not that much on the analysis table to get to the bottom of. There are lots of bad ones too, like calling getpid() in a tight loop, or testing the speed of /dev/zero, and that's the problem: it's easy to test things that aren't very relevant, or to miss workloads that are. So micro-benchmarks are a useful tool, but you need to match them to the real-world intended workload. One way people try to do this is macro-benchmarks: a full client simulation of logging in, doing stuff, and logging out. A common problem here is misplaced trust, where you believe your macro-benchmark must be realistic, but it's not. Just like everything else, it can have lots of problems you need to debug, and it's complex to debug: now you have everything on the operating table and have to figure it all out. So if you find a regression with a macro-benchmark, I'd try to reproduce it with a micro-benchmark, just to make it quicker to analyze. Kitchen-sink benchmarks are also popular.
Run everything, and then come up with a single value, like an average across them all. Lots of problems: the myth is that more benchmarks means greater accuracy, when more benchmarks is really just more opportunities for error. To mention a few in particular: Bonnie++ is a popular hard disk benchmark, or as the website says, it does a simple test of hard drive and file system performance. I had to do an analysis of this once, and I began with the first metric, which was the per-character sequential output. And I found that what it was actually testing was one-byte writes via libc. libc would buffer them, and by the time libc talked to the file system, it was doing four-kilobyte writes. The file system and volume manager did their own buffering and placement and grouping, and by the time it talked to the disks, it was doing 128-kilobyte asynchronous writes. So I had a customer who was worried about this result, the per-character sequential output, thinking it's a disk result, and it has so little to do with the actual disk: by the time you're talking to the disk, you're doing something totally different. And bizarrely, I found things like: oh, I can actually tune this, I can change the buffering inside libc with setbuffer. Has anyone ever tuned setbuffer in libc? Wow, someone has, excellent. So look, there's a learning experience. And of course there could be other things, like having ionice turned on, so that your benchmark is accidentally testing Linux I/O throttling. So it's really error-prone; you really have to get to the bottom of it. Bonnie++ did update their code, so this particular metric now uses direct I/O and will actually do one-byte disk I/O. But that's just an example of the sort of issues you can run into. Another common one is ApacheBench: single-thread limited. So many times I see a result where you're actually limited by ApacheBench being single-threaded. That's why people use wrk.
Another problem — and there are some problems with ab's code, but this one isn't so much ab's fault — is whether you use keep-alive or not. I've had the situation where, if I don't use keep-alive, it becomes an unrealistic TCP stack benchmark: it's creating and destroying sessions for every request, and your kernel system time goes through the roof. It's like, okay, did you mean to test the TCP/IP stack? I think you were trying to test a real-world workload. And if you do use keep-alive, then it can go at light speed because it keeps every connection alive, and now you have an unrealistic server throughput test. So again, it's misleading to begin with: you think it's going to give you a nice simple result, but you really have to dig in and think about it to get value. I'll also mention UnixBench. UnixBench still exists; it's the original BYTE UNIX benchmark from 1984, published in BYTE magazine. It runs lots and lots of micro-benchmarks that make up the BYTE index — things like pipe throughput and pipe-based context switching. Many problems, many, many problems. I've been meaning to write a blog series about all the UnixBench problems, in the hope that I can stop people from using it. Unfortunately, I only got as far as the makefile. It tells you to run this shell script called Run, and that automatically creates a makefile for you. Okay, so I ran Run, and this is what it does. This is all commented out — oh, look: for Solaris 2, use these options. I'm running this on Linux — on Ubuntu — and it's picking the Solaris 2 options. At the Computer History Museum last night at game night, there was a copy of Solaris 8 sitting there. This is Solaris 2, and that's what UnixBench is actually using in its makefile. But the problem is, okay, so it's picking the wrong thing — it should be picking the Linux thing, right?
Let's uncomment this. And once you start looking at it, you might think: should that be -O2 or -O3 these days? Maybe I'll change that. No — don't change it. What are you doing? If you change it, you're changing the benchmark result; you're baking your choices into the numbers. You forget you did that, and then you'll be telling your friends, look at my UnixBench numbers, they're faster than yours — it must be the server. No, it's because you went and hacked at it, because you thought, this is stupid, why am I getting Solaris 2 options, I should go and uncomment this. And that's a really big problem. The UnixBench documentation says: the results depend on not only your hardware, but your operating system, libraries, and even compiler. And it says: if you want to publish any results, you should include your compiler versions with them. So it's interesting that the documentation told you all along that this includes the compiler and how you built it. Except it's rare that you see someone publish the compiler settings with their UnixBench results, and people compile it on different systems — it's really problematic to try to compare those numbers. I should finish that UnixBench series one day and actually go through the micro-benchmarks, but I got a little bit depressed just at the makefile. It was innovative and useful at the time, but its time has passed. So, what you can do about benchmarking: match the benchmark to your workload, and use this methodology called active benchmarking, where you configure the benchmark to run in steady state, 24 by 7, you do root-cause analysis while the benchmark is running, and you answer the question: why is the benchmark result X, and not 10 times X?
Because if you can answer that question, you've found the limiter — and then maybe you discover the limiter is stupid, because you configured it wrong or you're testing the wrong thing. And so you just go through the normal process of root-cause analysis. This is actually a great way to learn performance, because if you do root-cause analysis on a production system, you don't really know what the limiter should be; at least with a known workload and a known benchmark, you have a starting point. In summary — how to get out of trouble: observe everything, and trust nothing. Even though everyone is using that metric, it doesn't mean it's right. Metrics can often be misleading, and metrics can often be missing as well. So you want to pose questions first and then find the metrics to satisfy them; you might have to create new metrics. That's why I like my functional diagrams. Profile everything: things like Java mixed-mode flame graphs will solve a lot of issues. That's really useful for benchmarking as well, to show what code is actually executing and what's on the table. Visualize everything: I want to see histograms of latency; I want to see heat maps of latency, which is a histogram over time. Here's my bimodal disk I/O, and I can see there's a wide mode every five seconds or so — great, that's where we want to see more information. And finally: benchmark nothing. Just stop doing benchmarking, and the problem will be solved. Or, if you must do benchmarking, please do active benchmarking, where you analyze what's going on and get to the root cause. I've got links and resources in my slide deck — I'll post it on SlideShare — especially for things that aren't broken. Like I said, this is complementary to some earlier SCALE talks where I talked about performance tools and things that work. So if you're leaving a little bit depressed, there are suggestions of things to do throughout these slides, and you can check out my earlier talks about things that work.
And that's my talk, thank you very much. We do have time for questions. I'll say, before anyone runs away, that there are some Netflix people here as well. If you'd like to talk to us about working at Netflix, it's an awesome place to work in 2016, so please hit up one of us. Questions — yes. What's my favorite tool for NFS benchmarking? I used to do a lot of NFS benchmarking. It's funny — at one point, when I was doing it, I wrote a Perl program that did NFS benchmarking, because I was so sick and tired of the various proper tools being broken. And it got to the point where people said my Perl program was the gold standard of NFS benchmarking. It's like, wow, you have to be kidding me. But the point was that it was a very simple program; I just did the syscalls directly. I actually published it, and you can probably still find it on my blog. I should have written it in C — a nice simple C program that you can debug, where you understand what it does, because as soon as you start to use more complicated things, things go off the rails. If I had to do benchmarking now — file system benchmarking, so just treating NFS as a file system — I do like fio. fio is nice; it gives you some distribution information in the output. It does assume a type of distribution, but it's not too bad. So if you're going to pick a commonly used one these days, fio is good. It's a minefield otherwise — lots of bad ones. Yes, right, benchmarking is a pretty depressing area. The question was: weren't there times when compilers detected that they might be benchmarked, and changed the compiled output? And yes, there are stories like that. You have to debug everything and know what's going on. It's pretty crazy. Next question, yes. So the question was: what's the relationship between processor queue lengths and CPU usage?
So run queue lengths on Linux are a nice measurement of saturation: how many threads are queued up, waiting for their turn. Percent CPU utilization is how busy a CPU was during a given interval. Now, your mental model may be: I will go from zero to 100% utilization, and then if I keep throwing threads at the CPUs, I will get queueing. That sometimes happens, but sometimes it looks a bit different. Sometimes you'll get to, say, 50% utilization and then you start to see queueing. And you think: what? How can I have queueing for CPUs when I have all this headroom? And again, you debug it, and you find out that you're running Erlang and it's bound to specific CPUs, and the threads can't use the headroom because they can't unbind themselves. Or you find out it's because of statistics and the sample interval: you may be sampling over one second, during which for 500 milliseconds the CPUs are flat out and you have lots of queueing, and for the remaining 500 milliseconds the CPUs are idle. So you end up with percent CPU utilization at 50%, but you have queueing. How can this be? So always get to the bottom of things — it's really useful, and you'll learn what's going on. Yes. Okay, the question is: have I seen Linux perf_events fail to detect kernel symbols? No — we're on Ubuntu and things work pretty well. I do have some of my own compiled kernels where, if I mess up the compile, I can break perf and the way it fetches symbols. I don't think that's a perf problem — I could be wrong — but I'd just get to the bottom of it if perf can't recognize kernel symbols. Yeah, to debug perf and symbols, run strace on perf, tracing its open() calls, and find out all the files it's opening, and see if it's going to the wrong directories. You just have to go through some debugging. Yeah, use ftrace to debug perf — or perf to debug ftrace.
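That sampling artifact — 50% average CPU utilization over the interval, yet real queueing — is just averaging at work, and a toy simulation makes it concrete (the numbers here are invented for illustration):

```python
# Simulate one 1000 ms sampling interval: CPUs flat out (with 4 threads
# queued) for the first 500 ms, then completely idle for the last 500 ms.
samples = []
for t_ms in range(1000):
    busy = t_ms < 500
    run_queue = 4 if busy else 0
    samples.append((busy, run_queue))

# Averaged over the whole interval, the burst disappears:
utilization = sum(busy for busy, _ in samples) / len(samples)
avg_run_queue = sum(q for _, q in samples) / len(samples)
print(utilization, avg_run_queue)  # 0.5 utilization, yet run queue 2.0
```

So a monitoring tool reporting "50% busy" and "threads queued" over the same interval isn't contradicting itself — it's averaging a bursty workload.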
Sounds like it just needs debugging — find out why its symbol lookup isn't working. Next question. So the first question is: how do I measure IPC? The Linux perf command does that — it gives you PMU access. If you run perf stat with a command, it will give you an IPC summary; for example, perf stat -a sleep 10 for a system-wide ten-second measurement. If you go to my homepage, I've got a lot of perf examples — I've got a separate perf_events page with examples for measuring IPC and cache misses and all that stuff. It's a really cool command on Linux, a great go-to tool for performance monitoring counters. The next question was: have I used sysdig? Yes, I've contributed some chisels to sysdig — I did a spectrogram chisel for sub-second-offset heat maps. Sysdig's pretty cool; it's nice to see innovation in the tracing space. One thing I don't like about sysdig is that it is an event-based tracer: it passes all the events up to user level and then processes them there, and I would rather see that summarization happen in the kernel. However, they have done a pretty good job of making the overhead as low as possible. My first instinct was: I can't see myself using sysdig, the overhead will be too high. Then I tested it, and they've actually done a pretty good job of lowering it. It's pretty interesting; they are doing innovative stuff in sysdig. I'd love to see sysdig pick up eBPF, the kernel engine, which would allow them to do some in-kernel summaries. Was the question: do I see differences between problems found on real hardware and under virtualization? Yeah. When you're virtualized, there's just a whole heap of different issues. Getting timestamps can be different, because Xen is getting your timestamp, and there are issues where we've had to tune our clock and use a different clock source in Linux.
I've never had to do that on hardware, but when you're virtualized, it's a hypercall. So yeah, there are things you need to know. Of course, you're using paravirtualized drivers, with this sort of coalescing of I/O together — it is a different beast. But if you're really good at one — say you come from a hardware background and you go to the cloud — you will pick it up, because you've worked on similar issues before. Definitely, though, it's a different beast to work with. Yes. So the question was: what tools do I use to analyze memory? If it's just high-level memory statistics, then you've got a lot of the /proc counters, /proc/meminfo, you've got ps — that should be fine. You do need to understand the difference between shared memory and dirty memory — whether the application has dirtied it, so that it's actually been page-faulted in and allocated physical main memory. And so I use pmap -x a lot. pmap -x with the process ID is really fast on Linux — not so fast on some other operating systems — and it'll dump the memory address space, and you can see the different types of memory. I use that to get to the bottom of a lot of process memory issues. If it's kernel memory, there's /proc/slabinfo and other stuff out of /proc. Yes. So, comparing Linux to FreeBSD in terms of, say, observability tools: it's different. At some point I'd like to do a differences talk, because it'd be pretty interesting. On FreeBSD there are a lot of common things — you've got your top and your ps and your vmstat and iostat and whatnot. One thing I like about FreeBSD is that pmcstat has a nicer grouping of performance monitoring counters. And you've got DTrace on FreeBSD, which is really easy to use for walking through the kernel — I've written lots of DTrace scripts — so that's currently much more developed on FreeBSD than the equivalent right now on Linux.
Right now on Linux, I'm using things like ftrace and perf_events and eBPF to do kernel-based tracing — things that are built into the kernel. So it's different, but FreeBSD is pretty good. Until things like eBPF and ftrace surpass it — and it depends on the performance issue I'm working on — I can often have a better time on FreeBSD than I do on Linux, just because there's a lot of stuff there. It's a nice thing to check out if you haven't already. And of course, Netflix has lots of FreeBSD on our OCAs, which is our CDN. I'll take one more question and then we'll break, and you can ask me questions in the break. I saw one more question. Okay, thank you. All right. Okay, thanks.

I know, right — here is my 13-inch screen. Can everybody see that? Yeah. Everybody up here did the first two, right? Oh, perfect. Well, while we wait on that, I'll go ahead and give myself an introduction and tell you who I am, why you're sitting in this room, and why you should care. My name is Scott Sealy, and I am currently the community manager at Cumulus Networks. What we do is produce a Linux distribution — based on Debian, currently wheezy — that runs on white-box network hardware. You can think of it as sort of the Red Hat for the networking world, or DD-WRT for the enterprise. We give you full Linux console access to the hardware: SSH access, Chef, Puppet, Ansible — all those favorite tools and things you'd use for automation and DevOps — from the top of your rack all the way down to the bottom. That's just a short little pitch about what Cumulus is.
My talk today is going to be about open networking and what that is. There are a bunch of different things out there that people call open networking — everything from software-defined networking to network function virtualization, to OpenStack's Neutron, to OpenDaylight — and you'll see, when we get the slides up (if we ever do), that there are a lot of open, open, open networking pieces, and a lot of different companies have their own versions of it. If you were in the exhibit hall, you obviously saw the FBOSS Wedge that Facebook had out, just a stack of switches — that's based on the open hardware project, allowing people to get down to the granularity of the hardware, of what they want to be able to do, in a completely open-source way. I'm going to talk more about that in some of my other slides, but that's just a bit right this second. Let me go ahead and pull my slides up, and I can at least talk from them. We have a bunch of outstanding people here — thank you very much, yes. I am the community manager at Cumulus Networks. I'm a former customer-support junkie: I did customer support for about seven years before I became a community manager. You can find me in a multitude of places — obviously my email address, scotts at Cumulus Networks; bitdad on Twitter; kilted one on FreeNode. I'm in multiple channels, but obviously you're going to find me in the Cumulus Networks ones. So, what are we going to talk about? You can see my little attempt at a diagram over there, with penguins all the way down and Cumulus Networks sitting on top. We'll focus on what open networking is, what Linux's role in open networking is, how people can get involved, and what's driving the market. First, a quick little history lesson. In the beginning, we had early attempts to communicate.
So your network administrators are the people doing the cave drawings, telling people: this is what can be found here, this is what we were tracking, this is what we were looking at. And your sysadmins are the hunters and gatherers who are going out there, attacking and gathering the things and bringing them in. Those are the early functional areas of networking. Advancements: we go to the ancient Greeks, and you have the city-states trying to get information back and forth. So you have your network admins as the intermediaries — the people who ran the city-states and wanted to get information back and forth across Greece — and your runners and marathoners are your sysadmins, because they were running the data across the wilds of Greece back then. We move forward to medieval expansion, and obviously things became more advanced with shipping routes; the world became bigger, and the network just had to reach that much further. So you've got your captains and explorers doing the network management, and your merchants are your sysadmins, sending the data through it. Then we get a little further on, and now we have — oh gosh, sorry, yes — telegraph companies, and your method of data transfer becomes faster. You're seeing things happen much more quickly than shipping them or having somebody run them from point to point, and you've got your network admins at the telegraph companies and your sysadmins being the operators, having to plug in all the different connections and send the data across the globe. Then computers were invented, and obviously things got a lot more crazy: data could move in a flash of a second, where it was taking days or even hours before. And you can see the simple layers of the internet and how things came about.
In '91, content on the internet; '98, the Google search engine; browsers in '93; the World Wide Web in 1990; the internet itself in '75; and networks back in 1973. Your network admins are born, and your sysadmins are born, because you have much more data and a lot more ways to transmit that information back and forth. So everybody knows what happened in the past — great stuff, that's why they're called the good old days — but you've got to review the history of where things were and what they were doing, so that you're not doomed to make the same mistakes people made in the past. Those who don't remember the past are the first to repeat it. So let's take a look at where we are now. Obviously, with networking, you can't have a conversation without talking about Cisco. Cisco is the 800-pound gorilla of the networking environment; everybody in the room probably has some Cisco device somewhere in their network infrastructure. If you don't, that's awesome, because that means you're not giving in to the Cisco behemoth. Let's get a quick show of hands: how many people use something that's not Cisco in their current work environment? One, two apprehensive, three maybe? That's awesome. But just so you know, Cisco has some open things. Their OS for the Nexus line is getting a little more open; they have Cisco ONE, which is their open network environment; and they have Cisco VIRL, which is a virtual environment where you can sit down, play with and touch the different pieces, diagram a network out, and work through things. OpenWrt, Tomato, DD-WRT — is everybody in the room familiar with what this technology is? It's where you can flash the firmware on your home router and make it that much more capable; you can do much more with the stock hardware you buy off the shelf.
And a lot of the companies nowadays are actually starting to advertise that their routers can run this stuff when you buy them off the shelf. You can go to Best Buy now, buy the latest D-Link router, and that logo right there is on the side of the box. It says you can go ahead and flash this firmware and run this software on your machine. And that is a huge leap from where things were 10 to 15 years ago, when these projects started coming out. The ability to say, you know what, we put this on here, but you go ahead and upgrade it and make it better — that is huge. Not a lot of companies are going to say that to you. Now look at this: this is just a stereotypical network stack. Over on your left, you have the closed-down side: you've got Cisco, Juniper, Arista sitting at the top of the stack, and obviously your servers and blades sitting in the rack — these could be anything, Windows, Linux, you name it. On your right-hand side, you've got an open environment: a choice of hardware vendors, and these are just a few you can choose from. The same hardware vendors also provide networking gear, and that networking gear is wide-open white box — you can just buy it, put it at the top of your rack, and put your choice of operating systems on it. I'm going to say Cumulus is the greatest, because they pay me to say that, and that's what I put over there — but there are a bunch of other network operating systems out there as well. It gives you a choice, and that's something that hasn't always been around. This is a little open networking cheat sheet. There are a lot of words on here; I don't expect you to read it all — my slides will be online later, so you can take a look. But these are just a few of the networking things you'll see that say "open."
As I was saying earlier, there are a lot of projects out there that deal with networking and use "open" in the name: OpenDaylight, OpenFlow, OpenStack, the Open Compute Project, OpenContrail. We'll go into these a little more in depth. So let's start with ONIE, the Open Network Install Environment. We had a guy who works for us who described ONIE as "PXE that doesn't suck." What ONIE basically is, is a little piece of software that sits on the box, and when you boot it up, it says: all right, I'm looking for an OS. Is there one local? Is there one somewhere in the data center? Is there one on the internet? Where am I being directed to get this? And it goes out, pulls the operating system down, goes through a boot cycle, and kickstarts the machine and loads it — just like PXE is for servers. This was written by an engineer at our company and then donated to the Open Compute Project. If you talked to the people at the Facebook booth in the exhibit hall, there were a bunch of people from the Open Compute Project in there as well. What that is, basically, is a project with a bunch of different companies backing it, saying: we want to make hardware and software as open as possible, get it to as many people as possible, and give them the ability to get down to the granularity of what the hardware and the software do for them and what they can do with it. OpenDaylight was started by the Linux Foundation — huge advancements for software-defined networking and network function virtualization. They're trying to give the market a standard, which is something you don't have. I mean, you remember the chart that was up there a few minutes ago — there are a lot of things. If we could get complete standardization, it would help out a lot.
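The ONIE discovery sequence described above — check locally, then the data center, then the internet — amounts to a simple first-match search over installer sources. A hypothetical sketch (this is not ONIE's actual code, and the source names and availability flags are invented for illustration):

```python
# Sketch of ONIE-style installer discovery: try each OS-installer source
# in order and take the first one that answers.
def find_installer(sources):
    """Return the name of the first available source, else None."""
    for name, available in sources:
        if available:
            return name
    return None

# Illustrative order: local storage, then the data center (e.g. via
# DHCP/TFTP), then a well-known internet location.
sources = [
    ("local", False),     # nothing installed on the box yet
    ("dhcp", True),       # the data center points us at an image
    ("internet", True),
]
print(find_installer(sources))  # -> "dhcp": first source that responds
```

The real mechanism then downloads the image, boots it, and kickstarts the machine — the sketch only shows the discovery order.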
The adoption rate on this is a little slow, but the list of backers is getting some big names, so expect to see more things out of that. The Open Compute Project, as I was just saying: great project, great things coming out of it, and it's only going to get better. More big names are signing on — you can see some of the big ones I put up here: Apple, Goldman Sachs, Bank of America, Fidelity, Microsoft, Rackspace, Facebook. All these people are getting in there and contributing to it, and that is huge. That's going to be big for the networking field over the next several years. This is just a screen full of all the people contributing to the Open Compute Project, and you can see huge names there, including the one right in the middle: Cisco. You can tell this is something that even they see credibility in, something they need. OpenSwitch is a new thing that was launched at LinuxCon in Ireland this past fall. Started by HP, it's a community-based network operating system, and it's getting some support — you can see HP, Accton, Broadcom, Intel, VMware. Some big names in the networking field are coming to it. They're trying to develop a fully open network operating system — I'll talk about that a little more later on a humorous slide — and they're looking to get something out within the first half of 2016. Will it hit that soon? To be determined. But again, big things coming here. Cumulus Networks, obviously, is me; I would be remiss if I didn't talk about my own company. We're Debian-based, currently on wheezy, going to jessie at some point in the first quarter of this year. Mostly open source, with the exception of one thing: if you've worked with networking hardware, Broadcom is not exactly one of those companies that wants to let go of what they're doing, so the SDK is not open source, currently. Will that change? Time will tell.
You saw Broadcom back on the OpenSwitch list with HP, so who knows? But we contribute a lot of software back to the Debian main project and to open-source communities. ifupdown2 is a huge one that's just now coming out — it's a rewrite of ifupdown in Python, and it's much better supported than the current ifupdown. There's also our work on Quagga and VRF. Basically, this helps you make Linux the language common to your entire stack, so you don't have a network operating system running Cisco IOS while the rest of everything is Linux. You have a common language that your sysadmins and your network admins can use back and forth; it makes it easier for them to converse about the networking. Dell Networking Operating System 10: this is an operating system by Dell that was sort of trailing behind, but they just announced that they're going to put more tools behind it and start garnering a little more force behind it in 2016, and they're going to add a premium layer 2 and layer 3 abstraction to it. Basically, the software abstraction infrastructure was just a little piece that would sit on top of the networking box and let you talk back and forth, but there were still more underlying pieces; with the premium layer, you'll pay Dell a little more, but it'll give you the full L2/L3 experience. Neutron, obviously, is OpenStack's networking. There are a lot of different companies out there that will talk to you about networking in OpenStack — Midokura, you name it, they're out there and they want to talk to you about it. Basically, it's an intermediary that lets the virtual NICs in OpenStack and the other services talk back and forth with one another, setting up your VMs and all that kind of good stuff. This is something where I expect to see some really cool things happen between now and April, when OpenStack Summit happens in Austin.
So I expect to see some pretty big things coming out of this space. Next: Google Jupiter. Obviously Google has a huge need and desire for a lot of bandwidth and a lot of networking power. So they said, you know what, none of these things are giving us what we want — and they started their own. They built it from the ground up, completely, and it's set up to handle the OS, hardware, and SDN needs that Google has within their data centers. And obviously everybody knows there are a lot of data centers for Google to handle. I don't know if anybody in here saw the Facebook Gluster talk yesterday, but it made me laugh when they said, "we may lose a couple petabytes here and there." That's huge — a couple petabytes of data they might lose. I can't even fathom it. One of the big new things is DevOps all the things. If you were here on Friday for LA DevOps Day, that room was packed, and that speaks to what the future of DevOps is. A lot of people want to see more and more from the DevOps community and how to handle that within their infrastructure. Open networking obviously lends a huge amount to that, because it takes the DevOps tools that are currently available for Linux in your server environment and applies them to the networking environment as well. We've done a couple of meetups and webinars on NetDevOps, and it's garnering more and more support as it goes. The ability to automate things; the ability to have a common language in the data center. Think about it: if you have a data center of, say, 20 racks and you make one mistake in one configuration on one switch, you lose a portion of your data center. With automation and DevOps practices, you have a configuration file, and that keeps the human error down. Mistakes will still happen, but you want to have that layer in there.
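That "one reviewed configuration instead of 20 hand-typed ones" idea can be illustrated with a tiny templating sketch. This is purely hypothetical — the interface name, addressing scheme, and hostnames are invented, and a real deployment would drive this through Ansible, Puppet, or Chef rather than raw Python:

```python
# Render the same reviewed template for every switch in the data center,
# so one corrected file fixes the fleet instead of 20 typo-prone configs.
TEMPLATE = """hostname {name}
interface swp1
  address 10.0.{rack}.1/24
"""

def render(rack):
    # One template, parameterized per rack.
    return TEMPLATE.format(name=f"leaf{rack:02d}", rack=rack)

# 20 racks -> 20 consistent switch configs from a single source of truth.
configs = {f"leaf{r:02d}": render(r) for r in range(1, 21)}
print(len(configs))                       # 20 generated configs
print(configs["leaf07"].splitlines()[0])  # hostname leaf07
```

The human reviews the template once; the automation layer fans it out, which is exactly where it keeps fat-finger mistakes off individual switches.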
Looking ahead: this was an article on MarketWatch about software-defined networking and network function virtualization — a 45-billion-dollar business by 2020. That's huge, and everybody's going to be standing up trying to figure out how they can get their part of it. You saw that even Cisco is involved, because they realize that this is where the future is going. So what's next? Traction helps drive innovation at companies, and that's what's wanted. And you never know what's possible until you ask. With a lot of these companies, you never saw their involvement until somebody came up to them and said, you know what, I want something like this — and that's when they started thinking, maybe there is something more to offer there. How can I help, and what can I do to help? How can I get involved? Well, that's a great thing, and coming to SCALE, you all know there are a lot of different ways to get involved in an open community. The first thing I'm going to talk about is GNS3. GNS3 is a project — and a company — that allows you to set up virtualized networking systems. You go and grab their software, grab your ISOs from Cisco or us or somebody else, and you can set up a virtualized network environment right there on your laptop. The ability to set that up, test it, and see what it's going to look like is huge. You can go to a salesperson and say: here, I've virtualized what I want my network environment to look like — build this for me, make this happen. Huge. Ravello Systems is another one. This is a really cool setup that we've started using at work for our training and our consultants. Basically, you go online to their site, you sign up, and you can build out your entire network infrastructure and spin it up on GCE or AWS and just go nuts with it.
We've tested virtualization on it, we've tested demos on it, we've tested all kinds of stuff with it. We've done presentations with it, and it's huge, and it's got a lot of great tools. Podcasts. Podcasts in the community are obviously huge. There are podcasts in the open-source community that are a great resource. You saw Bad Voltage here on Friday night, and there are other podcasts represented in the exhibit hall. But these two networking podcasts are great keys to the community. They give you a lot of information, a lot of videos, communities for certifications, study groups, all those kinds of good things. And it's all available at your fingertips. I'd be remiss not to mention my own community, the one I've built over the last year. It's building steam every day. We've got a lot of great customers who are part of it, and a lot of people who are interested in Cumulus who've come and asked a lot of great questions as well. You come to our community, you ask a question, and you may get an answer from somebody who's done it as a customer. But you may also get the engineer who wrote the piece of software you're asking about. Our engineers are in there, our consultants are in there, I'm in there. A lot of technical resources and a lot of great people are behind this one. OCP, I've talked about that one already, but they've got a lot of great resources and places to get involved. The top link there is how to get involved with the OCP, and they are always looking for people to be part of them as well, whether that's companies, individuals, engineers, whatever. ONIE: there's the GitHub for the ONIE project. Again, it's under the umbrella of the OCP, but there's still a lot of development that goes into it, whether that's helping develop the code or getting information out about it.
Then there are your own open source projects that you love. Open source your stuff, whether it's cookbooks, playbooks, manifests, you name it. If you have something that can benefit the networking community, open source it, get it out there, and talk about it. Show people what it is. Let them know how they can get involved with you. The more you talk and the more you get people energized about what you're doing, the more enthusiastic they'll be about getting involved and helping you out. Meetups and user groups. Again, these are great resources for the community as well. I know just in this area there are a lot of groups dedicated to the networking space and to the DevOps space and all these other things. And with a minimal commitment of, hey, you come out and have a slice of pizza and a beverage with them, they'll be glad to talk to you about anything. It just gives you more opportunities to get involved. But involvement doesn't always mean just yourself and your time. Talk to your company. If there's some great project out there that you're using at work, it may be worthwhile to talk to your bosses and say, hey, maybe we should get behind this project a little more from the company standpoint. Projects love that even more, because that's what allows them to continue on. People are an awesome resource, but the ability to keep an open source project going takes funding too. Everybody remembers the thing that happened with GnuPG a while back: it was down to one guy, and the reason he hadn't been able to keep fixing the project was that he'd run out of funds and run out of time by himself. When that hit the news, he got flooded with donations from companies and individuals to help him keep it working. That goes a long way in the open source community. But let me leave you with one last thing.
Get out there and start a network revolution, because the only way a hold like Cisco's gets broken up is when somebody is out there talking about it and doing something with something that's not a proprietary, single-vendor device. That's it. Any questions? I'm sorry? Yes. There are a few different choices out there, but Broadcom is mostly the merchant silicon doing the network processing in these devices. Your switches currently are going to be either PowerPC-based or Intel x86-based as far as the control processor goes, but your networking hardware is still mostly Broadcom. They are not the only option, but they are the best. A lot of years and a lot of involvement have made them the leader in that field. And again, much like I said, it's hard to break somebody's single point of control when there's not a lot of talk about the alternatives. There are alternatives, but they are woefully behind in that realm. But yes, as I was saying, there are Intel-based chips, and there are going to be ARM builds coming out with the next generation, so hopefully soon we will see something come about. Any other questions? All right. Thank you very much. Oh, you've got more? I will say I'm not hugely familiar with the IETF side, but that's not to say they aren't involved with a lot of these projects. That would be something I would need to look into a little more. I apologize that I'm not completely up to date with all of them, but it would definitely be something I'd like to look into and find out more about. Thank you. All right. Take care, everybody. Hello? Okay. All right. It's 4:30, so I guess we'll get started. Thank you everyone for coming, and I appreciate you staying late on Sunday. It's understandable that some people couldn't stay, but I really appreciate it.
My talk is titled, Yes, the FCC Might Ban Your Operating System, But That's Not the Real Problem. And that sounds a little clickbaity. When my blog post carried this title, someone said, well, that's clickbait. Well, yeah, but 40,000 people looked at it, so I guess it worked out, because it's actually a serious problem. Just a quick introduction to myself. My name is Eric Schultz. I'm an independent software engineer and open source consultant. I've worked with these companies and organizations in the past on open source software. Most recently, I'm still with the prpl Foundation as their community manager, and that's where this entire topic started for me. In particular, they do a lot of work with OpenWrt, because a lot of the members of prpl are very interested in OpenWrt. So this topic came up, and then I just went with it, and it snowballed, it seems like. So, an introduction. This was the blog post I had put up, and I didn't really expect that many people would read it. I figured there'd be a few people in the free software community who would get a little excited, but nothing too big. Well, it turns out the prpl blog had, on average, about 20 people visiting per day. It was pretty low traffic, and then one day, when this was posted, we had 35,000. That was a big difference, and suffice it to say, the executive director was very happy that we had all these people coming. What's that? Absolutely. Absolutely. That was good too. I don't know how well you can see this, but I was real proud of it: I was at first number three on Hacker News, and I'm like, oh my God, I have to call my mother. And then I was number one on Hacker News, and I was like, I really have to call my mother. This is amazing. I don't read Hacker News that much, but it's pretty cool. This blog post really made the issue snowball, and we ended up with the actual proposals, which I'll talk about in a second.
The proceeding received about 4,000 comments, with almost none of them in favor. And it was opposed by groups, as you can see: the Free Software Foundation, the Open Source Initiative, the Software Freedom Conservancy, Google, Boeing, which was one that surprised me, research labs, the ARRL, the American Radio Relay League, OpenWrt, DD-WRT, Mozilla, ThinkPenguin, Linus Torvalds, whose name I spelled wrong, and Vint Cerf, as well as doctors, service members, hams, developers, and more. The FCC doesn't get that many comments on obscure proposals very often, especially ones like this. We'll talk about the proposals and the rules that started this whole fiasco, I guess is the best way to put it, and we'll go into some detail and some definitions of what the FCC actually does, which I think helps us understand their proposals and why they exist. The two particular proposals are the UNII rules, which I'll explain in a second, and the Notice of Proposed Rulemaking, or NPRM, on the eLabel Act and modular transmitters. We'll get into that in a second. So first, some background on what the FCC does in this space. People don't really know what the FCC does all that well, and they're a very large agency with a lot of people and a lot of funding, and they handle a lot of different things. They do two things in this space that are relevant to us, and for the hams in the audience, I'm sure this is oversimplifying, but these are the big ones relevant to this topic. First, they regulate radio spectrum users. The radio spectrum goes from the kilohertz range, I don't know the exact number, up into the hundreds of gigahertz. And the users are people like you and me, theoretically: people using laptops, phones, the TV stations, things like that.
To illustrate why they do this, here's a radio spectrum allocation map from 2003 covering roughly 4 GHz to 10 GHz. You don't need to know exactly what all those are, but you can see it's pretty crowded, because each one of those is a different type of user with a different use case that can do different things. The key is that the radio spectrum is a finite resource. We can't expand it. Additionally, different parts of the spectrum have different use cases. In general, lower frequencies have a better ability to penetrate structures and travel long distances at lower power. There are some subtle differences, and I am not a physicist, so I can't tell you all of them, but generally that's the case. For example, 2.4 GHz, which is what some Wi-Fi uses, as well as Bluetooth, some cordless phones, and a bunch of other things, can go a pretty long way and can go through a number of walls. 60 GHz, which is being proposed, I think with the IEEE, for very short-range data transfer, can go only a few feet. It can go at very high speeds, but only a few feet. I believe one person told me that if you put a leaf in front of it, you could block it. I don't know. Yes? What's that? Oh, okay. Okay, partly my confusion, but each of these does have different use cases and different advantages. The spectrum is split into three categories: the part that no one can use, the part that everyone may use appropriately, and the part reserved for licensed parties. There are different classes of licensed users: amateur radio operators and commercial operators, which include radio, TV, mobile phone, the armed forces, safety personnel, and air traffic control. Each user must meet some sort of requirement to be licensed. In some cases I don't know that the requirements are all that complex, but they do have to meet something. In particular, there are certain things that only they can do.
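The frequency-versus-range tradeoff above follows from the standard free-space path loss formula, FSPL(dB) = 20·log10(d) + 20·log10(f) − 147.55 for d in metres and f in hertz. A quick calculation (the 10-metre distance is chosen just for illustration) shows how much more signal a 60 GHz link loses than a 2.4 GHz link over the same distance at the same transmit power:

```python
import math

def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB (distance in metres, frequency in Hz)."""
    return 20 * math.log10(distance_m) + 20 * math.log10(freq_hz) - 147.55

# Same distance, same transmit power: the 60 GHz link loses about 28 dB
# more than the 2.4 GHz link, which is why it only reaches a few feet.
loss_24ghz = fspl_db(10, 2.4e9)   # roughly 60 dB over 10 m
loss_60ghz = fspl_db(10, 60e9)    # roughly 88 dB over 10 m
```

This ignores walls and absorption entirely (60 GHz is also heavily absorbed by oxygen and obstacles, hence the leaf remark), but even in free space the higher band starts almost 28 dB behind.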
And I mentioned appropriate use; this depends on the user and the frequency. Appropriate use involves regulation of frequency, power output, and modulation technique. This is actually relevant to some of the FCC's proposals, which is why I'm bringing it up. Why does power matter? In part, it's a spectrum-sharing technique. If you limit the range of networks by limiting the power, you can actually fit more people onto the spectrum, because one network won't interfere with another if it can't reach as far. There's one more side of appropriate use: primary and secondary users. The issue is, what if two groups need the same slice of spectrum? There may be cases with more than two, but what if one group is, quote, more important than the other? The FCC's solution is to share the same spectrum, but secondary users must defer to the needs of the primary users, and this is also really relevant. The FCC does have enforcement for this. In one case, they fined Verizon Wireless $50,000 for not meeting certain requirements. In another case, which you can't see very well here, there was a $25,000 fine for a ham radio operator who was not using the spectrum appropriately. Important note: unintentional violation is illegal and can be punished. Intentional or negligent violations will not be looked upon kindly, and if users learn their transmission is interfering with others, they must stop the interference immediately. So what's the second thing the FCC does with radios? They regulate marketed devices. These are the devices we're actually buying or using. And it might be unclear: well, we regulate the users, so why are we regulating the devices? The problem is that devices can behave badly.
They can cause interference, and since the users are the ones legally responsible, we don't want users breaking the law and getting fined through no fault of their own. Manufacturers are required to use accepted best engineering practices, and as I said up there, I'm sure they occasionally do. Devices are also regulated by use case. Part 15 devices, which are unlicensed devices like Wi-Fi, have different requirements than Part 97 amateur radio devices. The things they can do, the power output, and so on are regulated totally differently. Now, I've been using this word device, and it's important to how the FCC regulates devices on the market: what is the definition? Personally, I didn't spend a ton of time on it, but I've never found a definition. It doesn't seem to be very clear what the word device means. In the case of Wi-Fi, we can understand what it implies based on what the FCC has required in the past. It doesn't just mean the hardware portion of the radio, but it also doesn't mean all the software on the device. To me, it seems to be implied to be the radio hardware plus the software that controls the radio parameters. That's a little vague, but it's the best I've been able to come up with. So the question you might be wondering is how much software actually controls the radio parameters. It depends on the particular device. One question I've used to try to understand it as best I can is: where is the last barrier that can override all radio control decisions? Where is the last place that happens? To me, that's where the device ends, as far as I can tell. And again, this is not well defined, so this is a lot of people guessing at what it actually means. So it's important to understand how Linux Wi-Fi regulatory enforcement actually works in these cases, because we're talking about where does this stop, where does the device end, all these things.
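One way to picture that "last barrier" is as a final gate that every tuning request must pass before it reaches the hardware. This is a toy model, not the real Linux regulatory API or real FCC channel data; the channel numbers and power limits are illustrative only:

```python
# Toy model of a "last barrier": a table of per-channel rules for one
# regulatory domain, and a gate every tuning request must pass.
# Channel numbers and limits are illustrative, not real FCC values,
# and this is not the actual Linux kernel regulatory interface.

US_DOMAIN = {
    36:  {"max_dbm": 23, "dfs_required": False},
    52:  {"max_dbm": 23, "dfs_required": True},   # shared with radar
    100: {"max_dbm": 23, "dfs_required": True},
}

def set_channel(channel, power_dbm, dfs_enabled):
    """Refuse any request that falls outside the domain's rules."""
    rule = US_DOMAIN.get(channel)
    if rule is None:
        raise ValueError("channel not permitted in this domain")
    if power_dbm > rule["max_dbm"]:
        raise ValueError("power exceeds domain limit")
    if rule["dfs_required"] and not dfs_enabled:
        raise ValueError("DFS must stay enabled on this channel")
    return f"tuned to channel {channel}"
```

Wherever this gate actually lives on a given product, whether in firmware, in the driver, or in the kernel, is exactly the "where does the device end" question.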
And it's important to understand how Linux actually works for wireless. First, you issue a command that somehow goes into the kernel. Then it goes into a driver. The driver at some point uses machinery in the kernel to figure out how it should handle the request, and then sends it to the firmware. And all of that obviously comes back if something has to be returned. To make this work, the Linux kernel has a regulatory subsystem, which is basically a best practice for handling the different regulatory domains. It can be shared between drivers, but it's not required to be. Some drivers use it, and it's a good idea because it's highly audited, it's reusable, and how it works is open source, so we can all agree on it. It takes care of managing the regulatory domain, which includes the legal requirements on power, on frequency, and on DFS, which is very relevant to this topic. If you're set to the US domain, for example, and you say, I want the radio to go onto a channel where DFS is required, but I don't want to turn DFS on, the kernel's regulatory subsystem will say, no, you can't do that, and will return some sort of error along those lines. So, the question of where the device ends. It includes the radio firmware in almost all cases. It includes the driver in most cases. And if the driver does not have an internal regulatory system and uses the kernel implementation, the device ends inside the kernel. So now that we have the background, which is unfortunately very long, we can talk about the UNII rules. This is where the large-scale lockdowns began. The rules haven't been fully enforced yet, but the FCC really started putting them in place. The FCC approved new rules to restrict the modification of UNII devices. UNII devices are Unlicensed National Information Infrastructure, or 5 GHz, devices.
That's your 5 GHz Wi-Fi, things like that. They put forward a rule that said all UNII devices must contain security features to protect against modification of software by unauthorized parties. You might be thinking, well, that's not necessarily bad. It's a security thing. Keeping outside parties from modifying devices, that's kind of reasonable. But there's more. Now, you can read this whole blob of text; you don't really need to. Some key points. Manufacturers must implement security features so that third parties are not able to reprogram the device to operate outside the parameters for which the device was certified. The software must prevent the user from operating the device with operating frequencies, output power, modulation types, or other radio frequency parameters outside those that were approved for the device. The FCC then gives examples of ways manufacturers can do this, including electronic signatures on software. So your first reaction is to think, this sounds a little bit like DRM. Additionally, manufacturers must take steps to ensure that the DFS functionality cannot be disabled by the operator of the UNII device. Oh, there's even more. The instructions they give to hardware manufacturers for compliance specifically say: what prevents third parties from loading non-US versions of the software/firmware on the device? Describe in detail how the device is protected from flashing and the installation of third-party firmware images such as DD-WRT. They literally put the name of an open source project in a rule because they disliked it that much. That's a fair point: the people who influenced them didn't like the project. It's a little bit of both; we'll talk more about that. So you might be wondering, why 2014? We've had Wi-Fi for a number of years. Why now? Well, until then, the only Wi-Fi running on 5 GHz was 802.11a, which was extremely old and relatively slow,
and it was an alternate band for 802.11n. Now, all of a sudden, 802.11ac is the standard for high-speed wireless, and it runs only on 5 GHz, because 2.4 GHz is just too congested at this point. And the channel sizes are much larger: we're going from 20 MHz channels to a standard that allows channels up to 160 MHz wide. Now, the 5 GHz band is only about 1,000 MHz, and you can't even use all of it; there are restrictions, and portions toward the top in particular are disallowed. So you can't fit very many channels in there. Sadly, something else is also in there: Terminal Doppler Weather Radar. It's high-precision weather radar used at about 50 of the busiest airports in the country and more around the world. It sits in the middle of the 5 GHz band, but the exact frequencies differ slightly across countries, which just makes all of this more fun. So the question is, how does the FCC actually manage this problem? You can't just have Wi-Fi transmitting and interfering with these radars, which is understandable. So they use something called dynamic frequency selection, or DFS. DFS has been required for operators and manufacturers since early in the last decade. Anytime an unlicensed 5 GHz Wi-Fi device is on a shared frequency, it listens for a special signal from the Terminal Doppler Weather Radar. If it hears it, the device negotiates a new frequency with the client and switches to it. I believe it's required to do this within 10 seconds. As a backup, until 2014, 5 GHz Wi-Fi routers could only be operated inside a building. That was the extra layer of protection. So the FCC has some logic here: they know this new Wi-Fi standard is coming out, and realistically, if this is the high-speed Wi-Fi standard, people are going to have to put some of these devices outside. It's not practical to forbid that when everything runs on 5 GHz. You're going to have to have some outdoor Wi-Fi.
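The DFS reaction just described can be sketched as a simple channel-vacating routine. The channel numbers and the 10-second deadline are taken loosely from the talk; nothing here reflects a real driver implementation, and real DFS channel lists vary by country:

```python
# Sketch of the DFS reaction described above: on hearing the radar
# signal, a device on a shared channel must move to a clear channel
# within a deadline (roughly 10 seconds, per the talk).

DFS_CHANNELS = {52, 56, 60, 64}          # channels shared with radar (toy list)

def on_radar_detected(current, candidates, deadline_s=10):
    """Return the channel to move to, preferring non-DFS candidates."""
    if current not in DFS_CHANNELS:
        return current                   # not a shared channel, no move needed
    clear = [c for c in candidates if c not in DFS_CHANNELS]
    if not clear:
        raise RuntimeError("no clear channel available: stop transmitting")
    # Renegotiate with clients and switch within `deadline_s` seconds.
    return clear[0]
```

The key property is the last branch: if nothing clear is available, a compliant device must go silent rather than keep talking over the radar.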
So, if we can't restrict it to indoor use, and we want to make sure people can't turn off DFS near airports, then our only solution is to lock everything down. I'm not sure it's actually the best solution, but that's their logic. You might be wondering, this has got to be a huge problem, right? People must be interfering with this all the time; planes are crashing, terrorists are destroying everything. It turns out there have been 10 cases in 7 years. All involved for-profit companies, AT&T for example, that were breaking the law. And the reason is that in areas near airports, you know DFS is basically going to disqualify a few of these channels all the time. And there are only, I think, 11 channels in the U.S. in 5 GHz; I could be wrong on the exact number. They were losing a bunch of them, but they were selling city-wide Wi-Fi in San Juan, Puerto Rico, and they didn't want to have to put in more routers. They wanted to be able to use all the channels, so they just turned off DFS. They were basically endangering people because they wanted to make more money. These weren't tinkerers. They weren't average people. They weren't people trying to control the routers running their lives. These were for-profit companies acting irresponsibly, and they were fined for it. There aren't many of these cases, though, and most of them could have been avoided by simple UI changes to manufacturer and third-party router firmware to eliminate unintentional violations. One really interesting thing: one router shipped in one of these cases actually had a checkbox that said DFS on or off. Now, I'm going to say that's not reasonable. That's not something that should ship by default. Additionally, there was some third-party open-source firmware that also let people do something similar in the UI.
I would say that's irresponsible as a default. So now we get to problem number two. The UNII rules were supposed to take effect around June of last year. They seem to have been delayed, but it's really vague, because everything at the FCC is really vague, as best I can tell. They don't seem to be enforced, but they're on the books, so they could be enforced at some point. I think I found something saying they were exempting devices being imported into the country; I'm not sure exactly what was covered. It was unclear, and I'm not a lawyer. I've had lawyers explain a lot of this to me, but this was not an area where I got clarification. I do know it has been exempted, at least for a period of time. Problem number two is the NPRM on the eLabel Act, modular transmitters, SDRs, and a ton more. An NPRM is a Notice of Proposed Rulemaking. This is basically when the FCC says, we think there's a problem and we want to make a rule, but we want to talk about it first. People can send their comments, and the FCC can ignore them, and so on. This was a really long NPRM. It came out in August, and they gave you, by default, a month to comment on it. The thing was hundreds of pages long, just ridiculously long, and it covered all kinds of topics. So it's important to talk about definitions. Modular transmitters are approved transmitters that can be added to hardware without requiring approval of the whole device. It's kind of an add-on. The goal, for the FCC and a lot of these companies, is to be able to say: I can just snap this thing in, and I don't have to get my whole device approved anymore, because I know I'm not modifying anything and the transmitter has been approved anyway. The eLabel Act is also in here. It's an act of Congress allowing electronic FCC labels instead of physical ones. Manufacturers have to put these certificates of conformance on boxes.
Sometimes they put them in the box, or in the manual, things like that. Most people ignore them, but they are required for devices. So the goal was, to reduce cost, let's have manufacturers put the label in some sort of UI on the device, if the device has a display. Now, I should have shortened this up, but basically: for devices including modular transmitters that are software-defined radios or use software-controlled radios, manufacturers have to describe the device. They have to state which parties are authorized to make changes, and the software controls that are provided must prevent unauthorized parties from enabling different modes of operation. Manufacturers must go into detail on how the software is secured in the application for equipment authorization. It must include a high-level operational description or flow diagram of the software that controls the radio frequency operating parameters. The applicant must provide an attestation that only permissible modes of operation may be selected by users. That last one seems to be hinting at the UI problem, I think. There's a second part, and right at the beginning it states the problem: manufacturers of any radio, including certified modular transmitters that include a software-defined radio, must take steps to ensure that only software that has been approved for a particular radio can be loaded into that radio. We could go into the details, but it also includes things like digital signatures, basically the same stuff we saw in the UNII rules. So you might be wondering, what's a software-defined radio? Basically, much of the radio logic has been moved into software. This is actually a very vague definition as well, because things that are called software-defined radios were previously not software-defined radios in the FCC's mind, and vice versa. It's very confusing.
SDRs allow complex algorithms that were impractical to do in hardware: reliable transceiving, meaning transmitting and receiving, beamforming, and even DFS to some extent. Hardware can be sold for a wider range of use cases just by changing the software, which is a real benefit, because suddenly things that used to require the electronics to be built a certain way, to do only certain things, can be opened up. And, as it says, a broader range of people can innovate and experiment. The history of software-defined radios: Cory Doctorow has actually talked about this quite a bit; he wrote a blog post on it a few weeks ago. For the last decade, the FCC saw SDRs and was pretty horrified. Instead of educating users and enforcing the law, they wanted to avoid needing to come up with a better plan. They would certify that SDR software doesn't violate the rules, sign all SDR software with an FCC key, and require radios to run only FCC-signed software. As you can imagine, this was not a popular plan, and it was totally impractical, because they could never verify that all of the software actually didn't break any rules. So they came up with a better plan. They told manufacturers to secure the SDRs, but didn't tell them how. They said it was possible for open source software to be used in securing them, but that it would carry a high burden. There were separate approvals for software-defined radios and non-software-defined radios, even though the devices aren't technically all that different in many cases. The approval policies for SDRs were more difficult, though slightly more flexible in what you could do. They're mostly sold as niche products for hams. And from what I can tell, the SDRs aren't actually all that secured.
As I said, I haven't investigated much, but I didn't see any obvious ways that they were. Maybe they are. It's possibly due to the FCC being worried about the lack of a market. So the question is, why are we talking about it? Well, they admit in the NPRM that the SDR market is pretty much doomed and the approval process is just way too difficult right now. So they're going to get rid of the distinction, and then apparently make a bunch of devices meet some of the SDR requirements, including the securing part, the part that was too difficult and didn't succeed in the market. I'm not sure why they feel this is a good idea, but they do. Then the eLabel Act. This is another dangerous part. It allows manufacturers to show a certificate of conformance on a display instead of on a piece of paper. The rule proposal reads: the necessary label information must be programmed by the responsible party and must be secured in such a manner that third parties cannot modify it. Now, this is potentially not as dangerous as the other parts of the NPRM, but it's still very vague, because the question is: what does it mean to be secured so that third parties cannot modify it? If it literally means we can write this onto a write-once ROM or flash, something nobody can modify, and you don't have to ensure it's actually displayed on the device, then this isn't very harmful. It's perfectly fine. But if they want to enforce it to mean that, say, when you go into the system preferences on your phone, your certificate of conformance always has to be there, well, how the heck do we do that and still allow people to replace the firmware on their phone? Because if people can replace the firmware, they could make the certificate of conformance not display. We're not sure what this means, and again, it's a proposal, so I think they're a little loose with the language.
I'm not sure whether, if it actually became a rule, they would be as unclear about it, but it's possible they would. So anyway, the blog post got big enough that the FCC responded, which was kind of interesting, because I don't think they usually care about these things. They were a little caught off guard. A spokesman said in some of the articles that the policy didn't affect open source operating systems, which is absurd. And a confidential high-ranking FCC official said they felt there was a way to comply and protect open source. As I said, apparently 4,000 people didn't agree with him. I personally suspect that the high-ranking official was the chairman of the FCC, in part because of this next part. There was a blog post by the Chief of the Office of Engineering and Technology titled Securing RF Devices amid Changing Technology. The Office of Engineering and Technology, as I quickly learned, is very much run through the chairman's office, and they've been the ones discussing why this is so vital. They're basically the people who actually understand the technology involved, as best they can, since obviously the people at the FCC are not all engineers and physicists and that kind of thing. Basically, the blog post said: we don't tell you how to secure the radio, and we don't say you can't use free, libre, and open source software images, but you have to secure the radio. And we're not opposed to open source, as long as you can secure the radio. It kind of misses the point. There was a reply period, and the same Chief of the Office of Engineering and Technology wrote another blog post called Clearing the Air on Wi-Fi Software Updates. It was just warm and fuzzy. Basically: we're going to work with these stakeholders, and we changed the UNII guidance to not mention DD-WRT anymore, which they did.
It was very warm and fuzzy sounding, but they didn't actually say they were changing any of the rules in practice, because the U-NII guidance still basically said: you have to explain how you're going to prevent people from modifying the radio and the firmware. So realistically, it sounded really nice, and a lot of people were like, oh, it's very warm and fuzzy, you're going to work with people. And they didn't do anything. So I decided to respond to this on my blog, and as part of that I asked 17 questions that the FCC should have clear answers to before moving forward. I'll discuss some of them as part of the problems with all of these proposals. They have not responded. Maybe they won't respond. By the way, is anyone from the FCC here? Interesting how that is. No, but that was after the NPRM comment period ended. They do have contact with people, and I do know people who have contacted them. My point is really that they project this idea that they're going to be all warm and fuzzy, but they're not actually going to do anything, meet you on your own turf, contact you, or say, hey, you brought up a good point, let's talk about this more and figure out a solution. The NPRM is not really a very good mechanism for coming up with collaborative solutions. I appreciate that. Supposedly they can't have ex parte discussions, though in fact they do. And we did comment; there were a number of comments. The comment period was already done when I brought this up. So let's talk about the workarounds. A lot of people feel like these are workarounds, that we can get around some of these problems. In particular: manufacturers can lock down the entire device. I don't think that's really a workaround, but it's out there. Another idea is running the radio firmware on a coprocessor where root can't touch it, or in some other mechanism where you can't interfere with it. Something like cell phones.
We do that with cell phones. And I say that these are both unacceptable and should be completely condemned. They're terrible ideas, and there are a lot of reasons why. So you might be wondering: what's the problem with lockdown? It takes away control from users and puts manufacturers completely in charge. Manufacturers at this point have to lock down devices, but since they hold the key, they're the only ones who can decide what goes on that router. How often is your router updated? There are security holes. If there's a security hole in your router and you can't modify it, how exactly do you protect yourself? Unplug it? Well, now you've just thrown away $100 or whatever, because your device can't be fixed if the manufacturer chooses not to fix it. There's also unintentional violation by users due to bad hardware. Remember, an unintentional violation is still illegal. And the user has no control over the hardware at this point, so basically we punish people for running hardware that they have no ability to fix. Your best option is to just turn it off, which again is $100 down the drain. There are functionality limits, too. Most radio firmware doesn't actually have very good support for a number of features. One of them is ad hoc Wi-Fi, which is used for mesh networking. It's usually incredibly poorly supported because it's not a very big market; it usually falls to the community to actually improve it over time, and they have. The second problem with lockdown: it ignores that different users have different privileges. Hams, for example, operate under a much different set of rules. At SCALE earlier today, the hams had a Ubiquiti router running at, I think, 3.3 GHz. I could be wrong on that. Only hams can use that frequency; average Wi-Fi users, unlicensed users, can't. This is also, I believe, used by public safety personnel.
They could potentially have it running at a different frequency. And hams in particular, and this is actually a major one, use mesh networks to help with disaster recovery, because you need data coverage over a very large area where, all of a sudden, none of the lines are working. You need to be able to do this across, say, somewhere that's been hit by a tornado or something large like a hurricane. Hams go out of their way to help with this, and it would not be possible if Part 15 devices were restricted to only doing things that unlicensed users can do. Lockdown also ends low-cost wireless radio research. There's a ton of this actually being done, simply because research equipment is extremely expensive and you can just use a router. Why bother with anything else? It works perfectly fine. Research labs were actually some of the people who complained about this, because they said: we can't test whether this device actually works the way we want unless we can modify it. And community members have done things in the past such as finding bugs in radio firmware that were submitted to Qualcomm Atheros. I've also heard of someone who's working on algorithms for reducing transmit power when the quality of the signal is high enough. Basically, this is a way to reduce radio interference, because obviously, if the power is lower, the signal is not going to go as far or penetrate as far, and that increases the quality of the spectrum for everyone. These things would pretty much be eliminated. Lockdown also prevents the use of devices across borders, and this is an issue particularly for service members. There was a US service member who actually submitted a comment and said, basically: I'm going across borders, I'm transferring around the world, and I want to comply with the Wi-Fi requirements, because they're all slightly different in all these countries.
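As a rough illustration of that power-reduction idea: this is my own toy sketch with made-up thresholds and step sizes, not the actual research, but it shows the basic closed loop of backing transmit power off while the link still has quality margin.

```python
# Toy closed-loop power control: step transmit power down while the
# measured SNR stays comfortably above a target, step it back up when
# the link degrades. Lower power means shorter reach and less
# interference for everyone sharing the band.

TARGET_SNR_DB = 25.0               # assumed quality threshold for the link
STEP_DB = 1.0                      # assumed adjustment step
MIN_POWER_DBM, MAX_POWER_DBM = 0.0, 20.0

def adjust_power(tx_power_dbm: float, measured_snr_db: float) -> float:
    """Return the next transmit power given the last SNR measurement."""
    if measured_snr_db > TARGET_SNR_DB + STEP_DB:
        tx_power_dbm -= STEP_DB    # link has margin: back off
    elif measured_snr_db < TARGET_SNR_DB:
        tx_power_dbm += STEP_DB    # link degraded: ramp back up
    return min(MAX_POWER_DBM, max(MIN_POWER_DBM, tx_power_dbm))

# A clean link lets power drift down; a noisy one pushes it back up.
power = 20.0
for snr in [40, 38, 36, 34, 20, 18]:
    power = adjust_power(power, snr)
```

The point is simply that lower transmit power shortens the signal's reach, which is exactly the interference reduction described above; locked-down firmware forecloses this kind of community experimentation.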
As this person said: right now, I can go into the kernel and switch the country code, and then I'm using the correct Wi-Fi frequencies, with DFS where it's required, and all that kind of stuff. Under these rules, that would be impossible. This person would have to buy a router in every country. That makes literally no sense; it's incredibly wasteful. Are we going to just hand routers out to US Army members? What are we going to do? The same goes for business people and anyone else who travels between countries. It's absurd. We don't live in a world where we all just stay in one country and never go anywhere else. We travel all over the world, and you need to be able to make this work. So the question is: why is the FCC doing this? I have suspicions. I don't know if they're all correct; in fact, probably many of them are not. One is that it reduces enforcement costs. I think this is actually a major part of it. As with any federal agency, budgets get cut, and over time they have fewer and fewer people to do enforcement. Additionally, in the summer they closed a number of local enforcement offices. I think it was three; I'm not sure of the exact number, but it illustrates that they want to reduce their enforcement costs. If they know people can't actually modify things, they think they won't have to enforce the rules, because violations just won't happen. That's unrealistic, because obviously people can import things, and we all have all these old routers, so it's a very silly idea. But it potentially could reduce their enforcement costs. Another possibility is that they may want to sell part of the spectrum and need a way to ensure that sale is valuable. Whoever buys it is potentially going to pay billions upon billions of dollars, and they're not going to want to buy something when they think there's going to be tons and tons of interference. And one more thing is happening that I think is related.
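Going back to the country-code point for a moment: the reason switching the code matters is that each regulatory domain permits a different set of channels and imposes DFS on different ones. The table below is a deliberately simplified toy, not authoritative regulatory data; on Linux the real table ships in the wireless-regdb and is selected with `iw reg set <CC>`.

```python
# Toy regulatory table: for each country, which 5 GHz channels are
# allowed and which of those require DFS (radar detection). The entries
# are simplified examples only, not real regulatory data.

REGDB = {
    # country code: {channel: needs_dfs}
    "US": {36: False, 40: False, 52: True, 60: True, 149: False},
    "JP": {36: False, 40: False, 52: True, 60: True},  # no channel 149
    "DE": {36: False, 40: False, 52: True, 60: True, 100: True},
}

def allowed(country: str, channel: int) -> bool:
    return channel in REGDB.get(country, {})

def needs_dfs(country: str, channel: int) -> bool:
    return REGDB.get(country, {}).get(channel, False)

# The same radio, carried across a border, has to change behavior:
print(allowed("US", 149), allowed("JP", 149))  # True False
```

A locked-down router bakes one row of a table like this in forever; the modifiable-firmware approach lets the same hardware follow whichever rules apply wherever it currently is.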
They're not selling it, but related to this is something called unlicensed LTE. There are a number of companies, cell companies in particular, that are concerned that their bandwidth is getting sucked up. They've bought tons and tons of this bandwidth for billions of dollars; it's unbelievably expensive. And they want a way to put some of that traffic onto the unlicensed spectrum, so they're trying to get approval for something called unlicensed LTE: basically, putting LTE data transfer on the unlicensed part of the spectrum. They claim it wouldn't interfere with Wi-Fi. I don't know if that's true, and honestly I'm probably not qualified to tell either way. But there obviously is competition for the spectrum. I also don't think the FCC trusts individuals. I think they fundamentally view innovation, and particularly experimentation, on some level as something that companies do. They feel the market should solve this problem in some way. They like high-tech solutions to social problems. I've had a lot of people say: well, there's not enough of this spectrum out there; wouldn't it be great if we could just make sure people didn't break the rules, so we could fit everything in better? I think another way of handling that would be to actually teach people the rules and punish those who break them, but that's a different opinion. And I think they want the regulatory world from before software-defined radios. Before software-defined radios, it was really easy: this was a Part 15 device, it could only do this, and there was no way around it. The hardware didn't allow it; you would have had to solder things. They view users of Part 15 unlicensed devices as consumers, ham radio operators as experimenters and tinkerers, and companies as innovators. And that is not the world we live in.
It is not in any way the world we live in. Each one of those people may act in any of the other roles. They may innovate. They may experiment. They may consume. It's not a straightforward "you're in this box and that's all you do." It's far more complex than that, and I feel that they either don't appreciate that or simply don't care. So the question is: what is the solution? They should work with manufacturers to make sure that modification of radio parameters actually requires reflashing. I think that's an appropriate idea, in that the defaults should not let you break the rules. If you flash something, it's on you. And if you're flashing a new piece of firmware, it should warn you: you could break the law doing this, this is very serious, there are severe punishments, make sure you know what you're doing. It's a buyer-beware concept, like how in some cases you lose the warranty when you flash your phone. They could also work with the free software community to make sure default UIs aren't dangerous. Remember the firmware that allowed you to just click a button to turn off DFS or use frequencies you're not supposed to use? Make sure the defaults aren't dangerous. Make sure people actually have to go to the effort of recompiling if they want to break the rules. Another thing: the quality of the radio firmware that most hardware companies release is very low, and we have no ability to trust much of it. In many cases, we don't know whether it's actually breaking the FCC's rules, because it's not really audited very much. The FCC should require the release of radio firmware source code.
That's a big ask, but ultimately, given the world we're living in, where we have IoT devices, lights that can do all these things, and your thermostat, we need some level of trust that this stuff actually works. Now, I would argue that radio firmware should be open source, or at least source-viewable. I would like it to be open source, but I understand that the government probably cannot mandate that, because it pushes the market in a certain direction. They may need to get some additional authority, but requiring the release of the source code so it can be audited is at least a step in the right direction. This next one is probably going to be controversial: I think hams should work more on protecting the spectrum for everyone, and vice versa. That's why I said it was going to be controversial. In the ARRL's filing, and in almost all the replies from hams that I saw, the argument was: protect the right of hams to modify devices. I can understand why hams are doing that. At the same time, I would argue that we're all in this together. Ultimately, there are very powerful people who want to start taking away spectrum. If they start taking it away from unlicensed users, they're going to try to take it away from licensed users too. It'll probably be tougher, because hams are extremely well organized, but I think this needs to be an extremely collaborative approach. This is not a criticism of hams so much as: let's look outside the box at how this problem is affecting all of us. I'm actually a licensed ham myself. I have my technician license. I don't think I've ever used it, honestly, but I am a ham. Additionally, there should be a collaborative campaign to discourage inappropriate usage. If this is a problem, there are ways to discourage people from doing it. You can teach them that it's actually harmful to turn off DFS. You can do things along those lines.
We have educated people not to do dangerous things in our society for many, many years. One example I've heard is drunk driving: drunk driving is far less common than it used to be, simply because as a society we've come to consider it outside the norm. Now, this is a somewhat different situation, but there are things we can learn from it, and that could be a practical solution. Additionally, there should be fair, firm punishment for those who break the rules, particularly if they endanger others or do it for profit. I don't think there's any doubt that that should happen. No, I don't think it is. The FCC is allowed to fine you if you use a device in a way that is against the rules, even in the unlicensed spectrum. For example, they did that in the AT&T case, where DFS was turned off, because that is illegal even in the unlicensed spectrum. They are allowed to impose fines; they're simply not doing it very often, which indicates to me that either they don't care about the problem, or it's not happening that often. They're not listening. They're literally not listening to this. I agree with you that they did extend the comment period, and that indicates there was a problem. The issue is that even when they finished that and had the reply period, they stayed in exactly the same spot they started in, realistically. And this isn't a ham-or-not-ham thing. They're simply protecting the spectrum, which, to be quite honest, is what the FCC exists to do: protect it for people. This is a political process. The second blog post was not really a response to the first; I was being somewhat sarcastic on that point. The point is that there is a very large problem here, because there isn't any collaboration actually happening. It is not happening. If there were collaboration, there would be discussions happening.
Now, I understand that that's difficult in the case of the FCC. However, they did not have any problem going to CES and talking and taking pictures with the T-Mobile CEO; it was on his Twitter feed. They're not coming here to take pictures. Yep, absolutely. We could also create better tools for the community to police the spectrum; this is actually Cory Doctorow's proposal. I don't know how feasible it is, but obviously we all have devices with radios in them right now. Could we choose to install software to report and triangulate where lawbreakers actually are? I don't know the feasibility of that. I know you can do some level of triangulation. It's something to consider. Again, the point is ending the forcing of people and devices into these regulatory boxes. It's not as simple as this person does this thing and that person does that thing. It's a lot more complex. That doesn't mean we should just say, well, don't regulate the spectrum. No, you absolutely have to regulate the spectrum to protect it for everyone. But across these categories, people use devices in unique and valuable ways. That is part of what American innovation is, and the FCC needs to understand and appreciate that as part of its rules. I would agree with you. My point is simply that when I said there's a set of things, one of them is making it so that people don't commit these unintentional violations; discouraging the unintentional violation, because that is a serious issue for the FCC. One thing we can do is make it so that if somebody is violating the rules, it's very likely they are violating them intentionally. What are the ramifications otherwise? Lockdown. I mean, that's a possibility. I don't know where it goes. Maybe. One thing that I think is important, and this is probably partly just the fact that rulemaking in the federal bureaucracy is very complex, is the lack of clarity around the U-NII router rule.
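For what it's worth, the triangulation piece of that proposal is mathematically simple; the hard part is getting decent distance estimates out of noisy signal-strength readings. Here's a toy 2-D trilateration sketch under the assumption of ideal range measurements (which real RSSI-based ranging never gives you):

```python
# Toy 2-D trilateration: three listeners at known positions each
# estimate their distance to a transmitter; subtracting the circle
# equations pairwise gives a 2x2 linear system for its position.

def trilaterate(p1, d1, p2, d2, p3, d3):
    """Locate a transmitter from three (position, distance) reports."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("receivers are collinear; position is ambiguous")
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

# A transmitter at (3, 4) seen by receivers at three corners of a square:
print(trilaterate((0, 0), 5.0,
                  (10, 0), (49 + 16) ** 0.5,
                  (0, 10), (9 + 36) ** 0.5))  # ≈ (3.0, 4.0)
```

With noisy ranges you'd use more than three receivers and least squares, but the sketch shows why "some level of triangulation" is genuinely doable with commodity radios.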
There are these cases where companies are locking devices down and it's unclear why they're doing it. Sometimes they claim it's for business purposes; other times we're not sure. The truth is, because of the vagueness of the rules, and simply because you need a lot of experience to understand how this all works (I've been working on this for parts of six months and still don't understand some of it), it's unclear what the rules actually are in the case of routers. This is another side of the same problem. Additionally, it's important not to discount basic software freedom: you should be able to control your own device. Now, you could break the law or a rule with that device, and if you do, you face the consequences of whatever that is. But we don't prevent it preemptively. Lots of people could drive drunk today, for example, but we don't require everyone to have a breathalyzer in their car, because we feel there is some level of user responsibility. I believe this case is similar. Yes? I agree with you. I would agree with you. To be fair, I have been told that if you operate on the DFS frequencies that terminal Doppler weather radar uses, and you do it outside your house, the radar is apparently extremely sensitive. I don't know the details. But again, that is a problem only for people who turn off DFS and live within a certain distance of one of these airports, and only the airports in some 50 cities. It's an extremely rare occurrence, effectively, and the idea that it is sufficient to take away all ability to control your router, that there is massive danger, is simply not true. To be quite honest, and I don't want to be glib, but if terminal Doppler weather radar is that sensitive, then why haven't terrorists attacked it in other ways?
It's a silly concept to me that it is somehow so dangerous. As a translation, for those of us who don't really understand this: yes, it's very rare. Is that the correct understanding? Almost impossible, I agree. There is certainly a fear of that. The idea that this is an attempt to actually do something else? I agree. Admittedly, there are a lot of potentially less-than-open reasons they may be doing this, and we could come up with them. I'm discussing the reason they gave us, and I'm saying that it is absurd, and that whatever reason they're coming up with, this is bad for a ton of other reasons and we need to oppose it in every way. A lot of it is, admittedly, exactly that; I agree, but on some level we have to respond to what they are saying as best we can, and we have, numerous times. Again, that was mostly sarcastic, and again, this is a political debate. It is a political debate. There were a number of comments and replies, 4,000 of them; I provided one. We did, and they responded with: we're keeping it basically the same. I agree, and I'm saying this is a long-term political effort, the same kind of effort that pushed them to re-evaluate the net neutrality rules and come up with broader ones. There are lots of different sides to this. I agree, and in the ideal situation we'd have somebody in Congress say this is a terrible idea and pass a law. That's probably not feasible, but we'll try, and there's a lot of other effort we can put into this. In fairness, we are past our time. I'm happy to continue discussing it, since there's nothing else; I just want to tell people that we are past our time, but I'm happy to continue discussing it. Thank you. Thank you.