Hi, this is Brock Palin and this is RCE. You can find us online, with all of our old shows, an RSS feed, and an iTunes link, at RCE-cast.com. Also, head on over to iTunes and give us a couple of reviews over there. It definitely helps out the show and lets other people know what we're doing here. Also, I have Jeff Squires, who is back from a one-time-ever hiatus, helping me out again. Jeff, thanks again for your time.

Brock, yeah. Let's just say I was on vacation to Barbados or something exciting like that. That makes a better story, but glad to be back, and today I think we have a pretty good, pretty cool topic. It's a little departure from our normal HPC, Linux-y kinds of stuff, but it's a very related topic. It strays into the whole I/O field, and that is a big deal for all of us. So that's something we're talking about today.

Yeah, we're moving into... I mean, when a lot of people in our industry talk about big data, these guys definitely have big data, but it's a different kind of approach. So let me go ahead and introduce our guest today. Our guest today is Scott Long from Netflix, and we're going to be focusing on something new that Netflix is doing to try to make their service better. So Scott, maybe take a moment to introduce yourself.

Yeah, hi Brock, thanks a lot, and thank you Jeff also. My name is Scott Long and I work at Netflix. I've been there for about eight months; previously I was at Yahoo for about five years working on their operating system infrastructure. And prior to that, I spent many years at places like Adaptec and other storage companies, specializing in basically drivers, storage infrastructure, low-level stuff like that. Outside of work, I've been a very active member of the FreeBSD community for about 12 or 13 years now; I was the release engineer for a number of years. Prior to that I actually attended the University of Michigan briefly in the early 90s, and I now have a degree in, of all things, aviation science, but I still do computers as my day job. So that's about it for me for background.

Okay, well, thank you. Being a U of M person myself, it was actually when you came out and gave this exact same bit of material to a number of students and faculty here at the University of Michigan that I got the idea: hey, this fits right into what a lot of people who listen to our show would be interested in.

Yeah, and I would add too that you're in good company, because Brock, you're like a nuclear physicist or something?

I studied nuclear engineering, and yeah, I was looking for a summer job and started swapping CPUs in old burnt-out clusters, and here I've been ever since.

Yeah, I have a BA in English, can you believe it?

Very nice. I was an airline pilot for about six months last year but gave it up.

Oh wow. Alright, so we're coming from quite the diverse backgrounds, but that brings different perspectives to computing, so this is a good thing. Okay, so on to the actual thing we're talking about. Netflix is known for DVDs in the mail; we're going to be focusing on the streaming service, which has been a growing part of the business. Can you tell us a little bit about Netflix streaming and some of the challenges you guys have?
So yeah, as you said, Netflix's original business was DVD by mail, and part of what made Netflix so good at that was understanding the data science of it: it's not just about putting a DVD into the mail, it's also about knowing how many DVDs to inventory and how to pre-stage those DVDs throughout the country into distribution centers so they could quickly get to customers. That was all very much data science and data analysis, and that's what they got very good at over time. About five years ago it was becoming very clear that the DVD business would not last forever, obviously, and it was time to get into streaming, so they took some of their data science knowledge and applied it to streaming: licensing the same content that they were getting on physical media, hosting it in the same Amazon cloud services that they were using for their DVD data science, and getting it out to customers. It's been a growing thing ever since, and it's been growing so much that we're actually trying to take some new directions with it. Instead of having everything be hosted in Amazon Web Services and distributed over content delivery networks like Akamai and Level 3 (I'll talk more about those in a little bit), we're actually starting to take a lot of that in-house in order to help ourselves grow more and optimize for better service and a better customer experience.

So how much data are we talking about here? I mean, video is kind of the next-generation internet. My company (I work for Cisco) has made big bets on video and has made a lot of predictions on how much video there is and where it's driving the network and things like that. But how much data do you guys stream, say, per day or per month, something like that?

I actually can't give a good numerical value. But what I can tell you is that we are over 33% of the internet's traffic during peak times.

So 30% of a ginormous number is basically what you're saying.

Yes, yes. Yeah. So to put that into a little bit of perspective, the old saying was that pornography is actually what drove the adoption of many technologies over the years: from VCRs to DVDs to the internet, from dial-up to DSL to cable, all that kind of stuff. Movie streaming, in terms of things like YouTube and Netflix and Hulu, has actually eclipsed that original segment by a large factor now, and the TV shows and movies that we're streaming, when you add up all sorts of data, make up over 50% of the internet, whereas pornography is now just a small fraction. So that gives you an idea of how much things have grown just in the last five years, and now services like Netflix, and Netflix especially, are really what's driving technology.

So if I've got a device at home and I've got a Netflix streaming account, what exactly is the process? You mentioned Amazon Web Services in there, so where do the bits, where does all the hardware, where does stuff come from?

All right, so there are really two sources for what is going on. The first source is that when you bring up your client or your web browser to go to Netflix to choose a movie, that's going to go out to Amazon Web Services. Those services might be on the east coast or on the west coast or in some data center in between. But you'll go there and you'll pull down the selection of movies, and you'll be able to browse through and even see some clips, all that kind of stuff. That's all coming from Amazon. But the next big step is when you actually go to play a movie.
And the first thing that happens there, when you select a movie, is you go back to Amazon Web Services to exchange some keys. It's basically encryption-key type of stuff.

All right, so along those lines, you talk about... well, wait a minute, actually, I'm sorry, Brock, I'm jumping in in the middle. Did you say anything about how you pick which Amazon center you end up coming from?

So which center you come from is really based on your geography. You know, the big networks out there have a big database of everyone's ISP and approximately what geographical location they're coming from. So, you know, it's based on DNS, based on IP address, that kind of stuff. So you get directed to the closest reasonable Amazon center for your geography.

All right, interesting. So you're basing it off locality. Do you also base it off what kind of network they're coming in on, so, like, mobile versus broadband and these kinds of things? And do you pre-stage different movie formats and resolutions and whatnot to match?

When it's just getting the movie started, no. But the next step for playing the movie, like I said, is you actually hit the play button, you exchange some keys with Amazon, and then Amazon sends your client a list of servers that it thinks you should contact in order to start getting the bits. That list of servers is based on your geography. It might be based on your client. It might be based on what kind of network you're on, whether it's a cell network or DSL or a cable modem. There's a lot of decision making that goes into that. But in the end, you get a list of servers and basically a list of files, because all the streaming is just pulling down HTTP content. So yeah, and then you start streaming.

OK, and one of the big things that you're doing now, and that we're specifically talking about today, is that you're pushing some of this content even lower, right? So you're not just doing everything in the Amazon cloud now. You're pushing it down to the ISP level.

Right. So there are a couple of reasons for that. The first one was that, you know, while we had this great network of third-party content delivery networks that were delivering the content to the local regions, the ISPs were still complaining that they were being killed by our traffic; you know, especially for some smaller ISPs, a huge percentage of their traffic was Netflix, and it was costing them a bundle to link up to the upstream where the content was. So we started having this idea that maybe it would make sense to try to get the data closer into the ISPs, especially ones where there's enough traffic for it to really make sense, for it to be mutually beneficial.

OK, so you've gone from having no data centers and running everything in some sort of other, you know, operator's service, to putting actual physical boxes with disks in them in local ISPs all around the country, and, I understand, heavily in Europe also. What's the makeup of these boxes and what do they really do?

So the boxes can be seen as kind of a building-block type of thing. One box can hold about 10 percent of our library, and we'll place one or more boxes into an ISP or into a larger regional data center based on our traffic and our content needs.
You know, the boxes can be built out horizontally to serve more clients the same data, or they can be built up vertically to serve the whole library to a given set of clients, or they can be built out in both directions to make a big super-cluster that can serve a whole region.

So what's the hardware makeup? Let's say I'm, you know, a tiny, relatively local ISP in the Midwest of the United States. What kind of box would you deploy there, like what kind of storage and things like that?

So we really have one kind of box. We're working on a second one, but I can't talk a whole lot about that right now. Basically, the main box that we're deploying right now is a PC box. It's a 19-inch chassis, fairly customized, because we want to be able to fit into small data centers and telco data centers, that kind of stuff. So it's not very deep, it's very compact. It's about 4U tall and it holds 42 drives: 36 spinning drives and six SSDs.

And how much storage total are we talking here?

We're talking about 120 to 130 terabytes of storage.

All right. And do you use redundancy or RAID or anything like that? Or if the box crashes, it's gone?

So we do have a mirrored boot volume; you know, obviously you don't want to lose the boot volume and have the whole machine be offline. But beyond that, we actually don't mirror or stripe or anything like that at all. Each disk acts as its own independent file system. And we do that very much for reliability. We don't want to have one disk go down and bring down a whole stripe of disks with it. At the same time, we don't want to pay the storage overhead of redundancy, because like I said, this is supposed to be a very compact box; we're supposed to get as many bits into it as possible, and with mirroring every drive or, you know, striping parity across a set of drives, we're losing capacity.

So do you have these boxes phoning home, and you have people running around the country swapping out disks in them all the time as they fail? Or does the cache just keep getting smaller and smaller and smaller until you decide it's time to ship them a new box?

It's pretty much that the cache keeps on getting smaller and smaller and smaller. We do have an operations team that monitors the performance of the caches. Likewise, there are algorithms in Amazon that are also automatically monitoring the health of the box. But really what happens is that if a disk dies, that disk happens to hold, you know, X number of movies, and when that disk dies, those movies are no longer available from that cache. So your client then goes and finds another machine that does have those movies.

OK, so how is a client deciding which cache to talk to? Is it just like a DNS thing, or is your control backplane actually telling it, hey, talk to this thing first?

Yeah, so it gets the list from the control backplane. The backplane is monitoring the health of all the machines; like I said, there are those algorithms going on, they're constantly monitoring. So, yeah, that control backplane just always has a list of candidate machines that it sends out to the clients, and the clients just go through that list. And if they run up to the end of the list, they go back to the backplane and ask for a new list.
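To make the client side of that concrete, here is a minimal sketch (not Netflix's actual client code) of what "go through the list, then ask for a new one" could look like. The URL, function names, and JSON fields are hypothetical; the only things taken from the conversation are that the control plane hands out a ranked list of cache servers and that playback is plain HTTP fetching with fallback.

```python
import requests  # assumed HTTP client; any would do

CONTROL_PLANE = "https://controlplane.example.com"  # hypothetical endpoint


def get_server_list(title_id):
    """Ask the control plane for a ranked list of cache servers for this title."""
    resp = requests.get(f"{CONTROL_PLANE}/servers",
                        params={"title": title_id}, timeout=5)
    resp.raise_for_status()
    return resp.json()["servers"]  # e.g. ["https://cache1.isp.example", ...]


def fetch_segment(title_id, segment_path, max_rounds=3):
    """Try each candidate cache in order; refresh the list if all of them fail."""
    for _ in range(max_rounds):
        for server in get_server_list(title_id):
            try:
                r = requests.get(f"{server}/{segment_path}", timeout=5)
                if r.status_code == 200:
                    return r.content  # got the bits from this cache
            except requests.RequestException:
                continue  # cache unreachable; try the next one on the list
        # ran off the end of the list: loop around and ask the backplane again
    raise RuntimeError("no cache could serve the segment")
```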
So how do you decide what to cache and where? Is it very much based on demand, so if, you know, 20 people are asking for a movie, you'll cache it down there? Or do you ever let some go upstream, or do you always try to cache? What kind of strategies do you use?

It depends on the size of the deployment. Like I said, some ISPs might only get one box, and that one box will cache only about 10 percent of our library, but it'll be able to handle about 80 percent of the traffic, because it's the whole popularity-versus-long-tail type of problem. So that's part of our survey when we talk to an ISP: seeing how much traffic they have and what their mix is. The other part of it is that, once again, data science in Amazon will sit and look at the popularity of content on a day-by-day basis and will shuffle the content around on each box at night during a fill window. So new movies can get brought in, unpopular movies get deleted, and we just shift things around that way based on what we think is going to be popular the next day.

Here's a non-technical question. If you're preloading movies onto boxes in different localities, potentially even different countries with different laws and copyright issues and things like that, are there different legal issues about where that movie can come from, or what format it's in, or things like that, that all have to be taken into account by your control backplane?

Not in real time, so much as when we go into a country we work all that stuff out ahead of time. So it's mostly in licensing with the movie studios and with the studio representatives in each country, you know, exactly what those parameters are going to be. That's why it's hard to expand very quickly: it's not just a hardware issue, it's also that we have to go to every single country and work out its laws, work out its licensing restrictions, and work with its representatives from the movie studios. Luckily, places like Scandinavia have been very receptive and relatively easy to work with, but there may be other countries that aren't as receptive and aren't as easy to work with, and we won't show up there as soon.

So you mentioned that a single one of these 4U caches holds about 10 percent of your library right now, and I assume that's before the Ultra HD or 4K or whatever it is you guys just announced, which I want to touch on later. That doesn't tell me about the IOPS kind of needs: how many actual people demanding streams can one of these things serve?

So that is part of what I do, optimizing that right now. One box can serve, we'll say, about 10,000 customers at one time, 10,000 streams at one time. But there's a little bit of variability in it because of basically different clients and different bit rates. Your streaming to your iPhone or your iPad is going to be at a lower bit rate than someone going to their desktop PC. So really what we think about more is in terms of maximum bandwidth, and right now these machines have two 10-gigabit Ethernet links on them. We try to fill those links up, and that might be 10,000 customers, it might be 15,000 customers if they're all lower-bandwidth customers.

Okay, so is that the only metric? Like, we say a cache can only push this much out its 10-gig interfaces, and when it gets to 90% of that, we stop sending clients to that cache?

It's a little more complicated than that. The control backplane is constantly looking at the health of the cache, and by health I mean it's looking at things like latency: latency on the network interface, latency on the disk interface. Also, each client that's pulling the movies is phoning back to the backplane and telling it how good a quality of bits it's getting from its server. So between all those metrics and all that monitoring, the backplane starts to make decisions: it'll see that a server is starting to get overloaded, that it's starting to drop packets or its service latency is getting too high, and it'll start to shift traffic away from that server and try to rebalance away from it.
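A rough sketch of how a control plane could fold those signals together; this is an illustration, not Netflix's actual algorithm. It assumes each cache reports network and disk latency plus packet drops, and that clients report playback quality; the thresholds, field names, and scoring are made up.

```python
from dataclasses import dataclass


@dataclass
class CacheHealth:
    net_latency_ms: float   # latency seen on the network interface
    disk_latency_ms: float  # latency seen on the disk interface
    drop_rate: float        # fraction of packets dropped
    client_quality: float   # 0..1 aggregate of quality reports phoned home by clients


def health_score(h: CacheHealth) -> float:
    """Collapse the monitored metrics into a single score; higher is healthier."""
    score = h.client_quality
    score -= min(h.net_latency_ms / 100.0, 1.0) * 0.3   # penalize network latency
    score -= min(h.disk_latency_ms / 50.0, 1.0) * 0.3   # penalize disk latency
    score -= min(h.drop_rate * 10.0, 1.0) * 0.4         # penalize packet drops heavily
    return score


def rank_candidates(caches: dict, min_score: float = 0.3) -> list:
    """Build the candidate list handed to clients: healthy caches first,
    overloaded ones left off entirely so traffic rebalances away from them."""
    scored = {name: health_score(h) for name, h in caches.items()}
    return sorted((n for n, s in scored.items() if s >= min_score),
                  key=lambda n: scored[n], reverse=True)
```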
Now, you mentioned that you have a bunch of spindles in the machine and you have some SSDs. What do you use the different technologies for?

So the spindles are all three- and four-terabyte drives, and, you know, obviously as hard drives get bigger we'll be adopting those to get as much storage as we can. They're really for holding the majority of our content. The SSDs are really for getting more IOPS out of the box. The SSDs just mirror what's on the hard drives, but they mirror, you know, the super-hot content, the really popular stuff that we expect to have a lot of streams going off of at once.

And for that level of decision inside the box, spinning rust versus, you know, flash disk, is that also predetermined ahead of time and placed onto there, or does the box actually make a decision, "I'm going to start moving this up to SSD"?

When the box gets its manifest from the backplane of what movies to get, that manifest is also sorted by popularity. So the box will take a look at how much space it has on the SSDs (because, you know, an SSD might fail too) and it'll shift stuff over on its own based on what it sees in the manifest.

So we've talked about the hardware in here: you've got two 10-gig links, you've got a bunch of spindles, you've got some SSDs, and what it holds locally is being pushed to it by the backplane. What software does the actual appliance run?

So our operating system is FreeBSD, and we use a web server called nginx to push out all the bits. All of our streaming is actually HTTP streaming; it's not a special protocol, and we just use pretty much a standard web server. We chose nginx.

So why do you use nginx? Why not the canonical Apache?

nginx is well known for being very lightweight. It doesn't have all the baggage that Apache has grown over the years, but it's also very asynchronous. We can have thousands of active requests going on at once without having to have thousands of threads backing those requests. So that was really the big thing. You know, we knew that we'd have tens of thousands of clients at once, and since it's all static content we didn't need all the fancy Apache modules either. So it really made sense from that standpoint.
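Going back to the manifest-driven placement described a couple of exchanges up, here is a small, purely hypothetical illustration of how a box might decide which titles to mirror onto the SSDs. It assumes the manifest arrives already sorted by popularity and that SSD capacity is filled greedily from the top; the real appliance logic isn't public.

```python
def pick_ssd_titles(manifest, ssd_capacity_bytes):
    """Greedily mirror the most popular titles onto the SSD tier.

    manifest: list of (title_id, size_bytes) already sorted most-popular-first,
              as delivered by the backplane.
    Returns the set of title_ids that should live on SSD as well as on disk.
    """
    on_ssd = set()
    free = ssd_capacity_bytes
    for title_id, size in manifest:
        if size <= free:          # hot title fits: mirror it to flash
            on_ssd.add(title_id)
            free -= size
        # titles that don't fit stay on the spinning drives only
    return on_ssd


# Example: hypothetical manifest sorted by predicted popularity, 8 GB of SSD
manifest = [("kids-show-s01e01", 3_000_000_000),
            ("blockbuster-x", 5_000_000_000),
            ("long-tail-documentary", 4_000_000_000)]
print(pick_ssd_titles(manifest, ssd_capacity_bytes=8_000_000_000))
```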
All right, now you mentioned earlier that a machine looks at its manifest and whatnot, so let me dive a little deeper on that. What do you ship to an ISP? Do you ship something preloaded, or do you ship basically an empty box with some kind of unique identifier, and when it phones home, you know, the backplane says, oh, here's the 10 percent of the database you're supposed to have? What's the protocol for bringing up a new machine once it's physically colocated?

We really just pretty much ship it to the ISP. We rely on them to install it, and we try to make everything as easy to install as possible. It's just a matter of plugging in power, plugging in the network, and then bringing up a login console locally to set things like the IP address and routing parameters. And then once it's up, yeah, the machine phones home, we configure it with its Netflix cache ID, and then it starts filling. One thing to keep in mind, though, is that this isn't just going to ISPs. This is also going into major data centers too, and for the major data center stuff, we're actually the ones that own those machines and configure them, and, you know, we've talked a little about trying to preload those. But yeah, for the small ISPs, it's very much that we ask the ISP just to unbox it, put it in a rack, and plug it in, and they're done.

Now, when you say major data center, what exactly do you mean by that? Like a Time Warner, where they serve an ISP of a ten-state area, or what?

Well, we would still consider those to be ISPs. What we mean by major data centers is the major network concentration points around the country, you know, the old MAE-West and MAE-East, in San Jose and in New York; data centers where a lot of bandwidth comes together, aggregates, and goes back out. So we're actually buying rack space in those data centers, at those big exchange points, and building huge clusters of machines that we own and operate ourselves, and then we are offering peering out to ISPs: ISPs that maybe are too small to justify us giving them a box, or ISPs that for whatever reason don't want a box. So we basically peer the bandwidth between us at those peering points and those other ISPs, and control everything ourselves.

So what kind of real savings are we talking about here for these, you know, last-mile network providers? What does one of these boxes really equate to for them?

Um, we offer the boxes for free to the ISPs that we give them to, so it costs them nothing. And we tell them that, you know, with an embedded box in their data center they could save 80% of their traffic, or if they have multiple boxes, maybe 90% of their traffic. And when you talk about us being a third of their overall traffic, that's a significant amount of savings for them. For the other kinds of installations, where the machine is in one of our data centers, in one of our racks, and we peer with them, once again it's a pretty similar savings, in that we're partnering on the bandwidth costs between us and them. So the bits that we send down and the bits that they ask for from us pretty much even out, and they're going to get a very significant savings from partnering with us.

Okay, so what about the challenge of this: you went from having no servers, no data centers, everything hosted by a third party, where it's really just your software and your licensing deals, to having machines you own around the country in multiple places, and then machines that you're kind of responsible for, for security updates and everything else, hosted in third-party places you don't have access to. What have been the challenges of dealing with this?

So, um, the big challenge, when it comes especially to those embedded caches that we don't own, that we give to the ISPs, is making sure that we have some sort of a reliable back channel so that if a machine does go down, we can get into it.
And that's actually been a big topic recently that we've been working on: making sure that if the network link goes down, the other link is available on the second port, or, if an ISP only has one port hooked up (they only want one 10-gig port hooked up), trying to figure out what we do if that one link goes down. So that is a bit of a challenge. Part of our job is to make the machines as reliable as possible, so that if one does crash it comes back up, but obviously we don't want it to crash at all. That's also why we do have an operations team that is monitoring the performance of all the machines all the time, so that we can be predictive about when a machine might have problems, so we can take it offline for that ISP, or we can alert them and tell them that, you know, we need them to do something for us.

Now, how distributed is the control of your machines? Let's say, here's a fictitious but probably not unrealistic scenario: some ISP or data center has two or three or more of your caching boxes, and for some reason the ISP goes off the air, right? They lose their general internet connectivity and your machines lose connectivity to the mothership. Can they still function? Do they coordinate with each other at all, these kinds of things?

They can't, actually. They really need to be able to talk to the backplane all the time. And if they do lose connectivity to the backplane, then presumably the whole ISP has lost connectivity; you know, there's been some sort of network outage. Which means that if the clients can't talk to the backplane, then they're not going to be getting the decryption keys to watch the movies anyway. So it's kind of a done deal. If you're watching a movie and in the middle of watching that movie the backplane goes down, I think you can still watch to the end of the movie, but you can't start anything new.

So are the movies held on the cache, or even in, you know, you could say your original source for all the content, as one file per movie, per bit rate, per format? Or are they broken into multiple pieces? Are we talking about very large files or very small files? What really happens here?

So for the movies, each video bit rate has a different file, and each video format has a different file. So your iPod and your Android tablets will have one certain kind of format that's different from your PCs or your PS3s or your Wii, and a lot of that's driven by the technology of those different platforms, and some of it relates to the bit rate. So what it comes down to is that each movie might have as many as seven or ten different video files associated with it, and your client picks the one that's most appropriate for it. The audio also gets stored as a separate file. And things like previews (if you want to scan ahead and see screen captures of what's coming up, or scan back) are stored as a separate file also.

Okay, so then it sounds like the content caches, as you put those out, are kind of infinitely scalable as long as there's enough network bandwidth there: you just keep plopping down more and you've got more capacity. But everything's got to be talking back to this backplane, and I assume the backplane's got to do the check to see whether you can actually watch this and everything else. Is the backplane actually the thing that you have to worry about scaling more than anything else?
Scaling, not so much, because the great thing about Amazon is that we can always put more instances of machines online, and our protocol has always been designed to be very scalable in that respect. So, you know, as our customer base grows we just put more instances of backplane machines online in AWS and everything works. Where we do have problems is when we have outages from Amazon, and that's something that, you know, happened right before Christmas and happened a few months prior. That's something that we've become very sensitive to, and we're working to build more resiliency into that, basically by working better with Amazon to make sure that doesn't happen.

So you've got all these machines all around. What software do you use to manage them? Just to manage, you know, putting images out there and the day-to-day system administration of things. Do you use any popular software for that, or is this all a homegrown solution?

It's all very much homegrown. When it comes to data analysis and visualization, there are some packages that we use that, I apologize, I can't think of the names of off the top of my head, but that's just not my area. But when it comes to deploying new images or upgrading an OS, that's all homegrown tools.

Okay, so are there more plans? You mentioned you were working on a second box, and you don't have a whole lot to say about that, but what are the future things for trying to improve the Netflix streaming service? This Ultra HD thing that will, I guess, require one of these caches: am I correct in the way I was reading that?

That is correct. So our plan is always to get bigger and do it more efficiently. As we talked about at the beginning, all the streaming originally came from the content delivery networks; the places like Akamai and Level 3 that cache a lot of data from places like Google or Yahoo or Facebook also cache our data. So we are actually starting to move off of those, not just to help all the smaller ISPs but also to help our own scalability, and the work for the foreseeable future is going to be making our own network faster, more reliable, cheaper, all that kind of stuff.

So the super-high-def stuff, how is that going to impact these caches, either the amount of content they can hold, or how many people they can service, or the network connections?

Yeah, you pretty much touched on it right there with the question: the video files are going to get bigger. Now, as we re-encode, we're also moving to the H.265 codec from H.264, so that's giving us better compression with better quality at the same time. So going to the super-high-def isn't quite as much of a bandwidth hog as it might seem, but it still does increase our bandwidth overhead. So yeah, over time one cache may only hold 8% of our library instead of 10%, or even less than that, so we've got to be able to scale to have more machines out there to serve the same customer base. And likewise, for higher bandwidth, we're looking to be able to expand beyond two ports on a box to three or four ports on a box, and have that one box be able to serve, you know, maybe a similar number of customers at higher bandwidth, or some give and take along those lines.
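As a rough back-of-the-envelope illustration of that trade-off (all the numbers here are assumed, not Netflix's): a fixed-capacity box holds a smaller fraction of the library as average encode size grows, while the number of concurrent streams is bounded by the port bandwidth divided by the average stream bit rate.

```python
# Back-of-the-envelope capacity math with assumed numbers (not Netflix figures).

box_capacity_tb = 120     # usable storage on one appliance
library_size_tb = 1200    # hypothetical total library size at today's encodes
ports = 2                 # 10 GbE ports per box
avg_stream_mbps = 2.0     # assumed average stream bit rate

fraction_of_library = box_capacity_tb / library_size_tb
max_streams = ports * 10_000 / avg_stream_mbps   # 10 Gb/s = 10,000 Mb/s per port

print(f"Library fraction per box: {fraction_of_library:.0%}")   # ~10%
print(f"Rough concurrent streams: {max_streams:,.0f}")          # ~10,000

# If higher-quality encodes grow the average title by ~25%, the same box
# holds a correspondingly smaller slice of the library:
print(f"After 25% larger encodes: {fraction_of_library / 1.25:.0%}")  # ~8%
```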
Here's a random question that I like to ask a lot of software developers, and it's purely a curiosity question on my part since I'm a software developer myself. You guys have developed a lot of this homegrown software for the control plane and for the sysadmin side and all these kinds of things. What version control system do you guys use to maintain all your stuff, and why? I just love to hear people's different answers to this.

So the main system that we use at Netflix is Perforce, and I can't tell you why; it predates me by quite a long time, but that's what's used for all our clients and all our control software, pretty much 99% of the business. For us in Open Connect, and the software that actually goes onto our cache boxes, we're using Subversion. And the reason we're using Subversion is that, once again, we're based off of FreeBSD, and FreeBSD uses Subversion as its native source control, so for us to use it too makes it very easy to integrate and to pull upstream changes from FreeBSD.org.

Can you touch a little bit on your open source involvement? I mean, you guys are using BSD, you were a BSD release engineer, Netflix has released a number of their in-house tools. Can you touch on your philosophy and how you interact with the BSD project as you fix, say, network driver bugs?

It's been a very, very good relationship, and I have a lot of history there. We've hired a number of people at Netflix who have history with BSD. So we're not reaching out to BSD so much as we're just integrating with the community. We try to push as many of our changes out to the community as we can, and it makes our life easier to do that: we don't want huge patch sets to our code base that become harder and harder to maintain over time, we want to get everything out. We really have no secret sauce other than our control plane stuff, but that's not part of the OS. So as far as the OS goes, we push out as much as we can, and in return we get a lot of involvement from the community, both from people who are interested and motivated to help us, and also because we can find problems that are affecting the community and work together with the community to get those problems solved.

So what's the most popular content being served out there? I'm sure you guys are these heavy-duty data science people, and maybe we should get some of your data science people on to talk about that aspect of Netflix. What's the most popular thing? Is it the thing that's the most expensive for you to license, or is it something strange and odd?

Well, you know, obviously blockbusters are always popular, but really what's driving us is My Little Pony and SpongeBob, because that stuff is easy to license, it's cheap to license, and kids love it. And yeah, during the day we get a lot of traffic from kids watching My Little Pony, or adults watching My Little Pony.

Okay Scott, well, thank you very much for your time. This was very informative, a little outside our normal area, but I think people are gonna like this a lot, especially anybody who's looking at how you would manage an extremely large infrastructure. Thanks again for your time, and this will be up soon.

All right, thank you very much.

Thanks Scott.

You're welcome, have a good day.