Live from Midtown Manhattan, it's theCUBE, covering Big Data, New York City, 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. Okay, welcome back, everyone, live in Manhattan, New York City, in Hell's Kitchen. This is theCUBE's special event, our annual CUBE Wikibon research Big Data event in Manhattan, alongside Strata Hadoop, formerly Hadoop World, now called Strata Data as the world continues to evolve. This is our annual event, our fifth year here, sixth overall, since we kind of moved from uptown. I'm John Furrier, the co-host of theCUBE with Peter Burris, head of research at SiliconANGLE and GM of Wikibon Research. Our next guest is Tim Smith, who's the SVP of technical operations at AppNexus. Technical operations at large scale is an understatement, but before we get going, Tim, just talk about what AppNexus is as a company. What do you guys do, what's the core business? Sure, AppNexus is the second largest digital advertising marketplace after Google. We're an internet technology company that harnesses data and machine learning to power the companies that comprise the open internet. We began by building a powerful technology platform in which we embedded core capabilities, tools and features. With me so far? Yeah, we got it. Okay, on top of that platform, we built a core suite of cloud-based enterprise products that enable the buying and selling of digital advertising in a scaled, transparent and low-cost marketplace, where other companies can transact either using our enterprise products or those offered by other companies. If you wanna hear a little about the daily peak feeds and speeds, it is Strata, so we should probably talk about that. We do about 11.8 billion impressions transacted on a daily basis. Each of those is a real-time auction conducted in a fraction of a second, well under half a second. We see about 225 billion impressions per day, and we handle about five million queries per second at peak load.
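To make those auction mechanics concrete, here is a minimal sketch of the kind of sealed-bid, second-price auction ad exchanges have traditionally run. The bidder names and CPM prices are made up, and this is an illustration of the concept, not AppNexus's actual implementation.

```python
import time

# Illustrative sketch only: a sealed-bid second-price auction of the kind
# ad exchanges traditionally run. Bidder names and CPM prices are made up.
def run_auction(bids, deadline_ms=500.0):
    """Return (winner, clearing_price) or None if there are no bids.

    The highest bidder wins but pays the second-highest bid (or their
    own bid if they were the only bidder).
    """
    start = time.monotonic()
    ranked = sorted(bids, key=lambda b: b[1], reverse=True)
    if not ranked:
        return None
    winner, top_bid = ranked[0]
    clearing_price = ranked[1][1] if len(ranked) > 1 else top_bid
    elapsed_ms = (time.monotonic() - start) * 1000
    # The whole decision has to fit well inside the half-second budget.
    assert elapsed_ms < deadline_ms
    return winner, clearing_price

print(run_auction([("dsp_a", 2.10), ("dsp_b", 3.45), ("dsp_c", 1.80)]))
# dsp_b wins and pays the runner-up's 2.10 CPM
```

At five million queries per second, the real system obviously can't sort Python lists per request; the sketch only captures the pricing rule and the latency budget.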
We produce about 150 terabytes of data each day and we move about 400 gigabits per second into and out of the internet at peak. All those numbers are daily peaks. Make sense? Yep. Okay, so by way of comparison, which might be useful for people, I believe the NYSE currently does roughly two million trades per day. So if we round that up to three million trades a day and assume the NYSE were to conduct that volume every single day of the year, seven days a week, 365 days a year, that'd be about a billion trades a year. Similarly, I believe Visa did about 28.5 billion transactions in their fiscal third quarter. I'll round that up to 30 billion, which averages out to about 333 million transactions per day, or roughly 120 billion transactions per year. A little bit of math, but as I mentioned, AppNexus does in excess of 10 billion transactions per day, and so it seems reasonable to say that AppNexus does in one day roughly 10 times the transaction volume that the NYSE does in a year, and similarly it seems reasonable to say that AppNexus does in one day roughly the transaction volume that Visa does in a month. Obviously, these are all just very rough numbers based on publicly available information about the NYSE and Visa, and both the NYSE and Visa do far, far more volume than AppNexus when measured in terms of dollars. So, given our volumes, it's imperative that AppNexus does each transaction with the maximum efficiency and lowest reasonably possible cost, and that is one of the most challenging aspects of my job. So, thanks for spending the time to give the overview. There's a lot of data. I mean, 10 billion a day is massive volume. I mean, the internet, you see the scale. Yep, it's insane. We're in a new era right now of web scale. We've seen it through Facebook and it's just enormous. It's only going to get bigger, right? So, in online ad tech, you guys are essentially doing like a Google model. Not everything Google does, but still huge numbers.
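As a sanity check, the back-of-envelope math above works out as follows. All inputs are the rounded figures cited in the conversation, not exact published numbers.

```python
# Back-of-envelope versions of the comparisons above. All inputs are the
# rounded figures cited in the conversation, not exact published numbers.
appnexus_per_day = 10_000_000_000          # "in excess of 10 billion" transactions

nyse_per_day = 3_000_000                   # ~2M trades per day, rounded up
nyse_per_year = nyse_per_day * 365         # ~1.1 billion trades a year
print(appnexus_per_day / nyse_per_year)    # roughly 9x: ~10 NYSE-years in one day

visa_per_quarter = 30_000_000_000          # 28.5B transactions, rounded up
visa_per_day = visa_per_quarter / 90       # ~333 million per day
visa_per_month = visa_per_day * 30         # ~10 billion per month
print(appnexus_per_day / visa_per_month)   # ~1x: one AppNexus day ≈ one Visa month
```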
That includes Microsoft and everybody else. Really heavy-lifting IT situation. What's the environment like? Just talk about what it's like for you guys, because you've got a lot of ops. I mean, in terms of DevOps, you can't break anything, because those 10 billion transactions have a significant impact. So you have to have everything buttoned up, super tight, yet you've got to innovate and plan for future growth. What's the IT environment like? It's interesting. We have about 8,000 servers spread across seven data centers on three continents, and we run, as you mentioned, around the clock. There's no closing bell. Downtime is not acceptable. So, when you look at our environment, you're talking about four major categories of server complexes. We have real-time processing, which is the actual ad serving. We have a data pipeline, which is what we call our big data environment. We also have a client-facing environment and an infrastructure environment. So, we use a lot of different tools and applications, but I think the most relevant ones to this discussion are Hadoop and its friends, HDFS, Hive and Spark. And then we use the Vertica analytics platform. Together, Hadoop and its friends and Vertica comprise our entire data pipeline. They're both very disk-intensive, cluster-based applications, and it's quite a challenge to keep them up and running. So, what are some of those challenges? Just drill down a little bit and explain, because you also have a lot of opportunity. I mean, it's money flowing through the air, basically, digital air, if you will. There's a lot of stuff happening. Take us through the challenges. You know, our biggest apps are all clustered, and all of our clusters are built of commodity servers, just like a lot of other environments. The big data app clusters traditionally have had internal disks, while almost all of our other servers are very light on disk.
One of the biggest challenges is that, since the server is the fundamental building block of a cluster, then regardless of whether you need more compute or more storage, you always have to add more servers to get it. That really limits flexibility and creates a lot of inefficiencies. And I really, really am obsessive about reducing and eliminating inefficiencies. With me so far? Yep. Great. The inefficiencies result from two major factors. First, not all workloads require the same ratio of compute to storage. Some workloads are more compute-intensive and less dependent on storage, while other workloads require a lot more storage. Yet, given our scale, we have to use standard server configurations, and as a result, we wind up with underutilized compute and storage. This is undesirable, it's inefficient, and that's the first big challenge. The second is that the compute-to-disk ratio is generally fixed when you buy the servers. Yes, we can certainly add more disks in the field, but that's labor-intensive, and it's complicated from a logistics and an asset management standpoint. And you're fundamentally limited by the number of disk slots in the server. So, now you're right back into the trap of more storage requiring more servers, regardless of whether you need more compute or not, and then you compound the inefficiencies. Could you just move the unused resources from one cluster to the other? I've been asked that a lot, and no, it's just not that simple. Each application cluster becomes a silo due to its configuration of storage and compute. This means you just can't move servers between clusters, because the clusters are optimized for their workloads, and the fact that you can't move resources from one cluster to another creates more inefficiencies.
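To see how a fixed compute-to-storage ratio strands resources, here's a toy calculation with made-up server specs (not AppNexus's actual hardware): when every server adds a fixed bundle of cores and disk slots, a storage-heavy workload leaves compute idle, and vice versa.

```python
import math

# Toy numbers, not AppNexus's actual hardware: every server you add
# brings a fixed bundle of 32 cores and 12 disk slots, needed or not.
CORES_PER_SERVER = 32
DISKS_PER_SERVER = 12

def servers_needed(cores, disks):
    # You must satisfy both dimensions, so you size to the larger need
    # and strand whatever the smaller dimension didn't use.
    n = max(math.ceil(cores / CORES_PER_SERVER),
            math.ceil(disks / DISKS_PER_SERVER))
    stranded_cores = n * CORES_PER_SERVER - cores
    stranded_disks = n * DISKS_PER_SERVER - disks
    return n, stranded_cores, stranded_disks

# A storage-heavy workload: modest compute, lots of disk.
print(servers_needed(cores=256, disks=600))   # (50, 1344, 0)
# 50 servers just to reach 600 disks, leaving 1,344 cores idle.
```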
And then they're compounded over time, since workloads change and the ideal ratio of compute to storage changes, and the end result is unused resources trapped in silos and configurations that are no longer optimized for your workload. And there's really only one solution that we've been able to find, and to paraphrase an orator far more talented than I am, namely Ronald Reagan: we need to open this gate, tear down these silos. The silos just have to go away. They fundamentally limit flexibility and efficiency. What were some of the other issues caused by using servers with internal drives? You have more maintenance, you've got to deal with the logistics, but the biggest problem is that servers and storage have significantly different life cycles. Servers typically have a three-year life cycle before they're obsolete. Storage is typically four to six years, and you can sometimes stretch that a little further. With storage inside servers that are replaced every three years, we end up replacing storage before the end of its effective lifetime. That's inefficient. Further, since the storage is inside the servers, we have to do massive data migrations when we replace servers. Migrations are time-consuming, logistically difficult, and high-risk. So how did DriveScale help you guys? Because you certainly have a challenging environment. You laid out the story, appreciate that. How did DriveScale help you with the challenges? Well, what we really wanted to do was disaggregate storage from servers, and DriveScale enables us to do that. Disaggregating resources is a new term in the industry, but I think a lot of people are focusing on it. I can explain it if you think that would make sense. What do you mean by disaggregating resources? Can you explain how it works? Sure. So instead of buying servers with internal drives, we now buy diskless servers with JBODs.
And DriveScale lets us easily compose servers with whatever amount of disk storage we need from the server resource pool and the disk resource pool, and they're separate pools. This means we have the right balance of compute and storage for each workload, and we can easily adjust it over time. And all of this is done via software. So it's easy to do with a GUI, or in our case at our scale, scripting, and it's done on demand, and it's much more efficient. How does it help you with the underutilized resource challenge you mentioned earlier? Well, since we can add or remove resources from each cluster, we can manage exactly how much compute power and storage is deployed for each workload. Since this is all done via software, it can be done quickly and easily. We don't have to send a technician into a data center to physically swap drives, add drives, move drives. It's all done via software, and it's very, very efficient. Can you move resources between silos? Well, yes and no. First off, our goal is no more silos. That said, we still have clusters, and once we completely migrate to DriveScale, all of our compute and storage resources will be consolidated into just a few common pools, and disk storage will no longer differentiate pools. Thus, we have fewer pools, we can use the resources in each pool for more workloads, and when our needs change, and they always do, we can reallocate resources as needed. What about the lifecycle management challenges? How do you guys address that? Well, that's addressed with DriveScale. The compute and the storage are now disaggregated, or separated, into diskless servers and JBODs, so we can upgrade one without touching the other. If we want to upgrade servers to take advantage of new processors or new memory architectures, we just replace the servers, recombine the disks with the new servers, and we're back up and operating.
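As a rough mental model of that compose-from-pools workflow, here is a hypothetical sketch. The class, method names, and node labels are invented for illustration and are not the DriveScale API; the point is simply that compute and disks live in separate pools and are combined and reclaimed in software, with nothing physically moved.

```python
# Hypothetical sketch of composing nodes from separate resource pools.
# This is NOT the DriveScale API; it only illustrates the idea that
# compute and disks are allocated independently and rebalanced in software.
class Pools:
    def __init__(self, servers, disks):
        self.servers, self.disks = list(servers), list(disks)
        self.nodes = {}

    def compose(self, name, n_disks):
        # Take one diskless server and any n_disks drives from the JBOD pool.
        server = self.servers.pop()
        drives = [self.disks.pop() for _ in range(n_disks)]
        self.nodes[name] = (server, drives)
        return self.nodes[name]

    def decompose(self, name):
        # Return resources to the pools; nothing is physically moved.
        server, drives = self.nodes.pop(name)
        self.servers.append(server)
        self.disks.extend(drives)

pools = Pools(servers=["s1", "s2"], disks=[f"d{i}" for i in range(24)])
pools.compose("hadoop-01", n_disks=12)   # storage-heavy node
pools.compose("spark-01", n_disks=2)     # compute-heavy node
pools.decompose("spark-01")              # reclaim when the workload changes
print(len(pools.disks))                  # 12 drives back in the pool
```

In the same model, a server upgrade is just a decompose of the old server followed by a compose of the new one against the same drives, which is why the data doesn't have to migrate.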
It saves the cost of buying new disks when we don't need to, and it also simplifies logistics and reduces risk, as we no longer have to run the old plant and the new plant concurrently and do a complicated data migration. What about qualifying server and storage vendors? Do you still have to do that, or has that been impacted? We actually didn't have to redo it. We're still using the same server vendor. We've used Dell for many, many years, and we continue to use them. We're using them for storage as well, so there was no real re-qualification work. We just had to add DriveScale into the mix. What's it like working with DriveScale? They're really wonderful to work with. They have a really seasoned team. They were at Sun Microsystems and Cisco. They built some of the really foundational products that the internet was built on. They're really talented, they're really bright, and they're really focused on customer success. Great story. Thanks for sharing that. My final question for you is, you guys have a very big, awesome environment. You've got a lot of scale there. It's great for a startup to get into an environment like this because, one, they can get access to the data and work with a good team like you have. What's it like working with a startup? You know, it's always challenging at first. There are too many things to do. But they've got talented guys. I mean, the startups that get out there early have their A players, and we've been very pleased working with them. We're dealing with some of the top talent in the industry, the people who created the industry. They have a proven track record. We really don't have any concerns. We know they're committed to our success, and they have a great team and great investors. Final, final question for your friends out there that are watching, and other practitioners who are trying to run things at scale with the cloud. What's your advice to them? You've been operating at scale.
You've got billions of transactions. I mean, huge, and it's only going to get bigger. What's your advice, with your IT-friend hat on? What's the mindset for operators out there, technical ops, as DevOps comes in? We're seeing a lot of that traction. What do people need to be thinking about to run at scale? There's no magic silver bullet, there are no magic answers. The public cloud is very helpful in a lot of ways, but you really have to think hard about your economics. You have to think about your scale. You just have to be sure that you're going into each decision knowing that you've looked at the costs and the benefits, the performance, the risks, and you don't expect there to be simple answers. Yeah, there's no magic beans, as they say. You know, you've got to make it work for the business. No magic beans. I wish there were. Tim, thanks so much for the story. Appreciate the commentary. Live coverage of Big Data NYC on theCUBE. Back with more after this short break.