I'm the Chief Technology Officer of Teradata. And I'll be speaking today with Jesse Lynch, who's the Director of Business Solutions at T-Mobile. What we're going to do is talk to you about actual use of big data solutions and the technologies that we believe are going to be important for maximizing the value in these solutions going forward. So I'm going to give the overall context in terms of what's going on in the industry from our perspective and identify three key technologies that need to be delivered to the marketplace in order to bring these capabilities to fruition. And then I'll turn it over to Jesse, and he's going to talk about the T-Mobile strategy and deployment, using some of the technologies that they've had working in their labs and environment to deliver that value. So when we look at the growth in the volume of data, we talk a lot about this concept of big data. And people focus on the bigness of the data. I'm going to take us away from that focus a bit, because I think the bigness of the data is actually the easy part. The interesting part is that it's a different kind of data. Because historically, what we've done in analytics is analyze what we called transaction-level detail. But in fact, transactions are not really the detail. We want to go to subatomic-level detail. So for example, we call it a call detail record, but a call detail record is actually a summary record for the billing system of lots and lots of things that happened on the network in order to get you there. And when we look at interaction data, its nature is quite different from the stuff that comes out of billing systems and ERP systems and those kinds of things. The structure of the data is fundamentally different as well. And so the new vocabulary word that I want you to take away from today's discussion is spime. Now, those of you who read a lot of science fiction may know this term already. It's an object aware of space and time. Science fiction talked a lot about this concept, and I would argue that today it's not science fiction anymore; it's reality. In fact, I can see by looking at you that everybody in this room is a spime, because you are all holding mobile devices that allow me to track your location as of any point in time. They're not all T-Mobile devices, I noticed, so we need to change that. We're working on that. They should be. The ability to track parts in the supply chain, items on the retail store shelf, assets like trucks and trains and so on, this is going to proliferate data everywhere. And this is really what I would call the third phase of big data analytics. We haven't really fully reached that yet, but that's where we're going. And so when we look at the evolution of big data analytics, the first wave was web analytics. Get the clickstream data. Get the search data. Do lots of interesting analysis and so on. That's a pretty old story here in the valley. Web 2.0, that's mostly where people are today. Get social media data, blog data, Twitter data, and so on. Get the text, do the natural language processing, extract facts and dimensions, do sentiment scoring, do the analysis on that. People are trying to figure out how to do it and how to monetize it and so on. Pretty interesting. That's kind of where we are now.
And some of the examples Jesse's going to talk about involve use of text in a kind of semi-structured form for doing interesting analysis. That next phase, the spime phase, says that we're going to be tracking all objects. Everything is a spime. Every vehicle, every person, every part in the supply chain: all spimes. Some of you may already know this. I know I have some friends here in the valley who chip their pet, put a little chip behind the ear of their dogs, and you can go online and see where your dog is at any point in time. In Hong Kong, they actually do that with their kids. Now, not with a chip behind the ear, but they actually put a bracelet on, and you can go online and track where your kids are. Junior was supposed to be out of school at 3 PM; why isn't he home yet? So a huge amount of data, orders of magnitude higher volume, but it's also a different kind of data. So what I want to do is focus on three key technologies which I think are going to be critical for big data exploitation. The first is multi-temperature data management. If we're talking about big data, we're talking about high volumes of data of many different types, and we need to make it economic to exploit this data. I hear a lot in the industry right now about in-memory databases: put all your data in memory and your performance problems are solved. That's great if you're selling hardware, but I would argue that anybody who says big data and in-memory databases in the same sentence doesn't understand what they're talking about. Economically, it's not reasonable. So yes, we like high performance, but we also want rational economics. And multi-temperature data management is a very important technology for delivering that. So that's part one. Part two is the idea of a polymorphic file system. Polymorphism is a funny word. It means that you can change shapes to adapt to new situations. Big data is not always square data. It doesn't always fit in a table. It doesn't always fit in a relational database. It's not always in record form. And so a polymorphic file system allows you to store objects in a form that is appropriate for that object: in terms of access, in terms of the computation, in terms of the framework for programming to get access to the information. Key technology. There are a lot of vendors doing a lot of interesting things in this area of non-relational data. Key concept of this conference. And the third one, which I think is not nearly as well understood as it needs to be in this big data space, is the idea of late binding. In big data, we talk a lot about high-volume data. That's actually less interesting to me than this late binding idea. In traditional data warehousing, we put the data into the database using these ETL processes. We extract from the source systems. We transform the data. And then we load it into a relational database. We are defining the structure of the data at load time. But when we think about big data and the rate of change of the structure of the data, that structure is changing all the time. And so very often, as a data scientist, I don't want to be bound by the structure determined at load time. I want to add the structure at query time. So the idea of late binding is that we defer overlaying the structure onto the data until the time of the query. And this is going to cause a radical shift of power in the BI space in the marketplace. And it hasn't happened yet because the vendors haven't really fully figured it out.
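To make the late-binding idea concrete, here is a minimal sketch in Hive-flavored SQL. The raw_events table, its payload column, and the JSON field names are all hypothetical; the point is that the structure is asserted by the query at read time, not by a load-time ETL schema.

```sql
-- Raw JSON was landed as-is: no transform, no target schema at load time.
-- Hypothetical table: raw_events(payload STRING), one JSON document per row.
SELECT
  get_json_object(payload, '$.user.id')   AS user_id,
  get_json_object(payload, '$.device.os') AS device_os,  -- a field the source added recently;
                                                         -- a fixed load-time schema would have dropped it
  CAST(get_json_object(payload, '$.duration_ms') AS BIGINT) AS duration_ms
FROM raw_events
WHERE get_json_object(payload, '$.event_type') = 'call_setup';
```

If the upstream system adds or renames a field tomorrow, you change the query, not the load pipeline; that is the shift in power being described here.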
So the BI tools that are popular in the marketplace, most of them depend on the data being highly structured in these relational forms. They do all this mapping. They generate SQL and so on. That model is going to be blown up. And the ETL vendors, who generally know how to transform data, are going to have to move upstream and be part of the data access, not just part of the data loading. But I don't think they've all figured it out yet. Some have sort of gotten the idea, but they don't really know how to build the tools to provide end users access to the data. So the BI tools know how to present the data and generate SQL, but they don't know how to do this late binding thing. And the ETL tools know how to transform data, but they're not at the point of query. There's going to be a very interesting land grab between these vendors over the next several years as they figure that out. So these are the three concepts that I want to focus on here. On this multi-temperature data, we can see the trends very clearly around data acquisition: want more data, and want it faster. But the important thing here is that the appetite for data outpaces Moore's law. You'll often hear the discussion that memory is getting cheaper by roughly 30% every 18 months. And that compounds; it builds on itself, right? Cheaper and cheaper and cheaper. And therefore, it'll be cheap enough to put all your data in memory and performance will be great. This is what I call marketing math. See, in marketing math, you have only one side of the equation. That's what they tell you. I'm an engineer. In engineering, there are always two sides of the equation. You've got to solve for the variable. What's the other side of the equation? How fast is your data growing? So do you believe that your data is growing faster than 30% every 18 months? Now, if you only understand ERP data, if you only understand how many customers, how many accounts, how many orders I've got, probably that's not growing faster than 30%. But if you look at the interaction data, if you look at spime data, even if you look at clickstream data, if you look at social media, if you look at all these things, any reasonable analyst, Gartner, Forrester, IDC, any of the guys that study this stuff, will tell you 50% every 12 months. So the point is, data grows faster than memory gets cheaper. So it's economically irrational to say, I'm going to put all my data in memory, unless you're going to restrict yourself only to the boring relational data that we've been working on forever. And if you're trying to get analytic excellence, that's really not what you want to do. And moreover, look at the access patterns to the data. This is captured from an analytic environment for a company that has over a petabyte of data, so this is not some toy academic benchmark thing. This is real, in production, in excess of a petabyte of data, a very complex workload with 1,000 users hitting it. Notice: over 90% of the IOs are to less than 20% of the data. And this happens to be in telecommunications, but the reality is almost any industry will have similar types of characteristics. So this is what I'm going to call the hot data, the data we access very, very frequently. That data, absolutely, I want it all in memory; at least it should be in flash memory if not DRAM. But for this data over here, memory at 50 times the price per terabyte of an electromechanical disk drive doesn't make any sense at all. I want the lowest cost per terabyte over here.
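To put numbers on both sides of that equation, using the figures just quoted (memory roughly 30% cheaper every 18 months, data growing roughly 50% every 12 months), the relative cost of keeping everything in memory after t years works out to:

$$ \underbrace{1.5^{\,t}}_{\text{data volume}} \times \underbrace{0.7^{\,t/1.5}}_{\text{memory cost per TB}} \;\approx\; (1.5 \times 0.79)^{\,t} \;\approx\; 1.18^{\,t} $$

So the all-in-memory bill still grows about 18% a year even while memory gets cheaper, roughly 2.3x over five years. That is exactly the "two sides of the equation" point.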
So the idea here with multi-temperature data management is that you've got a range of storage devices at different price-performance characteristics. For the hot data, I want the lowest price per IO, and that's actually going to be less than 20% of my storage requirement, so I'm willing to pay a higher price per terabyte there because I get a lower price per IO. But for this data over here, I want the lowest price per terabyte. And some of the interesting open source projects like Hadoop understand that very well. Yes, we have memory built into those servers, but we also have a vast disk farm so we can store the data in a very cost-effective way. And so the key point here is not whether you can configure hardware that way. Any competent hardware vendor or engineer can configure hardware with a storage hierarchy. The hard part is that the multi-temperature data management should not have to be done by a human. It should not be an army of DBAs doing this. Now, when you look at the performance characteristics, the advantages in price and performance of doing this are very, very high. So you might be willing to send your best DBA away for two weeks. And she would analyze the access patterns and understand all the data and how it's being used, and then come back and map it to the devices. And it would be worth it, except that by the time she gets back, she's wrong. Because the data changes temperature all the time. It's not static. It's very dynamic. And it's not as simple as "the older data is colder." That's the obvious analogy people make, but it's actually much more complicated than that when you look at the real world. So multi-temperature data management means automatic: no hands, no humans. Software is measuring the access patterns, predicting access patterns, and getting the data to the right place. So that's the first key technology. The second key technology is around this idea of the polymorphic file system. When companies build traditional data warehouses, they tend to focus on this stuff up here: getting the data from a relational ERP system, or a comma-delimited file, or a fixed file format, maybe a legacy database like IDMS or IMS. But even those we can force into a record-oriented format. As soon as we get into this model of getting XML data, or weblog data, or rich media, they say, oh, I don't know what to do. Too hard. In fact, there's this term that's often used in the industry, unstructured data. I hate this term. There is no such thing as unstructured data. Unstructured data is a random series of zeros and ones that nobody cares about. "Unstructured data" is a term that relational database bigots use to describe data they don't understand. And there's a lot of value in this data down here that's going to waste. So there's a bank that I work with in North America. And when you call up the bank's service center, there's a voice that says, this call will be recorded for training and other purposes. "Other" is a huge word. So what does the average bank do? They store the voice call in case the customer complains later on. I said to transfer $1,000; you transferred $10,000. We can go back and listen to it and figure out who was right and so on. That's not really very high value. It's certainly not analytically interesting. This bank, they take that voice recording and they do voice-to-text translation. This technology is easily available today, and by the way, in multiple languages. And then they do fact extraction, dimension extraction, and sentiment scoring out of this voice data.
And they can predict defection. They can predict cross-selling opportunities and so on. All from these voice recordings, this rich media that the traditional relational database bigots have no idea what to do with. Huge value if you can harness that. Just as an example. Here's another example: social network analysis, more relevant to the T-Mobile case, where I want to be able to understand, within all my subscribers, who are the most valuable and least valuable subscribers, not by their revenue, but by their power of influence within the social network. So yeah, LinkedIn and Facebook, of course, do this kind of stuff. But it's not just for the social media companies. Any company where you can understand the relationships among your customers and use that to predict behavior and influence it, that's very, very powerful. But this is graph processing. All these metrics over here are probably terms that most of you haven't heard of: things like centrality and betweenness and closeness. These are standard metrics that you can compute using graph theory very powerfully. Now, can I compute these metrics using SQL processing on a relational database? Well, sure I can. Everything's Turing complete. If I'm a good enough SQL programmer, I can sort of wedge the data in and wedge the SQL around and do the right thing. But it's hard. Imagine instead that I have the native ability to store a graph and do graph processing on it, completely in parallel, no hands. So the idea of the polymorphic file system is: sure, we can store relational data. We can put it in a column format and a row format. That's easy stuff; any reasonable relational database can do that. But we should also be able to store graphs or key-value pairs or text or XML or JSON or whatever it is that you need, and have multiple of those data types coexist, because it's not like you store all graph and nothing else. You want to be able to store different kinds of data types, each in the form that makes sense for the kind of processing that you want to do and the representation that makes sense for that data. The natural representation gives you more efficiency and more productivity in doing your analytics. Now, once I've got all this non-traditional data, so I can do my clickstreams as key-value pairs and my social network connections as a graph, et cetera, et cetera, now I want to process it. And I don't want to be limited to SQL. Sure, SQL is interesting. But this is a NoSQL conference; there's a reason that you're all here. SQL doesn't solve all the problems. It solves some interesting problems, but certainly not all of them. And so we need to go beyond that. MapReduce is clearly the most well-known of the beyond-SQL paradigms, but graph processing and all kinds of other models are very interesting. Time series analysis, one of the most common kinds of analysis that you want to do, is not very well suited for SQL processing. SQL is a set processing language. Mathematically, sets have no ordering, so it's really hard to do time series analysis without some pretty ugly self-joins and multiple passes over the data and really ugly SQL coding. In MapReduce, I can do this kind of time series analysis in a single pass over the data, very efficiently, and it's very easy to express. For the kind of graph processing we're talking about for social network analysis, deep data mining, text analysis, we need to go beyond the constraints of traditional programming languages. And yet we need to have them fully parallelized.
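To make the time-series point concrete, here is a sketch of the "ugly self-join" next to the single-pass formulation, over a hypothetical events(user_id, event_ts, event_type) table; all names are illustrative only.

```sql
-- Set-based SQL: to pair each event with its successor, join the table to
-- itself and then prove nothing falls in between. Multiple passes over the data.
SELECT a.user_id, a.event_type, b.event_type AS next_event
FROM events a
JOIN events b
  ON  b.user_id = a.user_id
  AND b.event_ts > a.event_ts
WHERE NOT EXISTS (SELECT 1 FROM events c
                  WHERE c.user_id = a.user_id
                    AND c.event_ts > a.event_ts
                    AND c.event_ts < b.event_ts);

-- Ordered, single-pass formulation (shown with a window function; the
-- MapReduce version is the same idea: partition by user, sort by time, walk once).
SELECT user_id, event_ts, event_type,
       LAG(event_type) OVER (PARTITION BY user_id ORDER BY event_ts) AS prev_event
FROM events;
```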
And we do not want to assume that you're sucking everything into memory, because that's not a scalable solution from an economic point of view. And I want to make these capabilities available to data scientists who do not have to be computer scientists. It's very important. A data scientist should not have to have a PhD in computer science from Stanford University to get the value out of the data. The fact that in the valley there are so many data scientists who are computer scientists is a shortcoming of the tools that they have to work with. When you get outside the valley, when you go to companies that are not dot-com companies, banks and retailers and telcos and so on, they don't hire Stanford University PhDs for data scientists, especially not from the computer science department. So we need to change the paradigm there. And so when we look at what we need: clearly a parallel data store with very good scalability. And we need to be able to store relational data and non-relational data, not be constrained just to square data in rows and columns. We need to be able to process that data. We need to not be constrained by SQL, but use multiple programming languages, multiple models of access to data. Now, oftentimes at a conference like a NoSQL conference, it's: NoSQL! SQL, that's for dinosaurs; put the relational databases over there in the corner with the COBOL programmers. We're doing new fun stuff now: NoSQL, Hadoop, et cetera. This kind of, I'll say, religious extremism is actually not very smart. I'm an engineer by training. Whenever you see religious extremism, you know both sides are being stupid. A good engineer has no pride, no shame. We steal good ideas from whoever has them. So I want to use the right tool for the problem that I'm solving. That means I want the choice to use SQL when it's the right thing. But when it's not the right thing, I want the choice to use other stuff. So, ANSI standard SQL? Absolutely. But graph capability built in, text capability built in, MapReduce capability built in, I want all those things in addition. So NoSQL doesn't mean never SQL. It means not only SQL. Sometimes yes, sometimes no, depending on the problem that I'm solving. Up here is the idea that I shouldn't have to reinvent everything myself. If I'm a data scientist, I don't want a solution, because a solution implies that you know the question. A data scientist's job is to find the questions. We're looking for patterns or relationships in the data. So I don't want a solution; I want a toolkit. I shouldn't have to reinvent the Levenshtein minimum edit distance algorithm. Just give it to me. Just give me the Bayesian statistical models. Just give me the text parsing. Give me those things and let me put them together in interesting ways. And allow leverage of the open source community to build these things out and add to that library that we can use and share, to increase the value and decrease the time it takes to get that value. So that's the vision from a technology perspective, what we need to do. Now I'm going to turn it over to Jesse, and he's going to talk to you about the journey that we're on at T-Mobile. Thanks a lot, Stephen. So that's a tough act to follow right there; a lot of good information. I'm Jesse Lynch. I'm in the IT organization at T-Mobile, responsible for all the solutions that we deliver for information management, BI, predictive analytics; you name it, we do it.
And a lot of the things that Stephen talked about, we are struggling with or working through as our business starts to look at different ways they want to see data, different sorts of business models we need to tackle. And from an IT standpoint, we're trying to figure out what the right tools are. I'm going to talk a little bit about some of our strategy and a POC that we've done to start us along our way in this journey. And I'll say right now, we're actually not the most tech-savvy folks in the world. There are a lot of people I've talked to here who are very tech savvy. We're trying to find tool sets that can help us get into this space without having to have every detailed skill in Java and all the other tool sets that one needs to utilize. And so we did a POC here with a tool set that allowed us to use SQL and still get some value out of what we were doing, because as a big telco we will probably never be in that space. So real quickly: people see this picture a lot, and the term big data is a term that bothers me a little bit. But when I look at this, I do agree that our world is changing. We have more data than we had 10 years ago. We've got more variety of that data; I'm dealing with it every day in a telco space, trying to figure out what to do with this data. It's coming at us faster, if we so choose to ingest it. But we can't forget the fact that we've got to get value out of it. That's the fourth V that we sometimes don't see on these pictures. And we've got to have a goal, because in my opinion we now have even more data to make mistakes on, or to misread, than to get actual value out of. So as we think about this, when we look at our technology stacks and we think about how we deal with all this data, we're dipping our toe in that water and trying to make sure we can get some value along the way and don't lose the forest for the trees as we get caught up in everything that's coming at us and doubling every two years, I think, is where that's going. So, T-Mobile in general right now. I don't know if people follow the news; we're certainly trying to shake things up in the telecom environment, and I think we're doing a pretty good job. This year we've done quite a few things to make life hopefully easier for our customers. We've said, hey, we're dropping contracts. We're no longer going to make people have a contract. You can bring your own device; if it works on our network, great, we'll charge you a pretty low rate for that, and we won't lock you into that contract. And if we're not good to you, you can walk away. Those are some of the things that we're doing out there. But what it's also doing is creating a little less sticky environment for our customers. And it's a little nerve-racking for us. We're taking some chances, and we think it's better for customers. But we'd better be pretty good at understanding our customers and the experience that they're having with us, because we may not have some of those hooks that keep people with us for a while in a contracted form. And if anybody's seen our CEO, he's very strong in this space. But he also is saying it's very important for us to make sure our experience for our customers, from the network, customer care, et cetera, is very strong. So there are pressures on us to start bringing data together in ways we haven't looked at it before, and make sure that we are creating the right experience for the customers.
So we are embarking on journeys to bring data like this together into single places so that we can run analytics across those touch points that we have with our customers. Some of this data is stuff we've all dealt with for a long time. Some of it's pretty new: social, some of these different channels we have. It's bringing new types of data to us and new ways we need to think about using it. And we do a very good job of looking at this data siloed, not a real good job of looking at it holistically. And that's what we're trying to move toward as we go into the coming years. So along the way, we did a POC where we said, we've got to start somewhere in this journey. And we've got some data that we've already tried to work with for years. And we thought, let's do a POC on this and start to look at some of the new technology that can help us. What we started with was something called our account memo data, and people may have similar systems out there. It's a big, fat table that gets stamped with all sorts of information from our front-line systems every day, as interactions happen with our customers. And it's pretty ugly, and it's hard to work with in the traditional tool sets that we have today. It's multiple terabytes worth of data. It has all sorts of information in it. And with this POC, we said we've got to figure out a better way to get nuggets of value out of this piece of data and see if we can prove out some tool conversations. This is sort of what it looks like. This is not all of the fields that are involved; there's also account information, and there are some other screens that will show that. But it's a classic: it's some structured data, it's some semi-structured data, it's got some stuff that our care reps can type into. A treasure trove of information. But for years we have really struggled to look at it holistically and pull nuggets out of there, to either say, our operations are having some real problems here, everybody, we should be changing what we're doing, because some of this data tells us that; that's what we believe is within this data, and we just haven't been able to get to it. Or, we've got some indicators in here of things that could be happening to our customers that can be pretty bad situations. Or very positive: we're doing some good stuff, and the data can tell us this. So this is what the data looks like. And when we went through this POC, we wanted to look at cases where certain actions happened. Somebody activated. Somebody deactivated. Somebody ported out. What kinds of steps, and what kinds of data within this semi-structured piece of information, can we pull out to make meaningful for our business and start to make some change? Once again, it was a POC, and it's something that we're developing more into a production state, but an area that we think is going to provide us a lot of value. As I said, we're really an Informatica, SQL, Teradata, BusinessObjects type of shop. Nobody on my team knows Java. We're in our infancy in some of the new tool sets that exist out there that we know can help us in this case. And we turned to Teradata. We said, guys, what do you have for us? What can we use to help us, from an IT standpoint and a business standpoint, get at some of this data, access some of this data, and get some value out of it in a fairly easy way? And we started looking at their Aster platform. And it made sense for us. We've got information in Teradata already. We could pull some stuff over there.
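For readers who don't live inside a telco billing system, here is a deliberately simplified sketch of the shape of a table like this. Every name and column is made up for illustration; the real table carries far more fields.

```sql
-- Illustrative only: a toy version of an account-memo table.
CREATE TABLE account_memo (
  account_id  BIGINT,         -- which customer account was touched
  event_ts    TIMESTAMP,      -- when the front-line system stamped the row
  channel     VARCHAR(16),    -- care call, retail store, web, IVR, ...
  memo_code   VARCHAR(32),    -- structured action code: activation, plan change, port-out, ...
  rep_id      INTEGER,        -- the rep or system that wrote the memo
  memo_text   VARCHAR(4000)   -- free text the rep typed: the hard, semi-structured part
);
```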
And we've got, from a business standpoint, quite a few SQL-savvy folks on our business side. In fact, a lot, which is a good and bad thing from an IT standpoint. But a very smart group of folks out there. And so it was a good tool for us to work with, together with some business partners, to say, hey, you just have to know SQL. We'll walk you through what we need to do. Let's do a POC together. So we did a couple of things. We started to look at this data and ask, how can we winnow down the data to what's important? Because as I said earlier, with a lot of data there's a lot to make mistakes on. How do we narrow in on the data that's important? So one of the things we started talking about was creating sessions out of the data. Because this data comes in a time series format, stamped into this thing, and we didn't know what kind of experience our customers were having, or what kind of sessions they were having with our care reps or with our sales partners. And we decided to take a look at that. Let's do some work to look at sessions, those records that correlate within, say, a 60-minute time period, and start to ask: are we seeing anything in the data that can help us understand? Are we having challenges? Are we doing good things? And the nice part for us was we could call functions within this that would take care of some of the MapReduce work and make it much more simple for us to work with this data, use this data, and sessionize this data. If we tried to do this with plain SQL, it'd be a much harder task; we've tried in the past with much smaller amounts of data. It's much easier to do in this type of format, and our business partners and our IT folks understand it pretty simply. We could just say: give us a session. Go through the IDs. Give us a time range. And let's start to see what comes out; let's start to see how that data starts to shape up. So here's an example from before we did some sessionization. Once again, plenty of the detail here has been scrubbed. But this is a customer who had some interactions with us over the course of June. And after we sessionized it, we realized, all right, this person has had four of what we would call interactions with us. One could say there are 15 interactions, but really there have been four interactions, and one took quite a few steps to get through. And this is just an example, and we see it hundreds and hundreds of times over our database. These are the types of things where we want to be able to narrow in and ask, why did it take this many steps to do some sort of function? Luckily, a positive came out of this: I believe we added some new service for this person. But we had to take some steps in here. It took a while. We had to verify their information twice. We had to do some things in there that may be hurtful or bad from a customer standpoint. And we need to be able to analyze that and understand: is there anything operationally we should be doing differently than we do today? And we've never really had the ability to look at this in a wide swath. Potentially, for an individual customer, it's not that big a deal. But if you're seeing that across thousands of customers, it starts to become worth it to go, hey, we're seeing a pattern, let's make some changes here. And ultimately, something like this, and I like this kind of statement, starts to give us what the effort of a customer is with us.
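As a sketch, the "give us a session" call in Aster's SQL-MR looked roughly like the following. This is from memory of the Aster syntax of that era, so argument names and units may differ by version, and the table and column names are the hypothetical ones from the sketch above. The function appends a session ID to every row, grouping each account's events that fall within the 60-minute window.

```sql
-- Hedged sketch of Aster SQL-MR sessionization: one function call instead of
-- recursive or self-join SQL. TIMEOUT is the inactivity gap that closes a session.
SELECT *
FROM SESSIONIZE(
  ON account_memo
  PARTITION BY account_id
  ORDER BY event_ts
  TIMECOLUMN('event_ts')
  TIMEOUT(3600)            -- 60 minutes, assuming seconds as the unit
);
```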
That idea of customer effort, I had to think about for a little bit. Because we often have customers that come to us and go, I've been all day working with you guys, trying to figure this out, trying to do this and that. And our care reps or sales reps are going, I don't see it in my system, and I don't know what's going on; I see you called care a couple of times. But the reality is there were some steps through there, and it took 20 minutes to add a single SOC to the account or to add a new product. And we're just looking at the memo data. If we brought web data into this, or handset data, or IVR, it could be that they did a lot more things out there than just these 20 minutes. And it'd be really nice to know about it, and really nice to know where we may have had some shortcomings or where we did something really well within that session. Like I said, this one turned out well. Some of the sessions may not turn out so well, and we want to be able to sift through those. We then started to talk about, well, we've got sessions, so we've got cases where we can start to pick out certain sessions that may be anomalous or have certain characteristics that we want to dive deeper into. We can also then talk about nPath, the event-pathing function. And this may be a very common thing for folks out here. But right away you can start to look at: I have an action, and I want to see what happened after that action, and what are the common paths that people took after it? Or: something happened, and I want to see the steps that happened before that action, that led up to this situation. And it allows us to start to look at patterns, once again, and ask: are we doing things right? Are we not doing things right? Are there things that we can improve in this process to make the experience better for our customers? And if people have tried this in SQL, and I think it was mentioned earlier, this is a pretty hard thing to do in SQL: to recurse back through the data and create a pathing sequence across a large set of data. Using this function, a MapReduce function, it was a lot simpler for us to have that pathing done and to start to highlight the common paths or steps that our customers are taking, depending on whether we're anchoring on the action at the end or the beginning. And these are some of the things that we started to look at, right? Hey, you're about to go over your data usage. We send out messages to people when that happens. Is that a good experience? Do they not like the message? Do they call care and complain? What happens then? Hopefully they do what we want, which is they top up. Account closure and somebody porting out: this is churn. This is the big pain for all of us telcos. People leave us. What are the steps that led up to somebody leaving? And can we look at this data in a different way to find patterns than we've been able to before? And some people may be looking at this and going, well, this is just common sense, Jesse, what are you talking about? But it's pretty big data for us to deal with, and we haven't always done it on a large scale. And now that we're getting different tools, I think we're going to have a better ability to look into this and really start to understand our churn patterns better. And finally, new accounts. What happens when somebody signs up for a new account? How good are we at onboarding? Phones are getting complicated, if people have noticed. And when you get a new phone, there can be some challenges.
So we have to do a really good job of onboarding folks: let them understand what their bill is going to be, help them set up their phone. And if we don't do a good job at that, what's happening? Are there some steps people are taking, after we put a new person on an account, that we need to address operationally? We get at it in bits and pieces. But if we can do predictive analytics on that, we can find patterns that we can take care of in a much more cost-effective way. And on the new accounts stuff, I thought I'd put a couple of slides in with some detail. So this is a Sankey graph. I learned that through the process, the Sankey graph: a kind of nice visual representation of what comes out of this nPath function. It starts to show visually where I need to look: if I say a new account opened, what are the steps that people generally take after that? And lo and behold, there are some fat lines up here. And one of the fat lines is: they call care. Is it good that they call care? Is it bad? Are they excited, wanting to add some more services to their account, or are they calling care because they're confused and don't know what's going on? It's a great opportunity for us to winnow into the data and just dig into that line. Ignore the other stuff for now. Let's dig into the big line, the one that means there are multiple customers taking a step, and ask why in the world that is happening. And it allows us to start to dig into the data. And I've got a couple of slides like this. But basically what it allows us to do is dig into that information, both the different memo codes that get stamped into that data, plus the manual text information. And we can start to massage that data to make some more sense of it. And I've got kind of a picture after this. So you can take those memo path codes and start to narrow in: hey, when I've got activations, I can create some session actions. And I can start to say, hey, when I end up taking a certain action, and I take these sorts of paths through my data, which are these stamps that roll down, how can I get into this manual text and start to do some work to dig into that? Can I go mine it? Can I narrow it down? Because I don't want to mine all this stuff; it's 3 billion rows worth of stuff. But if I can narrow down into this, I can start to see some better patterns. And I might start to see that I've got third parties that are selling in ways that are causing problems, or certain call centers that are causing me challenges. And there really aren't better ways to figure that out sometimes than getting into this data. And so the goal is to narrow it down as quickly as we can. And a quick final example here. With some logic we were able to build in that said, hey, for new activations that actually turn into a problem, i.e. somebody leaves, they take something off their account, there's an issue, what's the first action that started that process? I can start to prepare that data, and then ask: what are the paths that were taken after that? And I can make some decisions: whether I want to go mine through that text further, whether I want to do some more analysis on an aggregate level. But I can really start to shape this data into better ways to do analysis.
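For flavor, here is a hedged sketch of what the "what follows a new activation" question might look like as an Aster nPath call, again using the hypothetical table and column names from earlier. The clause names follow the Aster documentation of that period, but exact syntax varies by version, and the NEW_ACTIVATION memo code is invented for the example.

```sql
-- Hedged sketch: for each account, anchor on an activation and accumulate the
-- ordered memo codes that follow it, so the fat paths can be counted and graphed.
SELECT path_after_activation, COUNT(*) AS accounts
FROM nPath(
  ON account_memo
  PARTITION BY account_id
  ORDER BY event_ts
  MODE(NONOVERLAPPING)
  PATTERN('ACT.STEP*')                          -- an activation, then any steps after it
  SYMBOLS(memo_code = 'NEW_ACTIVATION' AS ACT,  -- hypothetical memo code
          TRUE AS STEP)
  RESULT(ACCUMULATE(memo_code OF ANY(ACT, STEP)) AS path_after_activation)
)
GROUP BY path_after_activation
ORDER BY accounts DESC;
```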
And sometimes it's simpler than that: as one of our business partners said, we got to a point where the data was small enough that she could look at it physically over a couple of pages and go, oh my goodness, I'm seeing some crazy stuff in here. So it was an eye-opening thing for our business partners: this is wonderful, because I've never been able to really do this before from a SQL standpoint. And by the way, you've got all the data in here, which is great; I've always worked on little subsets of this stuff. And there's some boring code here, but it's something that would be hard to do if we didn't have this. It allows us to write this, using the nPath function and some of the MapReduce, in more of a SQL way that our business understands and our IT folks understand, in a simpler manner. And we can start to parse that data out in a much more efficient manner than we've been able to before. And I know it's fascinating, but it was really helpful for us to be able to make it this simple to get through this data. A final picture here real quick, just for curiosity. This was the Sankey diagram for our upsell messages. The great news: people tend to just go top up their data. They don't get offended by our message. They don't call care. If we tell people you're about to go over your data plan, people are usually pretty receptive to that. It was nice to confirm that and not see any issues out of it. But something visual like this was very powerful for our business to go, cool, I don't really have to worry about that. That's kind of nice. That's working well, and I don't have to worry about it. And I would say, kind of wrapping up from my standpoint, and Stephen showed a picture of this, and this is kind of a Teradata picture, but for us this is an architecture that we're starting to look toward as a company that already has some EDW components in place. We're certainly looking at saying, there is a piece of architecture that we need to drive. And we need to have a cost-to-value conversation with our business around, hey, landing all the data for you to look at is something we know that you want to do. And we can do that. And we're going to give you some tool sets to be able to get to that data, say, in Hadoop, in some of our exploratory environments. But you, business, are going to have to put a lot of the legwork into this. From an IT standpoint, and you hate it when you come to IT and I tell you it's going to be $10,000 or $50,000 in this amount of time, we're going to give you some self-service capability and some metadata. But you're kind of on your own to make this work for you and decide: is this data valuable enough to move farther along the chain and make more available to others across the organization? Because when we go to put it in our EDW, it's a little more work upfront for us to model it and make it available to other people across the organization. It can be very well worth it, and it may very well make sense, but I'm going to make you prove it to me, business. Because in the old days, we wouldn't have had this discovery platform concept. We would have taken the data, modeled it in the EDW, and they would have gone, that's not what I wanted; the world changed yesterday; this didn't work. So we're saying, hey, this is great. You're stopping here. And either you need new skill sets, business, or we can bring people in, third parties, to help you with some of the elbow grease. And some of the tools, for you advanced SQL folks, you'll be able to use.
As we move along, though, and you want to make this more widely available, we will certainly help you productionize that and codify that. And our holy grail gets to a point where we have this picture, and we take the predictive analytics that I was talking about earlier and turn that into actions for the front lines. So if we identify patterns in there that go, hey, we are doing something majorly wrong when we have an IVR conversation and then a care conversation, and there are a couple of steps, guys, you need to know about, let's get something to the front line so they know that this customer is coming in with an issue, and there's a next best action that you need to take to help save that person. And that's ultimately where we want to head with this: from these analytics into impacting front-line situations. This unified data architecture, I think the key point here, which is a theme of the whole conversation this afternoon, is use the right technology for the problem that you're solving. No one technology solves all the problems. So what you have here is the Hadoop Distributed File System: store all my data forever at the lowest price per terabyte. Very, very interesting. But if you put a data scientist on top of HDFS, they're going to really struggle to get the value out in a productive, cost-effective way. So you have this discovery platform; this is kind of like a data R&D platform. There's actually not a lot of data in here. It's like your lab. You bring some data in. You experiment with the data. And if it doesn't have value, that's OK; you throw it away and do another experiment. All the data lives over here forever, so I've got this storeroom of data. I can go get it. We store it very effectively. I can bring it in here. But I'm not constrained by SQL. I can do MapReduce. I can do graph processing. I can do text processing, all those kinds of things here. And if I find something really valuable, then I'm going to promote it into the enterprise data warehouse. So the core data that has value, we put it here. We move the data from Hadoop to the discovery platform. We use a no-ETL strategy: just bring the data in pretty raw, and let's work with it with MapReduce and other things. And we have some technology built on top of the open source HCatalog that allows us to use SQL-H to go in and grab that data. As a data scientist, I don't have to be a sort of flat-file programmer on top of Hadoop. I can just reach in with a SQL type of interface, but I have access to advanced capabilities like MapReduce and time series analysis and text analysis and graph analysis; I'm not constrained by the SQL model. Now, when I promote the data, then I'm going to add the structure. I'm going to use ETL here, because I'm managing for efficiency and repeatability and auditability and all those kinds of things. So data R&D is very different from data manufacturing, and we use different technologies for those. So with that, we thank you very much for spending the last session of your afternoon with us before the beer and wine happens. We've got a couple of minutes for questions, if you have them. Other than that, thank you very much. So the question is, where in the architecture are we using the statistical modeling. The way that I think about it is, there's the model building, and then there's the model scoring. And those are different things. The model building, training sets, finding the right data, doing the discovery, that's largely happening here.
Now, the scoring, that's actually a kind of interesting question, because it depends on your skill sets a little bit. So if I had really good skill sets on HDFS, I might actually decide to score down here. But the commercial products aren't as mature here as they are up here. So it depends on your skill sets and the tools that you're using, but the tools are getting more mature down here. And our vision is that a lot of the data transformation, scoring, and preparation will happen down here. The reality is there's still a lot of it happening up here, because the tools and the skill sets are maturing. So you have to decide what the right combination is. And then you bring the scores here in order to produce the value. And I'll just say, from a practical standpoint, for example at T-Mobile, we still use SAS. SAS is still a tool set that's there, that we will probably still use for some of that. There are production models that kick out of there, and there's a wide variety of people who have those skill sets. And we'll still do things like push down into Teradata: put data in there and run the models over full data sets within Teradata. But that is after you've discovered that it makes sense. It is a predictive model, and it's one that we want to use on a regular basis for predicting upsell and cross-sell chances. So that still fits in this picture. And by the way, SAS Institute is making a huge investment in order to get their stuff to run on HDFS effectively. And until they're there, we can do it this way. So those are the types of things. And we also have folks who will be using R and other sorts of statistical tool sets; we can feed Hadoop into those kinds of discovery platforms as well and use R. So yeah, our goal: the data is coming from what we would loosely call our source systems, which are some internal systems, all of our OLTP systems, our engineering systems off of the network, and third-party data; we have a lot of third-party data. One of our challenges is that within our company, our source systems have 10 or 15 different feeds that come off them to various groups in the business organization, each wanting a little bit of information here and a little there. We're saying, no, we're going to send it to one spot, and we're going to have some metadata so you know what in the world it is. And then if anybody wants to use that for analytics and BI purposes, we shoot it out from that one spot. So our goal is eventually to have all of the interesting and reasonable data in that one landing zone. Some people call it a data lake, or a common landing zone. That's the goal. Thank you very much. We appreciate it. And if you see any of us in the conference afterwards, just come grab us if you have more questions. Thank you. Thanks a lot.