Thank you very much, Stephen and Andy, and thanks to everyone at Eduserv who organised this and put it together. Welcome to everyone in the audience. What I'll be covering today is EMC's position on the big data market. Just over two years ago, EMC started getting very interested in this space. As the leading provider of storage into public sector organisations, as well as into the enterprise, we have a very big focus on this area. We've made a number of strategic acquisitions to ensure our technology leadership; my former company, Isilon, is one of those, and I'll be speaking about another one called Greenplum. We will continue to make acquisitions in this space, as well as developing our own technologies and services to complement our portfolio. Those things will come back towards the end.

I'm going to try not to make this very product-focused. I'd really like to keep it high level and set the tone for the rest of the day. You're going to get some fantastic examples today from the speakers about their own environments, what they've built out, and some very good examples of how to grapple with the big data problem. In the different areas I go into, I find it really is a different way of tackling the problem everywhere. Hopefully I'll highlight some of the things that are common among those environments, but you'll find the means of analysis, and the ways you store and retrieve data, are very different from case to case. As was highlighted before, big data still has a lot of hype around it, and I think that's a very good point: in a lot of cases there is a huge amount of hype in the big data space. I'll try to bring it down to a concrete level and show you what we're actually seeing, and throughout the day you'll get some even better examples.

The one thing everybody always starts with in big data keynotes is: God, there's a lot of it. There's a huge amount of data. I actually don't think the size is the most important point about big data, and in most of the new cases I'm seeing emerging, in areas like financial services or retail, we're not really talking about enormous amounts of data. It might be a huge amount compared to what the organisation has dealt with before, but it's not necessarily petabytes. We'll start here anyway, just to point out that there is an enormous amount of data, and the growth rates are very high, and that's important to us all. The quantity of data we've already got stored is more than we can analyse, and big data is largely about not accumulating more data but finding better ways of analysing what we already have. That leads to the next challenge, which is that most of that data is not in a form we can readily analyse. This is true for EMC as well: EMC is a company with a history in block storage, in structured data, structured access, and BI, and all of those things are very different from the world of big data, where we're dealing a lot with flat files, with video, with images, with genomics data, things that are analysed in very exotic ways, very different from what we're used to. That is really the primary problem.
If you look at the directions EMC are going in terms of technology, it's very much focused in this direction: towards objects, towards unstructured data, towards ways of experimenting with that sort of data to come out with insights. Some of the areas we're seeing out there started off in areas of uncomputable big data. We started a lot in areas like video, which is undoubtedly a huge amount of data, but it's not what most people would think of today as BI. That is transitioning, and we're seeing it move into things like retail analytics, where people are looking at, for example, what people are buying on a given day. They're even moving to video, actually analysing customers as they walk around the shops: which products they move between as they go around, how long they spend in particular areas. So you really are seeing that shift towards unstructured data there. The same goes for financial services, where people are amassing huge quantities of market data, stored in a number of different formats, and they need to be able to analyse that data very quickly to determine their position, their profitability and so on.

So we're seeing this emerging in a number of different areas. One of the big ones for us right now, particularly in Europe, is utilities. This is something that has had a little bit of hype around it, but it's actually real. We're working on three very large scale projects that I know of, and I'm sure there are several others, typically in the petabytes, where they are storing data that is either coming back from the grid — sensors out in the grid recording data and bringing it back into the data centre where it can be analysed — or coming back from smart meters in consumers' households, where the data is sent back as a real-time stream. This brings up another important point about big data, which we'll come back to with the three Vs: the velocity of the data. With structured data, we're used to running reports on a periodic basis, every once in a while. This is a very different type of data: it's coming into the data centre on an ongoing basis, and it needs to be processed and understood on an ongoing basis. How the analysis is done is where the experimentation comes in. There will be an experimental element where we try to discern what the right questions are, the right ways to analyse the data to bring back some new insight we didn't already have.

Just a couple of examples of things EMC has worked on. These are both based in the US, but we've got very similar examples over here. Healthcare is obviously a great discussion for the people in this audience, because it has a big public sector impact. One of the things we've seen in the life sciences market over the last few years — something I've been very involved in — is the move towards things like customised drug development and electronic healthcare records: being able to identify quickly whether a newly discovered drug will provide an actual benefit to patients, whether it will harm patients and in what quantities, and whether we can target a drug to a particular segment of the population. We're on a learning curve in this space, and the IT is evolving as we move on.
It's going from things like summary data, where maybe you've got information on a few treatment pathways, and moving on as we incorporate data from much larger populations. Instead of sampling, we're actually bringing a lot more data in — 100x, 1,000x as much data as we were using before — because the IT has caught up with that now. We're moving to a point where we have accurate information on the genomes of the people in particular experiments and research projects. That's bringing precision to the data and allowing us to make decisions based not only on what we understand from the existing literature, but on combining that with exactly what we're getting back from very large population studies. That's bringing accuracy and it's improving outcomes for those patients. You're going to see continual movement on this for at least the next decade; there's a huge amount to do. We're very close today to the $1,000 genome — I know Guy will talk about that shortly. That is going to bring a day where we really can have a number for every person and use it to identify how they will respond to some new treatment. That's not tomorrow — we need much, much more research after we've got to the $1,000 genome — but you can see that day coming. It's not that far away.

Retail banking, the bane of our existence. If we'd had better technology back in 2007 and 2008, we might have been able to avoid at least some part of the crash, which nobody really saw coming. We didn't have good ways in retail banks of identifying risk positions within those banks. We didn't have a way of identifying, when someone took on a credit default swap, what the implication was for the rest of the bank. It was too much data, and we didn't have a way of analysing it. The banks have resolved to change, and I'm seeing that in the banks today. There are changes going on where the accuracy of a bank's positions, from a risk and credit perspective, is moving towards a real-time model. There is a centralisation of data taking place, where data is being lifted from data centres around the globe, placed in a single area, often analysed with Hadoop plus a stack of technologies on top of that, and used to come up with a real-time credit and risk analysis of that bank. So this is real; this is not hype. This is actually happening in most of the high street banks you know today.

Let's talk about some examples of different types of big data. The ones I started off with at Isilon were the media ones — video on demand, special effects — where people create new intellectual property. They're creating new data which might then be modified by others to add value to it. It might be someone creating a special effect for a new film; it might be, for example, the Titanic re-release, which was originally in 2D and was turned into 3D. That's a big data operation; it requires a huge amount of data to do that processing. So that's one area people talk about as big data, but it is not computable big data — apart from a couple of niche cases, like camera data for example, it's not something I would really put in the computable space. Then you've got data that's generated from workflows.
So we work with a lot of high-end manufacturers like Peugeot, Jaguar and Land Rover to take simulation data from their systems and allow them to avoid the use of physical testing — crashing cars together, putting them in tunnels to test the airflow around them — and instead do all of that digitally on computers. That also is a big data problem, and now you're moving more towards the computable side: data that people are actually using and reusing to determine the safety and efficiency of vehicles, and of course aeroplanes and other mechanical devices too. We're seeing a huge amount of momentum there as well. Then there's developing new intellectual property based on existing big data sets, and a good example of this is pharma: people taking existing intellectual property portfolios — which could be public domain sets of genomes or proteomes or other life sciences data — and using them for research, which is something many people in this room are probably already doing. Companies, agencies and organisations in the public sector, as well as utilities, are also mining data, either to optimise their operations or, in the private sphere, to gain some sort of competitive advantage. This is what I alluded to before with some of the large energy companies we're dealing with today; oil and gas companies are doing this as well. In the public sector, I think one of the most exciting things is the availability of very large data sets about our populations, about how we use public services, and about how we can optimise the delivery of those services. We're really at the beginning of how this is being done. There's been a lot of good research showing that very significant savings could be achieved in the public sector, were we to apply these data science principles to these data sets: how does someone use the benefit system? How can that be optimised? Who really benefits from the money we spend? We have some information about that today, and of course there have been innumerable studies, but by getting the actual data over a period of time we can analyse it very precisely and come up with much more accurate answers.

Finally, consumer data. I'm actually going to wrap around and come straight back to consumer data. The reason I think this one is interesting is that this is really where it all began. The technologies you see developed in this space — whether you're looking at the original MapReduce, at Hadoop, or at the technologies that have come around them like Cassandra — were principally developed by the big internet players, because they were the first to see the requirement to process vast amounts of data in real time. The good news is they've already developed a lot of the technologies. There's a lot more to come, obviously, but that work has been done. Now it is trickling down into the public and private spheres, where people are saying: I don't really look like Google, but I can use some of that technology to improve my organisation. That big data, as I mentioned before, is principally file and object. This is a huge change for EMC, and it's a huge change for most of the customers I'm dealing with.
The infrastructures they're building on are not really suited to this type of data. When they look at what they're storing — and I've seen many, many slides at big data conferences around this — a tiny fraction of that data is the structured data they run the business on, and a huge amount of the data taking up all the spindles and disks they own is unstructured, and a lot of it has been generated very recently. You'll hear figures thrown around of 90 to 95% of the data in the world having been generated in the last two to three years. Where is it coming from? Why is it appearing now? There are reasons for this. Any time there's a new trend, I think it's worth asking oneself whether there is a real reason why it is appearing now. I think about this for things like cloud — why did cloud come around at that time? — and we really must ask ourselves the same questions about big data.

First of all, things have come around to make it cheaper to manage, store and analyse, and of course cost is always the first thing we look at. Disks are getting cheaper all the time, it's becoming cheaper, easier and more reliable to manage at scale, and the technologies for analysis are coming around as well — Hadoop being one of them, but there are many, many others. Those technologies are making it much simpler to store and analyse data. Second, increasing access to data in the cloud. The cloud is actually causing part of the creation of this big data: users have access to data in the cloud — it could be other people's photos, it could be tweets, it could be any amount of data — and when they get that data back to their mobile devices and tablets, they create more big data which goes back into the cloud. So there's a real feedback effect, and that is also a major part of the change. Third, tools — and I'm not talking about infrastructure tools like Hadoop, I'm going further up the stack than that: the tools used to actually do the experimentation, the analysis, the iteration over the data to come up with those insights. These are being developed on a daily basis. We actually have one we're about to launch this year that will make it possible for researchers to have a dashboard where they can sit down, try out different experiments, analyse the results, then iterate and try different things. These are the sorts of tools we need, and we need them because one of the things holding back big data — and I'll come back to this in a moment — is the lack of data scientists. There aren't enough people who can actually do this. I'm no expert in the science myself beyond what I did during my master's degree, but it is quite sophisticated stuff. If you want someone to do a clustering analysis of your customers and determine how they fragment into different groups and what their spending patterns are, not everybody can do that — I'll sketch the kind of thing I mean just below. You need someone with a bit of a mathematics background, and that is going to hold things back just a little. But the tools are making those people more effective and more efficient, and that's helping to progress things.
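To make that concrete, here is a minimal sketch of the sort of customer clustering a data scientist might run. It is illustrative only: the data is synthetic, the feature choices and cluster count are hypothetical, and it assumes scikit-learn is available — it is not any particular EMC or Greenplum tool.

```python
# A minimal sketch of the customer clustering mentioned above.
# The data, columns and cluster count are hypothetical; a real analysis
# would start from an organisation's own transaction history.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic customers: annual spend (GBP) and visits per year.
spend = np.concatenate([rng.normal(300, 50, 200),
                        rng.normal(1500, 200, 100),
                        rng.normal(5000, 600, 30)])
visits = np.concatenate([rng.normal(4, 1, 200),
                         rng.normal(20, 4, 100),
                         rng.normal(60, 10, 30)])
X = np.column_stack([spend, visits])

# Scale features so spend does not dominate the distance metric,
# then look for three clusters of customer behaviour.
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)

for label in range(3):
    members = X[kmeans.labels_ == label]
    print(f"cluster {label}: {len(members)} customers, "
          f"mean spend £{members[:, 0].mean():.0f}, "
          f"mean visits {members[:, 1].mean():.1f}")
```

The point is less the algorithm than the workflow: someone has to choose the features, judge whether the clusters are meaningful, and iterate — which is exactly the skill that is in short supply.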
Finally, the proliferation of data capture devices, which is probably one of the more obvious ones. The sensors that are out there today, the fact that your mobile phone is constantly sending data back about how it is being used, the sensors out there in those smart grids sending data back, the tablets and mobiles we have everywhere — a lot of different things are creating data and putting it back into the cloud. It's the combination of these four effects together that's causing this inflection point in data now.

You'll have heard the three Vs, so let's go over these very quickly, because I think most of you have probably heard them. The volume of big data we've kind of covered: we know there's a pretty big quantity of data out there that needs to be analysed. What's not as well known is the velocity of the data, meaning how quickly it is coming into my enterprise, my organisation, my data centre — and typically it's coming in at a real-time rate. A lot of people are very focused, for example, on Hadoop, which is a derivative of the MapReduce that Google produced, but Google has already moved on to much more real-time methods. Hadoop is a very batch-orientated method; Google has moved on to things that can analyse the data as it trickles in, so that search results are instantaneous, and that's the direction most organisations will ultimately go as well: as this data comes in, as you get feedback from your customers, as you're analysing, for example, clickstream data coming in from your websites. It could be that you're working for a public sector organisation and people are downloading and filling in forms on the internet: that's a real-time process, and we'd like to have that data analysed and fed back into the engine in real time. Finally, variety: the different types of data we've got out there. I'll talk about Greenplum at the end, but one of the reasons we selected Greenplum rather than one of the other technologies to acquire is that it is very focused on these types of data. It doesn't matter whether it's flat files, PDFs, images or videos — you can bring all of that data in, and all of it can be analysed using the same mechanism, and that's a very important part of how this analysis takes place.

The internet of things — you may have heard this term before as well. This simply says that we now have a huge number of physical devices out there in the world generating data and feeding it back. Mobiles, tablets, sensors, social networking sites are all creating a lot more data. If we think back to the web of the 1990s, that was very much a read-only web: we went to the internet and we read things, we didn't really submit things back. Today's world is very, very different. There's as much data going back in as being read, and that's a huge change. That data can be used in many very effective ways — marketers are looking at it principally now, but there will be many other use cases coming along. To speak just for a moment about the velocity of data, you could cast an eye over this image, which gives you a sense of how quickly things are being generated: the number of emails, the number of tweets, the number of images being uploaded, the number of Gmail messages being sent — it's endless. The quantity of data is huge, and, at least for the internet sites, this data is being used. It is not just sitting there in repositories.
Google provide email for free for a reason: because they can learn, from the information in the emails you've sent, what you're doing, who you are, how much you are like or unlike your friends. This data is being used, and it will be used increasingly throughout organisations. We talked about Google. Google used to be just a fantastic search engine — I remember when it came out and thought, it's a great search engine, I can't imagine what else they'll do with it — and we didn't really know what they were about to do next. The graph of the entire internet is stored within their servers. Now, with Google Plus, they also have social networking data about you: they know your relationships, they have your email, they have all sorts of information about trends. They instantly know when something jumps as a trend anywhere in the world — disease trends, simply by knowing who's searching for something. So there is great value in the information they have; in fact the entire corporate value of Google is based on that information they keep.

Facebook and Twitter are no different. Facebook is a huge customer of ours — we store audio and video for them — but a lot of the data they store is simply structured messages going back and forth between people. The most important pieces of information they have are the graphs: in Twitter's case the graph from sources to their audiences, in Facebook's case the graph between you and your friends network. That information is far, far more valuable than what Google own, and that is why Facebook is about to have a 100 billion dollar IPO: from an advertiser's point of view, that is probably the most valuable information anyone can have. Amazon, where I used to work, probably knows more about world supply chain data than just about anyone else, but they also have every single purchase you've ever made, and they can track your purchases as well as those of your friends — again, immensely useful data. Carriers are not being left out of this either. We're talking to a number of telcos: they know when you've gone from one mobile tower to the next, they get that information as you move around, so they know where you're located — they don't need GPS for that, they know it from the beacons — and using that information they can advertise to you. We already have major telcos in the UK doing that. So there's another use case of big data: frightening in ways, but there are also very good uses for big data, and in the public space I've touched on a couple of those examples.

This is the thing that got it going: graph theory, sentiment analysis. You've probably heard these terms before, but this is basically the analysis of this data, and it's the structure behind the data that determines the value. That graph is what makes everything so important: knowing who's connected to whom and what they'll do next. From my point of view, in the organisations I'm going into, particularly in the private sector, the race is truly on. I'm going into banks that are trying to buy up petabytes of market data because they're behind. So there is definitely a race, and it does take a long time to build this stuff: you need a data warehouse, you need copious amounts of unstructured data, you need people to analyse it. It doesn't happen overnight; it takes time to build, and there has to be a compelling reason. And on top of all of that, to my point,
you have organisational change to confront, which is probably the most vexing problem, because right now decisions are made at every single level of the organisational hierarchy, and they're made by people based on pretty poor information. If you want an organisation that actually moves according to the scientific results of the data, things have to change in the organisation, and that's probably the toughest nut to crack. Many companies have got a head start — I mentioned some of them there — but there are some surprising ones you might not know about who have made big moves in this area. I'm not going to start naming corporate names here, but this isn't hype: several companies have already invested massively in this infrastructure, have hired people for it, and have begun those changes within the organisation. So every CIO needs to at least look at this. I'm definitely not going to say that every organisation has a big data problem — that's definitely not true, though it may not be true in five years — but for the ones that do have access to that sort of data and can make use of it, now is the time to start considering what to do with it.

As I mentioned, I used to work at Amazon, so I'm going to dive into one little example in a bit more detail. I worked on building supply chain and back-end systems for Amazon, as well as some aspects of the website, and I just want to give you a sense of why this is big data and how it's used to provide efficiencies. First of all, purchase information immediately adjusts the supply chain: if prices change, the supply chain is instantly changed in real time, and that leads to much more rapid efficiencies in terms of the prices they can buy at. Shipping and logistics are also dynamic and all based on big data. The inventory at every single distribution centre Amazon has is known in real time; every time you go to buy something, it may adjust that purchase to be supplied from one distribution centre or another, or fulfil your shipment from multiple ones. All of it is computerised, all of it runs off big data, so it is instantly accurate at all times and the most efficient model is chosen every single time — very hard for bricks and mortar retailers to compete with unless they make their back ends equally efficient. Then there's customer information: they have access to everything, they know everything that everyone buys, and now they are starting to learn the relationships between the buyers, between the people who go on the site, which is something most bricks and mortar retailers don't have access to. They are able to use this information to do things like recommendations, as many people know, but they also recognise trends much more quickly than other retailers do, and they recognise the fragmentation in their own customer base — how their customers cluster and how those clusters change over time — and again they learn that much more rapidly than everyone else. The big retailers have obviously jumped into this as well: Walmart over in the States was one of the early ones who got really big into ICT for retailing, then Tesco, Carrefour and the rest, and we are working with many of these to deliver exactly the same capabilities.

Good data is very hard to get. When you think about making a decision — if you are a mid-tier manager, how do you make your decisions? Very often based on what you think might be true, what your manager might have said to you, what you learned in a past life, based on hearsay. It is very, very difficult to get good
information, and therefore a lot of decisions are actually not the right ones. You see a lot of companies and organisations coasting along, doing the same thing for long periods of time, and then the market changes, and it changes too fast for them — before they know it, it's too late. They failed to detect shifts in consumer demand, and that is one of the biggest killers of companies. We've all looked at examples like Nokia, who are now struggling because they didn't recognise the changes, and BlackBerry and others. In many cases, by actually analysing consumer behaviour, either using their own data sources or other data sources, that might have been detected earlier. The internet has made customers more segmented, more fragmented, and it means each of these different groups needs to be talked to with a different dialogue. It's not an easy thing to do, but that is what the internet has caused: the consumer market has been shattered into pieces, and we need to understand each of those pieces differently and not treat all consumers the same.

So we need to move to a data-driven model. We want to improve that decision-making process, and that means managing with data, managing with the facts, which is not an easy transition. We need to make a science out of it. A lot of people confuse BI, or business intelligence, data warehousing or data mining with big data, and actually they are very different models. Big data is an iterative, experimental model: one where a data scientist sits down and tries to understand the insights, but often fails and has to repeat the experiment. The results have to be proven; hypotheses have to be tested against those results. It's very similar to the scientific method. We're moving from gut feel to rational decisions based on this data — that is the overarching trend.

What do we get if we get to the other side, when we see the light at the end of the tunnel, when we go through this? What is the result of it? First of all, decisions, and the data, become more transparent. We're no longer asking "is this true?": we have that information accurately, straight from the big data dashboard, and it makes things transparent through the organisation. It also means decisions can be taken at a higher frequency. Today we're used to organisations that run a report four times a year, hold their board meetings, make decisions and then move on. That's not the new world. The new world is a real-time one, where decisions are taken on a day-to-day basis and can change throughout the month. Information is obviously more accurate — I think that's a fairly obvious one. You can tailor your products more precisely, and even in the public sector this still makes sense, because you're still providing something to your customers, and it may mean the services you provide can be adjusted for efficiency, to improve the use of those services, to make sure people come back and use them where required. There are ways to tailor what you're offering, and in the public sector that data, happily, is very much available. Better decisions: decisions based on more accurate data are obviously superior ones. And finally better products, because you get feedback. This is probably more relevant in the private sector, but it means the products being manufactured actually send back their own quality data. They tell people how they are being used; they can confirm how people actually use the system, as opposed to how the design intended the system to be used. This information can come back in real
time as well.

What's holding us back? Why hasn't everyone leapt into this? Well, it's not ICT. Data storage is cheap — we can buy lots of storage, that's not the problem. We can buy lots of compute — that's not the problem either; servers have become cheaper and more powerful than they've ever been. It's not the quantity of data: people have access to data — and again, particularly here in the public sector, you have access to big data — and the quantity of data for most of these analyses is not actually that large, maybe in the hundreds of terabytes, not necessarily in the petabytes. So these are solvable problems. It's not the value: we've already got a number of very well documented big data analysis cases where the work has paid for itself very quickly. A lot of these cases are in things like retail, where repositioning things in shops, or changing how coupons are sent out or how products are marketed, has paid for itself very quickly. So it's not the model — the model pays for itself. The real problems are the organisational change we mentioned before — getting the organisation to adapt to the new model — and acquiring the talent required to actually do these analyses. Until those two problems are solved we won't see further momentum in the space, so this is where most companies, including my own, are focusing on what they can do to help, and that often has very little to do with technology. It has to do with consulting in the organisation; it has to do with providing services. One of the things we have provided is a team of data scientists that can be hired out to organisations to inform their own models, and this is where a lot of companies are going: it's really a services model around big data more than it is a technology one. If you're thinking "maybe I don't really have much big data in my enterprise", that may be true, but consider the different kinds you may have. You may track data on your consumers; if you have a website, you probably have clickstream data that tells you what people are doing on it; it could be RFID tags, it could be sensors. Only you know what's in your own organisation, but it's probably more than you realise — and again, the bigger problem is not really acquiring more data. Most people have plenty of it; they just don't analyse it.

Just to finish off, let's quickly dive into some of the technology. Hadoop was mentioned before, and I'm sure many of you know it. Google invented their own technologies and first published a paper on MapReduce, which was then turned into an open source project called Hadoop. Hadoop is a system for processing key-value pairs, or objects, at very high speed through parallelism, through scale-out. It's a mechanism for allowing hundreds or thousands of commodity servers to do processing of data cheaply, quickly and at large scale — and the key thing is the scale, because if you want to run this stuff in real time, if you want to be able to address these scales, then you need an architecture that can grow to them. If you start off with something that is designed at a small scale, it can be very hard to grow it. (Apparently the clicker has run out of battery — technology problems. Is there any manual way of moving it forward? Is this one right, or is that something else? No? Apparently it's just died. Magic — the fairies have come and saved me. OK.)
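Before moving on, here is a minimal, single-process sketch of the MapReduce pattern just described — map emitting key-value pairs, a shuffle grouping them by key, and a reduce aggregating each group — written as a plain Python word count. It is illustrative only: a real Hadoop job distributes each phase across many machines rather than running them in one process.

```python
# A minimal illustration of the MapReduce pattern that Hadoop implements:
# map emits key-value pairs, shuffle groups them by key, reduce aggregates
# each group. On a real cluster each phase runs in parallel across many
# commodity servers; here everything runs in one process for clarity.
from collections import defaultdict

documents = [
    "big data is mostly unstructured data",
    "hadoop processes key value pairs at scale",
    "data velocity matters as much as data volume",
]

def map_phase(doc):
    # Emit (word, 1) for every word in the document.
    for word in doc.split():
        yield word, 1

def shuffle(pairs):
    # Group all values by key, as the framework does between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Aggregate the values for one key.
    return key, sum(values)

pairs = (pair for doc in documents for pair in map_phase(doc))
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(sorted(counts.items(), key=lambda kv: -kv[1])[:5])
```

Because map and reduce only ever see one key or one document at a time, the same program can be spread over thousands of machines, which is exactly the scale-out property being described here.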
So we'll talk a little bit about scale-out. The product I have the most experience with, Isilon, was designed around big data use cases. It was really designed to provide an infrastructure that could accommodate ever-growing amounts of data and be able to process it, so it has a fundamentally different model from traditional storage. The idea behind it — and I'm not going to get too product-focused here, there are really only a couple of slides on this — is that with traditional models, when you're cramming all those unstructured files and objects onto traditional storage, it tends to cause hotspots and limitations in scalability, in the network, the storage and the compute, and that causes unfortunate consequences: it forces decisions around replacing hardware, it creates cases where you're not able to get your data back, and it simply leaves you managing a lot of independent things to get your work done. So no matter which way you go on those infrastructures, it tends to be complex and painful. Scale-out technologies — and this will be true of other scale-out technologies, not just Isilon — are generally designed to be modular: when you want to expand what you're researching, you don't want to have to fuss with it, you want to just grow it very simply. Most scale-out architectures are designed to accommodate commodity parts. Most of the scale-out infrastructures I see, whether they're open source based or coming from a vendor, tend to be lots of commodity parts joined together, but the idea is that they are holistically joined into a single entity, allowing you to keep processing at the performance rates you had before plus the power of whatever you just added, with the capacity expanded by what you just added as well. And because it's a single entity, it should be much simpler to manage at scale. So that's the infrastructure part of where EMC are positioning in terms of big data.

This leads into the second product, and then I'll be done. We released something in January to incorporate Hadoop directly into that storage, so we now move to a model where Hadoop, when it's running, instead of having to talk to the storage over an older protocol like NFS or CIFS, can simply treat the storage as if it were a Hadoop-native file system, but with all the enterprise benefits that come with the storage. That can plug into Greenplum, which I'll talk about in a moment, as well as Apache Hadoop, and it is literally a seamless plug-in: you can download the open source and fire away at the storage immediately. I think I made most of these points before, but the demand coming from the analytics side will move up the stack: the compute is moving again towards complex analysis, towards the deep, rich information that is desired, towards increasing amounts of compute and faster delivery of information. That was what motivated the purchase of the other product we picked up, which was Greenplum. The reason we wanted Greenplum is that it's not a traditional BI or data warehousing product. Most of those are built around optimising a particular set of queries on your data and running them very fast; they're not so good at experimentation. Greenplum is built around experimentation — around the model of "I've got a lot of data and I don't know where I'm going with this" — and because of that, if you don't have a priori knowledge about what you're going to do, the only way really to process it quickly is parallelism: finding ways of taking queries that you want to run across structured and unstructured data, whether it's a SQL or NoSQL approach, and running them very fast on commodity x86 hardware. That's exactly the design of it.
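As a flavour of what that ad-hoc, exploratory querying looks like from the analyst's side, here is a minimal sketch of issuing an aggregate query against a Greenplum database from Python. It relies on Greenplum being built on PostgreSQL, so a standard PostgreSQL client library such as psycopg2 should be able to connect; the host, credentials and the "transactions" table are hypothetical placeholders, and the parallelism happens inside the database rather than in this client code.

```python
# A minimal sketch of exploratory querying against a Greenplum warehouse.
# Greenplum speaks the PostgreSQL protocol, so the standard psycopg2
# driver is assumed here; the host, credentials and "transactions" table
# are hypothetical placeholders for illustration only.
import psycopg2

conn = psycopg2.connect(
    host="greenplum-master.example.org",  # hypothetical master host
    dbname="analytics",
    user="data_scientist",
    password="change-me",
)

query = """
    SELECT customer_segment,
           COUNT(*)         AS orders,
           AVG(order_value) AS avg_order_value
    FROM   transactions
    WHERE  order_date >= %s
    GROUP  BY customer_segment
    ORDER  BY avg_order_value DESC;
"""

with conn, conn.cursor() as cur:
    # The MPP database parallelises the scan and aggregation across its
    # segment servers; the client simply issues ordinary SQL.
    cur.execute(query, ("2012-01-01",))
    for segment, orders, avg_value in cur.fetchall():
        print(f"{segment}: {orders} orders, avg value {avg_value:.2f}")

conn.close()
```

The design point is that the analyst does not have to know the question in advance or pre-build an index for it: the query is written when the hypothesis occurs, and the parallel engine does the heavy lifting.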
Like the Isilon product, it's also a scale-out model, but this one lives at the analytics layer. It can use Hadoop to analyse your unstructured data, and it can use SQL at scale, generally in a read-only way, to analyse your data at very high speeds. So this forms the other half of it — the analytics side. We're also releasing tools around both of these in the marketplace to facilitate easier analysis of data, to help data scientists progress through different experiments and make their working lives more efficient. EMC, like many companies, has openly embraced Hadoop. That is absolutely the direction of the market and there's no reason to ignore it, so we have invested in Hadoop — we have several committers to the open source code base within EMC — and we're making very big efforts in this direction.

And that's it. Just to finish off the stack again: EMC are very focused on this space, simply because of our dominance in storage and now in big data, and we are very committed to providing tools to all of you to help move you along this journey towards big data. We do provide a complete stack going from the storage layer all the way up to the analysis layer. But as you look at the presentations through the rest of the day, I hope you can compare them to what you're doing now and see some of the struggles and challenges they've faced, and again, despite this discussion here of products and infrastructure, I suspect they're not going to be product and infrastructure problems. The thing to really get out of today is: what are the organisational changes required for this, what does this mean for my organisation and how decisions are going to be made — at the top, at the executive layer, through middle management, down to the bottom — what does that mean for the hierarchy, and what is it going to mean for my own role in the coming years? So that's really all I have to say about big data. Thank you very much for your time, and I'm happy to take any questions if you have them. Thank you.

Thanks very much, Robert, a great introduction to the day. So we have a couple of people with microphones who are going to run around the room — any questions here? I should say to the people watching on the stream, just a quick reminder: if you have questions, feel free to tweet them or raise them in the forum. OK, so we've got a question here. Could I ask you to say your name and organisation when you ask a question, please?

Hi Rob, it's Ed Zedlewski from Eduserv. Can you speak a bit about what organisational change has taken place within EMC to deal with your own analysis?

That's a great question. We pay a lot of attention to our own ICT division within EMC and how they use these technologies. One of the things we've immediately done is revamp how we do our supply chain, so that is now being analysed with Greenplum to determine how prices are changing from different suppliers — and not just that, there are a lot of other components we bring in as well. So that's one thing we're doing. We're also starting to look at our own customer data. We have often found our own decisions are actually quite poor in relation to the trends within our own customers, and a lot of
that stuff really comes back from, say, an account manager or a consultant we have with that customer bringing back information, so it is anecdotal. If we can actually look at the data coming back from customers in a more detailed way, we can make decisions based on that. So we're working on a number of fronts internally to do various things.

You're actually fitting your sales and account people with RFID tags?

That's right — and we're going to be sticking sensors on customers' heads, and that way we'll know everything they're doing at all times.

Rob, can I just ask a quick question, which is: are you seeing much interest in the education community in big data?

Within the education community I think we're seeing more around the cloud space at the moment than around big data. I see a lot within the research sector, but not so much in education per se at the moment. I do expect that to change over time.

Any other questions?

It's a kind of follow-on, really. Ken Chad from Ken Chad Consulting. The HE sector is full of very rich structured data, in terms of both the users and what you might call, let's call it, library data for now — the journals, the resources, the learning materials. I thought it was a very interesting point you made that most of it is unstructured, but what particular opportunities do you think there are in this rich area of highly structured, highly organised data? Is there anything particular the education community stands to gain from that attribute?

Big data still incorporates structured data, absolutely — that is a huge part of it. We are working with a few universities right now who are looking at multi-petabyte projects, more around cloud, but some of them are moving more towards collaboration as well: ideas about creating app stores for both academic staff and students, and collecting information back from those students about how they use the services. It's fairly embryonic for us right now, so I don't have a lot of information on how it could be used — I suppose I could come up with some guesses — but I wouldn't want to exclude structured data from big data. It's a very big portion of it; Greenplum was originally a product focused on structured data at scale, so it's still a very important part, absolutely.

I would say, within HE, if you're stretching out to — is that including research or not including research?

Well, I think research is an area that's growing in importance.

Basically I can take the Amazon example: there's a lot of structured data about books, but journals, that whole area — I take it it's not there yet; of course you're dealing with it, but I just wondered if you saw any special potential.

We're seeing some movement around things like intellectual property portfolios being done in the HE space, so there is some of that going on, but I'm still looking for, I suppose, leading examples in this space of how it's being used. It's not something I've seen come up a lot at the conferences, but I'd love to be proven wrong.

I'll also be closer to the microphone now — good. Any questions from Twitter? No, nothing.
Okay, what I'm going to — oh, I apologise, there's one at the end. I'll make this the last question.

Hi, my name's Miles Danson, I'm from JISC. Thanks for making it the last question — I'd better make it a good one. Big data analytics: is it all about data exhaust, being actors' interactions with systems and services, which I also know as activity data? That seems to be the thrust of your talk. Is there anything else in there about big data?

I missed one of the words at the beginning — you said something about actors, is that what you said?

Yes. I think the actors you've been talking about are mostly customers, and their data exhaust — the information they leave behind after their interactions with systems and services. Is that the gist of what big data is?

Is it the gist of it? Not necessarily. Certainly, if we want to capture electronic data back from people in an accurate form, it helps if they're providing that data through a website, through some kind of mobile device or a sensor or something like that, because we know the information is accurate. But there are huge data sources out there today that have been generated through other means, and we don't mine those, we don't use that information — so it's not exclusive to that, no. In terms of the information that tends to be real time, that we analyse, yes, that tends to come mostly from an actor approaching some kind of electronic intake mechanism, but it's definitely not the only source.

Can I press you on what the other sources might be?

Yes, sure. I'll give you a good example. We've collected in the research space a lot of data over the last 10 years on clinical results, and yet most of the drug discovery going on out there uses a very small portion of that data. That's not data that came in through interaction with any of those patients — it's come, obviously, through experiments and research that took place with those patients — but the data is just not being used, or used maybe by only one organisation or a few organisations. One of the things that, for instance, Guy's organisation, the Sanger Institute, has really helped with is getting that data out into the public space where it can be used by researchers around the globe. That's a case where we're taking data that's already out there and promoting it to researchers globally for them to use. So there are lots of cases of making better use of the stuff we already have.

Okay, I'm going to wrap this session up because we're running a little bit late. We've got a coffee break now, which is back upstairs, and then, if you can manage it, can we be back in here at midday? Then we'll be back on target. I appreciate it's a little bit of a rush, but I think there's time to get a coffee and be back in here by midday. Can we just show our appreciation to Rob one final time.