Big Data is a term that has come to be used in reference to data structures that are diverse, complex and of a massive scale. Although the term has been in use since the 1990s, it is only with the rise of Web 2.0, mobile computing and the internet of things that organisations find themselves increasingly faced with a new scale and complexity of data. The term Big Data implies an increase in the quantity of data, but it also marks a qualitative transformation in how we store and analyse such data. It is certainly the case that with Big Data, more is different.

The world's technological per capita capacity to store information has roughly doubled every 40 months since the 1980s, and since that time the amount of information in the world has exploded. Likewise, the digitisation of that information has happened in a historical blink of an eye. Back in the late 1980s, less than 1% of the world's information was in a digital format; by now, more than 99% of all stored information is digital.

Equally, the amount of data available through the internet has grown at an extraordinary rate. The world's effective capacity to exchange information through telecommunications networks was 281 petabytes in 1986, 471 petabytes in 1993, 2,200 petabytes in the year 2000 and 65 exabytes in 2007, with predictions putting internet traffic at 667 exabytes annually by 2014. From this data, we can see how, a little after the year 2000, the amount of digital information began to explode, and at the same time, largely due to the mass adoption of the internet and user-generated content, the nature of that data changed from being largely structured to being largely unstructured. We might identify this as the tipping point from the world of data to this new world of Big Data.

Indeed, industry, government and academia have long produced massive datasets, such as remote sensing, weather predictions, scientific experiments or data from financial markets. However, given the costs and difficulties of generating, processing, analysing and storing such datasets, these data have been produced in tightly controlled ways that limit their scope, temporality and size. For example, to make the compiling of national census data manageable, censuses have been conducted only once every five or ten years, asking just 30 or 40 questions, and their outputs are usually quite coarse in resolution. While the census may wish to be exhaustive, listing all people living within a country, most surveys and other forms of data generation are just samples, seeking to be representative of a population but not capable of capturing all of its features.

Big Data has a number of key attributes that make it distinct in nature from these more traditional datasets, including its volume, the velocity of its capture, the variety of its sources, its high resolution and its often exhaustive approach to sampling. First, as the name implies, Big Data is truly massive in volume, consisting of terabytes or petabytes of data. Take, for example, the Chinese ride-sharing platform DiDi, which serves some 450 million users across over 400 cities in China. Every day, DiDi's platform generates over 70 terabytes of data, handles more than 20 billion routing requests and produces over 15 billion location points. Or, for example, a typical 20-bed intensive care unit generates an estimated 260,000 data points a second.
Likewise, a military drone may have some 20,000 sensors in a single wing to enable it to fly by itself, and on one single flight an Airbus A380 can produce 250 gigabytes of data.

Secondly, these data sources may be high-velocity, as data is created in or near real-time to produce massive dynamic flows of fine-grained data. For example, Facebook reported that it was processing 2.5 billion pieces of content, 2.7 billion 'Like' actions and 300 million photo uploads per day in 2012. Similarly, in 2012 Walmart was generating more than 2.5 petabytes of data relating to more than 1 million customer transactions per hour.

The variety of data and data sources is a key aspect of Big Data that differentiates it from more traditional forms of structured data. Photos, videos, text documents, audio recordings, books, email messages, presentations, geolocations and tweets are all data, but they are generally unstructured and incredibly varied. An article in the MIT Sloan Management Review entitled 'Variety, Not Volume, Is Driving Big Data Initiatives' notes that the past several years have been a period of exploration, experimentation and trial and error in Big Data among Fortune 1000 companies. For these firms, it is not the ability to process and manage large data volumes that is driving successful Big Data outcomes; rather, it is the ability to integrate more sources of data than ever before: new data, old data, big data, small data, structured data, unstructured data, social media data, behavioural data and legacy data. While this variety may be the key source of complexity in Big Data, it may also be the true source of insight.

By referencing different sources, we can begin to build up context around events or outcomes instead of relying on one-dimensional interpretations. For example, take something like fraud detection on a debit card: an ATM may simply swallow your card because you are using it in a different country from where you usually use it. This is the result of an analysis based upon a single data point, which gives very crude outcomes. But with a variety of data sources, such as social media, purchase history, geolocation and so on, a much more nuanced picture could be built up to better understand whether it is really the user standing in front of the ATM or not.

Big Data is often exhaustive in scope, striving to capture entire populations or systems and all relevant data points. Take, for example, a recent project initiated by the US Securities and Exchange Commission to try to capture and analyse every single US financial market event for every single day. The goal of the project, called the Consolidated Audit Trail, or CAT, is to track every life-cycle event, every tick, every trade and every piece of data involved in the US market, and bring it all into one data platform. The aim is to build a next-generation system that will allow regulators to understand, in a reasonable amount of time, what is going on in the market. This involves taking data from all the different silos across the banks, the broker-dealers, the market makers, the dark pools and so on, and bringing it into one system. The system has to ingest between 50 and 100 billion market events per day, some 15 terabytes of data a day, which needs to be processed within four hours and made available for queries across the whole dataset, so that any trade from that day can be traced from its origins all the way through to completion.
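To get a feel for the sustained throughput those figures imply, here is a rough back-of-envelope calculation using only the numbers quoted above; the midpoint event count and an evenly spread processing window are assumptions made purely for illustration.

```python
# Back-of-envelope on the CAT figures quoted above (illustrative assumptions only).
events_per_day = 75e9            # midpoint of the 50-100 billion events per day
bytes_per_day = 15e12            # 15 terabytes of data per day
processing_window_s = 4 * 3600   # the day's data must be processed within 4 hours

avg_event_size = bytes_per_day / events_per_day       # roughly 200 bytes per market event
required_rate = bytes_per_day / processing_window_s   # roughly 1 GB/s sustained while processing

print(f"average event size: {avg_event_size:.0f} bytes")
print(f"required processing rate: {required_rate / 1e9:.1f} GB/s")
```

Even as a crude approximation, a sustained rate on the order of a gigabyte per second is the kind of load that pushes a system towards the changed computing models discussed next.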
We can note here how we are no longer simply looking at a very limited set of historical snapshots; instead, all market events are available for analysis. Previously, due to the limitations of storage and computation, we would compute based upon samples and then make inferences; the hope of this exhaustive approach is that with Big Data there may be no sampling error at all.

Big Data is characterised by being generated continuously, seeking to be exhaustive and being fine-grained in its structure. Examples of the production of such data include digital CCTV, the recording of retail purchases, digital devices that record and communicate the history of their own usage (such as mobile phones), the logging of transactions and interactions across digital networks such as email or online banking, measurements from sensors embedded within objects or environments, social media postings, and the scanning of machine-readable objects such as travel passes or barcodes.

The scale and complexity of Big Data requires, in turn, a change in computing models, both in how we structure data and in how we process it. Big Data usually involves datasets with sizes beyond the ability of commonly used software tools to capture, curate, manage and process within a reasonable time. Up until just a decade ago, data systems were almost completely structured, relational databases, with data organised into tables and columns, but with the rise of Big Data has come the evolution of databases into a non-relational form, what can be referred to as NoSQL. A NoSQL database provides a mechanism for the storage and retrieval of data that is modelled in means other than the tabular relations used in the relational databases of the past. In contrast to relational databases, where data schemas are carefully designed before the database is built, NoSQL systems use flexible data schemas or no schema at all.

For example, one type of NoSQL structure is graph storage. A graph data store organises data as nodes, which are like records in a relational database, and edges, which represent connections between the nodes. Because the graph system stores the relationships between nodes, it can support richer representations of data relations. Also, unlike relational models, which rely on strict schemas, the graph data model can evolve over time.

Additional technologies being applied to Big Data include massively parallel processing databases. Multi-dimensional Big Data can also be represented effectively as tensors, making it more efficient to handle through tensor-based computation.
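To make the node-and-edge model described above concrete, here is a minimal, illustrative sketch in Python. It is not tied to any particular NoSQL product, and the node identifiers and properties are invented for illustration; the point is that nodes can carry free-form properties and new relationship types can be added without redesigning a fixed schema.

```python
# A minimal, illustrative graph store: nodes carry free-form properties
# (no fixed schema) and edges record typed relationships between nodes.

class GraphStore:
    def __init__(self):
        self.nodes = {}   # node_id -> dict of arbitrary properties
        self.edges = []   # (source_id, relation, target_id) triples

    def add_node(self, node_id, **properties):
        # Different nodes can carry entirely different fields.
        self.nodes[node_id] = properties

    def add_edge(self, source, relation, target):
        self.edges.append((source, relation, target))

    def neighbours(self, node_id, relation=None):
        # Follow outgoing edges, optionally filtered by relation type.
        return [t for s, r, t in self.edges
                if s == node_id and (relation is None or r == relation)]


g = GraphStore()
g.add_node("user:42", name="Alice", city="London")
g.add_node("card:7", issuer="ExampleBank")                 # different fields: no rigid schema
g.add_node("merchant:9", name="Cafe Milano", country="IT")
g.add_edge("user:42", "OWNS", "card:7")
g.add_edge("card:7", "USED_AT", "merchant:9")

print(g.neighbours("card:7", "USED_AT"))  # ['merchant:9']
```

A production graph database adds indexing, persistence and a query language on top of this basic idea, but the underlying data structure of nodes linked by typed edges is essentially the same.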
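Likewise, the closing remark about tensors can be illustrated with a small, hypothetical example: multi-dimensional event counts (say, location pings indexed by user, place and hour) held as a three-dimensional array, over which whole-dimension operations can run efficiently. The dimensions and figures here are invented purely for illustration.

```python
import numpy as np

# Hypothetical example: counts of location pings indexed by (user, place, hour).
n_users, n_places, n_hours = 1000, 50, 24
pings = np.zeros((n_users, n_places, n_hours), dtype=np.int32)

# Record one event: user 42 seen at place 7 during hour 13.
pings[42, 7, 13] += 1

# Tensor operations then work across whole dimensions at once,
# e.g. total activity per hour, or each user's busiest place.
activity_per_hour = pings.sum(axis=(0, 1))                 # shape (24,)
busiest_place_per_user = pings.sum(axis=2).argmax(axis=1)  # shape (1000,)
```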