An explosion in the production of big data, along with the development of new analytical methods, is leading many to argue that a data revolution is underway, one with far-reaching consequences not only for how business is conducted and governance enacted, but for the very nature of how knowledge is produced within society. This is because big data analytics enables an entirely new epistemological approach to making sense of the world. Rather than testing a theory by analysing relevant data, the new data analytics seeks insight that simply emerges from the data itself, without any apparent interpretation being imposed upon it.

This idea was expressed in a somewhat provocative way in a 2008 article by Chris Anderson of Wired Magazine, who argued that big data analytics signals a new era of knowledge production characterised by the end of theory. He wrote that the data deluge makes the scientific method obsolete, and that the patterns and relations contained within big data inherently produce meaningful and insightful knowledge about complex phenomena. In essence he argued that big data enables a more empirical model of knowledge creation: terabytes and petabytes of data allow us to say that correlation is enough; we can simply analyse the data without hypotheses about what it might show. As he writes, we can throw the numbers into the biggest computing cluster the world has ever seen and let statistical algorithms find patterns where science cannot. Correlation supersedes causation, and science can advance even without coherent models, unified theories or really any mechanistic explanation at all. There is no reason to cling to old ways.

Anderson's article is a flamboyant elaboration of what has come to be called dataism. Dataism may be recognised as the general underlying philosophy of big data, which holds data to be a primary source of truth in its own right. Big data offers the possibility of shifting from static snapshots to dynamic flows, from coarse-grained aggregations to high resolution, from data-scarce to data-rich, from relatively simple models to more complex, sophisticated simulations. This all-encompassing, pervasive, fine-grained nature of big data takes us into a new kind of paradigm where, as dataism would hold, we could at last access the world without any kind of mediation, directly in the language of ones and zeros. In its capacity to present us with raw facts, data may take us beyond our intuition, assumptions, biases, prejudices and other distortions. But at the same time, data can be deceptive, hiding behind a veil of objectivity while excluding the relevance of context. Thus, if we really want to push what we can do with data analytics, we need to be aware of where its limitations lie and how this paradigm of big data works.

The traditional way we have done science is by creating a hypothesis and then going to the data to test it. Traditionally, statistics has aimed primarily at the verification of pre-existing hypotheses. But the very idea of data mining is not to test pre-existing hypotheses but to make hypotheses surface from the data itself. Hypotheses or categories do not pre-exist the collection and processing of data; they are, instead, the result of processing the data, which reverses the traditional, more theoretical approach. What people increasingly want now are tools that find interesting things in the data, what is called data-driven discovery. The analyst does not even have to bother proposing a hypothesis anymore.
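To make the contrast with hypothesis testing concrete, here is a minimal, purely illustrative Python sketch of what such data-driven discovery can look like in practice: every pair of numeric variables is scanned for strong correlations, and only afterwards does an analyst decide whether any of them mean anything. The function name surface_correlations, the 0.8 threshold and the toy data are invented for illustration, not taken from any particular tool.

```python
# A minimal sketch of "data-driven discovery": rather than testing one
# pre-stated hypothesis, scan every pair of variables and surface the
# strongest correlations for a human to inspect afterwards.
# (Illustrative only; the column names, threshold and data are arbitrary.)
import itertools
import pandas as pd

def surface_correlations(df: pd.DataFrame, threshold: float = 0.8):
    """Return variable pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr(numeric_only=True)          # all pairwise correlations at once
    findings = []
    for a, b in itertools.combinations(corr.columns, 2):
        r = float(corr.loc[a, b])
        if abs(r) >= threshold:
            findings.append((a, b, round(r, 3)))
    # Strongest relationships first -- no hypothesis was proposed in advance.
    return sorted(findings, key=lambda t: -abs(t[2]))

# Example with made-up data: the analyst only asks "what correlates with what?"
df = pd.DataFrame({
    "temperature": [20, 22, 25, 27, 30, 33],
    "ice_cream_sales": [110, 125, 150, 160, 190, 210],
    "umbrella_sales": [80, 75, 60, 55, 40, 30],
})
print(surface_correlations(df))
```

Whether any of the pairs this surfaces reflect a real relationship, a confounder or pure coincidence is exactly the question the data-driven approach defers, which is the point the following examples illustrate.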
The argument is that mining big data reveals relationships and patterns that we did not even know to look for. Eric Siegel, in his 2013 book Predictive Analytics, states this as follows: we usually don't know about causation, and we often don't necessarily care; the objective is more to predict than it is to understand the world; it just needs to work, prediction trumps explanation.

As an example, we can take the case of a retail chain that analysed twelve years' worth of its purchase transactions for possible unnoticed relationships between products that went into shoppers' baskets. Discovering correlations between certain items led to new product placements and a 16% increase in revenue per shopping cart in the first month's trial. There was no hypothesis that product B was often bought with product Z which was then tested; the data was simply queried to discover what relationships existed that might previously have gone unnoticed. Similarly, Amazon's recommendation system produces suggestions for other products a user might be interested in without necessarily knowing anything about the products themselves; it simply identifies patterns of purchases across customer orders. While it might be interesting to explain why these associations exist within the data, such explanation is often seen as largely unnecessary in a world of commerce where all that matters are outcomes.

There is a comprehensive and attractive set of ideas at work in the data paradigm that runs contrary to the traditional deductive approach that is in many ways dominant within modern science. The basic premise is that because big data can capture a whole domain, providing a complete, high-resolution data set, there is no need for prior theory, models or hypotheses: through the application of data analytics, the data can speak for themselves, free of human bias or framing; meaning transcends context or domain-specific knowledge and is thus neutral, able to be interpreted by anyone. As Yuval Noah Harari writes in his book Homo Deus: A Brief History of Tomorrow, "For politicians, business people and ordinary consumers, Dataism offers groundbreaking technologies and immense new powers. For scholars and intellectuals, it also promises to provide the scientific holy grail that has eluded us for centuries: a single overarching theory that unifies all the scientific disciplines from musicology through economics to biology. According to Dataism, Beethoven's Fifth Symphony, a stock-exchange bubble and the flu virus are just three patterns of data flow that can be analysed using the same basic concepts and tools." We can already see this idea of the universality of data being applied as small groups of mathematicians, physicists, computer scientists and data analysts come to be incorporated into more and more domains, from finance to business consulting to energy providers and all forms of technology companies, the implication being that there is a single language of data that applies to all equally.

So what are the limitations of this data paradigm? Dataism is an extension of the empirical and reductionist paradigm in the age of information. Reductionism is the idea that a system, any system, is nothing more than the sum of its parts. It is to say that nothing is truly continuous; everything can be rendered into a discrete, quantifiable format without any loss of content.
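As a purely illustrative sketch of the kind of co-purchase logic described above (not Amazon's or any retailer's actual system), the following Python counts how often pairs of items appear together in the same basket and ranks them, producing "bought together" suggestions without any knowledge of the products or any prior hypothesis. The orders data, item names and the simple raw-count scoring are assumptions made up for the example; real systems typically use measures such as confidence or lift over far larger data sets.

```python
# A minimal sketch of co-purchase counting behind basket analysis and
# "customers who bought this also bought" suggestions: count how often
# pairs of items appear in the same order, then rank the pairs.
# (Orders and item names are invented for illustration.)
from collections import Counter
from itertools import combinations

orders = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"beer", "crisps"},
    {"bread", "jam"},
    {"beer", "crisps", "bread"},
]

# Count every pair of items that co-occurs within a single basket.
pair_counts = Counter()
for basket in orders:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

def recommend(item: str, top_n: int = 3):
    """Items most often bought together with `item`, by raw co-occurrence count."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if item == a:
            scores[b] += n
        elif item == b:
            scores[a] += n
    return scores.most_common(top_n)

print(recommend("bread"))   # e.g. [('butter', 2), ('jam', 2), ('beer', 1)]
```

Nothing here explains why bread and butter travel together; the counts alone drive the suggestion, which is precisely the "prediction trumps explanation" stance described above.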
This is of course what datafication does: all data is discrete in that it takes a section of the universe and sticks a label or value on it, presenting it as in some way separate from everything else and thus making it possible to move it around and process it into new configurations. Through analysis we break systems down to isolate component parts, quantify them and describe the whole as some combination of those parts. Reductionism has many great achievements, but it also has its limitations. It systematically downplays complex relations, context and continuous processes. It takes no account of emergent phenomena that result in irreducible whole systems and processes. It can tell us about the billions of neurons in the brain, but not about consciousness. It can tell us about the molecular makeup of water, but not why, when we combine the molecules, they create something that has the property of being wet.

Likewise, data is objective and discrete; just as it cannot tell us about emergent whole systems, it cannot tell us about what is subjective and continuous. The discrete nature of data is why it is so useful: it means we can take it, separate it from the world and put it into an algorithm to manipulate and interpret in new ways. But precisely this is also its inherent limitation. Data cannot tell us about the synergies between the parts that make them continuous, more than the sum of those parts and irreducible to discrete units. Datafication gives us a new tool with which to look at the world, but that tool is incomplete. As convincing as it appears, reductionism is only ever half of the story. By its inherent nature, it lets us see some things and not others. The risk is that we keep looking under the street lamp because that is the only place the data sheds light, and forget about everywhere else that it does not.

Such an incomplete interpretation of the world can only ever lead to incomplete outcomes and ultimately unsustainable results. The technology ethnographer Tricia Wang describes this well when she writes: "That's why just relying on big data alone increases the chance that we'll miss something, while giving us this illusion that we already know everything. We have this thing that I call quantification bias, which is the unconscious belief of valuing the measurable over the immeasurable. But the problem is that quantifying is addictive, and when we forget that, and when we don't have something to keep that in check, it is very easy just to throw out data because it can't be expressed in numerical form. This is a great moment of danger for any organisation, because oftentimes the future we need to predict isn't in that haystack, but is the tornado bearing down on us outside the barn." This analogy of the haystack and the barn touches on an important idea: analytics helps us to better understand what is inside the box and how the box works, but it cannot help us see what is outside the box. For that you need a very different process of reasoning, called synthesis, which we will be talking about in a later module.