In this video we'll be talking about the nature of fault tolerance within complex engineered systems as we discuss their robustness and resilience. We'll be largely talking about this from the perspective of network theory, as it provides us with one of the best tools for analyzing failure propagation within our highly interconnected infrastructure networks. We'll try to give some context to the subject by talking about some of the limitations of our traditional industrial age infrastructure systems, and start to dig into some of the key factors surrounding systems robustness by breaking it down into internal and external factors. With the modeling of infrastructure robustness we're interested in how failures occur, how they spread within the system, and how resilient the system is to those failures. And researchers are particularly interested in critical infrastructure for obvious reasons. We are so dependent upon these infrastructure systems that we hardly notice them until faults occur. Therefore the ability to model and analyze the behavior of these critical infrastructures and their interdependencies is of vital importance. Critical infrastructure is defined by the US Department of Homeland Security as follows, quote: "Critical infrastructure is the backbone of our nation's economy, security and health. We know it as the power we use in our homes, the water we drink, the transportation that moves us, the communication systems we rely on to stay in touch with our friends and family. Critical infrastructure are the assets, systems and networks, whether physical or virtual, so vital to the US that their incapacitation or destruction would have a debilitating effect on security, national economic security, national public health or safety, or any combination thereof." Due to a number of features of the industrial age model of design and technology development, our industrial infrastructure has evolved to become highly unsustainable and, along many dimensions, we might say fragile.
Key factors of the industrial age model that have contributed to this are its linear model of take, make and dispose, which requires a high input of resources from the environment; its centralized structure, which creates critical hubs; and its model of batch processing, which requires standardization and thus reduces the diversity in the system. Added to this, globalization and information technology have networked our world, creating many interdependencies between different infrastructure systems. For example, the Amsterdam Power Exchange was the first power exchange to be conducted entirely over the internet, making the electrical infrastructure dependent upon the IT infrastructure. And today almost all of our products depend upon the workings of a globally distributed supply chain network. Thus we are increasingly dependent upon global networks whose complex interlinkages and interdependencies we only partially understand. With every new shock to the system, like the financial crisis of 2008, we become more aware of these global networks and the need to be able to properly model and analyze them. So what we're really interested in, then, is the continued functioning of these infrastructure systems, and this is what we call their resilience. Resilience is the capacity for a system to maintain functionality despite the occurrence of some internal or external perturbation to the system. This is very similar to robustness, meaning the ability to withstand or overcome adverse conditions. We can understand robustness along a number of different parameters, primarily relating to the system's dependency upon its external environment and the internal structure and makeup of the system. In terms of the system's dependency on its environment, we're asking what inputs, or range of inputs, the system requires, because the technology infrastructure that runs our global economy is a dynamic system.
Like all dynamical systems, it requires an almost continuous input of resources to maintain its dynamical state. 24/7, around the globe, these infrastructure systems need a continuous input of resources and energy from the environment. Without it, they will start to degrade very quickly. Like all dynamical systems, they are in a precarious situation where engineers and administrators are trying to maintain a high level of functionality and resource throughput when things can go wrong at any time. As we all know, our modern infrastructure systems have developed to become highly dependent upon a particular subset of energy and resource inputs. This has become a key source of vulnerability, as everything from plastic to shampoo to hairspray to all forms of manufactured products depends upon the stable input of petroleum, and of course all forms of energy likewise, from heating to transportation to electrical generation. As an analogy, we might think of a tree that receives all its nutrients through its trunk, which then ramifies out to all the branches. Being so dependent upon a single input is a critical vulnerability that reduces the system's robustness. Moving towards distributed generation will help to diversify this set of inputs and reduce this dependency. Moving towards a circular economy is another factor that reduces dependency upon the input of raw materials into the system. We can also think about robustness in terms of connectivity: can the dynamical system ensure its continued access to sufficient resources required for its functioning? Thus we are interested in what will happen if we remove one or more of these linkages. With the advent of network science, much of this analysis can now be done using network theory, as a system with a high level of dependency upon a single input would be a centralized network, whilst diversifying these dependencies would result in a distributed network, which is known to be more robust.
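As a sketch of this link-removal analysis, here is a toy model in Python; the node names and topologies are made up for illustration, not drawn from real infrastructure data:

```python
# Toy model: can the "system" node still reach some resource source
# after any single supply link is cut?

def reachable(adj, start):
    """Return the set of nodes reachable from `start` (simple BFS/DFS)."""
    seen, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append(nbr)
    return seen

def survives_any_single_link_failure(edges, system, sources):
    """True if the system still reaches some source after any one edge is cut."""
    for cut in edges:
        adj = {}
        for a, b in edges:
            if (a, b) == cut:
                continue
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
        if not (reachable(adj, system) & set(sources)):
            return False
    return True

# Centralized: one source feeding the system through a single trunk link.
centralized = [("system", "refinery")]
# Distributed: several independent sources, e.g. local generation.
distributed = [("system", "solar"), ("system", "wind"), ("system", "grid")]

print(survives_any_single_link_failure(centralized, "system", ["refinery"]))  # False
print(survives_any_single_link_failure(distributed, "system",
                                       ["solar", "wind", "grid"]))            # True
```

The centralized system fails the single-link test because its one trunk connection is a critical dependency, whilst the distributed system passes every cut, which is the network-theoretic sense in which diversified inputs buy robustness.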
Network analysis of infrastructure systems is becoming a key tool and a rising topic of research. Next we want to consider the internal structure and makeup of the system. Again we can represent this as a network. We want to know how centralized the network is, as a centralized system, such as a hub-and-spoke air traffic network, will be susceptible to strategic attack. Taking down one major hub will drastically reduce the network's level of connectivity and may result in its disintegration. This is why distributed systems, like peer-to-peer file sharing networks, are typically very robust. They often come under attack from law enforcement agencies due to copyright violations, but because the system is distributed there is no single node or cluster of nodes through which we can damage the entire network. These distributed networks typically have a low level of specialization between components, meaning any node's function can be easily replaced by any other or simply duplicated to another location. The first generation of internet peer-to-peer networks, like Napster, relied on a central index server, and due to this it was possible to take the network down. The second and third generations of P2P networks are able to operate without a centralized server, thus eliminating the central vulnerability by connecting users directly to each other. This kind of distributed network has a very low level of criticality. They're extremely resilient and can be, for all intents and purposes, virtually impossible to destroy. And this is in strong contrast to many of our centralized industrial systems, such as broadcast media, cities and airports, that all exhibit a high level of criticality because the networks are dependent upon centralized nodes. But it's not just dependency upon a single set of major hubs that is important to robustness, but also dependency upon a limited number of linkages. These critical linkages between nodes are called bridging connections.
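The hub-attack argument can be made concrete with a small sketch, assuming two hypothetical ten-node topologies: a hub-and-spoke star and a distributed ring. We remove the highest-degree node and measure the largest surviving connected component:

```python
# Toy comparison of targeted hub removal in a star versus a ring network.

def components(adj):
    """Return the sizes of the connected components of an adjacency dict."""
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        sizes.append(size)
    return sizes

def largest_after_hub_removal(adj):
    """Remove the highest-degree node and return the largest component size."""
    hub = max(adj, key=lambda n: len(adj[n]))
    rest = {n: {m for m in nbrs if m != hub}
            for n, nbrs in adj.items() if n != hub}
    return max(components(rest))

n = 10
# Hub-and-spoke: every spoke connects only to node 0.
star = {0: set(range(1, n))}
for i in range(1, n):
    star[i] = {0}
# Ring (distributed): each node connects to its two neighbours.
ring = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

print(largest_after_hub_removal(star))  # 1 -> network disintegrates
print(largest_after_hub_removal(ring))  # 9 -> still one connected chain
```

Losing the hub shatters the star into isolated spokes, whilst the ring merely opens into a chain: the disintegration-versus-degradation contrast described above.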
Peer-to-peer networks also have a high level of resiliency owing to their low level of linkage criticality. Any linkage between two computers can be replaced by using a proxy server as an alternative pathway, meaning the network is not dependent upon any specific connection. This independence from any particular node or edge is central to achieving robustness. Next we need to consider how failures spread within the system. A primary consideration here is the overall degree of connectivity of the network. With a relatively isolated system, like a small farm in a rural community, failures don't spread very far; isolation through low connectivity is the most basic barrier to contagion. But if we take an urban center like central Hong Kong, a dense network of many interconnected infrastructure systems has to be working for it to run smoothly, and small glitches propagate quickly. In these highly interconnected and coordinated systems we can get positive feedback loops that work to amplify some small change into a large effect. This is the butterfly effect that we previously mentioned, and in these highly interconnected systems it is often the source of major systemic shocks, such as bank runs or cascading failures within power grids. Key barriers to failure propagation are redundancy and buffers. These can be engineered into the network but are also an emergent phenomenon of maintaining diversity within the system. There is often a trade-off between diversity and optimization. Supply chain networks are a good example of this. Holding just the right amount of inventory is critical to optimizing costs. After all, inventory costs are incurred every hour of every day in areas including warehouse storage, heating, electricity, staffing, product delay and obsolescence.
All this makes for a strong drive towards ever-increasing optimization and just-in-time practices, which can lead to self-organized criticality, where we reduce the diversity of the components and the buffers between them to such a low level that we position the entire network at a critical point where a small event can trigger an avalanche of failures. And there's a core tension here between the optimization of components and the system's overall robustness. It takes intelligent design and management to integrate both, thus maintaining an efficient and sustainable system. As different technologies and systems converge, the interconnections and interdependencies across different infrastructure systems increase, and so too does the number of unknown linkages. A basic premise of complexity theory is that we never know all of the linkages in these complex systems. As an example of unknown interdependencies, we might think about the 2011 flooding in Thailand, a country that accounted for approximately 25% of the world's computer hard disk production. This flooding caused a disruption to the manufacturing supply chain for automobile production and a global shortage of hard disks that lasted throughout 2012. Now, some people may know that Thailand is a major producer of hard drives. Fewer people know these hard drives are in our cars, and very few know of the dependency of the automotive industry on this critical supply chain linkage. This is an example of a tiny linkage in the vast complex system of our global economy that is interdependent but that no one manages or fully understands. There is no instruction manual where every interdependency is listed, and this is the nature of our distributed, globalized world, where information technology enables people to set up their own networks.
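Returning to the earlier point about buffers and just-in-time optimization, a toy load-redistribution model (hypothetical parameters, a simple ring topology) shows how thin buffers let one fault avalanche through the whole network whilst generous buffers absorb it:

```python
# Toy cascade model: each node carries one unit of load and fails when its
# load exceeds capacity = 1 + margin (the "buffer"). A failed node's load
# is shared equally among its surviving neighbours on a ring.

def avalanche_size(n, margin):
    """Fail node 0, propagate the cascade, and return how many nodes fail."""
    load = [1.0] * n
    capacity = 1.0 + margin
    failed = {0}
    queue = [0]
    while queue:
        node = queue.pop()
        nbrs = [(node - 1) % n, (node + 1) % n]   # ring neighbours
        alive = [m for m in nbrs if m not in failed]
        for m in alive:
            load[m] += load[node] / len(alive)    # redistribute the load
            if load[m] > capacity:
                failed.add(m)
                queue.append(m)
    return len(failed)

print(avalanche_size(20, margin=0.1))  # 20 -> lean buffers: the whole ring fails
print(avalanche_size(20, margin=2.0))  # 1  -> generous buffers absorb the fault
```

With a margin of 0.1 the redistributed load exceeds every neighbour's capacity in turn, which is the sense in which optimizing away buffers positions the network at a critical point.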
Companies, financial institutions, engineers, software developers, criminal gangs, government security agencies, hackers: they all just set up their own connections, and they don't have to tell anyone. There is no government of globalization keeping track of all these connections and interdependencies. We simply do not know them all, and often only really find out about these linkages when the system breaks down. Because we can never really know all of the linkages within these complex engineered systems, we can never say the system is fully fault tolerant; instead, often the best option is building robustness into the system through diversity. In this video we've been looking at the nature of fault tolerance and robustness in complex engineered systems through the lens of network theory. We've talked about how systems robustness is a product of both external factors, that is, the system's dependency upon its environment, and internal factors, that is, the system's network structure: its overall degree of connectivity and its dependency upon centralized hubs and critical bridging links. We briefly mentioned the importance of diversity as a barrier to failure propagation, looking at the tension between subsystem optimization and resilience. Finally, we talked about the fact that we often do not know all the linkages within these complex distributed networks and thus can't ensure complete fault tolerance, making inherent diversity an important security strategy.