My name is Patricia Florissi, and we only have half an hour — and I have material for an hour and a half. So, number one, some clarifications. Yes, I am Brazilian; my mother tongue is Portuguese. The Italian last name, like everything else in my life, is my husband's fault, and I can prove that. Number two, we are going to talk about a new concept, so I really want to ask you to buckle up, enjoy the ride, and stay with me the whole time. Thank you for being here.

We are entering Cognitive Revolution 2.0. Do you know when Cognitive Revolution 1.0 was? If you don't have anything to do this week, I recommend you read the book Sapiens by Yuval Noah Harari. He talks about the fact that human beings really evolved when Homo sapiens came along, and that the difference between Homo sapiens and earlier humans is that the earlier ones did not have the ability to think abstractly, to recognize patterns, and so on. He gives a wonderful example of how we humans have the ability to believe in entities we cannot see, like the government, the church, and laws. He says: if you have a group of hungry gorillas and you show up with a bunch of bananas, it is impossible to convince a gorilla that he should not eat now because it is Holy Friday, but that if he abstains, he will have an infinite number of bananas for all eternity after he dies. We humans, however, are capable of believing in abstract concepts. That differentiates us, and that was Cognitive Revolution 1.0. And I say: if that was Cognitive Revolution 1.0, then Cognitive Revolution 2.0 is what is happening with the evolution of artificial intelligence.

You all know about AI and machine learning. Many of you may not have heard of representation learning — it was a marketing-starved area that never took off. And now we are in deep learning. Why is deep learning different? Because in the first era of AI, you focused on teaching the computer something that was easy for you to express in rules, like playing chess, but you could not scale to analyze all the possibilities and all the moves ahead of you. Now we are letting the computer learn how to do things that are very simple for us — things we take for granted, like recognizing the faces of our parents and siblings — but that we cannot easily express as rules. To be honest with you, we don't understand ourselves how we do it; we take it for granted.

So now what we are doing is giving the computer a lot of data, and that is producing what is called the unreasonable effectiveness of data. What you have here is a very important graph that shows four different algorithms accomplishing the same task. On the x-axis you have the number of words you are feeding the algorithm, measured in millions, and on the y-axis you have the accuracy of the algorithm. Any of you who have used Alexa — which, by the way, has interrupted numerous phone calls of mine; I don't know what triggers that thing — you are looking for that kind of precision in translation and so forth. You can observe two important lessons from this graph. Lesson number one: it doesn't matter how good the algorithm is — they only achieve 95% accuracy after they cross 100 million words.
Lesson number two: after they cross 100 million words, the difference in performance between them is negligible. Yes, there is a difference — sometimes one performs better than another, and so on — but even though there are many, many learning algorithms (and there will be a quiz on these names afterwards), there is one thing they share in common: they perform better the more data they analyze. And then you see the emergence of what I hope becomes the Moore's Law for data. It is very deep and very simple: simple algorithms with lots of data will outperform sophisticated algorithms with less data. So far so good? Okay.

Now, why do you care? You care because, with the number of IoT devices estimated to reach one trillion by the end of this century — and I am very glad I will not be alive to clean up that mess — you have all the data in the world you can think of. The problem then becomes how you scale. We need to design today for the future, not for the past. If you look at what has happened historically — and yes, that picture has a hypnotic effect — over time we moved from centralized to distributed, centralized to distributed. When we went to the cloud, we thought: we are done, we can send everything to this magic place and, boom, there we get the results. And then came IoT.

Now let's look at which side of the spectrum we should take with IoT. If we go all centralized, I will give you five reasons why that will be challenging. Reason number one: IoT data is inherently distributed in hard-to-reach places. Remember, we put sensors, to begin with, in places that humans could not easily reach. It all started, you could say, with oil and gas, and on an oil and gas platform you only want to staff a man and a dog. Why do you need the dog? To make sure that no human touches the buttons. Why do you need the man? To feed the dog. So now we are saying: we put sensors in those hard-to-reach places, but now we can bring the data over. Well, welcome 5G — but we are not there yet. Reason number two is regulatory compliance, which is creating really hard boundaries on data movement across geopolitical and social borders. Reason number three is the effect of multi-cloud — the eternal villain of IoT is the silo. Now you have half of your data in Salesforce, another half in Workday, another half in Amazon, another half in Microsoft Azure, and another half in Virtustream. Where is the data? How are you going to bring it all to one centralized place? Reason number four: you have your traditional data systems — your ERP, your SAP, your Oracle databases, your data warehouses — that lock in data that also has a lot of value. And last but not least, you have bandwidth constraints in terms of speed, capacity, and the timeliness of getting the data to where you need to analyze it. So that is if you want to centralize.

And what if you want to distribute? This is how your world looks. If you follow NIST, it is not only the edge-to-core-to-cloud continuum: NIST also recently introduced the notion of the mist and the notion of the fog. The mist is close to the edge, the fog is close to the cloud, and I am glad the weather system does not have many more terms we could borrow. In any case, this is how the architecture looks. So you are going to distribute — where? Where are you going to put the analysis of the data?
And if you put the analysis of the data only at the edge, you get a myopic view. Remember, the whole point of deep learning, of Cognitive Revolution 2.0, is what? Simple algorithms with lots of data will outperform sophisticated algorithms with less data. So I don't care what AI or deep learning you put at the edge: if you don't feed it enough data, you won't be able to find patterns, correct?

Now, all of my work stands on the shoulders of bright women — and some men — and I will leave you to research Pythagoras' wife. She came up with the notion of the golden mean, and I take that to heart. The golden mean is the ideal middle between two extremes. So in the world of IoT, perhaps the answer is not going fully centralized, and perhaps the answer is not going fully distributed, but to find the golden mean that gives you the scale you need without requiring data to be centralized and without the side effects of the full anarchy of fully distributed data.

So let me tell you: if you were to take an x-ray of your edge-fog-core-cloud continuum, you would be able to detect what we call pockets of data. They are represented by the globes here, and we call them data zones. A data zone is a logical boundary where data resides and where, together with the data, you have processing capacity. The data is collected in that neighborhood; the data can be stored in that neighborhood; the data can be analyzed in that neighborhood; but you have severe constraints on moving the data outside the data zone. So far so good.

Our goal is to enable analytics of data in place, in parallel, at worldwide scale, without requiring data to be centralized, by giving data scientists the abstraction of a virtual fabric connecting all the data zones. We make the data scientist aware that the data is not centralized, yet we abstract away from him or her all the details and hard work of having data analyzed in a distributed fashion.

So let me give you an example. Suppose you want to calculate a histogram. By the way, we do do deep learning, but it is much harder to explain in half an hour how you do deep learning in a federated way, so let's stick with the histogram. A histogram is a graph where you have slices, or bins, and you want to count the number of elements in each bin. And say you want to calculate a worldwide histogram on anything: it can be the vibration-sensor measurements being taken in your factories across the world, or it can be insulin dosage for patients with diabetes — a true use case we have worked on — where patients are taking a particular type of insulin and you want to know how many of them fall into a particular dosage interval. In the centralized approach, you have to bring all the data to a central location, correct? In federated analytics, what we do instead is orchestrate the execution of local histograms in place, close to where the data is. Each data zone of interest calculates its histogram and shares only the histogram with the initiating node, and after we receive all of them, we calculate a global histogram. Again, I am giving you a very simple example; if you want, let's go have a conversation afterwards about how you do regression analysis and other kinds of analytics in this fashion.
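To make the orchestration concrete, here is a minimal sketch in Python of the federated histogram idea just described. The function names and the fixed bin edges are my own illustration, not the WWH API: each data zone computes a local histogram over data that never leaves the zone, only the per-bin counts travel to the initiating node, and the global histogram is simply the element-wise sum of the local ones.

```python
from typing import Dict, List

# Bin edges agreed upon by all participants in advance, e.g. insulin
# dosage intervals. (Illustrative values, not from the talk.)
BIN_EDGES = [0, 10, 20, 30, 40, 50]

def local_histogram(values: List[float]) -> List[int]:
    """Runs inside a data zone: count samples per bin. Raw values never leave."""
    counts = [0] * (len(BIN_EDGES) - 1)
    for v in values:
        for i in range(len(counts)):
            if BIN_EDGES[i] <= v < BIN_EDGES[i + 1]:
                counts[i] += 1
                break  # values outside all bins are simply not counted
    return counts

def global_histogram(zone_results: Dict[str, List[int]]) -> List[int]:
    """Runs at the initiating node: merge per-zone counts bin by bin."""
    total = [0] * (len(BIN_EDGES) - 1)
    for counts in zone_results.values():
        total = [t + c for t, c in zip(total, counts)]
    return total

# Each zone computes locally and sends only its tiny, finite count vector.
zones = {
    "factory-brazil": local_histogram([12.0, 14.5, 33.0]),
    "factory-germany": local_histogram([8.0, 22.0, 24.0, 41.0]),
}
print(global_histogram(zones))  # [1, 2, 2, 1, 1]
```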
Now, I want you to think about two things that just happened. First, we reduced a data space that could be practically infinite — trillions, billions of entries — because all we want to calculate is a histogram, so the number of values actually sent is finite: it depends only on the number of bins you want for the histogram, and all you transmit is a count per bin. Of course, we have what we call the Federated Analytics 101 package, which gives you min, max, average, standard deviation, percentiles — all the good stuff — in one shot; you transmit a few more values, but it is still finite. So you conserve bandwidth; you don't waste it. Second, because the amount of data transmitted is very small and you are doing analytics in place, the time to learn — the time to insight — is much shorter, because you only need to transmit a small amount of data. And third, which I think is a very positive residue, this is privacy-preserving: it is impossible to reverse-engineer the individual values if all you have is a histogram. Of course, we put a provision in the algorithm that if a site has five samples or fewer, we do not let that site participate in the federated analytics, because there you could reverse-engineer.

So this is how we define federated analytics: analytics in place, close to the data source, where you analyze data very near the point of collection, in near real time, you share only the results, and you let those results be merged.

So why is this challenging? Couldn't we have been done yesterday? Well, I have to tell you: if I were smart enough, we would have started from the right and worked to the left, but we started from the left, and then we saw that where we had started was the easy part, and we began facing the other challenges I want to share with you. There are four components of federated analytics that you need to take into consideration. The first one, where we started, was how you orchestrate analytics: how you push code close to the data source, how you push code to the gateway, and how you collect the results, especially at a scale where the numbers keep growing. But the real challenge number one is how you ensure trust, traceability, repeatability, and transparency in that process. One of the biggest challenges in IoT is security. How can you validate that a source is legitimate? How can you validate that the analytics you got back was done on real, factual data? How do you trust the source? How do you know the data was not tampered with along the way from the edge to the core to the cloud?

The second point — where we really didn't know what we were getting ourselves into — is the fact that calculating in a federated manner completely changes data science. Let's say you want to calculate the average. You cannot ask each data zone to calculate its average and then take the average of the averages, because any five-year-old will tell you that the average of averages is not the global average. So how do you actually do it? You ask each site to send you its sum and its number of items; then you take the sum of the sums and the sum of the counts, and from those you calculate the global average. The average is interesting and easy — now try to do that for the median or a percentile.
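A minimal sketch of that point, with illustrative names of my own: each zone ships the pair (sum, count) rather than its local mean, the initiator merges the pairs, and zones with too few samples opt out, echoing the five-sample privacy provision mentioned above.

```python
from typing import List, Optional, Tuple

MIN_SAMPLES = 6  # sites with 5 samples or fewer are excluded (privacy provision)

def local_summary(values: List[float]) -> Optional[Tuple[float, int]]:
    """Runs inside a data zone: return (sum, count), or opt out if too small."""
    if len(values) < MIN_SAMPLES:
        return None  # too few samples: individual values could be inferred
    return (sum(values), len(values))

def global_average(summaries: List[Optional[Tuple[float, int]]]) -> float:
    """Runs at the initiator: sum of sums divided by sum of counts."""
    participating = [s for s in summaries if s is not None]
    total_sum = sum(s for s, _ in participating)
    total_count = sum(n for _, n in participating)
    return total_sum / total_count

# Note the wrong answer an average-of-averages would give:
zone_a = [1.0] * 100  # local mean 1.0
zone_b = [9.0] * 10   # local mean 9.0
print(global_average([local_summary(zone_a), local_summary(zone_b)]))
# 1.727..., whereas the naive average of averages would claim 5.0
```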
And then we found that there is a group of scholars working on privacy-preserving algorithms — what I call Data Science 2.0, at a time when Data Science 1.0 was barely a day old. We have to start thinking in a totally different way: if I want to do deep learning when the data is not in a central location, how do I break the analytics process into smaller steps, where some steps are done at the edge, others along the path, and others at the destination?

And last but not least — for me one of the most interesting findings — if we stop to think about it, what is the abstraction we computer scientists have used over the last three or four decades to locate, address, and access data? The file. Right? We base everything on it: you give me the name of a database, or the name of a file system, or the names of files, and then I ingest that data. In IoT, when you have one trillion devices, how are you going to manage one trillion file names? And the sensors are ephemeral: you have a sensor here, the sensor breaks, you replace it — is that a new file name? You install a different brand — is that a new file name? And so on.

So these are the four things we set out to investigate: how you distribute computing, how you federate data addressing, how you federate analytics and learning, and how you ensure transparency. And this is the name of the project: World Wide Herd. If analytics in the beginning was represented by Hadoop — and one Hadoop is one elephant — then if you want a collection of them, you end up with a herd. That is where the name started, World Wide Herd, and it addresses those four challenges in the following way.

Number one, we integrate with blockchain, and that gives us trust, transparency, traceability, and repeatability: every time a data zone performs a transaction on behalf of a federated analytics job, it enters a record in a ledger. Number two, we make provisions in the framework so that analytics can be broken into iterative parts executed at the edge and in the cloud, at the edge and in the cloud, so that learning becomes continuous learning. Number three, we address the issue of file names and file abstractions by creating a metadata layer — and I will talk a little later about how we integrated that with our own gateways and with EdgeX Foundry — because we are giving data scientists, for the first time, the ability to say: I want to run this computation on all the measurements of vibration sensors; I want to run this computation on all the genomic data. The way it works is that each data zone does a local search that is completely self-contained — nobody outside has visibility — decides which of its trusted data sources are eligible to participate in a federated analytics job, and attaches metadata tags to those data sources. When the data scientist says, "I want to calculate this on vibration-sensor measurements," we push the computation to all the gateways, and when the computation arrives at a gateway, it connects to EdgeX Foundry — or any other system that abstracts the raw data source into higher-level information — and then we connect to that data source and do the calculation.
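Here is a small sketch of that metadata idea, with hypothetical class and tag names of my own invention (this is not the WWH interface): each zone keeps a private catalog mapping tags like "vibration" to its local data sources, and a request addressed by tag is resolved entirely inside the zone, so no file names ever cross the boundary.

```python
from typing import Callable, Dict, List

class DataZone:
    """A zone's private catalog: tags -> local data sources (never exposed)."""

    def __init__(self, name: str):
        self.name = name
        self._catalog: Dict[str, List[List[float]]] = {}

    def register(self, tag: str, readings: List[float]) -> None:
        # Local and self-contained: outsiders never see names or raw values.
        self._catalog.setdefault(tag, []).append(readings)

    def run(self, tag: str, analytic: Callable[[List[float]], float]) -> List[float]:
        """Execute pushed code against every local source matching the tag."""
        return [analytic(readings) for readings in self._catalog.get(tag, [])]

# A sensor breaks and is replaced: we just register another source under
# the same tag, with no global file name to manage.
zone = DataZone("factory-brazil")
zone.register("vibration", [0.2, 0.4, 0.9])
zone.register("vibration", [0.1, 0.3])  # replacement sensor, same tag

# The data scientist addresses data by tag, not by file name:
print(zone.run("vibration", analytic=max))  # [0.9, 0.3]
```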
And last but not least: one of the members of the Dell Technologies family is VMware, and we have extended VMware's vRealize platforms, so you can now deploy containers anywhere in the world — Docker containers or VMware containers, at any place in the world.

I would like to talk a little about why IoT is unique — why we have to think differently about IoT. I come up with what I call the three S's: the scale, the system of systems, and the streaming. The first aspect, the scale, is based on a symbiotic relationship, a positive feedback loop, that exists in IoT between the ability to scale, to analyze, and to derive value. You start with a very small number of sensors, you analyze the data, and you demonstrate value. When you demonstrate value, you generate revenue and return on investment, which allows you to increase the number of sensors, which increases the scale; and as you scale, you can do more advanced analytics, to the point where you can truly do pattern recognition and deep learning on data generated at the edge. So it is our duty — yours and mine, as computer scientists — to accelerate this journey. We must accelerate this feedback loop to create the scale that deep learning requires.

The second characteristic — I wouldn't say problem — of IoT is the notion of a system of systems: we cannot think of IoT at the level of a single sensor. Why? Because the absolute value of one reading is, in my humble opinion, irrelevant. That data needs context, needs business insight, and needs an ecosystem of other data that makes it relevant. It is the idea that we will need to analyze IoT vibration-sensor data in relation to weather data, in relation to business data, in relation to the supply chain and the performance of whatever device you are building on the factory floor. So we have to learn how to create an ecosystem where data can be analyzed by multiple parties. The whole concept of federated analytics is that you make your data available for analysis by other entities, as long as you can also analyze other entities' data and derive insights. So please: think different.

And last but not least is the streaming nature of the data. We are not talking about discrete data. I come from punched cards and the mainframe, so I am fully qualified to say that I am from the previous generation — not a cloud native. I have to tell you, I was a "pond native": the data center was across the pond, for security reasons. But the streaming nature of the data is something we need to learn. In IoT you need to think different: there is the scale, there is the streaming, and last but not least, the system of systems.

So how do we address all of that with the World Wide Herd? First of all, we consider that these data zones are connected to gateways. One gateway can be a data zone; a gateway can also be an aggregator, an appliance connected to multiple data zones, where each data zone is a gateway. You also have data flowing as streams. One important thing here: if our plan, as data comes in stream fashion, had been to just propagate the stream all the way out, we could not scale. So what did we do?
We found ways of decreasing the density of the streams: number one, by dividing each stream into batches — windows of the stream — analyzing the batches one at a time, and sending onward just the results. That is how we scale to streaming. So let me explain. When you have a stream of data, at each data zone we split the stream into batches; we analyze each batch, we forward the results of that batch, and we synchronize batches along the way. This is our vision; this is what we want to create.

We also looked at how we integrate. I don't know if you know, but Dell Technologies was one of the pioneers of EdgeX Foundry, and we take that very seriously. Our federated analytics is fully integrated with EdgeX Foundry: it runs on the gateway, and it uses a lazy approach, asking EdgeX Foundry for all the information at the moment the analytics is executed. We run on the gateway, but we also have appliances for streaming.

And, as I mentioned, we integrate with blockchain. This is very, very important. Every time we do a computation, we register in the ledger who requested the computation, what analytics was done, what code was executed, what the results were, and which data zones we requested it from. Most importantly, we keep a hash of the data. Now, keep in mind we are also a data storage company, so we encourage you to keep a copy of everything. But what we do is look at the data, calculate a hash, and put that hash into the ledger. So, should that analytics ever be subject to litigation — say in genomics or a clinical trial — you can go back and repeat and verify the results.

Let me give you some final thoughts. WWH, the World Wide Herd for federated analytics, was inspired by solid foundations. First, it comes from Greek philosophy, not Greek mythology — and I urge you to look up her name, Pythagoras' wife — the concept of the golden mean. We were also inspired by the principles of the World Wide Web; that is our foundation, our ten commandments, reduced to five because we economize, and we take them seriously. And I don't know if you heard, but the World Economic Forum's theme for 2018 was creating a shared future in a fractured world. I like to say that federated analytics was inspired by that: how can you do analytics and learning at worldwide scale when the data is fractured throughout?

Now some final, final thoughts — I promise I will finish. I would like to finish by telling you a story. As I mentioned, I am Brazilian, and at the end of 2014 there was a documentary, The Salt of the Earth, about a famous Brazilian photographer, Sebastião Salgado. His oldest son made a documentary of his father going on expeditions. There is a very famous scene in the movie where they are about to photograph sea lions in Alaska — very, very cold — and a polar bear comes along. The son turns to the father and says: Daddy, please take the picture, we have to run. And the father looks at the polar bear — which, by the way, is in no hurry, walking very calmly, but in their direction — and says: No, I will not do it. We will find a tent; we will hide. To make a long story short, ten hours later they came back, after the bear had left, and the father did his job.
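To make the streaming idea concrete, here is a minimal sketch, with names of my own choosing (not the WWH code): the stream is cut into fixed-size batches at the data zone, only each batch's result is forwarded, and, in the spirit of the ledger entries just described, a hash of the batch is recorded alongside the result so it can later be verified.

```python
import hashlib
import json
from typing import Dict, Iterable, Iterator, List

BATCH_SIZE = 4  # illustrative window size

def batches(stream: Iterable[float], size: int) -> Iterator[List[float]]:
    """Split an unbounded stream into fixed-size windows."""
    batch: List[float] = []
    for reading in stream:
        batch.append(reading)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial window

def process_batch(batch: List[float], requester: str) -> Dict:
    """Analyze one window in place; forward only the result plus an audit hash."""
    digest = hashlib.sha256(json.dumps(batch).encode()).hexdigest()
    return {
        "requester": requester,   # who requested the computation
        "analytic": "mean",       # what analytics was done
        "result": sum(batch) / len(batch),
        "data_hash": digest,      # raw batch stays in the zone; only the hash
    }                             # goes to the ledger for later verification

sensor_stream = [0.2, 0.4, 0.9, 0.3, 0.5, 0.7]
for window in batches(sensor_stream, BATCH_SIZE):
    print(process_batch(window, requester="data-scientist-1"))
```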
So the son asks the father: why didn't you take the picture when I told you to do so? And the father looks at the son and says: I don't take pictures. I really don't. And the son says: then what do you do for a living? He says: when you take a picture, you are capturing a moment or documenting history. To take a picture is to capture a moment or document history. I photograph. And "photograph" comes from the Greek: photo is light, and graph is to draw or to design. I design, I draw, I invent with light. And these are examples of the pictures the father takes — or maybe he doesn't take pictures; this is what he designs with light.

When I saw that movie, I told my husband: now I can finally explain digital transformation. By now he knows better than to argue, so he agreed and we moved on. But I came out of that movie and put this together. Look: if to photograph is to draw or design with light, then I, Patricia, can coin the term "technograph" — tech for technology, and graph, to draw or design. To technograph is the ability to draw or design whatever you want with technology. And if you agree with that definition, then I will give you another definition of digital transformation: digital transformation is the way we are going to imagine our future. We are going to technograph the way we work, live, and think in different ways.

There is a very famous movie, Ferris Bueller's Day Off. Have you seen it? Well, maybe some of you; for the young ones — don't waste your time. But there is a very famous line where he says: life moves pretty fast; if you don't stop and look around once in a while, you could miss it. And this is what I fear every day. Ladies and gentlemen, there are three types of people on earth: people who make it happen, people who watch it happen, and people who do not know what happened. We are at the dawn of a new era — the dawn of the digital era. At the dawn of this new era, what do you want to do? Do you want to take a picture — a digital picture — of your present, or of your past, which is even worse, as it is, and make it repeatable by automating it? Or do you want, in this digital era, to imagine and technograph your future? Let our digital transformation begin. Thank you very much for your time today.