We have a big effect on the practice of law, the structure of law, the goals, and probably a different view than most of you have. So I do a bunch of things that are unusual. I'm on the advisory board for innovation for the American Bar Association, which should cause you to be deeply worried. And I'm on the board of the United Nations Global Partnership for Sustainable Development. So that's like the data side of the United Nations sustainable development policy. And I also help people like the Secretary General, but also companies, think about how they handle privacy law, data ownership, things like that. And we have a group here at MIT that not only does research, but we also build software. And the software is called Trust.mit.edu. It's about giving people ownership over data about themselves. So it's actually important for the privacy laws in Europe and for thinking about how you can operate in the presence of data localization laws and privacy laws. And I think we can change the direction we're going in. This is the world, right? We have data everywhere. Data from cell phones, data from credit cards, data off the doors, the cars, everything. And the data increasingly runs things. Things have a little bit of software, and they use the data to cause the cars to run, the building to open, all sorts of stuff. And the question is, how are we going to use this? Not just for commercial things but for public good. How are we going to govern ourselves? So this is really sort of a very top-level view. And over the years, I've been the person to blame behind a number of experiments. These are country-wide experiments in particular, where we get data from the credit card companies, the banks, the telcos, the government, and we stick it all together, the whole country. And then we try to aggregate it and anonymize it. So it's more like census data when you're done.
What you can see is things like where people spend money, what they spend money on, and where they go during the day, on average. So think of sort of census data, but alive and dynamic, including private data from companies as well as government data. What we've been able to show, and we've done this in a number of places (one of the first ones was Côte d'Ivoire), is that we could not only make their transportation systems more effective, because it turns out the buses didn't know where the people wanted to go, so we could actually make the bus system a lot more efficient, but we could also help the public health system, because most infectious diseases are transmitted from person to person. So if you know the patterns of movement of people, you know a lot about the patterns of transmission of disease. You can also look at poverty. And it turns out poor people act differently than rich people, in ways I won't go into, but it doesn't have to do with money. It has to do with their sense of ability to explore their environment. But it's extremely reliable, because you can actually make census maps of poverty or wealth very simply from telephone data, from credit card data, and they're about as accurate as census data, which is really quite amazing. And what else have we looked at? You can predict where development will happen, where you're going to need a better electric grid, where you're going to need more sewage, et cetera, et cetera, by looking at these patterns of behavior within the society. And that, incidentally, is why I called my last book Social Physics. Social physics is the idea that you can use statistics and data to be able to have better governance. It's why we have a census. It's an idea from France in the early 1800s, when statistics was developing for the first time. And they said, oh gosh, we could actually make government scientific. And that's why we have a census. And it really is.
Except there's only so much data you can have, because it's all survey data, and it's expensive and so on. So now you have all this sort of data, and the question is: can we make government a lot more scientific, a lot more based on data? So we've done this in London. We showed we can predict crime, not about criminals, but about areas that have had social tension, social problems. And those are places that government needs to pay more attention to. Not just for crime, but also for social services and things like that. Again, by patterns of behavior and changes over time. I'm not going to go into all of this. Essentially, I'd have to sort of brainwash you all to think about people in a different way, maybe, to really understand why this works. But suffice it to say that people have a lot of behaviors that are very fundamental, biological behaviors that we have in common with other animals. And you can sense those. And they have to do with things like how much exploration and predictability you have in your life, how much regularity you have in your life, things like that. So you can do all that stuff. We've done it in different places. A sign that we've done well is that a couple of years ago we were able to convince the UN to adopt data goals as part of the Sustainable Development Goals. Everybody knows the Sustainable Development Goals, right? Fifteen-year goals for the United Nations: reduce poverty, inequality, you know, a greener planet, etc. Well, those include using these big data sources, telcos, banks, Uber, whatever, to be able to look at patterns of human behavior and to assess things like poverty, like inequality, like forced migration, etc. In fact, there are 170 KPIs. I think people know what a KPI is. Business speak: key performance indicators, right? So the Sustainable Development Goals specify 170 things that every National Statistical Office in the world is supposed to measure regularly.
More often than every 10 years, at least yearly, using sources like telco data, bank data, stuff like that. And the reason to do that is fairly straightforward: it occurred to them that goals are rarely achieved unless you can measure progress towards them. And it's also motivated by the fact that the donor agencies, countries, and so forth are tired of dumping money into places and not knowing what happened. So there's a demand for greater transparency and accountability. That ought to be good, right? I mean, every government, every country in the world, signed up to this, okay? So they signed up to measure 170 different things about their country regularly. So that when they talk to the World Bank, they can say, well, see, we did these things. Okay? There it is. Objective data about your own inequality. Okay? But think about that, right? Because what we've said now is that a lot of government policy is going to be not exactly dictated, but heavily constrained, by AIs or machine learning on this data. Because that's what this is. So there's going to be data that comes from a telco, and there's a machine learning thing that prints out estimates of poverty, and that's going to determine whether the government gets a loan from the World Bank. So what we're doing is we're saying, look, government is going to be more transparent and accountable, but the freedom to do legislation, the freedom to do policy, is going to be driven by data. And it's going to be driven by machine learning on data in particular. And there are lots of things underneath that which have to do with validating the data and validating that the numbers are good, and you can imagine lots of that sort of stuff. But this really changes, in my view, the legislative process.
The legislative process used to be, and currently is, one where people say: my intuition about this country is this, based on nothing except what my grandmother told me, and we're going to make this law because it's the morally right thing to do. Sorry for being a little... And we're moving towards a world where it will be normal to have data about lots of things. And the regulations will begin to build in measurements from this sort of data. At one level, you're talking about, like, Skynet, right? In other words, it's sort of a big AI that's measuring poverty everywhere and telling people who gets the money. It's meant to be a good Skynet, not a bad Skynet, but it's close to that. Everyone know what Skynet is? Skynet is the AI that went crazy in the Terminator movies and started the nuclear war. This is sort of a joke. I guess you have to know what Skynet is. Anyway, so we see governance, internationally, becoming much more data-driven, and driven by AI and machine learning. So that's interesting. And then one of the questions is: okay, I think it's good to have that sort of accountability, but it's also a little spooky to have more and more data-driven processes. And so the question is, how do you use it safely? And so we've been thinking about that a lot in my group. I've been thinking a lot about it. Two months ago, I keynoted the opening of the European Union presidency in Estonia, because they liked what we do, and they wanted me to harangue all of the ministers. And what I was supposed to harangue them about, in the end, was: how can you handle data safely? How can you put people in charge of data about them? How can you localize data? And what's interesting is that the key idea is very simple, which is that you don't think of data as being something that, first of all, wants to be free. That's sort of a trope. But also not as something that is without ownership.
Whereas money, which is again just 1s and 0s, is something that has definite ownership and auditing associated with it. You take your 1s and 0s to the bank, you put them in the bank, they're obliged to do certain things, and there are audits of what they do with those 1s and 0s. So it's like that, maybe. And the key observation is that you could build a system that's a little bit like banks, where when somebody wanted to know something about you, you would consider it and allow them to know something about you. You'd give them a license to know that thing. So they would have a license to have my 1s and 0s. And if I decided that was bad, I could take that license back and you'd have to get rid of the 1s and 0s. And I could verify that you went back and did what I said you were supposed to do with my 1s and 0s. Just like I do with banks. That sounds fanciful, right? How many of you believe that's fanciful? Yeah, honestly. "I think it's fanciful." Yeah. For almost 20 years, this is the way it has worked in Estonia. I mean, for real. A whole, tiny country. So what happens in Estonia is everybody has a digital identification. And whenever you interact with the government (this is mostly government at this point, okay?) the identity is checked, so that essentially what you do is you give permission to the pieces of the government to use your data, and that's logged. Logged. And you can see what each element of the government does with your data and why. And you can retract that license if you want to. And you can now do the same thing with most companies. You can pay taxes, you can buy food, you can do stuff. And the transfer of data about you is something that you control and can see. And the government can check up on you. So the most amazing thing was, you know, I was in this restaurant and I talked to this guy, he's a technical officer.
And I said, well, you know, how does this work with medical stuff? And he laughed and said, see, this is my medical record. Can you do that? No. Then he said, and look, this is everyone who has accessed my medical record in the last two years: their name, their justification for doing it, right? Essentially their license for doing it. And what they got at. And so he can say, oh, this doctor looked at this because I complained about asthma, or whatever; I forget what it was. But not only could he see his record, he could show me who had looked at the record, and he could object to them or he could restrict people. So it sounds fanciful, but you know, there's a place in the world where it works. And in a very sort of odd way, the same is true in China. It's just not the government. It's companies like Tencent that have all your data. And you have a little less control over it. But it's a similar sort of ubiquitous permission-control regime. Anyway, so what that means to me is that you've got this world where control of data means control of AIs. All AIs are methods of transforming data. If you control the data, you limit what the AIs can do. So if you can log and audit everything that happens, and control it, and turn it on and turn it off, you can have a potential veto power over what can be done. But of course there are constraints from society. You really would like to go to the doctor, so you probably ought to tell the doctor what your name is. Some things are more forced than others. But the point is, you can get to a world that's a lot more governed. We're working with the government of France to set up these systems in Colombia and Senegal, where there's going to be a regime like this in those two countries. And one of the things that we're going to do is have a system like this that we have put together, a software system, including now private data, in other countries.
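As a rough illustration of the license-and-log idea described above (give a license to see certain data, take the license back, and let the owner see who accessed what and why), here is a minimal Python sketch. It is not how Estonia's system or the MIT software is implemented; the class names, field names, and the sample doctor and record are all hypothetical, and a real system would also need identity checks and a way to verify that revoked copies were actually deleted, which this sketch does not attempt.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessEvent:
    """One line of the audit log: who asked, why, for what, and the outcome."""
    accessor: str
    purpose: str
    fields: tuple
    allowed: bool
    at: datetime


@dataclass
class PersonalRecord:
    """Hypothetical personal data store controlled by its owner."""
    owner: str
    data: dict
    licenses: dict = field(default_factory=dict)  # accessor -> set of field names
    log: list = field(default_factory=list)       # every attempt, allowed or not

    def grant(self, accessor, names):
        """Owner licenses an accessor to read the named fields."""
        self.licenses.setdefault(accessor, set()).update(names)

    def revoke(self, accessor):
        """Owner takes the license back."""
        self.licenses.pop(accessor, None)

    def read(self, accessor, names, purpose):
        """Return the requested fields if licensed; log the attempt either way."""
        allowed = set(names) <= self.licenses.get(accessor, set())
        self.log.append(
            AccessEvent(accessor, purpose, tuple(names), allowed,
                        datetime.now(timezone.utc))
        )
        if not allowed:
            raise PermissionError(f"{accessor} holds no license for {list(names)}")
        return {name: self.data[name] for name in names}

    def who_accessed(self):
        """What the owner sees: accessor, stated purpose, fields, outcome."""
        return [(e.accessor, e.purpose, e.fields, e.allowed) for e in self.log]


# Made-up example data.
record = PersonalRecord("alice", {"asthma_history": "mild", "blood_type": "O+"})
record.grant("dr_tamm", {"asthma_history"})
seen = record.read("dr_tamm", ["asthma_history"], "asthma complaint")

record.revoke("dr_tamm")
try:
    record.read("dr_tamm", ["asthma_history"], "follow-up")
except PermissionError:
    pass  # the refused attempt is still written to the log

for entry in record.who_accessed():
    print(entry)
```

The design choice worth noticing is that the log records refused attempts as well as successful ones, since the point made in the talk is that the owner can see, and object to, everything that was done or attempted with the data.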
And one of the things that people say, and I think it's absolutely correct, is that you don't want to just set up the system and let the normal sort of legal processes act on it. In other words, you don't want to have to have a lawsuit to be able to say this is discriminatory and fight it. You don't want to have to wait for the judges to decide it, and you don't want to leave the decision to the data either. So what we set up along with this system, which is a way of handling data and deciding things, is a human ethics board that consists of all the stakeholders. In a place like Colombia that's really interesting and important, because there really is more than one kind of stakeholder. And what that group is able to do is look at the function of the system on a day-by-day basis and use analytics tools to identify unwanted behavior, like discrimination, or results that are not achieving what they said they should achieve in society. So what you've got is, in a sense, something like an arbitration process. You've got data coming into a public facility, which drives policy, and you've got a stakeholder review board that's able, and is chartered, to regularly audit the function of these data-driven regulations, using tools that let them do it immediately, without a long process. And then that, of course, can feed into the normal sort of legal challenge process if you want it to, but actually they can turn things off, they can demand that things be changed, etc.
So to me, I look at that and I say this is really interesting, because now you've got law, effectively data-driven law, which says: okay, we want to reduce poverty, so we're going to do these things, and incidentally we're going to be able to measure whether they're working, daily, everywhere in the whole bloody country. That's a really different way to think about it. And then we have the operation of those things being overseen for unintended consequences by humans who are the relevant stakeholders. We sort of patched this together. It's not like this is the formal answer, but it's people from the guerrillas, from the city, from the NGOs, from the church, who get to evaluate it, which begins to look like an arbitration or appeal process that's now data-driven and automated. Really different. And you can imagine now that there's this whole sort of structure that grows up, where a lot of the function of the government, and a lot of the function of commerce, is driven by what I'll call, quote, objective measurements, the best you can do, sort of objective measurements. Those are audited all the time by the variety of stakeholders to make sure that they're okay. And the function of the government is to some fair degree automated, by data measured from all the different stakeholders and audited by all the stakeholders. And that's a really interesting future, okay?
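To make the audit idea above a little more concrete, here is a minimal Python sketch of the two questions a stakeholder review board might ask of a data-driven rule: is it treating groups very differently, and is the measured indicator actually moving toward the goal? Everything in it is an assumption for illustration: the function names, the 0.8 ratio threshold, and the sample numbers are made up and do not come from the talk or from any deployed system.

```python
from collections import defaultdict


def benefit_rates(decisions):
    """Share of people in each group who received the benefit.

    `decisions` is an iterable of (group, received) pairs, where
    `received` is True or False.
    """
    totals = defaultdict(int)
    granted = defaultdict(int)
    for group, received in decisions:
        totals[group] += 1
        granted[group] += int(received)
    return {group: granted[group] / totals[group] for group in totals}


def audit(decisions, poverty_series, min_ratio=0.8):
    """Two checks a review board could run on every reporting period.

    disparity_ok: the lowest group rate is at least `min_ratio` times the
                  highest group rate (the threshold is an assumption).
    on_track:     the latest poverty estimate is below the first one.
    """
    rates = benefit_rates(decisions)
    if not rates:
        raise ValueError("no decisions to audit")
    lowest, highest = min(rates.values()), max(rates.values())
    disparity_ok = highest == 0 or lowest / highest >= min_ratio
    on_track = poverty_series[-1] < poverty_series[0]
    return {"rates": rates, "disparity_ok": disparity_ok, "on_track": on_track}


# Made-up sample: two districts and a short series of poverty estimates.
sample_decisions = [
    ("district_a", True), ("district_a", True),
    ("district_b", True), ("district_b", False),
]
sample_poverty = [0.30, 0.29, 0.28]

report = audit(sample_decisions, sample_poverty)
print(report)
```

In this made-up sample the indicator is improving but one district receives the benefit at half the rate of the other, which is the kind of result a board would flag for human review rather than resolve automatically.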
I also don't see any way to avoid that future, except that there will be places that don't have humans in the loop, and that's pretty bad. But what we ought to think about, as we begin to move into this world, which there are reasons to believe we just have to do, is the challenges and the strains on capacity. What we're doing is putting the humans in the legislature in the role of programmers, because they set up regulations: if poverty is less than this, then the money is more than that. That's a law, and the poverty figure and everything else may be things that are driven by measurements from telcos and banks and stuff like that. That's pretty interesting. And then the oversight, in a way, adds to the judicial system: day by day, every day, looking for variations from the intended causal functioning of this, where they have a full audit record available to them for automated inspection. So the discovery process is fully automated. You have everything you need to decide, constantly. That's the goal, and the reality will be close. That's really interesting. And now you can have automated tools, because it's all done digitally, so you can ask: is this discriminatory? How discriminatory is it? Is this actually achieving the goal at the desired rate? So I'm winding down; I'll just stop there. I hope I'm giving you a sense of it, and that there are people looking at this. There's a little thing I'll just point out, the quote up there about what this type of system makes possible. In other words, the sort of thing that I'm describing is designed not to violate ownership or localization or proprietary or privacy regulations, and it does that by only operating in a way that's encrypted, so data is not exposed, and the only things that are exposed are aggregate things that have been pre-agreed. That agreement has to be reached, of course: a pre-agreement that certain indicators, you know, for district level, city level, country level, will be made
available from your data. You can think of it as a sort of tax on data: you get to have your personal data, but you have to pay a little bit of tax, which is like today, when you have to contribute to the census, right? Same idea. You have to contribute to the control knobs that rule society. You don't get to not contribute. Okay, let's see. She was first, and she looks more skeptical than you.
I don't know what the position is in the US, but in Australia you can't refuse to contribute to the census, yet there's nothing that forces us to answer truthfully. So there's a libertarian, skeptical streak, and people put down nonsensical things because they don't like giving data. So I wonder: yes, the world that you depict sounds like a good thing, but I wonder if it's a world that a lot of people would want. People, if their data is being recorded, maybe they'll just not go to see a doctor.
So, everywhere that I'm aware of, it's already recorded. You just don't know, and you don't have access to it. Your doctor records it, the nurse records it, the drug company records it, and if you get a third-party payment, they record it, but they don't show it to you. But if your system is based on paper, it just sits in someone's files and doesn't necessarily get aggregated, so we have protection by inefficiency. So we're reaching a world where we will not have that, and the reason is that healthcare costs too much, and certain sorts of things, like pandemics, require comparing data. The other quote up there is from the CTO of the US Department of Health and Human Services. It's like: healthcare could be so much better than it is now, so much more responsive, so much everything, if in fact you could share data among hospitals and stuff like that. And one of the main barriers, according to hospitals, is privacy regulation and paper, but of course they also don't want to give up their competitive advantages. But if you could have a world where that was happening, you would be able to have tremendously better and cheaper care, and I think we're
going to be driven to that by the non-trivial share of the economy it takes up.
May I add something to that? So, yeah, thanks. I can't help but notice your delightful accent from Australia, and I wanted to point that out because I think the basis of your question was: what if some people don't want to live in that world, with the type of arrangement that Sandy was just describing? It's a type of arrangement that, given a lot of thought, we could arrive at to successfully transition to a digital age. You could call it an application of a social compact, in a way, to somewhat new conditions, and there's a balance there that he's proposing. I would just like to reflect, not as a challenge right now but as part of what we're doing here: on October 30th and 31st we want to convene a proper discussion with the legal industry. What if people don't like this particular balance? It's a good idea; it's one of many we have heard or thought of. Of all the ways that one could balance the inevitable transition, can you think of other ways that it could be balanced? I remember I was required at one point to accompany all my friends to the polls in Australia when it was time to vote. There's mandatory voting in Australia, so there's a balance that has been chosen there, and so there's more than one way to slice this. The real question is to answer: what if people don't like this or that flavor? How shall we balance it? What role ought the law to have in refactoring the social compact for a digital age? I encourage you to really take a look at what Sandy is saying, because in some ways it's so simple. I think upon deeper inspection there's a lot of thoughtful balance there, but maybe you can think of a better one.
One of the realities is that there's this very strong pressure for participation because of the public good, and that's what has happened everywhere. But you had a question.
Yeah, I was wondering, you were talking about the tax on personal data. Could it be a default that everyone's data would be fed into a system and
then they could just opt out? I was thinking it's such a barrier to entry for someone to actually go into the system and say, yes, I agree that I want my data shared. It seems it would have to be the default, just because there would be so much progress lost if you didn't have access to huge amounts of data to push things forward.
So there are a number of issues there. Opt-in versus opt-out is an example. In GDPR in Europe, you are supposed to have to opt into things, but the government thoughtfully exempted itself from that. So, yeah, right. In countries like Estonia and India, there are a lot of efficient processes that require the digital identity, and so those are pressures that make it very difficult to, quote unquote, live off the grid. And then that brings up another question, which is: well, if everybody's data is available to the government, what happens when the government is an authoritarian government? And I'm reminded of a gentleman who said, my father was the head of the national statistical office at the beginning of World War II, and he got up in the middle of the night and burned down the ministry, his own ministry, because he didn't want those records to fall into the hands of the Nazis. So in a world like that, it's fine if the government is a positive government, perhaps, but it's also really scary if it's not. And is there a way to pull the plug, not just individually but almost collectively? Credit unions, incidentally, are chartered to manage your personal information and personal identity. I don't know why; it seems like a completely bizarre thing, but it is in the law. And so you could imagine that all of your digital data is deposited with your local credit union, which manages it for you in the best possible way according to community boundaries or something like that, and is able at times to pull the plug on things that it doesn't like. Which, incidentally, is sort of implicit in the GDPR. So in the GDPR you opt into
things, but you can opt out, and they have to get rid of the data. And you can move the data to somebody else, and you can get explanations for why things happened. So you can get an explanation of the algorithm that was applied, in human-understandable terms, so that you can make decisions like that. So what this is doing, again, is this analogy of money, where today you deposit your money, ones and zeros, and there are pieces of paper and things on the web where you can see what's happening with your ones and zeros, your money ones and zeros. There's a lot of expectation about it, and you can pull it back out and give it to somebody else. So the idea is that this is a little bit like that, for all the rest of your data.
Since you mentioned federally chartered credit unions: we did a project to explore this here, and what we found was astonishing, which is that it's much better than what Sandy realizes. In the United States, not only are they authorized as depository institutions, checking, you know, savings, lending as well, but also, like other private banks, you can have a safe deposit box. Now we have a direct corollary to my personal property that is not currency. It's property, and in a safe deposit box: the jewels, whatever, the special poems, whatever I want to put there. It's personal property, not real estate. And there, fiduciary, that's the type of duty that they have towards it. So now you have a puller of the plug that actually has armed guards, and, like, marble and security and auditors and examiners and staff, and whose duty it is under the law, and backed up by the full faith and credit of the United States in various ways, to protect our property. And they can make deals on our behalf. So there already are accepted business activities that the regulator, NCUA, allows any U.S.
credit union to do, which include a virtual safe deposit box. You can find one of those at DCU. So the componentry of it is there. Regulators have allowed for a vision of personal data ownership and identity ownership. And you might ask, of all things, what does a credit union have to do with identity? Good question. But if you think about it practically, what I came to realize is: if you're going to allow home banking to be logged into, if you're going to require know-your-customer, what are they checking against, and how do they keep a record of that? If you're going to allow a mobile wallet, at some point there's a digital credential that authenticates and identifies against a record, and a bank customer, or in this case a member who's a co-owner. So they're in the identity business, the digital identity business, and they already operate as a collective on behalf of a co-owned entity. It's the thinnest little nudge to a new deal on data along the lines that Sandy suggested. Probably worth a try. I think we have two more questions. Sir, in the back. Sandy's got to move, and we've got some more people to hear from after that.
How would you see this council that manages data? Right now the balance of power for data, as you mentioned, is very biased towards companies and the big players, and it seems we don't have the necessary governmental leverage, especially with lobbying, to actually turn that around. So how do you see those councils actually turning into something that can effectively manage data and, let's say, enforce something that will be followed by Google or Facebook or the big telecom companies?
So here's an answer: I don't know. What we did is we patched something together from concerned NGOs, government representatives, and stuff like that, but I don't know what the answer is. I think it's a really interesting opportunity, because today's political systems are broken in various ways, and it seems like they are not the best candidates for overseeing something like this. Is there something that's a
broader, sort of more responsive set of people? I think that's a really important question. How could we do that? I'm not an expert in this area and I don't have a strong opinion, but I'll just note that, for instance, when the governance of the internet was set up, ICANN, ICANN was not government people. I believe it had government representatives, it was chartered by the government and given a role for oversight, but it was not of the government. And it had people; I don't know how it selected people originally. It selected a bunch of people from, you know, concerned stakeholders, and as it developed, those stakeholders changed, which was not as critical. And then there are other institutions that are interesting. So, for instance, there's this international communications, sort of funny cross-government organization that's supposed to control interoperability between digital information and communications. So I think there are a lot of options that are open, but that's something we ought to think about now: how are we going to set up something that has the sort of responsive aspect that you'd like? Because you want things to happen fast; it's really important in the digital world. And you want things to happen in a way that really represents stakeholders, not, you know, the sort of longer-term confrontational process that you see, for instance, in legislation. Which was the last one?
So I have a question. You've shown us a vision of the future, and I think it would be a fantastic example of control over data, but I feel like this particular vision is going against two strong forces. One is what we have seen with the American programs that were originally designed to mask people's identity but were eventually thrown out in favor of just a mass collection system, and so I wanted to know your thoughts on that, and whether the intelligence community in particular is favorable to masking information. And secondly, the other force is one in which, you know, internationally, each
country obviously has its own rules and regulations around storing information, and so, you know, how would we be able to come to, I suppose, regulations on data access internationally that look more like the way the Internet works? And then, structurally, how do you think we're doing on the inequality side, and is this helping with that?
So, you know, the first thing is not to let the present get in the way of the future. Yes, there are things wrong with the present, but again I'll point to GDPR. GDPR is very unpopular with companies, but yeah, it's going to happen, and it's not going to be completely watered down, right? So it's possible to move things back. And a lot of that has to do with the fact that a lot of the important industry is highly regulated and licensed, so the government gives them permission to operate. That's true of people like AT&T, that's true of Citi, and so those people have to actually do what the regulators say. Yes, there are lobbyists and stuff like that, but that doesn't kill it. I'll be brief, because there were three questions. So I think that a world where you do get asked permission for the data to be used in that way is possible. We think, actually, in the end the thing that pushes that is concerns about security. We've got enough hacking and things like that, and we've just begun to see it. It's going to get a lot worse, so you need a basic architecture that's far more secure. And one of the fundamental flaws is protecting all the data in one place. That's like in the 1500s, when they decided to stick armies inside castles, and it didn't work very well at all, because it only took one guy on the inside to lose the whole war. Defense in depth is what people use now. You scatter things around, so that one attack does not lose everything. And then they want to have quick and sure access to things. So if you have a fully logged, fully permissioned system, it not only protects you but, given the correct
mechanisms through the courts, it gives them assured access to things, because there would be a key, the court-ordered key, to get at things, and all that log data would be there. So it actually might be a better world from the airplane. But notice that you can now see when the key is used, so we could monitor, say, the FISA court. Maybe not trivial, but it's at least possible to log. It raises other sorts of questions, but concerns about security and the goal of producing a more vibrant economy will trump the sort of simplistic, big-hammer things of "oh yeah, I'll just put all the data in this spot."
And then you talked about inequality. Controversial to bring up here, very controversial, very controversial. So there's a bunch of studies recently that show that, in the United States at least, one of the primary causes of inequality is that people like you marry people like you. OK? It used to be that educated people married uneducated people, rich people married poor people. Not anymore. It has to do with physical segregation, which has increased dramatically, not racial but by income, and people of one class just never meet people of another. If we went back to the way things were in the 1960s, our inequality would drop to sort of the bottom of the OECD ranks. That's how big an effect it is. That's the way to do it. So that's the other thing that gets me. When I got into doing things in developing countries and so forth, it was very common to say half the people in the world have never made a phone call. Guess what: 90% of the people in the world now own a phone, a digital phone that lets them send text messages and receive data, and in most parts of the world it's cheap if not free. And that's in 20 years. That's the fastest change in access ever, in all of humanity. And if you read the UN statement we put out, A World That Counts, it makes the point that until as little as 10 or 15 years ago you saw cases where a pandemic would kill a million people and no one in the capital city would know. Just think about that: a
million deaths, and nobody would know. Why? Because they're out there in the back, they're out there. That's what people say. That's not possible now. We know, we know what's happening.
You're confusing things. There are two words about it. We're not saying Puerto Rico is good, but a million people did not die, and are unlikely to die, in Puerto Rico, and there are TV news cameras. I'm talking about people literally not knowing that a quarter of a million people died in Ethiopia. My first job was because, oh, there was a crop failure in India and 2 million people had already starved to death before people noticed. 2 million. That's orders of magnitude different. Don't trivialize that. In the US we tend to be concerned about our lattes and stuff like that. The rest of the world is not like that. The babies die. The transformation of digital access has been incredible. The number of babies who die has dropped by something like 80% because of the access to information and access to health. So folks, think about that: that's millions of babies a year. Look at what he's doing, look at what he's doing. Puerto Rico is bad; 10 million babies did not die. OK? Let's not confuse the two. OK, I'm sorry to jump on your language. In the US we tend not to pay attention to the fact that we are a tiny segment of the population, enormously privileged, and the babies die everywhere else. OK? I'm a lot more concerned about the babies dying. Sorry.
Can you bring us home? So if you... I said I don't know. Oh, you don't know.
So, what effect does Sandy Pentland, from the past, when asked... by way of transition to Dan Harple: when asked what do we do within law and regulation and governance and the judiciary to successfully transition to this digital age, where the sea-change difference is what you're outlining, one of the things that you mentioned is that the law has to start to express itself as digital information, and the criteria for compliance need to become more data-driven, and this needs to be part of adaptive-type systems. Well, it
just so happens that the next two speakers are going to talk a lot about how we do that. Dan Harple's got a terrific general API for verification methods of data, so we understand data, so we can have some integrity of the data that we're relying upon in the law. And we're going to find out more about what it would look like when all the law itself expresses itself as data, and as a service, from Harvard Law School. So I understand you have to rush off, but in spirit let us continue exactly in this thing, and then at the end you'll have some announcements about the next class. Sandy, thank you for laying down the gospel. Thanks. Sorry, I have to...
So I mischaracterized significantly Dan's API and what Dan's got to talk about, so you might be introducing yourself and just highlighting what you'll show.
Dan Harple. I'm CEO of a company called Context Labs. I've got about five slides and a little bit of a demo to show you. In some sense what we were doing at Context Labs had a bit of a seed here. I'll roll back to about four years ago: we helped organize a conference, I guess, Identity, Trust, and Data, and for me, and this is about my eighth or ninth company, it really enlightened me on the next big thing, and that's what we're trying to work on here. That's where I learned about blockchain. This preceded the Digital Currency Initiative; I think this was really the seed for what the DCI became, at that conference, and it set me off on the deep dive into this particular company. So I'm going to talk about that. And the reason I'm here is because we're interfacing with Michael Casey and Neha at the DCI, the Digital Currency Initiative, also predating something called the Open Music Initiative, which we co-founded. And we did an event two weeks ago called Hack for Climate, and we talked about this thing called Proofworks, so Dazza asked me to come and talk about it. So I'm going to talk about that, just give a short overview on what it is, and then we'll get a little bit of a demo. So basically Proofworks is a component
that we built in this platform really quickly. So I'm not going to go into detail on what that is, but basically one of my takeaways on the Sandy view of the world is how everything is data. Even physical things are data now. And we sort of formed this point of view about supply chain, that everything is a supply chain. Intellectual property is a supply chain; the dissemination of an idea, which becomes a patent, is a supply chain, just as well as physical things are in a supply chain. So we essentially decided to build this platform that tracks things through a supply chain. And because of the point of view that we've got also, that the internet has done some harm to the planet, I think... and I reference this, I use this pendulum, where it swings back and forth, and over on this side is where inventors or artists or musicians live, and it swings to the other side, where consumers live. And what's happened with the internet is there are players in the middle now who take a large percentage of the margin. So I don't think that's right. So this is sort of combining these identity, trust, and data elements to sort of fix that problem. If you think about it, if you've been in the software business for a long time, like I have, you can question why Apple takes 30% of all I have right now, but if you're an underpriced you might think that's just the way it is. I think it's still unfair. So I think this kind of technology can change that. So what we decided to do is build this immutable platform. And I'm really here to learn a lot today about the legal side; I'm really here to understand about the aspect of economics. We didn't design it specifically for anything that would be used legally. We focused on environmental data publishing. It can be deployed for patents, electronic discovery use, and fake data. I think it could actually change how facts and circumstances are... a whole bunch of companies pay a bunch of lawyers to do a bunch of stuff. So I think there's a way to validate what's
happening as a fact and circumstance. So I'm going to talk about what Proofworks is. Basically it's this way to have this irrefutable, immutable record of the existence of something. And it also folds back in, if you heard what Sandy talked about; another thing that just plugs directly into it is his social physics work, where the world's an interconnected entity, it's this being that's interconnected. And the reason we call the company Context Labs is because there's a context to everything. If you think about a given document, for example, it can exist in space; what really matters is that document's relationship to the person who's using it. So that's the edge connection between the two. So what we're doing in our company's intersecting graph theory is watching... so there's a component to re-read the data. If I think about big components of what this is and how it applies to the legal profession, there's a massive ingest engine bringing all the data in. Consider Proofworks to be this filter on this ingest of data, so that the data comes in with a veracity and a provenance as it's tracked through time. One of the reasons this evolved is because we've got a client who's concerned about fake data. If you look at what the EPA has done, for example, data has disappeared. How do we know, when it comes back, that it's the real data? OK, so think about yesterday: there's a bunch of fake news all over the place. There's no accountability for that, because there's no system like Proofworks to provide accountability. This is, like, the reason behind why we built the thing. Building it, so it's an API, it's a tool, and we can register what we call multi-point proofs. So one way to think about this is proof on a document. Proof of existence is sort of a time stamp that says, OK, it exists. But that's for a node. So I think maybe the way to think about this is, let's just talk about a document for right now, OK? Because it could be other things: it could be design files, it could be images, it could
be all kinds of things. But that document exists in a context. So what a multi-point proof does is allow you to actually build those edge connections to those contexts, and then that is what you would hash, that entire proof file, which gives you what we call a proof ID. So I'm going to show you how you do it now. The thing about these is it uses standard cryptographic hash technology, SHA-256, which, you know, what are you talking about... not here, but we're going to talk about this later. So it gives this verifiable, true record of what exists and when it existed, and the key thing here is you track it over time. I'll show you the Environmental Defense Fund site, where, for example, a file... it's sort of like a signature. So, for example, say I post a file, and then I want to, like, manipulate it and post it later. If I just change the name of it, it's still the same file, because the bytes are the same; but if the contents are altered, the cryptographic hash will be different. So you can identify manipulation of data. And this has uses in so many ways, it's just crazy. And by the way, when we did the company we didn't think about it in the legal environment, we were saying, but we didn't think about it in terms of providing records, documents of record, for the data. So then it gets registered on any number of ledgers. Think of it as a blockchain ledger; you can also post it on social media, Twitter, Facebook, etc. So, for example, yesterday the LAPD said Tom Petty had died. It hadn't happened. There's no accountability, there's no data integrity, etc. So it's just kind of like a real example of what's happening. So this idea, this connected graph of veracity and provenance: there's this proof of integrity, and that is when you look at this sort of interactive graph of proof. What we've built is sort of this configurable proof engine, where you can basically say who actually could be sort of the good house, could be sealed and proven, for the given data, but for
environmental data, for example, media, you think they're a reliable source of data. So that's one component. The other one is the proof of existence. You pull all these things together, the graph shows the nodes and the edges being pulled together, and we register to the blockchain. Then there's... this is the immutable component. So I'm not only going to talk about Proofworks, because this goes into a discussion about what blockchain is. A lot of people call it a database. I'm going to give you a sense that it's probably not a database; it doesn't have its own structure, but there are components of it that could be used like a database. So, if we have the demo... so here's how it works. The demo is amazingly underdeveloped, because it actually is an API you can bolt onto massive datasets, which we've got. Basically you drop the data into the Proofworks engine and basically it gets... so, two examples: Environmental Defense Fund, there's a new publishing thing. I'm just going to show two things: I'm going to show the EDF, and I'm also going to show Proofworks by itself. So to do that, the first thing I'm going to do... so this is just a UX for the Proofworks API. So I'm just going to... here it is. So it's not launched yet as a standalone API for customers that we have. So, for example, we've got a customer that does about 800 million books a year; they're all in crypto, OK? So they don't sit there and do 800 million drag-and-drops. As an example, I'll just take a file... there's your proof. OK, so see how underdeveloped this demo is? It's not sexy at all, but if you're a cryptography guy or something... So we can actually configure who our proof points are. So right here, for example, at that moment in time, we use these other sites as sort of validators in the proof chain, and you can see what they have. These are all arbitrary truths; it's actually time stamp validators, that's what you need, yeah, at various places. So you get that graph context. It's not just a single context, it's the
aggregation of all those, and it says: OK, I'm real, they all say I'm real, I exist; and if you use me again and I don't have this cryptographic hash, I'm not real. It's the fundamental thing about it. So, not a sexy demo, but the utility of this is great.
I didn't get that part.
Yeah, yeah, so I'm doing, like, the short version. So this happens designing who, what, when, and where. So when I talked about immutably, that the supply chain can track data and things... so, like, books are a supply chain. So another thing we've got is what we call, basically, a configurable context state engine. So you build the state machine that describes where things are going and who, what, when, where, its location. OK, so, basic thing. One other thing I'm going to show you is how does this manifest itself. OK, so we're just going to EDF. This is a site, a platform we're working on with EDF, and there's a whole bunch of stuff I'm not going to show you. I'm going to go to sources. So remember, going back four years: identity, trust, data, open data. How do we track the... and the problems of open data? This technology is being used right now. There are three levels of consistency check, and this is only scratching the tip of the iceberg. Number one: can you access it? Number two: has it got the same byte signature, basically? Number three is a deep consistency check. So imagine, like, an Excel spreadsheet with 600,000 cells, OK? Each time that file appears, you would check its hash. And it's the same thing Sandy mentioned: you don't need access to the data. There's privacy on the data; you can obfuscate the ownership of the data, you can anonymize the data, but you can always have this authenticity about it. That's my, that's my, that's my lines.
OK, great, you got it. So yes, and now could you take the microphone, and your brain most importantly, and sit right here and help me begin to talk about how we can apply this directly to the law itself, with a Harvard law... Harvard law... someone from Harvard law
school. Don't dare call me a professor. I can't say the same thing. So... oh, actually, I'm sorry, we should actually move that, so this here... So we've asked Adam to join us to talk about a revolutionary project that is something I dreamed of, and people laughed when I said it here in this room: why can't all of the law be accessible in a digital form that we can use? Why? Well, for you it can, and that's what you're doing, to my understanding. So if you could introduce yourself and then join us for a dialogue about how, what would be the components in order for any civilization writ large to rely upon this? What kind of verification and what kind of social compact would be part of it?
Sounds great. Bless you. So my name is Adam Ziegler. I'm the managing director of the Library Innovation Lab at Harvard Law School. We are a group of developers and designers and lawyers that are working there, bear with me for a second, in a lot of different ways, but one of the things we're doing is making legal data, and in particular we're focused on case law. Hopefully you all know that the law is expressed primarily these days in three ways: as statutes, as court opinions, and as regulations. And we're very focused in our group... sorry, just for the display here... on judicial opinions. Judicial opinions, yep. So our project is the Caselaw Access Project. The first map, yeah, the first 166 parts scale again, it just does... yep, no, it does, see, here it comes. And if you keep talking, I'll help us. OK, so if you're talking about judicial opinions and you're trying to transform judicial opinions into data, you've got two basic problems to solve, at least in the US. One is all the data we've had in the past, all the court opinions we've had in the past, which as you know are all on paper. They're locked in books, they're locked behind paywalls, owned by Westlaw, LexisNexis, and others. The other problem is going forward: what are courts doing every day as they generate new court opinions? Yep, what are they
doing? They keep putting them in books, believe it or not. So the first step of our project is to take all of the historical court opinions, all 6-ish million of them that have been published over time, and digitize them. And I don't have the video to show you yet, but basically over the last three years we have been very rapidly scanning every court opinion ever published in the United States, both state and federal court, at a clip of about 100,000 pages per day, and we're just about complete with that process. It'll be done by the end of the year.
It's all court opinions, whether... well, court opinions themselves are public domain; they are edicts of the government. Publishers, as your question anticipates, like to embed in court opinions certain copyrightable content, like headnotes, annotations, things like that. That's what we do, is we scan an art poll context labs technology... so, so close, so close. Get back in the other boat and just show slides. It's a PowerPoint part, it has nothing to do with art very much. Yeah, all right. So we take these books. We have 40,000 of these books in our library, basically a comprehensive set of all court opinions ever published, and what we are doing is scanning them. First we pull them out of a giant warehouse and we take them apart, remove the binding. I'm going to give you a real quick sense of the process here, very quick. We guillotine them to loosen the pages. And I do work in a library, and this absolutely drives most librarians crazy, but we don't do this for rare books; we only do this for the ridiculous, ridiculous books. We run them through our scanner, and this is not sped up: it's about 100,000 pages a day. We then preserve the book once we finish scanning it, using a meat-packing machine, and we ship the sealed books out... this is our library, but we ship the sealed books out to Louisville, Kentucky, where they are stored long term in an underground limestone mine, just in case we need them someday. So we take the
images, the 40 million or so images produced by this process, and we do three things. We do OCR to extract all the text from the images and to extract each case into its own file; we apply a standardized schema to give structure to each court opinion; and then we redact the headnotes or other editorial content that each case contains. And then what we have after that process is a couple of different types of data. One, we have an XML file for each volume. We have an XML file for each independent case, all structured; as of this morning it's about that many cases. We also have an XML file for each page that uses the ALTO standard, which is a page-based XML format that includes the coordinates on the page of each word, which allows you to do really powerful things. And we also get images, and the images are redacted of the content that a publisher might claim is copyrightable, because who needs that stuff? We just want the court's text. All right, so that's how we make this data for the historical data set, and I'm happy to talk in more detail to anyone if you have questions afterwards. Now, what are we going to do with this data? This is where it gets really exciting: we are going to share this data publicly, for free. Exactly. So we're partnering with a VC-backed startup, or what used to be a VC-backed startup, Ravel, since purchased by LexisNexis. But we have a great contract with them that ensures that this data will see the light of day as originally intended, and we've got various mechanisms to do that. But the bottom line is we are going to make this data, and Ravel slash LexisNexis is going to make this data, available to the public for free, for search, for API, and for bulk access. The bulk's the trickiest piece, because there's a little bit of a time limitation on that, but if you're a researcher in an academic environment and you're interested in bulk access to this data, come talk to me and I can definitely make that happen with a little bit of lead time.
I'm interested.
Awesome. So we are working on this right
now. Some of the data is publicly available already through Ravel; other data will be available through our APIs, which are kind of at alpha stage right now. So if you're interested in testing, taking a look at those, and reacting to those, we'd love your feedback. But then I want to get... that's sort of past history, right? A lot of work, a lot of money, a lot of time; truly important for us to capture as data the entire history of US case law. But let's talk about something a little more relevant going forward: how do we get to the point where we don't have to scan these damn books anymore to get our data out of them? And that is all about born digital. This is the second phase of the project. We're working on it from an advocacy and design and partnership standpoint with the courts. We are trying to partner with courts, and in some cases with legislatures, to help incentivize courts to change the way they publish opinions, to publish in a digital-first format. This is a quick screenshot of a super lightweight prototype we built for the Massachusetts Supreme Judicial Court and the Massachusetts Appeals Court, which are the two appellate courts in Massachusetts. Basically they have this system they've been using for a long time where the judges create opinions in Word, they finish those opinions, they ship them as attachments to an email to LexisNexis, LexisNexis editors put them in a page-proof format and email back PDFs of those draft page proofs, then folks in the courts mark up those page proofs by hand in red pen and email them back to LexisNexis, LexisNexis makes the changes, and so on and so forth, until the courts are happy with what they've got. And the result of that process is a finished opinion that then goes into a book, a bound volume that very, very few people buy and very, very few people have access to. I mean, Massachusetts is with Lexis; they changed about five or six years ago, but the current contract is with Lexis. California is also with Lexis; many of the other states are
with West. But generally this is kind of a model for the way it works in most states. Some states are ahead of this, and I can talk about that, but basically all this work is done to produce a final opinion that gets put in its official form on paper, in a book, or behind a paywall. What we want to do is flip the switch, so that all of these courts are publishing their official, authoritative, verifiable opinions to the web as open data from the inception. And so we're working with the courts, Massachusetts is one of them, to try to help them understand how to get there, try to help them answer some questions about what to do. One last point, on the federal courts, because I mentioned state courts. Federal courts are actually a little bit closer to getting this sort of right. There's something called the Government Publishing Office, which has a system that used to be called FDsys; it's now called govinfo, govinfo.gov. And they actually have a pretty decent repository of mostly PDF, but digitally created PDF, of most of the opinions from most of the federal courts. It's not comprehensive, and it's not mandatory, it's sort of a voluntary opt-in thing for the courts, but it's getting there. And so we are actually advocating through Congress now to try to get all the courts incentivized to put all of their opinions in this system and to make it all available for bulk download. So we're getting to the point where we can see all court decisions published in this country on the web as born-digital data, which would be very exciting. It would be a whole new world for anybody who's thinking about access to law. I think Dazza shared a link to an article I wrote a few months ago talking about the key characteristics or attributes of a digital-first publishing system, so if you're interested in getting involved in this and helping with this, supporting this, take a look at that article and you might see some ideas, including ideas for digital signatures and verification, which I think some courts would
say needs to be part of this system for them to be comfortable in adopting it. Not all courts would say that, but certainly it would help to have that module as part of this system.
We don't know each other, I want to shake, but this is like a hugging moment. So, this man and Harvard Law School... just to be clear, when we're talking about transition to a digital age in a way that's proper and legitimate from a legal perspective, some of it's just painstaking refactoring, and not missing things that you shouldn't miss, and carrying forward that which must be carried forward, with all the complexities and challenges that you mentioned. Like, just one by one you stare it in the eye, and you're walking arm in arm with people of goodwill, and making the negotiations, and churning this through. When you see that moving, we're watching the wheels of history process us from an industrial to an information age, and you're turning a critical one, you know? And I just want to thank you for what you're doing.
Thank you for that. But it really goes back to a lot of work that mainly librarians at our library have done for centuries in collecting these things, preserving them, recording where they are. I'll tell you one really quick story. I showed you that warehouse where all these books are; that's the Harvard Depository. It's this enormous warehouse out at Southborough, sort of south of Boston. There are millions and millions of volumes maintained by Harvard University, and all of the books that contain these court opinions are in that depository. And when we started this project we thought, OK, we kind of know where they are, it would just be a matter of pulling down a few shelves, no big deal, we'll just ship them all in a couple of truckloads. Turns out that those volumes of reporters, over the years, were used as filler in trays. Every time they got a tray from anywhere at Harvard University and they needed one or two more books in it, they'd pull one of these off the shelf and stick it in. So
they were scattered all over the warehouse, for good reason, because nobody expected anybody to pull them back. So we had to really send people up on a chair to take these things down, basically one by one, over quite a long period of time.
Outstanding. So I'd like to ask a timing question. And I think, as far as I'm concerned, it's a proof that Sandy kind of went on a little longer; I learned a few things. Question: how many people need to leave at... how many would be... let's just start with that: who has to stand up and walk out at 2:30? Is there unanimous consent to go to 2:40, so that we can properly gain the wisdom? Can you stick with us until 2:40? It is. I think there's you, you, and then you.
Why can't we all hear the question?
So, based on just that description of the volumes and how much there is, do you think that digitization of data might change the way precedent is actually found? Surely nobody could have been reading all of this. And might this have, like, knock-on effects in terms of how the legal system is living directly?
I think my sense of that is yes, but obviously we don't know precisely how. One of the interesting things about the legal world is how little exploration has been done in terms of analytics, in terms of, like, deep data-driven insight, simply because the data wasn't available. The best you could do was read documents sort of one by one. As somebody outside one of these large publishers, you couldn't even get the data to do large-scale processing. And so that's what this project and other projects that are part of this effort are going to make possible. So I have no doubt it's going to lead to new insights. Maybe we learn... you know, in the law there's this pattern you see of courts or scholars describing things as the majority rule and the minority rule, which is an intuitive sense that most of the time courts go one direction, and, the minority rule, sometimes they go the other direction. That's not based on data; that's
just based on, probably, people's sense of authority and reputation. We may find those majority and minority rules are reversed or flipped, or there's some other rule out there. So there are a lot of interesting things we're going to learn from this data, for sure, but we'll see what it is.
And notice that would be an application of data science, influence networks, and some other analysis.
So, you talk about turning the corner and going forward, with the United States, if the courts and legislative bodies were publishing everything uniformly and digitally. There was a woman here from Australia; my understanding is that for over a decade the Australian legislature and the Australian courts have been publishing in an integrated, sort of HTML way. Do you know anything about that? Is that a possible model?
I don't know anything specific about Australia. I know there are certainly other countries who are ahead of us. Canada, for example, does a great job with its court opinions in terms of making those available as data. My sense is Australia's ahead of us; I don't know the specifics. The integration between legislation and court opinions, to me, is not essential. I think it would be fantastic if we had it, but in many ways, at least at the federal level, legislatures do it pretty well, actually. The data.gov efforts have improved things quite a bit on the legislative front, and one of the arguments, one of the attempts at persuasion we're using with the courts, and with legislatures trying to compel the courts to do this, is to demonstrate that the legislatures have already made the switch. I don't know... as I said, the legislatures have already made the switch to publishing their law as data, so the courts should be doing it too. That's one of the arguments we're making to try to urge this on.
Then one more question before we go into tying things together, and then forward to some of your projects, and then we're close to that. Thank you very much; it's been a pleasure to have you both here today, so thank you for coming.
I had
maybe a quick question with regards to the processing. You said something about the time limitation for the bulk access: was that just that it takes some time to get access, or is there only a limited period of time in which bulk access is available? And if that's the case, is that contractual with LexisNexis? Is that something where they're trying to maybe close down access, or refine it further before they release it to the public again? And then, with the redaction of the information, was that because Harvard Law School was sued, or do you not consider this a transformative kind of use of the notes on the cases?
No suit. It's just a recognition that what we're focused on is the words of the court, the law itself. And while annotations and editorial headnotes have great value in some contexts, that's really not what we're focused on; we're focused on the law itself. That's why we... In terms of the restriction and the time period, I've put out some blog posts and summaries of this that give a little more detail, so take a look online to get the specifics, or find it on our website, and come talk to me and I'll explain it. The short version of it is that, from now until 2024,
Ravel paid for this project. They also committed to making the data publicly available for search and for API access for non-profit developers, and they committed to creating a market for commercial developers to get to this data; they have to enter that market. In exchange for those commitments, they received a short-term exclusive commercial license to the data, so that only they can commercially exploit this data through 2024. Come 2024, all the data will be completely unrestricted and we can give it to anyone we want. Prior to that, we can give bulk access to any researchers. That's why I mentioned affiliation with academic institutions. And so that restriction will be no impediment to scholarship, to research, to moving the ball forward.

In addition, and in some ways this is the most interesting piece that ties it all together, as any state or federal court makes this switch, turns this corner from what they've been doing to what we want them to do, their historical data from this corpus also becomes unrestricted. So if six months from now California or New York or Massachusetts switched to digital-first publishing, that would effectively unlock their entire historical corpus right away, rather than having to wait for 2024. So we're going into these states, into these courts, and saying: not only can you do the right thing going forward, but you can also unlock for your lawyers and judges and clients and citizens and entrepreneurs the historical data as well. So worst-case scenario, by 2024 the data will be totally unrestricted.

Dynamite. So, to start to wrap this together, one of the things we're hoping to begin with this class, Gaben Nostine, Sandy, all of us, is a dialogue and engagement on refactoring the law for the digital age. And part of that is simply asking: what is the authoritative law, who has authority to publish it, and what happens if you rely upon it? For example, something I want to point out, as a thought, is something called UELMA, which is the Uniform
Electronic Legal Materials Act. Thanks largely to law libraries and other constituents who wanted public-sector legal materials, like statutes, regulations, case law, executive orders, and policies, it allows the authoritative version to be published on a website. And there are three criteria in the Uniform Law. One of them is that it's authenticated and verifiable. Another is that it's widely accessible, like when we post something in a newspaper. And those are the two most important ones I would like you to think about. I know there are two or three projects that directly relate to publishing legal materials, and dance of API, which we want to utilize on a few projects (carbon registry, coins, public notices, and a few other things), provides a mechanism for this. How could you utilize it? How would you meet the requirements of UELMA to publish law going forward, so that a legislature, a judiciary, or a regulatory agency has a repeatable workflow? What that business looks like, for a service or a product design, is right there.

And I guess, to put it all together, you were asked how this would be different. Well, one thing you can imagine is if the law were accessible and you took the headers out. Seamus Kraft and the OpenGov Foundation crowdsource headers: what do most people think this section is about? And the same goes for citations and references and analysis, and cool workflows that help people comply with laws and discover them and manage them. All these things could actually be much more open, we think, when the law is discoverable and exists as digital information that we can rely on. So we just want to put a few of those things together and ask if you can join us again on October 30, 31st and help engage a broader discussion about the project, and then get more people involved in using it, and maybe start to think about models and projects and maybe new companies that could propel adoption and really boost it going forward. With that, there are a few announcements about the class and next steps.

Alright, thank you everybody. It's been
a delight to have you here for the handful of open sessions for Future Law. This is the last of those open sessions. The next thing that's generally accessible will be on October 31st: the Future Law legal symposium, the legal forum, also here in this location. But for those of you who are taking the class for credit, number one, please send an email to me, or one of the instructors (it should be on the Stellar site), spelling out your tentative working title for your project. Number two, in the remaining three weeks before the 31st there's a vacation period here; essentially, next week is vacation, so we have no public forum sessions. But what we do want is for those of you taking the class for credit to meet with us, the instructors, to home in on your final deliverable projects. Okay, so it's been excellent having all of you here as listeners and as participants. For those of you doing credit, there's a little extra. Does he have that? There we go. And this will all be on the Stellar website and in the Stellar email I sent earlier today. Thank you.

Okay, great. So those of you who wanted to come up afterwards to chat a little more and ask some more questions, we haven't completely run out the clock, so we can take one or two minutes at least to exchange. And going forward, while this is the last class, I do have a commitment, thanks to several of the active students here, to keep broadcasting the tutorials and the presentations that arise from this class, and that come out of this next engagement with the legal forum, in future semesters on the law.mit.edu YouTube channel. So if you want to keep going online, you're welcome to do that. And then stay in touch on your projects so that we can assist you with them, and then perhaps present them at the legal forum. That would be the perfect outcome. Okay, so thank you very much. We'll be right back.