We are recording, so whenever you are ready, you can start.
Great. Thanks for joining another episode of the Summer of Open Data. My name is Stefaan Verhulst. I'm the co-founder of GovLab. I'm delighted today to have Audrey Tang join the Summer of Open Data. The Summer of Open Data is an effort to really look into the current state of open data and data collaboration. Specifically, we are interested to learn how traditional open-source principles are being applied across the open data ecosystem. We are also very interested to learn how we can unlock private data assets in order to inform public interest decisions. And, obviously, how can we do all of this while ultimately establishing a new sense of data responsibility and data equity? Today, we will discuss some of those issues with Audrey Tang. I have the pleasure to ask Audrey Tang to introduce herself. I mean, Audrey, you have a long list of titles here, including Minister Without Portfolio, but that might have changed since the bio was shared with me. Audrey, perhaps you could explain very briefly what your current position is and what you're seeking to do, and then we can dive into the Summer of Open Data questions.
Okay. I'm Audrey Tang. My current position is International Advisory Board member of the GovLab. As well, I'm a board member of Digital Future Society, RadicalxChange, and the Consul Democracy Foundation, and my day job is Taiwan's Digital Minister in charge of social innovation and open government.
Great. Brilliant. Thanks indeed for alerting everyone to that, and we are so delighted to have you on the International Advisory Board of GovLab. Let me start with COVID-19, which is of course on everyone's mind. If you look at the coverage of Taiwan and COVID-19, quite often we see headlines saying that Taiwan has hacked the pandemic. What do we mean by hacking the pandemic? Specifically, what was the role of open data in hacking the pandemic as such?
Yes.
I think it's been more than 100 days since we had the last locally confirmed case. We're firmly post-pandemic now. What we have done is to make use of the open data ecosystem to make sure that people can build their own applications to give participatory accountability. For example, I just went to the convenience store to collect a pack of nine medical masks, and each pharmacy currently distributing these medical masks provides an updated count, like every three minutes, of how many adult masks and how many children's masks they have in stock. If you go to the pharmacy and swipe your national health insurance card, then with more than 140 visualizations, apps, voice assistants, and chatbots, the person queuing after you will actually help you check the validity of the system, because after a couple of minutes they can just refresh and see that, if you're an adult, the pharmacy's stock reduced by nine in the adult section. If you're a child, then it reduced by 10 in the children's section, because that's the quota every two weeks. Of course, nowadays we've also opened up free trade of masks on the market, but for a while during the pandemic, it was rationed and trading was banned.
Great. You're also really known for tapping into a collaborative spirit, which sometimes gets referred to as the open-source way of going about solving a public problem. Tell us a little bit about how, having spent years in the open-source movement, you have applied some of those open-source principles to the current job that you are holding.
The main insight of the open-source movement came when it first started forking from the free software movement; now it's merging back in. But when it started to fork, around the turn of the century, the main insight was that it doesn't have to be a human-rights argument. It could be an economic argument. For example, Netscape Navigator was at that time a proprietary, private database and code base.
If they published both as public commons, it would enable not only the Netscape Navigator team to implement features, but also a lot of extensions, a whole ecosystem of people gathering around the code base, which the open-source community called Mozilla. Nowadays, nobody uses Netscape Navigator anymore, except in emulators, I guess. But we still use Firefox, and in fact my phone still runs KaiOS, the operating system for the Nokia phone, which is a direct descendant of the original Mozilla and Firefox code bases. And so the main insight is that it reduces costs for everybody, and it can generate unexpected applications. And both are present in the open data, data collaborative landscape as well. If you open up the data, from the private sector point of view, you attract people who improve the quality of the data, who would then come to you and say, if you apply these principles, these tools, you can have a higher quality of data. And that's the hardest part. I mean, it's not hard to increase the velocity of data or the volume of data; you just buy more sensors. But the veracity, that is to say the quality of data, can only be improved by careful curation. And as with open source, there's a saying by Linus Torvalds that says, when there are enough eyes looking at it, all bugs are shallow. So the first thing is about veracity. And the second thing is about enabling unlikely, unforeseen applications. For example, when we convinced the pharmacies to publish those data, nobody would have predicted that people would build analysis systems that basically analyze the over-supply and under-supply in selected districts. In most of Taiwan, we have a pretty good match of supply and demand. In certain areas, there's no such match. And people would then investigate and get to the bottom of why. For example, people work very long hours. And so by the time they get off work, all the pharmacies are closed.
That's why we need to partner with convenience stores. Or, for example, there may be unequal distribution, in which case we'll have to enlist more health institutions other than pharmacies in that particular area as well. But all of this is built upon this kind of distributed ledger. People see, every 30 seconds, which pharmacies are indeed suffering from over-supply or under-supply.
Cool. And so, focusing on data collaboratives, Audrey: last year, you engaged in an experiment around a presidential data collaborative initiative. As you know, GovLab has done a lot of work around the concept of data collaboration, where we really try to understand what the new mechanisms and models are to match demand and supply. So can you tell us a little bit about the initiative that took place last year? What was the value, for instance, also for how you went about COVID-19? And, more importantly, how do we scale this? One of the biggest challenges is always to really make this sustainable and systematic.
Do you want to scale it up, scale it out, or scale it deeply? Because these are different strategies.
Okay. Whatever you feel is needed.
Okay, okay. Right. The presidential hackathon is in its third year running now. The international track just started a couple of weeks ago. On the domestic track, we're down to the top 10. It's an interesting configuration: an annual three-month prototyping session with all three sectors that always results in working public sector improvements. And so the idea is that every year, our president, Dr. Tsai Ing-wen, gives five trophies to five teams. And the trophy is a micro projector in the shape of Taiwan; I think it's glass this year. And if you turn it on, it projects Dr. Tsai Ing-wen handing you the trophy. So it's very meta, it's a self-describing trophy.
And it comes with the promise that whatever you did in the past three months will become our national policy, our national priority, in the next 12 months. And so the presidential hackathon award carries the binding power of the executive branch. And this prompts a lot of people to propose very interesting ideas. For example, there was an idea around making sure that people who measure their water quality can do so very easily, using a 0G network and a device called Waterbox that is solar powered. And they proposed that last year because we have a new law as of last year that says that if, in the arable land, the agricultural land, you have a plant, that is, an industrial plant, that pollutes the waterways of plants, that is, organic plants, then the central government can cancel your water and electricity supply. And so the manufacturers who sit within the arable lands are all very eager to prove that they did not pollute the waterways. It's actually their upstream, they say, but it's hard to know without continuous monitoring. And so the idea of such a data collaborative is that not only farmers, not only scientists, but actually the people in those factories who insist that it's not them, it's somebody upstream, will be prompted to purchase such very inexpensive boxes, which write to this distributed ledger continuously with the three main pollutants in the water. And very quickly then we can have a map that maps out the water quality, find out which river is actually being polluted, and handle the spikes in a much more timely manner. And they built upon the community called Airbox, which has been doing the same thing for air pollution for around five years now. Again, it was very inexpensive, less than 100 US dollars per box, in the primary schools, on their balconies and so on. Again, writing into a distributed ledger; again, showing the government where the pollutants are.
And the government actually entered into a negotiation, because of the presidential hackathon, and the president really liked the idea. And we also worked with the industrial parks so that they would agree, well, they couldn't disagree, but they would consent to us using those microsensors on the lamps. So it's a smart lamp, but it doesn't do face recognition or anything like that. It just measures the air quality, so that people can complete a piece of the puzzle, because the primary school teachers probably cannot break and enter industrial parks to install such air boxes. And so this community sensing then prompts new, novel applications, for example, advisories for people who go outdoors for sports, like running and so on. If they have a condition that high PM2.5 or heat or whatever would interfere with, they can check it beforehand, and so on. There are many applications once you have this very fine-grained data around air pollution and now water pollution.
Great, wow, that's a great example. And I do want to go a little bit into the technical side here, because you've now mentioned distributed ledger technology twice already. In many cases, blockchain is quite often still positioned as not ready for showtime. And so it seems that, both in the case of the masks and the pharmacies and in the case of the water pollution, you already have experience of how blockchain or distributed ledger technology can actually be applied to instill a certain kind of trust in the system. So can you tell us a little bit more about that? Because quite often we are actually still looking for compelling examples of how DLT and open data provide a solution.
Sure. I say DLT instead of blockchain precisely because Airbox, I think, initially used IOTA, which is technically speaking an acyclic graph, not really a chain. And so to me, DLT and blockchain are like search engine and Google: one is the generalized idea and the other a specific implementation. So I usually say DLT, and we use DLT really just for its most mundane use, which is as a multiple-writer, immutable ledger, and that's it. And you can easily implement pretty much the same thing using Git; Git in that sense is also a distributed ledger. If you have a sufficient number, like more than 100 developers, hosting their own Git mirrors of the mask availability, that's a distributed ledger, because if people want to rebase the tree or force push something, the other people would know. And because of that, Git is as good a distributed ledger as, say, Ethereum for this purpose, because we're all developers and we all keep each other honest. Of course, there are no smart contracts and so on, but I guess you can implement that using commit hooks. Never mind. So in any case, there are no smart contracts and so on. This is purely a database that anyone can host a mirror of, and people can see for themselves that nobody can go back and rewrite history without everybody else's permission. That is the most mundane way of using DLT, and that's exactly the way we're using it.
Great, excellent, and thanks for that explanation. So let's go to the other innovation that you are well known for, and connect that also with open data, which is this new way of engaging citizens. We have vTaiwan as a model of how to conduct deliberation and then really obtain a certain kind of citizen input. So my question to you, Audrey, is how can we engage citizens? You already gave an example, actually, but how can we engage citizens in an ongoing way around open data?
So here at GovLab we are trying something around a citizens' assembly on data, in order to really engage with citizens around the use of data for COVID-19 in New York City, but we are eager to learn other examples from your end. What's the potential of deliberation in the context of open data, with regard to what questions one should seek to answer, or with regard to the expectations people have about their data?
Yeah, so first of all, I think "their data" is an interesting angle, an interesting beginning point. When I talk about masks, about rivers, about air quality, there's one thing in common, which is that it's not "their" data; a mask doesn't have subjectivity. Last I checked, hello. Masks would not say that they have personhood and that they would refrain from sharing how many of them are still in stock. They would not resist being profiled. This is a mask of this design. This is a mask of that design. They wouldn't mind, right? There's no bias or injustice when we're talking about the number of masks, or about air quality, or about water quality. And so that sort of data, which I would call public-use data or environmental data, is one thing, and personal data is quite another thing. And so I think the main Taiwanese insight is that we do not confuse these two. For numbers, like statistics for evidence-based policymaking, we say shùjù, literally "numeric evidence". And for personal data, we say gèzī, personal data. And these two, sorry, these two dual-syllable words do not have anything in common. They don't share the same root at all, right? One side is called evidence and the other side is called personal data. And so my main point is that just this distinction needs to be deliberated. Sometimes people would say, oh, but for the locally confirmed cases, surely their travel history belongs to the evidence part, because everybody would benefit from learning where they have been.
But then our Central Epidemic Command Center has to push back, saying that no, even if we just publish a set of histories, that is to say places and dates, it's actually easy for people to then re-identify and find out who that actually was. And maybe they would feel pressure. They would feel that society doesn't like them. And so if one or two cases like that happen, then the next person who develops a symptom would not report to a local clinic. And that would actually put all of us in danger. And so they insist that this is personal. This is not statistics. It's not evidence for policymaking and so on. And so we had a deliberative conversation using a tool called Polis, which will just ask everyone: what do you think is the norm around this particular area of data? And in cohack.tw, we asked five broad areas of questions about people's norms. For example, the one I just mentioned is about contact tracing. But there's also another one around, say, hospitalization, ICU capacity, and so on. And so people propose statements. For instance, there was a proposal that said, I feel that we need to develop a tool that triages people who go to the ICU not by the time they arrive or by severity, but by their estimated remaining contribution to society. And then we only treat those with a higher estimated remaining contribution to society. And that's very divisive. Half of the people who participated, usually from the American side, said it's a good idea. And pretty much everyone from the Asian side thinks it's a terrible idea. It's against the law in Taiwan, by the way. But I mean, that's fine. I mean, every jurisdiction has its own norms. But the great thing about Polis is that, instead of this developing into a flame war, for which I see a lot of potential, there is no reply button.
So the only thing you can do if you disagree is to click disagree, see yourself, your avatar, move toward a different camp a little bit, but then see what unifies the camps, and then propose your own ideas that work across the different camps. And Polis automatically summarizes the conversation. So by the end of it, we have a set of ideas that are, I would say, a universal norm. No matter which camp you're in, you think it's a pretty good idea. And so one of the five winners of the cohack hackathon is a team called MyData Taiwan, and they developed LogBoard. LogBoard is an app that collects your whereabouts, your temperature, symptoms, and whatever, but it works in airplane mode only, meaning that even if you have Bluetooth and Wi-Fi and 4G and 5G, it doesn't transmit any data anywhere. The only time it's used for communication is when the contact tracers do come to you for an interview: you can, with one click, generate the kind of evidence they need to do their work without revealing any private details about you or your family. In a way, it protects your own best interests in privacy while working minimally with the contact tracers, and it also saves you from having to remember where you have been for the past 14 days. And that's a privacy-enhancing technology. And this can only be done after everybody who participates agrees and settles on a norm that is actually privacy enhancing.
Great. And so you mentioned there were five areas. So you had contact tracing, you had ICU.
That's right, so yes.
Were there other areas where you feel it was enlightening to have a norm-setting exercise?
Sure. All right, so of the five settings, the first one is how to manage community resources. That includes ICU and also PPE, which I already talked about. And another one is about establishing proper data-driven risk communication, predicting future pandemic outbreaks, supporting frontline staff and essential workers, and then protecting the vulnerable groups.
Also, oh, I forgot one. There's one that says, now that we're post-pandemic, how do we make a smooth transition to a new norm?
Cool, cool. Great. And we will definitely check these out as well, Audrey. There's another element that we quite often hear about, in addition to data equity and data responsibility, which we believe should be part of the third wave of open data, to have a more sophisticated kind of conversation, and which of course you have alluded to by making that distinction between the different types of data. The other area that we quite often hear about is the need for new skills within government, new kinds of positions, in order to really understand what the role of data could be in transforming how decisions are made. And so I don't know, Audrey, what the situation is in Taiwan with regard to public servants, and especially whether there is a need for a new skill set that can develop a more data-driven kind of decision-making. Perhaps that's something that you've worked on as well.
Yeah, we're currently working on the digitalization plan for Dr. Tsai Ing-wen's second presidential term. And there are four pillars that correspond to the new skills that we would like to share within the public sector. And the four pillars, which spell DIGI, by the way, are digitization skills, innovation skills, governance skills, and inclusion skills. These are the four pillars of really anything around digitalization, not just data, but of course data is surely part of all four pillars. And I think it's important to understand that there are different lenses at work here. And instead of just buying into one particular lens, for example the innovation perspective or the inclusion perspective, I think it really pays to make sure that we can take all the sides.
Like when there is a need, for example, for people to share their data around their symptoms, which is what we are seeing with COVID, there's this innovation branch that says they want to have access to as many insights as possible, across as many medical screenings as possible, so that they can, for example, train an assistive intelligence to tell whether one is COVID positive just by looking at an X-ray scan or some other sort of scan. That is, of course, something that's never been done. And so you need a lot of new, sometimes raw, data, and that's the innovation branch. But then the inclusion branch would say that what we really need is to include more community practitioners in understanding how this virus works, and to empower them with not only communication material but also affordable items such as masks and soap and alcohol hand sprays and temperature checks, infrared and things like that, which individually don't reduce the R-value that much, but when all composed together may actually have a higher impact on containing the R-value than any single magic AI technology. So that's a different branch of argument. And I think within the government, depending on their ministry, because every ministry represents a different value, they can already instinctively argue from one or more of the four pillars. But the point of the presidential hackathon is to empower people to work within their own ministry with different professionals, and also across different ministries, with municipalities and cities, and also with the social and private sectors. And so all the top 10 teams in this year's presidential hackathon are cross-sectoral, and we coach them to have at least one person from the social sector, one person from the private sector, and one person from the public sector, so that they can look at the same thing from three very different perspectives. Of course, since they participate in the hackathon, the digitization is kind of a given, but that's the point.
And so I would argue that we should use problem-based learning, look at real-world problems instead of training problems, which tend to be oversimplified, and then check all four pillars, see that their stakeholders are present, and swap their positions until they can argue each and every side from each and everybody's position.
That's very interesting, and I do like the problem-driven approach, because one of the areas that we are looking into around open data is how to become more purpose-driven as opposed to supply-driven. We've seen a lot of open data use cases that started from the data: oh, now we have access to data, what can we do with the data? That is as opposed to having a clear understanding of which problems are crucial to address and what the potential is of then using data to really address those problems. So tell us a little bit about how problems are being defined, because that's already quite often where certain groups of society lack the ability to play a role in the agenda setting. So how does this happen within, for instance, the hackathon, but also within your branch?
Yes. The presidential hackathon begins with a long, multi-week period of a wishing pool, where people can just type in whatever they wish to see happen. And the only thing we require is that they label their wish with one of the 17, one or more really, but one of the 17 Sustainable Development Goals. And so from SDG 1, which is a problem that really needs to be solved, which is no poverty, like we need to solve poverty, and 2, which is to solve hunger, and then the third, which is to ensure health, and the fourth, education, and so on, I can go on. And so all of those 17 are already a well-established vocabulary for talking about world-scale problems. I mean, that was what everybody agreed on back in 2015 as the 17 most important topical areas to work on for the next 15 years. And so it's a good taxonomy, and we just use that.
And so you can type anything. For example, people would say, I wish that for people who visit cultural institutions such as museums, if they have blindness or have difficulty seeing things, anything that sighted people can see gets translated into voice narration, so that they can enjoy the same sort of immersion in cultural events and exhibitions. So that's a very concrete idea, and it can be filed, of course, under reducing inequalities or under sustainable education or things like that. And so once they do so, we make sure that people frame these questions in a way that could be solved by data. And so any team can then propose, oh, I'm going to tackle this wishing pool idea, this one, this one, and this one, turning those very vague ideas into very concrete data collaborative blueprints. For example, there are some frontline workers, public servants in the Ministry of Culture, who think it is a really good idea, but they did not have, for example, the means to produce or to host a lot of such data. And so they would write a blueprint and identify the missing stakeholders and the missing professionals that are needed to make this data collaborative a reality. For example, you need to have museums actually collaborating, right? If they keep saying that all our visual data is proprietary and we will not allow remixes, then that's a non-starter.
Is the connection still going? Otherwise you're frozen for the moment.
Are we back? I think we're back. Okay, so I'll redo that segment. So, yeah. So for example, a public servant, a frontline worker in the Ministry of Culture, would say, oh, it's a great idea. And because of my expertise, I know exactly what needs to happen. The museums need to share their visual data.
The narrators need to be using this pipeline so that, when they narrate for one blind person, it's recorded, remixed, and hosted here, so that people can then use a phone to access that previous narration and maybe comment on it and so on. But they did not have the expertise, which the private sector would provide, on the technical parts of the data pipeline, nor did they have the connections to mobilize a lot of narrators and people with blindness to try out and absorb this first batch of materials into something useful, data veracity work basically. And so the presidential hackathon people would just coach her to work with the larger pool of talent, often people who proposed their own solutions but did not make it to the top 24. And then we repurpose those free radicals into a new team that would ensure there's cross-sector collaboration and that they would be able to build their minimum viable product for demoing to the president within a couple of months.
Wonderful. Great, and I'm sensitive to your time, so I'm moving on to the last question, Audrey, which is about what you would prioritize in order to make progress with regard to what we discussed, both in terms of increasing the use of open data and in terms of increasing data responsibility and data equity, and ultimately moving to data collaboration as the default position as opposed to an afterthought. So what would be your priority, one or more things that you would tackle, that you feel could be transformative if we managed to actually get it established?
Yeah, I would focus on data competence, and that is like media competence. It's a term that we use in our K-12 curriculum, and it is the default starting from the first grade. And that's why we have so many primary school teachers hosting air boxes, because that's a great way to teach data competence. And by competence, I don't mean literacy.
We use the term competence for media specifically because we do not want our children to feel that they're merely media literate, that they're only consumers of media, consumers of data, consumers of digital creative products. I would like them to think that they are producers, and they really are. I mean, many of them have more Instagram followers than I do. So obviously they are media producers, and if they host their own air box, it's like stewardship, right? They have to steward the veracity, the truth of the data, making sure, for instance, that it's not being blocked by a high-humidity spraying device in front of it or things like that. And they have to really think about the contribution they're making, and the trade-offs they're making between hosting more devices and strategically placing devices, and so on. And once they think from a producer's point of view, they are in a position to negotiate. And that is what data democracies or data cooperatives, data coalitions, data unions are about; there are many ways to describe the same thing, which is that there's a huge amount of data being produced by a huge number of producers. Can they band together? Instead of relying on a single arbitrator to pool all those data and take all the benefit from it without giving anything back, which is like, I don't know, corporations before the invention of cooperatives and labor unions, we would indeed treat this air box as our means of producing data, look at it from a producer's perspective, and form new forms of organizations, associations, co-ops, you name it, that can collaboratively determine where the data goes, how it protects not only our interests but our communities' interests, and how to work with the best, or at least better, practices that are coming out of GovLab.
Great, thanks so much, Audrey. This was brilliant.
Always delighted to talk to you, always very inspiring and definitely provides for a lot of material to deepen and explore further during our summer and ultimately what we believe is the third wave of open data. Thank you so much. Thank you and live long and prosper.
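The "mundane" ledger described earlier in the conversation, a multiple-writer record that anyone can mirror and that nobody can quietly rewrite, can be illustrated with a small sketch. This is not the system used in Taiwan, nor how Git or IOTA work internally; it is a toy hash-chained log in Python, and every name in it (`Ledger`, `entry_hash`, `mirror_detects_rewrite`, the pharmacy records) is a hypothetical chosen for illustration. It only shows why a mirror that remembers the latest hash would notice if earlier history were changed, which is the property attributed above to Git mirrors.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash a record together with the hash of the record before it."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + body).hexdigest()


class Ledger:
    """Append-only log in which each entry commits to all earlier entries."""

    def __init__(self):
        self.entries = []  # list of (payload, hash) tuples, oldest first

    def head(self) -> str:
        """Hash of the newest entry, or GENESIS if the log is empty."""
        return self.entries[-1][1] if self.entries else GENESIS

    def append(self, payload: dict) -> str:
        """Add a record and return its hash."""
        h = entry_hash(self.head(), payload)
        self.entries.append((payload, h))
        return h

    def verify(self) -> bool:
        """Recompute the chain and check every stored hash."""
        prev = GENESIS
        for payload, h in self.entries:
            if entry_hash(prev, payload) != h:
                return False
            prev = h
        return True


def mirror_detects_rewrite(remembered_head: str, published: Ledger) -> bool:
    """True if the head a mirror remembers is missing from the published history."""
    return remembered_head not in {h for _, h in published.entries}


# Hypothetical stock reports from one pharmacy.
ledger = Ledger()
ledger.append({"pharmacy": "A", "adult_masks": 100})
head_seen_by_mirror = ledger.append({"pharmacy": "A", "adult_masks": 91})

# Someone republishes a history in which the first report is different.
rewritten = Ledger()
rewritten.append({"pharmacy": "A", "adult_masks": 200})
rewritten.append({"pharmacy": "A", "adult_masks": 91})

print(mirror_detects_rewrite(head_seen_by_mirror, ledger))     # honest history
print(mirror_detects_rewrite(head_seen_by_mirror, rewritten))  # rewritten history
```

Because each hash depends on everything before it, changing the first report changes every later hash, so the rewritten log is internally consistent but no longer contains the head that the mirror remembers. With many independent mirrors, as in the "more than 100 developers" scenario, a rewrite would have to go unnoticed by all of them.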