I would like to give the floor to Mr. Ronald Jansen to moderate this session.

Thank you very much, Chair. I'm very happy to be moderating this particular session. It's at the heart of what we try to do at the Statistics Division of the United Nations. It's a session about data quality, big data and open data, and just from that title there is already some tension: between big data and data quality, between open data and data quality. So I hope that in this session we will bring those kinds of issues to the fore, and I'm sure that our speakers will go into this. A few words before I give the floor to the first speaker. As you've seen this morning, there were a lot of questions and discussions, including in this very last session. Most of them had to do with issues of methodology, with having the right definitions, with operationalizing certain concepts. And I think all of that has to do with quality and quality assurance. So in that respect I'm really happy that we will have a presentation on a data quality assurance framework for ICT indicators. Further, we discussed big data quite extensively yesterday, and I'm happy that we also have a speaker on big data and open data. And then two presentations on open data: what does open data mean, and how does it fit with our work on statistics, our work on big data, and our work on providing information to society and government. Let me introduce the speakers very briefly. We will have a keynote by Michael Colledge. He is a longtime statistician who was at Statistics Canada for almost 20 years and has also worked for the Australian Bureau of Statistics, two of the most renowned statistical offices around. Lately he has been working as a consultant, among other things on business registers, which is how Michael and I know each other.
And I'm very happy that he will be able to say something about a data quality assurance framework for ICT indicators. Then we will have a speaker, Mr. Kyosho Mori, Director-General for International Affairs and Global ICT Strategy of the Ministry of Internal Affairs and Communications. Mr. Mori has a master's from the Harvard Kennedy School. I also read that you worked for METI, the Ministry of Economy, Trade and Industry, with whom I have also worked, given my background in trade statistics, so I might have come across some of your colleagues in the past. Then we have two further speakers. We have Ms. Kaoru Kimura from the World Bank; the World Bank was one of the first to take up the challenge of open data, and we will be very happy to hear about her work in that area. And finally we have Olivia Toussaint from the government of Moldova, speaking about e-government and open data. More information can be found in the biographies in the brochure you have. But let me start right away with the presentation by Michael Colledge, who will talk about the ITU Data Quality Assurance Framework. Michael, I give the floor to you.

It's a great pleasure to be here and to address you all. I have come from Australia. It's a long way to come for a conference when it's not my particular area, but Susan said, we'd like you to come and make the presentation, and I have thoroughly enjoyed it. The papers have been excellent, so thank you all for being such a good audience and producing such good papers. I'm going to talk about a Data Quality Assurance Framework for the ITU. During the presentation, I'm first of all going to give a brief introduction: the context for a Data Quality Assurance Framework and what its components are. I'm then going to deal in some detail with two of the components: the statistical principles on which quality assurance is based, and the quality dimensions.
Now, unlike most of the talks that we've had so far, this is going to be classical statistics rather than ICT. It is a framework which takes account of the context, but it is all about quality assurance for statistics. So those are the four areas that I'm going to be covering.

First of all, let's deal with the context. ITU's main function is not producing statistics; its main function has to do with standards and the administration of telecommunications. Statistics is an important part of its function, but by no means the main one. That's why we refer to this as a Data Quality Assurance Framework, to make sure that nobody thinks it's a quality assurance framework for the ITU itself; it's a quality assurance framework for ITU statistics. Now, statistical activities are primarily undertaken in one division, the ICT Data and Statistics Division. There are other areas that work on statistics, and the framework is designed for the ITU as a whole, but most of the context for my discussion and my paper has to do with the work done by the ICT Data and Statistics Division.

Now, why have a framework? Well, because like every other organization, ITU statistics faces some problems and challenges, and I've listed some of them on the slide you see in front of you. They're in no particular order, but they're the sorts of things you might expect to find in any statistical agency, especially keeping up with ever-changing data demands due to the fast evolution of the ICT sector. So it's in order to have a framework to deal with those problems that one develops a Data Quality Assurance Framework. And what do you expect from having a framework? You expect to have, first of all, a systematic mechanism for ongoing identification of quality problems and of possible actions and solutions. You have a basis for promoting a quality culture within the ITU.
You have greater transparency about the processes by which statistics are produced. If it's made public, it reinforces ITU's image as a trustworthy provider of good-quality statistics. And it provides reference material for training and a framework of ideas for exchange with other organizations. So those, more or less, are the reasons for a Data Quality Assurance Framework in any organization.

What are the components of a Quality Assurance Framework? It starts off with these four components. The first one is to have a set of statistical principles on which you base your statistical operation. Now, you don't have to invent them from scratch; they've already been invented. The UN committee established international statistical principles in 2003 or 2004, and they're available to be adopted. Those are the ones that are used in what I'll call the DQAF for short, the Data Quality Assurance Framework for the ITU. And incidentally, in case I don't remember to say it at some other stage: when you're developing a data quality framework, you don't start from scratch. You start off with what other people have already done, and in this case there has been quite a lot of work done in the last 20 years. In particular, there's the National Quality Assurance Framework, developed by the UN, which is a template for any national statistical office. And then there are frameworks like the OECD Quality Assurance Framework and Quality Guidelines, which in fact is the model that we have adopted for ITU purposes. But ITU statistics is basically a small operation compared with, say, Eurostat, or compared with OECD statistics, and so the framework has to be correspondingly geared to the size of the organization. You don't want an elaborate framework for what is in fact quite a small, relatively small statistical operation. So you take advantage of what's already there.
And so we have some underlying United Nations statistical principles; we use them. Dimensions of quality: several agencies have come up with dimensions of quality, and the particular ones that we're using are also used by the OECD. I'm going to go into the quality dimensions in some detail later, so I won't talk about them yet. Then part of the framework is a set of quality guidelines. This is a separate document of 30 or 40 pages; it's a lot of good practices, and I won't be discussing it today. And finally, the fourth component is to have a quality assessment program. That is to say, you have a basis on which you regularly look at quality, and then you decide what's wrong and what actions you're going to take to improve. In this day and age, and especially in this sector, if you stand still, you fall behind. You have to be constantly looking for opportunities to move on.

So let's have a look at the first component, the underlying statistical principles. I'm just going to put them up on the screen; I won't necessarily read them all out. There are ten of them, and I think you'll agree with me that they make all kinds of good sense and cover the spectrum of things you would expect a statistical operation to do. First of all, statistics should be accessible to all; they're a fundamental element of global information systems. To maintain trust in international statistics, their production has to be impartial and strictly based on professional standards. In other words, the agency is not expected to slant the statistics in any way to suit anybody. That is actually the basis for their credibility, and it's the basis on which the ITU's present statistics have such high credibility. The public has a right to be informed about the mandates for the statistical work of the organization. This is especially important for a large organization, but it's certainly important for ITU: you should know what statistics are being produced and how.
Concepts, definitions, classifications, sources and methods should meet professional scientific standards and be made transparent to users. Well, I think you can tell that that principle is certainly being followed, from the fact that we have the two working groups reporting back on doing exactly this. Sources and methods for data collection are chosen to ensure timeliness and other aspects of quality, to be cost-efficient and to minimize reporting burden; that relates to two of the quality dimensions. Individual data about natural persons or legal entities, or small aggregates subject to national confidentiality rules, are to be kept strictly confidential and used exclusively for statistical purposes or for purposes mandated by legislation. Now, at the present moment ITU doesn't collect any individual data. So why leave that in? Well, first of all because it's part of the international guidelines, and secondly because, with the advent of big data, the ITU could very well end up collecting some individual data in due course. The next principle: erroneous interpretation and misuse of statistics are to be immediately and appropriately addressed. It is very important for a statistical office not to let the press, or somebody with some axe to grind, misinterpret the information that's published. Standards for national and international statistics are developed on the basis of sound professional criteria while remaining practical. Coordination of international statistical programs is essential to strengthen their quality and coherence; in other words, all the international statistical agencies should work together. And finally, bilateral and multilateral cooperation in statistics contributes to the professional growth of statisticians and to the improvement of statistics. In other words, there should be conferences like this.
And this conference is an example of one of the ways in which ITU is assuring the quality of its statistics. All right, so those are the principles. As they're the UN principles, there isn't much need to debate them.

Now there's the question of how you decide what good quality is. It isn't just having correct data, having the numbers right; it's more than that, as we'll see in a minute. Data quality, of course, is what it's about. But first of all: are you producing the data that people want? Are you producing all the data that people want? Are you producing data that people aren't interested in? Is it relevant? Are your data accurate? Of course, this is fundamental. Are your data credible? They could be accurate without being credible; you have to have the reputation. Are your data coherent? Meaning, if you bring these data together with other data from other areas, are they going to make sense? Are the data timely, as timely as possible, and are they delivered according to a schedule? Are they accessible? If they're not accessible, or not easily accessible, they might as well not be produced. Are they interpretable? Do people understand what's behind the data? Can they understand the statistics they're reading? So those are the seven dimensions of data quality; it could be eight if you wanted to separate timeliness and punctuality. And added to that, we have two other dimensions of quality: sound methods and systems, and cost efficiency. These are there because you can't just talk about the quality of the output; you have to talk about the quality of the processes producing those outputs if you want to assess how good the data are. You have to look at the methods. And cost efficiency is not an aspect of data quality, but it's a very important aspect of a process, because the more efficient you are, the more data you can produce: the more resources you've got to spread around what you're aiming to do.
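The seven output dimensions plus the two process dimensions lend themselves to a simple checklist. The following is a minimal illustrative sketch in Python: the dimension names follow the talk, but the 1-to-5 scoring scheme and the `assess` function are my own assumptions, not part of the ITU framework.

```python
# Hypothetical sketch (not ITU's actual template) of the nine quality
# dimensions as a simple self-assessment checklist.
QUALITY_DIMENSIONS = [
    # Seven output dimensions:
    "relevance", "accuracy", "credibility", "coherence",
    "timeliness", "accessibility", "interpretability",
    # Two process dimensions (quality of processes, not outputs):
    "sound methods and systems", "cost efficiency",
]

def assess(ratings):
    """Return the dimensions rated 2 or below on an assumed 1-5 scale,
    i.e. candidates for improvement actions.
    `ratings` maps dimension name -> score."""
    missing = [d for d in QUALITY_DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"unrated dimensions: {missing}")
    return [d for d in QUALITY_DIMENSIONS if ratings[d] <= 2]

# Example: everything rated 4 except timeliness.
example = {d: 4 for d in QUALITY_DIMENSIONS}
example["timeliness"] = 2
print(assess(example))  # ['timeliness']
```

The point of such a structure is only that an assessment program flags the weak dimensions and then decides on actions, which mirrors the "assess, decide what's wrong, decide what to improve" cycle described above.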
So those are the dimensions of quality. Now I'm going to illustrate each of those dimensions in a bit more detail with an example from the ongoing or almost completed quality assessment that I've conducted of the ITU statistics. There is an executive summary about to be published and the full report will be published shortly. But what I'm going to do in other words is just illustrate these dimensions with some examples from the actual, the assessment that I've done. I should add, my contract had two aspects to it. One of them was develop the quality shorts framework. The other was to do an assessment. So relevance is the degree to which data served to address the purposes for which they are sought by the users. So here's an example from the ITU quality assessment. I've set in the report the expert groups in the ITU pay great attention to the choice of ICT indicators collected and disseminated. That's obvious from those expert groups. Consideration is given to the utility of each indicator and to the corresponding response rate. Indicators with little power in discriminating well-developed systems from poorly developed systems or based on poor response rates are eliminated and that certainly happens. So that's a remark from the assessment. Here's a recommendation. Undertake a review of current statistical products along the following lines. Now bear in mind what we're doing at this conference in a sense is reviewing the products. But this is not the only forum and this is not the only place in which they can be reviewed. For example, ITU receives a number of requests for information it can't satisfy. It's a good idea to log those requests because they might point out the direction for the future. It should. It's a good idea to analyze its users. Not just the working group but who is most important amongst the users of ICT statistics? They could well be within the ITU itself. The most important users may be in there, they may be outside. 
It's very good to identify them, talk to them, and find out what they're using the data for; that might give some indication of which direction to go in the future. Conduct another user satisfaction survey of those users that purchase access to the WTI database, because you can be pretty sure that if somebody pays money for something, they're going to use it, so it's a very good idea to find out what they think of it. And I'll have a bit more to say about that database later. So there's an example of, firstly, an observation about relevance and, secondly, a recommendation of how you might improve relevance.

Accuracy: the degree to which data correctly estimate or describe the quantities or characteristics they're designed to measure. Here's a remark. The most probable sources of error are operator and household response errors, and data collection and processing errors at the national regulatory authorities (NRA, as I've abbreviated it in the slides) and the national statistical offices (NSO). In other words, ITU may make some errors during processing, but they're much less likely, and so there's only limited improvement that ITU can make in the accuracy of the data by improving its own processing operation. The biggest improvements in accuracy are likely to come from better data from the providers. So what does that lead to? It leads to a recommendation: as a basis for decisions on where to target improvements in accuracy and where to invest training resources, record and analyze the incidence and impact of the errors that actually occur, the ones that you know about and fix during the course of processing. And classify the data providers, meaning the regulatory authorities and the national statistical offices, according to their capacity and willingness to collect and report the required data, and the likely accuracy of those data.
So my suggestion here is to actually look at the sources of errors, classify the providers that need the most help, and then provide them with that help.

Credibility. Credibility is the confidence that users place in the data, based primarily on their image of the data producer and the product, i.e. the brand image. So what does ITU's credibility depend on? What gives it the big comparative advantage over other sources of ICT statistics? Well, basically it's that it's a UN agency with the reputation of an honest broker and professional behavior, and it has direct access to official government statistics. Those are the reasons why, I believe, ITU statistics are credible. But a recommendation is to publicize ITU's professional approach to collection, harmonization, editing and dissemination, all the things they do when they receive the data. And, with regard to the occasional concerns from some countries when the ITU adjusts the figures in order to harmonize the data, to say: at the end of the day we are following this set of procedures; these are our procedures, and we have the authority to make the final decision on what we publish. That is important for credibility. It's very important that nobody gets the impression that any country could possibly influence the professionalism of the ITU.

Coherence: the degree to which data products are logically connected and mutually consistent, especially in comparisons over time and across countries. Harmonization of the data received by the ITU is the key to the coherence of the data across countries; that's why they have to adjust the data from time to time. It's the major source of value added: bringing the data together and harmonizing the concepts is what they're doing to add value to those data, and it's well done. Here's a recommendation, a slightly off-the-cuff one.
As NSOs, national statistical offices, are in principle responsible for the coordination of all statistical activities within a country, the ITU should suggest (because they can't demand) that NRAs inform the NSO in their country about what data they're providing in response to ITU questionnaires. Now, it seems a very simple thing, but if you're in the national statistical office and you don't know what's happening in ICT statistics, that's not a good situation if you're the national coordinator. So this is something the ITU can encourage their colleagues within countries to do, to improve coherence.

Oh, dear, I've shut down the presentation. Oh, good, it's come back again. That's a relief. I feel slightly naked with this little button here; I'd like to have a computer on the desk.

Timeliness and punctuality: timeliness is the length of time between the availability of data and the event or phenomenon they describe, and punctuality is the existence of, and adherence to, a data dissemination schedule. From the assessment, an example: the timeliness of supply-side and demand-side data is largely determined by data availability from the primary providers, that is, the operators and the households. There isn't much ITU can actually do directly, except encourage people to report as fast as possible. A recommendation here (by the way, there are 30 recommendations in the executive summary): based on a review of data availability from primary providers, an assessment of the time required for NRA, NSO and ITU processing, and the ITU resource implications, consider whether the current schedule can be advanced. Of course, you always have to get more timely; it's just pressure. If you don't get more timely, other people will.
So you're constantly aiming to squeeze maybe a week, maybe a month in terms of timeliness, to bring the data as close to the reference period as possible, subject to the natural limitations of those data.

Accessibility: how readily the data can be discovered, located and accessed from within the ITU data holdings. ITU gives good exposure to the data it produces. The major publications are very attractive; we just got one today or yesterday, very professional in appearance and content. The ICT Eye provides instant access to selected indicators for any country. The external WTI database contains a lot more information, and it's also available on CD-ROM, but it requires a subscription. So the recommendation here is to consider free distribution of all data, in particular making the external WTI database freely available. Why not? Well, you lose some revenue. So you have to establish the magnitude of the revenue you're going to lose and compare that with the benefits of having more users. But it's going in the direction of open data.

Interpretability: the ease with which users can understand and properly use the data; it's sometimes called clarity or understandability. A remark: there is sufficient metadata on the website and in the publications to be able to find the data and to understand what it means. But in terms of a recommendation: there isn't much you can find out about how the data are produced in the first place, and it would be good to have a sources-and-methods document describing the procedures the ITU uses to collect, process and analyze the data. Not that anybody is going to make this their bedtime reading; they won't. But the sophisticated user, especially, would like to know how much they can trust the data. Is this figure imputed from three years back, or is it data from a survey conducted within the last six months? This is the sort of information a sophisticated user would like to have.
Sound methods and systems: the use of internationally standard practices for all procedures and systems. ITU is a world leader in developing standards and methods for collecting ICT supply-side and demand-side data. There are two publications: one of them, on household indicators, is quite recent; the other one is from 2011 and is going to be revised shortly. The second one hasn't got much methodology in it; the first one has a full set of methods as well. A recommendation from the assessment: extend the content of regional training workshops to include the provision of additional metadata by NRAs and NSOs. In other words, the only way the ITU knows how data are produced is to ask the NRAs how they obtain the data, and that should be reported. Now, there is a place to report it in the questionnaire, a notes section, but it's free format. So more metadata could be collected about how a particular country obtains its demand-side data or its supply-side data. And encourage them to complete a quality self-assessment template. If they already do that as part of their national statistical operation, that's good; but if they don't, it would be a good idea for them to periodically be jogged into thinking: am I doing everything I can to maintain and improve quality?

So finally, cost efficiency. A remark from this perspective: any request for additional resources should be part of a business case. It's no use saying we've got too much work to do and we'd like more resources to do the same thing; there always has to be a good case. And in this instance there is so much change taking place that it's hard to suppose there wouldn't be a good case, for example for expanded use of big data, if and when somebody finds an appropriate methodology.
So additional activities and outputs flow from additional resources, and a business case, in fact, talks about the benefits you get for the costs you incur. A recommendation from the ITU quality perspective: in tandem with a review of the evolving needs for ICT statistics, formulate and analyze options for re-engineering the current statistical systems and procedures with the aim of making them more efficient.

Okay, so those are the dimensions, with some flavor of what's in the quality assessment report. What about implementation? Well, the DQAF has been formulated and is close to completion. It includes the statistical quality principles and the quality dimensions; it has broad quality guidelines; and it has a draft quality assessment template. The quality assessment program is commencing, starting with the assessment I've done, and there will be periodic ITU self-assessments. Something the ITU will also be encouraging is raising awareness of the need for data quality self-assessment by the providers, by the NSOs and the NRAs, probably providing material to help them do that if they haven't already got material in-house. Thanks for your attention.

Thank you very much, Michael. Given the quality dimensions, I would say: very relevant, very credible. Not very timely, though; we're running a little bit behind time already. But I think the assessment is very relevant for this forum, and that's why I gave Michael all the room to go through the whole list of quality dimensions and the assessment of ITU. We will take a few minutes to hear your reactions to the ITU data quality framework and to the various assessments you have seen; we're very eager to know what your thoughts are. So I open the floor for a number of questions. I see Brazil, Korea, and Jose. Please, Brazil.

Thank you very much. First of all, I would like to congratulate ITU for this initiative.
In Brazil, both the national statistical office and CETIC, as a data producer, are very much engaged in this process of data quality. This year, in our annual workshop, we invited UNSD to provide us a short course on their quality assurance framework. As a country, we support ITU in this task of establishing a proper framework to guarantee data quality in these statistics. Thank you very much.

Korea, please. Thanks, Mike. My question is about a dilemma between policy makers and statisticians. One part of my team is composed of statisticians, and they stress validity, reliability, regression analysis; they stress those points. But the policy makers at MSIP (the Ministry of Science, ICT and Future Planning, a Korean ministry) request live data, timely data. Sometimes these two values conflict. For example, until five years ago MSIP required us to measure WiMAX and WiBro; WiBro was actually invented by Koreans. But nowadays they don't want WiMAX anymore. The statisticians, however, tell me we must keep collecting the WiMAX data, because it accumulates the history and the amounts. So how can I deal with the requests for live data from the policy makers on the one hand, and from the statisticians who want to keep the series for regression analysis on the other?

Thank you very much for that question. And then I go to Jose. Thank you. The work you have done, Michael, is very important. I think that the work on quality is, of course, a second step, but a necessary step after the production of data. The efforts for producing data, especially by the European countries, have been very important until now. So it's very good that in the European Union the member states have adopted the European Statistics Code of Practice, and together with this they have also adopted a quality assurance framework and even a European standard for quality reporting.
So there is recognition by all the countries of a certain quality framework which is used by all countries in all domains. In the particular case of ICT statistics, for instance, the core list of indicators has been adopted formally by the UN Statistical Commission. So I wonder if this conference could recommend that countries formally adopt this kind of data quality assessment framework at the highest level, which is probably the UN Statistical Commission, next time. I'm not representing any country here, so I cannot speak even for mine, but I think that this is a good forum for supporting this initiative. Thank you.

Thank you very much. And then I give the floor to Michael to react to this. Please.

Okay. Firstly, as regards how to balance the requirements of policy makers against those of statisticians: this is driven by the notion that relevance is the most important thing and that you satisfy your users. The policy makers rule; what they want is what you should be producing, as best you can. But at the same time you have to persuade your users that if they jump from one topic to another year to year, they'll lose any time series. So there's always some kind of balancing act between the dimensions of quality. You want to stay relevant, so you want to introduce new things; but you want to stay comparable over the years, so you want to retain some things. You want to be accurate, so you want to take a long time to process the information; but you want to be timely, so you want to take a short time to process it. The trick in a statistical office is to balance the dimensions against one another. But the policy makers, whoever they may be, are very high on the agenda, because they're the users. In terms of quality being the second step: yes, of course. The European Statistics Code of Practice is a remarkably good document; it's nearly 10 years old now.
It's almost biblical as far as I'm concerned; it's the most comprehensive set of principles. But it's for the whole of the EU, whereas what we've got here for the ITU, with its relatively small number of people, is something much simpler than the Code of Practice. Nevertheless, what that discussion tells the ITU, the fact that the EU countries all have the Code of Practice, a quality assurance framework and a reporting mechanism, is that they can treat their providers differently. They don't really have to worry about the quality of countries that are already subject to a quality assessment; they've got to focus their efforts elsewhere. I think that's the message I get from that. Could some form of DQAF be adopted at a high level, given that the regulators probably don't have any form of data quality assessment framework? I think that's a reasonable thing to aim for, though it will require some work, because what's been developed here is for the ITU. But with some transformation, it could perhaps be presented to an audience like this in a year's time as something for regulators. In the case of the NSOs, they may well have their own quality assurance framework which applies to all their surveys, including the household surveys which collect ICT data.

Thank you very much, Michael. I did see some more questions, but I will have to ask you to keep them until after the next presentations, otherwise we will not have time to get to those. So please hold on to your questions; I've noted you down, both Egypt and Japan, and we'll have a second round of questions after the next round of presentations. Just to conclude, I think there was good support here for this ITU data quality framework.
And so my suggestion, following up on what Jose was saying about bringing this to the UN Statistical Commission: I know the Partnership reports to the Commission every two years, and I think it would be appropriate to have the framework and the data quality assessment in that report. And now I'll turn the floor over and ask Mr. Mori to make his presentation on efforts for big data and open data in Japan. Please, Mr. Mori. So first of all, I would like to thank Georgia for this great hospitality. I was really entertained every day by good food, good wine and chacha. Under these circumstances, I wonder whether I can finish my presentation, but I'll try. The first slide shows the arrival of the data utilization society on a worldwide basis. Through the rapid evolution of ICT, many kinds and large amounts of data have been generated, distributed and accumulated. These data can be easily handled by companies, governments and individuals, and using them is very, very important on a worldwide basis. I would like to talk about big data and open data today. In the field of big data, many kinds of large amounts of digital data exist in society and the market, which can create new value and contribute to the resolution of social issues. In the field of open data, it is very important to open up public data owned by the nation and local public bodies to the private sector, in formats suitable for secondary use, which can create new businesses and services and realize public services through public-private collaboration. So in considering big data and open data, as Michael mentioned, data quality is a critical issue. Michael elaborated in more detail on the elements of data quality: the importance of the data itself, the accuracy and reliability of the data, and the consistency and compatibility of the data, et cetera.
In order to increase the quality of data in the field of big data, first of all we should have a picture of how much big data is being distributed. There is a wide variety of data, so it is very difficult to grasp the full picture of big data in the current situation. In the field of open data, the accuracy and reliability of the data itself is relatively high in many cases. However, data from various fields exist in various formats, so the important issues are standardization of data formats, rules for secondary use, and APIs, which I will talk about later. Please look at the next slide. The Japanese government endeavored to measure the amount of big data. This slide shows the definition of big data. In a broad sense, big data includes human resources and organization, such as data scientists, and analytical technology; but we calculated the amount of big data in the narrow sense, which is the data itself. There are three layers in the narrow sense of big data. One layer is structured data, for example customer data, sales data and so on. The second layer is old but unstructured data, like voice, radio, TV, newspapers, books, et cetera. For those two layers, corresponding statistics exist to a certain degree. The third layer is new unstructured data, which includes blogs, SNS, sensor logs, access logs, et cetera, for which no corresponding statistics exist. Please look at the next slide. We picked 21 targeted media, targeted data, for estimation of big data distribution: for example, in structured data, customer data, POS (point-of-sale) data, medical receipt data, accounting data, et cetera; and in unstructured data, business diaries in text form, medical data, CTI voice log data, sensor logs, traffic congestion information, security and remote monitoring cameras, et cetera. And the result is this: in 2013, 13.5 exabytes were used for big data distribution.
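The estimation approach described here — grouping the 21 targeted media into the three layers and summing per-medium volume estimates — can be sketched as follows. The per-medium figures below are hypothetical placeholders chosen only for illustration; the sole number taken from the presentation is the overall 13.5-exabyte total for 2013.

```python
# Sketch of the layered volume-estimation approach. Per-medium volumes are
# HYPOTHETICAL placeholders, not the actual figures from the Japanese study.

media_volumes_pb = {  # estimated annual distributed volume, petabytes (illustrative)
    "structured": {"customer data": 300, "POS data": 2500, "medical receipt data": 150},
    "old_unstructured": {"broadcast archives": 800, "books/newspapers": 50},
    "new_unstructured": {"sensor logs": 4000, "monitoring cameras": 5000, "SNS/blogs": 700},
}

def total_exabytes(volumes):
    """Sum per-medium estimates across all layers and convert PB -> EB."""
    total_pb = sum(v for layer in volumes.values() for v in layer.values())
    return total_pb / 1000  # 1 EB = 1000 PB (decimal units)

for layer, media in media_volumes_pb.items():
    print(f"{layer}: {sum(media.values())} PB across {len(media)} media")
print(f"total = {total_exabytes(media_volumes_pb):.1f} EB")
```

With the placeholder figures above, the layers sum to 13.5 EB, matching the order of magnitude reported in the talk.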
This amount of data distribution is estimated to be more than 150 times larger than the amount of data distributed through mobile phones in Japan in the same year, 2013. Among structured data, POS data in particular consumed a lot of volume, and among unstructured data, as you can imagine, sensor logs and security and remote monitoring cameras consumed a great deal. By industry, we found that the construction, transportation and commercial industries consumed the most big data in our estimation. Next, I would like to talk about open data. Our ministry is conducting a demonstration experiment on environmental improvement related to open data. We aim to: number one, establish a common API (application programming interface); number two, develop rules for secondary use of data; and number three, visualize the benefits of open data. Please look at the bottom of the chart. There are various information services: for example, local government information, social capital information, tourism information, disaster prevention information, public transportation information, statistics information, hay fever information. We would like to apply a common API to the information circulation and sharing platform, and from this platform we would like to offer services in various fields, for example disaster prevention information services, public transportation information services and general information services. We are also trying to make various governmental blue books open data; the telecommunications blue book is a good example, which will be done next year. I have talked about the relationship between open data and big data, but because each kind of data has a different target and has been provided by different entities, there are many parts that overlap and also many parts that do not overlap.
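The common-API idea described above — many information services (disaster prevention, transportation, and so on) exposed through one shared interface — can be sketched minimally like this. The record schema, category names and sample records are illustrative assumptions, not the actual design of the Japanese platform.

```python
# Minimal sketch of a "common API": heterogeneous information services are
# served through one standardized entry point and record shape, so a single
# client can consume all of them. Schema and data here are illustrative.

from dataclasses import dataclass, asdict

@dataclass
class Record:
    category: str   # e.g. "disaster_prevention", "public_transportation"
    provider: str   # which body published the data
    title: str
    payload: dict   # category-specific content

# In a real platform each category would be backed by a different source
# system; here a dict stands in for those back ends.
_SOURCES = {
    "disaster_prevention": [Record("disaster_prevention", "city office",
                                   "evacuation shelters", {"count": 42})],
    "public_transportation": [Record("public_transportation", "bus operator",
                                     "route timetable", {"routes": 12})],
}

def get_records(category: str) -> list[dict]:
    """Single common entry point: same call, same JSON-ready shape,
    regardless of which service the data comes from."""
    return [asdict(r) for r in _SOURCES.get(category, [])]

print(get_records("disaster_prevention")[0]["title"])  # prints "evacuation shelters"
```

The point of the design is that adding a new information field means registering a new source, not inventing a new client protocol.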
At the same time, although the narrow definition of open data is information provided by public institutions, private companies can also contribute to open data under certain rules. In this case we should consider how to ensure the consistency and reliability of the data, and more importantly we should think about how we utilize personal data. Lastly, I would like to point out two issues as challenges of data utilization. Number one is the utilization of personal data. From the side of enterprises, this is high-value data related directly to users; but from the side of consumers, there is a possibility of privacy infringement and concerns about exploitation by third parties. So a balance between the protection and the utilization of personal data is very important, but at the same time very difficult to handle. The second is the need for data scientists: training and securing human resources who can extract useful knowledge from data and use it to solve problems is very, very important in every country. However, in Japan, frankly speaking, there are few such data scientists. And this is not only Japan's issue but a worldwide one. I'm sorry, I don't know exactly how to nurture data scientists, and I would like to find some solutions during this World Telecommunication/ICT Indicators Symposium. Thank you very much for your attention. Thank you very much for this presentation, which very nicely put together quality, big data and open data, and also, it seems, advertised job openings for data scientists in Japan. So please hold on to your questions on this topic, and we will move on to two related presentations. The first is from Ms. Kimura from the World Bank, who will tell us how the World Bank ventured into open data and what it is currently doing there. Thank you very much. Please go ahead. Good afternoon everybody. I'm really honored to be here.
Thank you so much for this opportunity to share our open data experience at the World Bank. Here is an outline of my presentation. First the introduction, then a little bit about the definition of open data; next I will talk about the open data experience at the World Bank, after that I will take you on a quick tour of the open data site, and lastly I will talk about the open data and statistical support programs we offer to countries. So what is open data? Actually, compared to big data, open data has a specific definition. Open means everybody can freely access, use, modify and share the data for any purpose. This means that, for data to be open, it must satisfy two criteria. First, the data needs to be technically open, which means findable and available in a standard, editable electronic file format such as CSV, XML or JSON, or any other machine-readable form. Second, it must be legally open, which means explicitly released under a license allowing free use and free redistribution; the World Bank data terms of use provide that legal framework for the data opened by the World Bank. Next, there are three main common reasons for releasing open data. The first is transparency, which allows people to monitor what governments and data publishers are doing. The second is efficiency and innovation: opening the data lets people create new ideas, presentations of the data, or analyses. The third reason is participation and engagement. Next I want to talk about how we started open data at the World Bank: the decision to provide the data as a public good and make it accessible to the widest possible audience.
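The two openness criteria just described — technically open (machine-readable format) and legally open (free-use license) — can be expressed as a simple metadata check. This is a hedged sketch: the record fields (`format`, `license`) and the license list are hypothetical examples, not the World Bank's actual catalog schema or terms of use.

```python
# Sketch: does a dataset record satisfy the two openness criteria above?
# Field names and license identifiers are illustrative assumptions.

MACHINE_READABLE = {"csv", "xml", "json"}      # technically open formats (examples)
OPEN_LICENSES = {"cc-by-4.0", "odbl", "cc0"}   # legally open licenses (examples)

def is_open(record: dict) -> bool:
    technically_open = record.get("format", "").lower() in MACHINE_READABLE
    legally_open = record.get("license", "").lower() in OPEN_LICENSES
    return technically_open and legally_open

print(is_open({"format": "CSV", "license": "CC-BY-4.0"}))  # True
print(is_open({"format": "PDF", "license": "CC-BY-4.0"}))  # False: not machine-readable
print(is_open({"format": "CSV", "license": "all rights reserved"}))  # False: not legally open
```

A PDF under an open license fails the first test; a CSV under a restrictive license fails the second — both conditions must hold for the data to count as open.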
In April 2010 our open data main portal, data.worldbank.org, launched, featuring data from the World Development Indicators, our most popular dataset. But it is not limited to our statistical data itself: we moved forward, and it became an institution-wide effort to make data accessible and available. For instance, we have already disclosed all the information about our financial status, our project and operations information, our research publications (basically all publications by the World Bank), and some procurement information. It has been four years since we launched the open data portal, and so far the data website has seen a significant number of visits: over those four years, more than 54 million people have visited our data site. Over one third of the traffic to the World Bank website actually goes to the data site; the data site is the number one reason people visit the World Bank website, and the World Bank has seen a tenfold increase in the use of its data. So now let me take you on a quick tour of our data. This is the top page, and the website is provided in five languages. From the top page you can browse the data by country, by topic or by indicator. This is a sample page for browsing the data by country, and this is sample data for India.
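The same country and indicator data can also be reached programmatically, as the talk mentions later when discussing the developer APIs. As a sketch: the URL pattern below follows the public World Bank Indicators API (v2 JSON endpoint at api.worldbank.org), but the response shown is a small hand-written payload shaped like that API's output rather than live data, and the values in it are illustrative.

```python
# Sketch of programmatic access to the open data portal. The URL pattern
# follows the public World Bank Indicators API (v2); the response below is a
# hand-written illustration of its [metadata, observations] shape, not live data.

def indicator_url(country: str, indicator: str, fmt: str = "json") -> str:
    return (f"https://api.worldbank.org/v2/country/{country}"
            f"/indicator/{indicator}?format={fmt}")

# e.g. individuals using the Internet (% of population) for India
url = indicator_url("IND", "IT.NET.USER.ZS")

sample_response = [
    {"page": 1, "pages": 1, "total": 2},
    [
        {"country": {"value": "India"}, "date": "2013", "value": 15.1},
        {"country": {"value": "India"}, "date": "2012", "value": 12.6},
    ],
]

def to_series(response) -> dict:
    """Flatten the observation list into a {year: value} mapping."""
    _, observations = response
    return {obs["date"]: obs["value"] for obs in observations}

print(url)
print(to_series(sample_response))
```

In a real client, the `sample_response` would come from an HTTP GET on `url`; everything downstream of the parse is the same either way.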
As you can see, in addition to the five languages, in the particular case of India some data is available in the local language, Hindi; the Hindi icon is displayed next to the Chinese icon. The top page for India summarizes a snapshot of the country, and it already integrates some World Bank project data: in this case, as you can see, the country partnership strategy for India is directly linked from this statistical data page. At the bottom of the page is the actual presentation of projects and operations, and this map shows where exactly World Bank projects, historical and ongoing, are located. Another way to search the data is our data catalog, which is recommended for users looking for more detailed indicators from the World Development Indicators or Doing Business, where you can browse the details. This is a sample page of the World Development Indicators online tables; the format looks exactly the same as the hard-copy, printed version, and you can run a custom query based on whatever you want and download the resulting dataset in Excel or CSV format. Now I would like to talk a little bit about the data quality assurance framework, which relates to the first speaker's presentation. But actually, let me start with the legal framework for the open data. Basically, for open data we have a set of terms of use, which mainly ensure that people can really use our data, with only minimum requirements on that use. The backbone of the open data initiative is the Access to Information policy, a Bank-wide policy launched in July 2010. Basically this policy allows public access to any of the information that the World Bank has: the default setting for our information is open, and we have only 10 exception categories, covering things like personal identification or personal information, safety or environment matters, and confidential policy dialogue with clients. The exception categories are very minimal, and we really try to open all the information and data that we have. As for the data quality assurance system, as Michael mentioned, the World Bank has already set up a solid, systematic mechanism for data quality to make sure all the data we disclose is accurate. We have in-house topical experts, for example on microdata, demographic data or external debt, who are trained and able to manage data and metadata at every step, from data collection through aggregation to dissemination. So we have good skills in general, but in addition, for the open data initiative we expanded to a much more cross-cutting, Bank-wide review process, which includes sector experts not only sitting in the data group but also those dealing with the legal basis of operations in that sector. We also extended the client-service communication team, so that feedback we get from people is handled properly and incorporated into the data sets or information we disclose online as soon as possible. Next is the linkage towards big data: as part of the open data portal, we disclose some data sets for developers to use via our APIs. Currently we offer three different APIs for developers: one for statistical data, the second for project and operations data, and the third for World Bank financial data. Some applications have already been developed using World Bank data, including by Google. There is an example of the Google
Public Data Explorer: just typing a query such as GDP per capita brings up a chart built on World Bank data showing a time series. There is another Public Data Explorer example which shows a cross-country comparison over time, and all the data is basically coming from our data set. Lastly, I just want to talk a little bit about the ongoing support we provide to countries in the area of open data and basic statistical capacity building. This is the Open Government Data Toolkit, which is available on our website; it lists many resources and links, from 1 to 10, so you can learn from day one what open data is, as I explained at the beginning of my presentation, through to best practice, open data policies and learning resources. This is an example of an open data scorecard, which is done on a request basis; following this technical assistance, a government can move ahead with a project, implementing open data as part of a World Bank project, or do it themselves on the basis of our technical assistance. So far we have supported more than 20 countries with open data capacity building and technical assistance, including Moldova, Kenya, Tanzania and Mongolia. That is all of my presentation, thank you. Thank you very much; that gives a very nice overview of open data as the World Bank engaged in it, what they present and how they help countries. As was indicated, Moldova was one of those countries, and that's why I would like to thank ITU
for the opportunity to deliver this presentation and share the experience of my country, Moldova, with regard to open data. Yes, the open data initiative in Moldova was developed with the help of the World Bank, and I will be happy to share with you our achievements, but also the challenges we have faced during the implementation of this initiative. There are many reasons why the governments of different countries choose to implement an open data initiative. As previously mentioned by my colleagues, one of them is transparency and accountability of government activity; other reasons include improving and enhancing civic involvement in judging, or at least being informed about, the activity the government delivers to its citizens. Based on data that is open, citizens can take decisions, can easily see the efficiency of the government, and can be involved in shaping the policies that directly affect them. In our case, in the case of Moldova, transparency and accountability was the main reason our government started the open data initiative, and in that vein I would like to present this quote, which is very close to our reasoning: the biggest issue now is corruption, but there is a vaccine for it, and the vaccine is the data sets that governments are trying to open in order to be more citizen-oriented, more open to the citizens, and more efficient in their public-service delivery. So the journey started in 2010-2011 with a donor-driven push for open governance and open data, and the government of Moldova took strong commitments towards opening data, both aligning its institutional framework to be more open in this respect and becoming a more open government.
In 2010, at a high-level round table, the president of the World Bank, who was visiting Moldova at that moment, raised the interest of our senior government management in open data by showing the advantages that open data has for the government but also for the citizens. So in 2010-2011 everything started with strong political support for open data from the high-level management of our government. This is Moldova's open data portal; it's called data.gov.md. At the moment we have 782 data sets open for free that can be reused by developers while they are developing applications, by citizens, by data journalists and by civil society organizations. These data sets are freely open for use on the website; they are published online, so they are accessible, and the data varies from ministry to ministry and in time: some of it is published as current data, so there is variation with regard to timing. There is also a network of 30-plus open data focal points, one in each ministry, who take care of the open data initiative in their ministries by publishing data from their ministries and the institutions they are responsible for. There are also 19 open data apps on the portal, created both by developers and by public institutions such as the National Bureau of Statistics and others. The portal registered 160,000 downloads in 2014. The first steps of the open data initiative in Moldova came in 2011 with the launch of the open data portal, which was technically based on a WordPress platform. Since this was a new initiative in Moldova, there was a need both for institutionalization of open data within our government and for training for those who were responsible for opening data in Moldova.
This is why a portal user's manual was created, and our Prime Minister at that time also issued a directive saying that every ministry should open three data sets a month and publish them on the open data portal. However, this directive didn't quite work, and the lesson was that publishing open data in the ministries should not depend only on a Prime Minister's directive; it should be a by-default procedure in all the ministries. This is why in 2011 the network of open data focal points in the ministries was created, with the focal points ensuring the publication of data sets on the portal. The next step was launching the BOOST database, the database of the national public budget of Moldova, including national incomes and expenditures. The database runs from 1995 until 2013, so it's up to date, and users can see the data at a high level of granularity, starting from the central level and going down to the local level, even to the city halls of villages in Moldova, and can analyze how the budget is spent, how public money is spent, what the primary directions are, what the priorities are and what they should be. In that year several other events took place to raise awareness of the open data initiative in Moldova, like the TechCamp and the visit of Hans Rosling, the famous statistician, which also aimed to raise end users' interest in open data; and at the end of the year a second version of the portal was launched, based on the lessons learned from the previous year. Also, in 2012 Moldova joined the Open Government Partnership, and one of the main priorities in the commitments Moldova made while joining this partnership was opening priority data from the ministries, which at that time were data from the National Bank, the Ministry of Health and the Ministry of Education.
These priorities were publicly consulted with civil society in order to align them with demand from the sectors, so as to have really high-value, usable data. There's the other series of... One more minute. So, it was hard to define which government data we should open, because we have two ways of collecting data: first via information systems, and second manually collected data, so the problem was which to open first. This is a list of several open data applications that have been developed over the years up to now. The latest progress: in August 2014 we managed to launch an open data policy within the government of Moldova that sets an open-by-default policy for government data. This document sets out several principles, which I will talk about shortly; it is the legal framework we needed in order to give the open data initiative a legal, institutional basis. And these are the challenges that we face: first of all, the capacity of end users, on both the demand side and the supply side of the data; legal issues; demand and reuse, which are still low; and data quality, which I am going to speak about further. As was mentioned before, data quality is not only about the correctness of data; it is about accuracy, availability, completeness, timeliness and validity. The open data policy in Moldova came to set up a minimum set of requirements for the data, and for the quality of the data, that is opened on the portal. Here are the principles set by this policy in order to assure data quality, at least as a minimum set of data requirements, on the portal. It is important that data be opened proactively, and also important to protect sensitive data, especially personal data or any other state or commercial secrets. It is important that primary data is opened, and that it is opened online.
It should be published in a timely manner and in automatically processable, machine-readable formats, as we call them. The data should also come with a representative description of the data set, so that the user can understand what the data is about. The problem here was with the manually collected data, because there is not yet a centralized mechanism of data collection, or a standard to be followed, so each ministry does it in its own way. There is a difficulty in handling data collection in the regions, but there is also a need to think about open data when you collect it: it should be open by design, and then open by default. But this challenge has opened up a lot of opportunities for Moldova. First of all, it raised the question of developing information management systems in the ministries in order to have more efficient data collection and monitoring. The Ministry of Education took this opportunity and is now in the process of launching its own information system, which will then upload data sets to the portal. Feedback from users is another source of improved data quality, and last but not least, the interoperability platform that Moldova is now developing will assure the quality of data exchanged among the institutions. Thank you for your attention. Thank you very much, and my apologies for rushing you, but we have little time left for discussion and we still want some discussion. Very nice presentation; it shows that open data is not just a slogan, it is a lot of work at all levels of government, and I applaud the government of Moldova for taking very thorough action there. I will now open the floor for discussion. I think we have time for one round of questions. I had Egypt and Japan first, and I now note Saudi Arabia, Iran and Benin, plus UNECA. So I have six questions. Please, Egypt. Thank you very much, Mr.
Chairman, and thanks for all these excellent presentations by the distinguished speakers. Actually it is an intervention, not a question; it is about the quality of the data. I think that quality does not start after the production of data; it starts even before the production of data, for example when the data is collected. I think that in the sample design there is a process for making sure of the quality of the sampling: for example the sample size, the relative weights, and taking into account all the factors and dimensions of the sample. This is very important for the quality of the data. It is also important to understand the terminology, especially for the ICT sector; this is not easy for anyone, especially for NSOs who are not specialized in ICT. In addition, I think there is the questionnaire itself: in my view there should be pre-tests of the questionnaire with, for example, interviewees, getting their perceptions in order to modify it. After this, when the questionnaire or survey is in the field, there is another quality process, and after getting the data there is statistical processing to make sure the quality is proper. So this is my intervention: I think we should think about quality before the data collection. Thank you. Thank you very much. I think it will be Michael later on who answers, because we are running short of time, as we have been all day today. Please be very short in your questions. Japan. Thank you, Mr. Chair. We thank all the panellists for their excellent presentations, and my question is about data quality. As Mr.
Collage mentioned, there are several aspects to be considered in data quality, and each factor is very important; at the same time, as the delegation from Korea pointed out, keeping the balance is very difficult. My question is: which aspect is, or should be, most important for the ITU, for the World Bank, or for each government? Of course it depends on the situation, but I would like to know. Thank you. Thank you very much. Iran, please. Thank you very much to all the panellists for their excellent presentations. One comment about data quality: we agree that data producers at the national level need to adopt a data quality assurance framework, and we are in the process of adopting one ourselves and revising the next core indicators of Iran based on the framework. It is very interesting for us. One question for the World Bank: do you have any programme to allocate a separate category to ICT indicators? As I remember, there are some ICT-related indicators under infrastructure; do you have a plan to add more indicators and also create a specific category for ICT or telecom indicators? And another question for Moldova: could you name some sample applications? Thank you very much. Thank you very much. Thank you, Chairman. I have just one comment regarding the issue of big data, which was presented by the gentleman from Japan. While we admire the efforts that have been made by Japan and other developed countries, I don't think this issue is ripe for the developing countries, due to some challenges they will face, starting from analysis and storage difficulties, processing and the like, and the big inherent risk of failure. The bottom line is that big data is subject to the availability of appropriate conditions, and I don't think these conditions are in place in the developing countries. Thank you.
Thank you very much. Benin, please. Okay, thank you. I will express myself in French. My first question is addressed to the representative of the World Bank: I would like to know what the process is for countries to benefit from this data. My second question is addressed to the representative of Moldova: I would like to know whether citizens have an opportunity to present their offers and proposals to their respective governments, for instance if the government data is not enough; how then do changes happen on the open data website? Finally, UNECA. Thank you very much, moderator, and congratulations to the panel for an interesting session. My questions concern open data: how do you work with NSOs, especially on issues of quality control and analysis? There has been some concern about issues of ownership and other legal issues around open data, and I'd like to hear your views on how some of these have been resolved. The other is: what are some of the challenges in accessing certain data, such as spatial data, in the context of open data and big data? Thank you. Thank you very much. So we have a good round of questions. I will ask our panellists not to try answering all of them but to pick those in their own area, and I will start from the right side. Ms.
Kimura from the World Bank, would you like to give some answers? Thank you for the comments and the great questions. Regarding the quality of data, and the selection of data to be uploaded to the open data portal: as you may know, there are a lot of indicators and a lot of data coming from UN specialized agencies, including the ITU. When we select the list of indicators to put on the WDI portal, which means they will be disclosed to the public, we need to make sure our partner agencies agree to the disclosure. As for the criteria, this relates to the question from Japan about how we choose an indicator: it is based on data availability, because for our purposes time series and cross-country comparability are important, so we try to choose indicators with large coverage as well as relevance to our business. In terms of the question from Iran about a specific category for ICT or telecommunication data: this relates to our partnership with the ITU. As Michael mentioned, the ITU dataset is basically subscription-based, and we are only allowed to upload five basic indicators, so that is a limitation and a challenge for us with the World Bank dataset. Thank you very much. To answer the question of my colleague from Saudi Arabia, I would like to present one example from Japan. At the time of the earthquake three years ago, the government could not tell which roads could be used. Two companies volunteered: sensors in their moving automobiles collected the data, which was opened to the public, and five or six companies gathered together around it. This is an example of the overlap of personal data, open data and big data; and on these issues of big data, I definitely think developing countries can also make use of it. Thank you very much. Thank you very much, very concise answers. I will first go to the right side and then finish with Michael to wrap this up. So, Olivia, please.
The first question was about open data applications developed in Moldova. There are several applications that we managed to develop in these three years of the open data initiative. The first I would like to mention is Budget Stories, a website, budgetstories.md, that illustrates in infographics several topics related to data that has been disclosed: for example, where public spending is going this year, whether to education, health or another sector. This website was built to give a more user-friendly view to citizens, because viewing an Excel file is not very user friendly for some citizens; for them it is easier to see the data in an infographic that is more colorful and gives a general insight into a data set, not only numbers but representative pictures that make the information more accessible to them. There were several open data applications built before, like the open law registry: now all the legislation in Moldova can be accessed and consulted online. We are also collaborating a lot with the National Bureau of Statistics, which has its own data bank application that allows users to select their criteria and export the data in different formats. There is also a portal with open aid information that collects information on the development sector in Moldova, and there are some other applications that I would be happy to tell you about further, for example at the coffee break.

Related to citizens and their inquiries for data: yes, on the portal there is a button that says "get involved". There is a form to be filled in with a description of what data set is needed, a short description or perhaps more detail about the data that, say, a journalist or civil society organization needs for their project or analysis. These inquiries can be left on the portal; they are then collected by the administrator of the website and transmitted to the institution that
has this data or can deliver this data. So citizen involvement is crucial for open data, because if the data sets just sit on the website, there is no use for them; the real value of open data comes with the usage of the data, through analysis, building projects and so on.

Thank you very much. Then I give the last word to Michael; you have about two to three minutes to wrap up.

Michael, thank you very much. I have a question from Egypt. I couldn't agree more that the quality of data starts with the design of the initial survey to collect it. When I said in a slide that for the ITU the problems are with data collection, I was talking from an ITU perspective: from the ITU's perspective, the problems are upstream of them, and that's why they should focus there and produce the manual for the household survey and a handbook for the collection of administrative statistics. So yes, design is very important.

The question from Japan: how do you strike a balance between what's good for accuracy, what's good for timeliness, what's good for relevance and what's good for coherence? Well, he answered his own question: it depends on the context. That's where the statistician comes in; the statistician has, in the end, to make his or her judgment as to the appropriate balance to be struck. As regards data production at the national level, having a quality framework to work within is, I think, a really good idea, and it might be especially useful in the case of supply-side statistics, where the collectors may not be as familiar with a quality assurance framework as the NSO is. But in both cases I think it is an excellent idea.

Thank you very much. I think we had a very informative session, covering all of quality, big data and open data, and I would like to ask for your applause for our presenters. I'll give the word back to the chair. Thank you very much. Thanks to Mr.
Ronald Jensen, and also to Michael Collage, Kyosho Mori, Karo Kimura and Olivia Toussaint for their interesting session. We now have half an hour for our coffee break; please be ready to continue at 4:40 with a session about the international coordination of ICT measurement, the 10th anniversary of the Partnership on Measuring ICT for Development. Thank you very much.