We've got another great day lined up today. We're going to start with a panel on data sharing and transparency, followed by a panel on data modeling and analytics, and then, after the lunch at which we have a great keynote speaker, Kara Stein from the SEC, we're going to have a panel on data integration and visualization. These are all obviously really important topics as we think about how to improve the quality, scope, and accessibility of the large data sets that we're all interested in. What struck me about yesterday was that we've got a lot of wonderful ideas. The more we can get specific about how to implement those ideas, focus on specific cases that interest us, and share successes and obstacles to success across disciplines, the more progress we're going to make. We started to do that, and hopefully in this day and the follow-up to it we'll be able to do some more. So that's what I'm going to be looking for in some of these areas as we go through the day. I think Michael wants to say some things about a new FCC rule that just came out about data privacy. We'll do that and then we'll move on to our panel. Thanks.

Thanks, Dick, and welcome again to the second day. I'm just going to say a couple of things about yesterday, which was delightful. As I said, yesterday made my brain hurt in a good way. The discussions around the conference and in the breaks have also been really exciting. I'll just say a couple of things about some of the panels yesterday. One is that some of you may have seen the news today: the FCC yesterday, or maybe it was released this morning, issued a new privacy rule suggesting that consumers actually do own their data on the internet, which is a big deal related to this space and to the issues we were talking about yesterday. Also, a couple of the panelists yesterday asked wouldn't it be interesting if firms had legal liability for the smart refrigerators they sold, in relation to the denial-of-service attacks. It actually turns out that there are quite a number of suits in the process of being formed and filed, which I learned about yesterday. So that would be an interesting development to watch, and we'll see whether in fact there is such liability going forward. I'm very excited about today's events that Dick has outlined, and without further ado, I'll turn it over to Matthew Shapiro. Thanks.

This has been a great conference, and I hope this panel is just a continued discussion of all the interesting issues that came up yesterday and will unfold today. I'm Matthew Shapiro from the Department of Economics at the University of Michigan. I guess I'm here in a couple of roles: first as a macroeconomist, but probably more as a data hound. That's what we do at Michigan, and it's just great that we have three panel members who are engaged, who will tell us about principles for getting data out there and the value of getting data out there, but who are also actively engaged in the good work of disseminating data. So I wanted to kick this off by first talking about some of the benefits of data sharing and transparency. I think data is the quintessential public good: everyone wants it free, and free in many ways. No one wants to pay for it, and we have that as a huge problem with the internet economy.
But there are lots of households and firms who are reluctant to pay the cost of providing data, and some of these costs are real in terms of time, effort, and money. There are also concerns about privacy, confidentiality, business purpose, and agency. So I think it's valuable to have a panel today, and more generally the overall conference, where we talk about different ways in which data can be shared in a way that's beneficial to the whole society but that does recognize the compelling private interests and costs in sharing data. Let me mention some of the benefits of having better data out there. First, the one which is maybe nearest to my heart is just the measurement of key national indicators. I was browsing the Dodd-Frank statute yesterday, and the OFR actually is explicitly authorized, or maybe impelled, I'll leave that to the lawyers, to share data with the Bureau of Economic Analysis, the agency which does the GDP accounts. Measuring the financial sector, which is growing and dynamic, is an enormous challenge in terms of conceptual barriers: what are financial services, and how do we measure them? It's a lively area of research, and data such as that collected by the OFR and by regulatory agencies could be extremely helpful here. So that's an important benefit, and getting GDP right is one of the key elements of its information value. There are also huge benefits for research on households, firms, and in particular how they interact. That interaction is something this panel will also address, because we're talking not just about getting data out there but about getting one data set connected to another. There's been enormous research on, say, how financial shocks are transmitted through banking relationships: a bank that suffers a remote financial shock, maybe emanating from a housing crisis in one country, can, given cross-country lending, have ripple effects in other countries. There's a lot of research which has established this, and it's only possible to do credibly by tracing out the borrowing and lending relationships among households and firms. So that's just one example of what we can learn. Having better data is also totally critical to the evaluation of public policies. There was a lot of good discussion of that yesterday, but there are countless proposals for having interventions or not having interventions, and having these informed by sound research, which is increasingly in economics and other social sciences driven by data at the individual level, whether that's a firm or a household, is absolutely critical. Data is critical for informing public policy decisions given the current framework. The Federal Reserve would benefit from having more timely data on financial and economic conditions in making its interest rate decisions. Think about the fourth quarter of 2008: there was roughly a two-percentage-point downward revision to GDP in that quarter. Basically, many indicators are highly inertial and extrapolative because data become available slowly over time, and the key indicators do particularly badly at turning points, because that's when inertia really hurts.
And I think it's probably an uncontroversial statement to say that had GDP been plummeting the way it is now measured, and had that been widely understood in 2008 as the Obama administration was coming in and formulating the stimulus package, it would have been easier to argue for a larger and maybe longer-duration stimulus package. That's a key example of where having better and more timely data is valuable. And some of the opportunities in terms of big data, automatic feeds, and monitoring disparate sources of data, including things being done at the Bank of England using social media as a short-run economic indicator, could be extremely helpful. And then at the core of the OFR's mandate, and the purposes of this conference, is the role of better data in understanding financial stability. That, I think, was one of the key driving principles behind including the OFR in Dodd-Frank. And there are some trade-offs: financial firms in particular need to share data that's potentially quite sensitive and timely, because what matters for financial stability, as we've learned, or I hope we've learned, is the interconnection of firms and the potential cascades from one firm having problems making payments or being insolvent. Part of the architecture of what the OFR has the mandate to do, and is indeed working on, is to trace out these interconnections, and that's extremely valuable. It is also, I think we have to acknowledge, extremely sensitive, because we're asking about the short-run core business activities of very large corporations, for whom a substantial amount of this is bread and butter, at least for major parts of their operations. So it's sensitive. So I think it's important to have continued discussion of this and to continue to emphasize that there's real value in it. We really don't want to have the kind of unraveling that we've seen repeatedly, and work being done by people in this room can potentially address that. I suspect there'll be some lively discussion, but I hope what comes out of this panel and the conference in general is that there are a lot of opportunities for making progress, particularly with the leadership of folks in this room. It's time to figure out multiple ways of sharing data, and the benefits I think clearly outweigh the risks, but the risks must be managed. There are various modes of sharing data, of giving academics and other interested researchers access. There's a huge reservoir of economists and social science and legal scholars who are eager to do research with this data. In a time of tight budgets, agencies should be looking to harness the essentially free, voluntary work of the academic community, focused very much on their various mandates. This works extremely well in the statistical system, where there's a network of research data centers which allow academics access, under quite strict rules, to non-public data in a secure environment. That's a potential model. There are other models, such as bringing academics in as special sworn or temporary workers, that can be considered. But I think we need to expose more widely the sources of resistance to this and help to provide solutions. I'm very glad we have a panel of individuals who are thinking, and not just thinking but doing, along these lines. So I will stop there; I'll have more later. I want to introduce our speakers in the order that I'll ask them to speak.
So Deborah Lucas is the Sloan Distinguished Professor of Finance and Director of the MIT Golub Center for Finance and Policy. Again, she's a doer. She has substantial government experience, including as an Associate Director of the CBO, with attention to issues of credit in particular, which she will be speaking about, and she has specific proposals for making progress. David Bholat is a senior analyst in the Advanced Analytics Division of the Bank of England. The Bank of England, as part of its strategic plan, is thinking about and doing many things that involve big data: social media data, data on households, transactions data, data on vacancies. There's just a huge amount of work being done there, and some of it is in cooperation with other entities, including the OFR. And finally, Matthew Reed is Chief Counsel of the Office of Financial Research. It's an enormous portfolio, I'm sure, but part of it is trying to figure out how these problems can be attacked and how data can be brought into the agency, used by the agency, and, I hope, also used by other scholars and interested parties. So let's kick it off with Debbie.

Thank you very much. It's really a pleasure and an honor to be included in this; I'm learning an enormous amount. What I'm going to talk about is at first blush going to look very specific, which is the case for sharing loan-level data from US federal credit programs. But just to put this in context, the big picture is that we don't have a private-sector financial market; we have a mixed-economy financial market. The government has an enormous footprint in financial markets, and a lot of the data about finance and the functioning of those markets resides in government agencies. The case I'm going to make is that we have at the moment much less disclosure of financial data from the federal government than we do from the private sector, and this is really an ask to bring the federal government's disclosures up to something comparable to, or even beyond, the level of what we get from the private sector, and the reason is that these are extremely essential parts of the financial market. So that's the big picture. I'm going to try to change maybe the way you look at the federal government and convince you that you should think of it as actually the world's largest financial institution, because of the size and scope of its credit activities, broadly understood and aggregated together. Importantly, it's the largest conduit of credit to US households: if you take the mortgage markets and the student loan markets, those are the biggest credit markets out there. You have credit cards on top of that, but those are way, way smaller. Okay, so I want to talk about what sorts of data the government has, what it collects, what it shares and what it doesn't, potentially what could be shared, and what the benefits of that are. I'll talk a bit about the impediments, though I think I'm going to leave the impediments to my fellow panelists, because as Matt said, I think this is so important that the impediments pale relative to the value of finding ways to get this information out there while respecting the issues that are there. Okay, so this is going to be like unpacking a Russian doll. I'm going to start from the top.
So on this issue of the US government being in fact the world's largest financial institution: if I look at the federal government in that big bar, I would say there's about $18 trillion, $18 to $20 trillion, of assets or insured obligations. This is the sum of things I would put into the broad bucket of credit-related activities; it doesn't count Social Security or anything like that. Compare that $18 trillion to what we think of as the giant financial institutions that the OFR is rightly concerned about following: J.P. Morgan, Bank of America, Citigroup, Wells Fargo, Goldman. Those are all institutions in the $1 to $3 trillion size range. So you can see that in aggregate the US government is definitely up there. Where did that $18 trillion calculation come from? Well, the data's a little old, from 2014, because the government doesn't release data in a timely manner, but in any case the biggest single category is deposit insurance; then there's Fannie and Freddie. There are pension guarantees, which cover defined-benefit pension plans, which are like a kind of debt. Then there's what I would call the traditional credit programs, which include many mortgage programs, student loans, and the like; I'll talk more about those in a second. There are the Federal Home Loan Banks and there's the Farm Credit System. Those are the components of that $18 trillion of federal credit activities. What I want to focus on today is what I would call the traditional government credit programs, so let's dig down a bit into those. Those are on the order of $4 trillion of obligations outstanding, and those programs have grown rapidly, especially rapidly through the financial crisis compared to the private markets, but they've grown over time and they continue to grow. So what do we have here? Well, that lower blue bar is the sum of direct federal loans and guaranteed loans that go to housing, so basically mortgages, and these are mortgage programs that do not include Fannie and Freddie. The biggest of these is the Federal Housing Administration, which guarantees loans to first-time home buyers. Another big component of that housing bar is the Veterans Administration, and the third big component is the Rural Housing Service, which provides mortgages in rural areas. The next biggest chunk, and the piece of federal credit programs that has grown the most rapidly, is the student loan programs. That's the orange that you see there, and it's grown to over a trillion dollars outstanding of federal student loans. The federal government also has a reasonably large footprint in small business lending and farming, which is the green there. It supports international lending, say through the Ex-Im Bank, and it provides credit for all kinds of smaller things, for energy, for health. So there are a lot of credit programs scattered throughout the government, and in fact, even though they're relatively small, sometimes these small programs get a lot of attention. Embedded in the energy category, for instance, sits Solyndra, which is something many of you have heard of. Okay, so again, this is four trillion dollars in aggregate of credit, and these programs are spread across many different agencies in the federal government. Every single agency has at least one or two programs, and some of them have many, many credit programs. In total, there are more than a hundred separate credit programs that often run fairly independently of each other.
So these programs collect data on the loans that they guarantee or make directly, and exactly what they collect varies a huge amount across the programs, and the quality also varies a lot. What they choose to collect depends partly on the program goals, the rules of the program, and so forth. But some things are pretty universally there in the government somewhere. They're keeping track of the loan characteristics at origination: you're going to know the maturity of the loan, the size of the loan, whether it's a fixed or floating rate, what the interest rate is, and what the various fees involved are. In the case of guaranteed loans, and by the way I should probably define a bunch of these things, a guaranteed loan is one where the government doesn't directly provide the funds but takes all or some of the credit risk. So if there is a private lender involved, a guaranteed lender, you would know the lender ID. And then there are all kinds of potential options for prepayment and so forth. Something else that the government keeps track of at origination is borrower demographics. If you are a student taking out a loan, we would know what school you went to, we would know your age, we would know various things about you. And then there's also data that has to be maintained about the performance of each individual loan over time, so you're going to see how much has been paid back each period, what's delinquent, what the recoveries are, and so forth. As I said, this data resides in different federal agencies and then it's pushed up to OMB, because I would say that in some ways the principal reason the data is collected is for budgetary purposes. Where the data winds up being summarized, and what the public can see, is basically what comes out in the federal budget, or specifically in annexes to the federal budget. For the budget geeks among you, you probably know about the volume of the budget called Analytical Perspectives, which is really this beautiful, thoughtful discussion by the federal government of the difficult things in the government, and credit programs are definitely some of the difficult things. So there's always a chapter on credit and insurance, and this just strikes me as hilarious, I don't know if it strikes you as hilarious, but it's chapter 20 of Analytical Perspectives, it's less than 30 pages long, and it covers that more than four trillion dollars of federal credit programs. There are one or two tables at the end with very high-level summary statistics on all those credit programs. Another place where you can get information about federal credit is the Federal Credit Supplement, a hundred-odd-page piece of the budget which has very useful tables if you're a credit geek, but again it's not information that tells you a lot about performance. What you're basically getting is a snapshot each year: aggregate loans by program, something on the new originations, and some information on expected portfolio performance for each individual program. Remember, there are over a hundred programs, but you're not getting anything specific about the performance of individual cohorts, and I'm going to turn back to that in a second. But just to give you an example of what this information is masking: within the federal government, there's no standardization of something like the definition of a default.
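To make concrete the kinds of fields just described, here is a minimal, hypothetical sketch of loan-level origination and performance records; the field names and types are illustrative, not drawn from any actual agency system:

    from dataclasses import dataclass
    from datetime import date
    from typing import Optional

    @dataclass
    class LoanRecord:
        """Hypothetical origination record for a federal credit program loan."""
        loan_id: str
        program: str                 # e.g. "FHA guarantee" or "direct student loan"
        origination_date: date
        maturity_years: int
        principal: float
        fixed_rate: bool             # fixed vs. floating
        interest_rate: float         # annual rate at origination
        fees: float
        guaranteed: bool             # guaranteed loan vs. direct loan
        lender_id: Optional[str]     # known only when a private lender is involved
        borrower_age: Optional[int]  # demographic fields vary by program
        school_id: Optional[str]     # student loan programs only

    @dataclass
    class PerformanceRecord:
        """Hypothetical performance record, one per loan per reporting period."""
        loan_id: str
        period: date
        payment_received: float
        outstanding_balance: float
        days_delinquent: int
        in_default: bool
        recovery_amount: float       # recoveries after a default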
So when you look in the Credit Supplement you can see default rates for different programs, but they're not calculated in a completely standardized way, and you can get things like default rates of 120%, so they're also not calculated in a way that's consistent with how the private sector would define default. So there aren't any central reporting standards. But that's complaining about what they do, not what they reveal, so let me go back to what they reveal. Okay, what you would also like to understand is how different cohorts of loans are performing over time, and the question is what data is out there about that. Again, all of this comes back to a very budgetary perspective, and so what is revealed is something called subsidy re-estimates. Unfortunately, credit is accounted for in a rather complicated way, but the basic idea is that the budget records credit subsidies, and those subsidies are meant to represent the lifetime cost of a new cohort of loans over the life of the loans. I like to use the student loan example: this year, all the new student loans have a budgetary cost that reflects a projection of all the cash flows in and out over their lifetime, discounted back to the present. So the net cost is like the value of the net losses that the government is absorbing. At the time you have a new cohort of loans, you have the subsidy estimate, but over time things happen: default rates might be higher or lower than you expected, or prepayment rates might be higher or lower. So there's something called re-estimates, and those re-estimates track what actually has happened and changes in expectations about the future. Those re-estimates are a way of talking about how well your original estimates worked compared to what actually happened in the world. But the problem with this re-estimation data is that it doesn't really give you very specific information about when defaults happened, what's driven by other technical assumptions, what's the projection, and what's actually happened. So again, it's a very aggregate way of looking at performance data, and it's very hard to interpret. So what does the government have that it could or should share? Well, in order to really understand what's going on in these programs, to understand the unfolding pressures, what we would like to have is a time series of the individual loan-level raw data on performance, and this really would be big data, because there are millions and millions of records. I should say, because housing came up yesterday and it will come up again, that in the area of mortgage guarantees the government discloses more information than it does for any of these other programs. So really the big one in the room is student loans, the $1.2 trillion of student loans. I'm not complaining so much about the housing disclosure, but in any case, you do have some loan-level data on housing; you don't have it for any of these other programs that I listed. Okay. Before I leave the individual loan-level raw performance data: even more ideal than having that time series for each cohort, you would like to potentially link it with other administrative data, in order to have a more complete picture of what's happening to, say, students from different cohorts over time, when they graduate.
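As a stylized illustration of the subsidy and re-estimate mechanics just described (not the actual Federal Credit Reform Act methodology, which involves additional technical details such as the specific Treasury discount rates), a cohort's subsidy cost is roughly the negative of the net present value of its projected cash flows:

    def subsidy_cost(cash_flows, discount_rate):
        """Stylized lifetime subsidy for one cohort of loans.

        cash_flows: the government's net cash flow each year, starting at
        origination (disbursements negative, repayments positive).
        Returns the budgetary cost, i.e. the present value of net losses.
        """
        npv = sum(cf / (1 + discount_rate) ** t for t, cf in enumerate(cash_flows))
        return -npv  # a positive number is a cost to the government

    # A cohort that lends 100 and expects 25 back per year for four years:
    initial = subsidy_cost([-100, 25, 25, 25, 25], 0.03)

    # A re-estimate: defaults run higher than projected, so expected
    # repayments are marked down and the recorded cost rises.
    revised = subsidy_cost([-100, 25, 20, 20, 18], 0.03)
    print(f"initial subsidy {initial:.2f}, re-estimate {revised - initial:+.2f}")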
At a very minimum, if you couldn't provide that individual data, you would want to provide aggregate cohort-level raw performance data over time, and this is in contrast to mostly what's reported now, which is a portfolio approach. When I say a portfolio approach, I mean that you're looking at statistics on things like defaults on student loans that mix default rates on loans that were made 15 years ago with default rates on loans that were made last year, and so you can't really understand the evolution of the experience of the borrowers, or of the government, when you aggregate things at a portfolio level. You have to really understand what happens to each cohort. This information actually is available, because as I said before, it's an input into the budgetary process, so the government does need to collect it. So why is all of this so important? Obviously there are transparency issues. When you have $4 trillion of credit exposure, the taxpayers are absorbing that risk. Those risks could be larger or smaller; it seems like the public should have the information available to understand what those exposures and risks are. Another issue on the transparency front, and the need to bring more attention to the statistics about these programs, is that when you do think about them, they really change your view of what a sufficient statistic is for the fiscal position of the US government. Some of you might have seen, a few weeks ago, an article that asked: what's going on? Deficits aren't growing nearly as fast as the debt. And the answer is, well, it's the student loan program, because the government is making hundreds of billions of dollars of student loans, and it's doing it through what's called direct lending: it's borrowing money through Treasury in order to turn around and make student loans. That kicks up the debt, but it creates an asset, the student loan, along with the liability, which is the Treasury debt. So, but for the credit risk, it's kind of a wash for the government. It looks like the debt is growing by this large amount, it looks like the government has spent a lot of money, but it hasn't really spent it; it has assumed some credit risk, but that's much smaller. So the whole way that you understand debt and deficits and the fiscal situation changes quite a bit. A few months ago I wrote a paper for Brookings which estimated the stimulus effect of these credit programs and came to the conclusion that they were as important as the American Recovery and Reinvestment Act in terms of fiscal stimulus during the crisis. Another way to say that: even monetary policy largely goes through debt markets, and if the government's footprint in debt markets is that large, you're going to see monetary policy intermediated by these credit programs. So just to understand the world, you need more transparency about what's going on in these programs. Obviously this data is also essential for program evaluation. I alluded to the importance of cohort data: you'd like to know who's being served and who isn't. We have some demographics.
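A toy calculation, with invented numbers, of why the cohort view matters: the portfolio-level default rate blends seasoned and recent cohorts and can mask a sharp deterioration in new originations.

    # Each cohort: (origination_year, loans_outstanding, loans_in_default).
    # The numbers are invented purely for illustration.
    cohorts = [
        (2001, 1_000,  50),   # seasoned cohort: 5% default rate
        (2008, 2_000, 120),   # 6%
        (2015, 5_000, 600),   # recent cohort deteriorating: 12%
    ]

    # Portfolio view: one blended number that hides the deterioration.
    total_loans = sum(n for _, n, _ in cohorts)
    total_defaults = sum(d for _, _, d in cohorts)
    print(f"portfolio default rate: {total_defaults / total_loans:.1%}")  # about 9.6%

    # Cohort view: the trend in new lending is visible.
    for year, loans, defaults in cohorts:
        print(f"{year} cohort: {defaults / loans:.1%}")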
And I shouldn't say that nothing comes out, because, for instance, the Department of Education itself writes reports about things like for-profit schools, how the graduation rates are terrible and the student loans aren't a good deal for those students. But the raw data behind those conclusions is not made available to the research community, so we can't really check whether and to what extent that's true, or what the subtleties are, and all the rest of it. I also think that making this data available would do wonders for the quality of data within the government. My experience working with credit programs at the CBO was that I was shocked and appalled, and saddened. I am someone who believes that civil servants do the best they can, but the best they could do was very, very, very sad. On some of these smaller credit programs, the data is being collected in an Excel spreadsheet, and I'm not sure it survives from year to year. If Citibank had some of its divisions doing their loan reporting in an Excel spreadsheet, we'd be pretty alarmed, and we're talking about a financial institution as big as any of these others. So I think that just shining a light on this data would also help those institutions find ways to improve their record keeping. I also want to point out that very recently, in the last few years, there's been legislation which has really promoted the idea of evidence-based policy making, and that certainly applies to the federal government, so you could almost read these laws as requiring the government to better use this data in order to evaluate these programs. This next point is, I think, a little more two-edged: this data would be of enormous value to the private sector as well. One of the things about federal credit programs is that a lot of people believe that what they should be doing is providing credit to people who couldn't get it from the private marketplace. They should be filling gaps, providing credit which is somehow never going to be worthwhile for the private sector to provide. But you would like to make it possible for private entrants to serve those people whom they can, and I think the data from these programs would be invaluable for banks thinking about whom they could serve. If we had better performance data, it would possibly allow the private sector to enter more.
Some of you are probably thinking, well, it would allow the private sector to cherry-pick from the government, and so actually that would be worse and not better. But my response to that is I'm not sure you should think of it as cherry-picking, because what's happening now is kind of budgetary cross-subsidization. You want your subsidy dollars to go to the riskiest borrowers, presumably the ones that the private sector won't serve, and you're hiding the fact that you're trying to help this very disadvantaged population behind the fact that you're making massive loans to the middle or upper middle class. Making it possible for the private sector to come in and fairly price to that middle-class segment isn't exactly that kind of negative cherry-picking. In any case, that's my point; other people can definitely dispute it. And then finally, and importantly for this audience, I think arguably the federal footprint in the credit markets has a big effect on systemic risk. Obviously the housing market was ground zero of the financial crisis, and it's still true that most mortgage credit risk is absorbed by the government, and without the kind of data that I'm talking about, I don't think it's possible for an organization like the OFR or FSOC to fully understand the stresses that are building up in the financial system. Again, to go back to student loans: there's been general concern in the media about what these large debt burdens have done, how they're affecting young people's ability to form households, to buy houses, and so forth. Is that a building systemic risk? It's interesting, because I think that the systemic risks that come from these government credit programs are very low-frequency risks, and a lot of times we're worried about high-frequency risk, like high-frequency trading. But I still think there are systemic risks that arise from these programs. For instance, if you have your primary mortgage institutions making a particular rule, and that rule turns out to create a systemic risk, someone should be watching out for that, because these agencies themselves have a particular mission, which is to serve whomever they're serving. It's no individual agency's mission to watch out for any collateral systemic risk it's creating. Just as the banks don't have to directly worry about systemic risk, which is why we have the government and the OFR, in the same way these government agencies don't have anyone at the agency looking out for the potential systemic risks they're creating. So again, data would speak a lot to that. Okay, I don't think this is easy. All the issues that have come up, and will continue to come up today, are there in the case of these government programs. There are privacy concerns. You cannot discount the amount of angst proposals like this create within government agencies. We think of private financial institutions screaming about regulatory burden; in some ways you get the same kind of reactions from program administrators, who feel that they have very limited resources already and that those limited resources shouldn't be squandered on accounting, on data, and so forth. So you get the same kind of internal resistance that you do in the private sector, and I think there's legitimacy to that, because certainly the government is in many ways running on a shoestring, and it's hard to make something like data collection a high priority. Beyond that, there's concern, as there is in any organization, that when you open things up to scrutiny you'll get in trouble: the programs will be defunded, there'll be negative consequences to providing that data, and so there's always the tendency to want to protect what you're doing.
So, recognizing those challenges, I think again that many of these are the same as or similar to those in the private sector, but there are some mitigating factors for these government programs. One has to do with proprietary interests: the government isn't doing the kind of trading where it's going to get front-run or anything like that, so in terms of making that data public, it's public data, it's a public good; there's not a lot of argument for proprietary interests. Again, there are these recent legislative mandates telling the government that it should use data, so there's almost a law which says they should figure out a way of overcoming these challenges. I'll just end there by saying what probably seems obvious: I think that sharing this data is essential for transparency, for program evaluation, and for control of systemic risk. Oh, I wanted to mention that there are efforts underway to obtain it, various little forays; sorry, I'll hold those for the general discussion, and I'll just say: let's just do it. Thank you.

Thank you, thank you. Next is David Bholat of the Bank of England, and if you have an animation I'm going to go make sure there's a lot of people here. Sure.

Well, good morning, everyone. Let me just begin with the standard disclaimer, though it's a serious one, that my comments are my own and do not necessarily reflect those of the Bank of England. Let me also start by acknowledging the hosts: thank you, Professor Barr and the University of Michigan, for having me here. Thanks of course to Dick Berner and the team and my friends at the U.S. Office of Financial Research; it's a great pleasure to be with you. And a special thanks, I think, is owed to Karen Edmond, Jenny Ricard, and Christy Baer, who've done an excellent job organizing this conference from a logistical standpoint; I know I owe a special debt to all of you, so thank you very much. This panel, of course, is titled Data Sharing, and just so you know that I practice what I'm about to preach, Karen, Jenny, and Christy have this hour sent out to your inboxes a brand new data set that's now posted on the Bank of England's website. Many of you have your laptops or smartphones, so you can check your inboxes right now; you will have received an email with this data set. And Christmas has come early, ladies and gentlemen, because I have given you what you've always wanted, which is the Bank of England's balance sheet between 1844 and 2006 on a weekly basis. Data is a non-rivalrous good, so you can actually forward this on to your friends and family and give it as an early Christmas gift. Now, what's really novel here, let me tell you: of course we've published the Bank's balance sheet before, but it's always been on an annual basis, and this is weekly, so it's the frequency of the data which is quite novel. And just to give you an idea of the kind of analysis that you might want to undertake with this type of data, we're actually doing some of our own. What you can do is take a look back at the Bank's historic balance sheet, go down to weeks, and see moments when there might have been financial crises, in fact financial crises that hitherto we didn't know about. You're looking for an expansion in the amount of notes in circulation and a diminution of the gold stock; these are moments of potential financial stress.
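A minimal sketch of that kind of screen, assuming a weekly table with columns notes_in_circulation and gold_stock; the column names, window, and threshold here are illustrative assumptions, not the Bank's actual method:

    import pandas as pd

    def flag_stress_weeks(df, window=4, threshold=0.05):
        """Flag weeks where notes in circulation expand while the gold
        stock shrinks, each by more than `threshold` over a trailing
        `window` of weeks, a crude proxy for financial stress.
        Expects columns 'notes_in_circulation' and 'gold_stock',
        indexed by week."""
        notes_growth = df["notes_in_circulation"].pct_change(window)
        gold_growth = df["gold_stock"].pct_change(window)
        return df[(notes_growth > threshold) & (gold_growth < -threshold)]

    # Usage, assuming the published series were saved to a CSV with a
    # 'week' date column (the file name here is hypothetical):
    # df = pd.read_csv("boe_weekly_balance_sheet.csv",
    #                  parse_dates=["week"], index_col="week")
    # print(flag_stress_weeks(df))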
What we're doing now, and this is research that I'm actively involved in at the moment and that we hope to publish early next year, is linking this weekly balance-sheet data with even more granular transactional ledger data. What you're seeing here, for example, is a transactional ledger from 1853, and it gives you information on every single loan that the Bank is making: the names of the counterparties, the rate at which the loan is being made, the value of the collateral being brought in, whether collateral is being rejected, et cetera. We're starting to do some preliminary analysis, so here is some data from the 1847 crisis, which I know is near and dear to many of your hearts. What you can see, and this is quite typical with this data, is that it's highly skewed: the Bank is making a large number of loans to just a handful of counterparties. And so we can start to answer old questions with new data. When, and to what extent, did the Bank of England start acting as a lender of last resort? When did it go about using Bagehot's principles of lending freely on good collateral at a penalty rate? And we can then tell, because we know what subsequently happened to these firms, whether the Bank was lending to firms that were merely illiquid but not insolvent, or whether the counterparties ended up being duds. And just so you don't think that this is a sort of antiquarianism, I will remind you that Thomas Piketty, a couple of years ago, did some trawling of 18th- and 19th-century tax records, and it's really had an impact on our thinking about inequality in the 21st century. I don't think that our research will have that kind of impact, but nevertheless I do think it will help us better understand the Bank of England's behavior in the past and hopefully improve policy going forward. So, having fulfilled my most important duty of the day, which was to share data with all of you, I just want to make three key points, because the way I think about data sharing is really at three different levels. There is data sharing within our organizations; there is data sharing between our organizations, and the kinds of organizations I'm talking about are central banks and financial regulatory bodies; and of course there is data sharing outside our organizations, primarily with the public whom we serve. So let me start with data sharing within an organization. I recently did a webinar for centralbanking.com on the topic of big data, and one of the questions that came up was: who within a central bank should own the data? Should it be the statistics function, the chief data officer's division, technology, or, in our case, the area of the Bank I work in called Advanced Analytics, which is a data science function? I really challenged the assumptions underpinning that question, because I think trying to identify, within a central bank or an equivalent financial regulatory body, the owner of the data is fraught with difficulty. That language carries pregnant within it the possibility that people are going to start to see the data as theirs and therefore not share it. Where we're moving within the Bank is away from this language of ownership to the language of an ecosystem. Let me tell you concretely what that means. Many of you will be familiar with BCBS 239, the consultation put out by the Basel Committee on effective risk data aggregation.
That consultation was prompted, of course, by the fact that in the financial crisis we soon came to realize that many financial institutions didn't have a full understanding of the kinds of risks they were running: the data was strewn about in different business lines, different subsidiaries, different countries, and it was never really effectively brought together. That certainly applied to private-sector financial institutions, but I'd go so far as to ask all of us: does that also apply to our own institutions? Do we as central banks, and this is very much in the same spirit as Debbie's comments, fully understand the risks that public-sector institutions are running? Are we managing our data according to the BCBS 239 principles as well? If we are going to be the proverbial physicians dispensing medicine to the patient, then we should be able to take some ourselves. We're starting to take that to heart at the Bank, and one initiative in the last couple of years, led by our chief data officer's division, has been to create something called a data inventory. What this is is a log of all of the data sets within the Bank. Now, you might think that's an easy feat, but in fact it's really been a step change, because if you're going to share data, that in a sense presupposes that you already know what the stock of data in your building is; with this data inventory, now we do. What that allows anybody within the organization to do is to go onto the data inventory, see a log of all of these data sets, and search for them by tags that are assigned by users of the data, so you can search by operational risk, or credit risk, or by the name of an individual institution, and you'll get back filtered results. Where do we need to go next? I think we need to move from a log to a portal, a downloadable portal, a sort of one-stop shop where people can actually get all of those data sets internally, with the idea that most data sets should be open by default to people working within the central bank. The reason this is important is really just efficiency: as I mentioned, data is a non-rivalrous good, so why would we want to impose artificial barriers to entry for people who might want to do analysis? So that's data sharing within an organization. Now, about data sharing between organizations. I have a friend named Rosa Lastra, who is a professor of law, quite appropriately since we're here in a law school, in the UK, and she has a very nice phrase that identifies one of the key challenges we all face, which is that the financial system is global in scope, but our financial regulatory apparatuses still tend to be national in scale. One way to overcome this incongruence is with data sharing, and I think there have been a lot of recent positive steps in that direction: the Financial Stability Board getting set up and, under the auspices of a data gaps workstream, starting to collect data in a standardized way through standardized templates, for example on large exposures; the work of the Bank for International Settlements, which should be mentioned here, in terms of aggregating consolidated statistics from central banks all around the world and sharing them with the public; and in particular Aurel Schubert, who is here, has been leading the IFC, the Irving Fisher Committee's, work around effective data sharing.
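Returning to the internal data inventory described a moment ago: a minimal sketch of a tag-searchable inventory, with invented entries and fields (the schema here is an illustration, not the Bank's actual system):

    from dataclasses import dataclass

    @dataclass
    class DatasetEntry:
        name: str
        steward: str          # the accountable division, deliberately not an "owner"
        description: str
        tags: set[str]        # user-assigned tags, e.g. {"credit risk"}

    inventory = [
        DatasetEntry("weekly_balance_sheet_1844_2006", "Advanced Analytics",
                     "Weekly Bank of England balance sheet",
                     {"historical", "balance sheet"}),
        DatasetEntry("p2p_loan_book", "Advanced Analytics",
                     "Anonymised peer-to-peer loan-level data",
                     {"credit risk", "loans"}),
    ]

    def search(entries, tag):
        """Return every inventory entry carrying the given user-assigned tag."""
        return [e for e in entries if tag in e.tags]

    print([e.name for e in search(inventory, "credit risk")])  # ['p2p_loan_book']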
Dick Berner yesterday made mention of an excellent paper, which he recommended and I also recommend to all of you, about data sharing; it was the result of the task force that Aurel headed up. I should also mention a highlight for us last year at the Bank, which is that we concluded with the OFR an MOU, a Memorandum of Understanding, to actually share data between our two institutions, and Matt Reed was the champion at the forefront of doing this. That's important because, at least in some states of the world, sharing data across borders could be as critical in a future financial crisis as the provision of central bank swap lines across borders was in the last one. There are no doubt challenges here, obviously politics, but even leaving politics aside there are IT issues about the interoperability of different systems for exchanging data. Those challenges, while daunting, shouldn't in any way distract us from the prizes to be won, and they are great: we can start to take different pieces of the puzzle, put them together, and come to a truly systemic view of systemic risk. We're still a ways away from that. For example, one of the highly touted parts of Dodd-Frank, and of EMIR, which is roughly the equivalent regulation in Europe, has been to create trade repositories that now allow regulators access to all of the derivative transactions that are going on, whether it's forwards, options, swaptions, you name it. The problem is that, as a regular default, most regulators are only seeing their part of the picture. In a UK regulatory context, that means the Bank is typically seeing those derivative transactions that are sterling-denominated, or where the underlying is a UK referent, or where one of the legs of the transaction is a UK counterparty. But wouldn't it be better if all of the G20 countries were seeing all of the data on a regular basis? So I think we still have a ways to go in terms of sharing data between regulators, but there has been progress since the financial crisis. Finally, let me talk about data sharing with the public. Last year the Bank published its research agenda, and as part of that I was responsible for publishing six previously proprietary data sets. They can be found on the Bank's website, down here, and they contain several different series: anonymized individual- and company-level responses to surveys that we conduct, and also long-run macro time series, not dissimilar to the long-run balance sheet series I showed you earlier. On the basis of that, we put out a call for submissions from members of the public to actually work with these data, and we had a data visualization competition. I'm showing you here the winning result, which was produced by Cath Sleeman, a researcher at Nesta, which is a research foundation in the UK that looks at innovation. She created this really neat interactive data visualization, and what it shows is basically the G7 countries and all of the pre-recessionary and post-recessionary periods since 1970. What's cool here is that you can click on any of these buttons, so if you click on depth here, it sorts by the depth of the recession, and you see that the recession the UK experienced over the period 2008 to 2013 was one of the most severe out of all of the G7 recessions since 1970, and certainly the most severe out of all of the recessions that the UK faced in that time period.
What you're seeing there is basically the depth of the recession, as measured from the pre-recessionary GDP peak to the recessionary GDP trough, so it was a decline there of 6%. And here is a real, concrete example of where you can start to see the benefit of sharing data with the public: you can start to crowdsource these data sets and actually gain new insights. But I think an even more fundamental reason that we as central banks need to share data is, of course, and this has already been mentioned, the issue of public accountability, because a corollary of central bank independence is that we need to be open and transparent. In the case of the Bank of England, 100% of the equity is owned by the Treasury, and the Treasury is ultimately funded by the taxpayers, so in a sense the data that we have should be seen as a kind of public good. I want to give you one more example of some work that we've done; I don't know if my chart is coming up here, it may not show in Mozilla. No, I'm okay, thanks. While that's loading, let me just caveat what I said about data sharing with the public in two ways. The first is that data sharing here means sharing with the consent of the institution we're talking about, and it means that there should be a legitimate public interest in the data. We get Freedom of Information Act requests for things like what books bank staff are checking out from the library; this made the front page of City A.M., which is a London newspaper, last year. I don't know that that information directly bears on the ability of the central bank to execute its mandate of promoting monetary and financial stability. So we should share, but it should be consensual, and the data we're sharing should be of legitimate public interest. The other thing to say is that data sharing exists along a spectrum; it's not just a binary choice between closed data and 100% open data. When you're dealing with sensitive data, there are ways you often need to anonymize it. We talked about masking a bit yesterday, so removing individuals' names. There's perturbation, where you add a little bit of symmetrical noise to each one of the data points, or you round the values. And there are generalization techniques, where essentially you take specific discrete values and bucket them, so rather than reporting that a loan was made to a 27-year-old in London, it might be a loan made to somebody between the ages of 25 and 34, something like this. I'll give you a good example of where we've done some work in this regard. A couple of years ago I worked with some colleagues at the Open Data Institute, which is, I guess, what you would call a 501(c)(3) here in the US, but basically a research non-profit, and as their name betrays, they're all about open data and finding ways to make the world a more open and transparent place. We worked with them at a time when peer-to-peer lending in the UK was becoming quite hot; as you know, it's become quite hot here in the US as well, but we didn't have a lot of data on it. So we went to the three largest peer-to-peer lending firms at the time, which were RateSetter, Zopa, and Funding Circle, which comprised about 95% of the market, and we asked if they would let us not only analyze their data but actually publish it, and that's exactly what we did. We took that individual loan-level data, about 14 million loans, so semi-big data, and we anonymized it.
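The masking, perturbation, and generalization techniques just described can be sketched as follows; this is a toy illustration, not the procedure actually used for the peer-to-peer release:

    import random

    def mask(record):
        """Masking: drop direct identifiers such as names."""
        return {k: v for k, v in record.items() if k != "name"}

    def perturb(record, key="amount", scale=50.0):
        """Perturbation: add symmetric, zero-mean noise to a numeric value."""
        out = dict(record)
        out[key] = round(out[key] + random.uniform(-scale, scale), 2)
        return out

    def generalize_age(record, width=10, start=25):
        """Generalization: bucket a discrete value into a range, so that
        a 27-year-old is reported as '25-34'."""
        out = dict(record)
        lo = start + ((out["age"] - start) // width) * width
        out["age"] = f"{lo}-{lo + width - 1}"
        return out

    loan = {"name": "A. Borrower", "age": 27, "amount": 1000.0, "region": "London"}
    print(generalize_age(perturb(mask(loan))))
    # e.g. {'age': '25-34', 'amount': 1013.52, 'region': 'London'}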
We then published it; you can get the data, you can see it's all available. And on the basis of this data we were actually able to build this cartogram that you see over here, a map that shows different regions within the UK, where you can drill down to individual postcodes to see which parts of the UK are net creditors or debtors within this peer-to-peer lending market, at least at the time we did the analysis. So it is possible to do this kind of granular data release. I would just close by saying that the focus of my comments this morning has been all about the central bank sharing more data, but, somewhat contra Debbie's comments, I'd actually like to place the onus on some of our regulated financial institutions to do something like this: to take as their example what the fintech firms here were able to do in terms of publishing granular, anonymized data, working with us. You can only get so much information from annual reports and accounts; you really do need to understand the data at a granular level to get any real insight. And I think I will end there. Thanks very much.

Thanks. Next we'll hear from Matt Reed of the OFR.

Good morning. So I am the Chief Counsel of the OFR, and after the call to action that Debbie made and the beautiful examples of sharing and public disclosure of data that David gave, I get to be the skunk at the picnic and talk about the difficulties that we face. But I'll do it, I think, in a way that can provide some ideas for solving some of these problems. First I'll start at the higher level and then get deeply into the weeds. One thing that I think is useful to bear in mind, as we see all this data that we know exists and can't get access to, is the policy objective for having collected it in the first instance. There is clearly a need, identified by whatever regulator or authority, to compel the institution or the individual that created the data to provide it to someone, whether it's the public through a disclosure law or the government through a reporting requirement, and that policy gets reflected in the first instance in a law and then in a regulation. It's a very important point, which I think Debbie touched on a minute ago, that the institution doing the collection can only do so to the breadth of its regulatory authority, and that becomes a very, very big issue for us as we, across all these agencies, look to share and gather data. So what is the policy objective? If it's an investor-protection objective, you may seek a broad disclosure of the data; that's the 10-Ks and 10-Qs. I think government has done a decent job there; the SEC has done a good job with its interactive data work, its XBRL, to try to make that more accessible. The process for collecting this data is the legal Administrative Procedure Act rulemaking process, and what that does is alert the providers of the data to the intended purpose of the collection, and again, that intended purpose is tethered to the jurisdiction of the agency doing the collection. They take comments, and then they try to wrestle with what level of granularity and exposure is appropriate. I'll give a quick example of how government thinks through this problem through experience.
Several years ago, the SEC issued a rule on money fund reporting, and because they heard from the providers of the data that there could be a competitive disadvantage if these funds had to reveal their holdings immediately, the SEC laid a 60-day latency period over the data, so it wasn't made public until 60 days later. After a bit of experience they rethought that, and they have now decided to require that the data be made available publicly immediately. The point of that is that the government agencies compelling the data are thinking about the policy objective they need to serve when they make that compulsion occur. On the other side is this often legal, but clearly policy, requirement to protect the data as appropriate, and what this is really about, in my view, is making sure that we're honoring the concerns raised by the earlier panelists around privacy and confidentiality, not just because people want it to be confidential but because there is an underlying purpose for that confidentiality. A competitive purpose would be a good reason: we want our markets to flourish, we want there to be competition among investment advisers so that better products can be made available. So when the SEC first issued its rules it honored that concern, and it later learned that it wasn't so necessary. What these overarching policy objectives do is create an inherent tension for people like me and others in the regulatory community with regard to finding the right balance, where we should set the disclosure or sharing requirement. I want to talk a little bit about how that plays out in specific regulations, and this is one of the great ironies to me. Everybody thinks the OFR should be able to collect everything from everybody, keep it in house, and then make it available to whoever needs it, when we or the other party think they need it. Right in that same law where it says the OFR shall have the power to collect from any financial company information necessary, I'm paraphrasing, for the service of the Council or for financial stability monitoring, it also says we shall keep that data confidential, subject to things like trade secrets, subject to things like proprietary interests, subject to things like privacy. So even within the very statute that created the OFR you find this tension, and what really matters is the way that the staff engaged in these activities are able to maneuver between these two competing policy objectives. One of the main parts of the legal work at the OFR, for me and my staff, is to look at confidential data sets that we've acquired from other agencies and work with researchers who are salivating over that data, wanting access to it and wanting to produce results from their work. Again, all of the higher-level objectives that we've been talking about are carried out by people. So when Mark Flood, who is probably one of our most prolific authors at the OFR, wants to work with a confidential data set and publish a paper, it's the lawyer who ultimately has to either advise the director or make the decision him- or herself whether we've sufficiently masked the data to honor whatever the legal requirement of confidentiality was. There is absolutely no upside for the lawyer in saying yes, and there's tremendous downside, and I think that's important, because this is about incentives. People don't get called to Capitol Hill to testify about the great working paper Mark published because I said it was okay to publish it; they get called to Congress to explain why there was a data breach.
The point of that is that the government agencies compelling the data are thinking about the policy objective they need to serve when they make that compulsion occur. On the other side is this requirement, often a legal one but clearly a policy one, to protect the data as is appropriate. What this is really about, in my view, is making sure that we're honoring the concerns that were raised by the earlier panelists around privacy and confidentiality, not just because people want things to be confidential, but because there is an underlying purpose for that confidentiality. A competitive purpose would be a good reason: we want our markets to flourish, we want there to be competition among investment advisors so that better products can be made available. So when the SEC first issued its rules it honored that concern, and it later learned that it wasn't so necessary. What these overarching policy objectives do is create an inherent tension for people like me and others in the regulatory community with regard to finding the right balance, where we should set the disclosure or sharing requirement.

I want to talk a little bit about how that plays out in specific regulations, and this is one of the great ironies to me. Everybody thinks the OFR should be able to collect everything from everybody, keep it in house, and then make it available to whoever needs it, whenever we or the other party thinks they need it. Right in that same law where it says the OFR shall have the power to collect from any financial company information necessary, and I'm paraphrasing, for the service of the Council or for financial stability monitoring, it also says we shall keep that data confidential, subject to things like trade secrets, proprietary interests, and privacy. So even within the very statute that created the OFR you find this tension, and what really matters is the way the staff engaged in these activities are able to maneuver between these two competing policy objectives.

I think of one of the main parts of the legal work at the OFR, for me and my staff, as looking at confidential data sets that we've acquired from other agencies and working with researchers who are salivating over that data, wanting access to it, and then wanting to produce results from their work. Again, all of the higher-level objectives that we've been talking about are carried out by people. So when Mark Flood, who is probably one of our most prolific authors at the OFR, wants to work with a confidential data set and publish a paper, it's the lawyer who ultimately has to either advise the director or make the decision himself or herself whether we've sufficiently masked the data to honor whatever the legal requirement of confidentiality was.
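What "sufficiently masked" means varies by data set and by statute, but one building block reviewers commonly reach for is a minimum-count suppression rule. This is a purely illustrative sketch, not the OFR's actual procedure, and the column names and threshold are assumptions:

```python
import pandas as pd

def suppress_small_cells(records: pd.DataFrame, group_col: str,
                         value_col: str, k: int = 5) -> pd.DataFrame:
    """Aggregate value_col by group_col, masking any cell built from
    fewer than k underlying reporters. The threshold k is illustrative;
    real disclosure review also checks dominance (one reporter
    accounting for most of a cell) and linkage risk."""
    agg = records.groupby(group_col)[value_col].agg(total="sum", n="count")
    agg.loc[agg["n"] < k, "total"] = None  # suppress thin cells
    return agg.drop(columns="n").reset_index()
```

The hard part is that no within-data-set rule can fully account for auxiliary data an outside party might already hold.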
There is absolutely no upside for the lawyer in saying yes, and there's tremendous downside, and I think that's important, because this is about incentives. People don't get called to Capitol Hill to testify about the great working paper Mark published because I said it was okay to publish it; they get called to Congress to explain why there was a data breach. So this is where I think, as policy makers and as members of these government agencies, we need to think about how we're incentivizing the people who are going to carry out these data sharing objectives.

When we get into a conversation about data sharing, and I'm going to focus mostly on interagency data sharing, we've got something like twenty MOUs with other agencies, state and federal, and with foreign governments, and we have a dozen or so highly confidential data sets used for supervisory purposes. Our experience has been that we are met initially with the question: what are you going to use this data for, and can you articulate the need? The reason we get that goes back to my initial comment about the intended purpose of the collection in the first instance. Dodd-Frank did a great job of creating the OFR and the FSOC with very broad authorities, but one of the things it said is that before we can go out to the public and directly to firms to collect data, we have to go to other agencies. And Dodd-Frank did not, in all instances, instantiate a financial stability mandate in the organic statutes of these agencies. We've seen it to some degree at the Fed, and at the SEC Chair White has done a nice job of looking at the mission of the SEC as one that is ultimately in support of financial stability. Agencies that have been able to come to that conclusion, that they own a part of the broader financial system, get more comfortable with our answer to the question of why we need the data, because our answer is always the same: because we need to monitor financial stability.

There was a bit of talk yesterday about only being able to identify these trends when you see them, and that means sort of fishing through troves and troves of data. It reminds me of Supreme Court Justice Potter Stewart's definition of pornography: I'll know it when I see it. Well, no lawyer on a fishing expedition is going to be comfortable; they're going to be highly uncomfortable just saying, we'll give it to you, and you tell us when you've found what you need. So we have to articulate a basis for the level of granularity that we need in the data.

The second thing they always want to know is: are you going to use it to embarrass us? Unfortunately, that's one of the political dynamics that exists in our environment. We stand alone as a monitor of financial stability and an evaluator of policies, and that gives us the ability, without any sort of regulatory relationship with those who are regulated, to objectively review a policy and make an observation about whether it's reaching its intended objective, whether it's too broad, whether it's too narrow. That has the ability to create some discomfort for the agency providing the data to us. Imagine the conversation right after we release a working paper that says the Fed's stress test results are too predictable, and then we have to turn around and say, oh, can you give us some more stress test data, we want to write some more papers about this. These are natural dynamics, and I think the way you overcome the concern is ultimately through establishing trust relationships between the agencies and the staff that work in them; I'll talk about that a little more in a moment.

The next major question we get is: are you capable of safeguarding the information? If I own the data at another agency, and we collected it for a supervisory purpose, and we can get comfortable that an extension of our original purpose for collection was to provide it to an organization like the OFR, which will look at the health of the overall financial system, the question will be whether the OFR, or whatever agency ultimately gets the data, has the technical capacity to safeguard the data, and whether that is sufficiently important in its culture. I can say that I think it is in ours; we recognize that if we fail in protecting information, the flow of information will immediately stop, and so for us we've largely overcome that hurdle with our other regulators. But this is a weakest-link problem: the minute the data leaves the owner's hands (I think you were the one who talked about owners), that owner, who is accountable for having safeguarded the data, perhaps through the IT people in the institution, lets it go to another institution and loses control over the ability to say it is sufficiently safeguarded. So there's a lot of due diligence that goes on in that regard.

And then, finally, what we recognize is that a pattern of conduct demonstrating the ability to safeguard data is important; the existence in statutory authority of a bona fide need for the data, and the ability to articulate that need, is very important; but ultimately it's the relationships with the individual staff members or the individual principals that allow the free flow of this information. What we end up doing, as David was talking about a moment ago, is entering into these memoranda of understanding. The thing is, these are not legally binding documents; I don't imagine you're going to see the Fed going to court against the OFR over an MOU. But what they do is reflect in writing, very clearly, the understanding we have with respect to how we will use the data, how we will treat the data, and how we will safeguard the data. One of the things the OFR has done, and Mary Mockdenberg is in the audience, she's one of my colleagues and she has spearheaded this for the government, is put together a collection of lawyers from the federal financial regulatory agencies who are hammering out best practices for these MOUs to expedite the process of sharing data. I think this is a really important step to have taken; it clears away the clutter around questions like what happens when a FOIA request gets issued for the data, what happens when a subpoena looking for the data lands on the desk of the OFR, and so on. So there are things we can do, mechanisms we can put in place, like MOUs and in particular model MOUs, that can help move the data more freely between the regulators.

The last thing I would say on this point is that what we need to see is a cultural shift among the financial regulatory agencies that would allow all agencies and their staff to own the broader responsibility for financial stability. Once that happens, the staff are incentivized to participate in this data sharing activity, because they will own the failure if ultimately there's another crisis that could have been prevented because information was at our fingertips but we failed to share it. I'll stop with that.
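A model MOU is, in effect, a shared schema for these understandings. As a purely hypothetical sketch of what the recurring terms might look like in machine-readable form (every field name here is invented for illustration, not drawn from any actual agreement):

```python
from dataclasses import dataclass, field

@dataclass
class DataSharingMOU:
    """Illustrative skeleton of recurring interagency MOU terms."""
    provider: str                       # agency that owns the data
    recipient: str                      # agency receiving it
    permitted_purpose: str              # e.g. financial stability monitoring
    safeguarding_standard: str          # controls the recipient must meet
    foia_handling: str                  # who answers a FOIA request
    subpoena_notice_days: int = 10      # notice to provider before responding
    redisclosure_allowed: bool = False  # may the recipient pass the data on?
    covered_datasets: list = field(default_factory=list)

mou = DataSharingMOU(
    provider="Federal Reserve",
    recipient="OFR",
    permitted_purpose="financial stability monitoring",
    safeguarding_standard="provider-approved IT controls",
    foia_handling="referred back to the provider",
)
```

The value of a model MOU is that terms like foia_handling get negotiated once, at the template level, rather than from scratch for every bilateral agreement.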
Great, thanks to all the panelists for great comments. Let's throw it open to the floor. Could you please wait for the mic and give your name and affiliation; the comments are being recorded. We have a question here.

Hi, I'm Claire Brennicki from the FDIC. I have two questions. The first is: at the OFR or at other agencies, is there any possibility of looking toward the census model of having research data centers, and either piggybacking on the census RDCs as they're already set up or starting a financial version of that? And the second is more general: is anyone thinking about how we deal with anonymization and aggregation of new types of data? As Sandhill was talking about, text, photos, things like that; individual quotes and individual photos can be very salient in a research paper, but there are these issues of anonymization. I was wondering if anyone has a comment about that. Thank you.

Thank you, Claire. I'll take the first one, on these census centers. Yes, we've been looking at that. This is one of those things that sounds excellent in the abstract, and then when you get into the details it's a little bit more hairy. What Claire is talking about are the Census Bureau clean rooms, if that's what they're called: the census research data centers hold census data, but parallel to that, using census technology though not holding census data, are the federal statistical data centers. I think that's the question. There are two ways to crack that. One is to have the OFR, or any agency that can achieve this, designated as a federal statistical agency, and that gets you under the umbrella of the legal framework that overlays the Census Bureau data; once you do that, it allows private researchers to come in and access the data. The challenge there is having OMB designate the agency as a statistical agency, and we haven't explored that too deeply. The second way, which is what Matt was just talking about, is essentially a cooperative agreement with the Census Bureau that would allow an agency like the OFR, or any other agency, to place their data within one of these centers, so on census servers, and I think they have security guards at the door and so on to control access to the data. That's something we're exploring now. The challenge there goes back to one of the first things I said. When the provider of the data, say the Federal Reserve, provided it to the OFR, they did so having satisfied themselves that we had the IT capacity to secure this information, and I should make very clear that the only individuals who have access to the data are OFR employees. OFR employees can include researchers who come on detail one day a week to work for us, but they're OFR employees. So placing the data into these centers and allowing non-OFR employees access to it would require going back to the providers of the data and renegotiating these agreements. But it's something we're really interested in exploring, because one of our key objectives is to create this virtual research community, and that's a way to make that happen.

With respect to the anonymization, I think others may have more insight, but I know we've done some work in this area. I pointed to Mark a moment ago; you've done some papers around sanitization of data. So there are techniques that are very much being explored.
One of the challenges we have is that we may have satisfied ourselves that we've sufficiently aggregated the data. I think about CDS data; we do a lot of publishing with CDS data. We might satisfy ourselves that we don't think there are any data sources available that would help identify a particular outlier institution in part of the CDS market, but it's hard to be completely confident of that when you consider data held by other institutions, which they may be able to combine with ours to unmask the party. So, some of the techniques we use: we go to the other agencies who might have more familiarity; we often go to DTCC, which provided us the CDS data, and ask them to sort of beat up the data for us, and that's one way of testing it. But I think the ability to test whether we've sufficiently aggregated is very important. There was a comment about it yesterday; differential privacy was one technique that was discussed. I don't know if others have comments.
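Differential privacy is attractive here precisely because its guarantee does not depend on what auxiliary data anyone else holds. A minimal sketch of the Laplace mechanism for a count statistic (the epsilon value and the query are illustrative):

```python
import numpy as np

def dp_count(true_count: int, epsilon: float) -> float:
    """Release a count with epsilon-differential privacy via the
    Laplace mechanism. Adding or removing one record changes a
    count by at most 1, so the sensitivity is 1; smaller epsilon
    means stronger privacy and noisier output."""
    sensitivity = 1.0
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# e.g. number of institutions active in one corner of the CDS market
print(dp_count(true_count=42, epsilon=0.5))
```

Unlike a judgment call about aggregation, the privacy loss here is quantified up front, whatever other data sets a counterparty might try to link against the release.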
So let me just make a couple of comments on your two questions. On the first one, your question is really about secondary data analysis and how we can go about doing that, and I think here we can look to some examples of where this is being done by other agencies. For example, in the UK all of the big social science research projects are funded by a body roughly equivalent to the National Science Foundation here; there it's called the Economic and Social Research Council. It's now a requirement for all researchers who receive funding from the ESRC that they make the data they collect for their primary purpose available for secondary data analysis, and there's now a UK data archive of all of the ESRC-funded research. So that's a good example. Another example in UK officialdom is HMRC, Her Majesty's Revenue and Customs: the tax authority has set up a data sharing room where researchers can come in and actually take a look at tax data. And a third example, closer to home in the central bank community, is that the Bundesbank has set up something called the House of Micro Data, and it's a great name, isn't it? Again, the idea is to allow bona fide researchers to come into the central bank and make use of data that cannot be publicly shared; we normally share the various statistics that central banks produce, but this is data of a confidential nature that nevertheless might benefit from a fresh pair of eyes.

On the second of your questions, it's actually a very live question for me at the moment, because one of our research and advanced analytics projects is to text-mine supervisory letters. Basically, every year our banking regulators will send out a letter to the firms they regulate; it's basically like a report card, and it sets out the key material risks from the perspective of the PRA, the Prudential Regulation Authority, the bank supervisory arm of the Bank of England. What we're trying to do is assess whether we are writing and speaking to these firms differently on the basis of whether they're a large firm or a small firm, a systemically important financial institution or a credit union, whether we write differently to firms across time, et cetera. Previously these letters were really only ever seen by the regulators writing them; not even other regulatory teams necessarily have seen another individual institution's letter. Well, the governors are all very excited about this research, and they want it published, but it's one of these situations where I'm scratching my head thinking: okay, how are we going to publish research about letters that contain very confidential pieces of information, in a way that on the one hand is insightful for the wider research community and the public, and on the other maintains confidentiality? I don't have a good answer yet, but that's a concrete use case of, I think, the challenge you were identifying.
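For a sense of what such a project involves, the first pass at comparing letters across firm types is often a bag-of-words model. A minimal sketch with scikit-learn, using invented placeholder sentences rather than any actual supervisory letter:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented stand-ins for letters; real letters are confidential
# and never leave the regulator.
letters = [
    "capital planning and liquidity risk remain key material risks",
    "governance and operational resilience require board attention",
    "lending concentration at the credit union is a material risk",
]
firm_size = ["large", "large", "small"]  # hypothetical grouping labels

# Do letters to large firms resemble each other more than they
# resemble letters to small firms? Compare TF-IDF vectors.
tfidf = TfidfVectorizer(stop_words="english")
vectors = tfidf.fit_transform(letters)
print(cosine_similarity(vectors))
```

Any published result would report only aggregate patterns like these similarity structures; as the speaker notes, how much aggregation suffices for confidentiality is the open question.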
Are there other questions? Over here.

Could you give your name, please? I'm Lloyd Atheridge with the Policy Science Center. I want to return to a proposal Robert began to make when he was head of the Congressional Budget Office; it's been around for over a decade. There needs to be a presidential commission on federal economic data to rethink all of these things, and especially not just to do the intellectual task but to bring together all the stakeholders and users to take a fresh look at how fast we're getting data and how we get things done. I have that in mind because I think we should be thinking of the day after the November elections in this country as a critical point to plan for a lot of energetic discussion: we now have a new mandate, it's time to get things done. I can imagine a President Clinton and Elizabeth Warren saying it's time to get this done on all these data issues, and to hit the ground running with the transition teams on major initiatives, probably with several dimensions, that actually help everybody practically to address all of these things and send messages to agencies that the emotional consensus is changing in this country and we're really going to start to make progress. One point I can bring to this discussion, from a World Bank and Gates Foundation discussion on how you create learning systems in which people share data, is that we need some sort of higher-level agreement called WIFM, which is their acronym for "what's in it for me." If it's just saying we're going to regulate you, or maybe embarrass you, without giving you more money, that's not a very compelling, exciting, enrolling kind of process. In the biomedical world, and in other fields where data sharing has succeeded, there's a bigger upfront investment in the consensus-building process, but everybody says we've got to answer the WIFM question for every single person we want data from. Thanks very much.

Are there any other questions? Hi, Linda Avery, I'm the Chief Data Officer at the Federal Reserve Bank of New York, and I have a question for David relating to the inventory that you brought up before. In New York we are also in the process of standing up a catalog of digital assets, and we are expanding it a bit to also include work product, things like the mapping between the NIC data and the CRSP data, that type of thing. One of the interesting dynamics we're facing, however, is that we sometimes get pushback from people about registering their assets. This really plays into the data sharing dynamic, in that people are afraid they're going to suddenly be overwhelmed by the number of inquiries they'll have to field, and they don't necessarily consider fielding those inquiries to be in their job description. So I was wondering whether the Bank of England, in its experience, has faced that same type of challenge at times. Again, this is not everybody, but we definitely hear this noise, and I was wondering: how has it turned out?

I'll save you the suspense and just say, well. It's turned out well, but of course it's a long journey from start to "well." I think with any of these data initiatives you really have to have executive-level buy-in, and I think we've been fortunate with the current governor that he's really made data a top priority. In his opening address to bank staff he actually uttered the word metadata, which, I don't know for sure, but in some 320 years of the Bank of England's history I'd venture to guess is the first time a Bank of England governor has used the word metadata. A few years ago data was sort of pushed into the background, let's say, relative to research, and there's been a concerted effort within the bank to really put them on par. I think because of that, when these data initiatives get rolled out, it appears on everyone's Internet Explorer browser that there's this data inventory and you must comply, you should comply, and that message usually gets sent down from, in our case, the chief operating officer, who has deputy-governor standing. So as with any change initiative, you really just have to get senior executives to buy in and push that message, because it is a non-trivial task to get everybody in the organization to take the time to register all of the data assets.

I think that's a critical point you're making, the problem you're facing, because we hear a lot of the same from other agencies, who are afraid that we're not going to understand the data, and therefore misinterpret it, and therefore create negative market impact, even financial stability risk. We constantly get back: giving it to you, if you don't have the expertise to use the data properly, will put a burden on us, because we'll have to hold your hand the whole time. And there's not an easy solution other than to work through all of that hand-holding.

So thanks very much for a great panel. I'm glad to hear that you're thinking like a statistical agency; I think there's much to be learned from that model of taking data that's highly confidential and aggregating it in a way that's very useful but does not carry risks of disclosure. That's a very good model. You have so much to do, but being a statistical agency, whether informally or in spirit, would be a great thing. So let's adjourn now and continue this over the break. I think we're supposed to be back at quarter to, so a quick break; return in about ten minutes.