I'm accustomed, by the last session of a conference, to everyone being ready to go, so it's great to see that we still have a full house and people are still talking. Thank you. Well, let me start by thanking Michael and the center at the University of Michigan for hosting us here, and a special shout-out to Karen and Jenny and Christie for doing the hard work of keeping the trains on time. This is a panel on data integration and visualization, which is a little bit of an odd combination, but I think we're pulling it together here. I'm going to start with an anecdote and then I'll have just a couple of quick slides. Last year at this conference I delivered a quip about how in economics we think a lot about information theory, but a big part of what we work on, especially at the OFR, is information reality. That's particularly apt for this conference, where the theme is big data: it's very much about the implementation issues. The anecdote is how I got into this. My training is in financial economics, and I came to DC in 1999. I spent the better part of the decade leading up to the crisis doing data modeling for supervisory risk systems, and when I started into that, nobody told me that data modeling was a thing, or that schema integration was a thing. We sort of figured it out on our own, and I remember having a series of epiphanies a couple of years in, because I was lazy: I would go to Google and see if there were maybe software packages or tricks I could download. The epiphany was that almost everything we had figured out on our data modeling teams was in fact a solved research problem, usually from computer science, sometimes from philosophy or other fields. It gradually dawned on me that there is in fact this whole other universe out there, and in the supervisory world, where we're dominated by lawyers and economists and accountants, we often don't see the data issues as research possibilities in themselves. I think that's a big mistake. Yes, there is drudgery in wrangling data, but there are also really interesting, formally hard and deep problems to be addressed. So let me introduce our panel quickly. We've got four experts, a very diverse team, so we are trying to mix several different components of the information stack here. One way to think about what we're doing is to riff off of Zach's presentation from the last panel. Visualization is the user-facing part of the information chain: it's delivering the information to the human visual system. At the other end we've got the raw data that has to be integrated. And in the middle somewhere, in order for the visualization to work well, you've got to have some sort of stable abstractions that the visuals can play off of, and it's the job of data integration to get you to that point, so the visualization can do its thing. So we're going to start our panel, after me, with Aurel Schubert, who is the director general for statistics at the European Central Bank. And I apologize for the typo in the catalog: he is not the director of general statistics, he's the director general of all statistics. Amol Deshpande is a professor of computer science at the University of Maryland. Peter Sarlin is an associate professor at the Hanken School of Economics in Finland, also director at RiskLab and, coincidentally, a co-organizer of a systemic risk conference every year in Helsinki. And Margaret Varga wears several hats: she's the chair of the NATO exploratory visual analytics task group (I think I got that right),
a visiting fellow at Oxford University, and a director at Seetru, which is a visualization consultancy. Oh, I've got to do this, sorry: views and opinions expressed are those of the speaker and do not necessarily represent official OFR positions or policy. So I'm just going to show a couple of quick taxonomies from opposite ends of the stack to try to frame the discussion and, if not convince you, at least get you thinking that there is some useful structure to these problems. This is looking at the visualization end of things, and there's a paper footnoted there that Margaret and I co-authored with Vicki Lemieux and William Wong. One of the questions the visualization folks need answered is a requirements question: what are the tasks you're trying to achieve? In financial supervision it's not like, say, an airline flight simulator, where the tasks are so cut and dried that in fact everything from takeoff to transit to landing can be done on autopilot, a little bit like Google cars. The tasks in supervision are much messier, and so the categories here are much more general, but we boiled it down to four big tasks where visualization can help. I'll just pick on the first two here, sense-making and decision-making, to give you a sense of how this plays out in practice. Sense-making is the sort of stuff we do in the research department: trying to take raw data and undefined problems and make sense of them. You don't know in advance where the interesting patterns lie, which data points are noise, which are outliers, which are mistakes. You need to sift through it, and for visualization that implies you typically need lots of detail and lots of interactivity. The next two, decision-making and rule-making, are very different. For reasons of accountability, you typically can't have a lot of unstable visualizations in the middle of the process when the commission or the board or the committee or whoever it is comes together to make a decision. If there are visuals in the evidence before them, everyone needs to see the same picture, and it can't be a picture that responds to individual input; it's got to be a fixed picture. So the emphasis for sense-making is on the technical requirements of being able to deliver a lot of information in different formats quickly; for decision-making it's really on getting the abstractions right and delivering the key abstractions, because you're not going to be able to revise them after the fact. Another breakdown: these are just four different types of financial stability data, namely numeric, geospatial, network and text, and you can see that these four flavors also end up with four different renderings. One thing about visualization is that you've got a human in the loop by definition, and humans are much less constrained than machines in the sorts of data they can ingest and make sense of. So there is a vast amount of flexibility in how you can render things, and therefore a vast amount of rope with which to hang yourself. But there's a literature on this stuff; there are good ways and bad ways to do it. Back at the other end, this is looking more at the raw data, and again, if this interests you, there's a footnote to a paper where you can get the full blah, blah, blah; that's with H.V. Jagadish and Louiqa Raschid, on big data challenges in financial stability monitoring. We boil the dimensions of financial data, at least at a high level, down to coverage, frequency, granularity and detail. Coverage means: what is the scope of institutions or markets or portfolios that you're going to measure? Frequency is temporal frequency. Granularity and detail is an interesting distinction, I think. Granularity, at least as we use it, refers in a database to disaggregation of the rows of the database; so position-level detail, which is something that Aurel will talk about, is very granular, and it's the way supervision is headed. Detail refers to the columns in the database: what are the attributes you're going to collect about each of the things you're measuring?
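To make the rows-versus-columns distinction concrete, here is a minimal sketch with made-up figures; the table and column names are hypothetical, purely for illustration.

```python
# A minimal sketch (hypothetical data) of the granularity-vs-detail
# distinction: granularity disaggregates the ROWS, detail adds COLUMNS.
import pandas as pd

# Coarse granularity: one row per bank, few attributes (columns).
aggregate = pd.DataFrame({
    "bank": ["Bank A", "Bank B"],
    "total_exposure": [1_200.0, 850.0],   # millions, hypothetical
})

# Finer granularity: one row per POSITION (more rows) ...
positions = pd.DataFrame({
    "bank":     ["Bank A", "Bank A", "Bank B"],
    "isin":     ["XS0001", "XS0002", "XS0003"],  # hypothetical ISINs
    "exposure": [700.0, 500.0, 850.0],
})

# ... and more detail: extra attributes (more columns) per position.
positions["maturity"] = ["2019-01-01", "2021-06-30", "2020-12-31"]
positions["rate_type"] = ["fixed", "variable", "fixed"]

# Rolling the granular rows back up reproduces the aggregate view.
print(positions.groupby("bank")["exposure"].sum())
```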
All right. With that, I'm going to turn it over to Aurel. Aurel and the ECB have been doing really neat things with financial data, and he'll tell us about it. Thank you. Thank you, Mark, and good afternoon, and thanks a lot for the invitation to Dick and the OFR, and to the hosts at the University of Michigan, Michael and his team. It has been two very, very interesting and intensive days here. Now, big data conferences you have almost every week somewhere around the world these days, and I was just thinking that since the end of the summer this is already my third one. At the end of August I was in Dublin at the big data conference of the UN together with Eurostat; then two weeks ago in Ljubljana, at the Slovenian statistical office together with Eurostat, again on big data. Now the third one. But what is really different this time is that it's now really about finance. The first two were basically in the area of the real economy: global positioning data of ships for balance of payments purposes; using, as some countries in Europe now do, telecom operator data to allocate tourism receipts in the balance of payments; things like using Google Trends for unemployment prediction; or maybe part of the CPI done now via web scraping, et cetera, these kinds of things. But this time we finally arrive at something that is really in the area of finance. Now, what I want to do in the 14 minutes left is to give you a little bit of an overview of several initiatives going on at the European Central Bank which fit the topics here: similar topics, similar challenges, related challenges. But they are very much from the practical side, how we try to address the challenges which our users have for us. There used to be the saying that if the US sneezes, Europe catches a cold. Now I would say that if the US sneezes, we have pretty much similar symptoms, and you will see that some of the things are rather similar to what we have been discussing here, and I've just seen in which direction it is going. Okay, thanks. Just quickly, maybe, a little bit of background. Initially the European Central Bank was created to do the monetary policy of the euro area. So there was only one purpose, to do monetary policy, but it got a very strong statistical function by law, allowed to collect its own data for its own purposes. Initially it was only monetary policy. Now, with the crisis, things have changed. First of all, by 2011 the European Systemic Risk Board, which is what the FSOC is here, was created, and the ECB was put in charge of doing all the statistical and analytical work for it. So we got this new macro-prudential function. In addition, the macro-prudential function of the ECB itself was extended.
So we are supplying them with data, and since the 4th of November 2014 the banking supervision of the significant institutions, which is about 130 consolidated banking groups, is also with the ECB, and now we are supplying them with all the data for banking supervision as well. So the whole scope of data collection, the work of ECB statistics, has been enormously extended with that. What is also crucial: monetary policy today is not monetary policy of 10 years ago, because initially the notion was that this is a homogeneous euro area. You only needed the data for the euro area for decision-making, and I remember, having been until six years ago on the other side of the table, being told: you know, your little national Austrian data, we are interested in the aggregates, and you're 3% of the aggregate, so that's not really important. What has completely changed with Greece, Cyprus, et cetera, is that there is no homogeneous euro area, and the heterogeneity needs to be reflected in the data; you need the data for the heterogeneity. The averages and the aggregates no longer tell the story. You need the distribution, you need the tail, you have tail risks, et cetera. So that completely changed what we are collecting, or is changing it, complementary to what we used to do traditionally. We also now have to do much more of what we call granular data, micro data, and I will mention some of these initiatives in a moment. So that is, I think, important to remember. There are now these new responsibilities, but there is this heterogeneity, and also the market fragmentation. Just as one example: we used to get the data by country, so for instance the loans of the Spanish banks overall. In the meantime, what we found out is that Spanish banks are obviously a very heterogeneous group. There are a few really world-class banks, and there are a lot of zombie banks. Unless you understand that, you don't understand how the transmission mechanism of monetary policy works or doesn't work. Or another case, and I'll come to it in a moment: the money market broke down. In 2008, 2009, the money market was drying out, and the central bank basically took over the function of the money market. But once you look into the micro data, you realize it only broke down completely for some; for some it got more expensive, and others got liquidity more or less as before, at similar prices. That you only know if you have this data, but it's absolutely crucial for decision-making. And that's why it's also important that we are very close to the decision-makers, to all three of them, so to speak, basically in the same house, although physically now in three different houses, being just too big for the big house. So the closeness is very important. And so I will just address a few of these initiatives. The first one, going back a few years: I remember, on the 15th of September, 2008, the question to all central banks, Lehman Brothers having problems yesterday, how many Lehman papers are you holding in your country? I was still on the other side of the fence, so to speak. It was pretty easy for Austria, because we have had a security-by-security database since 1991, so it was a matter of 10, 15 minutes. You just had to know who is Lehman Brothers, and that leads back to the LEI and these questions.
But at least for those where Lehman was in the name, we knew immediately what the exposure of the Austrian banks, the Austrian insurers, the Austrian households, or whoever was. Many other countries said: what's the question? How should I answer the question? On what basis? We only have aggregated data. Or a few years later: who is holding Irish bonds? Who is holding Greek bonds? Who will be hurt by a haircut on Greek bonds? This could not be answered. Fortunately, I can say that a few years later we can now answer it. Why? Because we have two elements. One element was started many years ago but has now reached maturity, which is very good: the supply side of securities. We now have a micro database of all securities which are held or traded in Europe, somewhere up to 10 million securities, whether equity or bonds, et cetera. So we have all the issuers, with all the reference data: who is the issuer, the name, the sector, the country? Is it fixed or variable rate? What is the outstanding amount? What's the price, with price information coming every day? So you have a very, very rich database about the supply of securities. And for the last three years we also have a securities holdings statistics database, initially sector by sector: how many of these ISINs, how many of those on the left side, are held on the right side by French banks, or by Italian households, or by the Irish government, or whoever. So you now have very detailed holder data. Both sides are now there, and then we are at the question of integration: these are two silos for the moment, and the big challenge now is to bring the two silos together, so that the analyst goes to the machine and can look at the overall picture. Actually, the day before I came here, we had the first pilot presentation from our colleagues trying to bring the two things together, linking it (and I'll come back to that in a moment) to our single data dictionary, so that all the expressions are the same, or, if they are not the same, the analyst immediately sees why one is defined differently from the other and what the difference is. So really bringing the whole thing together, and bringing it together on the newly created, or to-be-created, analytical business intelligence platform. So this is one important initiative, I think. And, it has to be said, we have the securities holdings statistics not only at the sectoral level; we also have it for the 26 largest banking groups, security by security. So if you want to know (not me, but if the supervisory board wants to know) how many bonds of a specific type Paribas is holding, we have this information now. And we will extend this to the 130 banking groups which are under the supervision of the ECB. Initially, when we created this, it was created for financial stability purposes, and we took the 26 largest ones; now we need 130, because that's the perimeter of supervision. So that is one big part of going granular, security by security.
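As an illustration of the linking step just described, here is a minimal sketch that joins a hypothetical issuer-side reference table to a hypothetical holder-side table on the ISIN; the column names and figures are invented, and this is not the ECB's actual system.

```python
# A minimal sketch of linking the two "silos": issuer-side reference
# data and holder-side statistics, joined on the ISIN. Hypothetical data.
import pandas as pd

issuers = pd.DataFrame({          # supply side: one row per security
    "isin":        ["XS0001", "XS0002"],
    "issuer":      ["Lehman Brothers Intl", "Republic of Austria"],
    "sector":      ["bank", "government"],
    "country":     ["US", "AT"],
    "outstanding": [500.0, 2_000.0],
})

holdings = pd.DataFrame({         # holder side: who holds how much
    "isin":          ["XS0001", "XS0001", "XS0002"],
    "holder_sector": ["FR banks", "AT households", "IT banks"],
    "amount":        [120.0, 30.0, 400.0],
})

linked = holdings.merge(issuers, on="isin", how="left")

# "Who is exposed to issuer X?" becomes a simple filter-and-sum.
print(linked[linked["issuer"].str.contains("Lehman")]
      .groupby("holder_sector")["amount"].sum())
```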
But securities are only one part of the balance sheet, and I'll have to speed up a little bit. Now, a few other things. One, mentioned already, is the money market. The assumption was that the money market broke down, but the basic thing was that we didn't know too much about the money market. So we now have this money market statistical reporting, which is on the left side: daily reporting of around 45,000 transactions every day from the money market counterparties, with the LEI. What are the prices? What are the quantities? We get this now since July, and it is already very useful information; we already had it a little bit since April, so around the Brexit referendum it was already very crucial for our analysts. Coming now to data sharing: I got, just two hours ago, the request that some Belgian financial supervisory authorities want to have access to this data. I don't know why, but we'll have to look at it. Then maybe another important area, on the right-hand side, which is just in development. One big part of the balance sheet of the banks is obviously securities, but the biggest part in Europe is loans. So, a loan-by-loan database, which is called AnaCredit: a really granular database where every loan to non-financial corporations above 25,000 euros will be recorded in a common database, a common data registry, from about 2018 on. We are talking about roughly 150 million loans with about 90 different attributes, from the LEI to fixed versus variable rate, who is the counterparty, what security is pledged, et cetera. Now, whether this is big data or not: I just wrote an article about AnaCredit and called it "pretty big data". At least for us it's pretty big. If you're used to working with aggregates, where you had a colleague who looked at the numbers for 18 countries and said "this one looks strange", now you have 45,000 a day; you cannot look at them and say "this one looks strange". You need different methods and different things. So it is pretty big data.
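To illustrate the kind of automated screening that this volume forces, here is a minimal sketch on synthetic data that flags transactions whose rates sit far from the day's robust center; the figures and the threshold are arbitrary, purely for illustration.

```python
# A minimal sketch of automated screening at scale: flag transactions
# whose rate deviates strongly from the day's robust center (via MAD).
import numpy as np

rng = np.random.default_rng(0)
rates = rng.normal(loc=-0.35, scale=0.02, size=45_000)  # one day's prints
rates[::5000] += 0.5                                    # inject oddities

median = np.median(rates)
mad = np.median(np.abs(rates - median))                 # robust spread
score = 0.6745 * (rates - median) / mad                 # MAD-based z-score

outliers = np.flatnonzero(np.abs(score) > 5)
print(f"{outliers.size} of {rates.size} transactions flagged for review")
```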
In order to make sense of all this, we are obviously quite involved in promoting the legal entity identifier, and we're involved in the development of the UTI and the UPI. We have made our money market statistical reporting ISO compliant, and now the Bank of England uses the same thing we did, which is very good. Maybe just one point on the legal entity identifier: there is the EU prospectus directive at the moment in Parliament and Council, a directive about what needs to be published if you are issuing a bond, and the ECB in its opinion said the LEI and the ISIN should be required in there, which was not in the proposal of the EU Commission. In the meantime, after a lot of discussions, it is now in the proposals of the Parliament and the Council, and hopefully it will survive; the Commission had not thought about it, but that's just one way we try to promote the LEI. This next slide looks a little bit complicated, but you're used to much more complicated pictures, as I've seen in the last two days. The idea here, which was also mentioned by several of you, is bringing industry on board when you develop new reporting requirements. In our case it comes under the name you see at the bottom of the first column: BIRD, the Banks' Integrated Reporting Dictionary. What we did with this new AnaCredit, this loan-by-loan database, is we sat down with the banks. We invited first of all the national central banks, because we are not allowed to go to the banks except via the national central banks; seven of them said yes, we'll go along. So we have been sitting down for the last few months with 26 commercial banks, going through the regulation and trying to define, from your transactional systems where you give a loan, how you get to the reporting: what data to take, what transformations to do. And we put this into a manual which will be published soon. It's a purely voluntary thing, but it's a kind of, I don't know, crowd intelligence. It has no legal stamp; it doesn't say that's the official ECB position, but it is the best guess, and it will be used, and it can be used by every other bank (it's a public good) and by all the software companies who are working for banks: that's how we interpret how you get from your transactional systems to the reporting to the authorities. And now the big challenge with all of these, and then I'll finish, is integrating all these things for the bank. We have developed, in 18 or 19 years now, many, many different statistics: monetary, balance of payments, and now also granular data. But what is the value added? For two years now we also have the banking supervisory data, which I didn't mention yet, the COREP and FINREP supervisory data for all the banks. So now the question is how to link the whole thing together, because that's really the value added. And for that you have to work on several layers. First of all, like others have done, and we have done this too, you have to develop an inventory, a data inventory. Two years ago I asked one of my colleagues: can you find out, does the ECB know what the ECB knows? In the meantime we have an inventory, and it will now be improved. Then (I'm finishing) you need the semantics, which is a common dictionary; then you need the technology; and then you need a visualization at the end, where everything can be brought together. With that, I think I have given you a quick overview of where we are going, and the next slide just shows you this platform for which we are developing the front end. To come to the end: you see the challenges are very, very similar, and we are working on them. We don't have a big master plan; it is much more step by step, not a big vision but very concrete steps. But hopefully it all adds up to exactly what the policy makers need. I'm looking forward to any questions. Thank you. Thank you, Aurel. So Amol and Peter both have presentations about the middle ground between the input-data end and the visualization end, and Amol will go first. Thank you, Mark, and thank you, Michael and Richard, for inviting me to this workshop. It's been a fascinating workshop, hearing the perspective from so many different angles on the kinds of problems that exist in finance and regulation. I'm in computer science, at the University of Maryland. I work primarily on building data management systems, so at the lower level we are trying to support the processes going on above: the data science processes, the data analytics processes. My goal is really to understand how to build platforms that will allow us to simplify the process of doing data science, doing it in this new world where we have collaborations, lots of analysis, lots of machine learning, and so on. So the purpose of my talk is basically twofold: in some sense, to lay out some of the hard computer science research problems that me and many of my colleagues are working on; and also, in some sense, to point out things for the users to watch out for. For people who are sitting on top and doing these analyses, there are some things that you might want to think about as you are doing your analysis, as you are taking your data sets, wrangling them, processing them, and so on.
Some of the problems I'm talking about are not going to manifest immediately; they will come up six months down the line, or two years down the line, when you are asked to explain what's going on. So at a high level (and we have seen a picture like this quite often), in today's big data world we are using data science tools everywhere, and in many cases the process looks something like this: you have a team of collaborators who are trying to do something; they are fetching data from a whole bunch of different sources; they are integrating the data, cleaning it, applying different types of models to it, adding more data sets; different users are doing their own things; and all this time the data is continuously evolving. We are moving towards a world where we are getting updates literally every minute. So before you know it, across many analysis steps and many users, you are going to end up with thousands and thousands of data sets. And that might be an oversimplification: if you really think about every step of the way, you probably have a much larger number of data sets, and managing those data sets in a systematic manner is a fairly challenging problem. We have tons of really good work on individual analysis steps; people know how to take one data set and extract really useful insights from it. But we don't have very good support for the rest of the process: we don't have very good support to keep track of what goes on, and to keep track of all the data sets that come out. So the contention, the hypothesis for our work, is that the pain point has increasingly become the process itself, managing the process, rather than figuring out new algorithms for analyzing a specific data set. Not that that isn't important, but the process is forgotten in many cases; there is very little platform support for many of these steps. If you are working on data analysis, you are typically working with a file system, maybe you store your data in Dropbox or Google Drive to be able to share it, and generally speaking you are on your own with respect to managing those data sets. So some of the problems that come up, which also form research problems for us, are these. There is massive redundancy across the many versions of the data sets, and this can be a real problem, especially for science data sets; if you start keeping copies of those, you do run out of space, even in today's world. An analysis pipeline, and here I mean going from input data all the way to output data, is typically spread across many, many different scripts; some of it might be in some MATLAB script, there might be a user interface tool being used for one step, and generally speaking the actual end-to-end pipeline is not in any one particular place, unless you are very disciplined and are willing to work with a specific set of tools. This makes it hard to identify bugs and errors in analysis pipelines; simple things like saying "okay, I did this and she did this, what's the difference between the two?" can actually be fairly hard problems. Information about how data sets evolved, which data sets or scripts were used to generate which data sets, is often lost. Again, unless you are very disciplined, it's very difficult to keep track of these things, especially if you are doing scripting with Python or R or something: you are changing one parameter here, rerunning the program, changing another parameter, rerunning the program, and almost no one keeps track of exactly what they are doing.
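As one illustration of how such bookkeeping could be captured transparently, here is a minimal sketch of a Python decorator that hashes a step's inputs and outputs and logs its parameters; this is a hypothetical design, not the DataHub implementation described below.

```python
# A minimal sketch of transparent provenance capture for script-based
# analysis: wrap a step, fingerprint its inputs/outputs, log the knobs.
import hashlib, json, time, functools

PROVENANCE_LOG = []

def fingerprint(obj) -> str:
    return hashlib.sha256(repr(obj).encode()).hexdigest()[:12]

def tracked(step):
    @functools.wraps(step)
    def wrapper(*args, **params):
        result = step(*args, **params)
        PROVENANCE_LOG.append({
            "step":   step.__name__,
            "inputs": [fingerprint(a) for a in args],
            "params": params,              # the parameters people forget
            "output": fingerprint(result),
            "time":   time.time(),
        })
        return result
    return wrapper

@tracked
def clean(data, threshold=0.5):
    return [x for x in data if x > threshold]

clean([0.1, 0.7, 0.9], threshold=0.5)
clean([0.1, 0.7, 0.9], threshold=0.8)      # rerun with another parameter
print(json.dumps(PROVENANCE_LOG, indent=2))
```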
This makes it difficult to do what I call here forward and reverse reasoning (there are other terms for it). For instance, if you find an input error, can you identify which output records might be affected by it? I think earlier speakers alluded to some of these problems as well. Another related problem is how you explain a specific result. This is especially tricky if you have these big black-box machine learning algorithms, which are impossible to reverse: you point to a specific output and say "explain this output", and typically you can't really do that without doing it manually. The second related problem here is that models are an integral part of data science, and today we have moved from traditionally simple models to really big and complex models. Twenty years ago we might have had fairly simple regression models; today we have models with billions of parameters, and much more. Also, in many cases these models are packaged together with the results in the form of notebooks, IPython notebooks or other similar tools, and the same types of problems that I mentioned come up here: managing these models and understanding what they are doing are very difficult problems that we need more work on. So our research here is motivated by these problems. What we have been building is a platform that we call DataHub; it's a collaborative data science platform, joint work with Sam Madden and Aditya Parameswaran. This is just a schematic of it; we have quite a few different things going on in here. Roughly speaking, DataHub is centered around data sets, so for us the integral unit that we work with is a data set, and for us a data set is immutable: if you change a data set, you get a new version. DataHub has several different aspects to it. A data set management system, which allows you to import and search and query across data sets. A version control system; in some sense one of the harder research problems we have been working on is how you build a version control system for data sets. If you are familiar with Git or GitHub, it's a similar problem, but for data sets, and I'm happy to talk about why those two problems are quite different from each other. We've also been building what we call a provenance database system; it's a slightly separate unit whose goal is to capture provenance and dependency information across data sets, and to do it as transparently as possible, in that we don't want to inconvenience the developer or the modeler; we try to collect this information underneath, without really imposing a workflow. And then, once we capture this provenance and other metadata, how do we do interesting kinds of analysis on it? That's been a major focus of our work at Maryland. DataHub also contains an app ecosystem that allows you to easily borrow or buy apps from other people and apply them to your data sets to do interesting things, which allows you to reuse other people's work, and so on. Just a very brief summary, and this is a more technical slide, of the kinds of work we are looking at. We've been looking at the issues of storage and retrieval: as I mentioned, one of the hard problems here is, if I have lots of versions of data sets, how do I deal with them from a data management perspective?
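One way to see why versioned data sets need not mean full copies is content addressing: store each record once under its hash and let a version be just a list of hashes. The sketch below is hypothetical and far simpler than what a real system like DataHub would do.

```python
# A minimal sketch of content-addressed dataset versioning: identical
# records are stored once; a "version" is just a list of record hashes.
import hashlib

store = {}                       # content-addressed record store

def put_version(records):
    ids = []
    for rec in records:
        h = hashlib.sha256(rec.encode()).hexdigest()
        store[h] = rec           # identical records stored only once
        ids.append(h)
    return ids

v1 = put_version(["alice,100", "bob,200", "carol,300"])
v2 = put_version(["alice,100", "bob,250", "carol,300"])   # one edit

print(len(store), "records stored for", len(v1) + len(v2), "referenced")
print("versions share", len(set(v1) & set(v2)), "of", len(v1), "records")
```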
We've been working on systems and techniques and algorithms for doing that. We've been working on query languages and interfaces to look at this versioning and provenance information and do different types of analysis on it. And we've also been working on life cycle management of large models, with a specific focus on deep learning models; here our goal is to understand how we can help, let's say, a computer vision modeler simplify their analysis and modeling process. I'm happy to talk more about this at any point. We are certainly interested in hearing whether you think there are ways we can solve your problems, and you can tell us other problems that we haven't thought of; I would love to hear from you. I do, unfortunately, have to leave almost immediately after the panel, but I'm happy to talk; my contact, my web address, is there, and I'm very happy to talk afterwards as well. Thank you. Thank you, Amol. Peter is next. Okay, thanks a lot, Mark, and thanks to the organizers; it's a pleasure to be here. Mark mentioned our conference on systemic risk analytics. We've organized it for two years now, and we've had the pleasure of having Mark there for both of those conferences; despite having a sauna event in the social program, I think we can't compete with this conference. It's really been a pleasure to see the sessions you had this morning, and, to make a second reference to Mark, because that's apparently what you're supposed to do: I think this conference covers interdisciplinary research as well as Mark does in his own research. It's been a pleasure to see both the first panel on data sharing (I think there will be a few connecting points to what I have here) and also the second panel on analytics. My background is in computer science; I spent a few years at central banks, and now I'm back in an economics faculty position at the Hanken School of Economics. I've been doing quite a lot of modeling and some visualization on the side, and now I'm basically connecting back to some old work on interactive visualization that we did at about the same time as Mark and Margaret were doing their paper on visual analytics for financial stability monitoring. What we are bringing to that is annotatable dashboards: the possibility to track experts' work around data. This relates very much to the previous talk, though it doesn't go as deep into the analytics process, the machine side, but stays more on the human side of the analysis process. Now, if we relate this to what Mark was covering in the beginning, the various categories of visualization, then obviously we could categorize them into sense-making, decision-making, and so forth. Basically, this is another way of saying the same thing: we have exploration, which focuses on sense-making, and communication, which focuses more on telling a specific story. Usually in visualization we are not at either end of this spectrum but somewhere in the middle. I think what David was showing earlier in the morning was more towards the communication end, though not entirely just communication, as it also allowed means to interact and explore to some extent. What we have been doing, and what we will showcase here, is more towards the exploration end, where you are deriving insights and you don't have a specific story to tell. Now, this is my perspective on data in general
in macroprudential oversight and systemic risk analysis, and I think this has been covered a few times already. Again, it's a coincidence with Mark's slide on the dimensions of the data; I view it similarly. We have entities, which would be large-volume data; we have time, which would be high-frequency data; we have variables, which would be high-dimensional data. These are the standard three dimensions of a data cube, and what we are adding on top of this is a fourth dimension: interlinkages across entities. Of course these links then form networks; the size of the network obviously depends on the number of entities; you might have time-varying networks and you might have multi-layer networks, so you are adding a full second cube through these networks. This is the starting point that we take to visualization. Our previous work related to visualization was a SWIFT-funded project which basically coupled risk communication to macroprudential oversight more generally, and then visualization to risk communication in that context. I won't go into the details of that somewhat boring review paper, but its end product was an interactive visualization platform covering precisely the data cube that you saw previously, and covering two traits that we judged had not been covered as well as we thought they should have been. One is interactive visualization. This project started in 2013; at that point not everyone was doing interactive visuals (now most of us are), and one of the reference points was the ESRB's risk dashboard, which was a 40-page static PDF document that we obviously would have wanted to see as an interactive dashboard. The second trait of the big data we've been talking about is that we judged that we need analytical visualizations. This is yet another term among all the other terms, visual analytics and information visualization, but what we basically mean by it is that we need ways to simplify the data we're looking at. Eventually we're restricted by the capabilities of the human visual system to process data, so we can't process all the data even if we could visualize it, and most often we can't visualize big data anyway, because we're restricted by the pixels of the monitors we're using. So eventually we need various ways of reducing dimensionality and reducing data, and that's something we've been working on in the machine learning arm of our work. So I'll briefly go through what we have in the VisRisk platform; you can find it online, if you're interested, at vis.risklab.fi. We have basically three different sets of applications. One focuses on standard plots; there's nothing really special about this. Here you see an example of a bank-level early-warning model which provides bank distress probabilities; you see a Greek bank highlighted and the rest of the Greek banks in the background, and if you deselected this filter you would see all the European banks. Likewise, you could filter out various cross-sections and look at how cross-sectional distributions have evolved over time, or look at one entity and really drill down into all the variables that explain the distress levels for these banks at a specific point in time. So this is just slicing and dicing the standard three-dimensional data cube, nothing special to that. The maps category of applications in the platform focuses on ways of decreasing dimensionality and reducing the volume of data. Here we are projecting 14 financial stability indicators for 28 economies globally, around 20 years of quarterly data, and with that we are creating a two-dimensional display. The display represents four different financial stability states (pre-crisis, crisis, post-crisis and tranquil states), and based on these 14 indicators we can project these economies onto the map and understand, in all these dimensions, their state of financial stability.
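As a rough illustration of this kind of mapping, the sketch below projects a synthetic panel of indicators down to two dimensions; PCA is used here only as a generic stand-in for the map-type methods behind the actual platform, and the data are random.

```python
# A minimal sketch of the dimensionality reduction behind such maps:
# project a panel of indicators to 2-D so each economy-quarter becomes
# a point on a plane. Synthetic data; PCA as a generic stand-in.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# 28 economies x 80 quarters x 14 indicators, flattened to observations
X = rng.normal(size=(28 * 80, 14))

coords = PCA(n_components=2).fit_transform(X)   # one (x, y) per obs
print(coords.shape)                             # (2240, 2)
```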
The final category of applications focuses on networks. Here you can see a Granger-causality-based network, a la Billio et al., which looks at CDS returns; what you are seeing is linkages across banks, insurers and sovereigns. Obviously you have various ways of filtering this and various ways of rendering, so choosing different visuals for the same data, and at the bottom you can also follow the centrality measures for specific nodes in this network. What this essentially means is using all four dimensions of the data cube: the standard cells of the data cube are the centrality measures at the bottom, whereas the linkages are in the top view.
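For the flavor of it, here is a minimal sketch, on synthetic return series, of building a directed network from pairwise Granger-causality tests and reading off centrality, in the spirit of the Billio et al. approach; the thresholds, series and names are invented.

```python
# A minimal sketch: pairwise Granger-causality tests on synthetic
# returns, turned into a directed network with centrality measures.
import numpy as np
import networkx as nx
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(2)
names = ["Bank A", "Bank B", "Insurer C", "Sovereign D"]
returns = rng.normal(size=(200, 4))
returns[1:, 1] += 0.5 * returns[:-1, 0]       # A's past moves B: A -> B

G = nx.DiGraph()
G.add_nodes_from(names)
for i, src in enumerate(names):
    for j, dst in enumerate(names):
        if i == j:
            continue
        # does src's past help predict dst? (column order: [dst, src])
        res = grangercausalitytests(returns[:, [j, i]], maxlag=1,
                                    verbose=False)
        pval = res[1][0]["ssr_ftest"][1]
        if pval < 0.01:
            G.add_edge(src, dst)

print(G.edges())                              # expect ('Bank A', 'Bank B')
print(nx.in_degree_centrality(G))             # who is most exposed
```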
Eventually, then: we worked on these types of interactive visualization with a larger number of organizations, and we have seen how these organizations work around data. Among others, we worked on the securities holdings statistics mentioned earlier and provided a dashboard for exploring that data set, which is a tremendously rich data set. But eventually the main question was that, however much we customize these interfaces, they are not really solving the challenge. So basically what we say is that it was painful to see the work of experts built up into isolated knowledge silos due to the lack of proper tools for human interaction around data, and this is essentially what we are focusing on now. We are taking these visuals one step further, to allow humans to interact around data, to track that, and to turn it into documented and structured data. If we think of how we analyze data: we are quite often sitting behind a monitor, where we may have means to interact with the visuals we are looking at; however, the analysis process is quite often also a social process. Now the question is how we are supporting that, and especially how we are tracking that. This could be a realistic version of how you interact with others: you have various forms of communication fragmented into several channels, and you are not tracking that precisely. You might have bilateral emails, cc'ing a large number of persons in your division, but nevertheless not providing a structured means to document it. So that's essentially what we are working on in what we call SiloBrain, which is a dashboard that connects organizations with the experts' work by allowing experts to speak directly to the data: point to patterns, make annotations, and discuss with colleagues. In the information visualization and visual analytics community you have a large number of different mantras, like the visual information-seeking mantra: overview first, zoom and filter, then details on demand. What we would say is: interact with visuals to find patterns, annotate data and discuss to elaborate, and then search on demand, where the search focuses on the metadata that you have been annotating with. So this is how it could eventually look in the case of a risk dashboard where you are just looking at time series of inflation, in this case for 28 global economies. On the right-side panel you basically have a feed that is coupled to individual data points and provides additional metadata for those individual data points, and in the upper left corner you have your wiki, which is basically the full knowledge base around this data, where you can filter and search, and you have a feed that may be based either on the people you are following or on the data you are following. Another application: we are working with a number of central banks on the prototype that we put together, and another example of this is a London-based startup called Almax Analytics, which is essentially turning news, so textual data, into semantic networks of numerical facts. Almax is doing event and value extraction from textual data and turning these numerical facts into semantic networks, which are then turned into relevant indicators and used to estimate impacts on stock prices. Basically what this creates is nothing other than standard networks, and here you can see a snapshot of a semantic network which connects a few stocks of companies from the solar sector, based on news about their projects. You can see, for instance, that SunEdison has a large project in the US and a small project in South America, and you can see that there is an annotation coupled with this by, for instance, an analyst, a portfolio manager, or say a trader who is getting the output of the Almax system. Okay, with this I'll end my part; I'd be happy to continue the discussion both after our presentations and after the conference. Thanks a lot. Thank you, Peter. Margaret Varga is next. It's been a pleasure to be here, and thank you, Mark, for inviting me, and the OFR as well as the Michigan Law School. I'm mindful that I'm the last one, so I will do this swiftly. I'm the chair of the NATO Exploratory Visual Analytics Research Task Group, and today I want to show you some of the work that we do and how it applies to big data. So, just one slide, promise: our work is to research and promote the deployment of visual analytics techniques among NATO members in the NATO application area. We're interested in developing and applying generic tools to support situation awareness in many different domains, such as finance, aviation, healthcare, maritime and cyber. The work is multidisciplinary and also cross-domain; cross-domain means across the land, sea, air, space and cyber domains. We talk a lot about reusability of data; we also look at reusability and transferability of technology from one domain or application domain to another. This is very effective and extremely fruitful: by understanding the insights and knowledge of one domain, we can apply them to different domains. So cross-domain, cross-discipline work is also extremely useful. Big data, we all know, has huge volume and high velocity; it also comes in a variety of different forms, and there are elements of uncertainty (veracity) and also variability in its nature. But we also need to ask ourselves: what is the value to us? Is it relevant to us? So why do we want to show data visually? Well, we create data every day (even our goldfish has data associated with it); it has created the field of data science and changed how we actually conduct research and how we understand physics, chemistry, aviation and finance, and it is a quantitative record of the real world. Why visualization? Human vision is our highest-bandwidth channel; we think visually, we do it all the time; it's very fast, it's parallel; we're very good at recognizing patterns,
and we're also very good at detecting changes. Vision is also pre-attentive: it's something we do without even thinking about it. And vision helps extend our memory and increase our cognitive capacity. So let us have a look at what I mean by pre-attentive. This is pre-attentive processing using just color: you can immediately spot the red spot, and this is hard-wired in our brains; we don't have to learn it. Here's another simple example using a different shape. Why are we interested in pre-attentive attributes? Because pre-attentive attributes pop out immediately; if we know which attributes or features are pre-attentive, applying them in our visualization displays will improve the effectiveness of our visualizations. We explore data to gain insight from the data. This helps us make decisions based on the data, but it also helps us form or change opinions about important issues; we learn what is going on and discover new phenomena. As Hamming said, the purpose of computing is insight, not numbers; and Stuart Card, Jock Mackinlay and Ben Shneiderman described visualization as the use of computer-supported, interactive visual representations of data to amplify cognition. The challenges we have are how to transform the data into information that enables us to understand it and derive insight, and how to make it useful: given the data, how do we let people understand the phenomena? What we are aiming at is the light-bulb moment: aha, I know now. So the insight is that the goal of visualization is to give us insight, not pretty pictures. Sometimes, I would say, you want to see something that is actually useful: something that helps you discover things you don't expect, explains what is happening, and makes you aware of the situation so that you can make informed decisions. Here are some examples of the work we have done, looking at financial data, aviation safety, cyber security, and sentiment analysis in the maritime domain. David told me to give you some pictures, so I do that. So, financial situation awareness: why is it important? Because it may help us make informed decisions for maintaining stability and managing risk. There are various different risks: we have compliance risk, credit risk, operational risk, and also cyber risk. So we need to know the nature of the financial services: the availability, the confidentiality, the operations and infrastructure. Very often we are reactive, but more importantly we also need to be proactive, to make predictions of potential states and vulnerabilities. The challenge is how to analyze, how to integrate (which we have discussed here at this conference) and then present the massive amount of data from multiple sources, big data, in a tractable, comprehensible and usable manner. In order for us to maintain situation awareness for financial stability, we need to know how a situation has developed, how it may progress or change, and to predict what might happen. However, traditional approaches have been shown to be quite inadequate for making sense of this big data challenge, and visual analytics offers a possible solution. Peter mentioned visual analytics, so I will just give a brief introduction to it. Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces. It combines automated analysis techniques with interactive visualization, to make it easier for us to understand, reason and make decisions on very big data sets. It enables analysts to fully utilize their cognitive and perceptual capabilities, supported by advanced computational
capabilities, so that we can understand the data and the situation and make timely decisions. Visual analytics is multidisciplinary, and this is its scope. The first component is visualization, which comprises information visualization, geospatial visualization and scientific visualization. The second component is human factors. Human factors are very important because they make the link between the computer and the human; after all, the targeted audience of the system is human, so we need to understand what makes it effective and how to interact, and we need to understand the cognitive and perceptual characteristics of the user. We also have to produce useful summaries and presentations, so that we can disseminate the analytical results to the intended audience. And finally it's supported by data analysis, exploiting and benefiting from data management, knowledge representation, knowledge discovery and statistical analytics. The difference between visualization and visual analytics is this data analytics part: visualization has interaction with the visuals; visual analytics has the data analytics part as well. So in visual analytics we look at three categories: presentation, hypothesis analysis, and exploratory analysis. What is presentation? Our starting point is that we know in advance what we want to present, so we decide on the appropriate presentation to use, and what we get is a very high-quality visualization to show the facts we're interested in. In hypothesis analysis, we start with some hypothesis about the situation, say, whether there's an issue with credit, and then we proceed, very goal-oriented, to prove or refute the hypothesis; the outcome is a visualization of the data that helps confirm or refute the hypothesis. Exploratory analysis is very much the big data problem: we don't have a hypothesis about the data, because we don't really know the data and we don't know the problems very well. The process is that we interactively direct a search for structures, trends and patterns, and the outcome is a visualization of the data that leads us one step up, towards hypotheses and presentation. So in exploratory visual analytics we explore the real data to answer the W questions (what, where, who, when and which) so that we can understand the why. We want to detect the expected and discover the unexpected, in trends, patterns, behaviors, anomalies or weak points, so that we can support making informed decisions. The critical aspects are these. We need to know our problems. We need to know our data: don't let the data ruin your analysis or visualization. We need to know our analysis: don't let the analysis lead you to wrong or suboptimal decisions. And finally, you need to know your users: what are their tasks? So we start with data, follow with the algorithms, and then we visualize, and the user perceives and interacts with it: the user will interact with the visualization, the algorithms and the data. Deciding if there is a data gap takes time and effort; there's no magic button. So we're interested in user-centered visual analytics, not data-driven analytics: focused on the users, what they do, their decision needs, their skills and their mental models. In financial visualization, most user interfaces show statistics and aggregations using line charts, bar charts, graphs, networks and geospatial representations. Would it be a good time for us to now consider developing some other visualizations to meet the different needs, tasks and decisions in the financial sector?
For example, here we have different visualizations of the same situation. The first one, on the left, is a Sankey diagram, showing the flow of the information; the middle one also shows the flow of information, using a stacked donut; and then we have a treemap and also packed bubbles. They are showing the same thing, using the same data. The choice to make is governed by the user: what does the user want, what sort of data, and which is most effective for that user? We can also look at dependency and relationship graphs as well. Having said that, I want to show you how we have used these techniques. This is cyber security; many people have mentioned cyber security. On the left is the Sankey: on the left-hand side, the green nodes are internal hosts; on the right-hand side, the red ones are external hosts; incoming traffic is brown, outgoing is blue, and there is not much outgoing there. We also have dual bar charts, and we show all the different things, so you can see that this may be useful for credit flows, debt flows and other entities. And just to show how we have used it: here we are looking at a port scan, an external IP scanning one particular site, and, as people were talking about DDoS, this is an example of a DDoS, with multiple external sources. Lastly, I would also like to give thanks: we would not be able to have these interactive approaches for visually exploring big data without the advances in databases that allow us to filter and aggregate over billions of documents in near real time.
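For readers who want to reproduce a view like the traffic Sankey described above, a minimal sketch with invented hosts and flows follows; plotly is one common choice, and nothing below reflects the actual tool used in the work presented.

```python
# A minimal sketch (hypothetical flows) of a traffic Sankey diagram:
# internal hosts on one side, external hosts on the other, and link
# width proportional to traffic volume.
import plotly.graph_objects as go

labels = ["internal 10.0.0.1", "internal 10.0.0.2",
          "external 203.0.113.5", "external 198.51.100.7"]
fig = go.Figure(go.Sankey(
    node=dict(label=labels),
    link=dict(
        source=[2, 3, 0],        # indices into labels: flow origins
        target=[0, 0, 3],        # flow destinations
        value=[120, 45, 10],     # e.g. megabytes transferred
    ),
))
fig.show()
```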
So, the last slide: we need to decide which data sets and sources are relevant and will help us enhance our situation awareness, applicable to multiple purposes. But we also need to be mindful about the origin, the provenance, the trustworthiness, the value and the relevance of external data sources, and we must avoid being data-rich but information-poor. As Herbert Simon said, information consumes the attention of its recipients; hence a wealth of information creates a poverty of attention. So let us focus our attention. Thank you. Thank you, Margaret. We have time for some questions, and I'll kick things off with a question for Aurel. Aurel, you talked about the uses of the data for monetary policy, and I'm wondering, it must arise that the requirements for monetary policy and the requirements for financial stability sometimes come into tension. How do you decide, in the statistics group, how to address those trade-offs? I think, in the end, with the kind of data the colleagues want today, both in monetary policy and financial stability, there is basically a convergence: they want more or less the same information. For instance, initially, for monetary policy purposes, we only collected the aggregates, the country aggregates, and produced something like M3 out of them. But then with the crisis came the demands for individual data, and so in the meantime we get, at least for 300 banks, for the biggest ones, also individual loan data, et cetera, so the balance sheet broken down in more detail. One thing I should have said: it's clear the aggregates are based on individual data, but the ownership of the individual, bank-by-bank data is with the national central banks; we only used to get the aggregates. Since the crisis, we now also get what we call individual balance sheet data and individual interest rate data, and that again is extremely interesting for the financial stability colleagues as well. Maybe the bigger challenge is simply where to put the limited resources if you have competing demands, and for that, last week, the board decided to implement a so-called data committee, which in the future will help us set priorities, so it's not always us who have to say who gets what priority. But increasingly the demand is that all analysts want more or less the same data. This money market data, for example, was created specifically for the market people, the treasury department, but the demands came immediately from economics, from research, from risk management, from financial stability; everybody wants to have it. So, more and more, all analysts want to look at everything. I'm going to follow up on that, and then I see Michael's got a question. So I'm imagining the monetary policy folks would be most interested in the exposures and statistics on banks and the money markets; things like hedge funds or the securitization markets would be secondary for them. Is that the case? There the focus is, as you say correctly, not always exactly the same in what they are looking at. But it's always interesting when you tell your board member who wants to have access to what: why in the world do they need this? There is always a questioning of why somebody needs something. But I think overall everybody wants to have a more, what is now called, holistic picture, to look at everything, because potentially the information, or interesting information, can be in places you didn't think about before. So in that sense we try not to discriminate, as long as the need-to-know principle and the legal access rights are okay. Margaret, this really follows up on your last comment at the end of your presentation, about the interaction between the presentation of information and mental bandwidth. I think sometimes we have this mental model that the human being who is experiencing the visualization is the same at all time periods, but we know that when information is most urgently needed in visualization format, our brains are in a different place: we are in the middle of a crisis, and in a crisis moment our bandwidth has shrunk dramatically; we have the other psychological phenomena that we know scarce bandwidth causes. So how should we think about the choices in visualization technique in these very different time periods, the semi-normal time and the crisis time, in terms of the psychological factors at play in the human brain? We are working on a framework for data visualization that follows the workflow of the analysts: with this dashboard, we look at the order of sequences, and we also look at simulated data, simulation case studies, to see, if they have to make a decision in two seconds, where they would go, and then identify what the salient attributes are that drive their decision. So in many ways we need to look at weighting the data: not all data is of equal importance under different circumstances, and which data is more important to making the decision? So that is condensing further: instead of exploring, it is now decision-oriented. That would be the goal, for being able to monitor and make snap decisions, because, for example, cyber is very quick; things happen immediately. So be proactive, and can we have predictions: for example, port scanning is a form of reconnaissance; can we do some prediction and prepare for something like that? That's how I would answer your question. I have a question for Peter Sarlin. I liked the dashboard that you showed. Could you give us some examples of humans interacting with the dashboard, and
I have a question for Peter Salin. I liked the dashboard that you showed. Could you give us some examples of humans interacting with the dashboard, and what improved, specifically in finance or similar domains?

We are now showing results as dashboards, but we don't usually let people interact with the dashboards yet. I could take a very concrete example. I used to be with the ECB in DG Macroprudential Policy, working on financial stability surveillance, and what we spent most of our time on was writing long reports that several people were meant to read; in most cases, unfortunately, not many people did read them. That is a way of putting our analysis process into, in a sense, unstructured data, whereas what I would be proposing is that we document our analysis process, structure it, and attach it directly to the data, and then allow that information to be distributed in an organization through pull and push functions: either vertically in an organization, say, from up in the pyramid pulling from the bottom, or horizontally, with colleagues working on similar data sharing their expertise directly. So I think that is the idea: annotating the dashboard is, more precisely, just annotating individual data, and that structures our knowledge.

I could add a short note on the previous question by Michael. I think this relates a bit to the question that we had at the end of the keynote, where historical relationships and anomalies were related to each other. Visual analytics oftentimes relies to some extent on a model, and this model might be based on historical relationships, and that doesn't really allow your mind to shift, because in a sense you are doing analysis based on historical relationships. On the other hand, there might be cases where you can't rely on historical data, and then you might be interested in outliers or other anomalies, and that's just a different case.
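A minimal sketch of the data-level annotation idea Peter describes, under the assumption of a simple in-memory store; all field names, function names, and the example series are hypothetical.

    # Sketch of annotations attached to individual data points rather than to
    # a report: colleagues "pull" the notes for any series they are analyzing.
    # Field names and the in-memory store are illustrative assumptions.
    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class Annotation:
        series_id: str   # which data series the note is attached to
        obs_date: date   # which observation within the series
        author: str
        note: str

    store: list[Annotation] = []

    def push(ann: Annotation) -> None:
        """Share an analyst's note on one data point."""
        store.append(ann)

    def pull(series_id: str) -> list[Annotation]:
        """Retrieve colleagues' notes on a series, e.g. for a dashboard overlay."""
        return [a for a in store if a.series_id == series_id]

    push(Annotation("money_market_rate", date(2016, 9, 1), "analyst_a",
                    "Spike reflects quarter-end window dressing, not stress."))
    print(pull("money_market_rate"))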
My question is directed to Mr. Schubert. What is your assessment of compliance among European countries and European institutions, specifically in the insurance and finance business, with the first pillar of Solvency II, which became effective this year, or its equivalent under the Basel III directive?

If I understand correctly, you're referring to the Solvency II directive, which is basically the Basel III for insurers. The ECB and financial stability have their own needs for insurance data, so what we did, in cooperation with EIOPA, the European Insurance and Occupational Pensions Authority, was to develop the ECB's reporting in addition to the supervisory reporting for Solvency II; we added to it, and we also had XBRL coding of this reporting. But we only received the first reporting six weeks ago or so, so we are still looking at the quality. We intend to publish it by early next year, and by next month to give it for the first time to the analysts. So we are still working on quality, but I know from EIOPA that there are challenges, as with any completely new reporting; the insurance industry especially was not so used to being a heavy reporter, to say it politely. It's a big change, so there are challenges, but we are working on them, and it's too early to say anything about the quality of the data.

Hi, my name is Sri Lakshmi, and my question to the panel is: let's say we have historical data and we are able to visualize it in the best possible way. How much of this data, in your experience, has helped in forward-looking analysis, specifically in the context of finance? I assume we all know history is not a representation of the future, but the idea of visualization and model fitting is all about what we can say about the future. What would be your answer, or do you think it's not useful?

I guess I'll start. I don't think it's necessarily all about the future. Certainly, being able to predict is wonderful, very helpful, but it's not the only assignment we've got. Being able to maintain situational awareness of an evolving crisis is a crucial task in itself: if in the space of an hour or two you can get oriented to which markets and which institutions are the source of the instability, that can be enormously helpful. And backward-looking analysis is a very different assignment but also very important. Doing a forensic analysis of what happened during a crisis typically involves much more detail and granularity, to piece together the sequence of events. If you ever see documentaries of the NTSB after a plane crash, tiny little fragments of the airplane are assembled in a hangar, and they spend weeks diagnosing exactly which part failed. It's a different sort of analysis, but very valuable.

I'm not sure whether visualization really is the key concept here. If we take Mark's categorization of sense-making, rulemaking, decision-making and so forth, I think we could pursue these objectives without visualization; we could just be looking at numbers. Visualization is just a way to make use of the human visual system in the loop, in addition to, say, machines and other ways of rendering data.
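The contrast Peter drew a moment ago, between analysis that leans on historical relationships and cases where only outliers can be trusted, can be illustrated with a small sketch; the rolling z-score rule, the threshold, and the toy series are assumptions for illustration, not anything the panel specified.

    # Sketch of the two modes Peter contrasts: a model built on historical
    # relationships (here, a rolling mean) versus simply flagging outliers
    # when history breaks down. Series and threshold are illustrative.
    import statistics

    def rolling_outliers(series: list[float], window: int = 20, z: float = 3.0) -> list[int]:
        """Indices whose value sits more than z rolling standard deviations
        away from the preceding window's mean."""
        flagged = []
        for i in range(window, len(series)):
            hist = series[i - window:i]
            mu, sigma = statistics.mean(hist), statistics.stdev(hist)
            if sigma > 0 and abs(series[i] - mu) > z * sigma:
                flagged.append(i)
        return flagged

    calm = [0.1 + 0.01 * ((-1) ** i) for i in range(40)]  # quiet, noisy history
    calm[35] = 2.5  # an abrupt break with the historical relationship
    print(rolling_outliers(calm))  # -> [35]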
Thank you. Picking up a bit on what Michael was asking about being in the right frame of mind to really take in the visualizations: many of them are very beautiful, but frankly the average Joe would have a hard time interpreting what they're seeing. I think very much about some of the ways that I'm looking to use these data visualizations with bank examiners, who generally come from a very, very different frame of mind. And I think back a little to the treaty that was signed around road signals and road signs, and how there was a global agreement on how these rules of the road were going to be visualized. I was wondering if you could speak a bit about any emerging agreement on visual conventions that would be helpful in the future, in terms of people learning to understand what these visualizations are about.

Margaret, do you want to talk about the upcoming meeting at Dayton?

I'll show you an example that, given the same data, there are so many different ways of visualizing it, and that's why it often has to be decided from the user's perspective and what the user is trying to do; only the user can decide. But the problem here is that there is no guidance at the moment on which is the most effective, and sometimes it's personal preference and what people are used to. That's why we're still using the bar chart, the pie chart, and the line chart. We want to look at breaking the mold, at introducing new visualizations that would be more effective, because we measure effectiveness, and that means looking at a lot of the cognitive science and human factors: identifying the salient points, what makes people change, what makes them accept it. Because we are human beings, we're very reluctant to change: I'm very happy with what I do, why do I want to change, what's in it for me? Unless the benefit can be shown to them very noticeably, they will not want to change, and that's why understanding the mental model and their way of working determines how much visualization can improve on the current state.
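A minimal sketch of Margaret's point that one dataset supports many encodings, again assuming the plotly library; the category names and values are made up for illustration.

    # Sketch: one tiny dataset, three of the encodings mentioned in the
    # session (bar chart, pie chart, treemap). Data and library choice are
    # illustrative assumptions; only the user's task decides which works.
    import plotly.express as px

    flows = {"names": ["loans", "deposits", "securities", "derivatives"],
             "values": [40, 30, 20, 10]}

    px.bar(x=flows["names"], y=flows["values"]).show()           # familiar default
    px.pie(names=flows["names"], values=flows["values"]).show()  # part-to-whole
    px.treemap(names=flows["names"], values=flows["values"],
               parents=[""] * 4).show()                          # area encoding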
I could follow up on that point. You may have seen that the Museum of Modern Art in New York has just adopted as part of its collection the original group of emoticons created for mobile phones. My question, which a couple of the presenters touched on briefly, is this: presumably one of the reasons we are interested in visualization is because, as a matter of psychology, visual intelligence is perhaps the most direct way of reaching individuals and conveying understanding. The two things I would ask are, first, are you looking at research done by social scientists on the effectiveness of visual means of communicating information, and second, is there a possibility that graphic designers might have some role to play in this context?

Yes, and I think this also relates to the previous question. In finance we would be concerned with financial literacy; in visualization you're concerned with data literacy. These standards are evolving, and they are evolving much faster than they have previously, but of course a lot of what you've seen here would probably not be for the group of people you might be designing visuals for, and I think that's perfectly fine. A lot of the research, and Margaret might know more about this, but a lot of the research behind visual analytics, which was called information visualization before this newer term but is essentially the same thing, is biological research; it's related to how our human visual system functions. Of course there is a role for the social sciences as well, but I guess that relates to data in general and how we interpret and understand data, whereas the biological research is something we specifically need to understand if we are to understand, say, visual variables and how to use them.

And this also touches on the elements of cognition and perception: under different circumstances they differ, and not only that, different people have different ways of working as well, and through knowledge and experience that will differ too.

We are at the end of our allotted time. If there's one more question we'll take it; otherwise, let's thank our panelists.

Before everyone leaves, let me just say, first of all, thank you again to the University of Michigan; this has been a great pleasure for us, and I hope this won't be the last of our interdisciplinary conferences. This particular focus on big data, on accessibility, on quality, and on scope is, I think, an important topic, and we've only begun to really get into some of the issues here. The important thing is that it's not just about the technology, and it's not just about the data themselves, but about all of the disciplines that interact with the use of our data, the distribution of our data, and the way we make them available to people, and about trying to explain what we're doing, whether as policymakers, researchers, or people who are interested in having, in this case, a stable and strong financial system. So I really want to thank everybody for coming and participating, and I hope next year's topic, if it is next year, will be equally interesting. There are a lot of takeaways from this conference. I know that we're going to be writing them up; there is a conference volume in preparation from last year's conference, and there will be a lot of good takeaways from this one as well. So thanks, everybody, for coming and participating and making this a terrific conference. Thank you all.