I think it's about time. I'm glad so many of you were able to stay till the end of the meeting. I know that for some of you going to the East Coast, staying to the end of the meeting presents some dilemmas: red-eyes, an extra night, or slipping out a little early. But I think we are really going to make it worth your while for staying this afternoon. So welcome back to the closing plenary. Before we move on to the main event, I've just got a couple of things I want to do. I want to remind you of a couple of announcements in your packet. One is for the Designing Libraries Conference, the location of which has just been announced. Another is for the JISC-CNI meeting in Oxford this summer, and I hope I'll see at least one or two of you there. And finally, of course, the date for our December meeting. I want to note, by the way, that we are well along on planning for another digital scholarship center workshop. We're not quite ready to announce that yet, but we will announce it on CNI-Announce when it's ready.

With those housekeeping announcements, I want to say thanks to a few folks before doing anything else. We had an amazing number of presentations at this meeting. As we've restructured the schedule a bit, we've been able to accommodate a lot more presentations, because we have room for 30-minute as well as longer presentations. So you've heard from an awful lot of people who have contributed to this program, and I'd like you to join me in thanking all of those folks. Thank you. I would also like to thank the CNI staff for putting this meeting together. And along with the staff, I'd like to express some special thanks to two wonderful volunteers from San Diego State University who helped us capture a number of additional parallel sessions as voice-over PowerPoint, so that we have those available for you and for your colleagues going forward. So I'd really like to thank the whole team that made this meeting run as smoothly and effectively as it did. Thank you.

And with that, I'm going to introduce our closing keynote speaker, Professor Larry Smarr. Larry and I were trying to figure out a few minutes ago exactly how far back we go, and we're pretty sure it's at least 30 years, back to the days when I was at the University of California and he was at the National Center for Supercomputing Applications. You have a sort of stock bio of Larry in your program, but in many ways it really doesn't do justice to what a force he has been for advanced computing, and for the convergence of advanced computing and advanced networking to support scholarship, over decades and decades. He was one of the key architects of the program that brought what was then called supercomputing into the mainstream of science and engineering in the 80s through the NSF supercomputing centers program. And as part of that, he wound up as the founding director, out in Champaign-Urbana, Illinois, of the National Center for Supercomputing Applications. That operation changed your life more directly than some of you may remember: Larry hired a bright graduate student by the name of Marc Andreessen, and they produced a little thing called Mosaic, which was the first graphical web browser and, for better or worse, was really the fuel that made the World Wide Web take off. Before then it was textual links and very much a line-mode kind of affair. So that was an absolutely transformative moment.
By the way, I was looking back, and if you were going to CNI meetings in the very early days, back in the early 90s, you would have had a very early look at Mosaic in a session called Navigators and Navigation, which CNI put on in one of its first meetings. Now, Larry has gone on to do all kinds of amazing things since then. He has been central to many of the advances in high-performance networking and in thinking about cyberinfrastructure. He has served as a key advisor to the National Science Foundation, the National Institutes of Health, and NASA. He has done incredible service for our government and for the progress of science. You may have seen work he did some years ago on the OptIPuter, which I would say fundamentally changed the way we think about really big, really high-resolution displays and visualization. And in recent years he has been one of the drivers of something called the Pacific Research Platform, which is now evolving beyond California into a genuinely national program. It actually was reaching outside of California rather rapidly, but now it's serving as the seed, in some ways, for a genuine next step in nationwide cyberinfrastructure. I think what he's going to do today is tell you about that and try to place it in a historical and evolutionary context. I've been hoping to get Larry to address CNI for a long time. This is his hometown these days, at least, and so it couldn't have worked out more perfectly. Please join me in welcoming Larry Smarr.

Well, thanks very much, Cliff. It's a great pleasure to be here, because although I'll take you through sort of the frontiers of big-data networking and computing and so forth, what's actually left to be done is what you do. So at the end we're going to come back to data discovery, curation, and annotation, in a whole new world that you haven't worked in before, where everything is going from 10 to 100,000 megabits a second. That's what we're going to talk about. It's important to understand how these things happen from the federal agency point of view. So I'll take you back over 30 years, to when Sid Karin and I wrote the proposals, mine in '83 and his in '84, to an NSF that didn't have a program to create national supercomputer centers, and that changed very rapidly. The reason was that we cloned the Department of Energy supercomputer facilities: the idea of mass store, the idea of visualization, which was light green on dark green at that time, and for which the graphics commands were lift pen, move pen, drop pen, move pen, pull pen up. But that is what we brought over, because we'd worked there, I at Livermore and Sid in MFE, the Magnetic Fusion Energy network. So this is a sort of once-in-three-decades event, where the Department of Energy, which has the nuclear weapons program and a lot of other things, gets things to happen earlier. Well, I'm here to tell you that 30 years later it has happened again, and this is probably the biggest change in NSF funding for your campuses in its history. As big data continued to grow exponentially, from all the scientific instruments and clusters and so forth on your campuses, the DOE came up with this idea of something called the Science DMZ. Now, they might have come up with a better name, but we're stuck with this.
And I'm not going to have many word slides like this, but just to show you: fundamentally it says that there's a separate network on your campus for scientific or engineering or humanistic or social-science big data, that it requires special data transfer nodes to hold the data and terminate the optical fibers, that you do a lot of performance testing and throughput measurement on it, and that you have a different security mechanism than just something like firewalls. DOE coined this term in 2010. At UC San Diego we had actually gotten NSF proposals, the OptIPuter that Cliff mentioned, in 2002, and we had built the first campus-wide version of this back in 2004. That, plus an NSF report from the Campus Bridging Task Force, led the NSF to start a program which is rather phenomenal and yet not that well understood or known about. This is the Campus Cyberinfrastructure program, which over the last five years has made over 200 campus-level awards in 44 states to establish separate 10-to-100-gigabit big-data networks: think of it as an LA freeway right inside the campus, but a big freeway for data, separate from the commodity internet. Let me give an example: at UC San Diego there are probably 40,000 users of the commodity internet, and they all run over one 10-gigabit backbone. Now I give you 10 to 100 gigabits just for your data. That's the difference. And here's an example of one of those awards: we call it Prism, the UCSD Prism award. Through the process of Quartzite and then this one, we now have probably 30 or 40 different sites on campus with direct 10, 40, 80, or 100 gigabit access, to the physics department, biology, Scripps Oceanography, and so forth. And then there was a separate CC* award to put a 100-gigabit-per-second link between SDSC and CENIC, the California research and education network. Here's another example, just one of those 30 or 40: this is Rob Knight's lab; he's one of the leaders of microbiome analysis. He and I are running about a million CPU hours a year on the Comet supercomputer. That's a bit more than a CPU century: run your computer 24 hours a day for 100 years, and we use about that much every year.
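A quick back-of-the-envelope check of that "CPU century" figure, as a minimal sketch:

```python
# Rough check of the "CPU century" claim: a million CPU hours per year.
hours_per_year = 24 * 365          # one processor running around the clock
cpu_hours = 1_000_000              # approximate annual usage cited in the talk
print(cpu_hours / hours_per_year)  # ~114 CPU-years, i.e. a bit more than a century
```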
So, to generate data from that massive genetic sequencing of your microbiome: the data goes from the gene sequencer to his lab, over to Calit2, my institute (the Qualcomm Institute is the UCSD branch of it), and then at 120 gigabits a second over to SDSC, where he has a thousand-node cluster, and where the supercomputers also are. And if you need to bring data down, you can go across Internet2 and bring it from places like NCBI, which is where the genetic data is, at NIH. We also have a ten-million-dollar grant from IBM to tutor Watson in the microbiome, so that it can add that capability.

So the logical next step, which was a proposal I put in to the NSF a few years ago, is to link many of these campus Science DMZs together into a regional DMZ. We have all ten of the UC campuses, the three privates (Stanford, Caltech, and USC), one of the Cal States (San Diego State), and then up to the University of Washington. The only way we could afford to do this was because of the investment of all of the members of CENIC, the research and education network, over all these years, which has produced probably the best optical regional network in the world. Now, the interesting thing is that this isn't a networking exercise, although it is partly that; what we're really looking for is how it changes the workflows and use cases of scientists who are in distributed, multi-campus teams generating big data in many different fields. So there are over 50 top scientists who are part of this, as well as the CIOs and network officials from at least 30 campuses. To put this in perspective, the funding for this is about three FTEs a year, so this is probably the largest volunteer activity in the history of networking in the US, and it's a great experiment to see what happens.

Again, we couldn't do it without this amazing thing called CENIC. It's a non-profit; I've been on its board for 15 years or so. It provides networking to 10,000 K-through-12 schools in California, over a hundred community colleges, all 23 Cal State campuses, the University of California, the privates I just mentioned, and, the latest addition, almost 1,200 California libraries. In fact, the LA Public Library is the first library to have a 100-gigabit-per-second connection to CENIC, so in case you think libraries aren't a part of this, some of them are actually at the point of the spear. There are 20 million Californians using CENIC; it's not for commercial use, it's just the research and education community, but it connects 12,000 sites over 8,000 miles of optical fiber. It's just amazing. Having helped build that up over the last 15 years is why I knew we could do this in California. Most places are going to find it more difficult, but you have regional networks everywhere, and I'm working with the Quilt, which is the organization of regional networks, and there are many regional networks now following our lead on this.

Now, the thing is, if I come to you, hand you an optical fiber, and say the good news is you now have about a thousand times the speed of the internet you've been having, well, plug it into what? You've got that blue Cat 6 jack, you've got lots of ports for that, but what do I do with this thing? There are, of course, optical interfaces: most Mac laptops have had a gigabit or 10-gigabit plug for a long time, but take a plain PC and you can get a 10, 40, or 100 gigabit network interface card.
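Once a fast interface card is plugged into one of those dedicated paths, the obvious first question is what the path actually delivers end to end. A minimal, hedged sketch of the kind of memory-to-memory check people typically run, using the iperf3 tool against a test host on the far end (the hostname here is a hypothetical placeholder, not part of the PRP):

```python
import json
import subprocess

# Run a 10-second TCP test with 4 parallel streams against a hypothetical
# data transfer node at the far end of the path, asking iperf3 for JSON output.
result = subprocess.run(
    ["iperf3", "-c", "dtn.example.edu", "-P", "4", "-t", "10", "-J"],
    capture_output=True, text=True, check=True,
)

report = json.loads(result.stdout)
gbps = report["end"]["sum_received"]["bits_per_second"] / 1e9
print(f"Achieved roughly {gbps:.1f} Gb/s memory to memory")
```

Memory-to-memory numbers like this are only a starting point; the disk-to-disk measurements described next are the ones that matter to the scientist.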
So we decided to build on the commodity base. That's what I've always done, because the commodity base gets much more powerful every year without you spending any money on it. So basically, imagine a big-data PC. Now, if you're going to be accepting data at that speed, you have to have really fast storage, and rotating storage isn't going to cut it. So we use SSDs: we put terabytes of solid-state disks in these things, and more recently non-volatile memory that's even faster. That can keep up with this flow and keep TCP from backing off, as it traditionally would have done at these speeds, and particularly at these distances. So we developed this very simple concept called FIONAs, Flexible I/O Network Appliances. Because what we're interested in is not campus gateway to campus gateway; we're interested in going from your disk to the disk you're trying to reach, at these speeds. Now, that's an unnatural act, because if you're in a building on a campus, your department determines the networking in that building, then the campus CIO determines it on the campus, then there's the wide-area network vendor, then another CIO who does things differently on the remote campus, and then a different department. That's why nobody had the job of guaranteeing you good disk-to-disk connectivity. So I thought, that's great, I'll apply for that job, and that's what this grant basically does. So we invented these; Phil Papadopoulos in particular, on the Prism grant, came up with this, and you can see that for under $10,000 you can have these terminating devices. But the thing that's cool is, imagine you only need a gigabit per second sustained: we can do that for $250. So we're now having workshops where we train people up on these, give them to them, and then they go home, plug them into their network, and begin to debug their campus network and their relations to other campuses. Think of it this way: this thing is the terminating device for the wireless internet and Wi-Fi, your PC is the same for the wired internet, and these FIONAs are the terminating devices for the dedicated optical fibers.

What we do then is use GridFTP to move a 10-gigabyte file four times a day between all of the FIONAs. These rows and columns are just all the different endpoints. When we started, you'll notice the orange means we couldn't even get ping to work, but by last summer the green means you have five gigabits a second of disk-to-disk throughput, four times a day, as measured, and we have now extended this so that it's up at 30 gigabits a second, using these things called MaDDash, which, again, the Department of Energy developed.

Now, how many of you know about Kubernetes? OK, the rest of you need to get on the bandwagon here. This is an extraordinary thing, because fundamentally what we're doing is building a distributed computer, but it acts like a single computer, because the fiber optics are faster than the backplane of your local cluster. So although there may be speed-of-light latency, it's essentially acting like it's all in one rack. And who else does that? Google has created a worldwide set of data centers connected by fiber optics, and when you do one of the billion searches a day that Google handles, they replicate that at multiple sites, multiple data centers, just so that you don't lose anything.
The same goes for a lot of the other stuff; Google, Amazon, Microsoft, all these guys do that. So Google decided it had to figure out how to do this well. First there was the idea of containers: you take the software you want to execute and put it in containers, and Docker and other things have been doing that for years. But now at Google, everything that is software runs in a container, which means it can go around and drop onto any computer, and it has enough internal information to execute on that machine with no human intervention. Then you need something to orchestrate this flow of different containers, with different software, across multiple data centers and so forth, and so they developed that software into what is called Kubernetes, and they made it open source and gave it to us. That's hundreds of millions of dollars, if not more, of development and testing. Then many companies started saying, well, this looks like a good thing, we're going to use it, so in 2017 Kubernetes basically conquered the world, and there are now over 40 major companies, including all the cloud providers, that support Kubernetes. This is a whole other game, because now, with this middle layer, you can finally, after a 30-year journey toward this point, treat this distributed computer as a single computer: you just throw your container on it and it takes care of things. It's a little more complicated than that, but the point is that this also does something we haven't even begun to really think about: because the cloud providers have adopted it, if you're running on your local cluster or local computer and you have an account with one of the cloud providers, your work can just transparently spill over to the cloud when you want it to, whereas before there was always this big barrier: how am I going to get to the cloud? So we are now using this aggressively across the Pacific Research Platform.

But it gets better; here comes the part that should particularly interest you folks, and that's storage. As you know, Ceph was developed at UC Santa Cruz as an object storage system. Rook has been developed as an open-source file, block, and object store orchestrator that runs under Kubernetes, with Ceph running on top of Rook; it's cloud-native software. I don't want you to look at too much detail, but those are all universities that have these FIONAs, and you can look underneath them and see the numbers: the top one, UCLA, that's 40 gigabits a second disk to disk, and each FIONA has 160 terabytes of rotating storage, so we're deploying about two petabytes of rotating storage across this Pacific Research Platform. Now, we've been having a lot of workshops, and we'll have many more, focused both on the tech and on the applications: if you have applications, how do I get onto this thing, and so forth. But of course we were driven by the applications that we brought to the proposal, and I went out and hunted down the leaders of a bunch of fields that I knew were generating massive amounts of data and were multi-campus collaborations; life's not long enough to go out and find people who aren't collaborating with each other and get them to do it. So I figured, OK, we've got a lot of things to do here, let's start with the folks who already are. The LHC Tier 1, 2, and 3 centers around the world already know how to work with each other, so Frank Würthwein, who's the executive director of the Open Science Grid, where a lot of that computing goes on, is the co-PI on the Pacific Research Platform, by design.
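To make the Kubernetes point above a little more concrete, here is a minimal sketch of what "throw your container on it" can look like in practice: a tiny batch Job submitted through the official Kubernetes Python client. The image, job name, and script are hypothetical placeholders, not the PRP's actual configuration.

```python
from kubernetes import client, config  # official Kubernetes Python client

# A minimal batch Job: one container, runs to completion, then goes away.
# The image and names below are hypothetical placeholders.
job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "demo-analysis"},
    "spec": {
        "template": {
            "spec": {
                "containers": [{
                    "name": "analysis",
                    "image": "example.org/lab/analysis:latest",
                    "command": ["python", "run_analysis.py"],
                }],
                "restartPolicy": "Never",
            }
        }
    },
}

config.load_kube_config()  # use whatever cluster your kubeconfig points at
client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```

The same manifest runs unchanged whether the cluster underneath is a rack of FIONAs or a commercial cloud, which is the point about the barrier to the cloud disappearing.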
But here are the use cases we started with. Let me give an example: when you couple the local-area Prism network on the UCSD campus with the Pacific Research Platform, which, by the way, goes all the way to Chicago, to StarLight, as well as to Hawaii, he is then looking at big file transfers from his cluster up in the physics department on the UC San Diego campus to Fermilab, the Tier 1 center for the CMS experiment. What you'll notice here is that these transfers are about five minutes apart at the bottom, and that's 30 gigabits per second sustained over about 15 minutes. So actually sustaining these rates, not just peaking, is now quite possible, and this is across the campus, across the wide area, and back into Fermilab.

Now, what that allows you to do: we worked with the Santa Cruz folks, who are really very good on this Science DMZ thing and have a great team there, and they also have a lot of astrophysicists and astronomers, one of the best groups in the country, and they've done a lot of supercomputing at NERSC, the DOE's supercomputer center at Lawrence Berkeley Lab. Well, how do you get your data back? You say, if I have terabytes of data at NERSC, how am I going to get it back, and what am I going to do with it? It turns out these folks have a cluster of over a thousand nodes just for astronomy and astrophysics at Santa Cruz, so they can do a lot of science analysis of the data locally, but they had to get it back. We now have a 100-gigabit-per-second link between the Hyades cluster at Santa Cruz and the NERSC supercomputer Cori, and that was such a major step forward that CENIC, at its annual meeting earlier this year, gave them its Innovations in Networking award for the year, which is a big deal.

But that was the astrophysicists; the astronomers who do observing are also at Santa Cruz, and they are members of several large telescope surveys, with the computing being done at NERSC. One of the telescopes is in Chile, one is at Palomar, and essentially they're looking at the whole sky every night, and anything that moves, changes color, or changes magnitude compared to the way it was last night, that's what they compare at NERSC; then an alert goes out to go look at that spot. It can be asteroids, it can be supernovas, it can be a lot of different things, but you can see we're talking nearly a terabyte per night. Many of you probably know about the Large Synoptic Survey Telescope NSF is building, which will actually have first light next year; between Chile and the computing, which is at NCSA, there are two 100-gigabit optical fibers. They'll be looking at billions of things in the sky, most of the observable universe, and within minutes of an observation, the notification that something changed in this piece of the sky goes out to the observers. You know how many alerts there are? We used to call them telegrams; as a person who did observational astronomy in multiple ways, radio, optical, X-ray, satellites, for years, we used to call them telegrams. Well, there are 10 million a night. So we are going to need machine learning just to figure out what to point my telescope, or which satellite, at every night. This starts next year. So what we're using are these smaller surveys to work out the use cases: how is it that the end users, not the people bringing the data from Chile to NCSA and computing on it, but all those people around the world who are going to use it, how are they going to use it? How are they going to say, let me have all the novae for the last three months in this section of the sky, so give me that subset of data? And so we're figuring out how to do that, and how to build the software to make that possible.
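As a rough illustration of that kind of subset query, and not any project's actual interface: assuming a hypothetical table of nightly alerts with sky coordinates, a classification label, and an observation time, the "novae in this patch of sky over the last three months" request might look like this.

```python
import pandas as pd

# Hypothetical alerts table: one row per alert, with right ascension and
# declination in degrees, a classification label, and an observation time.
alerts = pd.read_parquet("alerts.parquet")  # placeholder file name

recent = alerts["obs_time"] >= pd.Timestamp.now() - pd.Timedelta(days=90)
in_patch = alerts["ra"].between(150.0, 160.0) & alerts["dec"].between(-5.0, 5.0)
novae = alerts[recent & in_patch & (alerts["label"] == "nova")]

print(f"{len(novae)} candidate novae in this patch over the last three months")
```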
Now, the NSF has also funded on many of your campuses what are called cyber engineers, or cyber teams, and these are really important people. This, for instance, is Shah Dong here, who's been very helpful; this is him receiving his rack-mounted FIONA. He comes from astronomy and physics, so he knows the science, but he can also work with the networking people, and these people are just incredibly valuable; if you have one, find a way to keep them.

Let's go to human genomics, and in particular cancer genomics. As you know, the revolution in cancer therapy has been the realization that it's not just that I've got breast cancer or ovarian cancer or brain cancer or whatever; cancer is an information disease. The cell that became cancerous was your human cell, it had your DNA, and now it's got a mutated version of that DNA. Well, where are the mutations? Because that's the cancer you've got, because that's what it's interfering with, whatever those turn out to be, either control regions or genes or something. So all of the data from patients all over the country, NIH put together into a single database; David Haussler was the PI, he's at Santa Cruz, and he housed it at the San Diego Supercomputer Center. Why? Well, look here: this is the downloads of that data to the community I was talking about, but now imagine health care facilities all over the country. That's eight gigabits a second in June of '13; here, two years ago, it's essentially sustaining 10 to 15 gigabits a second, and this is over weeks. So it's not like I'm inventing this networking stuff because I like networking; it's because the need has already been there but hasn't been met with a solution. Then, as often happens, the grant ended and went to the University of Chicago, so how were we going to handle that? Well, it turned out that because we're at StarLight, and because we've worked for many years with Chicago, we could just flip our links over to Chicago and it keeps going; anybody coming in through the PRP wouldn't really know the difference.

Now, the amazing thing is that as we began to get some of these successes, and I won't take you through all the other fields, all of a sudden new people came to us and said, we have things we'd love you to work on. We of course aren't, quote-unquote, funded to do that, but we look at which cases make some natural sense, and I'm just going to take you through a few of them. First is Jupyter. How many of you use or know about Jupyter? Right, it's become essential, the lingua franca of data science and data computing. Fernando Pérez, who developed IPython at Berkeley, created it, and we've worked very closely with him. So we plonked a big FIONA, what we technically call a hunky FIONA, down at Berkeley, and then we connected them at 40 gigabits, so we basically now have this backplane for California. Our goal is to have Jupyter everywhere, so that anybody who's doing data science is recording it in these electronic notebooks that have URLs, and then you just give somebody the URL, like you would from Google, and they have the live software and the live data and they can execute it.
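One hedged sketch of that "live notebook" idea, assuming you have fetched a collaborator's notebook file: Jupyter's nbconvert tool can re-execute it top to bottom, so you are rerunning their actual code rather than reading a static copy. The filenames here are placeholders.

```python
import subprocess

# Re-execute a downloaded notebook and write the refreshed result.
# "analysis.ipynb" is a placeholder for a notebook shared by a collaborator.
subprocess.run(
    ["jupyter", "nbconvert", "--to", "notebook", "--execute",
     "analysis.ipynb", "--output", "analysis-rerun.ipynb"],
    check=True,
)
```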
And then, for those of you who are in the social sciences and humanities, one of the things that has been going on a lot is cultural heritage preservation. Here we have a project that we're supporting with Jeff Weekley, here in the front row, at the University of California, Merced, where we're building kiosks for virtual reality of, for instance, lidar or other ways of recording cultural heritage sites around the world in incredible detail. In fact, we have a University of California Office of the President Catalyst award to Tom Levy, one of our faculty, to go to places that are in danger, whether of being blown up, or of earthquakes, natural disaster, or sea-level rise, and record them in enough detail that we can study them essentially remotely, and long after they're gone. These kiosks are tied together, and you'll notice these are 24 or 48 megapixels, so 24 or 48 times the resolution of your desktop, and all of the data is stored in these large storage systems.

Then the newest one, which in California we care about a lot, is applying this to wildfire early detection and to following a fire's development, and this also got one of the CENIC awards at this last big meeting. There are cameras out on a wireless high-speed network that Hans-Werner Braun developed under the HPWREN NSF grant over about 15 years, and this shows you one from last October near San Diego; we're also able to bring down the satellite images of where the fires are, and do predictive modeling, now quite accurately, in real time. The way HPWREN works is that you have these mountaintop-to-mountaintop wireless licensed backbone links, now up to over 200 megabits a second, and then links down into the valleys; we connect all the seismic sensors along the San Andreas fault, a lot of ecological reserves are on this, and so forth. But there are cameras looking in all four directions on all the mountaintops, so we get early detection, within 30 seconds or so, of any plume that is seen, and people are looking at automatic detection and things like that. This shows you, just in 2014, up north of where I live, four simultaneous fires going on during the wildfire season, and the thing you see in the lower left is the number of hits on the servers. Now, in the past that data was just coming into a server, say at the San Diego Supercomputer Center, or one at SDSU, but because those are all on CENIC, we just connect those servers with the PRP, and all of a sudden you have data redundancy, you have disaster recovery, and you have high availability during these 10-to-100-to-1 spikes in people hitting the servers, because when the fires go, this is about the only place the public can go to look and see whether the fire is coming toward them. You remember the Napa and Sonoma fires in October, and then the Ventura, LA, and San Diego fires in December: during Napa and Sonoma we had over 800 million hits and millions of individual users. So Ilkay Altintas, who's the chief data science officer at the San Diego Supercomputer Center, has an NSF grant called WIFIRE, and I'm co-PI on that. What we can do is take in data from about 250 meteorological sensors in San Diego County alone, which SDG&E, our utility, has put in place, and then, for the weather, we basically have all the weather data in the United States on a giant fire hose that we hook into, and that goes through the PRP into the Comet supercomputer.
Then, through a workflow I won't take you through, it takes all these different layers of landscape data, so we go into lots of databases that describe the area of California involved: what's the vegetation, is it watered, is it paved, whatever, all of that. And then we have more and more helicopters, fixed-wing planes, and drones, under the proper control of the fire authorities, not amateurs, and that gives us information about the fire perimeter, which goes into a code called FARSITE. Essentially, if you know the topography, you know the wind field, you know the temperature and humidity, and you know what the ground cover is and its ability to burn or not, you can imagine that you can predict the fire, and that's what this does. That then leads to these kinds of fire maps, which are being very widely used, and we're working directly with most of the fire authorities in California on this.

Well, a different one: just the other weekend, San Francisco, Half Moon Bay, and Monterey were totally drenched because of what they call an atmospheric river. That's the way the Pacific Ocean's evaporated moisture ends up going across California, and if those rivers hit you, you get a real deluge; if they don't, you get drought. So there's a Center for Western Weather and Water Extremes at UC San Diego's Scripps Institution of Oceanography, and then up at Irvine there's a place I was just visiting, where Soroosh Sorooshian, a very senior guy in atmospheric science, keeps track of all precipitation events worldwide. So you can imagine downloading that into the Scripps center to actually forecast the precipitation, or not, which in California is more or less life and death. And so we've hooked together these two centers, along with NASA and so forth. It was taking 20 days to move the data for one computation, which isn't going to help you a whole lot for a weather forecast; using the PRP we've now got it down to 20 hours, and by the end of this year it should be down to 20 minutes. Now, if you're doing one step of science, or one step of public awareness like a weather prediction, doing it once a month versus once a day versus once an hour is completely transformational. And this takes me to the last thing I'll talk about, which is that they have this huge object database. Imagine a thunderstorm, a convective storm like this; we used to simulate these at Illinois. It comes up, forms the anvil top, but it's also racing along the ground at 50 or 60 miles an hour as a nonlinear, self-excited entity in the atmosphere. They keep track of those precipitation events as they evolve, each one is an object, and they put that object in an object database, and they do this for everywhere in the world, for years. Now you want to go in and do machine learning on this, to begin to understand where precipitation events are more likely to begin, where they go, and so forth, and so we're now working on adding machine learning there.

Here's another place where machine learning is coming in. This is an undersea microscope that we hooked up; we put fiber off the Scripps pier into the ocean, and now this thing can look at these zillions of phytoplankton, which are the base of the food chain. Among other things, with ocean acidification, these creatures have calcium carbonate skeletons, which dissolve as the sea gets more acidic from absorbing all the CO2 we're putting into the atmosphere.
And just the diatoms: every fifth breath you take, the oxygen in that breath came from the waste product of the diatoms in the ocean. So if these guys go away, you'll be working hard to get that air, and so fundamentally understanding the biology and ecology and the changes in this is very important. Well, this instrument had, when I made the slide, 300 million images, which now need computer vision and classification to understand what's going on; it's now over a billion. So we put another grant together, which has been funded and started last October, to build a cognitive hardware and software ecosystem on top of the Pacific Research Platform, because now that we can store the data, move the data, and compute on the data, everybody wants in to do machine learning on the data. What we realized is that we had these FIONAs, which are PCs with slots in the back, and you can put eight gaming NVIDIA cards, 32-bit, in the back, and these are going to be fully containerized, run Kubernetes, and run a wide variety of algorithms. Now, about these GPUs: if you go to Amazon Web Services or any of the high-performance centers, they all have the 64-bit, double-precision NVIDIAs, and these things in PC form are something like an order of magnitude less expensive. The noise in the data you're working on is often such that doing it in double precision is not actually getting you higher accuracy. There are moments when you do need double precision, and by all means then you should use the NVIDIA cards that do that, but for an awful lot of academics, who are also much more price-conscious than corporations, this is quite a change.

But it gets better, because CPUs and GPUs are both von Neumann architectures, which we've been working on for 60 or 70 years, and there's a whole new set coming. As we begin to reverse-engineer the brain, which is really picking up speed, the idea of taking architectural notions from the way nature evolved information machines over hundreds of millions of years is leading to things like IBM's TrueNorth. This was the cover of Science almost four years ago; it's a spiking neural net hardware accelerator, and Dharmendra Modha, who is a PhD product of UC San Diego and IBM's chief scientist in this area, is talking about these parallel versions. The first thing Lawrence Livermore did was get a 16-way one, and more recently the Air Force got a 64-way one, so this is beginning to be parallel neuromorphic computing. And if you look around, you'll read that the Chinese and the Japanese are building AI supercomputers, so machine learning accelerators are really a big thing. So, two and a half years ago, I started, with our most senior professor, Ken Kreutz-Delgado, a machine learning algorithms and pattern recognition lab at my institute, at Calit2, and in that box is the first of the IBM TrueNorth chips coming to San Diego. The idea is this: whenever you hear about deep learning and AI, what they're really talking about is multi-layer neural nets, an idea that is 30 or 40 years old but, because of the vast rise in data and computing power, is finally becoming usable quite generally. That's only one of the modes of machine learning and statistical machine learning that people use, and so we are taking all these algorithms, putting them in our pattern recognition lab, optimizing them on these different architectures, and then measuring the energy efficiency, which may be several orders of magnitude better than, say, a GPU or a CPU, and also the speed.
Well, this time last year we had essentially zero GPUs of this kind on the San Diego campus. I convinced Mike Norman, who runs the Supercomputer Center, and Frank Würthwein, whom we mentioned before, that there was going to be a need for these for science applications because of the rise of data science, and so they got 48 of them. Our virtual reality cave alone has 70 of the NVIDIA 1080 GPUs, and we have another 48 sitting around on other things we do with visualization; this grant will provide 96 more. And then, something you all should be thinking about: if your campus is going to be competitive, it had better be getting an undergraduate major and a graduate major in data science. For instance, our undergraduate major in data science at UC San Diego starts this fall; it currently has zero undergraduates, and in the fall it will have 700. Now, where are they going to do their computing? I can guarantee you that many campuses do not have any GPUs for their students. So we talked to Vince Kellen, a very forward-looking CIO at San Diego, and we now have 88 of these: what they did was take our FIONA8s, the FIONAs with the eight GPUs, rack them up, and make them available, and they put four courses through in which the students can now run their algorithms on the GPUs and get that experience. By the way, the way I figured out that this was the thing to do is that I was standing outside the coffee cart between Calit2 and the computer science building, and I heard these students behind me saying, man, if I haven't done a project using TensorFlow and running on a GPU, I can't even get asked to an interview.

But it's not enough to just have the GPU, so around this whole thing we're building, over the PRP, multiple clouds of alternative architectures, such as the 64-bit GPUs I mentioned that are in both the NSF XSEDE resources and Amazon Web Services. And then AIST, whom we've worked with for a long time over in Japan, is building one of these AI deep-learning supercomputers with over 4,000 NVIDIA 64-bit tensor-core GPUs in it. And then there are the non-von Neumann ones. It's not rocket science: basically, once the Google CEO said we're going to put deep learning into everything Google does, the guys in the back room came back and said, you know, that's going to be a lot of computing, right? And he said, yeah, well, we've got a lot of data centers. Yeah, well, we're going to need another dozen or more hundred-million-dollar data centers to do this. And somebody said, wait a minute, what if we just go into TensorFlow, look at the compute-intensive core, and turn that into an ASIC, a specially designed non-von Neumann chip that accelerates just that and doesn't do anything else? Those are their TPUs now, and as a result they didn't have to build those extra data centers, and it's the use of those TPUs that actually allowed them to beat the Go champion. Microsoft has done the same with FPGAs, field-programmable gate arrays, which have been around 30-some-odd years, another non-von Neumann architecture that is programmable in silicon, and they're now putting one of those in every one of their Bing servers. So this is not esoteric; it's totally routine; every day you're using them and you don't even know it.
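The kind of minimal TensorFlow exercise those students outside the coffee cart were talking about, offered here as a hedged sketch rather than any course's actual assignment, is only a few lines; on a GPU-equipped node, Keras places the heavy math on the GPU automatically.

```python
import tensorflow as tf

# Tiny classifier on the MNIST digits; on a GPU-equipped FIONA-style node,
# TensorFlow runs the matrix math on the GPU without any extra code.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))
```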
IBM we already talked about, and then there's a ton of new startups doing neuromorphic computing: KnuEdge in San Diego, and another San Diego one was Nervana, which Intel bought. So Intel is now integrating machine learning accelerators into its own chips, because it bought Nervana, and it bought Altera and their FPGAs as well; that's where Intel sees this going, integrating all of these. But I just want to make it available to academics, so we can keep things going.

Well, the last thing I'll say is that there has been so much interest in this that we were asked to have the PRP fund a National Research Platform workshop. We have been told by NSF, in our five-year grant, that we should at least explore what it would take to scale this; we're not supposed to go build it, and we certainly don't have the resources to. So we pulled together a lot of the regional networks and a lot of the other folks who are interested, and we're having a second one in August in Bozeman; registration is now open. And in particular, as I've done for the last 10 or 15 years, we're working with minority-serving institutions and EPSCoR, the underfunded NSF states, to get them access to a set of workshops that are very hands-on, very hardcore, in the four days before this workshop. The final thing I'll mention is that we've also gotten a lot of international interest. We actually had the Netherlands as a part of this, so they have a FIONA over there now, and I was just talking to Chris Hancock, who's head of AARNet, the network in Australia; I mentioned Japan, we're hooking up with them, and with KISTI in Korea we just showed that you can get five gigabits per second disk to disk to Korea, and this long-distance TCP/IP, which used to be the bugaboo of networking, works just fine. Even Guam: you wouldn't know it, but Guam has become a new center in the Pacific for fiber optics, and there's a big thing going on there.

So I'll just leave you with this. Fundamentally, after all these years, 30 years, we finally have built this thing, and it's of course just a breadboard. With SDN coming, software-defined networking, and other things, by the time we're ready to deploy this nationally, say in 2020 or so, what we made it of is not what it will be built out of, but we just showed that it's possible, and we've got the use cases. But I've got to say, there is essentially nobody in digital libraries, nobody in data discovery, annotation, or curation, working with us, and we'd love to do that. So if you're interested in that and you have a way you'd like to be involved, we can't fund you to do anything, but almost everybody in this project is unfunded as it is, including me. We've now got this massive amount of storage, but, as you well know, there's a big difference between working storage and archival storage; it's not sorted out how we're going to deal with that, and who's going to do what. It's the same big problem, but now massively larger. And then, how do you get application teams who are doing just fine doing their science the way they're doing it, and writing a lot of scientific papers, to adopt this? How do you think about cybersecurity at these speeds? Obviously, if something goes wrong, it goes wrong really fast, so cybersecurity is really important. And then I believe we're just going to adiabatically ease into using the cloud much faster than people tend to think, so Kubernetes is going to help there. But I can tell you that this is not mainly a technical problem.
This is fundamentally a social problem and a social challenge. Just take the networking experts we've got: we have 20 or 30 of the top networking experts on the West Coast, and they all just did their local thing. Two years ago we got them to start having one-hour phone calls every week to debug the big matrix I showed you, to figure out why all of a sudden some green square had gone red, and now they just take it for granted that they're all collaborating together. That was a social transformation. Anyway, let me thank our sponsors and take your questions.

I'll start us off. That was wonderful and terrifying, really, both at the same time. I'm so inspired, but I'm also really, really worried, because, to your point about the undergrads who were standing behind you chatting about needing to know TensorFlow, they're on the plus side of the digital divide. My campus, which is Davis, has 30,000 students who are on the other side of the digital divide, and they're showing up for classes just to learn basic Python, R, and Jupyter skills. I don't even know how to begin to get them to awareness of this, and they're in fields like environmental science and biology; they're the next generation of science. How do we even begin to close that divide a little, when it's just getting worse every day?

Well, there are a variety of divides. In particular there's the age divide: the younger they are, the more likely they are to be picking this stuff up on the street, if they happen to be computer science majors. It's really amazing what kids think is reasonable to do these days; back in my day, in the 60s, we were really well behaved. But it's a great question, and I think there's going to be a new digital divide among campuses, between those that understand that the future is already here, that this is not about the future, it's about the past, and are deciding to move aggressively in that direction, and those that aren't. You see it all over the country: campuses setting up data science initiatives and majors, hiring people. Berkeley is going to, I think in the next year or so, require every Berkeley undergraduate to do anything that involves something digital in Jupyter. So the campuses that haven't even started on this, that's the new digital divide, and I think that's why I'm here to give you this talk. It's not like I need to sell you anything; the exponential has been coming for a long time; it isn't as if we didn't know this was coming. But academics are pretty slow to move, and this is pretty fast change.

Well, and our educational systems aren't set up to keep pace with this degree of change; it takes years to get a new class approved, much less a new major. So is there a different model we should be looking at?

If you sit around and ask what the multi-hundred-billion to trillion-dollar-a-year industries are in this country that have not yet been digitally disrupted, I would say the two biggest are the medical community and education. So I think the time for total disruption in both of those areas is the next 10 years. Other questions? I don't mean to scare you.

I'm curious whether you have thoughts about the impacts of all this on preservation and on research reproducibility, given the level of infrastructure you're describing.
As someone who does work in software preservation: you're talking about custom TPUs and all sorts of custom hardware. Are there embedded thoughts in all this about how to replicate some of it in the future, or preserve the containers, that type of stuff? Is that part of the conversations?

Not nearly enough. I mean, it ought to be automatic: the metadata ought to be auto-generated as to what the container was, what the version of the software was, what path it went through; all that stuff should just be automatically collected. The software knew it at the time, and there are examples of that. But that's why I'm saying, you are the people who are experts in that area; metadata is just the most important thing, and you can't really do decent data discovery without it. I ran a Gordon and Betty Moore 25-million-dollar grant to build the first global microbiome repository for data, and we made it so you couldn't put the data in without the metadata, and then we set it up so you could do metadata searching, not just data searching, which is what is still pretty much all that's done even at national repositories like GenBank or NCBI. So that's why I say it. And that's what Vint Cerf, who I've worked with for 30 years and on whose advisory board I serve, has been talking about recently: the coming digital dark ages, where basically all this stuff just disappears, because no one is paying attention or putting enough resources in. If we don't do what you're talking about, it will disappear. And I keep coming back to CENIC: it's so much fun doing this, and I think most of you, if you're on campuses, will find that you are on the map, so you have somebody who's a cyber engineer, or you've got a CC* grant; look them up, figure out where you are, who's doing what, what's going on in data science.

Yes, sir. John, this is really great stuff, I really appreciate it. One of the things I was curious about is, at the end of the FIONAs, when you're downloading these massive amounts of data, I get that you need to do it on SSDs, but then where does it go? It piles up at the end of the pipe, I assume, but then how do you drain it?

Right, so we put in commodity disks as well, 160 terabytes of rotating storage, and basically the SSD acts as a capacitor in front of the rotating storage, and beyond that we're putting out two petabytes. But look: how many of you use Open Science Grid, or know anybody who does? OK. Do you realize that Open Science Grid, which is basically linking together clusters all over the country in a high-throughput environment, delivers more CPU hours per month than all the NSF XSEDE supercomputer resources put together, just to the academic community? And about half of it is particle physics calculation, but the rest is biology and chemistry and all kinds of things. They're deploying a very large distributed data system as well, and I think the distributed data systems here are actually more exciting than the computing piece, because at the end of the day it's all about the data, stupid.

Yes. So maybe a softball, or a lighter question: in all your long years of doing this kind of work, and given that you started as an astrophysicist, what was the one discovery or event that delighted you the most?
Can I talk about physics? Absolutely. The discovery by LIGO of the gravitational radiation from colliding black holes two years ago, which won the Nobel Prize last year. For my PhD thesis I was the first person to take Einstein's equations for general relativity, in their full nonlinearity, put them on a supercomputer, and calculate the head-on collision of two black holes, a term that at the time was about five years old, and calculate the gravitational radiation that came off of them. Then I held a global workshop for two weeks and edited the book Sources of Gravitational Radiation; the first chapter was by Rai Weiss, who got half the Nobel Prize, and the third chapter was by Kip Thorne, who got part of the other half. So I waited 40 years for that moment, and I always believed it was inevitable. You only have to measure a change of a hundred-thousandth of the diameter of a proton out of five kilometers to see it, so it's probably the most sensitive experiment humans have ever done, and it took the National Science Foundation amazing patience, a commitment to build two generations of LIGO at over a billion dollars, just on the belief in the laws of physics. So that was pretty exciting. Thanks, all of you.

Well, thank you for that tour through the present and the near future. It's striking, listening to that, how much of the kind of predicted future is really here, running today, and I think there's a very important message to take away from that in terms of the genuine urgency of grappling with some of these developments, and I thank you enormously for opening everybody's eyes to that. This has really been wonderful; thank you so much, Larry. We will have these slides available, because I know a number of you will want to track some of these individual projects and things like that. And with that, we are adjourned. I wish you safe travels. I hope to see most of you in December, and I'm sure I'm going to see many of you in many other places between now and then. Thank you for coming.