Okay, so, switching to the Second Life time zone: good morning, and it's beautiful weather out there. In Italy it's evening and it's incredibly hot, so I am buried inside with the air conditioning working.

Tonight we're going to talk about big data, a topic which has become very, very relevant and popular, also in the media, over the last months, due to the Cambridge Analytica business and similar things. Big data are the present and for sure are the future of society, especially of the technological advancement of society, and, as we shall see, they are like the old Janus: basically two faces, you know, the good one and the bad one. It is always like that. With physics it was the same: physics was one of the most significant advances in our understanding of nature, but it opened the way to the atomic bomb as well as to energy, to hadron therapy, to medical treatments. So every time we discover something new, or we have a new scientific paradigm, there are always good sides and bad sides. Therefore this talk will approach both aspects. First of all the good part: big data are for sure going to change our life, and change it for the better. Then at the end I will spend a few words about the dangers which are built into the big data world.

Let's start with a short [unclear]. There have been many moments, not too many moments actually, in the history of humankind when human society has undergone deep revolutions in thinking, in social structure and in political structure. We have, for instance, the 6th century BC, when mankind, the Greeks, discovered rationality [unclear]. But in more recent times we have had a few revolutions which we can clearly identify: one in the 16th century, with the Renaissance and science; another one in the 18th century, when our way of life changed once and forever; and then a more conceptual one, with three great scientists of our age: [unclear], Freud and
Einstein. All three of them, in different ways, contributed to removing the last leftovers of anthropocentrism: they moved man far away from the center of the universe.

I will try to convince you that what we are undergoing right now is just that: the dawn of a revolution, which is made out of four pieces. First, big data; I shall spend a few words on what they are. Second, computer networks, the internet. Third, artificial intelligence. And fourth, intelligent human-machine interfaces, which are often neglected but are a crucial piece of the puzzle.

The technological evolution has been incredibly fast. I'm 61 now; when I was born, in 1957, the first satellite was launched, and it seemed a sort of miracle. In 1967 the first personal computer was invented, surprisingly enough in Italy, by Olivetti: the so-called Programma 101, which was the first portable PC. Then in 1969 humans arrived on the moon, and then the technological evolution began to speed up. In 1975 we had the first handheld programmable calculator, which was the Hewlett-Packard 25. In 1977, the [unclear] phone; at the time, as you can see from the picture, they were the size of a shoebox, but they were a big advancement. In 1978, the first supercomputer accessible to [unclear]. Then we have the Space Shuttle and the first Macintosh. In 1992, the World Wide Web: you see there the first picture, which was put online by the technicians at CERN, the European Center for Nuclear Research, advertising a group of musicians made up of CERN scientists, the so-called Cernettes [unclear]. In 1993, the smartphone. Things speed up: in 1997 the first social network, which was SixDegrees; in 1998 Google; and then only in 2004-2005 we had Facebook, the first modern social network. After that, in 2011, we have the iPad and Siri, which seems less important than the others, but, as you shall see, it is not so. And so on into 2017. Obviously this exponential growth has continued, reaching levels which were almost
impossible to think of only a few years ago. Why? First of all, because the very idea of what a CPU is has changed. Nowadays processors are [unclear]; each of these processors is much more powerful than the first desktop computer processors. Processors are ubiquitous, so you can find them everywhere. Recent estimates say that there is more than one processor for each human being alive on this planet, so six billion processors hiding in your cars, in your cell phones, in your washing machines, in any domestic appliance, in your PCs and so on. And these processors, remember, have a storage capability as well as computational capabilities. Therefore they really are an impressive network, connected via the internet.

These processors have changed our lives. These billions of processors are in continuous contact with each other; they share their data, they connect their information. This information is stored in databases worldwide, and in these databases we find all types of information: networks of sensors monitoring the environment, the level of pollution, seismographs monitoring the seismic activity of the Earth, sensors monitoring the humidity of the soil to optimize crops, webcams all over the place, both for safety and for monitoring. All these things are connected, and all this information is moved either through dedicated networks or through the normal internet, to form a sort of electronic print of everything which happens around us. Yes, as Mike said, the Internet of Things. The Internet of Things is reaching a huge level.

The amount of data is unbelievable. When I began to work, the whole amount of data available on digital devices in the world was of the order of a few hundred gigabytes, let's say. Nowadays it is already 10 to the 24, a 1 followed by 24 zeros, bytes, something which is called a yottabyte, a unit which I think most of you have never heard of. And in two or three years we expect the amount of information to grow to
the level of a brontobyte, which means 1000 yottabytes. It is a huge amount of information. At the moment the amount of scientific data in this cloud of data is [unclear] of the total, but this fraction is bound to remain more or less constant. Just to give you a comparison, the storage capability of a human brain, and I mean a human brain like Sheldon's in The Big Bang Theory, is [unclear], so millions or billions of times smaller than the amount of data which are nowadays available on the internet.

All these data, or at least the largest part of them, are connected via the network, and the network is an incredibly complex structure. There is no way we can even grasp the complexity of the network; not even Google is capable of producing a real topology of the web. It is estimated that nowadays there are between four billion and five billion nodes, at least for what can in theory be scanned by normal search engines like Google. To this you must add the dark web, which is as big as the traditional web. So basically it's a huge number of nodes. Each node is both a source of data and a repository of data and, what is more important, is also a source of computing power. Therefore for the first time we have a huge number of processors connected together, sharing an amount of data which is beyond any human understanding. This by itself should give you an idea of the revolution which we are facing.

The third part of the topic, as we said, is artificial intelligence. If you go on the web you find hundreds of definitions of intelligence, but summarizing, they basically boil down to the two things which are written there. Intelligence may be seen as the ensemble of abilities which allow man to act with a purpose, to think rationally, to interact with the environment, to create new concepts; this is one interpretation. Or it may be seen as the capacity to analyze a large amount of data, find patterns, make predictions and take decisions on the grounds of these facts. Whatever you define as artificial
intelligence, scientists, who are basically operational people, have another, operational definition of intelligence, which is the so-called Turing test. If you put a human being on one side of a black screen and a computer on the other side, and you cannot see which is which, and you ask a question and you cannot tell whether the computer or the human being is answering you, then the machine is intelligent.

Let me show you just a few aspects of the modern world. Already machines can beat humans in activities which a few years ago were considered to be the epitome of human intelligence: chess playing and Jeopardy. Deep Blue by IBM beat the world chess champion in an open confrontation, and Watson won Jeopardy, beating all the best human opponents. As always happens when you are dealing with artificial intelligence, which seems to be the last form of anthropocentrism (we have a bias that intelligence belongs only to us), people said: well, but that is not intelligence, you can put it in the form of an algorithm.

But then what do you think about what Google does every day? I'm sorry I didn't translate this initial sentence, because I got it from an Italian book, but basically it's a sentence from Tonio Kröger by Mann, which means: lucky are the people whose eyes never go to that faraway point where everything becomes difficult and sad. Semantically speaking, this is equivalent to "not everyone can understand the complexity of life". Well, Google not only can find this sentence in all data repositories, but can also find all the sentences which are semantically equivalent. In other words, Google's algorithms can interpret the meaning and find all the equivalent sentences all over the web. Google can also find all images, videos and web pages which contain, for instance, the image of a chicken or the image of a vulture, and with good accuracy. Or
Google is also beginning to translate every sentence, every book, basically into any of the 295 existing languages. This already tells you that artificial intelligence is here. It's not something which will come in the near future; it is already here, to the point that we have cars which drive from Paris in France to China without any human intervention. We have the automatic landing of planes: most of us have landed several times under the guidance of an artificial intelligence algorithm rather than the guidance of a pilot. What is more important, in 2015 NASA landed Curiosity on Mars, and this was done completely under the control of the machines. There was no other possibility: Mars is 14 light-minutes away and the whole landing lasted seven minutes, so by the time the signal that the landing procedure had started reached the Earth, 14 minutes later, everything had already happened. There was little possibility to intervene.

Therefore artificial intelligence is already here, in specialized ways. We still don't have something as fuzzy as the human brain, which can do many different tasks with more or less the same level of lack of accuracy, but on the other hand we have specialized algorithms which can reproduce and even overcome the limitations of the human brain.

As for the safety of all this, well, there are different aspects of the problem. I won't go into the details, but I am always a little surprised when I hear discussions in the newspapers and in the media about the privacy of communications and this type of thing, because each single byte of information which moves across the internet (phone calls, SMS, WhatsApp, Skype, et cetera; they all move on the internet, even normal phone calls, which go through optical fibers), each single byte which moves is stored in the listening station in Bluffdale, which was the first place in the world where they built a storage of one yottabyte of data, with a technology which is still largely unknown
to the scientific community. Maybe it will slowly leak out, and in 10 years we shall understand how they are capable of indexing, organizing and extracting information from it. And all this information is continuously monitored, analyzed and decomposed by artificial intelligence. The four pieces of the revolution are already all in place.

Fields of application of big data: finance, from the trend of the stock market; marketing, where even the way people put items on the shelves of a supermarket is decided using big data and artificial intelligence algorithms; domotics; environment; materials; telemedicine; genomics; bioinformatics; astroinformatics and so on; digital libraries; supply management. Whatever you can think of is a field of action of big data and is being affected by it.

It is therefore no surprise that the most sought-after segment of the job market is the so-called data scientist. The demand from companies, from public administration and from academia for data scientists, people capable of handling this data flow, is undergoing exponential growth, while the number of people having this type of competence is still growing linearly. Therefore, if you have a young child, a young son, with some skills in mathematics, do not hesitate to recommend that they pursue a career in this field: they will be unemployed only for a few picoseconds. The very same moment they get their degree they are enrolled by companies, by banks.

Let me now move to the first side of this Janus-like nature of big data. The scientific community was well aware of this already in the early 2000s. Microsoft sponsored, let's say, a book, but actually it was much more than that: it sponsored a whole line of research which ended up in this book, The Fourth Paradigm, which you can find on Amazon, where basically Jim Gray, Alex Szalay and other great data scientists realized that the way we do science nowadays has changed. The two original paradigms of science, which were experimental science and theory, theory
intended as the analytic description of the world, models and physical laws, were surpassed, and two other paradigms needed to be added, which had different methodologies. The third one was simulation. Simulations were made possible by supercomputers and were already present in the sixties, because they appeared to be the only way to deal with complex phenomena, from cosmological simulations to atmospheric simulations to complex materials. And nowadays (years ago it was new, but now we can say it is well established) there is a fourth paradigm, which is data-intensive science. Data-intensive science is basically a sort of synonym of artificial intelligence, statistical learning, machine learning, data mining, whatever you want to call it, because the situation nowadays is that most data are never inspected by humans. Already now, in astronomy and in high-energy physics, 99.999 percent of the data are never looked at, inspected or seen by a human eye, which means that most of the data analysis and most of the data compression is performed by machines through smart algorithms. What is more important, these data are far too complex to be visualized and understood by the human brain; in a few slides I will show you what I mean. And the world is far too complex to be understood with analytical laws: the world can only be understood through the comparison of complex data with simulations. So this is what e-science is all about: the data-driven approach to scientific discovery.

To give you an idea of the potential, I just want to repeat something which I always show. If you think about it, in physics, physiology and the social sciences there is not one single analytical law which depends on more than three independent variables. Take for instance the ideal gas law: pressure times volume equals a constant times temperature. Three variables, two independent and one dependent, in the sense that if you fix pressure and volume, the temperature is determined. And whatever you do,
you cannot find any analytical law which depends on more than three parameters. If there is a deviation from the law, usually it is considered as a perturbation, not as the dependence on a fourth parameter. This leads us to a very simple question: do we live in a rather unrealistic, simple universe, where all physical laws do not depend on more than three variables, or rather is all our scientific knowledge subject to a human bias which is introduced by the limitations of our brain? We are three-dimensional bodies who have lived and evolved in a three-dimensional world. Our learning processes, our visualization capabilities, our understanding capabilities suffer from this limitation. We can see patterns only in a three-dimensional space; we cannot see patterns in four dimensions, we cannot see patterns in a higher number of dimensions. Remember that when I speak about dimensions I'm not speaking only about the three spatial dimensions. Many people will say: but we live in a four-dimensional universe. Yes, but we don't see it as a four-dimensional universe, space plus time. We can understand it mathematically as a four-dimensional universe, but we cannot experience the four dimensions all together in an integral way. So definitely our rather simplified perception of the world, and also our rather simplified description of the world, is dictated by our evolutionary limits. Machines, big data and artificial intelligence are, for the first time, offering us the possibility to overcome this limitation.

The astrophysical case. Let me skip this slide; let me show what it means. Basically, now we have incredible machines. For instance, here you have one of the first survey telescopes in the world, a beautiful telescope which is going to be surpassed in a few years by the Large Synoptic Survey Telescope: this is the VLT Survey Telescope at Paranal. By the way, this is the telescope which we are using to collect the data which are analyzed by the [unclear] collaboration which we have seen in the first slide. Basically this
telescope provides you with 100 gigabytes per night, far too much to be inspected by humans. These things are analyzed through automatic procedures. The final outcome of these automatic procedures is something like this. This is a globular cluster. Just to give you an idea of the amount of information: this is a small part of the field covered by the VST with a single exposure. The dynamic range of a normal picture cannot reproduce the complexity, so I'm zooming in. You go to the yellow box and you see that in that small region there is a huge number of stars. Then if you go to the red box, and you see below an enlargement, you see again that there is a huge amount of information. Basically each one of these objects is either a star or a galaxy.

These are analyzed through automatic procedures. In a small part of the sky, let's say the size of the full moon, in a moderate-depth exposure, so not a very deep exposure, you can have up to 60,000 objects. Each one of these objects is identified and its properties are measured. Then these things are transformed into a string of numbers. I won't go into the details, but for each object you may have the wavelength in which we are observing, the luminosity, the shape, and so on. This transforms the object which you are observing into a string of numbers. Each one of these numbers is a dimension. So in the end your observations of the sky are transformed into a parameter space formed by billions of objects, which are the rows of a matrix, and many hundreds of columns, the dimensions. Each one of these columns is a parameter, therefore it can be considered a dimension. So you easily end up in parameter spaces which have hundreds of dimensions and billions of objects.

The problem is that you need to put together all the machinery. This would have been impossible before the revolution started by communication technologies, by computers, by big data and so on. You basically need to put together all the available technology and all the available knowledge in
order to extract from these data sets a higher understanding [unclear], and it's a complex task. This is a statement from the National Science Foundation at the start of this millennium: the exploitation of massive data sets, which was the wording used before big data became very popular (at that time they were called massive data sets, now they're big data, but these are just buzzwords, you know, for selling purposes; the substance is the same), requires a much deeper understanding of computing infrastructure and of ICT, information and communication technologies, than what is currently done. And the requests which were put beside this advancement in our exploitation of data were that society wants to better understand the world and to exploit the technological developments deriving from it, and also that society feels the urge to teach people how to put together the pieces of the puzzle and how to use these technologies for their own purposes. From this you have had the proliferation of projects which are nowadays called crowdsourcing, citizen science. You can like them, you can dislike them, but basically this has somehow transformed not only the way science is made but also the way science is communicated to the world.

There are already many discoveries which have been produced by this revolution. For instance, at Caltech my colleague George Djorgovski used this technology and discovered the first binary quasar. Or recently, actually one week ago, the exploration of this high-dimensionality parameter space allowed the discovery of six new quantum phases of matter. And obviously, once you are dealing with spaces of huge dimensionality, you also want to be able to visualize the complex information which is contained in these data sets. Here I show you that there is a huge effort to develop tools capable of projecting this complexity onto a lower number of
dimensions, the number of dimensions which can be read by the human eye, because in the end we always want to see with our human eyes. Here you see a few examples of these complex visualizations.

But what I really want to emphasize is virtual communities. We are here in Second Life; we all know that they are fantastic gathering places, offered to us by this new technology. This is me in the theater of the Meta Institute for Computational Astrophysics, which we founded a few years ago, teaching physics to undergrad students, and below you see our working group meeting in Second Life [unclear]. But there is an aspect of Second Life which has never been very well known: Second Life is also the place where, for the first time, scientists could explore this new way of visualization. Here you have a few immersive data visualizations realized by the group at Caltech. On the left you have a chemistry and biology visualization, on the right side you have a visualization of a complex mathematical network, and in the bottom panel you have a visualization of all the galaxies in the Sloan Digital Sky Survey, characterized by their parameters.

These immersive visualizations in virtual worlds offer us a huge number of possibilities. First of all, you can easily visualize up to six or seven parameters all together: you play with the shapes of the primitives, with the sizes. Then, what is more important, you can navigate through the objects, so that you can go behind the cloud of objects and see what is behind a large distribution of data. This was just for historical reasons: it is nice to remember that Second Life is much more than what we usually think. It's not only about socialization. It is a beautiful technology, even though obsolete by modern standards, but it was the forerunner of many things which are
happening and which will happen in the future.

Let's now move for a second to the wrong side of Janus. I think that one of the most interesting speeches I ever heard by a president was the one which Obama gave during a show, where Obama rightly traced the polarization of democracy in the United States to big data. One of the wrong sides of big data is that they are not ethically neutral. Actually, data are always neutral; the problem is that the people who exploit these data are very seldom ethically neutral. Many things can be said about this, so I will try to summarize as quickly as I can.

Every time you enter the internet, every time you buy something on the internet, every time you somehow interact with a search engine, what you do is stored and kept for analysis, and you contribute to building your virtual identity, which contains more information about you than you can imagine. It's not only your social security number, your identity card, the number of the credit card which you have; it's much more than that. It is your tastes, your sexual tastes, your own tastes, your politics, everything. And all these things can be used for all purposes.

A very famous example: take a fascist, a leftist and a moderate, and ask them to type the same query into Google on their browser, for instance the word "Egypt". The fascist gets as an answer the Islamic fundamentalists; the left guy gets the Arab Spring, I mean "primavera araba", the Arab Spring, the revolution in the Islamic countries; and the moderate gets a trip or a cruise along the Nile. You are not unknown to your browser. Your browser is organized in such a way as to feed you with what you like, in buying, in discussions, in politics, and social networks are exactly the same. So basically you have a loss of plurality of information and you end up living in a bubble, a bubble where your opinions are never subject to criticism and where your opinions get reinforced by the fact that what is offered to you by the
internet is what you expect to see. So if you believe that man has never landed on the moon, you will keep getting from the web advertisements for conspiracy theories or sites which support your idea, and you never get another point of view. This is basically at the origin of the radicalization of politics, of the lack of political debate, and of the proliferation of something which has become very popular in the last few years in the United States, the so-called fake news. Therefore, if you are not freaking out about net neutrality, you are not paying attention to what's happening around you.

This is not only in the United States. We had something similar in Italy, and also all over Europe, where a new set of political parties, right-wing and run by demagogues, are winning by properly exploiting the network. You have the case of Cambridge Analytica, which supported Trump and made Trump win the election in the United States. But also in Italy, if you look, Casaleggio Associati, which is a network strategy company, is the one behind the recent success of the Five Star Movement, with Di Maio and Salvini. So basically, if you are capable of manipulating the data, you can bring public opinion exactly where you want. This is one aspect, a terrible aspect, incredibly dangerous, and it is something which needs to be solved at a governmental level, international and local, because it's really the first time that we can see democracy in danger. If I'm able to control all that amount of information, I can lead society in the direction which I want.

But there are other aspects which are even more difficult. At the end I will show a book which I strongly recommend each of you to read, which is Weapons of Math Destruction by Cathy O'Neil, a mathematician. From the book: what happens on the left. Let us assume that I have a good idea but I have no previous credit record, and I want to start a small business. What happens? I go to a bank
for a loan. The bank goes to this big data version of information and finds that I have no previous experience, so my credit score is low: I have little financial credibility. The bank refuses the loan, my credit situation becomes much worse, I lose all chance to start my business, and I slowly [unclear]. Why? Because the banks perform these analyses through companies, and companies are always paid by groups of interest who are aiming at reaching a specific goal, and this goal is profit, never the public interest. We shall see where the problem is in a few slides.

Look at what's happening in research in Europe; in the United States it is not much different. You are a young researcher and you are very good at what you are doing, but, for instance, you work in a specialized field. The h-factor, which is a way to measure the effectiveness of publications, is by definition low, because you work in a field where there are few scientists, so you have few people citing your work. For instance, the most famous mathematician of the last century, Gödel, has an h-factor of seven. Clearly so: he published only seven papers, so his h-factor cannot be higher than that. In today's world Gödel would not even get a researcher position, not to mention a professorship.

Another example, across different fields: in astrophysics the average h-factor is 29, in astroinformatics the h-factor is 15. So I apply for a position of professor of astroinformatics. These evaluation criteria are run on a common big database in a blind way, and I'm turned down because I do not reach the average for astrophysics, even though in my field of astroinformatics I'm very, very good. And basically, instead of having an astroinformatician, they must hire someone with a degree as a physicist or something like this, who cannot do what is really needed. Again, this happens because governments delegate action to groups of interest who are far too limited, far too self-interested, to grasp the complexity of the problem and to find the proper solution.

We're
down to the last-but-one slide. Let us assume that I have a good bank income and I want to buy a house. I find a beautiful house which I would like to buy. Then I go again to ask a bank for a loan, but someone has classified the district as dysfunctional on various indicators: crime rate, average income, state of maintenance. The bank refuses to give loans to buy houses in those districts. What happens is that the district is penalized, loses inhabitants, loses opportunities and slowly goes downhill. Does it seem a fictional scenario? No, it's exactly what is happening in Chicago right now. So this is basically a short overview; I really recommend you buy this book, Weapons of Math Destruction.

So basically, to paraphrase The Matrix, there is no blue or red pill which can save us. The only way to get out of this, to get the good part of big data and to defend ourselves against the bad part of big data, is knowledge, and the rarest thing of all, which in my opinion is natural intelligence. Thanks to social networks, thanks to this proliferation of apps and this type of thing, we are losing our natural intelligence. We really should go back to direct confrontation, to debate, to discussion, but with people who think differently from us, not with people who think the same way.

Thank you very much for your attention. I hope that you enjoyed it. I am here to answer all your questions.

Thank you, I'm flattered. Thank you very much. Are there any questions which I can answer?

Thank you. Well, GG, in reality, with some tricks, even in Second Life you can visualize up to eleven independent parameters. This was a very nice experiment. If you have a data point which is characterized by eleven parameters, for instance, to visualize them at a single time you can exploit a strange property [unclear]; it is an experiment we ran many years ago. The property is the
following. Basically, if you look at a picture of a crowd, hundreds of people, you can easily, almost instantaneously, recognize if in that crowd there is someone you know, because evolution has trained your brain to perform a special type of compression. This takes place at the [unclear] level and transforms each face into a small vector of numbers, eleven, twelve, fourteen features: basically features based on symmetry, on the distance between the eyes, on the ratio between the distance of the eyes and the distance of the nose, on the skin, and this type of thing. This is also the same algorithm which is used in many ways. So what you can do is transform each data point into a face, attributing to one parameter the color of the skin, to another parameter the shape of the eyebrows, to another parameter the roundness of the face, and this type of thing. If you do that with all the parameters, you realize that you can almost instantaneously spot, in this huge distribution of data points, the points which are the same. So basically you can do an operation which from a mathematical point of view is called clustering.

So there are many, many ways to visualize more parameters. The problem is not only to visualize them; before that, the problem is to find patterns in more dimensions. That is much more complex, and this is the kingdom of the field of data mining, machine learning, statistics, which is called statistical learning. There you have a huge variety of algorithms, [unclear], support vector machines, things like that, which can find these patterns. But in any case, yes, what you are mentioning is something which can be done; it has been done, actually, to visualize more than three dimensions.

Yes, I think so. I think that the next wars will be mainly in cyberspace. I mean, it depends: the war which really matters will be in cyberspace. Let me put it this way: as long as 50 percent of
the... I don't know what the number is, but basically as long as that is the share of the money of the world which goes into buying weapons, there will always be wars, because, you know, these companies, if there is not a war, they start one. Look at what's happening in the Middle East. All these wars, ISIS, this type of thing, would not exist if companies and states were not selling them weapons. So you will always have that type of war, because this is what people need for selling weapons. But the real wars, those which can really affect everyday life, will be fought in cyberspace.

Any other questions?

Thank you very much, Technin. This is an old story; it's called solipsism. The Matrix has just envisioned on the screen something which has been discussed by philosophers for the last 2000 years. What does it mean to live in a simulation? Everything which happens to us is an electric signal which is induced into our brain, so in that sense we all live in a simulation: the simulation of the world which is in our brain, produced by external stimuli. If you could control that simulation from the outside, you would end up in The Matrix. To tell you the truth, I don't think that something like The Matrix is possible in the near future. [unclear] that is more complex. The holographic projection of the 3D surface: actually things are not really like that, but the holographic universe is a sort of projection of a higher-dimensionality space, yes. But that is a different story; it's not connected to what we are discussing here. Oh yes, it is an interesting topic, I agree with you.

So, if there are no other questions, I thank all of you and I move to my dinner, because, I didn't tell you, now it's almost... what time is it now? It's 8 o'clock, a little after 8, and I'm starving. So thank you very much, everybody, and I hope to see you soon again in Second Life. Good appetite to you, Taglin. Thank you, Daily. Bye.
okay that's all