I'm very happy to welcome my friend, a database researcher and the Chief Executive Officer and Managing Director of Persistent Systems Limited, Dr. Anand Deshpande. Thank you. Thanks for agreeing to come here and share your thoughts with us. Let me very briefly tell you about Anand. He did his engineering at IIT Kharagpur, went to the US, and did his PhD. Then he worked in HP Research Lab for some time. When he came back, I was actually keen that he work here as a faculty member, but he had better ideas. He started Persistent Systems Private Limited, a company which he started with some two people, I think. And now it has grown to more than 5,300, he says. This is a special company which does not do just the conventional services work; they specialize in product outsourcing, which means they have a large component of internal R&D, design and development activities, where they develop pieces of solutions for product companies all over the world. It's indeed a privilege to have him here. He is going to talk to us about some of the emerging trends which will have a great impact on our lives in the coming decades. He is specifically going to talk about things which we shall visibly see affecting us in 2011. So, over to you, Anand.

Thank you very much. And thanks a lot, sir, for inviting me here. It's always a pleasure to come to IIT, and it's also a pleasure to talk to a large number of you who are teaching database courses and are looking at database research in a big way. I must say that I was a database researcher. I don't really do much research anymore. As part of my job, I'm essentially a salesman. I go and meet all kinds of people all across the world and ask them what the hot technologies are and what new things they are looking at. And typically, at the end of every year, I try to look at what may be the top 10 technologies that we should track as a company for the next year.
So, this list that I'm going to share with you is the "top 10 for 11", which is the top 10 for 2011. When you make a top 10 list like that, it's always fun, because I'm sure there are many things here that I have included, but I have also missed out many other things that could be very important. So, this is definitely a personal list in the context of Persistent. I'm sure there are other areas that you think are important and should be included, but that makes for a healthy discussion.

So, there is excitement across all component layers. On the processor roadmap, the storage roadmap, the networking roadmap, the telecom networks, display technology and graphics processors, all the companies that are leaders in these areas have already projected very impressive roadmaps for the next year. Very quickly, let me share with you the processor side. Intel has published its roadmap and has announced that in 2011 they will launch something known as the Sandy Bridge architecture. Essentially, it's a 32 nanometer processor technology, and with that kind of a processor, you are going to see a lot of activity on the processor side. In addition to the Intel roadmap, AMD has also announced that the next version of their product, known as Bulldozer, is expected to come out in 2011. Most of these are multi-core architectures. People are looking at ways to put multiple cores on the same chip, and that is creating some interesting opportunities for people to improve the performance of their systems by using multi-cores. So, multi-core architectures are a big part of what is going to happen on the processor roadmap for next year. What is also exciting this year is the mobile devices, especially tablet computers, starting with the iPad in the summer of 2010 and then Samsung and a large number of Android tablets coming out at prices that are very compelling.
All of a sudden, processors are being sold in large numbers for these kinds of tablets. And there are other processor vendors, such as NVIDIA and others, who have processors for the cell phone that are now starting to become important in the whole scheme of things. So, the processor roadmap is very exciting for the next year. Next year's Consumer Electronics Show, which is usually the biggest show for consumer electronics and happens in January 2011, will already have a large number of tablet PCs on display. Some of them will have Android running on them, and the next version of the iPad, with cameras, is also expected to come out pretty soon. And again, the mobile phone wars continue. So, there is a lot of activity right now on tablets and other related areas.

One of the most important things which I think will happen next year is that display technology is going to change dramatically. You would have noticed that until just a few years back, we used to have better and better CPUs and processors, but displays were not shrinking. You were always looking at very large displays, big screens, CRT tubes and all of those kinds of things. In the last 2 to 3 years, you would have noticed, even at home, that all those CRT tubes have gone and you are starting to see flat displays. First it was plasma, then LCD. And now it is expected that 2011 will be the year of LED displays. There are some very interesting demos of the kinds of display technologies that are likely to happen. If you look at the density of the display on something like an iPhone, you will already notice that, while the CPUs are keeping pace, the display has really been getting very sophisticated in the last 2 years. The touch sensitivity of displays, again, is going to create a new opportunity for people.
You are already starting to see things like Microsoft Kinect, which is a gesture-based system from Microsoft that allows you to interact with your computer using hand gestures and things like that. So again, there is a lot of excitement on the technology roadmap for the next year or so. 3G and 4G on wireless networks are going to happen, and that is actually putting a lot of pressure on how computers must be seen. In the past, we have been used to using computers in a particular way, but mobile phones and tablets are starting to become complete programmable devices that people are using for all kinds of activities. They are putting a large number of demands on the CPUs, such as always-on access, super-fast boot time, fast broadband speeds, near-zero latency across all information, more than day-long battery life, appropriate form factor, and location-based information and presence. All of these are going to force computer makers to rethink their assumptions about things like how long a boot time is good enough.

Moving to number nine, I see a lot of action in clean and green technologies, which are going to be hotbeds for innovation. There is a clear understanding that we are consuming large amounts of power and electricity for many of our devices and gadgets, both computers and home equipment and all of those kinds of things. There is a lot of work going on in smart homes, smart cities and various other places where people are looking at ways to improve the quality of life by using smart, clean, green technologies. For the database community, this is pretty good action, because all of these technologies require you to access sensors across various networks, collect data in large amounts, and then process that data to make efficient use of your sensors and your systems.
Overall, this is a good area for database researchers to think about how you can use the data that gets collected from devices for energy-related purposes and improve energy efficiency along these lines. There is also a lot of activity going on to ensure that devices get shut down at the right time: if your disk is not going to be in use, how do you turn it off, and how do you avoid consuming power when you really don't need it? A lot of the cloud computing activity that I shall talk about later is again related to this fact that people are very focused on improving clean and green technologies.

Let me go to number eight now. For number eight, I believe that user interface and design will make a huge difference. So far, we have been building good technology solutions, but now technology is getting commoditized, and the quality of user interface and design is going to make a lot of difference as far as what products consumers will buy. I have a few quotes here. One is from Steve Jobs, who says that design is not just what it looks like and feels like; design is how it works. And of course you would have seen that, while there are cheaper clones available for Apple products, Apple has been doing a phenomenal job of keeping up with new designs, new ideas and new innovations in user interface that have made people buy Apple devices even after paying a premium. This is essentially what has happened, and it is going to be very important in the next few years. I think the user interfaces coming out of Microsoft through the Kinect, or touch, and many other things are also going to move from consumer devices right into the enterprise, creating a new set of opportunities for visualizing stuff using new and better interfaces.
I'd like to very briefly mention one of our customers, who is in the business of selling reagents, basically chemical substances that they sell to companies at large. In the past they were using round glass bottles for this. They partnered with a company called IDEO, which is a very well-known design firm, and they came up with new bottles which are square in shape. They have big flat labels at the back that you can write on. You can open the lid with one hand; you don't need two hands to do it. The bottles are angled in such a way that you can stick in a pipette without tilting the bottle, and your pipette works very nicely. And because they are square bottles, you can store large numbers of them in a fixed space without wasting the space that is lost with round bottles. So clearly, with something as mundane as bottles, when they were able to redesign them, and this was a three-year effort to come up with newer bottles, they were able to get significant improvements. They almost doubled their sales just because they had better quality bottles. So overall, user interface is something that engineers have ignored in the past, but I think in 2011 this is going to be a very important trend that people need to look at.

On the UI implementation side, we will see a big fight between HTML5, Silverlight and Flash brewing for 2011. I believe that eventually HTML5 is going to pretty much win over both of them in some form or the other, but you never know, because Silverlight and Flash are trying to create new tools that make it easier to write and maintain programs. So over the next year or so, you will see a lot of activity around HTML5, Silverlight and Flash. And if you notice, the main driver for these UI implementations is not the computers, because you can run all three on the computer, but it's the tablet PCs that are defining whether HTML5 is going to win, or Silverlight or Flash,
because people do not want to build multiple applications for different kinds of devices, but want to have the same application with the same look and feel whether you are on a cell phone, a tablet, a handheld computer or a major desktop. So again, UI, user design and the implementation areas are getting to be very important.

On to number 7 now, and this is an interesting problem from the database standpoint. This is a book called The Fourth Paradigm, about how the ability to handle very large volumes of storage and complex computing is redefining how we do science. Let me explain what I mean by that. This is a nice book that is available, and it is based on the work that Jim Gray did a few years back looking at data problems and data challenges for scientists. This is a quote from that book by Bill Gates, and the photograph, by the way, is Jim Gray and not Bill Gates. He says that the impact of Jim Gray's thinking is continuing to get people to think in new ways about how data and software are redefining what it means to do science.

Let me explain to you what he means by the fourth paradigm. The first paradigm is all about observation. In the past, say a thousand years back or even 500 years back, scientists looked at empirical phenomena, natural phenomena, and described them using observation. This is Galileo, who is best known for his discoveries with his telescope. His idea was that he would look at what he was observing in the sky through the telescope, and this was the first paradigm of how science was done. In the second paradigm, people used theories and created models for whatever they were trying to observe. For example, Newton observed the falling apple, but then he came up with laws that explained it, Newton's laws of motion. Similarly, Kepler looked at planetary bodies and their interactions and came up with Kepler's laws. Then you have Maxwell's laws and many other laws of this kind, where people
took observations that they saw, generalized them, created a mathematical basis, a theoretical basis, for them, and then eventually tried to make sure that the observations that they saw were consistent with the theories that they came up with.

Moving on, the third paradigm is about simulation. This was most common from the computer era, say the 60s, onwards, where scientists were looking at theories and coming up with ways to validate their theories or models by using simulation. Again, in all of these cases, people were making a model or creating a theory first, then trying to create observations, and then validating the theory or disproving the theory. So it was always theory first, and observations and data later.

In the last few years, what has happened is that people have started to turn this around in a completely different way. What they are starting to do is first collect data in large amounts, and then try to find what kinds of theories can be extracted out of the data that has been collected. Some of the examples of research of this kind have been in astronomy. Astronomers have been collecting data, so today, if you want to do astronomy, you don't need to go to a telescope. You can go to data that has been collected by the telescope, and you can essentially look at existing data to come up with new theories and new discoveries. We have been very actively involved in a project called Virtual Observatory India that is part of the Global Virtual Observatory Scheme. What we have been trying to do is to take the data that gets collected from various astronomy telescopes and put it together, so that an astronomer in any part of the world can go to their computer and actually look at data that has been looked at and scanned by astronomers in the past, and you don't have to do this again and again. There is another very
interesting project that I would like to recommend you take a look at, called SciDB. This is a project where people are looking at scientific data and trying to see how one can save it and store it, and what kinds of queries are likely to be asked. If you look at these scientists and the way they are working, whether in astronomy, nuclear physics or, for that matter, weather sciences, atmospheric sciences and oceanography, they are looking at very large amounts of data, and managing and manipulating it is incredibly hard. So you will find that scientists are spending more of their time just working with data management tools, defining them and using them, rather than doing science.

I would like to share with you an example of a project like this. This is called the SKA radio telescope, which is the Square Kilometre Array. It is coming up either in Australia or in South Africa. It has not yet been built, but it is an array of radio telescopes. What is important to note here is that they expect that they can see something that is as far away as 10 billion light years, so almost back to the Big Bang. To do this, the kind of data that they will have to process is 1 terabyte every second, with computing at the speed of 3,000 teraflops. So these projects are going to drive the computing requirements, and this kind of data is a delight for all database researchers. I encourage you to take a look at this if you can.

So just to give you a recap: in number 10, we discussed the fact that new technology on the semiconductor side, and everything around new processors and displays, is going to happen in a big way. In number 9, we talked about clean and green technologies. In number 8, we talked about user interface and design. And in number 7, I shared with you how science is changing and how people are looking at science in a very different way, by looking at data first and science later, rather than coming up with the theory first and
data later.

OK, so again in the data context, life sciences and bio stuff are starting to become pretty important, and I am very excited about technologies that are likely to happen in personalized medicine over the next couple of years. As you may be aware, about 10 years back the human genome sequence was completed. At that time it was one genome sequence, and it took billions of dollars to sequence. In the last few years we have gotten it down to a fairly low number, and people expect that in the next few years we will be able to sequence an individual genome for a few thousand dollars, and that this can be done quite easily in many locations in the world. There is equipment being built around that.

There is also a lot of research that has happened in the last few years by medical and clinical researchers, who have identified that the drugs that we are given today, especially for cancers and other kinds of things, don't always work for everyone. The effectiveness of the drug in some sense depends quite a bit on the nature of the genes, or the DNA markers, that you might have in your body. So the presence or absence of certain DNA markers can have a very significant implication on whether a particular drug would be considered suitable for you or not. In the next few years, it is expected that machines will be available in the pathology lab that you normally go to, and people will be able to identify markers very effectively and get fragments very quickly, and thereby decide which medicines are right for you and which ones are not.

Now, these kinds of technologies have very high data requirements. The next-generation sequencing machine, which can do the entire genome, will collect about 5 terabytes every other week. So you are looking at corner pathology labs trying to collect 5 terabytes every other week, so from just being a blood bank,
they are starting to become a data bank. With large amounts of data being collected and managed, it is a lot of work just to keep this thing running. So a large amount of research is expected over the next couple of years on how to manage this data and effectively identify what you are looking for in the large amounts of data that the next-generation sequencing machines and the rest of bioinformatics are likely to produce. Sequencing and sequence alignment are going to be an important area. Other related opportunities that I expect will be pretty hot are bioinformatics, healthcare systems and implantable medical devices. Health insurance data is again going to be a very important aspect of what gets looked at in a big way in 2011. Clearly there is a big push from the Obama administration in the US on health insurance and health insurance reforms, and that is going to bring a lot of money into this system to look at new and better ways to improve how patients track their data and how insurance companies track data. Expect a lot of database research in this area in the next few years.

Now let me move to number 5. As you may have noticed, we are talking a lot about the 2G scam, but that is a scam that happened for spectrum that was disbursed a few years back. The 3G spectrum was disbursed a few months back, and people are already talking about 4G as well, because there is also Wi-Fi already distributed in India and expected to roll out pretty soon. Sprint is rolling out 4G networks in the US; they already have that. People expect 3G and 4G networks to come in very soon. One of the exciting things is that, when you look at these kinds of devices and these kinds of SIM cards or networks being available, people are not just looking at connecting cell phones for individuals, but they are also looking at how can you connect
machines to machines, so M2M connection is starting to happen. How do you get devices that are all internet-enabled, on a Wi-Fi network or on a 4G network, that can be used very effectively to ensure that you can communicate with devices all over the place? This leads to what has been known in the past as the internet of things. People have referred to this by saying that you will have a washing machine that will have a chip in it, and you can find out where it is and when it should be turned on and turned off. All of that stuff was not possible in the past, partly because of the cost of communicating with such devices all over the place, and because there were no standards for communicating with such machines and devices. Now this has been fixed, and I think by and large one would expect that these devices will be available on the 4G network.

Just to give you a little bit of a sense of what the 4G network is going to do: in a 2G network, that is, the GSM network, you are looking at roughly 0.17 bits per second per hertz of spectrum. On the 3G network that moves up to almost 2.88 bits per second per hertz, and on 4G it is as high as 16.32 bits per second per hertz. So essentially you are looking at packing, through new technology, a very large amount of bandwidth into the spectrum that you have available. So video on the phone, TV on the cell phone, is definitely very real when you look at 3G and 4G networks, and it is something that we only dreamt of when we looked at 2G phones. And because you can do video so easily, you can record video easily and you can circulate it, you expect a whole lot of action because of that kind of stuff. So people are looking at some very innovative devices when they are looking at sensor networks of these kinds. One of the other challenges in the past has been the fact that we have not had enough IP addresses if we had to get billions of devices on the network. IPv6 to
some extent has addressed that. So in the next few years you will see sensors, sensors everywhere, and every sensor will essentially come with some kind of a SIM card that allows it to communicate with the network, collect data, and receive command signals pushed directly onto the device. So this kind of an internet of things is starting to happen, and one could see this at home, and in cities running traffic signals and light signals. Some of the green computing will get enabled with this as well.

And this is true for all technology changes: when you have a new wave of technology, it happens when a confluence of multiple events all converges at the same time. If you notice, the hardware is getting more exciting, displays are getting more exciting, and sensors and sensor technology are becoming very visible. In my opinion, 2011-12 is going to be a very exciting period for new technologies and for new researchers to come up with new ideas, and how you manage the large volumes of data that are going to get collected is going to be a very significant piece of how this whole business is going to work.
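As a rough illustration of the spectral-efficiency figures quoted in the talk for GSM, 3G and 4G (0.17, 2.88 and 16.32 bits per hertz), the sketch below multiplies each figure by a channel width to get a peak throughput. The 5 MHz channel width and the function names are assumptions made only for this example and do not come from the talk; real networks deliver considerably less than such peak numbers.

```python
# Spectral efficiency figures as quoted in the talk (bits per second per hertz).
SPECTRAL_EFFICIENCY = {"GSM (2G)": 0.17, "3G": 2.88, "4G": 16.32}

# Hypothetical channel width, chosen only for illustration (not from the talk).
CHANNEL_WIDTH_HZ = 5_000_000


def peak_throughput_bps(bits_per_hz: float, channel_width_hz: float) -> float:
    """Peak throughput in bits per second = spectral efficiency x channel width."""
    return bits_per_hz * channel_width_hz


def gain_over(baseline: str, generation: str) -> float:
    """How many times more bits one generation packs per hertz than the baseline."""
    return SPECTRAL_EFFICIENCY[generation] / SPECTRAL_EFFICIENCY[baseline]


if __name__ == "__main__":
    for name, efficiency in SPECTRAL_EFFICIENCY.items():
        mbps = peak_throughput_bps(efficiency, CHANNEL_WIDTH_HZ) / 1e6
        gain = gain_over("GSM (2G)", name)
        print(f"{name}: about {mbps:.2f} Mbit/s peak, {gain:.0f}x GSM")
```

On these quoted figures, 4G carries roughly 96 times as many bits per hertz as GSM, which is the gap behind the remark that video on the phone becomes realistic on 3G and 4G networks.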
Again, it will be part of medical devices and smart grids. As I was mentioning to you, you will find that there are these cycles that happen, and there are two kinds of cycles that I have been reading about in the last few years. One is a 14-year cycle. The latest one, the one before the one that we are in, was from 1994 to 2008, which was the internet cycle. In some sense these are very cyclical, so there is a boom and then there is a slowdown. If you look at the market, from 1994 to 2001 it went up and there was a lot of action in the internet market, and then over the next 7-year period it gradually came down. Before that was the PC era, the PC generation, so the 14 years of the PC were from 1980 to 1994. And the previous 14 years, 1966 to 1980, were the era of the minicomputer and of computers getting into offices. So if you look at it, every 14 years there has been a change in technology: certain things get aligned, and new ways of looking at things start to happen. So from 2008 onwards we are looking at a new 14-year era, where the first 7 years are going to dominate, and a lot of these technologies are all going to fold into each other to get to the next set of things.

OK, so number 4 on my list is that social networking and collaboration are starting to move into the enterprise. I am sure you would have already noticed that there is a lot of activity on the collaboration front among individuals, on the consumer side. There are a large number of people on Facebook already, and everyone is using Facebook and chats and all kinds of things like that. It is so popular that we have gone from web 1.0, which was when the web started, where you were looking at a website and coming off of a website, to web 2.0, where already in 2006,
within 10 years, people were generating content, and user-generated content was sort of a big thing. This has grown very significantly, and if you look at the numbers, social networking is completely changing the way people look at the web. It is no longer one-to-one or one-to-many; now the web is many-to-many, with every individual communicating with every other individual. If you look at Facebook, which is clearly the biggest of these, you are looking at 600 million users who are subscribers to Facebook. If Facebook were a country, only China and India would be more populated than Facebook, so in a way Facebook today is the third largest country in the world. The numbers are completely staggering in terms of the number of people who are on Facebook. 500 billion minutes are spent on Facebook every month, and a huge number of people are on Facebook at any one time.

If you look at how much video is uploaded to YouTube, for example, 24 hours of video is uploaded to YouTube every minute. So if you wanted to look at all the video that is on YouTube, it is simply not possible. It is 412.3 years of video that has been recorded, and maybe this statistic is from a few months back, and we are adding 24 hours of video every minute. So the amount of time people are putting into videos on the net is amazing. If you had asked about 10 years back, most people would have wondered why anybody would want to put their own videos on the net. Why would you want to put videos on YouTube? It seems like such a stupid idea. But today, the amount of video uploaded to YouTube is pretty amazing.

Then Twitter is another one of those. I don't know how many of you are Twitter users or fans. There are 27 million tweets every day, and a very large number of people are on Twitter. Just to put some statistics on Twitter versus Facebook: on Facebook, as I mentioned, there are more than 500 to 600 million users; on Twitter
you have somewhat over 100 million users. But what is very interesting is how often people update their status, what age groups are involved on Twitter, and how they look at Twitter. A large number of people are on Twitter every day; they update their stuff on Twitter every day and they look at stuff on Twitter every day. There was some research that I heard about very recently, where a professor at Indiana actually looked at Twitter feeds to find out what relationship they have with the market. The thinking was that maybe when the market is happy, people are happy and they write good stuff on their Twitter feed, so there would be a correlation between the market being up and the Twitter feeds being good. But they actually found it the other way around, in the sense that when people start writing good stuff on Twitter, happy tweets, then the share market seems to go up. So people are looking at all kinds of funny things, and the amount of data that is getting collected on Twitter is amazing. This is really happening in a big way. LinkedIn, again, is very popular amongst professionals; it has gained more than 30 million users in the last 7 years. These numbers are very staggering.

Now, what is very important to notice is that we tend to know a lot more about our neighbor's dog than we know about what the person working in the next cubicle is doing, because Facebook gives you this access to find out what's going on. What people within the enterprise are saying is: if I know so much about my neighbor outside, in a consumer environment, why don't we have something equivalent to a social networking Facebook within the enterprise? This is starting to happen in a big way, and people are starting to use different kinds of ways of doing it. One of the popular ones is Chatter, which is released by Salesforce.com. They recently announced that they are going to
make it free and available to everyone, so I expect Chatter to become extremely popular in the next few years. Then there are other business models that are happening, because one of the things that social networking sites have done in the last 2 to 3 years is standardize their user interfaces and APIs. So you are able to go to a Twitter feed through a program and look at what's going on, on an ongoing basis. You can go to Facebook, connect to things and see who your friends are. So there are very interesting kinds of applications and business models that are based on social networking, and this is going to move to the enterprise over the next 2 years. Again, I don't know how many of you are Twitter and Facebook users, but what you will find is that the number of Twitter users or Facebook users seems to change depending on the age group of the people in the room. One of the things that I recommend is: try it out. It's pretty interesting, and it's very addictive.

OK, now going to number 3, and this relates to mobility. What we believe is that video is the new voice. As we have seen, voice was the killer app on the telephone; expect video to be the killer app on the telephone and on various other kinds of devices. If you look at the trends in video consumption, Netflix and Hulu, which are movie sites and TV sites, have become incredibly popular and are used by a lot of people. The next layer has been YouTube, which is video sharing, where people contribute video. And now people are looking at things like Skype and FaceTime for face-to-face video communication. So in some sense we have been at a tipping point in the last year or so. With the iPhone 4, Apple has released something called FaceTime, which allows two Apple iPhone users to communicate with each other on a video phone. Most of these new Apple phones have a video
camera on both sides, one in front and one at the back, and the one in the front is specifically meant for video conferencing. If you look at some of the new devices, like the Samsung devices and many other smartphones, they are all coming with two cameras: one in the front to take your picture and the other to take pictures from the other side. So it is very much a video conferencing device, as they call it, and the video chat era has started to happen. Video is getting all over the place.
Just to give you the big boy in the video setup: Skype. Skype has more than 500 million users, so, if it were a country, it would probably be the fourth biggest, after Facebook, and they are adding 300,000 users per day. If you log into Skype and look at the number of users online at any one time, you are looking at 23 million users who may be online at the same time. So a large number of people are on Skype, and this is clearly the best-known video conferencing or voice conferencing brand. What is staggering is that 13% of all international call minutes in 2009 were on Skype, and one out of three Skype calls today is a video call. So clearly Skype has started to take a dominant position on the desktop, and people are using Skype video all the time on the desktop. Now, with these new mobile devices coming in and 4G happening, you are looking at streaming video coming on to your cell phone, and that is going to be a very interesting opportunity for people to look at.
There is also an explosive growth of connected screens. The numbers here are absolutely staggering. In 2010, from pretty much zero, we have had 15 million units of tablet PCs such as the iPad, the Galaxy and various others. From 15 million in 2010, we are expecting it to go all the way to 54 million tablets within the next year, which is more than three times as many tablets within one year. The next version of the iPad is expected, and so is the next version of the Galaxy and of many other Android tablets. Even in
India we have just seen an announcement of the Adam in the last week, and the Olive Pad is also available. So there is a lot of good stuff on the tablet side. Just to give a statistic: 1 million iPads were sold in 28 days. The iPhone took 74 days to get to 1 million units, and the iPad got there in 28 days. Now, the best in some sense has been the Nintendo Wii, which got to 1 million units in 13 days, so there is room for improvement here. If you look at smartphones, again, we go from 81 million smartphones in 2010 to a staggering 875 million units by 2014. So more and more people are going to carry smartphones and not just phones anymore.
Connected TVs are another big thing. Within the last year, both Apple and Google have announced Apple TV and Google TV, and essentially some form of IPTV is going to happen in the next few years. Already today we are looking at 45 million units: 19% of the flat-panel TVs across the world are connected or connectable to the internet. This is expected to go to nearly 42% of all TVs, which will be IP-enabled in some sense and will have a way of connecting directly to the internet. So you are looking at some very interesting devices: Skype video, video conferencing on the TV, video being pushed over the internet. There are some very interesting application opportunities and, again, large amounts of data getting collected all the way across, with good opportunities for database research. We do a lot of work in this area, and we have been enabling Skype for multiple devices.
The next trend that I want to share with you on the mobile phone relates to apps. As you would have noticed, on the iPhone and on Android the app store has been a very popular thing, where you can download apps as you need them, use them when you need them, and then be done with them. So again, there are very large numbers as far as apps are concerned. They are coming in from all kinds of sources: from the operating system side, from the device side, from the carriers and from
some independent vendors as well. So, whichever ones are for you, there are all kinds of apps, whether they are games, music, video, books or travel, all kinds of things. Anybody who is using smartphones today is already fully aware of this, but this is going to proliferate across all kinds of platforms and all kinds of devices. So one expects that televisions will come internet-ready and you will download apps that are appropriate for you on your own television.
Let me just continue. As you would have noticed, in the last few years we are just getting overwhelmed by data. Sensor networks are generating data, large amounts of data are getting created, and a lot of this is happening in real time and needs to be processed in real time. The other trend that we have noticed is that, of all the data that is collected, 80% is unstructured. By unstructured we mean things like video, text and all kinds of things that are not standard and can't be stored in a relational database very easily. And analyzing large volumes of data is both challenging and critical. When you look at analyzing large volumes of data, you are looking at how you make sense of these large amounts of data, how you can visualize them and how you can put it all together. So data is getting all over the place, it is starting to become very important, and it is creating a lot of opportunities for research for us.
So now let me go to the three points that I made to get started. One is that unstructured data is getting to be very important. Processing large volumes of data of this kind is actually very different from the relational database things that you may have been talking about, where you have structured data and structured types, and the data has the standard ACID properties, where you really want to enforce transactions. When you are looking at log files, you are really not looking at updates in a particular way, but you are looking at
streams of data and trying to see how to process these streams. So when you are looking at these kinds of data, there are opportunities to look at new algorithms to process this kind of data. In the last couple of years you would have noticed a whole stream of activity around what is known as big data and Hadoop, and I think it is worth looking at. There is a lot of discussion around NoSQL, whether relational databases are gone, all those kinds of things. My personal opinion is that there are certain algorithms and certain kinds of data for which Hadoop is a very nice way of looking at and analyzing data. But for certain kinds of things, especially the transaction processing that we are used to, relational databases and the standard BI will continue to be there and will continue to work well. As you go along, for certain kinds of things perhaps Hadoop is a much better way to get the job done. It allows you to have large volumes of data, petabytes of data, being processed at the same time. There are certain limitations: the processing is batch-oriented, you can do only so many things at a time, and there are certain steps in which you should do them. But it is certainly a good research area for people to look at.
The other thing that I wanted to mention is this: if you want to analyze large amounts of data, how does one go about it? There, one of the things that I am very excited about is the opportunities around visualization. So I want to share with you one example of a visualization strategy that I have been looking at and that I would be happy to encourage you to look at. Today about 3.3 exabytes of digital information is getting created every day. And just to give you an idea of what an exabyte is, one exabyte is the equivalent of 50,000 years of DVD-quality video. So we are collecting a lot of data every day, and this is
going to keep growing. Just to give you these numbers: a terabyte is 10 to the power 12 bytes, a petabyte is 10 to the power 15 and an exabyte is 10 to the power 18. So these numbers are very large when you look at them. And if you look at why this is happening, it is happening because disks have become cheap and we can store all this kind of stuff. From something that was 228 dollars per gigabyte in 1998, we are looking at less than 10 cents per gigabyte today. And just to give you some historical data: this is a 5-megabyte disk from 1956, and you see there were four people trying to lift it. This is a 1-gigabyte hard disk from 1980. And if you look here, you will find a small memory card, probably a gigabyte again, which is really small and tiny, and right next to it is a 1-terabyte hard disk from 2008. There are a million gigabytes in a petabyte, and a petabyte is typically 20 million four-drawer filing cabinets, or 13.3 years of HDTV-quality video. So 1 petabyte is a lot of data. One thousand petabytes, 10 to the power 18 bytes, is 1 exabyte, and we are generating 3.3 exabytes every day. So this is the amount of data that is getting collected, and it is going to continue to grow exponentially. People are expecting 44 times as much data as we have in the world today, everything that we have collected over the last centuries, to get collected over the next 10 years. And we are looking at most of this, 80% of this, being unstructured data all across the place. So you need to find a way to organize it and make sure that you can deal with it. So, lots of petabytes and all that.
Clearly there are many technologies that are getting used. You started with the RDBMS, ETL, reporting and analytics. I don't want to go through this; I'm sure you would have already discussed it in your class. Again, there is a lot of excitement around MapReduce and Hadoop. I already mentioned this to you, and again it
would be difficult to really discuss much in this lecture, but I'm sure you would have already discussed this. So again, there is a lot of activity there. Let me keep moving. Unstructured data is an area in which I think you will see a lot more research in the next few years.
This is an interesting idea around visualization that I have been working on and wanted to share with you. What happens is that when you look at very large amounts of data, it is very difficult to visualize. So let me give you an example of what it means to visualize a petabyte. Suppose you store about 16 GB in an iPad and stack the iPads one on top of the other. One petabyte would be 62,500 fully loaded iPads, and the stack would go all the way up to 794 meters, so it is twice the height of the Empire State Building. When you look at these kinds of facts, with data being described in relation to something else, you automatically conjure a mental image that helps you comprehend it. How does one come up with these kinds of comparisons automatically? That is an interesting area that I have been trying to look at. The other one is to look at interactive games and visualizations as a way to visualize data. We are used to 2D pie charts, bar charts and things like that, but they are pretty old; we have had them for more than 30 years. Today, with LED displays and better-quality displays, there is no reason why we need to stick to old pie charts and dashboards. Can we not do something more exciting?
And finally, data as a service is starting to happen in a big way. There are multiple companies that provide wrappers and APIs on various data sets, and I expect that the ability to manage and merge data sets from different sources is going to become a very important trend for people to look at. I expect the next few years to be very exciting for database researchers if you think beyond databases and think about data, think about the data getting collected
and think about what is happening there.
Okay, anyway, let me just tell you that in my list the number one is cloud computing. In the last few months to years you would have seen a big buzz around cloud computing, and the main reason why cloud computing is starting to happen is clearly that today we are not using our resources efficiently. If you look at today's computing systems, we have hundreds of people who are carrying machines, and we don't use them all the time, so we have lots of free cycles all over the place. And today, when we provision our equipment, we provision for the maximum utilization that might be needed, but that utilization lasts for a very short period. So we have lots of resources that are actually available for us to reuse. If you are able to bring these resources to bear and make them available to the market and to people, then, if they can be shared, you can provide computing resources at about a fifth of today's price. So multi-tenancy is one of the themes that is coming up around cloud computing, and the idea there is that you can share data, or share systems, across multiple instances, and that allows you to build out systems far more efficiently. The only reason this is possible is scale. As I have pointed out to you, we are collecting large amounts of data, several exabytes a day, and if you are looking at so much data and you want to manage it, it is very difficult for people to manage it in their own small labs or in their own data centers. It is only at scale that you really get excellent performance and are able to do this quite well.
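The provisioning argument above (everyone sizes for their own peak, but peaks are short) can be illustrated with a small sketch. The tenant names and hourly demand figures below are invented purely for illustration and are not from the talk; the sketch only compares the capacity needed when each tenant provisions for its own peak with the capacity needed when they share one pool.

```python
# Hypothetical hourly demand (in servers) for three tenants whose peaks
# fall at different hours. These figures are invented for illustration.
DEMAND = {
    "tenant_a": [2, 2, 10, 2, 2, 2],
    "tenant_b": [2, 10, 2, 2, 2, 2],
    "tenant_c": [2, 2, 2, 2, 10, 2],
}


def dedicated_capacity(demand):
    """Servers needed when every tenant provisions for its own peak."""
    return sum(max(hours) for hours in demand.values())


def shared_capacity(demand):
    """Servers needed when tenants share one pool sized for the combined peak."""
    combined = [sum(hour) for hour in zip(*demand.values())]
    return max(combined)


def utilization(demand, capacity):
    """Fraction of provisioned server-hours that are actually used."""
    used = sum(sum(hours) for hours in demand.values())
    n_hours = len(next(iter(demand.values())))
    return used / (capacity * n_hours)


if __name__ == "__main__":
    for label, cap in (("dedicated", dedicated_capacity(DEMAND)),
                       ("shared", shared_capacity(DEMAND))):
        print(f"{label}: {cap} servers, utilization {utilization(DEMAND, cap):.0%}")
```

With these invented figures the shared pool needs 14 servers instead of 30. The real saving depends entirely on how much the tenants' peaks overlap, so this is not a derivation of the one-fifth figure mentioned in the talk.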
I just want to give you one last example about how cloud computing is becoming very important. If you look back a few years, all of our parents and grandparents probably had cows at home, and we were all devoting our time to worrying about milk and cows. Today you go to the store and buy milk when you need it. So really, we are not interested in worrying about cows as much as we are interested in getting the milk when we need it. If you look at computing, why do we need to pay for computers? We just want computing cycles when we need them. As long as you have them when you need them, and you can pay for them when you need to, there is no reason why you need to own them. This is a big trend and it is starting to happen in a big way. I expect 2011 to really make a change in this market. Again, very large numbers are predicted. As you would have noticed, many of the applications that you are used to, including things like Office, are moving into the cloud; they are moving away from your desktop. And the main thing that is driving the cloud is economics, because you are really looking for cheaper resources, for less money to be paid: pay-per-use computing and pay-as-you-go models for various kinds of things on the cloud. The reason this works is that we are grossly over-provisioned right now, and we pay a heavy penalty for under-provisioning; that is why we have to over-provision. This is a picture of one of the world's largest data centers, so these things are coming out to be pretty big. The reason people are using the cloud now is, first, business factors: clearly, I want to spend less, and if I can get away with less, I want to do it. On the other side, because of the economies of scale in the data centers, people are able to provide this kind of stuff and are able to get cloud computing setups to come in at one-tenth the price. And then technology factors finally make it possible, because
you are able to consume technology through devices that are well connected. You don't need to have a whole lot on your client, because your network is fast enough that you can get to what you need when you need it. And finally, the clean-slate approach has helped: when you look at all your assumptions and redefine them, you get a good chance to really look at what is possible and what is not. So with this, I think cloud computing is going to be the next big trend, and again, large data problems will get created because of the cloud. Let me now stop here and hand it back to Professor Farrak.
If you have a question, please raise your hand on your interface so that we can see it. There is one query from NIT Warangal. He asks: what are the challenges of cloud computing in business intelligence, if you want to use it for business?
Sure. See, one of the interesting things that is happening in the context of cloud and business intelligence is as follows. In the past, when people ran their business analytics, they were running them against their own data sets; people were doing transaction processing against their own data. What is happening today is that there are large numbers of data sets available in the market that you can connect your data sets to, and you can do some very interesting queries that combine your personal data and your organization's data with data that is available online. So let me give you an example of this very quickly. We have one customer for whom we are building an application on the iPhone. Here is what happens: you subscribe on a website and say what your allergies are, and allergy maps in the US are easily available online. When you walk into an allergy-prone zone, it alerts you that you are in a zone where you might get allergies to these things. It also connects you to stores where you can buy medicine, and tells you what particular medicine
you can buy for this. Now, a lot of the data that we are using for this particular example is coming from public and open sources, and it is very easy to aggregate and create mashups that involve not only data that you have on your own site but also data that is coming from multiple sources. That is what is making BI and the whole thing very interesting and exciting, and you want to do this in real time on real data. Map-based mashups are already very easily available, so today, for anything that has geographic content, people immediately want to connect a map to it. So those are examples of BI on the cloud, and you will see a lot of similar examples coming up. Let me now stop here.
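The allergy example is a join between private data (a user's subscribed allergies) and public data (an allergy map). The following is a minimal sketch of that idea, not the customer application described in the talk. The `Zone` type, the rectangular zones, the coordinates and the remedy lookup are all hypothetical stand-ins; a real allergy map would come from an external source and use proper geographic shapes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A hypothetical rectangular region where one allergen is prevalent."""
    allergen: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat, lon):
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def allergy_alerts(user_allergens, lat, lon, zones, remedies):
    """Combine the user's own allergy list with public zone data.

    Returns (allergen, remedy) pairs for every zone that contains the
    position and matches an allergen the user subscribed to.
    """
    alerts = []
    for zone in zones:
        if zone.allergen in user_allergens and zone.contains(lat, lon):
            alerts.append((zone.allergen,
                           remedies.get(zone.allergen, "no remedy listed")))
    return alerts


# Invented sample data: stands in for a public allergy map and a store catalog.
PUBLIC_ZONES = [
    Zone("ragweed", 40.0, 41.0, -75.0, -74.0),
    Zone("oak pollen", 40.0, 41.0, -75.0, -74.0),
]
REMEDIES = {"ragweed": "remedy-A", "oak pollen": "remedy-B"}

if __name__ == "__main__":
    print(allergy_alerts({"ragweed"}, 40.5, -74.5, PUBLIC_ZONES, REMEDIES))
```

The design point is that only the set of user allergens is private; the zones and remedies could be refreshed from open sources without touching the user's data.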