A warm welcome please for Srivenas Gopinath. Welcome. Thank you. Thank you. So, thanks for that introduction, that good piece of introduction. It felt very nice. And, well, that philosophy bit in my introduction actually came because I didn't know what to put. You know, when I signed up to speak here, I was asked to give some personal tidbit, and that's the thing that came to my mind. But what we are going to talk about today is anything but philosophy. Let me be very clear about it. Okay, so it's a deep dive session into Datalytics, which is the big data analytics system we have developed for the Government of Andhra Pradesh as part of ePragati. So, without further delay, let me get straight into it. So, we have heard about data analytics, big data analytics, Hadoop, and the need to have an enterprise-wide analytics system in the corporate sector. But trying to do that in the government sector, and trying to do a whole-of-state analytics system, is a different ball game altogether. So, you know, we had to first build a strong business case and conceptualize it. You know, why? Why do we need an entire statewide analytics system? And then the next questions would be how, and then what. So, those are the three phases which I'm going to describe broadly today. So, this is the concept. So, the concept is data-driven governance. So, what is data-driven governance? It's an approach that values decisions backed up with data that can be verified, usually with the help of statistical and business intelligence methodologies. This is becoming more and more prominent these days, particularly with the innovation in the field of computing. So, we have distributed computing, massively parallel processing technologies and all that. So, in the older days, when we had these large processors and supercomputers which were very expensive, people tried to avoid these kinds of big initiatives. 
But with the technology we have today, it's very viable and feasible to do this. So, on top of that, we have the availability of data, external data that is, and the technology to process them. So, some of the facts and figures are given here. You can see that by 2012, White House had already invested more than 200 million dollars in big data analytics system. And then you have UK government which was able to locate almost 7 billion pound additional revenue due to analytics, by leveraging analytics. So, data-driven governance in brief is to leverage the data and the technology to process data to improve the functioning of government and enhance the services provided to the citizens. So, value proposition. So, some of them are listed here. These are pretty common value propositions that you will find with any data-driven governance initiative across the world. Fraud and pilferage, prevention of fraud and pilferage of funds, better engagement with citizens, effective usage of funds, and managing public perception of government. This is very interesting because you will see that these days with the advent of social media, citizens have started engaging themselves in government and it becomes imperative for governments to hear what citizens have to say and to work closely with them. So, that's about the public perception of government. So, the government need not wait until the general elections to understand what people think about them. So, it's there, it's available live. So, something happens today, some significant event in government today. The government will come to know about public perception the next day within 24 hours. So, that's the strength of social media in governments. So, that's an important angle to consider. We'll come back to it later, but these are the value propositions. Typically, data-driven governance had a traditional approach, right? If you see, governments had already started doing some kind of analytics a few years ago. 
And these were mostly at the department level. So, for example, the tax department or revenue department might have had a small analytics engine to find out who is paying tax and who is not paying tax and those kind of things. But what we are going towards is a whole of government level initiative and enterprise-wide analytic system. Again, this gels in nicely with the concept of enterprise architecture and JSR and Dr. Saha already mentioned this many times that it's very important to think as a unit, as one government or rather than thinking as separate departments. So, who knows? The department available in... sorry, the data available in one department may be very useful for some other department. So, you may have the revenue department which would want data from, let us say, power and other departments to find out the energy consumption pattern and then see if an entity is paying tax the way it should be paying or not. So, that exchange of data and getting the maximum value out of the data is most important from a government analytics perspective. So, again, leadership is also important. As you can see, the last point talks about enterprise strategy. You have a leader at the top who will drive all these initiatives rather than having leaders in the department level or sub-department levels and not trying to drive large initiatives. Moving on. So, as I said, governments have already started with the concept of data-driven governance. In AP, we have a core dashboard and some amount of analytics and reporting available to the chief minister at his fingertips. And also, of course, AP is one of the first states in India to recognize the importance of data and getting the value out of data to run a government. But there are areas for improvement and a couple of them are mentioned below, the predictive and prescriptive analytic capabilities. So, that is where governments would want to look at. 
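To make the earlier cross-department example concrete (the revenue department comparing tax paid against energy consumption), here is a minimal sketch. It is not how Datalytics does it; the entity IDs, figures, function name and the 0.5 cutoff are all invented for illustration, and it assumes both departments key their records on a shared entity ID.

```python
from statistics import median

# Hypothetical records keyed by a shared entity ID; every value is invented.
tax_paid = {"E001": 120_000, "E002": 15_000, "E003": 98_000, "E004": 110_000}
energy_kwh = {"E001": 40_000, "E002": 45_000, "E003": 30_000, "E004": 38_000}

def flag_low_tax_per_kwh(tax, energy, ratio_cutoff=0.5):
    """Return entity IDs whose tax per kWh is below ratio_cutoff times the median."""
    # Only entities present in both departments' data can be compared.
    shared = sorted(set(tax) & set(energy))
    per_kwh = {e: tax[e] / energy[e] for e in shared if energy[e] > 0}
    if not per_kwh:
        return []
    typical = median(per_kwh.values())
    return [e for e in shared if e in per_kwh and per_kwh[e] < ratio_cutoff * typical]

flagged = flag_low_tax_per_kwh(tax_paid, energy_kwh)
print(flagged)
```

A flag like this would only be a prompt for a closer look, not evidence of evasion, since energy use per unit of taxable activity varies a lot between kinds of entities.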
So, if you are able to support your decision, or if you are able to predict something and then decide, and then somebody asks, why did you take this specific decision? You could easily give some kind of reasoning. You could support your argument with some data, facts. Well, that will also avoid a lot of controversies. So, it becomes much easier for people to explain why they have done what they have done, and it reduces controversies. So, predictive and prescriptive analytics and then, of course, data collection and governance strategies: two very important critical success factors for any program involving data and IT. So, these are the two areas of focus which we have identified. There are many more, but I am not going into the details of those aspects. So, we have this animal called Datalytics, which is big data analytics, and it has a vision and then, you know, it has a definition, and it is there for a reason. So, the vision is, of course, to support the Sunrise AP 2022 vision. And how? By providing an integrated whole-of-government business intelligence and big data analytics platform. And the definition is, of course, a business intelligence and data analytics system, conventional and unconventional, the unconventional being big data. Big data, of course, includes all kinds of data. It's wrong to categorize big data as only unstructured data; big data is all data. So, in the context of the ePragati program, Datalytics fits into the support layer. You can see the dotted circle there, along with systems such as CLGS, the Certificate Less Governance System, Payment Gateway, et cetera. Datalytics is an important supporting application. The reason is that you do not really get any direct G2C services out of Datalytics. So, it's not that citizens directly use this analytics engine. I mean, some entities might use the system, but ultimately, it is meant to support the departments and the government to better provide G2C services. 
So, it is, therefore, a supporting process. Well, it could also be seen as an application or a system that can enhance productivity, but ultimately, fundamentally, it's a support process, right? So, that's why it's in that layer. Well, any program has to be supported with strong business cases or use cases. So, unless you come up with some business reasoning and usage scenarios, nobody would buy your idea. So, this is one of the most important things that we did in the beginning when we started with this concept called Datalytics. So, we asked, where is it that we want to use analytics? You know, there were arguments both for and against having an analytics system. You know, some people asked, why do you need analytics in government? Government can run without all this big technology stuff. And yes, government can definitely run, but there is some value coming out of these systems that might be very, very useful for governments. So, we went about building these use cases. We spoke to many departments, tried to explain to them what an analytics system is, and then, you know, tried to solicit information and data from them, and then tried to build these scenarios. Broadly, you can categorize these use cases into two. One is supplementing existing business processes or department processes, and the other is creating new business processes or department processes. Now, this is one use case in the primary sector, where we have a service identified already: pest control and prevention of diseases. Now, the question is, what value can data analytics bring here? So, we were dependent on experts to do the field visits to see what's happening in the field and then, you know, get some data from satellites and the weather department, put it all together, come up with some analysis, and then, you know, say, okay, most likely, this is what is going to happen this season. 
You might have a specific type of pest attack because the conditions are favorable, and these are mostly expert opinions based on their own experiences, etc. But what can data analytics bring now? So, we have the capability to analyze data from multiple sources, right? So, it could be something happening in a neighboring state, for example, Karnataka or Tamil Nadu or Telangana. Something might have happened there which might be very useful for us to know, right? And then, you may have a lot of articles on the web which talk about a specific kind of disease or pest attack. And for you to gather these bits and pieces of information from various sources, put them all together and understand them, it's manually impossible. So, now you have a system which can do that for you and can actually give you some kind of a prediction of what might happen, right? And then, it can be presented in a way that is easily understandable by people across departments at all levels. So, you have the field level officers, you have the department HODs. So, each one needs a specific piece of information in a specific format. So, you have that capability to deliver now. So, that's the value addition here. So, you are using internal as well as external data and advanced analytics to predict disease and pest attacks well in advance and help farmers and departments take remedial actions, right? So, this is one case. Opportunities for creating new services. Again, this is another scenario. A typical scenario: gaining insights into the relationship between student enrollment, attendance and dropout rates and the condition of schools, availability of basic facilities, toilets, etc. And then, you do an analysis and you find out, okay, most likely the high dropout rates or the low pass percentages are because of these reasons. People are not able to commute from their homes to the school or it's too hot. 
The weather might be too hot for people to just walk. They don't have commuting facilities. So, then you bring the classroom to the people, right? So, you have mobile classrooms as a concept. So, you can think about delivering educational services through mobile classrooms. And there are a few more use cases. These are typical standard use cases. You have the commercial tax department trying to identify anomalous dealer behavior, the department of energy trying to do some load analysis, the PR department doing sentiment analysis and all that. These are typical and you can find them in any literature relating to analytics and government. Of course, some of these also come from our interaction with the departments. So, they are more customized to AP. Okay, so now you have the concept of data-driven governance. So, people have bought the idea. So, now how to go about doing it? How to go about realizing this concept and coming up with a system that can be used to drive data-driven governance? The normal way of coming up with ideas is brainstorming. You sit in a conference room or in an enclosed place with a whiteboard, you scribble something on the board, and you pick the brains of people around you and also from outside, et cetera. So, that's the typical brainstorming strategy. But to give it some kind of a structure, we have something called TOWS analysis, which is basically SWOT plus strategic alternatives. So, you identify the SWOT: strengths, weaknesses, opportunities and threats. You list them down as a project team and then you come up with alternatives for each combination of strengths and weaknesses with opportunities and threats. So, you'll see on the X-axis here, the green boxes are strengths and weaknesses. The red boxes are opportunities and threats, and the blue boxes are where they intersect. So, you have some strategic alternatives in each of the blue boxes. Some key strategic alternatives I have picked. 
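As a rough illustration of how the TOWS grid pairs things up, here is a minimal sketch. The function name is hypothetical, and the entries are loosely taken from points raised in this talk; which quadrant each belongs to is an assumption made only for the example, not what is on the slide.

```python
from itertools import product

# Internal factors (strengths, weaknesses) and external factors
# (opportunities, threats). Entries and their placement are illustrative.
internal = {
    "S": ["strong project leadership"],
    "W": ["limited historical data"],
}
external = {
    "O": ["data from sensors, social media and internet"],
    "T": ["rules that hinder data exchange"],
}

def tows_cells(internal, external):
    """Pair every internal factor with every external factor, grouped by cell.

    Returns a dict with keys "SO", "ST", "WO", "WT"; each value lists the
    (internal, external) pairs the team would write a strategic alternative for.
    """
    cells = {}
    for (i_key, i_items), (e_key, e_items) in product(internal.items(), external.items()):
        cells[i_key + e_key] = list(product(i_items, e_items))
    return cells

cells = tows_cells(internal, external)
for name, pairs in cells.items():
    print(name, pairs)
```

The code only enumerates the combinations; writing the actual strategic alternative for each pair remains a team exercise.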
If you see the first one, it says develop strategy to use data from sensors, social media and internet for analytics. So, it becomes very clear that you need to have a big data analytics system and you have a use case here, a business use case. And similarly, if you see point number 5, project leadership. So, we all know now how important leadership is in an enterprise architecture initiative. So, how can that be leveraged to get the laws, rules or processes changed? You know, sometimes these things become bottlenecks for information exchange. So, you know, they need to be fixed before you can now go ahead with the initiatives. And of course, there is leadership skills to motivate and bring in more volunteers and technology experts. And all these things have happened. I've seen it happening. We have seen it happening in our team. So, now we have strategic alternatives. So, we know the data-driven governance is important and then we know what alternatives we have. Now, we have to crystallize them into a package and bring a system, an application system out of it. So, that is the process of building architecture. We have the enterprise architecture considerations, the principles, the processes, the standards, etc. And then we have the need to produce value quickly and incrementally. Right? So, that's also an important consideration, like Dr. Saha mentioned. It's important to think like an implementer rather than just like somebody, you know, who is doing some consulting and lip service. And then, of course, the governance is a key aspect. It's one of the critical success factors. And we have the governance body to see that, to see through the implementation of data analytics system and ensure that it meets its desired objectives. So, now we have come to the phase where we have to decide what are the analytical capabilities that we need that the government needs. All right? 
Broadly speaking, it's easier for us to imagine a scenario where you need all the capabilities that are available. But that's not viable or even feasible. So, we have to carefully plan, prioritize, select the top ones and implement them. Your architecture should be flexible enough to undergo gradual improvements over a period of time, but the way you implement, you know, should be based on priority and, you know, the need of the hour. And there are important considerations around creating an analytical system: the data collection and enrichment strategy. Right? That's point number four. As you can see, it's a very important aspect. And this is the way we plan the analytical capability. So, we have the use cases, which are user stories given to us by the departments. And then we know what analytical requirements the departments have. We then go and identify the data points or the data sets that we require in order to fulfill the analytical requirement. And then we go in search of the data sources. So, once we build this matrix, it becomes easier for us to come up with a data collection strategy. So, that's why it's bolded there, you can see. And capability development planning is also an important aspect. That is, you know, it's fine to develop a system and give it to the government, but the government should be able to manage the system and take it forward. Right? So, the government should have some capability of its own to maintain and upgrade the system. So, all these aspects like training, skill sets, hiring, et cetera come into the picture there. Data governance is an important aspect too. As I mentioned earlier, it's a critical success factor. And I have a slide to discuss the aspects of governance later on. Yeah, here it is. So, data governance. So, define, approve and communicate data strategies, policies, standards, architecture, procedures and metrics. 
Enforce policies, standards and procedures and make sure that all the applications, including the data analytics system, adhere to those. And of course, overseeing the delivery of data analytics projects. That's also an important function of the governance team. Reducing bottleneck issues, identifying the department processes, rules or laws that hinder data exchange: all these become part of the governance team's work. Last but not the least, you have the value promotion aspect, which is understanding and promoting the value of an analytical system across the departments, among all the stakeholders. And that's also a key function of the governance team. So, finally we have the inputs for the data analytics application. You can see them: the architecture considerations, analytical capability requirements and strategic alternatives. So, we have the blueprint. And the first step, of course, is to come up with a reference architecture, or find a reference architecture if available. And this is one of the biggest challenges we faced. You know, we did a lot of research and we found that nowhere in the world had any government tried anything like this in terms of developing an analytics system. So, you had a lot of use cases where department level initiatives were done, and we had some architectures for those. But a whole-of-government level big data analytics system is something which not many people have tried. In fact, at the government level, nobody had even tried. So, this is one of the first, I can safely say, and in fact I've got the confirmation also from some of my colleagues elsewhere, that this is one of the first of its kind being developed in the world. You know, if you find something like this, it's still under development or the scope is not as large. But this is one of the first. So, we had to put in a lot of effort in coming up with a reference architecture. And finally we did, and this is what it looks like. 
You'll see that it contains all aspects of an analytics system. Excuse me. So, you have the sources, which are structured and unstructured, at the bottom. And then you have the delivery, which is through multiple channels: alerts, portals, collaboration, mobile office apps, et cetera. And then on the right hand, on the left hand side, the dark blue boxes show you how some of the common aspects like metadata, security and life cycle management have been included in this model. We came up with a more detailed architecture pattern, which is this. This is very specific to the Government of Andhra Pradesh data analytics system. And you will see, I mean, it's not legible. Pardon me for that, but this is our logical architecture. Again, it's the same thing. And the key aspect here is the integration layer, which is the central enterprise service bus kind of a product, e-highway, which will help us in delivering and also extracting data from some of the sources. So I'm not going to go into the details of the architecture. Just a few bullets on the key features. So we have predictive, prescriptive, descriptive and causal analysis capability, multiple delivery mechanisms, and KPA analysis, which is very, very important for departments because that's the way they are measured. And then you have all the big data analytics: textual, sentiment, pattern matching, topic detection, log stream, et cetera. Statistical and mathematical analysis, reports, visualization. And, you know, we have this concept of real-time streaming log analysis and sensor analytics, which can easily gel with the smart city initiative. So tomorrow if you have a lot of sensor data coming from, let us say, the smart city initiative, then this platform will be in a position to capture and analyze that data. Now the next important aspect is team building. So we have conceived an idea of forming a central analytics team which will cater to the needs of all the departments. 
And we have some rough idea of how the team is going to look. I mean, the team might include data analysts, engineers, data scientists, statisticians, software engineers, DBAs, and OEM specialists. And we wanted the system to be easily configurable and maintainable, unlike, you know, the corporate systems, which are technology heavy because you have a lot of support systems there. There the core technology team is huge and they can, you know, develop systems on their own. Whereas in government, to minimize cost, it is better to have a configurable system which can be easily maintained by the existing support team. Maybe with a little bit of training, but mostly it should be manageable by the existing team members. So that actually brings in the COTS angle, right, and then the OEM specialists. And then, of course, collaboration with academicians. It's important to work with various universities, particularly when you talk about statistics. So we have advanced statistics institutes in the state which can contribute and also leverage the system for R&D purposes. The challenges, of course, are always there. But before that, the critical success factors, starting with availability of data. So these are some of the questions which we need to ask ourselves as a team to ensure that we are on the right track and we have the right support system. Do we have all the data points to perform accurate data analysis? What is the quality of data? Do we have enough historical data to perform analysis? And then coverage of data, data governance, training and talent development plan, of course, and team composition. Just one quick bit. I know data might be available, but we may not have enough historical data. And then you use whatever is available with you and you get something wrong out of it. That's a dangerous situation to be in. You want to have analytical capability, but you want it to produce accurate reports, accurate predictions. 
You don't want inaccuracies or huge inaccuracies to creep in, which will defeat the purpose of having an analytical system in the first place. So one example, there's some study which was done some time ago which predicted that onion prices would peak in February 2016, and this was done a few months ago, five or six months ago. It predicted that onion prices in India would be at their peak in this month. But that has not happened so far. So imagine what would have happened if this information had gone to the government and the government had procured, or come up with a strategy to procure, onions in advance, and then they find out that, wow, it's a waste of money. So you need to have good data and enough data to be able to do advanced analysis. So that's a very important factor. Key challenges and, of course, the strategies to overcome those challenges around data. So, data availability. I think Pallab already mentioned that. And how do you overcome the challenge of availability? So you have to be proactive and start collecting the data. At least, if not now, you will have some data one or two years down the line which can be useful. So you come up with the idea now. Don't wait till your idea has materialized, but start collecting the data in some format. So that's one strategy. If you don't have enough historical data, well, if you don't have it, you don't have it. You can't do much about it. At least start building it so that it will be useful in the years to come. But if you don't do it, then you will not have data even after two years. So that's very important. Quality of data: this is very important, specifically again from a statistical analysis point of view. You need to have quality, unbiased data, and it's very difficult to control the quality of data in the open world. So the systems, the sensors, the applications, et cetera, they produce data, and there is legacy data. 
And you have to accept them as they are because you can't go back in time and fix the issues. So the first step would be to come up with some parameters which can measure the quality of the data you are interested in, and then come up with some solutions to fix and improve the quality of data. So that's again a very elaborate and thoughtful process. And data sharing, again, might become a bottleneck, particularly because now you have all the departments giving their pieces of data and it's going to be consolidated in one location. Departments might not want to do it, and you have seen that some of the technical support teams are reluctant to share data. So that bottleneck has to be removed, and again that comes from leadership, as I mentioned earlier. So I have seven more minutes and I think I'm on track. So finally, what is it that we are looking at? So now we have the concept, we have the idea, we have the system, we know the challenges and we are already on the road; we have started our journey towards building the analytics system. So what is it that we have learned, and what's in it for other governments and other states, right? So this analytics framework is unique, as I mentioned earlier, and it can become a framework in itself for other governments and states to follow. It is something which was conceived and created here in Andhra Pradesh, but it can be useful for other governments in other states and nations too. But of course we have our learnings, and we present our learnings as guidelines, right? So understand your data; assess data as a capability. Now data as a capability is an important concept. We are bringing in this concept here because data itself has a lot of value in it, and if you have a good amount of quality data, it itself becomes a capability which of course you can tap into. Governance framework is important. Prioritization of analytical capabilities is important. 
Don't go after all the fancy analytics that's available in the market. Just have it in mind. Build an architecture that can support all that fancy stuff, but prioritize based on the cost and the time, etc., what exactly you need, where you have your data capability, etc. And then the strategies for acquisition, creation, profiling, cleansing and enrichment of data: that's also an important thing. And bottlenecks for data exchange: identify and eliminate them. And finally, and I think it's one of the most important considerations, although it generally gets left out at the enterprise architecture level, the implementation strategy, which is whether to go on cloud or on premise or hybrid, etc. And again, you have to decide on a good implementation strategy, one that will work for you. Don't go by what corporates do or other people do. Based on the data you have, the size, the volume, etc., you come up with a proper implementation strategy. So these are the guidelines and that's it.
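The talk asks for parameters that measure data quality and for a check on whether enough historical data exists. A minimal sketch of such a profile follows. The choice of metrics (field completeness and years of history), the function name and the sample records are all assumptions for illustration, not the parameters the Datalytics team uses.

```python
from datetime import date

def profile(records, required_fields, date_field):
    """Summarise completeness per required field and the span of history covered."""
    total = len(records)
    # Share of records in which each required field is actually filled in.
    completeness = {
        f: sum(1 for r in records if r.get(f) not in (None, "")) / total
        for f in required_fields
    } if total else {}
    # Years between the earliest and latest dated record.
    dates = [r[date_field] for r in records if r.get(date_field)]
    history_years = (max(dates) - min(dates)).days / 365.25 if dates else 0.0
    return {"rows": total, "completeness": completeness, "history_years": history_years}

# Invented sample: three price observations, one with a missing price.
sample = [
    {"market": "A", "price": 18.0, "day": date(2014, 1, 1)},
    {"market": "A", "price": None, "day": date(2015, 1, 1)},
    {"market": "B", "price": 22.5, "day": date(2016, 1, 1)},
]
report = profile(sample, required_fields=["market", "price"], date_field="day")
print(report)
```

A report like this could feed the go or no-go question behind the onion price example: if the history is short or key fields are sparse, the safer move may be to keep collecting before trusting a prediction.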