So I'm going to give a really quick introduction of who I am. My name is Shivam Shankar Singh. I graduated from the University of Michigan, then came back and worked in the Indian Parliament as a legislative assistant to a Member of Parliament for one year. Then I worked with Prashant Kishor, who runs a company called I-PAC. After that I moved to running election campaigns in the Northeast, and then I worked for Ram Madhav ji and the BJP and handled the campaigns in Manipur and Tripura. Manipur was a really tiny state, and that's where we started experimenting with a lot of data stuff, because each constituency is just like 50,000 people, so you can do a lot of things there.

This is actually a report that was published as soon as I landed here in Bangalore. It's basically an Oxford study that says that companies and different parties have invested half a billion dollars across some 41 countries to experiment with what can be done with data. The study says that since 2010, many political parties and governments have spent over half a billion dollars on the research, development, and implementation of psychological operations and public opinion manipulation over social media.

How did people hear about politics and data together for the first time? For a lot of people in India, the first thing they heard about was Cambridge Analytica. It's a UK-based company that basically took people's data from Facebook and then used it to target advertising on Facebook itself. Our major parties made a huge hue and cry about this.
They said, okay, Congress is using Cambridge Analytica, BJP is using Cambridge Analytica. As far as I know, no political party in India has used Cambridge Analytica, and with just Facebook data it would not be a valuable tool. On the other hand, we have a lot of things that are being done in data and politics in India.

This is one of the presentations. It's a campaign pitch that we pitched to a political party. When the Cambridge Analytica story came out, I was contacted by a reporter, and the first thing I thought was that we should really change the names of our slides. "Weaponizing data" is not a good name to have after the Cambridge Analytica story.

The technology that's used in politics is decently basic. It's Python to scrape data off the web. It's put into a database. For visualization, we use things like D3.js. We use QGIS and Tableau to make it look pretty, basically to put it over a map so that party leaders like the way it looks. We've developed mobile apps so that leaders can just look up constituency profiles.

There's actually a really interesting story on this. The first time we went to Tripura with a major party leader, he had a tablet in his hand, and in a meeting full of party karyakartas (workers) he just made random people stand and asked them: how many Muslims do you have in your constituency? How many people from the Jamatia tribe do you have at your booth? That was something that got the karyakartas really interested, because they realized that party leaders have this level of data. It incentivized them to work harder. It incentivized them to actually go out to their booths and start collecting data for themselves.

So, a big disclaimer before we start the next part. I do not recommend that anyone go out and actually do any of this. Neither am I saying that we've done any of this. It's all things that are possible in politics.
Basically, I don't want to be the next Cambridge Analytica headline here.

Most of the data that's actually used is publicly available, and it's all legal. The first piece of data that political data analytics starts with is from the website of the Election Commission. Every state has a Chief Electoral Officer. From that website you can download something called the electoral roll. An electoral roll is basically a sheet of paper that lists the people who vote at a booth. It has the constituency name at the very top, then the booth name and booth number. It has your voter ID, your name, your father's name, the household number, your age, and your gender. This is data that the Election Commission provides to the general public. You can log on to the website right now and download the PDFs.

If you zoom into this, you see the first 30 names from a random constituency in Bihar. It's the constituency I belong to. So let's convert this into an Excel sheet. With this, there is some information that becomes very obvious as soon as you see it. What do you think you can tell with just this much data? Any guesses?

Yep, caste is definitely one of them. Age is definitely there; there's a column for age. So you know the demographic profile of that constituency along two dimensions.
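As a rough sketch of what one row of that Excel sheet might look like once it is structured, the Python below models the fields the roll is described as carrying and tallies an age profile. The class name `RollEntry`, the field names, the age cut-offs, and the sample rows are all illustrative assumptions, not the Election Commission's actual format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class RollEntry:
    """One row of an electoral roll (field names are hypothetical)."""
    constituency: str
    booth_no: int
    voter_id: str
    name: str
    father_name: str
    house_no: str
    age: int
    gender: str


def age_band(age: int) -> str:
    """Bucket an age into a coarse band (the cut-offs are arbitrary)."""
    if age <= 25:
        return "18-25"
    if age <= 40:
        return "26-40"
    if age <= 60:
        return "41-60"
    return "61+"


def age_profile(entries: Iterable[RollEntry]) -> Dict[str, float]:
    """Return the share of voters that falls in each age band."""
    counts = Counter(age_band(e.age) for e in entries)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {band: n / total for band, n in counts.items()}


# Made-up rows, for illustration only.
SAMPLE_ROWS = [
    RollEntry("Example AC", 12, "ID0001", "Voter One", "Parent One", "1", 22, "F"),
    RollEntry("Example AC", 12, "ID0002", "Voter Two", "Parent Two", "1", 35, "M"),
    RollEntry("Example AC", 12, "ID0003", "Voter Three", "Parent Three", "2", 47, "F"),
    RollEntry("Example AC", 12, "ID0004", "Voter Four", "Parent Four", "3", 68, "M"),
]

profile = age_profile(SAMPLE_ROWS)
```

Age comes straight from a column, which is why it is the easy dimension; the other dimension has to be inferred.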
Those two dimensions are caste and age. Exactly.

For caste, we've actually written algorithms now. It happens at the constituency level: people fill in that this surname equals this caste. If you do this for a big enough data set, then it automatically assigns caste to people. For some people, like the third one in this row, you don't know what caste the person is, but you can just Google it, and as soon as you do, Google tells you: someone asked the question on Quora and someone answered it. So that's how you get the caste for people.

The interesting part is that there is a lot of ambiguity when it comes to last name and caste. For one, Chaudhary in Bihar actually correlates to two different castes. One is Bhumihar, the other is Pasi. But the algorithm knew what to assign the name to just because of the constituency. The allocation happens at the constituency level. There are physical people sitting for every constituency. An MLA constituency has about 2.5 to 3 lakh voters, but there is someone sitting and filling it up manually for 10,000 voters, and it happens for every constituency. Some surnames stay the same across the state, but some change constituency-wise, so you still need to do this exercise.

As someone mentioned, you can tell family. We don't really know what to do with this data yet, because whoever's working on the ground for a political party already knows who's a part of which family. And how do you micro-target based on that?

As for the accuracy of this exercise: in the states where it does work, the big North Indian states like UP and Bihar, it works surprisingly well. For some states it will not work at all. In Punjab, everyone's last name is Singh. It has the largest Scheduled Caste population in the country, but you cannot stratify people based on that, because it's Singh and the name of the village. That's all you get. For a place like that you might have to do it manually, but I haven't done it yet.
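Why the allocation has to happen at the constituency level can be shown with a tiny lookup keyed on the constituency as well as the surname. This is only a sketch: the table, the function name, and the constituency labels are hypothetical, and the two entries simply restate the Chaudhary example from the talk.

```python
from typing import Dict, Optional, Tuple

# (constituency, lowercase surname) -> label filled in by a local volunteer.
# Both constituency names are placeholders, not real places.
LOOKUP: Dict[Tuple[str, str], str] = {
    ("constituency_a", "chaudhary"): "Bhumihar",
    ("constituency_b", "chaudhary"): "Pasi",
}


def assign_label(constituency: str, full_name: str) -> Optional[str]:
    """Look up a label from the last word of the name, scoped to one constituency.

    Returns None when nobody has filled in that (constituency, surname) pair,
    which is the case that still needs a person to resolve it by hand.
    """
    parts = full_name.split()
    if not parts:
        return None
    key = (constituency.strip().lower(), parts[-1].lower())
    return LOOKUP.get(key)
```

A state-wide table keyed on the surname alone could hold only one of the two Chaudhary entries, which is the ambiguity described above.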
So I'm not really sure how well that would work in a place like Punjab.

This is the information that we have, courtesy of the Election Commission. We have the name, the age, the location of the person (because that is the booth they vote at), and the gender. What can be derived out of this is the caste and religion. Religion is actually surprisingly easy to derive. Caste for the major ones, like Yadav or Rajput, is decently easy to derive too. For the smaller castes, which use multiple surnames, it's slightly complicated. But this is what political parties have that no one else has: a huge amount of free labor. You have so many party karyakartas who would be so excited to be a part of strategy. You can get thousands of people who will sit for a couple of hours at a time just filling up a sheet: this last name equals this caste, this last name equals this caste. We basically gamified the entire process, and people are really excited to do this for hours on end.

This is some more data that the Election Commission itself provides. A booth has fewer than a thousand voters on average. This is a constituency from the Northeast, where booths had like 600 to 700 voters. You have the voting profile for every booth. You also have the caste profile for every booth. So suppose you have data for the past three Lok Sabha elections and the past three assembly elections. We don't; we have it for like two assembly elections and one or two Lok Sabha elections, because delimitation happened, which is when they change the boundaries of all the constituencies, and you just don't know which region became part of a new constituency. With just this data and basic statistics, you can get to a pretty good understanding of what community is voting for what party, or what community is voting for candidates of their own community, because you have booths that you have basically stratified by caste and religion, and you have the voting record from those booths. So you just take an intersection.
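The talk does not say which statistics were used. One textbook way to combine a booth's demographic profile with its voting record is ecological regression: fit a straight line of party vote share against a community's share of voters across booths, then read off the implied support inside and outside the community. The sketch below assumes that technique; the function names and the four booths are made up, and the estimates only hold if the community votes similarly from booth to booth.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("community share must vary across booths")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_support(community_share: Sequence[float],
                     party_share: Sequence[float]) -> Tuple[float, float]:
    """Return (support inside the community, support outside it).

    Extrapolating the fitted line to a booth that is 100% the community
    gives intercept + slope; a booth that is 0% gives the intercept.
    """
    intercept, slope = fit_line(community_share, party_share)
    return intercept + slope, intercept


# Made-up booths: the community's share of voters and the party's vote share.
COMMUNITY_SHARE = [0.1, 0.3, 0.5, 0.7]
PARTY_SHARE = [0.25, 0.35, 0.45, 0.55]

inside, outside = estimate_support(COMMUNITY_SHARE, PARTY_SHARE)
```

With more booths and several communities the same idea becomes a multiple regression, and the estimates should be clipped to the range 0 to 1 before anyone reads them as vote shares.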
That shows you how each community is voting. In my experience, the intuition we just talked about correlates pretty well with this, because whatever the local intelligence and the local party cadre tells you, that this caste is going with that party, actually ends up being true. So you might not need this data at the end of it.

We've seen caste, religion, age, family relations, and booth-level voting trends. What comes next is the gray area, and I know the slide is black; it's black on purpose.

This is a tweet by the Internet Freedom Foundation: data protection is not some distant, intangible, elite demand. Many Indians today use Paytm, Ola, Justdial, Flipkart. As per a recent unsolicited email we got, many such databases are up for sale. They open people up to not only spam, but also identity theft. What it also does is provide a lot of data to politicians to actually compromise democracy at the end of it.

There are student databases up for sale. You can just get whoever's a student in what stream in the entire state. You can get people's smartphone data. You can get matrimonial databases.
I'm sure we could think of a use for that. You can get online shopping data. You can get the list of government employees. All of this is available through unsolicited emails from random companies.

The X factor when it comes to political data analytics is phone numbers. And it's not the CEO of Airtel or the CEO of Jio who's selling you these phone numbers. It's random people in random constituencies. There are people who just mine the list of every SIM card issued in a state, and they will give the data to anyone who wants it. If even for the Aadhaar database the enrollment process is so compromised, you can just think what the data security is like on something like a phone number.

Another really important factor that you need before you can turn anything into actionable intelligence in politics is people's socio-economic status. That influences voting behavior to a large degree. What do you think would be a good proxy for socio-economic status? Any guesses? Shout it out.

Address could be one.
Yes, not very specific though. Like in urban clusters, everyone lives everywhere. Twitter's too limited. Mobile phone brand could be it. But we've actually found a much better proxy that has a one-to-one correlation to socio-economic status: people's electricity bills. You can use land records, census data, NSSO surveys, BPL lists, and all of these things are used, but they're complicated. Everyone, on the other hand, has an electricity bill, and it's a one-to-one correlation: the more air conditioners you have, the higher the electricity bill, the higher the socio-economic status.

And honestly, this data is surprisingly easy to find. You don't need to contact the power minister. There are people sitting inside discoms who are willing to sell you this. And there was a discom in Delhi where a friend of mine was paying his electricity bill, and he realized that if you entered a customer ID, it showed you the billing details: the name, the address, and the amounts for the last three months. So he wrote a script that basically just went through all the numbers, incrementing the ID each time and scraping the result, and he had the bills for his entire locality.

So, micro-targeting. Facebook advertising is a major part, and Twitter and so on is a major part. But what's allowed for true micro-targeting in politics in India is an app called WhatsApp, much more prevalent than Facebook, much more prevalent than Twitter. With it you can actually target an individual instead of trying to target a group by just age. One of India's major political parties, I will not name it, had 9,000 to 10,000 groups in the entire country during the 2014 elections. In Karnataka alone, in the last election, they had over 20,000 WhatsApp groups operating.

These groups that people are added to, and some of you might have been added to these, are not groups of random numbers the way they were when it started out in 2014.
It was actually just random numbers. They had a list of numbers and they kept adding 256 numbers to each of the groups. Now it's not random. And you guys, honestly, probably were not added to a group, because you were assessed to be high-SES urban youth, which is a demographic that no one really wants to target right now. What this allows you to do is form groups of specific communities. You have groups along certain caste lines. You have groups of a certain socio-economic segment. You have groups for particular constituencies.

Think of a group of youth between 18 and 25, Hindu, low to middle socio-economic status, non-Yadav OBC, in booth numbers 5 to 150 in the 12th assembly constituency of Uttar Pradesh. This is basically a group with a specific age group, a specific income status, and a specific caste. What can you do with something like this?

Take a party that's trying to win an election in this constituency. It knows that 22% of the constituency is upper caste and 16% is Yadav, and the Yadav population supports the opposition party. That's the core vote bank of the opposition. But the region also has 18% non-Yadav OBCs. What the party would think of is getting the upper-caste vote plus the non-Yadav OBC vote. For this they run focus groups, surveys, and internal A/B testing, apply political acumen at the end of it, and then they test messages. The general message here is: Yadavs have cornered all the reservation benefits intended for OBCs. Party X only supports Yadavs, to the detriment of all other OBCs. We must stand up and fight this injustice. We must teach them a lesson this election.

So this is a very explicit statement of what a political party wants to do.
This is not how it would work. This is where propaganda comes in, fake news comes in, random facts come in, because you want to convince voters of this message. What you do is start sending random made-up statistics: that 90% of the OBC reservation in the state's educational institutes is taken by Yadavs. No one knows if that message is true or not, but it is something that resonates with people. They're like, yes, that's exactly what happens.

So here is where polarization starts, fake news starts, distorted facts start, caste conflict starts, people start linking nationalism to one party. It doesn't happen instantaneously. If I sent you a message today, you'd read it and not think about it anymore. But if it was a concerted campaign over a two- to three-month period, where I sent you facts, where I sent you jokes, all of them pushing you in a very specific direction, to think in a very specific way, and provoking your sense of victimhood, then even when that fact is corrected you will not believe the correction. You will continue to believe your originally held belief, because it plays in line with your bias.

Some of you might have heard of the Stanford prison experiment, conducted in 1971. In this experiment in the US, they classified people into two categories, prisoners and guards, and they basically showed that people's behavior is dependent on the position they're put into. So the guards started torturing the prisoners. They beat up the prisoners; they basically mentally harassed the prisoners. Around seven or eight years after the study was published, it was found that the study is absolutely fraudulent. No such thing happened.
It's completely made up. But if you tested college students today, and there's a lot of data on this, 70 to 75 percent of the people in intro stat classes still believe that the study is real and that this is what happened.

There's a lot more happening with data today. This was just a snapshot of what can be done with just phone numbers, electricity bills, and publicly available data from the Election Commission. Right now political parties are collecting data on beneficiaries of government schemes like the Ujjwala Yojana and the Pradhan Mantri Awas Yojana, and on the number of toilets that were constructed. What's going to happen is that political parties are going to target these specific people with specific campaigns. Eventually someone might get access to loan data. Eventually someone might get access to IRCTC or payment wallet data. There's just a whole world of possibility.

I don't know if anyone's doing this; I'll just clarify that at the outset. But think about what someone could do if they had your call records. These are not taped phone conversations. These are just the numbers that you've called and the duration that you've talked for. Do you think it would be something that's illegal? Any guesses?

Should be illegal, right? Right now, in our books, we have no law that explicitly says that data compromised in this way is illegal. What it's covered under is Section 43A of the IT Act: a body corporate that is possessing, dealing with, or handling any sensitive personal data or information, and is negligent in implementing and maintaining reasonable security practices, resulting in wrongful loss or wrongful gain to any person, may be held liable to pay damages to the person so affected. A really important component of this is the wrongful loss clause. Just because a political party has accessed your data, it doesn't mean that some wrongful loss has occurred to you. How do you prove that it's a wrongful loss?
The other thing that actually does a reasonable job of protecting data in India is the Information Technology (Reasonable Security Practices and Procedures and Sensitive Personal Data or Information) Rules, 2011. But this only covers passwords, financial data, physical, physiological and mental health conditions, sexual orientation, medical records, and biometric information. For everything else, if the data is compromised, there is no guarantee that anyone will be punished for it or that anything will happen at the end of it. You might register an FIR that your data was leaked through a certain medium, but till now I don't know of any action that's been taken on such an FIR. For so many Aadhaar data leaks that have happened in the country, no one's taken responsibility.

If you had people's call records, this is what you could do with them. You could map out the entire network of people. With this network, you could identify key influencers in society. There are some people, extroverts I've heard they're called, who talk to a lot of people on the phone for really long durations. They would be influencers in society. You identify them, you target them with specific messages, and then they will start using those arguments in the daily conversations that they have with their friends and families. That's how you propagate a message through a network. It's not just through social media; it's through your friends and family that a certain message is being peddled.

There is urgent action required to stop all of this, so that the essence of democracy is maintained in India. We need data privacy laws. Someone's selling your phone numbers and electricity bills, and it's not even clear if it's illegal. The other part of this is that we need data storage laws. Right now there just aren't any requirements if it's not passwords and financial information. It can be on an unencrypted database.
It can be on an Excel sheet. You could just scrape it by incrementing a number. So you need data protection laws, and data storage laws that require data to be stored in an encrypted database. You need spam restrictions on things like WhatsApp. You need Facebook, Twitter, and WhatsApp to come together to start flagging fake news. All of this needs to happen together.

WhatsApp actually made some promises to the Government of India. Surprisingly, they responded. The Ministry of Electronics and Information Technology wrote to them after a series of lynching incidents that followed fake news about child abductors spreading across the country. Their response basically said that they are going to prevent fake news from spreading, and this is how they plan to do it.

New protection to prevent people from adding others back into groups which they have left. Okay, great, but I don't think this happens. We never add people back to a group. It would be counterproductive; they'd talk against you.

Administrators get to decide who can send messages within individual groups. It's always going to be set to all people. It won't matter.

A new label, under testing now in India, that highlights when a message has been forwarded versus composed by the sender. Have any of you ever gotten a message that says "forwarded as received"? Do you think the label makes any difference to the people who read it? Mostly it doesn't. The people who are being targeted with these messages are people who are already primed to accept that information. They already subconsciously believe it. They just need a document saying that they're correct about it.

A new project to work with leading academic experts in India to learn more about the spread of misinformation, which will help inform additional improvements going forward. I have no idea what this means.
Let's see.

Literacy workshops and advertising campaigns on how to stop fake news. The weird part about this is that WhatsApp probably has the best platform for a literacy campaign: it's WhatsApp. But they're not using WhatsApp. What they're doing is giving out newspaper advertisements, for some reason.

Fact-checking accounts on WhatsApp. So there's an account for Boom Live and for Hyderabad Police, where you just forward a piece of news and they tell you if it's fake or not. This requires people actually questioning their beliefs and forwarding it themselves, which again is something that will not happen if you already believe the news. All of these are unlikely to address the problem.

What can actually work? It will have to be a combination of technology and new laws in the country. I will talk about technology, because everyone out here has something to do with data. Artificial intelligence that can categorize things as fake news would probably be the next logical step, and it's probably going to be a lot more effective, because education is a very slow process. If we start educating people right now on what's fake and what's not, it's probably not going to end very well for like the next 10 elections.

For this, the real concern is privacy. WhatsApp says that they're end-to-end encrypted. What could happen is that maybe for groups which have 25% or more unknown numbers, or say where even 75% of the numbers don't have you in their address book and you've added them to a group, that group maybe doesn't need to be encrypted, because it's not people you know.
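The threshold idea can be sketched as a simple check that a messaging platform might run when someone creates a group. Everything here is hypothetical: the function names, the address-book mapping, and the 0.75 default (borrowed from the 75% figure mentioned above). It says nothing about how WhatsApp actually works.

```python
from typing import Dict, Iterable, Set


def stranger_fraction(adder: str,
                      members: Iterable[str],
                      address_books: Dict[str, Set[str]]) -> float:
    """Fraction of added members who do not have the adder saved as a contact."""
    others = [m for m in members if m != adder]
    if not others:
        return 0.0
    strangers = sum(
        1 for m in others if adder not in address_books.get(m, set())
    )
    return strangers / len(others)


def looks_like_bulk_group(adder: str,
                          members: Iterable[str],
                          address_books: Dict[str, Set[str]],
                          threshold: float = 0.75) -> bool:
    """Flag a group when at least `threshold` of its members are strangers."""
    return stranger_fraction(adder, members, address_books) >= threshold


# Made-up example: only one of the four added members knows the adder.
BOOKS = {"m1": {"adder"}, "m2": set(), "m3": {"someone_else"}, "m4": set()}
GROUP = ["adder", "m1", "m2", "m3", "m4"]

flagged = looks_like_bulk_group("adder", GROUP, BOOKS)
```

A check like this needs only contact-list metadata, not message content, which is why it sits alongside the privacy concern rather than against it.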
In a group like that, you're just adding random numbers.

Immediate action is required. What I am here to do is tell you how your data can be misused and how weak the data protection laws are in this country. The first part of this is being informed, so that you can push for tougher legislation and talk about data security in a way that actually translates into real action. Thank you.

Hello. Just make sure you restrict your questions to one, because we have a lot of hands. And the volunteers, please make sure that you take back the mic once the question is posed. Please. Thanks.

Hi. You mentioned that electricity bills are able to target people really well. Especially in states like UP and Bihar, where power theft is a very common problem, does it result in false positives or anything of that sort?

Does it result in what, sorry?

False positives, as in, let's say somebody is using electricity for just a hundred or 200 bucks, but he's very rich, or something of that sort.

So that usually doesn't happen. There will be outliers in this, but the outliers don't matter. The point is, if you have like 70% or 80% accuracy in a field like politics, it's more than enough.

Hello. Hey. As you said, you make cuts on caste and everything. But every election is a different one, right? In some cases people vote against caste lines. And you work with the politicians. So can you actually predict that in a new election? And how do you deal with the politicians when that happens?

So some castes will never shift their loyalties. It's known which side they're going to vote on. There are some castes which do shift loyalty, and those are the castes that are targeted for something like this.

Yeah, but in a particular new election, there are cases now where the castes that have voted the same way historically...
Oh yeah, strategy is made new for every election. Strategy is never recycled.

How can that be predicted for a new election?

It can be predicted. All of this gets combined with survey data, focus group discussions, and stuff like that. So it is not just tech; it's also a lot of on-ground activity. You also have a party cadre working on the ground continuously, who provide you feedback on which direction the voting sentiment is going in.

I didn't have a question, but I was just thinking that one way to counter this force might be to gamify identifying fake news. For example, if a person can identify fake news and send it to an authority, then he or she gets some amount or something like that. That can actually reverse this trend, because ultimately I think people always care about...

So that is actually a great suggestion. There's one small problem with this. Alt News, you might have heard of it, is a website that corrects fake news. It releases corrections on a lot of fake news, but their reach is so insignificant compared to the fake news itself. How do you ensure that it reaches the same people? There is a party mechanism that's arranged 20,000 WhatsApp groups in the state of Karnataka. And yeah, you could start rewarding people, but whose incentive is it to reward people?

Hello. We are talking about elections, but we are not considering the corporates that are making huge profits using data segmentation. Every day we browse many apps and use WhatsApp and Facebook, and all the companies are using the data. Can we also talk about corporate laws regarding this?
So definitely, a holistic data protection law is required. This country needs to think about what companies, and anyone who has access to data, can do with it. We just don't have laws for anyone, and politics is one space in which we see the immediate results of something like this. As for what WhatsApp or Facebook is going to do with our data and how they're going to monetize it, I don't know yet. All they've done is push advertisements towards us, which isn't so bad.

Hello. I just wanted to know your opinion. You said that we need laws. I think everybody agrees that we need laws, but we also saw that it's against the interest of the politicians to make this law. So how do you think this can go forward?

The way I see it, there is eventually going to be a PIL on it, and the court is going to have to order something. The incidents of mob lynching, and of fake news propelling people to act violently in random parts of India, are going to grow so much that eventually the court is going to have to act. The other body that's going to have to do something about it is the Election Commission, because their job is to keep elections sane, and with this technology they haven't really caught up. They don't know what to do with it. They have no means of monitoring expenses on social media. So they will have to get into this field.

You just mentioned that you have been working in the Northeast for some time. Could you give your opinion on how data being turned into information, or the fake news and all that you've been talking about, is different in the Northeast versus the rest of India?

So how it's utilized is different for every state. It's different for every election. In a place like the Northeast, you have a lot of tribes. Tribes are actually much easier to identify than caste, because they use the name of the tribe as their last name in a lot of parts of the Northeast.
That's what happened in Tripura, so it's very easy to segment people into different tribes. What then happens is that you can identify influencers within those tribes and target them specifically, because vote transference is a lot more prominent in the Northeast. You basically go to a village headman and tell him: get your community's vote transferred, this is the deal. Vote transfer is something that's not so prevalent in the rest of India anymore. But this data allows you to basically set the right cost for the right village headman. Thank you.

Hi again. I just wanted to know, do you believe the invisible hand of the market can help? As in, if there is information symmetry amongst the political parties, if everybody is using similar data, can this effectively become an insignificant factor?

So the data is all public. None of this is proprietary data that only I have access to, or anything like that. Most of it is on the Election Commission website. The mobile numbers you can just buy off the street; any party can buy them. The point is, some people have started using it, and the others just haven't realized how to go about it. And some parties just have more money than the others.

Hey. You said that access to call logs can be an influencing factor in promoting campaigns. So what do you think: is data like that already available illegally in this space? Are people already doing it?

I have no idea. I haven't met anyone who has call records for people. No one's tried to sell them to me yet. Someday it might happen; I'll let you know. Okay, maybe after this talk goes online, someone will be like, "Boss, I have this data."