So, I have got a very difficult slot, and I call it a difficult slot because it is after lunch and before tea. So I am hoping that I am able to hold your attention. It's a very interesting topic, something that affects all of us whether we are part of the data science society or not. My topic today is social network analytics to enhance marketing outcomes, specific to the telecom sector.
I would like to begin with a brief introduction about my background. I am Kavita, and I have around 12 years of experience in the analytics consulting industry. Interestingly, my first 10 years were with the financial and banking industry, and the last three years have been with telecom. I have worked with Fair Isaac, Experian and Accenture, and the last company that I worked for was Vodafone. So that is a brief profile of mine.
I will begin this session by asking a very small question to all of you. How many of you are familiar with social network analytics, and what is the difference between social media analytics and social network analytics? Anyone who wants to quickly, maybe two people, tell me the difference between social media analytics and social network analytics? Any other guesses? I ask the question because there is a kind of confusion between the two areas. Social media analytics and social network analytics are very different, and the reason why I feel more elated doing social network analytics is that it is pure math. There is no probability factor involved in social network analytics. It is very close to network graphs, if I can put it that way.
Of course, it uses data from social media, but that is only one dimension of the entire social network analytics graph that we build. That is why it is very important to distinguish between the two: social media analytics comes through scraping data, dealing with unstructured data, dealing with text comments, whereas social network analytics is more about looking at the various groups you belong to. As we go through the slides, it will be very clear what I am talking about.
When I ask, say, one of you, who are you? No one just gives their name, right? You associate yourself with the place you are in, the company you belong to, the country you have come from, the organization you are part of. So all of us, knowingly or unknowingly, are part of a larger social network, and that is our identity. Today, I am an individual not just by virtue of being an individual, but more so because I belong to a particular place where I was born, a particular city where I studied, the colleges where I did my masters, and then the companies that I have worked for. And of course the most important elements are our family and our neighborhood, who form the lifeline of our existence. So we are very, very closely associated with these networks, either in the real world or in the virtual world. All of us have friends who come to our home to have a cup of tea, and we have friends on social media, on Facebook and Twitter, with whom we are connected in a digital space.
All of these people who are connected to you drive your purchase behavior. For example, the kind of movie I watch over a weekend, or the game that I would follow next, say in a FIFA World Cup, is all driven by the conversations which we have around us. That is the idea of social network analytics: we are driven by what we hear others doing. It is very common, right? All of you would believe that.
A recent study said that in spite of all the advertising going on in print media, on TV, on Facebook, Twitter and Instagram, the real thing that sells your product is word of mouth, and that is where the power of social network analytics lies.
I will just take a two-minute diversion from my domain, which is telecom. I think this picture is very familiar: a very tragic incident that happened long ago. What I want you to focus on is the diagram in the centre. This is the social network that was at work during the September 11th attacks, and if you look at it, it is a very interesting network. If you see the lower part, it is a large network, and then there are smaller networks with a ring leader in each. So this centre dot is a ring leader, that centre dot is the ring leader of that group, and here you have this guy called Mohamed Atta. He is actually the ring leader of this entire operation. If you look very closely, Mohamed Atta is connected to everyone in this network. So he has information about what is happening in the network from all the different sources. It is very much possible that the ring leader of the second network or the third network has no information as to what is happening in the entire network.
So, coming back to our main topic, social network analytics: Mohamed Atta in this particular network has the highest degree centrality, betweenness and closeness. So he can spread any information the fastest. He can get all the updates in the network the fastest, and he will be the first person to be updated if there is some problem in any part of the network. Being the terrorist operation that it was, it was very important that information spread very fast. Another very interesting point in this graph is, if you look at the dots, there are four or five different colours of dots that you see.
These were the different flights that were used during that operation. So the flights in this case are one of the dimensions. When we build a social network analytics graph, apart from all these people, whom I call the actors in my graph, the flights are one dimension. The place from where they operate is another dimension. A third dimension could be the kind of flight they were using, or where they refuelled, because there were cases of refuelling.
The last point that I'll try to highlight here is that the clustering coefficient of this network is very low. Can anyone quickly guess what could be the reason? Why is the clustering coefficient of this network so low? It's written there, but maybe it is because the people who organised this entire thing did not want the network to break if one small part of it broke. So if one small part of the network breaks, maybe the only person who would know about it is the ringleader; the others would still continue their operations as before. The reason why I brought this up is that it's very strange to know that something as big as this had some connection to the way we are connected. This graph was built using all the flight details and the media coverage of this particular incident; all the actors were identified, and of course the graph was drawn after the incident.
I'll quickly give a two-slide introduction to social network analytics. I'm sure there are people in the audience who might not be very familiar with some of the concepts. I have taken a very small example; this is a small social network that we are talking about. The two important things that you need to know here are these. First, the nodes: Gaurav, Rajiv, Rahul, Seema are all nodes in my network. Second, the blue lines and the red lines are my edges. The lines connect my nodes. Now, Rahul is the person who is connected to everyone in this network.
If you look closely, the connection is either direct or indirect: Rahul has an indirect connection to Kapil and Preeti, and that's why those lines are red. So this is a small example. I think the two things you can take away from this slide are what nodes are and what edges are.
Now, the five important dimensions. When you run social network analytics in any of the languages you use, these are the five statistics that you should be aware of. Degree centrality: the number of people you are directly connected to by an edge is your degree centrality. Betweenness centrality is basically the number of shortest paths between other people that pass through you. So how many paths criss-cross you, that is your betweenness. Closeness is your proximity to everyone else in the network. For example, in our last diagram, Mohamed Atta was most closely connected to everyone. In this particular diagram it's Ritima who is the most closely connected to everyone, because we are also taking into account her connection to Rahul and the rest of the network. Eigenvector: that is an important concept. Of course, in a 20-minute session I don't want to get too much into the detail of eigenvectors, but the whole network analytics framework is built on matrices and their eigenvectors. And the clustering coefficient, as I explained: if your network is very sparse, the clustering coefficient will be low, and if it is a very closely connected network, it will be high. For example, the alumni of IITs might be a very closely connected group; their clustering coefficient would be very high. So it is just two slides, but I thought it would help people who are not very familiar. Here, the degree centrality of Rahul is 7, and the maximum betweenness is Ritima's.
Quickly moving to the case study that I bring in here on social network analytics. Two or three years back we always used to talk about the customer 360-degree view, and trust me, there are organizations who are still struggling to get their customer 360 right.
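The five statistics listed above (degree, betweenness, closeness, eigenvector centrality and clustering coefficient) can be computed in a few lines with the NetworkX package. This is a minimal sketch: the edge list below is hypothetical, borrowing only the names from the slide, and does not reproduce the slide's actual connections or its reported values.

```python
import networkx as nx

# Hypothetical edge list using names from the talk; the real slide's
# connections are not known, so these edges are an assumption.
edges = [
    ("Rahul", "Gaurav"), ("Rahul", "Rajiv"), ("Rahul", "Seema"),
    ("Rahul", "Ritima"), ("Gaurav", "Rajiv"),
    ("Ritima", "Kapil"), ("Ritima", "Preeti"),
]
G = nx.Graph(edges)

# Degree: number of direct connections per node.
degree = dict(G.degree())
# Betweenness: normalized share of shortest paths passing through a node.
betweenness = nx.betweenness_centrality(G)
# Closeness: inverse of the average shortest-path distance to all others.
closeness = nx.closeness_centrality(G)
# Eigenvector centrality: high when a node's neighbours are themselves central.
eigenvector = nx.eigenvector_centrality(G, max_iter=1000)
# Clustering coefficient: how many of a node's neighbours know each other.
clustering = nx.clustering(G)
avg_clustering = nx.average_clustering(G)

for name in sorted(G.nodes, key=degree.get, reverse=True):
    print(
        f"{name:8s} degree={degree[name]} "
        f"betweenness={betweenness[name]:.2f} "
        f"closeness={closeness[name]:.2f} "
        f"eigenvector={eigenvector[name]:.2f} "
        f"clustering={clustering[name]:.2f}"
    )
print(f"average clustering coefficient = {avg_clustering:.2f}")
```

In this toy graph Rahul leads on degree, betweenness and closeness, while Ritima is the only route to Kapil and Preeti and so also has high betweenness despite fewer direct connections.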
What I am going to talk about is bringing in the social network analytics aspect and getting your customer 720 right, because now your customer is not a person who operates in isolation. He is affected by all the media advertisements, all the Facebook advertisements, what his friends tell him, what his neighbors tell him. To the extent that today I buy something because my daughter comes and tells me it is very good on the technology side. I take the opinion of my younger team members if I have to buy a phone today, because they are more aware of the newer technologies than maybe I am. So we are no longer living in a world where it is a customer 360. We have moved to a customer 720; we are part of a larger network.
And what is important, although it is written in very small text here, is taking the permission of an individual before you use his social network data, whether it is LinkedIn or Facebook. You are supposed to have the right permissions from the individual to use that data; otherwise you are doing it illegally. I know that there are people who do it at an aggregated level. So you create segments of customers and then you analyze their behaviors, but if you want to do network profiling of people individually, you need to have their permission.
For example, telecom companies have their own apps, right? We might think that this app is just doing what it is supposed to do: tell us our data usage, our bills, etc. No. When you install that app, look at it next time: they take permissions for accessing your data. So they are actually aware of what sites you are going to and what purchases you are making, and that's the way they tailor their marketing. Yeah, you do that, right? But actually they trace your online behavior, and that's how they come up with their marketing. So, for example, a student will not be targeted with the same product as a VP of a company, right?
Similarly, the FIFA World Cup is going on, so there will be a lot of promotions that restaurants give for coming and watching the matches on, I mean, a large screen, right? Those will be more tailored to people who are more active in FIFA groups or commenting a lot about FIFA.
So I'll come to the strategic part of the problem, which is how I increase the effectiveness of my marketing campaigns. To do that, there are two things I might need to do. One mission would be to spread it to the maximum number of people, right? So today I sell a product to, say, Kavita; to how many more people can she spread the word? To do that, you should target someone who has a high degree centrality in your network. So you have built the network of all your customers: choose the customer who has the highest degree centrality if you want your information to be spread to as many people as possible. There might be a second need, right? There are products which are relevant for just one day. There is a final coming up tomorrow and I want people to take up my offers. So you want the information to spread very fast. For that, you should use the people in the network who have the highest closeness value. So those are just two pointers as to how you can use the outcome of your social network analytics graph.
The second part, which is again very important, is how you improve it. I can build a social network model using just my CDR data, right? So I am just bringing in the elements which I will use to build my social network analytics in a telecom environment. Say I just have the CDR data, the call detail records. I can use even that to build a social network graph, but what adds more value is bringing in different dimensions. One can be geography: which city you are in, which neighborhood you stay in, which pin code you are using.
I can also make my social network analytics graph more powerful by adding organization-level information, right? Because generally you spend around 10 hours in the office. So those are the people you talk to more. By the time we reach home we are almost dead tired, and we hardly talk to people in our neighborhoods, right? So the office is where you spread your word more. So the organization dimension is very critical in your social network graph. Past and present companies matter too: I may be more connected to some of my ex-colleagues than to my colleagues today, because I worked in that organization for 10 years and this organization may be very new. And then of course all your social media tracing: what you have done on Facebook, Twitter, LinkedIn, Instagram. Apart from that, what you purchase. Where are you going? Are you more of a domestic traveler or a foreign traveler? Did you go to MakeMyTrip to book a hotel, or did you use MakeMyTrip just to book a flight? All this information is what can make your social network graph more powerful.
Very quickly, and I know I have touched upon some of this: what does the output of a social network analytics graph look like, and how can it be used to make strategic marketing decisions? So, all the nodes. Let's take a very quick example: a thousand customers that you have. You will have the data of all those thousand customers, the node values one to a thousand, and then you will have the influence score attached to each node. So ideally, if you want to make your campaigns more effective, you should target people who have a high influence score in the network. Similarly, as you saw in the example of the 9/11 network, there might be very small networks. So you might have a large network of your customers, but within it there will be small networks, just to take an example, focused city-wise, right?
So maybe you should pick the node with the highest closeness in each of those city-based networks and target those. Similarly, you can use the organization dimension: maybe two or three key people, key influencers, to target the customers. So that's the way you would choose the right set to target with your products. Then, of course, there is the strength of each network, the closeness quotient as I have already talked about, the geography-wise key indicators, and the clustering coefficient. As I said, if a network is not very powerful, say there is an IIM network or an IIT network existing but they don't interact very frequently, either on social media or even through calls, then they may not be the best people to target through social network analytics.
I think I will focus on the bottom part of the slide. Do you know your customer? I think this is a problem which every domain faces today. Earlier it was focused on understanding the customer journey: how many transactions you make, how frequently you make them, who the people in customer care are that you call, what channels you use. But now it has moved to a place where you want to understand the psychology of the customer. It is no longer just his transaction behavior with you. You want to understand his psychology, especially in the telecom sector. I'll say at least I understand the India market pretty well, and due to my stint at Vodafone I have seen some of the global markets. It's a very saturated market, be it in the B2C space or the B2B space. So what you need to do is actually steal customers from others, and when you are stealing customers, that's where it lies, right? You have to understand what your customer needs before he actually realizes what he is looking for, and that's where your alerts and messages work wonders.
Just to give you some numbers as an example: for cross-selling and upselling, at least in the case study that we did, we saw a 2.5 times jump in the campaign results. And from what I have read in research papers, it can go further.
If you build more and more data into your social network graph, the improvement can be as much as four to six times. The same goes for specific products. As I told you, there is a FIFA World Cup going on and you want to target products specific to it, or there is some kind of examination coming up for students and you want to target products or some data packs specifically at them: specific product marketing for specific segments. Look at your social network, look at the segment or cluster which is focused on students, and please target there. Don't target the entire social network anyway.
Now, just getting on to more of the technical side of it. For the case study that we were working on, we started with NodeXL. For anyone who is working in SNA and wants to do a small pilot and show the effectiveness of that pilot to the senior leaders in the organization, I think this is one of the best tools, because you don't invest in infrastructure. You get it more or less free of cost, and you can build sample data and show how SNA can be a powerful medium. Because, just to make you realize how big the problem can be, even analyzing the CDR data of a thousand customers can be a huge task. You can't do it on the kind of infrastructure that we usually have in organizations, because it's all cross-calling data, right? It will all be mapped to each other. So you would need good infrastructure to build it, and I'll give a snapshot of what I'm talking about. But to do a POC or pilot, NodeXL is good. You can make your case at least on that. And of course, you have SAS Social Network Analytics, who were one of the pioneers in this space. Recently, I think Python and R have picked up very well. For one of our projects, we used RStudio. Python is something I'm still exploring, to be very frank. But the NetworkX package in Python is very, very powerful when it comes to social network analytics.
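As a small illustration of the ideas above, the sketch below builds a call graph from CDR-like rows with NetworkX and then picks two campaign seeds: the highest degree centrality for reach, and the highest closeness for speed. The records, node labels and the `calls`/`seconds` attribute names are all hypothetical, and treating calls as undirected edges is a simplifying assumption, not the method used in the case study.

```python
import networkx as nx

# Hypothetical call detail records: (caller, callee, duration in seconds).
cdr_rows = [
    ("A", "B", 120), ("A", "B", 60), ("A", "C", 30), ("A", "D", 45),
    ("A", "H", 20), ("A", "X", 300), ("X", "E", 240), ("X", "I", 180),
    ("E", "F", 200), ("E", "G", 90), ("I", "J", 75), ("I", "K", 15),
]

# Aggregate repeated calls between the same pair into one weighted edge.
G = nx.Graph()
for caller, callee, seconds in cdr_rows:
    if G.has_edge(caller, callee):
        G[caller][callee]["calls"] += 1
        G[caller][callee]["seconds"] += seconds
    else:
        G.add_edge(caller, callee, calls=1, seconds=seconds)

degree = nx.degree_centrality(G)        # direct reach of each customer
closeness = nx.closeness_centrality(G)  # how quickly a customer reaches everyone

# Seed for maximum spread: most direct connections.
reach_seed = max(degree, key=degree.get)
# Seed for time-critical offers: shortest average distance to everyone.
speed_seed = max(closeness, key=closeness.get)

print("reach seed:", reach_seed)
print("speed seed:", speed_seed)
```

In this toy data the two seeds differ: `A` talks to the most people, but `X` sits between three groups and is, on average, fewer hops from everyone. The call count and duration stored on each edge are not used here; they could be passed as weights once a weighting scheme is chosen.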
There are a lot of other tools in the market. I tried to bring in the analytics tools that tie back to SNA, because you might have built your models already and want to add your social network analytics framework to them. So that's why I have brought these. And just a two-minute stop on this: any telecom analytics model that you want to run using SNA cannot do without a big data stack. You would need a Hive, HBase, Spark infrastructure to build it. The reason why I'm telling you this is that all of this is meaningful only in real time. I can't have recommendations two days later. So if you want to go for an implementation of your social network analytics models, then big data is the answer.
Just to conclude my session: what I tried to bring in this 20-minute session is, first, what social network analytics is, for people who would not be totally familiar with it. Second, I talked about the advantages of using social network analytics in the market and the times we are living in. And third, I tried to discuss in detail one of the telecom case studies. Thanks a lot. I think I had a brilliant and very attentive audience. I really enjoyed giving the session, and I'm sure you enjoyed your time too. I am open to taking any questions.
Yeah, so the influence score is actually a mathematical equation which is calculated using the closeness, the betweenness, all these numbers. They are all combined, and it's an equation which finally gives you the influence score. I'll just give you an example: someone who has a high betweenness and a high closeness would generally be at the top of your charts for the influence score.
Please ask for the mic. Okay, let him.
Hi, so I have a question. It is about the data you need to calculate these scores and have this network ready. At an ethical level, what kind of dilemmas do you face about what kind of data you can use and what you can't use?
Because for me, when I go online, I am, I would say, paranoid about the data I'm sharing.
Very good question, and I think it's good to be paranoid, with the GDPR guidelines coming in and the current regulations. The Indian government is even going a step further and coming up with its own regulations. I would be a bit cautious in answering this because I have been part of these organizations, but having said that, I think the best bet is to take permissions. If you don't have the right permissions, don't do it. The other way of doing it is to work with segments. Most of us who are working in the analytics industry already know our segments, right? You would have already divided your customer base into segments. So even if you have access to transaction-level data, which most companies have, please aggregate it at a segment level and use that as a targeting tool rather than doing it at an individual level. Because it can be very risky. Trust me, at least in the companies that I have worked with, the fine can be bigger than the revenue that the company generates in the entire country. So please be very cautious.
So, first and foremost, the influence score is not an individual score. What you are describing to me is two nodes of a large network. The score of these two individuals will depend on how well they are connected to the network. If you ask me on a very tangential basis, when I hear your question, maybe I'll give more weightage to the person who is using Vodafone products, because he is a real ambassador of my products, right? But social network analytics also depends a lot on frequency, right? I may be calling your number once a month. I may be calling someone else's number twice a day. That itself makes a lot of impact. So when you ask me how you decide the weight, it also depends on the frequency of calls, the duration of calls, and all the other networks that I'm part of.
So, am I connected to that person on Facebook, or is it just an organization-level connection, right? Is it a LinkedIn connection to whom I never talk on the phone? All this will come into play. Yeah, so for different problems it will be different. So I cannot give you a number right now as to how you would assign the weightage. But all of these dimensions, the five key dimensions that I brought up, will go in as inputs to a function. So it will be F of all of these that gives you the influence score. We will take it offline, okay?
Yeah, so I have a question on the implementation of this in Spark. I have been using Spark for quite some time, and we have a library called GraphX for graph processing. So can you give me some insight into how to use SNA on Spark?
Okay, first caveat: I'm not a technology person. I'm more of a data science and analytics person. Having said that, I can get back to you with an answer on which specific library to use, because on the project that I worked on there was a technical hand who helped me with that, and they would know the exact answer. So I wouldn't be the right person.
Thank you.
Hi, I'm Shorit. So is there any analytics tool, like the ones you showed, R Shiny or Python, which can work on a very large graph database, like say JanusGraph, where you have millions of nodes?
So it's not the tool that is the problem. It's the infrastructure that you have. So forget about running SNA on a laptop for more than 100K people. At most, I have tried and been able to run successfully on 100K, and even then, if you try to analyze the entire CDR data, it will be very difficult. You have to pick a sample and then do it. So you do need a big data environment to run some of these models.
Yeah, but for that, are there any already built tools for data analytics?
So, as I'm saying, the tool is not the constraint. R can do it, Python can do it.
Okay, okay.
So it's the infrastructure that comes up as the constraint.
Got it, yeah. Thank you.
Okay, I think.
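The influence score was described above only as a function F of the centrality measures, with no weights given. The sketch below is one possible form of such a function, not the formula used in the case study: each measure is min-max scaled and combined as a weighted sum. The node names, metric values and weights are all hypothetical and chosen purely for illustration.

```python
# Hypothetical per-node statistics, e.g. produced by a graph library.
metrics = {
    "n1": {"degree": 12, "betweenness": 0.40, "closeness": 0.55},
    "n2": {"degree": 30, "betweenness": 0.05, "closeness": 0.35},
    "n3": {"degree": 8, "betweenness": 0.10, "closeness": 0.30},
}

# Illustrative weights only; the talk notes they differ per problem.
weights = {"degree": 0.2, "betweenness": 0.4, "closeness": 0.4}


def min_max(values):
    """Scale a {node: value} mapping to the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {node: 0.0 for node in values}
    return {node: (v - lo) / (hi - lo) for node, v in values.items()}


def influence_scores(metrics, weights):
    """Weighted sum of min-max scaled measures for every node."""
    scores = {node: 0.0 for node in metrics}
    for name, weight in weights.items():
        scaled = min_max({node: m[name] for node, m in metrics.items()})
        for node in scores:
            scores[node] += weight * scaled[node]
    return scores


scores = influence_scores(metrics, weights)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

With these made-up numbers, `n1` ranks first because it combines high betweenness with high closeness, even though `n2` has far more direct connections. Call frequency, call duration or other dimensions could be added as further inputs in the same way.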
Yeah, so in the case of social media, you generally have to buy data from all of these. If you have to get Facebook data or LinkedIn data, there are either third parties or these companies themselves who deal directly in selling data. But in the last six months to one year, when all of these issues about data leakage have happened, I think they have stopped selling data in the form that they used to earlier. Earlier, I know, there were third parties which used to sell data. You still can get it, but it has to be through verified sources. So, for example, telecom operators take the permission of the individuals and then use it. So that is a way to do it.
Last question, because we have a tea break.
They would be, so they have to take permission. There is no other way out. My data is mine, right? So if someone has to run a model where I am one of the rows in their data, ideally they should take my permission. That's my legitimate answer.
It was a great session by Kavita, and she has the highest degree of centrality of knowledge in the area of social networks. We should give her a great big hand for this fantastic session. Thank you very much.
Thank you.