So, a quick bit of background: Klout is a company based in San Francisco, and what we do is measure influence across social networks. When users connect their social accounts, we collect their data from the networks, and that gets ingested into our data pipeline. The data is then normalized and various features are extracted, and we calculate the Klout Score as well as various topical sets. So for example, for a given user, we calculate what this user is interested in, what this user is an expert in, and so on. We score about 750 million users with Klout Scores and assign topics to about 400 million users, so it's a pretty large data set. On top of this, when a user comes to klout.com, we offer content that the user can share. And then on top of that, we can suggest to the user what the optimal content to share is, so he gets maximum engagement with his audience, and we also suggest, hey, these are maybe the optimal times you can post in order to maximize your engagement. So here is an example of how klout.com looks when you come in. You can see your profile page and your Klout Score. Then there are topics you have selected, or topics of your expertise. We show some experts, and also some data on how frequently each topic was talked about. And if you decide to share some content, you get into the sharing mode, and among other things you can say, hey, I actually don't want to share now, I want to schedule it, and then figure out the optimal time to post it.
And feel free to interrupt if you have any questions during the talk. So what is the motivation behind this? For a lot of power users, brands trying to reach their consumers, or celebrities trying to engage with their fans, the objective when they post something is often to get the maximum response possible. We are trying to argue that by posting at the right time you can actually increase your engagement, and that's what this research is about. In the last couple of years there have been tons of infographic studies that try to recommend, hey, what's the optimal time to schedule. A lot of these studies did recognize that the same time doesn't work from network to network. Some of the studies also recognized that if you're in New York, different rules apply than if you're in Europe. But a lot of these studies were quite anecdotal and didn't look at the problem in a very systematic way. So what we are trying to do here is introduce the problem to academia, open a data set, and take a somewhat more systematic approach to solving the problem. So what are some of the challenges in tackling this problem? Many of the infographic studies analyze data in aggregate: for a given city, for a given network, you do some aggregate studies. That's pretty doable, because once you have a lot of people in a city, you can get good aggregate statistics at that level. But when you start asking what the optimal times are for individual users, things get a little trickier. So let's say you assume that time is cyclical, that user behavior patterns repeat week to week, and that you want to study the problem at 15-minute granularity. Even just constraining the problem in this manner, you end up with about 700 time buckets for a week.
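As a concrete illustration of the bucketing just described, here is a minimal Python sketch. The helper name and the Monday-origin convention are illustrative choices, but the count works out exactly: 7 days × 24 hours × 4 quarter-hours = 672 buckets, the "about 700" mentioned above.

```python
from datetime import datetime

BUCKETS_PER_WEEK = 7 * 24 * 4  # 672 fifteen-minute buckets in a week

def week_bucket(ts: datetime) -> int:
    """Map a timestamp to its 15-minute bucket within the week
    (0 = the first 15 minutes of Monday)."""
    return ts.weekday() * 96 + ts.hour * 4 + ts.minute // 15

# 2024-01-01 was a Monday: 00:07 falls in bucket 0;
# Sunday 23:59 falls in the last bucket, 671.
```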
And getting significant statistics for each of these buckets for individual users starts to be quite difficult. On top of that, there are no open data sets for this problem. And often the challenge comes from the fact that all of us are pretty unique; all of us have unique audiences. If you live in New York, you probably don't have an audience that is located only in New York. In this example, for a certain user, you can see that 43% of his audience is in the US, and even the US spans three different time zones and many cities, while the rest of his audience is spread all around the world. So trying to tune in on the optimal times to post for audiences of this complexity is not an easy task. And on top of this, each network needs to be observed separately, simply because different behavior patterns and network dynamics are exhibited on different networks. So what is the problem setting? For a user on a social network, we want to find the best time to post a message in order to maximize the probability of receiving an audience reaction. Yeah. So the question was what taxonomy of social networks we use. This study is done on Twitter and Facebook separately, and we do a little bit of data analysis on other networks, but not to calculate the personalized schedule. Yeah. So this is the problem, and these are some of the constraints and limitations we impose when approaching it: we consider replies, retweets, favorites, likes and comments as reactions; we assume that users exhibit weekly behavior patterns; we observe the first 24 hours of reactions — and we'll show later that once you post something, the probability of receiving a reaction decays steeply with time, so the first 24 hours seems to be plenty; and we observe the problem at 15-minute resolution.
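Formally, once a per-bucket reaction probability has been estimated for a user's audience, the recommended posting slot in this setting is just the bucket that maximizes it. A toy sketch (all numbers invented; `best_bucket` is an illustrative helper, not a name from the paper):

```python
def best_bucket(reaction_prob):
    """Return the index of the 15-minute weekly bucket with the highest
    estimated probability of receiving an audience reaction."""
    return max(range(len(reaction_prob)), key=lambda b: reaction_prob[b])

# 672 weekly buckets with a made-up peak at bucket 330 (Thursday 10:30).
probs = [0.01] * 672
probs[330] = 0.09
```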
And the starting bucket runs from when Monday starts, i.e., the first 15 minutes of Monday. So, the overall system overview. When a user comes to Klout, he registers his social accounts; for the users we have, we collect the data, fetching it from the external networks using their social network APIs. We then parse the data into a normalized form: each network gives us different JSONs or different data structures, but by the end of this stage they are all represented in the same protocol-buffer form, where we capture what the post is, what the reactions are, and all the metadata that goes with them. From the collected data we then derive a couple of building blocks. The most basic one is the activity profile: for a given user, when does he usually create content throughout the week. The second one is the reaction profile: for a given user, when does he usually react. And the post-to-reaction profile is essentially the probability function of getting a reaction as a function of time, given that the post was created at time zero. The user's incoming and outgoing interaction graphs are then combined to derive the first-degree and second-degree schedules. And then we prepare the data and serve it on klout.com. Our pipeline mainly runs in Hive; when we need a lot of supporting functions or machine learning, we write them in Java and wrap them up in Hive UDFs. This way we can still do complex things in Java through the UDFs, while Hive handles the problem of dealing with a lot of different data sources, doing a lot of joins, and maintaining the data-transformation complexity. So let's now get a little intuition about audience behavior.
One of the most important things that characterizes a social network is the post-reaction time. Here, for example, we see the first 24 hours since a post was created on the x-axis, and on the y-axis the fraction of reactions received within the first 24 hours — so when it hits 24 hours, it's one. And we can see that different networks exhibit different patterns. On Twitter, roughly the first 50% of all reactions are received within the first 24 minutes — so it's a very fast network — while on Facebook it takes about 4 times longer to reach the first 50%, and the other networks are more similar to Facebook. Why is the post-reaction profile particularly useful in this study? Because we use it to anticipate the reactions a user will receive if he posts at a given moment: by applying a discrete convolution, if you post at time t you can get the anticipated amount of reactions you will receive over time. We also looked at post-reaction profiles through the lens of some other dimensions. In one case we looked at post-reaction profiles for different topics: if you post, say, content on politics, does that get reactions faster than food and drink, or fashion? There do seem to be some differences — not as pronounced as between different networks, but something that may be worth considering. We also looked at post-reaction profiles as a function of the user's in-degree, for example how many followers a user has, and it turns out that if a user has about 100 to 10,000 followers, the reaction profiles look very similar.
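The discrete-convolution step just mentioned can be sketched like this, with toy numbers and dependency-free Python: convolving a profile of when the audience is typically active with a profile of how quickly reactions arrive after a post yields the anticipated reaction curve. All values here are made up for illustration.

```python
def convolve(signal, kernel):
    """Plain discrete convolution, kept dependency-free for the sketch."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

# Toy hourly profile of when a user's audience typically reacts, and a
# made-up post-to-reaction delay profile (fraction of reactions arriving
# 0, 1, 2, 3 hours after posting; a fast, Twitter-like decay).
audience_activity = [0, 0, 1, 4, 6, 3, 1, 0]
delay_profile = [0.5, 0.3, 0.15, 0.05]

# Anticipated reactions over time for a post entering this audience.
anticipated = convolve(audience_activity, delay_profile)
```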
But once a user gets into the celebrity cohort, with more than a million followers, the reaction profiles look quite different from average users. This may be because celebrities use the networks differently from users with fewer followers: if you're a celebrity, your primary objective is to broadcast more than to react, while if you're just a casual user, maybe you're just browsing and rarely even logged in to the social network. Another part of audience behavior we look at here is how reaction behavior differs in our data set for users from New York, across Twitter and Facebook. On the x-axis we have the week, with the working hours shown in blue, and on the y-axis we have the reaction probability of the audience. We can see that Twitter and Facebook do exhibit quite different behavior: Twitter shows more prominent peaks, while Facebook usage is steadier throughout the week. These are users who connected to Klout and have linked a Facebook account, and we don't distinguish whether they use Facebook via the mobile client or the web browser. But that's a good point — actually, that's a great point, because one thing we can notice on Twitter is these secondary peaks which occur during commuting hours. I would imagine people are going on their mobile phones and checking before they catch buses or something. It also seems usage just drops off on the weekends in general. On top of this, we do a little analysis of the correlation and similarity of audience behavior for the same user — between his audience on one network and on the other network.
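The correlation and similarity analysis just mentioned is, in essence, a Pearson correlation and a cosine similarity computed between two per-bucket audience reaction vectors, one per network. A dependency-free sketch with made-up vectors (the function names are just illustrative):

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length reaction vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cosine(x, y):
    """Cosine similarity between the same two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) *
                  math.sqrt(sum(b * b for b in y)))

# Made-up per-bucket reaction counts for one user's audience on each network.
twitter_profile = [5, 9, 2, 0, 7, 3]
facebook_profile = [4, 7, 3, 1, 6, 2]
```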
We notice that the correlation between a user's audiences across the two networks is always positive, but it is spread fairly evenly from zero to one, while the similarity of the audience behavior peaks around 0.7. This tells us that the audience behaviors are somewhat correlated and similar, but if you want to post at the optimal time, you cannot simply reuse the schedule you calculated for Twitter on Facebook — you need to be network specific. The way we do it: for a given user, we calculate his audience reaction behavior by aggregating all of his followers' — all of his audience's — reactions, so we can see when his audience reacts. That gives us a vector mapping each time bucket to a value. For a given user, we calculate that vector for Facebook and for Twitter, compute the correlation and similarity between them, and these plots are the distributions of those correlations and similarities. Yeah — so for us, we combine the reaction types with equal weight: a reaction is a reaction. But you're right, different reactions could be weighted differently. If you're a brand and you want to engage with your customers and you care more about comments, maybe you actually want to optimize for the times when people are willing to comment, rather than just clicking likes before going to sleep. That's a good point. So let's look at audience behavior differences across locations. Here we picked a couple of cities around the world — San Francisco, New York, London, Paris and Tokyo — and look at the audience behavior differences on Twitter and Facebook. We notice that San Francisco and New York, the U.S. cities, exhibit similar behavior, while Tokyo is a little bit off: you can see that Facebook usage there picks up during off-hours. It's also interesting to note, comparing New York and San Francisco, that San Franciscans tend to wake up earlier. I tried to find some validation of this, and there have been earlier studies claiming that, yes, San Franciscans do wake up about 50 minutes before New Yorkers. So it looks like, for a given city, there are a lot of factors one has to take into account — working hours, commute hours, lifestyles — and it would be very hard to build a model that generalizes well across cities, right? Once you're doing this, you really need to dig in and look at schedules that are personalized, not globalized per city. This next plot is similar: audience behavior similarity and correlation within the same city — do users even within the same city exhibit similar behavior patterns? Even there, the similarities, for example for New York, come out at only about 0.8 for Twitter and 0.8 again for Facebook, and the correlations within a city are high — the curves are shifted toward one — but still, depending on the audience, there are some users who are less correlated. So even within the same city, audience behaviors are quite different. And then, just for fun, we look at audience behavior with respect to topics, right?
The idea was that maybe some topics actually get many more reactions at different times, but across topics only slight differences can be noticed; if you really dig into it, the behaviors seem pretty similar. So topicality may be important, but maybe not in the first pass, when you're just trying to bootstrap the problem. Oh, here — that's a great point. These lines are offset from the zero line, because otherwise they would all overlap and it would be very hard to see. And this is time relative to the user: if you're in New York and it's midnight, this point is midnight for San Francisco, midnight for New York, midnight for Tokyo, each in its own time zone — basically the user's local midnight. And now Adithya is going to talk a little bit about the personalized schedule.

Hello everyone, I'm Adithya. As Nemanja said, most of these schedules need to be built on a personalized basis. Otherwise you can get an aggregate sense of the best times to post for a city as a whole, but since each individual within the city is unique, the personalized schedule has to be unique per individual as well. In some sense you can almost think of the audience profile for a given user as a fingerprint: each user's audience profile is unique, which means that the schedule for that user is also going to be very unique. I'm going to quickly walk through a simplified model of how we think about this problem. We consider a very simple social network graph: say we have an author A0, and some audience members B0, B1, up to BM for that author, and each of these audience members is in turn connected to multiple authors A0, A1, up to AN. Of course there may be overlap between these sets, but for simplicity's sake we consider them separate for now. The way we think of this problem is that an author creates a post and an audience member creates a reaction to the post, and there are certain questions we want to answer. The first one is: when do the authors actually create posts? Since we want to calculate probabilities, we look at the historical data — at what times, in the past, did the authors create posts? The second: given that a certain author created a post, when does this audience member react to it? And the third: given that this audience member is also connected to a lot of other authors, what is the probability that he reacts to the post created by A0? This is a very simplified version of the problem, but it lets you build a basic model that you can extend later.

Just to get some notation out of the way, there are various quantities we consider for a user. First, we measure the aggregated number of posts created by a user in each time bucket — with 15-minute buckets, this is the profile of when he creates posts. At the same time, we measure the posts that are visible to the user as an audience member; given a certain number of visible posts, when does he actually react to them; and if he did react to a visible post, when was that post created? In some sense every reaction has a delay — the inherent delay we saw in the previous slides, smaller on Twitter than on Facebook. The delayed reaction profile is computed as a convolution of the post-to-reaction delay function with the original reaction profile. Given all of these, we can then estimate the possible reactions a user may get in a given time bucket by observing all of his audience's behavior, and the final posting schedule is derived from those estimates as the probability of receiving a reaction on a post.

Going further: the first step is the simplest version, where we only consider A0 and B0 and ignore all the other posts that are visible to B0. We say: this is the author, there are a bunch of people connected to him — what is the probability that those people would react to something he posts? If we just calculate the sum of the delayed reaction profiles of each of these users with respect to A0, we get an estimate of the reactions he would get if he posted in a certain time bucket. This is pretty simplistic — obviously it's not going to capture a lot of information — but it's a good starting point. To go further, the audience member B0 who is trying to react to the initial post also has a lot of other visible posts incoming to him: in some sense, all of those authors are trying to grab this person's attention, and we want to model that as well. The way we do it is to compute the posts visible to that particular audience member B0, which can be modeled as a linear function of all the posts created by his authors — if all of those authors are creating posts, the visible posts are a linear function of that, because each social network has some algorithm for how it shows posts to a given user. The second-degree estimate is then the total reactions he gives divided by the possible visible posts; in some sense, this is the attention model we are talking about.

That again captures the simple version of the graph, but in reality not every audience member interacts with the original user in the same way. You're going to have friends who always comment on everything you post or always like your posts, and then another group of your audience who may once in a while throw a like at you but doesn't really care about most of what you post. So the modification we make to the original estimates is to weight each audience member's reactions by his past engagement with the original user. If your best friend reacts 90% of the time whenever you post something, you weight his reactions with that 0.9, whereas somebody who has only reacted to your posts maybe once in the last month is not going to get a very high weight when you estimate the reactions you're going to get. The purple function here is the post-to-reaction delay function: for this particular user on Twitter, it dies off to almost zero within a certain time, so there is only a small window in which reactions to your posts actually appear. If you do the convolution with the aggregated reactions seen in your past — the green dashed line — you can calculate your first-degree weighted schedule, which looks something like this, and your second-degree schedule, which has a slightly different shape. This is of course a subset, maybe a single day; the same behavior is observed over the week, and then we can evaluate these different schedules.

The way we did that is that we took 56 days of previously unseen data: we computed these estimates and probabilities and evaluated them on the unseen data. We had around half a million active users in that time window, and we compared against two baselines. The first baseline used the most frequently observed times — when people posted most often — to see how effective that was compared to our schedules. The other baseline took plain first-degree schedules, aggregated across all users, to produce a global schedule that everybody follows. The metric we used was the gain in reactions a user would get if he used one of the profiles we created, versus if he just posted randomly. We see that on Facebook most of the baselines are just slightly above one, whereas the best performing schedule is the first-degree weighted schedule, which leads to almost a 17% increase in reaction gain, followed by the second-degree schedule, which models some of the second-degree behavior and leads to around 9%. Twitter is interesting because you don't see as large a difference as on Facebook: the maximum gain observed was around 4%, again for the first-degree weighted schedule, and the second-degree weighted schedule actually does better there than the plain second-degree schedule. So the evaluation tells us that, among the models we considered, the first-degree weighted schedule performed best for our users. Any questions about that?
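A minimal sketch of the first-degree weighted schedule evaluated above — sum each audience member's per-bucket reaction profile, weighted by that member's past engagement with the author — together with the reaction-gain idea, comparing the recommended bucket against the average bucket as a proxy for random posting. Profiles, weights and member names are all invented:

```python
def weighted_schedule(audience_profiles, engagement_weights):
    """First-degree weighted schedule: per-bucket sum of each audience
    member's reaction profile, weighted by how often that member has
    reacted to the author's posts in the past."""
    n_buckets = len(next(iter(audience_profiles.values())))
    schedule = [0.0] * n_buckets
    for member, profile in audience_profiles.items():
        w = engagement_weights.get(member, 0.0)
        for b, value in enumerate(profile):
            schedule[b] += w * value
    return schedule

profiles = {"best_friend": [1, 5, 2, 0], "lurker": [4, 0, 1, 3]}
weights = {"best_friend": 0.9, "lurker": 0.05}  # fraction of posts reacted to
sched = weighted_schedule(profiles, weights)
best = max(range(len(sched)), key=lambda b: sched[b])

# Reaction gain: expected reactions at the recommended bucket versus the
# average over all buckets (a proxy for posting at a random time).
gain = sched[best] / (sum(sched) / len(sched))
```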
So that part is not part of this study as such, but I guess it's something that could be analyzed if you looked at how the number of followers grows over time and what events occurred when a user gained a lot of followers — maybe it was a funny cat video he posted, or a photograph of him with some celebrity. A lot of it probably depends on content, but it's something we haven't actually studied, so I'm just guessing at this point. Right — so in this service you can actually schedule your post for a given time: you can say, even though I'm writing it right now, I'm going to post it later. Nemanja is going to talk a little more about the open data set for this study. — So in this case that was not a factor we considered, because the graph we are looking at is the reaction graph. You may have a thousand followers, but out of those maybe only 50 actually react to the content you post. It is from these reactions that we create an interaction graph, so you're not observing your followers as such, you're observing the people who actually react to something you create. Yeah, it's normalized across users, and followers are probably just going to klout.com to figure out what's the right content to share, because usually it bubbles up organically.

So one of the main contributions of this work is opening the data set. We opened a data set of post and reaction timestamps. It includes a total of 144 million posts — roughly 120 million from Twitter and 25 million from Facebook — and about 1.1 billion reactions. A pretty large data set that hopefully academia can use to replicate and improve on some of these studies. The data set includes fingerprinted user IDs, fingerprinted actor IDs, fingerprinted post IDs — basically the message ID, fingerprinted — and then the post timestamp, the reaction timestamp, and the user's time zone. If anybody is interested in using the data set, you can go to the Klout open-data repository on GitHub, download it, and try to use it. If you think we could add something more, in an anonymized fashion, we can work with you and expose even more if possible. And yes, the timestamps are slightly perturbed so they cannot be used to decode and de-anonymize the users — but give it a go and let us know if we can help somehow.

Some of the future work on this problem: one thing we did in this study is generalize the post-to-reaction delay function per network, but we saw that depending on various factors — topics, the audience's in-degree — there may be differences, so building more sophisticated, personalized post-to-reaction delay functions is one area. Another is a more sophisticated second-degree model, which we would like to look into more deeply: on Twitter the second-degree weighted model was the second-best model, so there is more room for improvement there — basically modeling appropriately how a user gets saturated with all the incoming posts. Another is topical awareness. And, of course, timing your post does help — we saw it can give a gain of maybe 20% — but what makes content viral is what you post: if the content is not really good and you post it at the best possible time, you still won't get any reactions, so that's something we want to analyze as well. There's also content analysis: depending on whether you post just a text message or a video, there may be different optimal times to post. And in addition there are more signals and more networks to consider.

In conclusion: we saw that post-reaction times differ across networks — on Twitter, reactions are obtained about 4 times faster than on the other networks; we saw that audience behavior varies significantly across networks, and that users in different locations exhibit different behavior patterns; and we saw that, compared to the baselines, using personalized schedules can give an additional reaction gain of maybe 17% on Facebook and 4% on Twitter if you post at the optimal time. In addition, we opened the data set, and we hope it will be of some use. If you liked this talk, please go to arXiv, where you can download the full paper, which was presented at this year's KDD.
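For anyone grabbing the released data, here is a hedged sketch of what reading one record might look like. The exact file format and column order should be taken from the repository's README; the field names below simply mirror the fields listed above (fingerprinted IDs, post and reaction timestamps, time zone), and the values are fabricated:

```python
import csv
import io

# Hypothetical CSV layout mirroring the fields described in the talk.
sample = """actor_id,post_id,post_ts,reaction_ts,timezone
a1f3,9bc2,1424563200,1424564100,America/New_York
"""

records = list(csv.DictReader(io.StringIO(sample)))

# Post-to-reaction delay in seconds (here: 15 minutes, i.e. one bucket).
delay_seconds = int(records[0]["reaction_ts"]) - int(records[0]["post_ts"])
```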