Hi everyone, I'm Tiziano Piccardi, a postdoc at Stanford University, and I'm going to present our work on temporal rhythms of information consumption on Wikipedia, a collaboration with Martin Gerlach, a research scientist at the Wikimedia Foundation, and Bob West, my PhD advisor at EPFL.

We know that Wikipedia fulfills many different information needs. This is thanks to the wide range of content that the platform provides. For example, in English alone, Wikipedia has more than 6 million articles. Given the importance of Wikipedia for the information ecosystem, many studies aim to characterize information consumption on the website. However, it's important to take into account that information needs are context-specific, and many factors can affect information consumption. They include, for example, the physical location of a person, like being at school or at home; the sociodemographic factors of a country; the education level of the reader; personal preferences; and mental state. We know, for example, that we read different content when we are bored. And then there is time. In this work, we investigate this last factor, time, which in the case of Wikipedia readership is an understudied dimension. Specifically, in this presentation I focus on one research question: which factors influence the temporal rhythms of Wikipedia consumption?

To answer this question, we investigate one month of anonymized server logs from English Wikipedia. The dataset includes the title of the page loaded, the country where the request originated, the device used (desktop or mobile), and the timestamp with the corresponding time zone. In this study, we convert the event time into local time using the time zone information. This transformation gives us, for the first time, a time-aligned dataset that describes the interest by hour for each page, without the effect of the time zone.
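
The conversion to local time described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the study: the event tuples, the `to_local_hour` helper, and the choice to store the time zone as a UTC offset in minutes are all assumptions made for the example, and the timestamps are made up.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def to_local_hour(utc_iso: str, utc_offset_minutes: int) -> int:
    """Return the local hour of day (0-23) for a UTC timestamp.

    Hypothetical helper: assumes the log stores a naive UTC timestamp
    plus the requester's UTC offset in minutes.
    """
    utc_time = datetime.fromisoformat(utc_iso).replace(tzinfo=timezone.utc)
    local_zone = timezone(timedelta(minutes=utc_offset_minutes))
    return utc_time.astimezone(local_zone).hour


# Made-up events: (page title, UTC timestamp, UTC offset in minutes).
events = [
    ("Photosynthesis", "2021-03-01T13:30:00", -480),  # 05:30 local time
    ("Photosynthesis", "2021-03-01T23:10:00", 60),    # 00:10 local time
    ("Photosynthesis", "2021-03-01T04:45:00", 330),   # 10:15 local time
]

# Page loads per (title, local hour), independent of the time zone.
hourly_loads = Counter(
    (title, to_local_hour(timestamp, offset))
    for title, timestamp, offset in events
)
```

Aggregating by local hour rather than by UTC hour is what makes readers in different countries comparable: a page load at 10:15 in the morning counts toward hour 10 wherever it happens.
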
In total, we analyze more than three billion events across six million articles. With this data, we can investigate the daily rhythms. In this plot, you can see the normalized daily pattern shaped by the circadian rhythm, which, as expected, shows low activity during the night, an increase during the day, and peak activity in the evening. However, when we look at individual articles, these patterns get more interesting. For example, in this plot, we see the daily pattern of one article about STEM, with higher-than-average activity during the day, and an article about media, a movie in this case, with the reverse pattern, peaking during the evening.

To model this behavior and capture these differences, we define the concept of divergence, defined as the ratio between the pattern of each article and the pattern of the global average. An article with a flat line as its divergence pattern follows the global average exactly and doesn't have any specific attention fingerprint. Meanwhile, values greater or smaller than one represent more or less attention than average for that specific hour.

To understand what factors are associated with different patterns, we investigate three properties: the device used to access the page, the topic of the page, represented as a vector of 38 dimensions obtained from ORES, and the country of the request. We then model the consumption time series with a linear regression that predicts the expected fraction of attention for each hour of the day. To understand the relation with time, we include the hour and all the interaction terms between each factor and the 24 hours. In other words, given the properties of the page load, we predict the hourly deviation from the global average. And since we include the interaction term between each feature and the hour of the day, we can investigate these 24 interaction coefficients for each feature and represent them as a daily time series.
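
The divergence and the hour interaction terms described above can be sketched with NumPy. This is a toy illustration under stated assumptions, not the model from the paper: it uses a single binary `desktop` feature instead of device, topic, and country together, the data are synthetic and noise-free, and the function names `divergence`, `design_matrix`, and `fit_interactions` are invented for the example.

```python
import numpy as np

HOURS = 24


def divergence(article_counts, global_counts):
    """Ratio between an article's hourly share of attention and the global one."""
    article_share = article_counts / article_counts.sum()
    global_share = global_counts / global_counts.sum()
    return article_share / global_share


def design_matrix(hour, desktop):
    """One-hot hour terms plus hour x desktop interaction terms."""
    hour_onehot = np.eye(HOURS)[hour]             # shape (n, 24)
    interaction = hour_onehot * desktop[:, None]  # shape (n, 24)
    return np.hstack([hour_onehot, interaction])


def fit_interactions(hour, desktop, y):
    """Least-squares fit; returns (hour coefficients, interaction coefficients)."""
    X = design_matrix(hour, desktop)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[:HOURS], coef[HOURS:]


# Synthetic ground truth: a baseline daily curve and a desktop-specific shift.
hours = np.arange(HOURS)
base = 1.0 + 0.3 * np.sin(2 * np.pi * (hours - 14) / HOURS)
desktop_effect = 0.2 * np.cos(2 * np.pi * (hours - 13) / HOURS)

# One observation per (hour, device) pair: mobile rows first, then desktop.
hour = np.tile(hours, 2)
desktop = np.repeat([0.0, 1.0], HOURS)
y = base[hour] + desktop * desktop_effect[hour]

hour_coef, desktop_by_hour = fit_interactions(hour, desktop, y)
```

Here `desktop_by_hour` holds the 24 interaction coefficients for the desktop feature, which can be plotted as a daily time series in the same way as the coefficients discussed next. With real, noisy data and many features, a regression library with regularization and standard errors would be a more natural choice than a bare least-squares solve.
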
So, starting from the device feature, we can observe that access from desktop is more common during the day, with desktop being a predictor of working hours.

Then, investigating the topics of the articles, we find regularities that define what type of content we consume during the day. In this plot, we see on top the patterns of topics associated with STEM, which tend to have a similar shape, with higher consumption than the global average during the day. The bottom plot shows the average access time, computed as the circular mean of each time series. STEM topics tend to concentrate during working hours, with the exception of space. A complementary picture emerges when investigating media, where most topics are accessed during the evening and night, with the exception of radio. Topics associated with culture show higher variability. It's interesting to notice that comics and internet culture are preferred during the night hours, and food and drinks follow the pattern of lunch and dinner. Similarly to culture, topics associated with society show some variability, with business and education showing a clear daytime pattern and military articles preferred as night reads.

Finally, investigating the country coefficients, we can notice shared consumption patterns, such as a preference for morning reading, or a pattern that I liked: countries where readers prefer to take a break from online consumption to have lunch.

In summary, we learned that Wikipedia consumption follows daily rhythms, and that device, country, and topic of the article are associated with different consumption patterns. Wikipedia can offer behavioral insights at a global scale that are harder to obtain otherwise. Thank you for your attention, and I invite you to check out the paper to learn more.