Alessandro, welcome.

Thank you very much. Hi everybody. I am Alessandro, a data scientist at Idealista. For those of you who do not know it, Idealista is the largest real estate portal in Southern Europe, in terms of content and also in terms of website traffic. We are present in Spain, Italy and Portugal. Our data crew is a small group of data scientists, data engineers and platform engineers, but also urbanists and spatial data scientists. We all try to answer the doubts and questions that users of the real estate market may have, and not only about finding a house. On the demand side you may ask: where do I want to live? Is there a neighborhood close to what I would like to have? How much does it cost, and is this house that I really like worth it? On the supply side, people who are trying to sell a house first think about which price to set for the listing, but they would also like to know roughly how long they will have to wait to sell it. As a parenthesis, we also work for actors that actively invest in real estate. These big players typically ask questions like where should I invest, how much money should I put in, and in which typology of residential real estate. We cover all these types of questions, and we try to build products that help in these situations.

In this presentation I will talk about one of these projects, which we started about six months ago, I think. The question is: how long does it take to sell my home? So we focus on the supply side of the real estate market. And when we think about this question, we find that it is a very tough one.
And I always like to tell the story of Tommy Hilfiger's penthouse. As you probably know, Tommy Hilfiger is a very famous fashion designer, popular all over the world. Tommy had a penthouse in the Plaza Hotel in New York. So imagine Manhattan, Fifth Avenue, right next to Central Park. This is a super luxury hotel, and this is a super luxury apartment. You can look it up on Google; it is amazing, and everything is stylish. The surface of the apartment is 520 square meters, which is huge. The bathroom alone could have the same surface as my flat here in Madrid. Everything is really deluxe. There is also a terrace with views of Central Park.

What happened is that at some point, in 2015, Tommy wanted to sell this apartment and listed it for 80 million dollars. He was not so lucky, though: time was passing and nobody was really interested in the apartment. This made him think, okay, I am going to drop the listing price. In fact he dropped the price several times. In 2017, for instance, he dropped it to 50 million dollars, but again he could not sell. Eventually he managed to sell the flat in 2019, but for 31 million dollars, which is about 28 million euros at the 2019 exchange rate.

If you think about it, this is a tricky situation. On one side there is a monetary cost: going from 80 million dollars down to 31 million dollars is about a 60% discount on the initial listing price, which is a huge amount of money. On the other side there is also an opportunity cost. Time is costly: imagine what he could have done with that money if he had been able to sell the flat in, say, three months.
So there is a problem here, and it is not a problem just for Tommy but for most people who try to sell a house, in Spain, in Italy, in Portugal, and everywhere in the world. When we think about it, we realize that selling a house is very hard. When you start, you ask: what is the price of my house? Imagine that you live in Madrid, as I do, near Puerta del Sol. What I would do is look on Idealista at the prices of similar flats in the same neighborhood. I look at some flats, I find five that are more or less similar to mine, and their average price is, I don't know, 500,000 euros. That is more or less fine for getting an idea of the price of my flat.

But think about Tommy's case. Tommy has a super deluxe penthouse located in the Plaza Hotel in New York, which is almost unique. On Zillow, which is the equivalent of Idealista in the United States, you will not find several listings of super deluxe penthouses in the very center of Manhattan. So when there is no information about similar properties, getting an idea of the value of the house is a very hard task. This is a problem that we as data scientists know how to overcome, with machine learning and regression models. Suppose, for instance, that we arrive at a value for Tommy's penthouse. Now we have another problem: we know the value of the penthouse, but we face a trade-off between the listing price and the marketing time.
What happens is the following. On one side, if you want to sell fast you may have to sacrifice some of the earnings from the sale and reduce the listing price. On the other side, if you want to earn a lot of money you can set a higher listing price, but you will have to wait a long time until somebody buys your house. A trade-off like this is resolved at an equilibrium point somewhere in the middle, and this is what we are really interested in. We want to know how price and marketing time relate, and how, by adjusting the price, we can speed up the sale of a property.

In order to help people like Tommy, our goal is to understand when a property gets sold, so we want to predict a time to sale. But more importantly, we want to understand why a property gets sold: which characteristics of the property, of the area, or of the advertisement itself, such as whether it has many photos, make the property appealing to the public.

First of all we take our data. We have millions of advertisements across Spain, Italy and Portugal, and we analyze every advertisement as if it were a living being. When an advertisement enters the market, that is its birth. When the sale eventually happens and the advertisement is no longer used, that is its death. Between birth and death you have the life of the advertisement, which is generally called the time on the market. This variable, the time on the market, is what we really care about.
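The birth, death and life of an advertisement can be sketched in a few lines of Python. This is only an illustration: the `Ad` record, its field names and the dates are hypothetical, and the handling of ads that are still online at the snapshot date (treated as not yet observed, the usual right-censoring convention in survival analysis) is an assumption, since the talk does not describe how such ads are handled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class Ad:
    """Hypothetical listing record, for illustration only."""
    ad_id: str
    published: date            # "birth": the ad enters the market
    removed: Optional[date]    # "death": the ad leaves the market; None if still online


def time_on_market(ad: Ad, as_of: date) -> Tuple[int, bool]:
    """Return (days on market, event_observed).

    For an ad that is still online we only know that its life is at
    least the time elapsed up to `as_of`, so event_observed is False.
    """
    end = ad.removed if ad.removed is not None else as_of
    return (end - ad.published).days, ad.removed is not None


# Made-up example ads.
ads = [
    Ad("a1", date(2021, 1, 10), date(2021, 4, 20)),  # left the market
    Ad("a2", date(2021, 2, 1), None),                # still listed
]
snapshot = date(2021, 6, 1)
durations = {ad.ad_id: time_on_market(ad, snapshot) for ad in ads}
print(durations)
```

The pair (duration, event observed) is the standard shape of a target variable in survival analysis: the first element is the time on the market so far, and the second says whether the ad's "death" has actually been seen.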
As data scientists we try to predict this time on the market, which is our target variable, and we also try to understand which features of the property determine the time on the market of a flat. The tool we use for this type of question is survival analysis. Survival analysis is a branch of statistics that models and predicts the time until an event occurs. It is called survival analysis because originally it was mostly employed by physicians who were trying to understand the survival of their patients: the time before death was the variable of interest.

But these methods, which originated in the medical field, are widely used in other fields. Think about a sales department that wants to study whether a customer is going to churn, or a manufacturing plant that wants to study the lifetime of its machines. Or think about any market with some illiquidity, where you cannot sell your product immediately and you have to wait some time, such as the used-car market, where you want to know how long you will wait until somebody buys your car. We are going to do the same: we borrow this technique and apply it to the real estate market.

Let's think about the chart shown in this slide. On the x-axis we have time, measured in years, and on the y-axis we have the percentage of individuals that survive over time. There are three curves: one for humans, the second for birds, the third for trees.
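A curve of this kind, the share of individuals still alive at each time, can be estimated from raw durations. The sketch below uses the Kaplan-Meier estimator, a standard non-parametric method; the talk does not say that this particular estimator is used at Idealista, and the durations are made-up numbers for illustration.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[int], observed: List[bool]) -> List[Tuple[int, float]]:
    """Kaplan-Meier estimate of the survival function.

    Returns [(t, S(t))] at every time t where at least one event occurred.
    At each such time the running survival is multiplied by
    (1 - events / number still at risk).
    """
    event_times = sorted({d for d, e in zip(durations, observed) if e})
    curve = []
    survival = 1.0
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical days on market; False marks ads whose sale has not been observed.
durations = [30, 60, 60, 90, 120]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(t, round(s, 2))
```

Ads whose sale has not been observed still count in the "at risk" denominator up to their last known day, which is why they cannot simply be dropped from the sample.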
As we know, humans survive longer than birds, and birds survive longer than trees, or plants in general. We apply the same concept to advertisements. Think about two flats. One is fantastic: it is high quality and low priced, what we would call a bargain. That flat is a sort of plant, because its survival rate will decay to zero very fast. On the contrary, a flat that is dirty, with a price that is not even appealing, will be more like the humans: it is going to stay on the market for a long time. So what we do is model a survival curve for every flat that we have in our database at Idealista. Why are these survival curves important? Because they deliver the result that we want. We wanted to talk about the expected time on market, and this is the type of result that these tools can deliver.

When we think about the ads as living beings, we think about risk factors that may affect the life of these ads. These risk factors are events or characteristics of the property that change the probability of selling the property. For instance, think about a property that is old and needs renovation. By intuition, without looking at the data, you may think that a person would not like to live there, or at least would only like to live there after renovating. So in general we may think that needing renovation lowers the sale probability and therefore increases the time on market of that flat. Or instead think about an advertisement that features a 3D virtual tour.
Instead of just photos and some text, you also have a 3D virtual tour that lets you inspect every corner of the flat. This feature of the ad will probably attract the interest of users who are looking for a house, and could increase the probability of selling the house. So this is also a risk factor, one that increases the risk that the ad leaves the market with a sale.

But you may think about several other types of risk. Think about the price, which is more or less what we were talking about with Tommy's penthouse. If the price is well above the market, people are going to choose other flats that are cheaper, so a price above the market is not a good sign for a fast sale. This is just intuition. We then have to check in the data whether it is the case, but this is what we have in mind when we talk with friends and so on. Having no schools in the neighborhood is also something that could have an effect. People with children like to live in districts where there are schools they can take them to, so not having a school in the neighborhood may negatively affect the sale probability of the flat.

Or think about factors that affect the European countries as a whole. Think about the European Central Bank lowering the interest rate. When the interest rate is lower, mortgages are cheaper. Therefore the demand for mortgages increases, and this pushes up the demand for flats. So when the European Central Bank lowers the interest rate, you may expect the sale probability of a flat to increase and therefore its time on market to fall.
That is about demand, but think also about supply. If there are a lot of ads similar to the one we are listing on the portal, the probability of selling our flat may be reduced, because people who are looking for a flat may go for the other flats that are posted. And finally, of course, there is COVID. In Spain, when COVID hit in March 2020 with the lockdown, the demand for flats fell dramatically, and this generated an increase in the time on market of all our flats. Then, when the lockdown was lifted, people started to demand flats again, and in particular flats with a terrace, because they were saying: if there is a second lockdown, at least we are going to enjoy the sun. All these factors, and many more that are not listed in this slide, may affect the sale probability of a flat. At Idealista we have several databases with which we can relate these risk factors to some proxies. We are not able to capture all of them, but I think we do a good job of capturing the key ones, and when we can't, we always try to build the best possible proxy.

To give you an idea of what is going on, let's take the case of Madrid and compute the median time on the market, measured in days, by group. We take our sample of ads and do some segmentation, which is the typical thing you do in an exploratory data analysis. Let's focus first on the advertising channel, which is the top left chart. In that chart, for ads posted by individuals, who could be me or you, the median time on the market is 105 days, between three and four months.
When we compare this with ads posted by real estate agencies, we see instead that the median is 84 days, almost three months. So we see a difference, and we tend to say that agency ads sell faster. This is the evidence that we see. But let's think about it. Can we conclude just from this chart that if we now list through a real estate agency we will lower the time on market of our flat? The answer is either yes or no, and I will give you a spoiler: the answer is no. We cannot conclude that having a real estate agency list our flat on the portal has a causal effect of lowering the time on the market, because there are confounding factors that affect the comparison.

Think about the following situation. Real estate agencies know their business, and they know that they have to target a select group of properties to market. Suppose, for instance, that they want to market only properties in the city centre of Madrid, because they know that those are roughly the properties with higher demand. In that case we are computing the median time on market on a specific subpopulation of all the assets that we are considering. Individuals, instead, cannot choose which house to sell. We may live in the centre or not, and we just sell our house, but it is a different population from that of the professionals.

Now look at the chart in the bottom right, which is a segmentation according to the demand in the area. We take our pool of ads and, instead of segmenting by advertising channel, we segment by leads per ad in the zone.
In general, more leads means that more users are contacting the advertisers, so there is more demand, and more ads means that there is more supply. The ratio of leads to ads therefore tells you the disequilibrium between supply and demand. If we look at the chart, in low-demand areas the time on the market is about 100 days, while it is 84 days for ads located in high-demand areas. So what I was saying before about agencies targeting flats located in the city centre, with high demand, could actually be the case: we have an indication that these two variables are correlated.

To summarize the situation, we are comparing two different types of advertisements. What we would like to do is compare advertisements with the same characteristics, such as surface, number of bedrooms, number of bathrooms, whether there is a terrace and so on, and with the same characteristics of the area in terms of demand, supply, number of restaurants and so on. Once we compare two exactly identical flats, we want to change just one characteristic: whether the flat is advertised by a real estate agency. In that situation, yes, we could think about these numbers as causal effects. To do this kind of comparison we need a model, and this is what we will see later.

Before that, let me say that this does not mean that an exploratory data analysis is not useful. We just have to keep in mind the limitations of this type of analysis, and also its advantages, because what we want to see here is whether there is enough variation in the target variable that we want to study.
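The confounding argument about agencies and zone demand can be made concrete with a toy example. The records below are entirely hypothetical and were constructed so that, within each demand level, agencies and individuals have the same median time on market, while the pooled medians differ only because agencies list mostly in high-demand zones. They are not Idealista data.

```python
from collections import defaultdict
from statistics import median

# Hypothetical toy records: (channel, zone demand, days on market).
ads = [
    ("agency", "high", 80), ("agency", "high", 84), ("agency", "high", 88),
    ("agency", "low", 104),
    ("individual", "high", 84),
    ("individual", "low", 100), ("individual", "low", 104), ("individual", "low", 108),
]


def median_by(records, key):
    """Group the days-on-market values by key(record) and take each group's median."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec[2])
    return {k: median(v) for k, v in groups.items()}


# Pooled comparison: agencies look faster.
pooled = median_by(ads, key=lambda r: r[0])
# Comparison within each demand level: the gap disappears.
stratified = median_by(ads, key=lambda r: (r[0], r[1]))

print(pooled)
print(stratified)
```

In this constructed case the whole pooled gap comes from the mix of zones. In real data the gap may shrink, vanish or even reverse after such a split, which is why the comparison of like with like needs a model.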
Looking at the typology, for instance, the median ranges from 84 days for homes in multifamily buildings up to 122 days for country houses. We wanted to understand at least how large the causal effect could be, but these numbers are by no means to be read as causal effects. Another type of information that we use is the state of conservation. Here we see that second-hand homes sell faster, especially if renovation is needed. This again looks a bit strange: why should a flat that needs renovation be more in demand than a new development? Well, it is the same story as before with the real estate agencies. If the flats that need renovation are those located in the old town of Madrid, and at the same time we know that the old town of Madrid has higher demand because people like to live in the centre, we get this type of result. So we always have to take this into account when we discuss these results.

What we really care about is the causal effects. We want to understand when and why a property gets sold, and therefore we use a model. The model that we employ is a Cox proportional hazards model, which is considered the workhorse of survival analysis. The intuition of this model is more or less what I have been saying throughout the previous slides: the sale probability of an advertisement depends on the characteristics of the property and also of the area. The beauty of this model, and partly the reason why it is the workhorse of survival analysis, is that it is very simple.
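The slide with the equation is not reproduced in the transcript. For reference, the textbook form of the Cox proportional hazards model is given below; the notation is an assumption, with $h(t \mid x)$ the hazard (the instantaneous rate of sale at time $t$ for an ad that is still on the market), $h_0(t)$ a baseline hazard shared by all ads, $x_1, \dots, x_p$ the property and zone characteristics, and $\beta_1, \dots, \beta_p$ the coefficients to estimate.

```latex
\[
h(t \mid x) = h_0(t)\,\exp\left(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p\right)
\]
\[
\log h(t \mid x) = \log h_0(t) + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p
\]
```

The second line is the first after taking logs: each characteristic enters additively, so $\exp(\beta_j)$ is the factor by which the hazard is multiplied when $x_j$ increases by one unit with everything else held fixed.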
If you look at the equation and take logs, you basically see that the log of the hazard rate, which is loosely the probability of selling the house, is linear in the various characteristics of the property. It is therefore easy to retrieve the betas, and you immediately know the contribution of every characteristic. I am not saying that this is the most precise model: it may be less precise than other types of models, because you could fit a non-linear model such as a decision tree. But we like simpler models because they tell us a story. They give us not only a prediction of when a property will get sold but also why, so we can understand the driving factors behind it.

We fit the model on Idealista data for residential advertisements. In particular we use the most recent data, the last two years, and we run rolling regressions over time to avoid drift. Our target variable is the days on market. The features are property characteristics, and I have listed some as examples: the price, the area, the typology, the state of conservation and so on. We also have zone characteristics, and I have listed some as well, such as the demand and the average price for the same typology. The model is trained, it estimates the betas I was talking about, and it learns which are the key characteristics that make an advertisement likely to sell.

One cool thing about this model is that its inference is not a point but a function. With traditional supervised models, such as a regression model, you give some inputs for the features and you get a number. Think for instance about our machine learning models for valuing properties.
There, we feed in information about a certain advertisement, the property, the area and so on, and the model delivers a point, which is the valuation, say 100,000 euros. We can also give some confidence bands, but it remains a point. With the type of model we are using here, the output is a probability distribution: for every potential time on market it tells you a probability. I think this is very cool, because you can play a lot with this type of distribution.

Take this chart, for instance. Say that you want to know the probability of selling in exactly 97 days, and you get 1%. By itself this is maybe not so useful, but it is the starting point for constructing metrics that are much more interesting. For instance, take the median. When we take the median of this distribution we are calculating what we call the expected days on the market, and as we know, the median cuts the distribution in half. Suppose the median is about 120 days: with 50% probability the property will sell in less than 120 days, and with 50% probability it will sell in more than 120 days. If you think about it, this looks a bit like a coin toss. I give you an expected days on market of 120: with 50% probability it will sell in less than four months, and with the other 50% it will sell in more than four months. This may not be so useful, especially for the business case, so we think we can do more. What we find really useful is showing not only the expected time on market but also the sale probability, defined as the probability that the property will sell before an arbitrary horizon.
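Both metrics, the median time on market and the sale probability before a horizon, can be read off a survival curve. The sketch below assumes the curve is stored as a step function of (day, probability of still being unsold) pairs; that representation, the function names and the curve values are hypothetical, with the values chosen only so that they match the illustrative figures mentioned in the talk.

```python
from typing import List, Optional, Tuple

# (day, probability that the ad is still unsold after that day), days ascending.
Curve = List[Tuple[int, float]]


def survival_at(curve: Curve, day: int) -> float:
    """Step-function lookup: last value at or before `day`, 1.0 before the first step."""
    s = 1.0
    for t, value in curve:
        if t > day:
            break
        s = value
    return s


def sale_probability(curve: Curve, horizon: int) -> float:
    """Probability that the property sells on or before `horizon` days."""
    return 1.0 - survival_at(curve, horizon)


def median_days(curve: Curve) -> Optional[int]:
    """First day at which the survival probability drops to 50% or below."""
    for t, value in curve:
        if value <= 0.5:
            return t
    return None  # the curve never reaches 50% in the observed range


# Hypothetical curve for one ad.
curve = [(30, 0.95), (60, 0.90), (90, 0.84), (120, 0.50), (180, 0.25)]
print(median_days(curve))
print(round(sale_probability(curve, 90), 2))
```

Summing the probability mass up to the horizon and taking one minus the survival function at the horizon are the same computation, so the "area under the curve" view and the survival-curve view agree.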
Concretely, we select a horizon of our choice, say 90 days, and we compute the probability of selling within 90 days by integrating the area under the curve up to that horizon. In this case we can see that there is a 16% probability that the property will sell in less than 90 days.

What can we do with this? Since the model is so simple, we can compute counterfactual experiments very easily. Here, for instance, I am showing, for a two-bedroom flat located in the city centre of Madrid, how the probability of selling or renting the flat varies when we vary some characteristics of the flat. In particular, we see that real estate agencies actually do sell faster than individuals, and these are causal effects. The same goes for a renovation: when we do a renovation, we see that the probabilities also increase. So this is not a super techy model, but we think it is a very nice building block to start thinking about more articulated models.

To conclude: maybe for Tommy it was not such a big deal to wait four years, because in the end he is super rich and has a private jet, but it is a big deal for many of our customers. What we do at Idealista with this type of model is assign, every day, an expected time on the market and sale and rent probabilities to every property listed on Idealista, which amounts to more than two million inferences every day. At the same time we also help our colleagues at Idealista data enrich their bulk automatic valuations for the portfolios of big real estate players.
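The counterfactual experiments mentioned in this part of the talk follow from a property of any proportional hazards model: changing one characteristic by an amount delta multiplies the hazard by exp(beta * delta), so the survival probability at any horizon is raised to that power. The sketch below applies this identity; the coefficient value is hypothetical and is not an estimate reported in the talk.

```python
import math


def counterfactual_sale_probability(sale_prob: float, beta: float, delta: float = 1.0) -> float:
    """Sale probability at a fixed horizon after changing one feature by `delta`.

    Under proportional hazards, S_new(t) = S_old(t) ** exp(beta * delta),
    where S(t) = 1 - sale probability at horizon t.
    """
    hazard_ratio = math.exp(beta * delta)
    survival = 1.0 - sale_prob
    return 1.0 - survival ** hazard_ratio


baseline = 0.16                  # 90-day sale probability quoted in the talk
beta_feature = math.log(1.5)     # hypothetical coefficient: hazard ratio of 1.5
with_feature = counterfactual_sale_probability(baseline, beta_feature)
print(round(with_feature, 3))
```

A coefficient of zero leaves the probability unchanged and a positive coefficient raises it. This is the sense in which the betas make counterfactuals cheap: no refitting is needed, only the baseline curve and the coefficient of the feature being changed.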
For those portfolios we assign not only a valuation for their properties but also a probability of selling those properties in a given amount of time. Finally, we also help our colleagues at Idealista tools improve their products for real estate agencies by providing metrics based on counterfactual experiments. A very cool thing we are working on now: you have an advertisement and you want to see how adding multimedia features, such as a professional video, a 3D virtual tour or a 360 virtual tour, could improve the demand for that particular advertisement and thereby increase the probability of selling that flat. There are a lot of things to do, and we are not so many, but we are very motivated to work in this field. So thank you very much.

Thank you so much, Alessandro. I like that you are very motivated. The subject and its possibilities are fascinating. We have a few questions, and I am going to put them all together in one, Alessandro, so you can answer however you like. How developed is the model? Is it live, and can customers use it and benefit from it as we speak? What, if any, are the flaws of the model? And have you already included the COVID data? You said you include some factors, but how many more factors can you include, and have you taken COVID into account, since it has changed the real estate market a lot? All this mixed together, in whichever order you like.

I will start from the end. Yes, we thought about COVID. We actually have a very cool indicator, our demand index, which is exclusive to Idealista. It is what I was describing before: it is computed using the leads from users who are trying to find a flat. What we see is that this index, which correlates strongly with other indicators used to track market conditions, fell dramatically with the lockdown, from the 15th of March. So we are not using a COVID indicator as such, but we are using this index as a proxy. Let me also say that with it we already saw demand going up in June, before people started claiming so.

Regarding the flaws of the model: the model is very simple. The equation is linear in the log hazard rate, which is the probability of selling the flat. We would like to explore some non-linear frameworks, such as decision trees or random forests, to see how the model improves in precision. It is true that we will then struggle with the counterfactual experiments, because we will lose the betas, but it is of course somewhere we have to go.

Regarding the counterfactual experiments, we are now working with Idealista data and with the big real estate players, who ask us for experiments such as: what happens if I reduce the price by 5%, how fast will that type of flat sell? So we get feedback directly from the clients. We are not running experiments ourselves, but we are getting the feedback of these players, and they seem fascinated by this. They think that this is magic, but it is not really magic: it is just mathematics, statistics and hard work.

Alessandro Galesi from Idealista, thank you so much for bringing us this survival analysis of housing markets, which, as he said, originated in the medical sector but can be applied to a totally different sector like real estate. As he said, there is a lot to be done, so off you go, Alessandro. We hope to see you and get back to you. Thank you so much.

Thank you.