Good evening, everyone. Can you hear me in the back? I'll take that as a yes. No? I think this is probably the best I can do here.

So what is this talk about? Revolutionizing travel through data. In this session I'm going to cover how we use data within the travel industry, what kind of decisions we make based on it, and how that in turn tells us what to build and what not to build. The way I'm going to tell the story is this: we at Orbitz started using big data in 2009. We went through multiple iterations of the technology platform, and with each platform came a lot of use cases that we solved. In the process we tried to leverage a lot of analytics concepts.

By the way, I'm Raghu Kashyap. My two kids were pretty excited to see that they could actually find a child beer in India. I've been with Orbitz a little over 10 years now. I'm from a tech background, so I help support the technology platform for Orbitz worldwide, and also the application development side. For the last two years we've been at Orbitz India, working on both the data platforms and application development.

By a show of hands, who has heard of Orbitz? All right, quite a few of you, good. We are not the gum company. Absolutely not. We are not the drink company, although we bought the naming rights from them, so that's an interesting story. And lastly, this is very specific to Bangalore: there is actually another company called Orbitz on Cunningham Road. It was quite interesting: recently somebody ended up on Cunningham Road instead of MG Road. That's not us.

So what do we do? We are an online travel agency. We sell air, car, hotel, packages, insurance, travel extras, attraction services, you name it. And we have multiple brands throughout the world.
We are one of the top three travel agencies in the world: Expedia, Priceline, and Orbitz. Orbitz is predominantly in North America, but we have a presence in Europe and Asia Pacific. We also have a lot of other business units that we work with, not just the leisure travel sites.

So what have we been contributing back to the community? At Orbitz we use a lot of open source, and we give back to the community. Some of the authors of these books work at Orbitz, and we've also presented at a lot of conferences in the US and Europe. The interesting thing is that there are two things we open sourced that are more on the data platform side. One is called Graphite; the other is called ERMA. ERMA stands for Extremely Reusable Monitoring API. You mostly use it for monitoring your applications, and the visualization comes through the Graphite tool.

So why is travel important? I'm sure a lot of you travel. As you travel, you see a lot of options out there for searching, booking, places to stay, places to go. If you look at the industry, this is the fastest-growing sector. The numbers clearly show it: travel alone is around 2.3 trillion US dollars, and the impact travel has on the overall economy is close to 7.5 trillion dollars. How does this relate to my talk? As the scale is big, so is the data that comes with it. We have tons and tons of data: internal data, external data. We actually buy data. So it's a combination of a lot of things.

So what were some of the challenges we faced?
Most of you have faced similar problems, I'm sure. Today we've had some really great presentations, and a lot of them are trying to solve similar problems with similar technology. We faced the same problems six or seven years ago. The data sets we were using were not as big as what we wanted to leverage. The multi-dimensional aspect of the data was another challenge for us. You must have heard of some of the site analytics tools: Google Analytics, SiteCatalyst, Webtrends. Most companies use these, but there are a lot of limitations with them. And that's just the site analytics data.

If I asked you what the conversion rate for travel bookings in general might be, what percentage would you guess? Any takers? One? Five? You're pretty close. It's actually 3%. So think about the 97% of customers who don't book. That is where the optimization is. The 3% you know: they want to travel, they want to buy, so they bought with you. Where you can really leverage data and really optimize is the 97%. You don't know why they left, you don't know why they didn't buy, you don't know what their mindset is. That is what you want to tap into, and that's exactly what we've been doing for the last seven years.

The other problem we had was the controlled environment. I'm sure you've heard about BI and the data warehouse. A lot of organizations use this, and in certain places it's harder because you have a very strictly controlled environment. It's harder to get to the data, and you need a specific set of people to do certain things. So that was another challenge we had.

Last but not least is speed. For us, or for any online company, the speed at which you need to make a decision is pretty quick.
That's what you want to look at. If you launch something today, you want to know within the next hour whether it's working or not. You don't want to wait three months or six months. Those were some of the challenges we faced.

So we went through our first iteration of the Hadoop platform. This was in 2009. I'm sure this is a common pattern that most people used initially: you take a lot of raw data and put it in HDFS, you do your parsing, whether it's MapReduce or Pig or whatever it is, you dump it into your Hive data warehouse, and then you take it and put it in your actual warehouse. That's pretty much what we did. There is nothing different here. Even though it was the initial iteration for us, there were a lot of things we were able to leverage with this architecture. We called it Owlitex 1.0. In the next few slides I want to walk through the use cases we were able to solve using this platform, the insights we got, and how we leveraged analytics on top of that.

So, lifetime value. This is one of the critical indicators that any online travel business, or even offline for that matter, would love to understand better. The customer is the king; you want to please your customer at all times. Gone are the days when you could get away with not treating your customers well. If you don't treat your customers well, they're going to go somewhere else. So you really need to understand your customer well. What we did was focus on acquisition and retention. Acquisition means: where are we getting these customers from? In online marketing there are multiple channels we get them from. Retention is: how are we retaining our customers? Are they coming back to us? Are they not coming back to us?
Why are they not coming back to us? All these questions we need to answer. So what we did was take a lot of attributes related to the customer: for example, the channel they came from, how long they were on our sites, how soon they returned, when their first purchase was made, and what channel the first purchase was made through. You take all these attributes, or X variables, and build your regression model. When you do that, you associate a value with a customer. That's the customer lifetime value, and the way we express the value is in dollars. It can be five dollars, ten dollars, fifteen dollars, whatever it is.

What were we able to do with this? We have call centers where customers call, so you look at the tiers of these customers and route them accordingly through your IVR system. That's one aspect. The other aspect is that you promote a lot of things to these customers, whether it's coupons or promotions, because you know these are the people who are very loyal to you. And the third aspect is loyalty. We at Orbitz are very focused on loyalty. One of the things we do is called Orbucks: you book hotel, car, or air, and a certain percentage of that is given back to you as points, or Orbucks, which you can use for your future travel. These are some of the things we were able to derive based on the lifetime value of the customers.

I talked briefly about channels. Have you heard what a marketing channel is? Yes? OK, let me give a 10-second intro to marketing channels. You have SEO. You have PPC, which is pay-per-click: you go to Google, you search, and at the top and on the right you get results in a different color.
That is paid advertising, which we call PPC. Then direct is somebody who comes to your site directly. You have display channels: let's say you go to CNN.com or Mentor.com and you see ads for us. Those are the display channels. So there are eight or ten different channels that we have to optimize. Allocating bookings to a channel is a huge deal for us, because in any online business almost 50% or more of your spend is on marketing. If you don't allocate it properly, you won't know your revenue per click or your revenue per customer. That's why attribution matters a lot for us. There were a lot of models we played around with: last click, first click, equal attribution, time decay. You take every customer and look at which channels they are coming from, and you play around with different kinds of algorithms to decide which channel should get the credit. In the end, based on that, we decide whether we want to spend more here or less there. That matters a lot in our marketing.

This is an interesting use case for us; mind you, this was done in 2010. I think most of you are aware that people with Macs spend more money than people with Windows. It is an evident truth. What we did was take it to the next level: we tried to figure out how it impacts travel. So we analyzed how behavior differs between users on a Mac versus Windows. What we came out with was pretty interesting: people with Macs tend to book higher-rated hotels, and they prefer hotel amenities that are more luxurious. For example, they want a poolside view, they want free Wi-Fi, and they care a lot about proximity. Those are the things they look at. So what we did was play around with the sort results.
When you do a hotel search, you want to show people what they want to see rather than a generic result that everybody gets. This way you are personalizing a little bit more. The second thing was more around the recommendation engine: you take the same attributes, you understand your customer, and you recommend to them based on their likings. These were some of the things that were pretty amazing when we initially saw them, but the behavior is actually consistent. In fact, this one was interesting because it really led the personalization aspect of our site. We do a lot of personalization based on machine learning. We take the data, we build algorithms, and we eventually funnel that back into our applications through real-time systems.

EFX: every freaking X. If you know statistics, X variables matter a lot. You want to consider as many X variables as possible to derive your models. For us, hotel sorting is everything. There is a saying that goes like this: where do you hide a dead body? On the second page of the Google results. Nobody goes to the second page. It's exactly the same concept for us in hotels. Everybody wants to see results on the first page, and not just results but results relevant to them. So we spent a lot of time trying to understand how we want to place our hotels: where do you place them, how do you sort them? We do premium placement, we do enhanced value sort, a lot of different ways. So what we did was take all these factors in and start building a regression model around this.
We figured out that for us, positions 3 to 5 in the sort were where most of the results were booked, or even clicked for that matter. Based on this, our revenue management teams play around with the settings of the sort in real time. There is an application that does this analysis and pumps the data back to the revenue management team, and in real time they change and play around with the triggers for this.

Predicting hotel stays is another interesting topic for us. By the way, this picture is Lake Shore Drive in Chicago. This was the winter of 2012. Everything came to a complete standstill. The reason I have this one, and also the Super Bowl, is that events like these matter when you look at how people book. There's a concept of an advance purchase window and an advance search window: how far ahead do people search and book? Usually it's anywhere between 15 and 60 days. So you want to figure out this: the day of the week matters a lot, the time matters a lot, and the kind of sorting and ratings matters a lot. The interesting thing is that on this day, when there was a snowstorm, same-day bookings skyrocketed. Nobody knew where to go. They left their cars on the road like this, took out their mobiles, booked a hotel close by, and walked to the hotel. This is why it's very important for us to bring weather data into the picture. We pull in a lot of weather data to analyze the seasonality and figure out the potential pattern of people staying there. And obviously the location matters.
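To make the advance purchase window concrete, here is a minimal sketch of how it could be computed: the window is the number of days between the booking date and the check-in date, and bookings are then grouped into buckets so that a spike in same-day bookings stands out. The sample bookings, the function names, and the bucket boundaries (other than the 15 to 60 day range mentioned in the talk) are hypothetical and are not Orbitz's actual method.

```python
from collections import Counter
from datetime import date


def advance_purchase_days(booked_on: date, check_in: date) -> int:
    """Return the number of days between booking and check-in."""
    days = (check_in - booked_on).days
    if days < 0:
        raise ValueError("check-in cannot be before the booking date")
    return days


def bucket(days: int) -> str:
    """Map a window length to a coarse bucket (boundaries are assumptions)."""
    if days == 0:
        return "same-day"
    if days < 15:
        return "1-14 days"
    if days <= 60:
        return "15-60 days"
    return "61+ days"


def window_distribution(bookings):
    """Return the share of bookings per bucket for (booked_on, check_in) pairs."""
    counts = Counter(bucket(advance_purchase_days(b, c)) for b, c in bookings)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


# Made-up sample: one same-day booking and three planned ones.
sample = [
    (date(2012, 1, 10), date(2012, 1, 10)),  # same day
    (date(2012, 1, 10), date(2012, 1, 17)),  # 7 days ahead
    (date(2012, 1, 10), date(2012, 2, 14)),  # 35 days ahead
    (date(2012, 1, 10), date(2012, 4, 20)),  # more than 60 days ahead
]
distribution = window_distribution(sample)
print(distribution)
```

Comparing the same-day share on a snowstorm day with the share on an ordinary day would be one simple way to see the kind of jump described above.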
So those are some of the business use cases from the first iteration. We also built one or two tools that help the developers more, in terms of their operations and development. One such tool we built was called the business monitoring tool. I don't know if anybody recognizes this application, but that's actually QlikView, which is an in-memory visualization tool. We hooked up QlikView directly to our Hive system. Before the Hive system there were MapReduce jobs that would run algorithms to figure out the blended rates of our errors. Every site has errors: you do a search and it fails, you try to book and it fails. We categorize them into different error buckets. The reason we do that is that these error buckets, or the blended rates, have a monetary value associated with them. You want to go fix first the one that impacts your revenue the most. For example, if there is one bug where the color is different and another bug where bookings are failing, obviously you would prioritize the booking failure first. So we did this as an overall blended rate margin, which buckets the thousand-plus errors that we get from different operations. Mind you, our system depends a lot on third-party systems, the GDSs. I'm sure you've heard of Amadeus, Apollo, Galileo. These are all GDSs. Our systems depend a lot on these systems, and they are built on mainframes, so you can imagine the complexity involved in working with them.

What else? There is another tool that we built, and this is where we leveraged Flume. One of the things we used, I would say around 2006 or 2007, was Splunk. Now Splunk is pretty good.
It wasn't as good at that point in time. We used Splunk for two years and figured out that there were a lot of issues with it, and it really didn't scale up to what we had with the data. So what we did was funnel all our application logs. We used to use the JBoss container at that time; now we use Tomcat. We funneled all the container logs, the application server logs, into Hadoop directly through Flume. We did it in two pipes: one went into HDFS and the other went into HBase. With HBase, as you are aware, you can structure the data a little better compared to HDFS. So we built this. This is the ugliest UI you would probably ever see, but it did the job. As a developer, if you want to find out what went wrong in production, you go look at it, it runs a MapReduce job, gets you what you need, boom, you're done. The other thing, which I don't have here, is that we also built another tool that was much faster because we indexed the data, and we indexed only six hours of it. If you're troubleshooting a production issue, you don't want to wait for the MapReduce job to complete; you want it much faster than that. That tool would give you data only for the six hours, but that gets the job done when there is a crisis going on.

So what did we learn from our first iteration? There were a lot of things. One is that we kind of decentralized the model: every development team had access, they would go do their own analysis, they would do their own visualization. We called it the decentralized model. But the challenge that came along with that was that it's harder to scale. You can't really scale with that approach. Somebody will use maybe Kafka, and somebody will use something else.
It's really hard to scale. The second problem we saw was data governance. Data governance is absolutely critical when you deal with data, because if something means one thing to you and something else to somebody else, then you've lost the whole purpose of how you deal with data. A simple example is conversion. We have around four or five different brands and probably around 30 different websites. Each of them, when they say conversion ratio, would mean something different, and it gets really hard for a leader to oversee the overall business. That is why data governance is very important, and that's one of the challenges we saw.

And then scalability. Our first version was interesting because we had an FTP server sitting on our desk which would hold terabytes worth of data. There was no backup, no redundancy, no failover, nothing. If somebody forgot and pulled the plug, then done: we don't have access to data. So there were a lot of problems with that.

So we had done one version, and we said, OK, let's go to version two and put a little more thought into it. We said, what is your foundation? Your foundation is mostly your data infrastructure. And what do you do after that? You want to do descriptive analytics. Descriptive analytics is nothing but building reports: you want to see what happened yesterday, this week, this month, last month. Then you talk about predictive analytics: now you know what happened all this while, so let me try to figure out what will happen next. That is what predictive analytics is all about. Then comes your machine learning.
The key difference between predictive analytics and machine learning: with predictive analytics, it's all the analysts sitting in our organization running models and regression analysis. With machine learning we do the same thing, but we let the machine learn through the process, and we keep feeding more data to it. That's how we saw the vision of where we wanted to take this data next.

This is an interesting tree map of our cluster. We have a script that generates this tree map for us. It's interesting because every time we run into an 80% threshold, that's when we start saying, hey, we need to add more nodes. We add them and publish the next tree map. The current cluster is roughly around four petabytes, I would say. Right now we're in the process of moving from a four-terabyte disk space to a six-terabyte disk space and bumping up from 128 to 256 gigabytes. That's what we are thinking through.

The next few slides talk about the use cases that we solved with iteration 2, or version 2. This is a public-facing website; you can go check it out. What we try to do is help travelers by giving them as much information as possible. To do this, we do a lot of analysis and publish it so that people can make an informed decision about where they want to travel and what they want to do. It's sort of like research, but helped by the data. We built this tool, which does a lot of things. One of them is city recommendations, which basically says: I live in Bangalore.
I want to see what's close by that I could visit. Or maybe I'm in San Francisco and I want to go see something close by. So that's an algorithm built to leverage a lot of the data we have and put it in the hands of customers.

So, SEO. SEO is search engine optimization. You search a keyword on Google, and chances are you always click on the first link. Why is that? Because it's SEO optimized: you are getting the relevant content for what you need. For us it's a huge deal because, like I said earlier, 60% of our spend is on marketing. We always pay attention first to the paid channels, where you pay money. SEO is free. People argue it's not free, but for all practical purposes it's free. So we want to understand how to give credit to SEO where it's due, so that we can invest more in SEO. One of the things we did is called multi-touch. You look at your channels through a journey of 1 to 30 days. By the way, that's an industry standard; people look at 30 days for channels. You look at that and try to figure out how much traffic you are attributing to SEO. That's one step. The second step is to figure out the influence of SEO on the other channels. Now we are able to show a lot more value for SEO, which justifies why we spend more on SEO.

Like I said, the GDSs are where we have a lot of dependency on third-party systems. One of the things we are obligated on, both monetarily and contractually, is the look-to-book ratio: how many times you look, or search, versus how many times you book. We have to be very careful about the look-to-book ratio.
Otherwise we end up spending a lot of money, paying them penalties. So we cache a lot of our results and data in house. Once you start caching, you need to understand how to optimize your cache. You don't want to just say, OK, after eight hours recycle the cache. So what we did was a survival analysis of the cache for hotels. We broke it down into two groups, A and B, and in each group we put eight markets. When I say market, it's a city for hotels. Then we started fine-tuning the test group. You have a test group and a control group, and when you start fine-tuning the test group against the control group, you really see how you can optimize, and you find the sweet spot for your TTL, which is time to live. Survival analysis is about when the event occurs, and that's how you figure out whether you want to change the cache or not. Based on this we were able to keep our look-to-book ratio at a pretty steady level without impacting our cache hit ratio and the search results that come out.

Keywords. This is another area where we spend a lot of money. Every online organization spends money on bidding. Basically we all pump money to Google. That's how it is. With SEO and SEM, you pretty much sell your soul to Google. You want to bid on keywords where it makes sense, where people click on that keyword and come to your site. Otherwise it's a waste of money. So you really try to target this part. This is called the long tail. This is where your high probability of conversion is, and that's because not everybody is bidding here. But if you go here, the conversion ratio is lower and the competition is too high. A simple example is cheap hotels.
That keyword is pretty much here, because everybody wants to bid for cheap hotels, and when I say everybody, I'm talking about all the online travel companies. But when you go here, you want to search for, I don't know, maybe Native Village in Hesaraghatta. That's a long-tail keyword, because not everybody might bid here. So your opportunity to identify a customer who is looking for exactly what he wants is higher, and so is the chance of conversion. That's what is important for us to leverage, both from a conversion ratio perspective and from a spend ratio perspective. Spend ratio is nothing but: if you spend $1, how much are you getting back? Typically in SEM you are losing money rather than getting money back.

All right, the next iteration of our technology platform. Now we had a pretty sturdy Hadoop infrastructure and a pretty sturdy analytics framework. Data governance was being addressed; it's still not addressed completely, but we were on the right path toward our vision. Then we started talking about real time. All these Hadoop jobs are pretty cool because you get what you want, but they're all batch processes. So how do you look at real time, and what do you do to get there? There are multiple things we tried out. I'm going to talk about one use case here. We used AWS and Redshift. There were two use cases we were going after. One was landing page optimization, and the other was more of an improvement on the old tool we built for logs. So we used Redshift and AWS to figure out this: let's say we build a new landing page and put it on the site.
We want to see immediately how it's performing. Redshift and AWS gave us that real-time capability in how we look at things. The other case we were trying to get out of this, and were able to, is campaign analysis. Every time you run promotions or campaigns, you want to see immediately how they're performing. There are a lot of things that get tied together: your revenue, your spend, your customers, the visits, everything. You want to see that in real time. These are the two big use cases where you want to see things in real time, other than the operations part of it. When people say real-time analytics, you really need to go back and ask them: are you really going to use real time? Because business folks cannot act in real time. It's hard for them to change their strategy or their tactical initiatives or whatever it is, so they always look at the previous day's data. But there are tons of use cases where you need real-time data, and that's what we were trying to get to with this.

What I really wanted to convey in this session, other than the use cases and how we iterated through all the platforms we built, is first and foremost this: no matter how cool your technology sounds, Hadoop, Kafka, Lens, everything, if you don't use the data you might as well take it and throw it in a dump. That's basically the message. You have to leverage it to make your business decisions or to change your business so that you can benefit from it. That's lesson number one. The second one: early in the game, figure out what your strategy is, what your vision is for your data platform, how you're going to leverage it, and what you're going to do. Stick to it, but be agile and change through the process.
You saw that we went through three iterations, and we are still going through a lot of iterations. Don't get hung up on saying, hey, I invested here, so I have to continue investing in it. Fail fast. That's very important. The last one is the trickiest, and I'm sure a lot of salespeople will be very happy with it: build versus buy. You have to make the right decision about when to build and when to buy. You want to stick to your core business. You don't want to go build something that is not your core business. It is pretty cool to build it, but if the returns are not justifiable, you need to figure out build versus buy.

That's pretty much the story at Orbitz that I wanted to share with you. If you have any questions, I can take them. I know it's been a long day for everybody and people are tired. I can also stay back to answer a few questions if you have any.

Thanks for the talk. My question is about this: the hotel industry mainly has two types of business. One is group business and the other is transient business. What you talked about, search and bookings happening 15 to 60 days in advance, is mostly for transient business. Group business happens way in advance.

What's the other business you said?

Group business, business that is a qualified business.

Yeah, yeah.

So is there any way Orbitz uses this information? Because group business probably would not happen through Orbitz; only the non-corporate customers would be booking through Orbitz.

Can you give me an example of the group business you're talking about? I know what you're talking about; I just want to understand what you mean by group business.

So let's say there is an MIDC area and I have a hotel close to it. All the companies in that area make a contract with my hotel saying: suppose we give you a certain number of bookings in a year.
You're going to give us a discount, but those bookings happen really far in advance. They are planned visits of customers and so on. Those bookings probably won't go through Orbitz, but they're going to affect the number of bookings that come through Orbitz, because in the end they run into the capacity constraint of the hotel itself.

OK, so there are two pieces to this. One, most of the talk I did was around the leisure business. Leisure business is the common man, like you and I, going and booking. We also separate corporate bookings, and we also separate affiliate bookings. A simple example is American Express: if you go to American Express and try to book something, you don't know it, but everything behind the scenes is actually the Orbitz platform. All those bookings flow through us. So for us it's a matter of our own data plus all the external data that we try to tie in, because one of the key concepts of analytics is competitive intelligence. Let's say you're growing at 5% and you feel really good, saying that compared to last year I'm growing at 5%. But if you look at the industry, you're actually at the bottom; people are growing at 20%. You've got to do that competitive intelligence. There are a lot of data sets that we buy which give us the competitive intelligence, and we use a lot of those to decide what we do and how we do it.

Right. And sometimes hotels also do their own forecasting and optimization of prices. So do you give this data back to the hotels, saying that for the next 60 days there are a lot of...

We do, we definitely do. One of the things we do with all our partners, hotels and airlines, is tell them, hey, this is how your hotel is performing, because they need an incentive to give us a better rate. That is a natural way of negotiating better prices.
So we pump a lot of data back to these partners, whether it's airlines or hotels, to show them how they are actually performing. And not just performing by bookings, right? We are talking about where the customers who are actually booking your hotels are coming from, what kind of amenities they like in your hotel. We kind of give a very elaborate...

Thanks. I have one question.

Yeah, I will upload the slides. Yes.

I have one question. Have you done segmentations on the demand forecasting and occupancy analytics? How did you...

The segmentation part of it? See, there are multiple ways we do segmentation of data, right? So are you talking about the hotel stay or the occupancy, or which one are you talking about?

The occupancy, basically.

Right. So the way we segment in the travel industry is multiple ways; it depends on the brand. A simple example: one of our brands is orbitz.com, right? We are predominantly in the US, right? For us, looking by country as a segment makes no sense, so we drill down by city, by market. But on the other hand, we have another brand called hotelclub.com. They operate in all of Asia Pacific, including India, right? For them, country matters a lot. So depending on the brand, we do a lot of segmentation. We do market segmentation. We do channel segmentation. We do CLTV segmentation, saying, hey, are they a tiered customer or a non-tiered customer? Where do they fall? So there are, I don't know, maybe n number of segmentations that we use, actually.

How did you arrive at demand forecasting?

I'm sorry?

Demand forecast.

So demand forecasting is a pretty interesting topic. When you say forecasting, is it the visits forecasting or the booking forecasting? So booking forecasting is pretty easy for us, because there are multiple things you look at. You look at seasonality. You look at the industry. Okay? You look at our past history, and you look at all our future roadmap. And you tie them.
I mean, it's ridiculous, the amount of time that we spend on forecasting, right? I mean, we spend like three to five months just doing forecasting. You look at all these things. We do, we definitely do apply them, actually. So unfortunately, we don't apply these things on a wider scale. It's pretty confidential stuff, so it's a closed group where they do all this kind of forecasting. But they take all these aspects into account, not just what we have, even what's outside. A simple example, right? Today, tomorrow and the day after is The Fifth Elephant. We take things like this into consideration to figure out what the hotel occupancy around Bangalore is, because people are traveling in from outside. So there are a lot of factors we consider. Exactly.

So QlikView with Hive is something we did in 2010, I think, or 2011, I would say. There are pros and cons with that. If you go back in history, from 2009 to, let's say, 2012, 2013, there were not too many visualization tools that really supported Hadoop. I mean, even today, for that matter, right? So QlikView is something that can connect to any database, right? So we actually used Thrift and a regular JDBC connection to connect to Hive. Yeah, directly through Hive. You should send me a note, I'll tell you how to do it.

Yeah, I have a question.

No, no. So I told you we have a GDS. Think about a GDS as a database, but not in our control. It's outside, right? A lot of people have access to this database, not just us: Expedia, Priceline, name 20,000 websites. When we search something, we want to get inventory, right? See, this is all inventory that we don't own. It's owned by the airlines, it's owned by the hotels. GDSs are the common place where everybody files their rates. So we basically go to the GDS and say, "Hey, give me the results for hotels for this city for these dates." When we do that, we are doing a look to them, right?
Let's say I do 1,000 looks today and I do zero bookings. The look-to-book ratio is pretty bad there. That is why we optimize it by caching a lot of things on our side. But when you cache anything, right, forget hotels, you have to optimize the TTL. If you don't optimize your TTL, then the whole point of the cache is lost. That is where we use survival analysis, to optimize that.

Yeah, I have two questions. One question is: when you try to personalize, what are the first two to three features which you try to personalize?

Mac was definitely the first one.

Okay. And the second question I had was, you said you are purchasing outside data sets, right? So can you tell us what those data sets are?

So there are a lot of data sets available that you can purchase. Some of the data sets that we've used: weather data, you can purchase that. There's something called Walk Score, you can purchase that. Walk Score is very simple, right? You look at a hotel, and it considers a lot of attributes and calculates the proximity from the hotel: how close is it to the highway? What are the restaurants around, or the movie theaters? It comes up with a score for that hotel. So we use that data. And then we've used a lot of third-party data which is specific to marketing channels. These are some of the external data sets that we have. Obviously social data, right? I mean, you have Facebook, Twitter, YouTube.

You've been speaking about the seasonality effect, right? So basically it is an external context that you're feeding into your recommendations.

Yeah, absolutely.
I mean, you heard the answer I gave this guy who's sitting here: competitive intelligence is very important for us, because if I don't understand the seasonality, I can't forecast and I can't optimize anything. So we actually look at a lot of seasonality. A lot of the industry publishes this kind of data, right? In the US especially, at the beginning of the year they publish all the conference status. So you know Las Vegas is a conference hub. You want to optimize your hotels in Las Vegas? Then you are going to consider all those things. What you do is you say, "Hey, in March there are these conferences which are very popular. Let's go negotiate better fares with these hoteliers there. Let's go grab some inventory with these hoteliers." So those are the optimizations.

At the runtime, I'm asking about, from the implementation perspective. How do you feed this data into your model?

How do we feed this data into the model?

Yeah, the context.

There are two ways: one we do offline, and one is sort of online, which is what feeds into your personalization, correct? But most of the things that you asked about are all offline, because it doesn't change every day, right?

So you mentioned something about predictive analysis and machine learning at the top of the pyramid. Personalization was one of the use cases. How do you perform with so much data at your hand? How do you perform machine learning at that scale?

So if you remember my first iteration there, we had something called a decentralized model, right? So our tech team is around, I'll probably say around...

Fine. Do we have to cut off?

Yeah, please come down. It's anyway like a personal discussion here. Thank you.
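
A note on the caching answer in the Q&A: the speaker says survival analysis is used to tune the cache TTL but gives no details. The sketch below is one way that idea could look, not the method described in the talk. It assumes each cached rate is watched until the supplier's rate changes (or until we stop watching, which is a censored observation), estimates the chance a rate is still unchanged with a Kaplan-Meier curve, and picks the longest TTL that keeps that chance above a chosen threshold. All names (`Observation`, `look_to_book`, `kaplan_meier`, `pick_ttl`) and the sample numbers are made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Observation:
    hours: float   # how long this cached rate was watched
    changed: bool  # True: rate changed at `hours`; False: still valid when we stopped watching (censored)


def look_to_book(looks: int, bookings: int) -> float:
    """Searches sent to the supplier per booking made; lower is better."""
    return math.inf if bookings == 0 else looks / bookings


def kaplan_meier(observations):
    """Return [(t, S(t))] where S(t) estimates P(rate unchanged beyond t hours)."""
    event_times = sorted({o.hours for o in observations if o.changed})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for o in observations if o.hours >= t)
        events = sum(1 for o in observations if o.changed and o.hours == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


def pick_ttl(observations, min_fresh: float) -> float:
    """Latest observed change time whose survival estimate is still >= min_fresh.

    Conservative: returns 0.0 if even the first change time falls below the threshold.
    """
    ttl = 0.0
    for t, s in kaplan_meier(observations):
        if s < min_fresh:
            break
        ttl = t
    return ttl


# Made-up observations: (hours watched, did the rate change?)
SAMPLE = [
    Observation(2, True), Observation(3, False), Observation(4, True),
    Observation(4, True), Observation(6, False), Observation(8, True),
    Observation(10, True), Observation(12, False), Observation(12, True),
    Observation(24, False),
]
```

With this made-up sample, `pick_ttl(SAMPLE, 0.6)` gives a TTL of 4 hours, and tightening the threshold to 0.8 shortens it to 2 hours. A longer TTL means fewer looks against the GDS and a better look-to-book ratio, at the cost of a higher chance of showing a stale rate.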