Welcome, everyone. [Housekeeping remarks, partly inaudible.] Tonight Srivas, from Uber, is going to talk with us. Let me hand it over to Srivas. Actually, first I just want to introduce him, because if I don't introduce him, he won't introduce himself; he's too modest. So let me introduce Srivas for a couple of minutes. We both worked at MapR; that was before my current job. He was CTO and co-founder at MapR, and they built one of the best distributed file systems around. And this isn't flattery: I don't work for him anymore and he doesn't pay me, but he has a background worth hearing. So, a little history of Srivas. He worked on the Andrew File System, one of the first distributed file systems. Then he went to a company called Spinnaker. Spinnaker Networks, a distributed file system company that was acquired by NetApp, one of the companies in the SAN space. After that he moved to Google and worked on some of the bigger systems at Google. [unclear] Then he moved to MapR; he founded MapR, one of the bigger companies in the Hadoop distribution space. So I think everyone in this room has touched something he built. And he's one of those people where, however much you know about something, Srivas knows a little more. If you believe that, you'll believe he actually knows something. So here's the plan: we'll talk through the pipeline on this slide, we'll talk about the pieces over there, we'll talk about the components, and I'll ask him questions as we go. You can ask questions too; if he can answer them, he will, and if not, I'll move us along. All right.

Thank you, Jon, for hosting me. So we talked at about seven one evening, and he asked me: could you talk about something? I didn't know what, so I dug up the presentations I had, pulled a few improvised ones together, and stitched a deck out of them. Most of it ended up being about the self-driving problem. But first, just to give you a feel for the things we all do when we drive. Because we all drive: you do this, you do that, without thinking about it. So, when do we change lanes? This turns out to be a very interesting question to put to a machine, because it isn't memorized; a lane change isn't something anyone ever taught you as a rule. [unclear]
So, when do you change lanes? And it sounds like a simple question. When do you change lanes? Well, there are two goals, and they are in conflict. If you never change lanes, you'll never make your exit and you won't get anywhere; and if you're willing to force your way into any gap, you'll cut people off and cause trouble. So it's a trade-off between making progress and staying safe, and the problem is finding the right gap. So what do we look at when we look for a gap? I won't talk too much about it, but I'll give you the problem. If you look at real driving data, here, you can see: at different times of day, 7am, 2pm and so on, the pattern of lane changes is different. [a stretch here is unclear in the recording] What you see is a gap here, a gap there, a car here, a car here, and you have to pick the right gap at the right moment, every time. Every single time we make a lane change, we want to be able to say that any good driver would have taken it too: that it merges the way people expect, reads what the cars around it are doing, and picks its moment. We try to work out the right criteria for taking the right gap, and to do better than a fixed rule, because the situation keeps changing as it plays out. And failures will happen, so we always try to predict when they will. [unclear] The same discipline goes into our infrastructure: we plan for the unusual conditions, and when failures do happen, we usually just go to higher and higher levels of redundancy, because maybe this part failed, maybe that one is overloaded, and no single component can be trusted on its own. Individually these failures are rare, but at this scale something is always failing, and if we handle the overload cases well, we can also ride out the ones we didn't predict. And then I can grow the hardware behind it, so we have more capacity where we need it. This is the groundwork we do. On the data plane, the data platform side, we'll walk through it page by page; there's a long section on each area.
It's the same story for the infrastructure behind it, and I'll come back to that part. Don't hold me to the exact numbers, but the car has every sensor you can think of, and not one of each, but ten of them, in different places. So it's very...

So, lots of redundancy.

Yes. And it's grown something like 5x per year over the last few years, so you can imagine how it adds up. Each sensor does its own thing; it doesn't know what the others are seeing, and they all disagree slightly. Sensor fusion is a whole problem of its own. We talk about maps, localization, perception and all the rest, and fusion runs through all of it. It's a fascinating area, and not many people know about it, so I thought I'd give you a quick tour and we can talk. Before I go on, I want to explain reinforcement learning, for anyone who hasn't seen it; nothing really substitutes for seeing it work. The key idea is that reinforcement learning trains a system not by programming in the right behavior, but by giving it feedback: you reward what's good, and it works out how to get more reward. You can get it to learn remarkable things this way. Here's a classic project. This is Carnegie Mellon, and a robot helicopter: a machine with so many degrees of freedom that it's genuinely hard to control. A few years ago they asked: instead of hand-programming a flip, how hard is it to get it to learn the flip? And when you attack this with traditional control approaches and a lot of precision, it isn't very successful. But when they started giving it feedback, positive reward, the same thing that works on people, then with every small success it starts to learn. So let me show you; I want to make sure this plays. So, let's start it. It should flip over. OK. No. How about that? How about this? OK. OK. Let's try again. OK. This is a flip attempt. A flip means going all the way over, but this one only manages about a 30-degree roll. And this maneuver is very hard even for a human pilot. You can see the attempt isn't too bad; it's learning to go over. Then they tried it on the full flip they were after, and it just crashed. What you see is that at that difficulty it couldn't learn at all; it gave up. So what did they do next? Can we make it learn a 5-degree roll first? Five degrees, where it isn't too hard. And it picks that up quickly. You can see it manages it; not the full flip, but it gets there. Now watch the next part of the video: the same thing, a 20-degree roll. And now, one small step at a time, it has learned.
And now you see it do the full flip. Right over the top. It works its way up to the full maneuver. And getting to the full flip means you chain the stages together over time: you take the task, run experiments at an easier setting, and use what it learned there as the starting point for the harder setting. That's how you get the full flip. That's the idea, and that's what we do with a car too.

So here is an example from a car. This is what a self-driving car actually sees. You can watch it for a second... one, two... this part is interesting. So how does driving work? What are the problems? First, you perceive the objects. Yes, everyone understands that. Then you have to predict what each of them is going to do one, two, three, four, ten seconds out. And then, given all of that, how should the car act? You have to plan. Those are the three problems: perception, prediction, planning.

So let's look at perception first: just recognizing objects. Yes, we can recognize all these things, and not just cars: people, trees and so on. You can recognize cars, pedestrians, cyclists and traffic lights; those are the things you can see marked up there. That's perception. And perception on its own doesn't settle very much, but it's problem number one, because everything downstream sits on it, and it's complicated by occlusion and clutter. [unclear] It's the same machinery you need to localize yourself against the map. And every road is different, and so on.

So once we have the objects, we can predict how they will move. And not just whether they move. What about a lane change by the car next to you? Not just motion, right? Or the moment when something stationary is about to become something moving? We have to watch for that on every street. Prediction is easier once a thing is already moving, but sometimes you need a lot of context around an object to know what's about to change. Spotting a lane change, or a turn that's coming, or a door about to open, is very hard. Even when something is not moving, we have to ask the same questions: is this thing going to move, or is it going to stay put? I can't tell what the objects down here will do; they all move.

Good. So how do we decide what to do about it? That's planning. First, we have to pick the overall route. Before any maneuver, we need the route. Then we break it into segments: which lane to be in, whether to merge, whether to queue, and so on. Then every subsegment of the navigation has to be worked out and executed, and that's the part you attack with machine learning. That subsection of navigation is called motion planning. And the low-level control has to be derived from it after that. So decisions have to be made, at every level.
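To make that pipeline concrete, here is a minimal sketch of the perceive, predict, plan loop in Python. Everything in it is invented for illustration: the object types, the constant-velocity predictor standing in for a learned model, the two-metre danger corridor. A real stack fuses many sensors and searches over trajectories instead.

```python
from dataclasses import dataclass

@dataclass
class Obstacle:
    kind: str        # "car", "pedestrian", "cyclist", "traffic_light"
    position: tuple  # (x, y) in metres, relative to our car
    velocity: tuple  # (vx, vy) in metres/second

def perceive(sensor_frame):
    """Stage 1, perception: raw sensors to labeled objects.
    Stubbed here; in reality this is fused lidar/radar/camera plus ML."""
    return [Obstacle(**o) for o in sensor_frame]

def predict(obstacle, horizon_s=10):
    """Stage 2, prediction: where will this object be 1..10 seconds out?
    A constant-velocity guess stands in for a learned predictor."""
    (x, y), (vx, vy) = obstacle.position, obstacle.velocity
    return [(x + vx * t, y + vy * t) for t in range(1, horizon_s + 1)]

def plan(predicted_paths):
    """Stage 3, motion planning: pick an action given everyone's futures."""
    danger = any(abs(x) < 2.0 and 0.0 < y < 15.0
                 for path in predicted_paths for (x, y) in path)
    return "brake" if danger else "continue"

frame = [{"kind": "pedestrian", "position": (1.0, 30.0), "velocity": (0.0, -2.5)}]
paths = [predict(o) for o in perceive(frame)]
print(plan(paths))  # "brake": the pedestrian enters our corridor within 10 s
```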
Then come the judgment calls. When someone wants to merge back in, do you let them in or press on? What about small gaps: is there time to get in? How fast do you take a curve so it still feels safe? How do you react to updated data? How many of these maneuvers succeed? [several questions unclear in the recording] Small questions individually, but the long tail of them is the whole game; it's the corner cases that get you.

Now the next problem is how to QA this stuff. That's another huge headache. So some things are yes/no kinds of QA questions: the car went off the road, you hit somebody, you went through a red light. That's pass/fail, very easy. But the other things are much more qualitative. For example, if you're doing hard stops or hard accelerations, either your car parts are wearing out too soon, you're going to skid off the road, or the passenger in the car is pretty much throwing up his food; he's going to get dizzy. So how about taking it easy? Or you go too fast and you're too close to the car in front of you; in the United States the driver in front is going to get really upset, pull over and shoot you. Or you're driving too slow. I mean, how many times have you been stuck behind a car that doesn't want to go fast, doing 20 in front of a long, noisy queue? All of these actually require machine learning to tell you whether this was actually a good experience or not. So even the QA itself is machine learning. How about getting too close to somebody? If you go too close to a biker or a pedestrian, they're going to get very upset. Unless you're parking close to a kerb or a parked car or a pillar: then you're quite OK to go close. In fact, you need to be as close as possible. So it's not trivial to learn what is OK and what is not OK.

So let's look at the data problem here. What you have in the car is the sensors. The sensor logs record what the car saw; they blindly record everything the sensors did and didn't notice. The driving logs are what the car actually did when it drove. So when you test the next version of the software, you can take the sensor logs, replay them through it, and that produces a new set of driving logs on simulated car hardware. And so now you have two sets of driving logs; this new version of the software drove slightly differently. So you bring them up, and now you have to compare to see how the new software performed on those soft points I talked about. It's not really about pass or fail. It's: how was the driving experience? And so, it's a massive map-reduce problem. Say we fix some core software, say the piece that drives through an intersection. How are you going to test that? It's a very complex piece, right? So we will test it on 50,000 different intersections, in different weather conditions: sun, rain, snow, sleet, fog and so on. In different kinds of geographic environments: urban, rural, forest, sand, snow and so on. How about different times of the day? How something looks at night is very different from how it looks during the day; or sun in your eyes versus sun behind you, and so on. And how do you even say when it's raining?
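The replay idea (feed recorded sensor logs through a new build and diff the driving logs it produces) can be sketched like this. The Simulator, its one-number policy, and the comfort metric are all stand-ins invented for the example; as he says, the real scoring is itself machine learning.

```python
class Simulator:
    """Stand-in simulator: 'drives' by emitting one decision per sensor frame."""
    def __init__(self, build):
        self.build = build
    def step(self, frame):
        # toy policy: the new build brakes earlier than the old one
        threshold = 10.0 if self.build == "v2" else 5.0
        return "brake" if frame["obstacle_dist"] < threshold else "cruise"

def replay(sensor_log, build):
    """Sensor logs in, driving logs out, for a given software build."""
    sim = Simulator(build)
    return [sim.step(f) for f in sensor_log]

def hard_brake_count(driving_log):
    # a qualitative comfort metric; real versions are learned models
    return sum(1 for action in driving_log if action == "brake")

sensor_log = [{"obstacle_dist": d} for d in (40.0, 12.0, 8.0, 3.0)]
old, new = replay(sensor_log, "v1"), replay(sensor_log, "v2")
print("old:", old)  # ['cruise', 'cruise', 'cruise', 'brake']
print("new:", new)  # ['cruise', 'cruise', 'brake', 'brake']
print("extra hard brakes in v2:", hard_brake_count(new) - hard_brake_count(old))
```

Scale that comparison out to 50,000 intersections, times weather, times time of day, and you get the map-reduce shape he describes.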
And to give you an idea of the scope of the problem, let's take a look at how much data is generated. A single car generates about, and this is a very, very optimistic number, today it's a much bigger number than this: 250 gigabytes an hour per car. Which means if it drives for eight hours, you have two terabytes per day per car. Just to give you a sense of scale: if you have 1,000 cars on the road, you're collecting around two petabytes a day. What do you do with that, right? So this is just the tip of big data. You were thinking of big data; now you're really thinking of big data.

So anyway, with that introduction, I thought I'd go back to this diagram, which I think Jon wants to talk about.

Thank you very much. So this is Data Science SG, but I think this is actually more geared for Big Data SG. You'd see yourself more as a data engineer, perhaps, than a data scientist, right?

Yeah. Although I did teach a course in machine learning recently.

You can try to use this one. Go ahead.

So I did teach a course in machine learning, just this morning in fact. Yeah, OK. OK. I'm American, I speak loud. Yeah, I think I've done infrastructure most of my life. I drifted into machine learning when I was at Google, on the search side, because that's how search figures out how to rank things, and a lot of that is learned automatically. And machine learning in the last 3-4 years has just taken off, especially in speech recognition and text generation; it's gone through the roof. It's a massive advancement.

So, how many people here feed themselves and put a roof over their heads by using machine learning, as data scientists? OK, if the guys on my team aren't raising their hands, they're fired. So how many people are data scientists? How many people are really shy data scientists? OK, how many people know what machine learning is? I think that's a good number. OK.

OK, so I'm not going to go too far back; let's start things with Uber. You've been at Uber how long?

About a year now.

So what has changed? What have you changed since you've been at Uber? What has changed in the business, and how have you adapted Uber to the challenges the business has?

So one of the biggest things at Uber is how quickly the data is available for analysis. The ETL problem is massive: we get huge amounts of data, almost 500 terabytes a day, coming in from all your cell phones, and to move that and make it searchable used to take 24 hours. Now we've cut it down to 1-2 hours for the really important data, and the financial data and the pricing information are down to literally a couple of seconds. So we can operate very differently. And there's an 80-20 rule to data, just like everything else: 80% of the data is junk; the 20% is what actually impacts the business. With that 20% we can move much faster than in the past.

And that actually was my first question about Uber; we were talking about this a couple of months ago. We've got a lot of data coming in, and we've got incredibly high bandwidth costs in some of the countries we work with, so it can be really prohibitive to move a lot of the data. So we're looking at staging some stuff, so it comes to a data center and then the important 20% moves on immediately. But I was asking you: you collect everything into a centralized place right away, correct?

Yes.

So that's a tremendous amount of data, and it takes you a little bit of time to actually process it.

So we use Kafka everywhere. The first problem for us is that we are running on somebody else's phones.
The drivers don't want to pay too much for their data plans, and we don't want to burn through their 3G allowance to run the app, so we are very conscious of how much data we send. But then we move it very fast: we run the front end in the cloud, so response is very quick across the world, and then we move it into the data center. And it's fully replicated for us across data centers, 100% of the data.

So is it your own infrastructure, or are you hosted in the cloud?

A combination. Here's the real simple math. If you do it yourself, it's going to be about a third of the cost of doing it in the cloud, provided you understand how to run it. If you're just starting out, start in the cloud and move to on-prem later; I think that's the better approach. But if you have a lot of data, the storage cost is going to be much cheaper on-prem than in the cloud. Then again, there's elasticity in the cloud that you cannot match. For example, we launch a new city. We don't know how successful that's going to be, so why would we build a data center for it? We just put it in the cloud.

Do you ever dynamically spike into the cloud when demand jumps up?

We do a combination of several things. We absolutely spike into the cloud as well. We also shed load: we shut down back-end services, so all the resources go to serving interactive traffic, and we worry about the data processing later. We shut things down and other work takes the CPU.

So you've got some sort of containerized system?

Yeah. We don't use Docker; we use containers directly. We just found it to be more flexible. And yes, it's containerized, using Mesos. We built our own framework on Mesos, and we're in the process of improving it. There are a lot of frameworks on Mesos, and we plan to open source ours, well, not that soon, maybe in six months or so.

Can you speed that up a little bit? We're doing some stuff with Mesos and we'd love to have a look at it.

We could. I just don't want to have to take it back.

I think it would probably help.

If you guys want to contribute, I'd love the help.

Conversely, in our regular engineering, like the web stuff, we're using Docker. Should we be afraid?

So the only problem with Docker is that I found a lot of complications but not much benefit. It's not as flexible, and it's complex; it's pretty rigid, with a lot of little things to tweak that are not very important. So we use containers directly, and once one team figured out how to get the right set of combinations, it was: here, now everyone can do it. So almost everything we run sits inside that, and it's working well.

So, what part of this diagram gives you nightmares at night? What keeps you awake? What worries you? This is a pretty complex system.

This is the simple version.

Or do you want to just talk a little bit about how the system works?

Sure, let me walk through the system; jump in wherever you like. It's not beautiful, but it obeys some laws that I want. We have a lot of alerting. We have the streaming systems for low latency; that was built using Samza, and now we're looking at Flink, and we've built a bunch of pieces on other open source systems. This path builds a lot of metrics that are very, very immediate. For example, surge, which is how we do pricing, is based on this path. Or things like fraud detection and all that. So we have a lot of consumers downstream that consume from this path, and ML models are built on this path.
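A downstream consumer on that immediate-metrics path might look roughly like the following, using the open-source kafka-python client. The topic name and message shape are invented for the example; this shows the shape of the pattern, not Uber's pipeline.

```python
import json
from collections import defaultdict
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "trip-events",                       # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

completed_by_city = defaultdict(int)
for msg in consumer:
    event = msg.value
    if event.get("type") == "trip_completed":
        completed_by_city[event["city"]] += 1
        # A sudden drop in a counter like this, versus expected demand,
        # is the immediate signal he describes for spotting a bad rollout.
```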
And this path over here is the actual trips and everything: accounts, driver accounts, trips, destinations, pickup points, geocoding, all the different tax rates in every city for pricing, and so on. We sharded it massively, and we built it ourselves. We trusted no database except MySQL completely for reliability; reliability was the number one concern, and we didn't trust any of the NoSQL databases on that aspect. So we built something on top of MySQL to shard it across servers, and we have thousands of shards of MySQL.

Not even MapR-DB?

Well, before my time. OK.

Cassandra was there for some things, and people said we should use Cassandra. But it's good for the areas where you can afford to lose data, things like metrics and so on. It's true. They say eventually consistent; for me, it's eventually inconsistent. The consistency algorithms leave a lot to be desired in this area. And Cassandra now has this Paxos feature, but it's four message exchanges per operation. So it's good for things like metrics collection, where it's OK if you lose a few, but trips and all the money data live over here. And there's a bunch of data you don't have to shard in MySQL, because you don't need to; if the data is regular-sized you can just use plain MySQL, so we do that too.

And what about the fees: to go into an airport, to leave the airport, to cross a bridge?

So we get those data sets from the different authorities, generally, and that interaction happens here, through a service API.

Before you go into that, I want to ask you a couple of questions here. You guys run a lot of microservices; I watched a video by somebody saying 1,200, and there are probably a lot more these days. So, how do the microservices relate to the databases? Do they share databases? Do they have individual databases? How do you guys structure that?

A very good question. When Uber started, it was one big service: one massive monolith with the front end in it. It was slow, crazy, didn't scale very well. So then the pendulum swung from a single block to the other end, where everything is a service; now you have 2,000, 3,000 microservices. I think the pendulum is swinging back again, to reduce the number of microservices. We don't need that many. But everybody talks RPC, so that's how it went. And we didn't stop to clean up, because we wanted to get the business going: do just enough to keep your nose above water and grow fast. You have three months to get something done, then three more months, and so on. Language-wise we use Node; a lot of the services are Python and Node. Actually, there's no C++ in Uber; it's all Java, and now we're moving to Go, because we want very high productivity, and development productivity is more important for us than efficiency of the hardware usage right now. When that changes, we'll worry about it then. Right now it's: keep your nose above water and keep going.

And we use our own scheduler. But think about it like Amazon Lambda: you make a call and it gets scheduled. Yes, services use the same sharded datastore. So one service will do an update on the database, which generates a trigger, and that trigger is propagated either through Kafka or through an internal queue inside the datastore, and all the parties that want to do something based on it subscribe; each piece stays very simple. An example: let's say a trip completed. Now we have to go update driver ratings, credits, run all the calculations: how do I apply incentives, any coupons, referral credits. All of that hangs off that trigger.
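That trigger-and-subscribers cascade maps naturally onto Celery, the Python task framework he mentions in a moment. This is a sketch of the pattern, not Uber's code; the task names, broker URL, and payloads are all hypothetical.

```python
from celery import Celery

app = Celery("events", broker="redis://localhost:6379/0")

@app.task
def on_trip_completed(trip):
    """Fired when the datastore's internal queue emits a trip-completed event."""
    update_driver_rating.delay(trip)
    apply_incentives.delay(trip)       # coupons, referrals, credits
    build_secondary_index.delay(trip)  # indexes are just another subscriber

@app.task
def update_driver_rating(trip): ...

@app.task
def apply_incentives(trip): ...

@app.task
def build_secondary_index(trip): ...
```

The point of the shape is that the write path stays tiny: one update plus one enqueued event, with everything else hanging off subscribers that can cascade further.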
So we try to keep what we call the inline path very minimal, so the user experience is very nice, and everything else we try to do after the online interaction, asynchronously.

I'll let you move down the diagram shortly, but hold on. So there are things that happen in this application API: services write to the database, triggers fire, messages go to other systems that have to act on them, correct? So how do you do that? Is it really a trigger, or are you reading the binlog, or does the application do a write and then publish?

So the trigger isn't here, it's over here. This is our homegrown code, so it's not SQL; it sits on top of MySQL but it's more like a key-value store. It's sharded, and logically inside it there's a queue. So when you write something, an event is generated, and that event is put on a queue that can have a lot of subscribers. You can have a Kafka queue, or the other kind uses Python Celery queues. You know what that is? Celery, Python, no? It's a framework within Python, a dispatch framework; it's almost like a queue plus a task manager plus a bunch of other gems. It's a reliable queue, as reliable as Python will get, and it triggers other microservices off an event, and that kind of cascades. Sometimes we use it for things like secondary indexes: if you want to build a secondary index on the store, that's also done as a cascading event inside this, so you have a service that builds the secondary index.

So then what we do is pull the data out. As soon as the trip completes, we get all the information. We do not do bulk loads into HDFS; it's all messages, flowing through like this. And you have the dashboards as subscribers, so you know exactly how many trips completed, in which geo and which part of town. If there's a drop in revenue, or demand is below what it should be, we know there's a problem somewhere, and that's the first way we find out, immediately: maybe there was a software rollout, maybe it's something else, based on correlating this data with that data.

And then we move into Hadoop, and here we create basically a warehouse. We have literally 10,000 tables that mirror all those service tables one for one. So the development experience is very nice: if you're a developer, you write your service API, and the data is already arriving; it's already there, click, click, click, done. We use Avro, but the data is in JSON, so we do some schema validation and stuff like that. We build Hive tables here; we use Presto for interactive queries and Hive for the big ETLs. Then there's a team that takes that raw data and converts it into modeled data. The raw data is available immediately. Again the 80-20 rule, although here it's actually a 90-10 rule: 10% of the raw data is actually valuable, and 90% of the business runs on that ETL'd slice, because otherwise it's just too much data.

So when you say 10% is ETL'd, what's an example?

For example, we don't ETL low-level events; don't care, just leave them raw. They're there, you can query them if you want. But pricing, driver satisfaction, customer satisfaction, revenue information, fraud information, how many logins, did you change your credit card information, did you change your phone number: that we model.

Just a quick question from a few of us: how do you decide which 10% is important?

That's a very good question, by the way. So the question is: how do you decide which 10% is important?
Honestly, I don't know, because I cannot give you the ROI, the return on investment, of what I'm doing here. It's the business line that has to tell me that: if they spend this much money, they can justify why they're spending it. As the data platform, I have no clue whether something is important or not. So if you want your data to go quickly, you pay for it. There's a chargeback kind of mechanism: if you're the guy who's responsible for signing up new riders and you want the data processed in a particular way, you pay, because I'm not going to take the responsibility of figuring out whether this is important or not. If fraud is only 0.1%, and let's say my revenue is $1.5 billion and my fraud is $10 million, it might be cheaper to let the fraud happen than to try to fix it. So it's a tradeoff, and it's not my decision; as the data person, the business lines have to decide that they will do something with it. And sometimes it's a productivity issue. For example, the city operations teams: there are about 10,000 people if you count all the 600-700 cities. For them it might make sense to model more data, or build more tooling, simply because it's 10,000 people's productivity every day and it saves their time. So there are a lot of angles to it.

So let me ask you a question. There might be cases where doing something for one group isn't efficient, but ten groups might need a similar summarization or aggregation of the data, and if you do it once for all ten of them, then even though one group couldn't pay for it, it's actually a good thing to do, because many groups use similar data.

Yes, and we do look at that very often. You're really asking about the organizational structure, the people behind this, which is where you're heading. So let me tell you how this works, because that's the way to figure it out. We have people here we call the pro team. These are the pros: the guys who have been at Uber for at least 2-2.5 years, who have worked in the cities as city operations, the best of the best, picked to bridge operations and the data platform. So we have a pretty large pro team, about 60 to 70 people, who know what the operations guys want, and they say: this is how this should work. They are not data scientists, but they know what it takes to run the business from an operational perspective, and when they send an email out, the other 10,000 listen. I mean, those are handpicked guys, and they are very, very good at what they do. They drive the decisions together with the data product managers, who decide how this should work. And they also stop all those 10,000 people from talking directly to the engineers: when somebody asks for something, they can say whether it's a useful thing or not, or whether it's a very common pattern that we need to optimize for. So they figure those things out.

But you know what? We're probably still off by 50% after all that, and every company has this problem. Are we using our data correctly? Is this the right data? Are these even the right things we're using to look at it? Is the data quality correct? Actually, this is a lot more complex than this simple line: where are we losing data, how many records are we losing, do we need to go back and backfill because Hadoop lost something. And when we backfill, we get
duplicates, and we have to figure those things out, right? So I think everybody has these growing pains. That's why we do it in one team and then enable it for everybody in Uber, so they can all use it.

That's interesting, what you said: even after all that, you're maybe 50% right about which data matters. I think that's one of the biggest challenges. And of the rest, some percentage is probably junk that should be dropped, but historically we don't know who's consuming it, so we can't touch it.

So, a lot of people here are getting into data science, and they're learning: they do a Kaggle challenge, or a Coursera course, and it's really nice when you have a data set given to you, with features. It's beautiful: you can run algorithms on it and show how good you are. But one of the real questions is: what data do we want? What data do we need? What data is important? Where is it being generated, or can we do something to generate data that would be useful for us? And that's a real challenge. Sometimes you have to take a risk and say, OK, let's try capturing this data, and somebody might say: how much money are you going to spend to store that data? We're here to listen to you, but I'm just trying to get you to talk.

Actually, that's a very good point you bring up, because we have a very powerful experimentation framework set up here for exactly that. The decision makers, and this I didn't show, can say: here's a cohort. A cohort means a group of whatever you're talking about, whether it's drivers or passengers or type of car or geo or whatever. Then you run an experiment on pricing or product or something on that cohort, and you can say what percentage of the rides should see it: 5%, 3%, 10%. You can run A/B tests, and not just one at a time; there are 500 experiments running simultaneously, properly partitioned, totally automated. It's all there to learn from. Now, if I can poke fun at ourselves: the worst case is that people keep the experiment running forever instead of saying, OK, now make the code change and be done with it, because in the experimentation framework the parameter is automatically read into the code. So it's very sophisticated on that part. I'm pretty impressed; I can't take credit, it was already there.

Sorry, just to be clear on that point: you're saying that the test group are actual end users, and it will affect things like the pricing they see? Or is this purely simulation?

You A/B test with real users. So I can say: 5% of all users in Singapore are going to see this price, or they're going to see this new product, or they're going to see this something else, and then we'll see how it works out. UberEATS was tested that way, and within UberEATS there are so many things you can experiment with. For instance, during lunchtime, is delivery time important, or is variety important? So what we did first was, in some areas, we reduced the menu items and had the restaurants pre-prepare: only 10 items today, but delivery in under 2-3 minutes. So we had cars preloaded with the food; they picked up the food at 10:30 or 11 am, drove around waiting for orders, and would deliver within 2 minutes, faster than you can get down your elevator. And more food was sold. Then we did the other one: OK, you want a big variety. And then delivery, versus pick-up outside, versus pick-up inside. It was all done with these experiments, and those are interesting decisions you can now make.
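Cohort assignment in frameworks like this is usually a deterministic hash bucket, so the same user always lands in the same variant with no lookup table. A generic sketch of that one mechanism, not Uber's implementation:

```python
import hashlib

def bucket(user_id: str, experiment: str, variants: dict) -> str:
    """variants maps name to percentage, e.g. {"control": 95, "new_price": 5}.
    The same user and experiment always hash to the same variant."""
    assert sum(variants.values()) == 100
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    point = int(digest, 16) % 100  # a stable number in 0..99
    cumulative = 0
    for name, pct in variants.items():
        cumulative += pct
        if point < cumulative:
            return name

print(bucket("rider-42", "sg-lunch-menu", {"control": 95, "small_menu": 5}))
```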
But you have to think of the problem with experiments, too: people don't turn these experiments off, the data keeps coming in, and we're stuck with it. And the problem is that I have to go and justify it to my boss: oh, we need another 1,000 machines. I gave you a thousand last month. Yeah, you gave it, but it's gone. Oh, who took it? That's why we have this chargeback system: you can point and say, those guys over there are using it; if you need money, go and ask them for money. That's been our only comeback. And then there's the garbage issue: a team gets set up temporarily to do something, they set up some data feed that keeps coming in, and then the team is disbanded. Now what do we do with it, right? So we have now started doing something, I mean, it's a bit drastic, but we will delete your data unless you renew it in the business catalog, which of course sends all the administrators into nightmares, because they never delete anything.

So anyway, coming back here, this is all the buzzwords. We built our own query system, with cursors and everything, and we run maybe a million queries a day on a massive query load. We have the combination of Hive, Presto and the MPP database, and this is how everything behind the spreadsheets and dashboards runs, and every city operations team can use it. So for example, in the United States we use Jiffy Lube for our car inspections. Jiffy Lube is a chain of 10-minute oil-change shops; if you're a driver, you can make an appointment or just drive into a Jiffy Lube and get it done. We need to know if the queues are getting very long at one Jiffy Lube: the wait time is 2 hours at this one, and 10 minutes at another.

Or another example, this time the Super Bowl. What happened was: the Super Bowl is the biggest event in the United States in terms of a show, or you can think of a big concert or a basketball game. Around the stadium you have a lot of cars, because the drivers already know there's going to be money to be made picking up people when it ends. But the police close all the roads: half the roads are cut off because of traffic jams, and only a few roads are operational. How do you tell the drivers that their pickup is over there, but you can't get there this way, because Google Maps is sending you this way and you have to go this other way? So the city operations guys in that city figure out what the right routes are as it's happening, see where the traffic jams are, and the drivers can be told to go this way; they manually override Google Maps. There's no other choice. Things like that happen through here. So it's very operational, and also very machine learning; this is where models get generated.

Piper versus Oozie: Piper is a workflow engine we built. Oozie, well, how many guys know Oozie? How many have heard of Oozie? I took a look at it and I didn't want to use it. OK, so it's XML, and XML is not meant for human consumption, or machine consumption; I don't know who it's for. I can't deal with XML. So we wrote a much nicer one, with a nice GUI, for people to use for their data pipelines. So this is really a workflow engine where you can set up a data pipeline, do various joins and processing, and then feed it wherever you want; we don't need to be involved in how you do it. Similarly, you can set up machine-learning Spark jobs: run this query on the raw data with Hive or something, produce a subset, run a machine-learning model, generate a model, put up a dashboard, and do it every six hours, every eight hours, updating the dashboard.
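Piper itself isn't public, but the pipeline just described (extract a subset, retrain a model, refresh a dashboard, every eight hours) reads like this in Apache Airflow, a comparable open-source workflow engine with a GUI. The three task bodies are placeholders.

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_subset(): ...     # e.g. a Hive query over the raw tables
def train_model(): ...        # e.g. a Spark ML job on that subset
def refresh_dashboard(): ...  # publish the new model's numbers

with DAG(
    dag_id="pricing_model_refresh",
    start_date=datetime(2016, 1, 1),
    schedule_interval=timedelta(hours=8),  # "do it every eight hours"
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_subset)
    train = PythonOperator(task_id="train", python_callable=train_model)
    publish = PythonOperator(task_id="publish", python_callable=refresh_dashboard)
    extract >> train >> publish  # run the three steps in order
```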
Three thousand such dashboards run automatically. So the goal here is to automate like crazy, so that people can do whatever they want without us in the loop. (Something's wrong with this mic.) All right, anyway.

So you allow people to do queries, you allow people to build a pipeline, you allow people to run machine learning and Spark jobs. What limitations do you have on that?

So we do not give people direct access to either of these systems. We cocoon people completely, so they cannot bring down the business. We have everyone from full-timers to temporary hires, and someone could kill the rest of the system by doing something wrong.

And I might just say, that's your average employee in many cases.

Oh, he might be a smart guy, not average at all, but he doesn't know what he's doing in here, so we can't take the risk. See, at the end of the day, with Uber there are a lot of people on the other side of this: the drivers. For many of them it's a full-time job, and there's a million of them, and we just have to make this work correctly, all the time. And that's the very hard problem here, the one that keeps me up at night. All of us, and we know this for sure, have nice big paychecks. But if your employer makes a mistake with your pay, even a $5 mistake, three or four times in a row, even if it's made up to you in the next paycheck, you start to mistrust them, right? Even one mistake, right? It's only $5, and if you make $10,000 it's nothing, but think about it from the driver's side: you make one mistake like that and he thinks you're doing it deliberately, he thinks Uber is not to be trusted, and that memory remains for a long, long, long time. It doesn't go away. So we try to bend over backwards to make this exact. I mean, it's very important to get that payment exact. The money has to be exact: it can't be over, it can't be under. If we overpay somebody, we can never get it back; he's going to say, no, I did the trip, your system is wrong, and he's going to argue. So we just have to eat that. That's the way things work here: you get money into people's hands and it never comes back.

So how do you isolate that? That's where the valuable data is, and there are people who need access to that data. So what buffer do you build between the people who have to do queries and the data, so they don't run rampant over it?

So we have admission control everywhere. You have admission control here, admission control on the front end, admission control here, admission control there, and quotas. So for example, we don't want an explosion of small files in HDFS. You know HDFS: you can do whatever you want with your files, and suddenly we hit 90% capacity and the system is going to go wrong. Or our MPP database suddenly got to 85% full and we were in trouble, and then what we need to do is start deleting data or figure out who caused it. So we have governors everywhere; no single team can do too much. And we get alerted for misbehavior. It happens all the time, and sometimes it's our own stuff misbehaving and we have to watch for that too. We just learn by getting burned.
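The "governors everywhere" idea is plain admission control. A toy sketch of one such governor, with made-up team names and thresholds:

```python
# Per-team admission control in front of a shared cluster (illustrative only).
QUOTAS = {"city-ops": 200, "fraud": 100, "temp-hires": 5}  # concurrent queries
running = {team: 0 for team in QUOTAS}

def admit(team: str) -> bool:
    """Refuse work beyond a team's quota so no single team can take the
    whole cluster down; callers are expected to back off and retry."""
    if running.get(team, 0) >= QUOTAS.get(team, 0):
        return False
    running[team] = running.get(team, 0) + 1
    return True

def release(team: str) -> None:
    running[team] -= 1

print(admit("temp-hires"))  # True, but only five at a time
```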
So you see something running too long and you kill it?

Well, we get an alert. We have alerts for everything we watch. But that doesn't mean the watch was set ahead of time; most of them exist because we got burned once, because we forgot to put a watch on something. It's not like we were so smart. We put in five alerts and another five go off. And then, once you fix the issue, those alerts become useless, so we have an alert system that's overloaded with alerts watching for things that are probably never going to happen again, and then we get new issues.

It sounds like the TSA, the Transportation Security Administration in America. It's like: yes, take your belt off, because somebody once tried to hide something in a belt; take your shoes off, because somebody once tried to hide something in a shoe. It's reactive: somebody does it once and suddenly it hits the roof.

Yeah. Luckily we're not that reactive, like the TSA, but we do act if we see something happen a couple of times. And the challenge we have is that this is just the standard platform; on top of this we have the self-driving cars, with a whole set of things that I cannot put up here because it's a little bit too complicated, but that system is really massive.

So, just fun facts: Hadoop. How many nodes are you running? What's the typical machine like?

A typical machine is 24 cores, 128 gigs of RAM, 24 drives at about 8 terabytes. We have a few hundred nodes in the biggest cluster, and the whole setup is completely duplicated in our other data center; this half is used to serve directly, and data moves across between the data centers. It's all active-active: both sides take traffic, and if one goes down we have no failover time. For the database, we built our own replication.

And you're doing some work on consensus, I hear.

Oh yeah. We cannot afford to lose a single transaction, because of the payment problem, in real time, and we have to make sure it's in both places. So there are something like 25 different variations of Paxos running across different kinds of services. We have every kind of newfangled technology, probably, and that's kind of the annoying thing, because it becomes very hard to monitor and very hard to administer. So we have a very big explosion of technology, and we've started now an effort to consolidate. But our goal always is to make sure the business runs: they do whatever they need to get going, and we address it once they've got the business out in the market and it's trying to expand.

And is it also because of the cost? The question is: isn't it partly the cost that caused this explosion of technologies, or something else?

So the explosion of technology happens because some business team comes and they say: we want to build this thing, we want this set of features, can we get there as fast as we can? And it's not on our roadmap; we have many other things. And by the way, we don't know if we should take it on, because we're more of a platform team and it's only a one-off. We don't want to make it a platform; we maintain it, and it becomes part of the platform only if more teams use it. So by definition these things are going to start as one-offs. Kafka and this stack were not one-offs; these were all designed. But we went from Postgres to MySQL: we first ran Postgres at scale, then did the move to MySQL without a problem, and I think that was just about a year and a half.

So, one of the things we talked about before, I was curious about: how do you expose the existing data to one-offs, and to innovation that's not currently part of your operational pipeline?

So the question is how one-offs make use of this, or the other way around. Their jobs somehow get their data into Kafka.
So we actually provide a schema service. We will not take any data here that doesn't have a schema. We tried the no-schema approach, with JSON and all that, and it just blows up in your face, because people start doing crazy things, like hiding structure inside a varchar: a single string that is a full JSON document with 25 fields, so that one field is 3K or 4K of who-knows-what. And then it comes all the way down here, and every query has to dig through it, because it's all in a single column; if you do the CPU profiling, it's all string operations and no actual database work. Same thing all over. So we put that in place. We didn't make them change their code; they still send JSON, but we validate it before it gets into Kafka, and the schema service will say what the schema needs to be.

I'm thinking more about how to pull stuff out of HDFS for my own one-off analysis. Somebody wants to do exploratory work; just a curious business analyst wants to suck some of your HDFS data out.

The main paths are through here or through here. A business analyst who wants to run a simple ad hoc analysis can work from his own machine and do whatever he wants: Python, R, Spark. These are the things we support and run. We've actually started supporting GPUs as well, because there's enough machine-learning work to justify it. So they can run combinations: we can take data over here, run it through this part, set up a full pipeline, run some Spark in it, and put the results back. We govern on exactly two things: security, and how much load you generate. Apart from those two things, you have full access. And the reason for bringing it into one place is entirely what we just said: we don't really want 500-plus silos, we really want one. We really don't want the data-mart sprawl, because that model is getting extremely old. So this here is actually 12 clusters of databases, and now, oh, your battery ran out... what we have over here is maybe 24-25 different data marts, individual clusters. We use Vertica, so performance is extremely good with Vertica, but that means a high degree of modeling, so this process is heavily automated by this thing, again depending on how the ETL happens. The data flow from here to here can happen under an hour for some data, because the models are pretty clear; we thought hard about how we lay out the tables, and we slice and dice different combinations for different kinds of customers. So, roughly 25 data marts for different kinds of customers, over 500 servers.

What's MPP?

Massively parallel processing: MPP databases. MPP is just that, massively parallel processing: you shard the data, and it manages each query across the shards.
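That one-line description of MPP (shard the data, run the query on every shard, merge the partial results) is easy to show in miniature. A toy scatter-gather average over three fake shards:

```python
shards = [
    [{"city": "SG", "fare": 10.0}, {"city": "KL", "fare": 7.0}],  # shard 1
    [{"city": "SG", "fare": 12.0}],                               # shard 2
    [{"city": "KL", "fare": 5.0}, {"city": "SG", "fare": 8.0}],   # shard 3
]

def scatter(shard, city):
    """Runs on each shard in parallel: local filter plus partial aggregate."""
    fares = [row["fare"] for row in shard if row["city"] == city]
    return (sum(fares), len(fares))

def gather(partials):
    """The coordinator merges partial aggregates into the final answer."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count if count else None

print(gather([scatter(s, "SG") for s in shards]))  # average SG fare: 10.0
```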
I'm going to let you guys ask some questions in a minute, but I want you to talk about one thing, because I thought this was really interesting. This is something we were running into ourselves, and we spoke about this the last time you were in town, and I thought your answer was really interesting. So now, you look at all this, and it looks really nice, a beautiful system. What happens when something breaks? OK, so you've got these data sources over here, you've got these business users over here, and all these critical business processes in between. Something breaks. It might be a developer omitting a critical field, or it might be, you know, some system goes down and the data stops being transmitted. But let's say they change the schema. What happens to us is, one of the developers just decides to change the date-and-time format without telling anybody; it's now 2016-dash-whatever, and it's like, oh, great. So if something like that happened to you, how do you deal with it? Because here you've got pipelines, and that one field, say trip end time, can be replicated to hundreds of different tables and thousands of different use cases and reports. What do you do? Who worries about that kind of stuff?

That entire team does, and they love it, because they're that kind of guys. So: we have very simple checks before data is even about to get in, right here; I talked about the schema service. Dumb checks: hey, this is an age, it has to be between 0 and 99; you can't be 500 for an age. We do that for new services. Of course, it doesn't make sense to leave this turned on for everything; once we know a feed is pretty good, we want to turn it off. And like I said, the schema service can dynamically tell your client to check against the schema and validate before the data is allowed in; otherwise you find a pipeline has quietly started dropping all its records. We also have counts that we track across the company, record counts and the age of the records, so we can tell which pipeline broke, and it's graphically shown. There's something called Databook that shows every table and every feed, and we've started tracking the lineage automatically, so it tells you: this column came from these 5 other tables, which were joined from these other 5 tables. And there's space, of course, for people to keep writing comments on top. It's actually a webpage that shows you all of that, and you can move a cursor over it, so if you're querying data that isn't fresh, you can actually see why it isn't fresh: it's 4 hours late, the data hasn't arrived yet, so maybe you should not be querying yet, or maybe you query the raw feed directly, or whatever. So it's not fully automatic; somebody still has to figure it out, but we have enough data-quality measures to tell you where to look. I mean, maybe we built it for our own debugging issues, because we were getting exactly the kinds of things you're saying; somebody would mess it up, and we're at the bottom of the totem pole. For example, our CFO was using a dashboard to prepare some financial results, and some test data was showing up in it. Think about that. He said, this doesn't look right, what's wrong, you guys messed up the data. So we went digging, and we found it was some test data showing up in his reports, and to us it looked like good data. You have issues like that, where all the alerts we'd put in would have found nothing; so now we have another batch of alerts, and sometimes they can't catch everything.

No, I'm just saying we're doing similar sorts of things. So the next version of our system is going to have a schema registry, so when people publish data that the rest of the business has to use, that's a contract with the rest of the business that they'll provide it.
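Those "dumb checks" (cheap range and format rules that a schema service can hand to clients) might look like this. The fields and rules are invented, and the date rule is a crude stand-in for a real format validator:

```python
RULES = {
    "age":  lambda v: isinstance(v, int) and 0 <= v <= 99,
    "fare": lambda v: isinstance(v, (int, float)) and v >= 0,
    # crude stand-in for a real ISO-8601 check, so nobody quietly
    # switches the format to "2016-dash-whatever"
    "completed_at": lambda v: isinstance(v, str) and len(v) == 20 and v[4] == "-",
}

def check(record: dict) -> list:
    """Return the fields that violate their rule; an empty list means accept."""
    return [f for f, ok in RULES.items() if f in record and not ok(record[f])]

print(check({"age": 500, "fare": 12.5}))                # ['age']
print(check({"completed_at": "2016-05-01T12:00:00Z"}))  # []
```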
And we have actually put very strong rules on how you can evolve a schema: you can only add fields, never remove fields. If something gets renamed or duplicated, we go and backfill the new fields, and all of that happens automatically, because the ETL is also generated. And the reason for that is: if the CFO runs a query up there and it doesn't work, who is he going to go after? He's not going to find the developer back here on this end; it's "this guy did it", and nobody takes responsibility.

But one of the interesting things you mentioned about this before: sometimes, if something does break, a lot of the time you'll just continue to capture the data and everything works fine, but the definition of what that field means has changed.

That's another problem, yes. It's the same thing with, say, how many hours somebody worked: the definition can differ because people use different tables, and when you have a billion records, the discrepancy blows up completely. So two teams think they're looking at the same metric, but because they used a different basis for the derivation, the data is inconsistent that way. So that's another piece of data quality. We're actually building something called metric definitions, which doesn't just tell you how a metric is computed; it's code that you are required to use if you want to generate that metric. So all users of that metric are getting the exact same definition, and there are 500 such metrics. There's an entire metrics team just for that, right, and they define exactly how each metric is computed, and that's how the rest of the company has to use it. Those things all add towards data quality; data quality has to improve, or other people won't believe your data.

So do you guys publish metrics on the quality of the data itself? How do you do that, how do you communicate it?

We don't claim our data to be 100% accurate. We will tell you, for every table: what's the latency, how up to date the data is, what errors we found, what the record counts are. And it's all there; we're not making anything up. We publish it every day, both on that Databook dashboard and to an email list, so we have a full record of what we published, going back for as long as you want, and any outages we communicate. I mean, really, you can't just roll it up and pretend to fix it; everybody wants a perfect record, but it's going to happen, so we just have to say, OK, this is how we do it, and make everybody aware. That's the biggest challenge of data: how do I know what I have? And on freshness: whatever moment you cut off, you're going to have 5 transactions on this side and 5 over there. That's the nature of a pipeline; it's not like I can turn off the tap here and wait for everything to drain through.

Related: how do you ensure that the data is coming in at the same time from different tables?

Actually, it never does. No data ever comes in at the same time, and it doesn't happen in real life either. I get an electricity bill; even the credit card bill is wrong. They send you this addendum afterwards that says, OK, I fixed this. They can't fix it in the statement itself, because by the time they print the credit card bill it has to be closed two days earlier; it takes time to print it and send it to 100 million people. You're getting paper statements, and once printed, it's a matter of record, so they cannot change the statement; the correction has to go in the next one. Everybody deals with this. I think it's a fact of life.
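The metric-definitions idea from a moment ago (one canonical, registered implementation per metric, which every consumer must call instead of re-deriving it from raw tables) can be sketched like this; the metric and its formula are hypothetical:

```python
METRICS = {}

def metric(name):
    """Register a function as THE definition of a named metric."""
    def register(fn):
        METRICS[name] = fn
        return fn
    return register

@metric("driver_hours_worked")
def driver_hours_worked(trips):
    """Canonical definition: time with a rider in the car, in hours.
    Consumers call METRICS["driver_hours_worked"], never their own version."""
    return sum(t["dropoff_ts"] - t["pickup_ts"] for t in trips) / 3600.0

trips = [{"pickup_ts": 0, "dropoff_ts": 1800},
         {"pickup_ts": 3600, "dropoff_ts": 7200}]
print(METRICS["driver_hours_worked"](trips))  # 1.5
```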
Elasticsearch is a big part that I didn't show. What I showed is really the back end of the back end; there's a lot of stuff on top of it. The usability is very nice, but that takes a lot of effort from the visualization team, which uses Elasticsearch to index all of the metadata, which is what you're asking about, so you can search for it. I can't show those tools to you because they're internal, but a lot of effort goes into them.

I think developer productivity is a key thing for us. It costs us too much to have people struggle to get their stuff going, just too much, because if they struggle to get their stuff going, the rest of the business also suffers. So we have a pretty big developer productivity team that does nothing but worry about other developers. They worry about things like language support: we already have four languages, Go, Python, and Java among them, and we have to support every one of them in anything we do. Database support for all four languages, Hive support for all four, and now Scala as well. So we have to support all those languages in everything.

On two of the points you've mentioned, database schemas and developer productivity: newer databases like MongoDB claim that it's easy to program with them and that developer productivity is high, but from your comments it sounds like that's the one thing you don't use. That's surprising.

Srivas has got opinions on this. I thought so; that's why I asked.

So, we use every NoSQL store here: we use HBase, we use Cassandra, we use Riak, we use Redis. We do not use MongoDB. MongoDB has all kinds of problems.

That's why I wanted to know, because Baidu has recently moved over to MongoDB from MySQL.

We are actually considering MyRocks. Do you know what that is?
How many of you have heard of RocksDB? No, nobody has heard of RocksDB? Good. It's okay, we need to talk more.

RocksDB is just a library, a library that Facebook built. It's single-node, there's no RPC in it; it's just a library, though you can build a server on top of it. It's a LevelDB-style database with some very good optimizations, and it's lightning fast. It gives you a table, basically a key-value store. Facebook uses it for a lot of things. Facebook took InnoDB out from underneath MySQL and put RocksDB there, and there are a lot of blog posts on it; the combination is called MyRocks. They've been testing it for about a year, a year and a half, and now they've deployed it everywhere. Facebook is probably the biggest MySQL deployment on the planet, and we're working with them. They also use Vertica, so we're working with them on Vertica too; I think we're more advanced than them on Vertica, and they're more advanced than us on this thing. So we're going to start testing MyRocks now; I think it's about twice as fast on their transaction workloads. I don't know what Alibaba is running, or whether they replaced MySQL with Mongo. But the important data is not in Cassandra; it's only in MySQL or Postgres. Do not trust anything else. (A minimal RocksDB sketch appears below.)

You mentioned Facebook just now with regard to a certain database, but they have a memcached layer on top that serves quick requests as fast as possible.

So the question was: Facebook uses memcached with MySQL, and do we do something like that? Yes, we do, we have a cache layer. In fact, our cache is a little more aggressive: we know what kind of data we're caching, so it's data-aware caching. There's a cache layer in front, and it's a heavily sharded cache, so if the database goes down we can operate entirely out of the cache. We have had instances of exactly that, with things like user profiles and other data that doesn't change very often. We've also had situations where administrators had an accident and messed up the database, and whether it's MySQL or Cassandra, nothing is going to save you from that. It does happen, and we recover from it: we back everything up every half an hour, we test restores every day, and it's completely automated, both how we back up and how we restore. All the backups are tested continuously, so we know that if there's a problem, we can recover within half an hour. So operationally we've done very good things; it's a very strong operation.
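A sketch of that "backups are only real if restores are tested" discipline. SQLite stands in for the real stores here, since those are MySQL or Cassandra; the paths, the table name, and the sanity check are illustrative only.

```python
import pathlib
import shutil
import sqlite3
import tempfile

def backup(db_path: str, backup_dir: str) -> str:
    """Runs every half hour; copies the database aside."""
    dest = pathlib.Path(backup_dir) / "latest.db"
    shutil.copy(db_path, dest)          # stand-in for the real backup job
    return str(dest)

def test_restore(backup_path: str) -> bool:
    """Runs every day: restore into a throwaway area and sanity-check it."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = shutil.copy(backup_path, scratch)
        conn = sqlite3.connect(restored)
        try:
            (count,) = conn.execute("SELECT COUNT(*) FROM trips").fetchone()
        finally:
            conn.close()
        return count > 0                # recoverable and non-empty
```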
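The data-aware, heavily sharded cache described in that answer can be pictured roughly like this; a minimal sketch assuming simple hash sharding and read-through loading, with all names invented.

```python
import hashlib

class ShardedCache:
    """Read-through cache split across shards by key hash."""

    def __init__(self, num_shards: int):
        self.shards = [{} for _ in range(num_shards)]

    def _shard(self, key: str) -> dict:
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.shards[digest % len(self.shards)]

    def get(self, key: str, load_from_db):
        shard = self._shard(key)
        if key not in shard:            # miss: read through to the database
            shard[key] = load_from_db(key)
        return shard[key]               # during a DB outage, hits still serve

# Slow-changing data the system knows about (e.g. user profiles) suits this.
cache = ShardedCache(num_shards=16)
profile = cache.get("user:42:profile", lambda key: {"name": "Ada"})
```

Being data-aware means choosing what to cache by how it behaves: profiles change rarely, so they can keep serving from cache even when the backing store is down.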
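And going back to RocksDB: since it's a library rather than a server, using it is just in-process function calls. A tiny sketch with the third-party python-rocksdb bindings (the canonical APIs are C++ and Java):

```python
import rocksdb  # third-party bindings: pip install python-rocksdb

# Open (or create) a local database; this is files on disk, not a server.
db = rocksdb.DB("example.db", rocksdb.Options(create_if_missing=True))

# Keys and values are plain bytes; it behaves like a persistent dict.
db.put(b"rider:42:profile", b'{"name": "Ada", "rating": 4.9}')
print(db.get(b"rider:42:profile"))  # an in-process call, no RPC involved
```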
How are we running on time? How are you running on time? These guys will stay as long as you talk, and from what I've seen, you'll talk as long as people are listening, so this could go on forever. Where's the beer, man? They used to allow beer in here. We can go get some beer afterwards.

Do you plan to create your own maps? Can you say that again? You plan to use your own maps? So, are we planning to use our own maps rather than Google Maps? Eventually, yes. There are various reasons for it, but I think number one is accuracy, because as you've seen, pickup points and landmarks are not exactly right. Singapore is not so bad; you go to India and it's horrible: the address is "33 and one-third", or "behind some temple, second row from there". So that's one of the reasons. And now we're officially dead... oh, now we're back. Which one are you going to use? I've got it, I'll turn this one off.

Okay, but along those lines: you guys have got to be generating a lot of really valuable location data. What are you doing with that? Are you monetizing it? Are you exchanging it with the mapping companies for some value?

That's something we ran into at Lazada. You're talking about "go down the street, turn left". Literally, somebody posted: "Oh my god, I can't believe the package got delivered." Go to the police station; when you're looking at the gate, go to your left; walk down the street; there's a fountain, right; walk 100 meters; look for the silver gate; go through that; it's the second door on the left. And the package got delivered. I mean, it's amazing. But how do you geolocate something like that? You don't; it depends on people.

I think I'll try again; this one's dead. So: maps data for self-driving cars, that's the primary reason, and there are side benefits, because self-driving cars need much more accurate maps, crazy accurate, like accurate to 1 cm. Maps that accurate are not for what you can see, but I'll leave it at that.

One more question, about package drones: actually, a remarkably large share of our packages are delivered by motorcycle. It's just the nature of the place; that's how you move through Jakarta. If you put a package in a car, it will just sit in one place for a day. Same in India, anyway.

Any more questions? Then I guess we're done. Do you have any words of wisdom you want to impart to the people here, or a question you'd like to ask them? If not... yeah, jet lag, I know, so I'll leave it at that; I had a presentation at 9am this morning. A word of advice? I don't know, you guys are all smart too. Okay, alright: a big round of applause.