 Okay, as Roy mentioned, my name is Mark Huksusman and I'm working for eBay for almost four years. And I'm managing the data architecture, metadata management, enterprise warehouse, business intelligence, this kind of area of my responsibilities at eBay. And we will spend half an hour, we don't have a lot of time today, just talk about how metadata management actually helps to build, come on in guys, sorry, come over, how to build agile data warehousing in business intelligence. Because I think many of you know what is metadata management system, how it's important, but it's very important the value of what is metadata management system actually bring to enterprises like eBay. And I just want to spend a couple of minutes to explain what amount of information we have to process at eBay at this time right now. And we are actually managing 50,000 products, we are dealing with more than 60 million transactions a year, sold in goods. And we really have a big data. When people talking about big data, they're talking about maybe thousands of transactions, millions of transactions, we're talking about billions of transactions. You can imagine what information we have to process. And just information to have this and store this information doesn't help. It means we need to build some kind of tools and environment how we can analyze the information and actually increase our revenue and work with our customers. So we will spend a little bit of time, I will just explain to you our basic warehouse environment. And I'm very proud to say that we are managing right now the biggest Teradata system in the world. Our data warehouse is built on Teradata platform and we have two main systems in Teradata. One is what we call Enterprise Data Warehouse. It's mostly used for BI analytics, custom reports and so on. And another system, what we call Singularity, it's mostly built for research and deep analytics data mining. 
You can see the sizes of the warehouse: it's around 50 petabytes of data that we're processing. Petabytes, not terabytes, not megabytes, petabytes. At the same time, a couple of years ago (everybody knows what Hadoop is), we had to start processing unstructured information. We're receiving lots of unstructured information, and we built a humongous Hadoop system. Right now it's around 50 petabytes as well, and it is growing as we speak. Hadoop is mostly used for our discovery, research, and some very deep analysis. At the same time, we have to be agile, because our business customers are telling us: give us the information. We have to do this research, we have to do the analysis, and we cannot wait half a year until you move some information from one system to another. We need access right now. So in these humongous Teradata systems that I just described, we built the ability for end users to create virtual data marts almost ad hoc. To create a virtual data mart, they have to understand what kind of information they need to bring into it. This is one of the reasons why we have the metadata management system: end users can go to our online tools, select the information, put in a request, and in a couple of hours a database is created for them with the data itself. This is why we created global directory, which is our name for the metadata management system. As all of you possibly know, big corporations often operate in silo mode: many business units work completely independently, they have their own tools, their own environment, everybody likes wiki pages, and so on. And sometimes one business unit has no idea what another business unit is doing, or how it calculates the information.
And eBay, no surprise, did the same many years ago. Around three years ago, we realized that we had many departments and business units calculating the same metrics or the same output information using different formulas, for example to calculate average selling price in our warehouse. Information was spread around different websites, SharePoint sites, and wiki pages, and we didn't have any standards in place. We realized that this slowed our ability to grow our business, and we decided to build our global directory, or metadata management system, to enable a common language that people can talk. That common language is the metadata management system we created. We started very small and took what I call baby steps: we moved from one site to another site and integrated everything. I would recommend this to everybody who would like to build a metadata management system. You cannot create something in one shot, because it will be very difficult, maybe completely impossible. You have to have a vision and you have to understand your end goal, but at the same time you have to start small and do the integration step by step. We built our metadata management system to support technical metadata and business metadata, and I will show you later on how we integrated everything together. As you can see, our global directory, or metadata management system, consists of two main components: technical metadata and business metadata. It is fully integrated with our online transaction processing system, which is built on Oracle. Our informational data platform, which is our data warehouse, is fully integrated, and so are our reporting or business intelligence tools. We're using mostly MicroStrategy and Tableau. We have other BI tools as well, but they're not so popular in the company.
And for data modeling we use PowerDesigner. This is our basic platform, and we created many different standards to speed up our processes. This is the main idea of how to be agile, because everybody is saying, yeah, we have to be agile, we have to be fast, but you can be fast only when you have standards in place, when one team talks to another team in the same language. And we are not talking about English; we are talking about technical language, about the same terminology used across the enterprise. And you can imagine that eBay is not only one company. Like Roy mentioned, GSI is a part of eBay. They came from a completely different world, and we are integrating them. Whether they want to or not, they will talk the same language as eBay talks. The basic idea behind our success story is that we are doing agile database modeling. Everybody maybe knows about the Agile Manifesto, and database modeling is completely not part of the Agile Manifesto. The people who created agile, in 2001 I believe, somewhere in Utah, didn't even think about databases or data modeling. They were pure Java developers, and they talked about how to build Java applications but not databases. So we came up with an approach for building the models for our data warehouse, because we have to manage a lot of information. We have approximately 20,000 tables and hundreds and hundreds of thousands of columns. So you can imagine that with this amount of information, without an appropriate model you will not be able to deliver anything to your customers. Here are the main key successes that we have to achieve, and how we organize our work in a way that we can actually support our customers. I'm running the data architecture steering committee as part of the data governance component. I don't want to say that we have a very good data governance model. We're not so mature at this point. eBay is a 15-year-old company. It's a teenager, actually.
But we are moving in this direction. A couple of years ago we created the data architecture steering committee, where people can come over and talk about how to manage data and about all the issues. We do have data governance, data stewardship, and data ownership, but they are just in the beginning stages. We support a product development methodology and common dimensional models. I will show you how common dimensional models actually help us to build self-service analytics, because a very important point here is to enable our end users to generate reports on the fly, ad hoc. Before, in the old way, business came over to technology: build me a report here, build me a report there. Half a year we're building a report, and then they come over and say, okay, we don't want this report anymore. All the stuff is old already. We have to start another project. So we created an environment where the end user can actually go to the metadata management tools and generate reports almost on the fly. This is why metadata is very important for us. And for sure we have global directory, technical and business metadata, and compliance reporting. Compliance reporting is a very important point, because metadata is only as good as it is fresh and not outdated. If the metadata is not right, everything you are doing is not right. So we have to measure how good our metadata is, and this is why we produce a set of compliance reports that show how close the metadata actually is to real life. When we started this process, I would say our database model metadata was around 60% compliant. Now we're running at almost 100%: 99.7, 99.5. On business metadata we are 98% compliant. On security metadata we are 100% compliant, and so on. It's very close to 100%. Here on this diagram, on this picture, you can see how our common dimensional model actually helps us to build our self-service analytics.
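The compliance measurement described in this passage (how close the metadata is to real life) can be sketched roughly as a set comparison. This is only an illustration under stated assumptions: the talk does not say how the reports are computed, and the `Column` type, the `compliance_percent` function, and the sample table and column names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A column identified by table and column name (hypothetical shape)."""
    table: str
    name: str


def compliance_percent(modeled: set, deployed: set) -> float:
    """Percentage of deployed columns that are also documented in the model.

    Assumption: 'compliance' means the model matches what is physically
    deployed. An empty deployment is treated as fully compliant.
    """
    if not deployed:
        return 100.0
    return 100.0 * len(modeled & deployed) / len(deployed)


# Hypothetical sample: one deployed column is missing from the model.
modeled = {Column("account", "account_id"), Column("account", "status")}
deployed = {
    Column("account", "account_id"),
    Column("account", "status"),
    Column("account", "tmp_flag"),
}

print(f"{compliance_percent(modeled, deployed):.1f}% compliant")
```

The same comparison could be run separately for model, business, and security metadata to produce the per-area percentages quoted in the talk.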
And our metadata management is a key component of all our integration points, because all the models are available in the metadata management tools. So as an end user, you can come over, take a look at the metadata, pick your tables, your dimensions, facts, and metrics, and by using MicroStrategy, generate reports almost on the fly. We started to create common dimensional models just a year ago. We used to have very traditional normalized and denormalized tables in our warehouse environment. Technology used to create reports for our end users basically on demand: a new report, a new table, and so on and so on, the very traditional way. But as I already mentioned, we realized it was half a year for one report, another half a year for another report. So we used the more normalized layer of our architecture to build our common dimensional models and conformed dimensional models. That way we created a shareable environment, and we're able to go from one business domain to another business domain, compare values and metrics, and use the dimensions across the enterprise. By that time, MicroStrategy had come up with MicroStrategy cubes. When we give the end user the ability to generate reports, they are a little bit pre-built reports, I would say, what we call guided reports. All this information sits in MicroStrategy cubes, so performance is great and we're able to get results very fast. Any questions, guys? Go ahead. Yes, yes, it's a good question. I'll just repeat the question for you guys: how we created our agile model-driven methodology and how we actually integrated our database modeling into all this agile processing. It's very important, when everybody is talking about agile and we have two-week deliverables, because it's very difficult to fit into this kind of stage.
So when we started to develop our common dimensional models, we kind of, and our architects is very knowledgeable people, we went back to business. We start talking to business and we are, our architects, very knowledgeable in the business processes as well and working in collaboration with our business users. This is very important component. And we're not building the model like from the end to the, from the beginning to the end. We're building the models in a way that we can grow them and expand them flexible enough to expand. We're starting small and build the skeleton of our future model and after this we propagate and propagate and propagate. We're not creating tables all the time. We created some structures, dimensions, and we reuse them all over and over and over. So we identify our main business components, business units, business areas, started from the conceptual and logical modeling and after that we could create our physical models. Did I answer your question? I know it's not easy. And, and to be honest with you, we keep outside our conversation with business from these two weeks deliverables. And if you keep this in balance and talk to your business and realize what business need before these two weeks they will deliverable. So when this deliverable comes you already know what they want. This is important. Another question. Good question. We usually have over night loads. We don't go like, we don't have real-time warehouse. If I can answer it, this is kind of very honest. Some data is more like we have our transactional data. I just want to have time is ticking. Our tracking metadata and tracking data is kind of hourly by hourly, but other information, customer information, other information, transactional information is daily loads. How the granularity different levels? We do have common dimension on the transactional level and on some depends on the dimensions we have weekly, monthly as well. 
And here is just one example of how we are creating our common dimensional models. Before, for example, we had shipping information, and the shipping department was responsible for shipping KPI reports and so on. And we had customer departments who were responsible for our customer management. BBE is the abbreviation for bad buyer experience. Before we created this system, an analysis of how bad buyer experience depends on shipping was not available. Shipping did their work, and the customer folks did their work. To create reports that bring these different metrics together and analyze this information was, as I mentioned before, a huge project. By using the common dimensional model and a set of metadata management tools, we give our customers, and when I say customers I mean our business analysts, the ability to come over and use our online web-enabled tools to generate reports, build the analysis, and see the dependency between shipping information and our bad buyer experience. For example, if shipping is great, bad buyer experience actually goes down, and the opposite: if we have issues with shipping, our bad buyer experience measurement goes up. It's one simple example of cross-domain business intelligence. Yes, yes, yes. And here on this diagram, you can see that in the middle we have our global directory, or metadata management system, which is actually the keystone. You can see that we're giving our business the ability to go into our system, and we're creating self-service analytics, self-service tools. It's very important to give end users this ability to go and manage our business intelligence and metadata management tools by themselves.
We build the platform. We're moving away from technology people developing specific reports or specific sets of programs. We're building the platform for end users and empowering them with a set of tools so that they are able to generate business intelligence by themselves, and metadata management is the keystone. A business user can create business terms in our metadata management tool: business terms, metrics, formulas, and so on, and document all this information. We generate models on top of these business requirements, business terms, and metrics. We store all this information in global directory. Actually, this interface is like real time: as soon as a model is created and you save the model, it's immediately available in our global directory. Our reporting metadata is fully integrated with global directory as well. When I say reporting metadata, it's MicroStrategy metadata, Tableau metadata, any reporting tools that we currently have. By using global directory, or the metadata repository, the end user is able to select what information they would like to see and how to generate a new report with all this data. So we are able to create these models and create guided reports using dimensional models and MicroStrategy tools. Any questions? [Audience question:] With a huge volume of metadata, I cannot understand. So a customer wants to find some information, and I'm assuming there's a huge number of potential entities that they can choose from. How do you help them find their data? Good question. We have training, because people are changing, people are coming, people are going, and people are more familiar or less familiar with the tool. I just completed training in China, where we have an office in Shanghai, and we completed a set of trainings in our San Jose office, in Seattle, and so on. We educate our people on how to use the tool, number one.
Number two, we're providing search capabilities, very robust search capabilities, so that end users are able to pick up the right information very fast. Number three, we integrate our tool with environments that business users are more familiar with, like wiki pages, and I will show you that we're not creating wiki pages anymore the way we used to create them before. We're creating dynamic wiki pages fully integrated with our metadata management repository. So again, we provide the platform and tools for end users so that they don't even have to think about how to pick up this information or that information; it becomes very natural for them. Here is a data flow, at a very high level. Data architects play a key role in the creation of technical metadata. As soon as they create data models, we store all this information in global directory, or the metadata management repository. We have all the data sources from different places, like wiki pages, our databases, and so on, stored in our global directory. And what is interesting here is that global directory is not just a collection of information; it's actually an engine. I would say magic is happening all the time. With all this metadata coming from different places, we created a special engine that works all the time and connects the pieces together. For example, we connect our technical metadata with business metadata. As soon as somebody creates a business term, for example, or an abbreviation, or a metric with a formula, we check whether a report is using it, and we connect this information together. So we know that, okay, this specific metric is used by X number of reports, and we know in what columns and in what tables in our model this specific metric is stored. Yes. Yes. Yes, it's the right question, and for sure ontology, for sure framework, for sure platform.
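The linking engine described above, which connects a business metric to the reports and columns that use it, could look something like the following minimal sketch. It assumes a simple name-matching rule; the talk does not describe the real matching logic, and the `Metric` and `Report` classes, the `link_metrics` function, the formula text, and all table and column names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    abbreviation: str  # business abbreviation, e.g. "ASP"
    formula: str       # documented formula (hypothetical text)


@dataclass
class Report:
    name: str
    columns: list      # qualified names in the form "table.column"


def link_metrics(metrics, reports):
    """Link each metric to the reports and columns whose names mention it.

    Assumption: a column refers to a metric when the metric abbreviation
    appears as an underscore-separated token of the column name.
    """
    links = {m.abbreviation: {"reports": [], "columns": set()} for m in metrics}
    for report in reports:
        for qualified in report.columns:
            tokens = qualified.split(".")[-1].lower().split("_")
            for metric in metrics:
                if metric.abbreviation.lower() in tokens:
                    entry = links[metric.abbreviation]
                    if report.name not in entry["reports"]:
                        entry["reports"].append(report.name)
                    entry["columns"].add(qualified)
    return links


metrics = [Metric("ASP", "total sales amount / items sold")]
reports = [
    Report("Weekly Sales", ["sales_fact.asp_usd", "sales_fact.item_cnt"]),
    Report("Shipping KPIs", ["ship_fact.delivery_days"]),
]

links = link_metrics(metrics, reports)
print(links["ASP"]["reports"], sorted(links["ASP"]["columns"]))
```

Running a job like this on a schedule, rather than asking people to enter the links by hand, is one way to keep the connections current without manual data entry.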
And as I mentioned before, we're using a self-service analytics portal and we publish corporate compliance reports. The compliance report is our measurement. It shows the value that we provide to the enterprise. The self-service analytics portal is a tool, actually a very user-friendly environment, where end users come over, find the metadata, and use it to generate reports, to answer your question. Yes. Yes. Yes. I will talk about it. Here is our very high-level business domain model, which is fully integrated with our metadata management tool. You can see there is nothing new here. We have the same as everybody has: customer domain, trust and safety, marketing, advertisement, motors, infrastructure domain, shipping domain, and so on. I think each company has the same stuff. But you can see there are many different links here that you can just click on, and each gives you a specific interface. You can click on this diagram and you will be able to drill down to a specific table, column, model, and so on. So you can navigate from different places. This answers your question of how an end user can understand the various metadata: we give people different channels to navigate to the right information. And here are some examples of our global directory browsing and some other information. Yeah. To answer your question, we built our platform using ASG Rochade. I don't know if you guys are familiar with it. I think they have a booth here, so you can take a look at what they are doing. And I would say without Rochade, yeah, we could survive. We did a lot of customization. We're using what Rochade is able to provide as a baseline. They created a specific tool for business metadata, but not a lot. We did a lot of customization. A lot of customization. And here is an example of the compliance report that I mentioned before, and you can see that we are almost 100% there.
And the amount of information: it's 7,000, 10,000 tables, hundreds of thousands of columns, and so on. Another point, which I already talked about a little bit, is wiki integration, because everybody is talking about wikis. Our business analysts, I don't know how it is in other companies, like wikis a lot. They just go there and write a story and so on. But a wiki becomes outdated very quickly. And I just want to make sure that when they create wiki pages, the pages are fully integrated with our environment. So we created a special macro, I would say at a very, very high level. It's a very easy to use, very straightforward macro in wiki pages. Any end user can put the specific code into a wiki page and actually create a full interface with our repository, so the wiki pages become dynamic and pick up the fresh information all the time. This is very important. Any questions? In global directory, and in wiki pages, you have to have a password and everything. You have to have rights. In global directory, for sure, we give different permissions to different end users. We have role-based authentication, and it's controlled, for sure. Everybody can actually look into the system and enter some information, but it will not be available for other people to see until this information is approved by the right person. It's an approval workflow, which for sure we have. Any other questions? Here are some examples of wiki page integration. Here is a wiki page, and this wiki page is taking information from our metadata repository. And the business glossary, the meta glossary application, is actually built on top of ASG Rochade. Here we're using almost exactly what ASG provided to us. All the business terms, business metrics, workflow, external docs, everything is available in this application and is fully integrated with our technical metadata.
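The idea of a wiki macro that pulls fresh values from the metadata repository at render time can be sketched as below. This is a hypothetical illustration only: the `{{term:...}}` placeholder syntax, the `render` function, and the in-memory `repository` dictionary are assumptions, since the talk does not show the real macro or the repository interface.

```python
import re

# Hypothetical macro syntax: {{term:NAME}} is replaced when the page renders.
MACRO = re.compile(r"\{\{term:([A-Za-z0-9_]+)\}\}")


def render(page_text: str, repository: dict) -> str:
    """Replace each macro with the current definition from the repository.

    The lookup happens on every render, so the page never stores a copy of
    the definition and cannot drift out of date.
    """
    def lookup(match):
        key = match.group(1)
        return repository.get(key, f"[unknown term: {key}]")

    return MACRO.sub(lookup, page_text)


# Stand-in for the metadata repository (assumption: a simple key-value view).
repository = {"ASP": "average selling price"}
page = "ASP stands for {{term:ASP}}. BBE stands for {{term:BBE}}."

print(render(page, repository))

# When the repository changes, the same page text renders the new value.
repository["BBE"] = "bad buyer experience"
print(render(page, repository))
```

The design choice illustrated here is that the wiki page holds only a reference, while the definition lives in one place, the repository.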
And here is our data hub, or I would say the self-service analytics website. Global directory is part of this entire eBay set of tools. End users can go to this portal and navigate to the right information. We built a set of web services to empower our end users, and they can select the information that they would like to see. Any questions? I would not say demand by demand. The question was how long it took us to create our system. I don't want to say that we created it. We're still creating. We're still working. We're still integrating. I told you that we started very small. The first application we created was model management. We just incorporated our database models. I would say more: we didn't even have database models for the warehouse three years ago. Now we do. As for the amount of work, we spent three years in a row with very limited resources, two folks in India. This is the entire team that created the whole application and all those integration points. And we're going not demand by demand, but maybe platform by platform. Model management is one platform; I'm talking about all the logical models, physical models, and so on. Metaglossary, the business glossary, is one of the first applications that we implemented. It's still evolving right now, because for a company like eBay, can you imagine how many different business terms we have, across different companies. The security team came over and said, okay, we have to be SOX compliant, and we have no idea how to manage the situation. So we helped them out, because we already had model management in place. Now each column, and you can imagine 7,000, 10,000 columns, each column has a flag that says what kind of security value this specific column has. We have different kinds, like PII information, PCI information, and so on. So each column has a specific flag, and from the security perspective we are 100% compliant. Operations folks came over, because the Teradata system that we have, as you can imagine, is humongous.
So we incorporated all our operational metadata, working together with Teradata's team. So this is, I would say, platform by platform. Any questions? Go ahead, yeah. Okay, as I mentioned to you, for the warehouse we have Teradata and we have Hadoop. For our OLTP system, we have Oracle. So the sources are basically data dictionaries: from Oracle, the data dictionary; from Teradata, the data dictionary, DBC, as it is actually called in Teradata. We're not enforcing, for example, referential integrity in the warehouse, for performance reasons. Referential and domain integrity, the primary keys and foreign keys, are available only in our database models. Some ETL processing requires these primary keys and so on, so we have two-way integration. From one side, we see all the operational metadata in our metadata repository. From the other side, we supply the information to the Teradata folks to help them with ETL processing. So it's two-way communication. Excuse me? Runtime? No, it's not runtime; it's daily. It takes time to load and it takes time to generate, so daily. Any questions? The business glossary is built on top of the Rochade platform. ASG is the company, and Rochade is, I would say, a software platform with a NoSQL, object-oriented database, and they created an application on top of this database. Yes, yes, yes. The question was what platform we're using for the business glossary and how it's integrated with the wiki. So we completely integrated the wiki into our environment, number one, and we're using Rochade as a software platform. Any questions? We don't deal with governance. This is a problem. We are a teenage company, and when I... Hi, Tony. Actually, it's a very good question.
Two years ago, the first person who actually said data governance, it was me because I came from old school, and data stewardship, even people are afraid to say data stewardship in eBay, and now we do have data stewardship, at least some progress we are doing. I don't want to say that we don't have governance completely, but for sure it's not mature, and it's a lot of work to do there to become to more kind of go from this teenage to some kind of mature level. On the other hand, if I start thinking about, and I came from old school and many, many years of experience in this area, and I start thinking about maybe we don't need this data governance in an old way because we're still agile because when we're creating something in half a year, in a couple of months, it's become already outdated and old. So we have to be very agile in the right way, but for sure you have to have set of platforms and tools that will help you to build the governance almost like self-service governance. That if you create a platform and a level of tools and software and ability for people to do self-service, so they become like a data governance in a place by themselves. It's a very difficult topic. I know that people have different views, but to say, do we have data governance? Not really. Any other questions? A lot of work. It's magic happen inside metadata repository that we have a couple of smart programs running. I would say like this, the metadata is good as less manual intervention. We try to avoid any manual intervention in our metadata management. As soon as people have to data entry something, they will never data entry number one, data become outdated, not right, and so on and so on. We are trying to eliminate any linkage, any manual intervention to these integration points. 
We're using a lot of algorithms, a lot of processes, a lot of standards, and standards actually help us to build these engines inside the metadata repository that link business metadata and technical metadata. For example, take average selling price. We have a metric, ASP, and it has a specific formula to calculate it. We have engines that go to our reporting metadata, look for ASP somewhere, and create the linkage between these two places. In the reporting metadata, we have the ability to find, okay, we have a report. We have five minutes, so I can demo this stuff for you. We have reports, and you can see the columns that each report consists of. This information doesn't exist anywhere else, only in our metadata management. If there are no questions, I can show you right now, if you guys are interested. Sorry, I have to sit down here. This is our tool. For example, here is our account model, and here you can see all the tables, columns, all the information, all the descriptions, and you're able to edit. Here is a list of all the tables in this, oops, what did they do? You're familiar with this diagram, correct, guys? We provide different searches into our metadata repository, and we're fully integrated with a search engine. Solr is the search engine that we are using to speed up the search. If I type account and say physical model, I can select model, table, column, everything, whatever you would like, and very quickly you find this account information, for example. And you see all the diagrams, you can generate diagrams here, you see all your tables and all the data lineage. We recently implemented data lineage functionality as well. For example, I can show you: this is a diagram, and you click and you see the data lineage for this specific table, where it came from. We don't have column-level data lineage because we don't use an ETL tool in the usual way. We don't store the ETL metadata at all.
Instead of ETL, we're doing ELT. So we created a specific way to at least figure out from what table this table came. You can go through the sources. This is a very simple data lineage; I can show you some diagrams that you will not be able to understand here. I don't, but some people do. Yeah, yeah, no, everything that you can see right now, for me Rochade is only a repository. Nothing else. Everything is custom developed. All the UI, all the engines, everything is custom developed. From Rochade we use only the ability to integrate with our database models, and the scanners. We're using scanners for Teradata and Oracle. Scanners are kind of interfaces. You can build them yourself. Yes, yes. Two folks in India, not even here in the US. It's a very small team, and we used a little bit of consulting time with ASG, the Rochade company. The first release of the software went live about three months after we started. So we have to be agile. We're integrating PayPal right now, we're integrating traffic.com, and GSI will be the next one that will come here. We have product management, the compliance reports I already mentioned, tracking metadata; it's a lot of stuff. You can see all the different levels of integration points and applications. And where is my reporting metadata? Yeah, BI metadata, MicroStrategy for example. Here's a list of reports, and I have no idea what I will pick right now, but any report, I don't know. I just picked a random report, guys. Trust me, I have no idea what I picked. And you can see here the description of this report, the location of this report, the version, all the metadata that you need. We don't have data stewards and primary product managers filled in here, because that requires data entry. This is a point where people have to do data entry, and they never do. If it's automatic, everything works fine. And here, this is exactly what we do automatically: we identify all the columns for this specific report.
And you can click here and go to our column metadata. It's done automatically. This is what the engine is doing for us: from the report we can jump to the data model and figure out where this specific table column is in the model, with all the descriptions and all this stuff. So we link all this stuff together, and it works fine because it's done automatically. Otherwise, you know, nobody will do it. Oh, yes. Oh, yes. Yes. For our models, we're using PowerDesigner, but you can use any database modeling tool. The repository for our models is not the PowerDesigner repository. You can have a PowerDesigner database repository, but our repository is actually Rochade. We check out a model for modeling, the data architect modifies the model and checks it back into Rochade, and it's immediately available. We have an approval workflow, so that if a junior modeler creates something, a senior architect approves the model. Yes. Business people do a lot of data entry as well. For example, there are comments here, correct? Here we don't have any comments. This is an ability we give people: business folks are able to enter this information. And they don't even go to this interface. They're able to do this through the web portal, which is a more user-friendly interface. Okay. I think, okay. More questions? My information is here; I can give you my business card. Let me go to this presentation. Thank you guys for attending. Yeah, here is my information, you can give me a call. Don't call me at night, but any other time, please.