Alright. Good morning, everyone. My name is Michael, and I'm the Product Manager for the Twitter Cloud Platform. And my name is Vinu; I'm a Software Engineer on the Cloud Infrastructure Management team. First off, thank you for coming. Together, we're going to talk about how we built a chargeback system to help improve the overall resource utilization and efficiency of Twitter's infrastructure. Unfortunately, I don't have a clicker, so I won't be walking around; I'll be standing near the podium.

We'll start with the agenda for the talk. I'm going to begin with an anecdote about what prompted Twitter engineering to move to a more service-oriented model, which provides context on the problem we're actually trying to solve. We'll then deep dive into the chargeback system itself, talk about the engineering challenges in building that system, the product side of things, which is the reporting that comes out of it, and the impact it has had on our customers. And finally, the future: how chargeback plays a crucial role as part of a larger unified cloud platform.

With that, let me start. Do you remember this particular event? Anyone? Okay, some people are nodding their heads, so I'll assume they do. This is when Spain won the soccer World Cup back in 2010. It was a momentous moment for Spain, but at the same time it was quite crucial for Twitter too, simply because we were running one of the largest Ruby on Rails deployments at the time, and we aptly called our application the monorail. At its peak, the monorail could serve about 3,200 tweets per second, which is a pretty paltry number if you compare it to what we're doing today. That said, it was a 5x increase over the TPS we typically saw at that point. We did everything to push the monorail to support that load; we basically threw machines at the problem to reach that scale. We spent significant engineering hours tuning the Ruby VM, building deployment tooling, and even building tooling to detect failures.

Unfortunately, not everything was rosy. Twitter did go down numerous times during that event, which raises the larger question of what the problem truly was. We simply couldn't depend on the monorail, given its reliability and developer productivity problems. Because routing, presentation, and logic were all tightly coupled into a single system, it became extremely difficult for us to debug, scale, and contribute code to the monorail, which brings us to the next evolution at Twitter. Twitter engineering reviewed what had happened after the World Cup, and we decided to re-architect our entire stack. We moved to a more service-oriented model, which meant decomposing the monorail into multiple microservices.

The gains were pretty significant. We were able to address many of the problems plaguing the monorail. For example, it provided a clear way to define ownership of the various services, and even charters for the teams. This was crucial as we were scaling up to support the product; it was easier to know who was working on what. We built services that focused on specific aspects of the Twitter product. For example, the who-to-follow section is actually a separate service, which recommends users you should follow based on your follow graph. The interesting piece is that separating it out gave us fault isolation.
So if who-to-follow went down, it wouldn't take down Twitter, the entire product; that feature would just be gracefully degraded from the product experience. We were also able to build and scale services independently: the who-to-follow service doesn't see the same traffic as our timeline service, which is the bread and butter of Twitter.com. Finally, there was a significant improvement on the developer productivity side of things. Microservices made it extremely simple for developers to build, iterate, and launch their services quickly.

I want to share a quick anecdote from 2013 which reaffirmed our investment in microservices. Does anyone know this? Okay, this is an anime movie called Castle in the Sky. It's hugely popular in Japan, and there's one particular scene in the climax where the protagonist casts a spell, the word "balse," to destroy a city. The moment that scene aired back in 2013, Twitter users took to Twitter to tweet out that word at the same time, which helped Twitter reach a peak of around 143,199 tweets per second. This was 28x more than the average TPS we used to see in a day. It was staggering. Our systems did feel stressed and many of them failed, but fortunately the experiences of creating a tweet and viewing tweets were still alive and working well, because many of our core tier-zero systems were isolated from the other supporting services. So if you tried to tweet, it would work; it's just that the product experience around it would not be working as effectively. That said, people were still able to consume and create tweets, which is the most important part to us.

You can imagine hundreds of such events happening on a minute-by-minute, hourly, and daily basis on Twitter. It is a platform for live conversations and events. Tweets per second is not the only metric that matters to us, and to reiterate that, I want to share this particular event from 2014, when Ellen DeGeneres tweeted the selfie from the Oscars podium. It didn't register the largest tweets per second, but it got retweeted so many times, so fast, that Ellen's profile page broke and the tweet was inaccessible for over an hour. This was a huge sev-zero in our world. It shows that tweets per second is not the only metric that captures Twitter's scale; it's also the follower graph and the network effects. With that said, and given the uncertainty around which events may go viral and the important conversations that happen on Twitter, it is important for us to always be reliable and available for our users. We had to build a platform that is scalable and, at the same time, extremely efficient. Our goal was to make sure that our engineers can do more with fewer resources.

Fast forward to 2016: Twitter infrastructure has grown in size, scale, and complexity. This is platform engineering at its core. We are responsible for building the libraries and the services that make up Twitter as a whole. At the bottom we have the data center management layer. The layer above that is our core infrastructure services: on one side the compute services, on the other the storage services. On the compute side, we use Hadoop for batch compute, and we use Aurora and Mesos for long-running service scheduling.
Mesos is used for host management, while Aurora is used for task and job scheduling. On the storage side, we have Manhattan, our general-purpose, high-throughput, low-latency key-value store. Blobstore is another service, used to store large blobs of data. Graph store is a service we use to store the social relationships, the social graph, of a user. On top of these, we have an array of platform and application services. This includes our monitoring, cache, and messaging systems, the pub-sub system, our tweet service, and even our user service. We also have a data analytics platform which allows users to interactively query the large datasets we collect.

I also want to share that platform engineering is responsible for building the frameworks and libraries that enable engineers to write code. For example, we built an RPC framework called Finagle to do all sorts of RPC calls within Twitter. We built Scalding to make writing MapReduce jobs in Scala really simple. And we built Heron, a real-time compute framework which does streaming, on-demand compute to power our ads and recommendations infrastructure. There's actually a talk happening on Wednesday about Heron, where you can learn more about how it works; it's an open source project now, and it processes billions of events on a daily basis. And finally, we've built a lot of management tooling to make it extremely simple for developers to build atop our infrastructure: various self-service portals, a service directory which tracks all the services running at Twitter, Chargeback, which we're going to talk about more, a deployment orchestration system to power our CI/CD pipeline, config management, and many other things. All in all, this is really what platform engineering does, and that's the group we are part of.

I want to show how many services actually run atop the platform. This is a snapshot taken almost two years ago; I don't have the latest one. As you can see, we have thousands of services running atop Twitter's platform infrastructure, and hundreds of teams. This particular view is actually a tweet being created: the first request hits our reverse proxy router, TFE, and then RPC calls are made to over 20 services to help create that tweet. So given the scale and size of Twitter, it's important to understand the overall use of infrastructure platform resources across all of these services. How do you know who's really using what? Given this number of services and teams at Twitter, it's extremely important to understand how we can start capturing the utilization of resources per team, per project, per org. And finally, how do you incentivize the right behavior in engineers, team leads, and managers so that they do the right thing with our resources? To talk more about how we address that, here's Vinu.

Thank you, Michael. We built a system called Chargeback. It provides the ability to track and measure infrastructure usage on a per-engineering-team basis and charge each owner their usage costs accordingly. As Michael mentioned earlier, Twitter's infrastructure is very complex. Multitenancy was a first-class citizen on each of these infrastructure and platform services.
Our perspective on the problem is that these infrastructure services lacked a consistent way of identifying the customers consuming their resources, and they offered a variety of resources that were not defined in a consistent way. There was a need to model a resource as a unit of abstraction. Keeping this in mind, as we started designing the system, we identified the top four challenges.

Number one, service identity. We designed a generic service identification abstraction that provides a canonical way to identify a service across infrastructures, and we built a centralized system that integrates with the infrastructure services to help create and manage identifiers. Number two, resource catalog. We worked with the infrastructure teams to identify and abstract the resources that can be published for developers to consume and build on; this system generalizes inventorying those resources. Number three, metering. Each infrastructure tracks the consumption of resources by each service through its service identifiers. As a service consumes resources, the metrics written about that usage need to be extracted and consolidated, so we built a classic ETL data pipeline to collect all the usage metrics, aggregate them, and persist them in a central location. Number four, service metadata. We also built a service metadata system that keeps track of operational and other service-related metadata. Overall, these make up a unified cloud platform, which we will discuss more towards the end.

Now that I've covered the top challenges, let me dive into each in detail. Service identity: what is service identity? A canonical way of identifying a service that consumes resources on the various platform infrastructures. Let's look at the problem in detail with an example of Aurora and Hadoop. Aurora compute and Hadoop batch compute have very different identifiers. For example, Aurora compute is identified by role, environment, and job name, whereas Hadoop batch compute is identified by role, pool, and job name. As a developer who wants to request both Aurora and Hadoop resources, I have to go through very different provisioning workflows. For Aurora, I use a self-serve system to request resources. On the other hand, to request a Hadoop resource, I would have to follow the old file-a-ticket workflow with the right team. The core problem here is that there is no single system providing a canonical experience for provisioning Aurora and Hadoop resources. To complicate matters, we also have multiple ownership tracking systems, such as LDAP groups, email, and so on. One system may have a very up-to-date user, while another may list a user who is no longer with the company. Overall, there was a need for an identity management system which not only helps provision identifiers but also tracks ownership and is the de facto source of truth.

So how did we approach this problem? We first designed an entity model, which I'll speak more about soon. We envisioned a single pane of glass for developers to request and manage all of their projects and infrastructure identifiers. So again, if I as a developer want to request Aurora cores or Hadoop compute or Manhattan storage or any other infrastructure resource, I would be able to use this system to do so. This replaces the disparate workflows by hiding the complexity of provisioning identifiers, and it is made possible by providing interfaces and APIs that each infrastructure provider can integrate with and implement.
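To make that last point slightly more concrete, here is a minimal sketch, in Scala, of the kind of interface an infrastructure provider might implement so that one central system can provision and track identifiers on its behalf. Every name and field below is an assumption for illustration; this is not Twitter's actual API.

```scala
// Hypothetical sketch only: a provisioning interface each infrastructure
// (Aurora, Hadoop, Manhattan, ...) could implement so a central system can
// create and manage identifiers on its behalf. Names are illustrative.

// A canonical identifier, scoped to one infrastructure.
case class ClientIdentifier(infrastructure: String, clientId: String)

// What a developer asks for through the single pane of glass.
case class ProvisionRequest(
  project: String,        // logical grouping the service belongs to
  serviceAccount: String, // backing account (e.g. an LDAP group)
  environment: String     // prod, staging, devel, ...
)

// Each infrastructure provider plugs in by implementing this trait.
trait IdentityProvider {
  def infrastructure: String
  def provision(req: ProvisionRequest): ClientIdentifier
  def owners(id: ClientIdentifier): Set[String] // who is accountable today
  def revoke(id: ClientIdentifier): Unit
}

// Example plug-in for Aurora-style identifiers (role/environment/jobName).
class AuroraIdentityProvider extends IdentityProvider {
  val infrastructure = "aurora"
  def provision(req: ProvisionRequest): ClientIdentifier =
    ClientIdentifier(infrastructure,
      s"${req.serviceAccount}/${req.environment}/${req.project}")
  def owners(id: ClientIdentifier): Set[String] =
    Set(id.clientId.split('/').head)
  def revoke(id: ClientIdentifier): Unit = ()
}
```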
The end result is a platform that became the source of truth for ownership and identification, and it enabled various other use cases such as service-to-service authentication and authorization. So what happens under the hood for service identity? Identifiers are scoped per infrastructure, and each is associated with a service account. The backend behind the service account is modeled to be pluggable; in our case we use LDAP groups. These service accounts are primarily used for authentication and subsequently for authorization. A service is the binary that uses the identity when it runs in an environment like production, devel, or staging. Services roll up into a logical grouping called a project, and projects belong to a team, department, or cost center, essentially some entity that can be held responsible for the dollars spent.

Let's take a look at an example. Here we have Revenue, which is a cost center, and Ads Serving and Ads Prediction are some of the teams within the Revenue cost center. AdShard is one of their services, and it uses the service account adshard. The last row you see is a client identifier, composed of the infrastructure and the client ID for that particular service. A team owns several services, and we can create multiple service accounts based on environment, for access control needs, and so on. The same model can support all the infrastructures, including Hadoop, Manhattan, or any other platform service.

Now that we have a brief understanding of what service identity is, let's take a look at our next key component, the resource catalog. The resource catalog defines a consistent way of identifying and inventorying the resources of the various platform infrastructures. A resource in this context is a physical or virtual entity that is consumed by applications, thus driving the overall capacity of the physical or platform infrastructure systems. Some of the problems around the resource catalog: first, lack of clarity on what a resource is and how it is consumed. The dynamics of the underlying infrastructure and the different ways developers use it make it challenging to keep proper accountability for the consumption of resources. Second, the need to capture resource fluidity. With the utility computing model, resources are no longer permanent entities, from either the infrastructure or the service consumer perspective. Infrastructure resources evolve over time, and developers want to consume resources for a specific period of time and be accountable just for that usage. Third, better support for modeling abstract resources. As we discussed earlier, things would have been much easier if we were computing cost on bare metal machines. But once we moved to a service-oriented architecture, we had to consider breaking these resources down into more granular components like CPU, queries per second, tweets per second, and so on. And these resources only get more abstract as we go higher up the stack. Last but not least, the need to define a TCO, the total cost of ownership of a resource over a unit of time.

Here is an Aurora config example to better understand the need to capture resource fluidity. The config file clearly defines the resources, cores, memory, and disk, that will be consumed by this job when Aurora schedules it on Mesos. In addition to this granular information, the job has a lifetime of its own that is not currently captured in the config but can be derived from the data that exists in Mesos and Aurora.
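As a rough illustration of capturing that fluidity, here is a minimal Scala sketch of an allocation record that carries its own time window, so a job that ran for only part of a day is billed for only that part. The schema and names are assumptions for illustration, not the real model.

```scala
import java.time.{Duration, Instant}

// Hypothetical sketch: a resource allocation that exists only for a window of
// time, as could be derived from Mesos/Aurora task history. Illustrative only.
case class ResourceAllocation(
  clientId: String,        // canonical service identifier
  offeringMeasure: String, // e.g. "core-days", "gb-ram-days"
  quantity: Double,        // amount held while the allocation is live
  start: Instant,
  end: Option[Instant]     // None means the allocation is still live
)

object ResourceAllocation {
  // How much of a given day this allocation should be billed for,
  // expressed as a fraction of that day (0.0 to 1.0).
  def billableFraction(a: ResourceAllocation, dayStart: Instant): Double = {
    val dayEnd  = dayStart.plus(Duration.ofDays(1))
    val from    = if (a.start.isAfter(dayStart)) a.start else dayStart
    val until   = a.end.filter(_.isBefore(dayEnd)).getOrElse(dayEnd)
    val overlap = Duration.between(from, until).toMillis.max(0L)
    overlap.toDouble / Duration.ofDays(1).toMillis
  }
}
```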
We created a model that understands the fluid and transient nature of the offered resources. This allows us to easily extend the current model to capture GPU and network if Aurora decides to offer them in the future. Now that we can create an infrastructure resource, we need to create accountability for its usage, that is, figure out how to charge these resources out to their owners. This was particularly challenging, and we had to build policies and frameworks to ensure that the unit price of a resource reflects its total cost of ownership. Very similar to the public cloud, we had to account for changes in unit price over a period of time. This helped define the grain of the measured data; the grain is largely determined by a few constraints, such as the source of truth, the volume of data, and so on. At Twitter, we decided to go with a day as the granularity.

Let's come back to the question of how to define unit prices. We started defining unit price from the bottom up, that is, from the bare metal layer, where we incur the true cost of buying physical machines. We calculated a cost per server per day which includes the CapEx and OpEx cost of machines, license costs, headroom, inefficiencies, and the human cost to operate these resources. It is important to keep in mind that TCO evolves over time on account of depreciation and so on. In order to track these changes, we again designed the entity model to capture these unit prices with a time-varying dimension. These form the basis for calculating the unit prices of other services as we move up the stack. For example, while defining a unit price for an Aurora resource, we fold in each of the components shown here, such as operational overhead, headroom, and excess quota and reservations, counting both the used and unused cores, when computing the unit cost.

We designed an entity model that can capture the challenges of resource fluidity and unit prices, agnostic of internal and public clouds. Let's review it bottom up. The unit price of a resource is captured in the offering measure cost; again, this is a time-varying dimension that can change over a period of time. The offering measure is the resource itself. The offering measure and offering measure cost alone don't make much sense without a context; to provide that context, we have offerings, the infrastructure service, and providers. A provider can be either an internal cloud or a public cloud like Amazon, Google, and so on. Let me show you how the entity model is able to capture both a simple and a complex infrastructure service within Twitter. Here we see Aurora: the provider is Twitter, but it could equally be a public cloud. Aurora is the infrastructure service, compute is the offering, and core-days is one of the offering measures, with a unit cost that is, again, a time-varying dimension. We also have our batch compute, which is Hadoop, and it has multiple offerings such as storage and processing clusters, which in turn have GB of RAM, file access, and so on as their offering measures. Each of these offering measures has its own unit cost.

Now that you have learned our entity model around resource cataloging and how to identify the owner of a resource, let's discuss the next challenge, the metering pipeline. The metering pipeline solves the problem of collecting and normalizing the quota and resource utilization metrics from the available infrastructures and unifying them using the resources defined in the catalog system.
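Before getting into the pipeline itself, here is a minimal Scala sketch of how the catalog's time-varying unit price and the ownership data might come together to turn one day's raw usage into a dollar figure, roughly what the "resolved fact" described next would carry. All types, names, and prices are illustrative assumptions, not Twitter's actual model.

```scala
import java.time.LocalDate

// Hypothetical sketch only. Catalog hierarchy: provider -> infrastructure
// service -> offering -> offering measure, plus a unit price that varies over
// time (TCO moves with depreciation, headroom, efficiency work, etc.).
case class OfferingMeasure(provider: String, service: String,
                           offering: String, measure: String)
case class UnitPrice(effectiveFrom: LocalDate, dollarsPerUnit: Double)

// Owner as resolved from the service-identity system.
case class Owner(costCenter: String, team: String, project: String, service: String)

// One day's metered usage for one client identifier (a "raw fact"), and what
// the transformer emits after joining identity and catalog (a "resolved fact").
case class RawFact(clientId: String, measure: OfferingMeasure,
                   day: LocalDate, quantity: Double)
case class ResolvedFact(owner: Owner, measure: OfferingMeasure,
                        day: LocalDate, quantity: Double, cost: Double)

object ChargebackSketch {
  def resolve(fact: RawFact,
              ownerOf: String => Owner,
              prices: Seq[UnitPrice]): ResolvedFact = {
    // Use the most recent unit price effective on or before the usage day.
    val unit = prices
      .filter(p => !p.effectiveFrom.isAfter(fact.day))
      .maxBy(_.effectiveFrom.toEpochDay)
    ResolvedFact(ownerOf(fact.clientId), fact.measure, fact.day,
                 fact.quantity, fact.quantity * unit.dollarsPerUnit)
  }

  // Example: 120 core-days on Aurora at a made-up $0.50 per core-day = $60.
  val coreDays = OfferingMeasure("twitter", "aurora", "compute", "core-days")
  val example = resolve(
    RawFact("ads-serving/prod/adshard", coreDays, LocalDate.of(2016, 6, 1), 120.0),
    _ => Owner("revenue", "ads-serving", "adshard", "adshard"),
    Seq(UnitPrice(LocalDate.of(2016, 1, 1), 0.50))
  )
}
```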
The end result is data that has utilization metrics from a variety of infrastructures, such as Aurora and Hadoop, in a form that is easy to compare and normalize to a dollar value. Let me give you a high-level overview of how the system works. We built modular collectors that capture raw data from the various services, like Aurora and Hadoop, over their HTTP APIs and write that data into a raw fact table. From this raw fact table, a transformer queries service identity and the resource catalog to identify the owner using those resources, identify the metrics and their cost, and compute the overall cost in a resolved fact table. We are dealing with cost, so data fidelity is highly important to us. We built a data fidelity service that ensures the tracked cost is true to the actual cost and informs infrastructure or service owners if there is a sudden spike or missing data in the pipeline. Once we have this resolved fact table, we can generate multiple customized reports, which can also be consumed by various data visualization tools. Now, to talk more about the product, its impact, and the future of Chargeback, I'll hand it over to Michael.

All right, thank you, Vinu. Now that you've learned how Chargeback works behind the scenes, I'm going to share what the product really is and what our customers use. To start with, I'm going to walk you through three specific reports and then give you an overview of the custom reports we have.

This is one of the first views that every team or organization at Twitter gets: the generic Chargeback bill, which is computed every month and sent out on the first day of the month. What you're looking at here is the bill for the team that runs the tweet service. Unfortunately I cannot show you any of the dollar figures, so X marks the spot. That said, they use almost six or seven different infrastructure services to power the tweet service. I can't show all of them here, but at a high level we have Aurora, Manhattan, and our monitoring system. As you can see, each of the offerings and offering measures, as Vinu described, is listed in the bill. We also show utilization, and then the cost is computed from it. That is true utilization, by the way.

The other report we have is the infrastructure P&L. Our goal is to incentivize even the infrastructure owners to be as efficient as possible at Twitter. What we generate here is an expense and an income view for each infrastructure, which yields a margin at the end. The goal is to have zero margin; we don't want anyone to make a profit or a loss. Not that it matters directly, because nobody gets that money to spend on something, but a positive margin tells us the infrastructure has done some optimization or efficiency work that let more users fit within the same capacity, and we can then reduce the unit price to pass those savings on to all our customers. Similarly, if the margin is negative, we know the infrastructure or platform service has taken on more capacity or is inefficient at serving its customers. This gives us good signals about what's going on per infrastructure service every month. It is all computed on a daily basis, but we consume it monthly.

Finally, the third report we have is the budgeting report. This is something our finance customers really care about.
They work with every organization in the company to build a budget and subsequently validate the spend against that budget. This helps keep teams like ads and revenue, or even platform for that matter, in check so they don't overspend their budget. It also helps us understand what big projects are being launched, so we can incentivize the owners to plan better, capture that up front, and plan the relevant capacity behind the scenes.

The other custom reports are for specific types of customers. The infrastructure and platform owners themselves would like to know who their top customers are and how well their resources are being used by those customers, so we can generate reporting with a very detailed drill-down showing, for each identity, how well resources have been allocated and used by every team and every customer. This helps the infrastructure teams have a more constructive conversation with their customers. Service owners and the teams themselves want the drill-down into which projects, and which identities within those projects, are contributing to the larger cost. The finance team really cares about budget management, so they use the tool for that purpose. And finally, the execs want to see trends and the efficiency work happening at Twitter: what big projects are coming, how well they have been planned, and how well teams have been tracking utilization of resources, so they can have more constructive conversations with the leaders in the organization.

So what has been the impact? We launched Chargeback last year, and I want to take a case study of our compute platform and show the impact from that perspective. What you're looking at here, in the first chart, is the allocated quota for all the customers of the Twitter compute platform, between June and September 2015. As you can see, utilized cores had been flat while quota had been increasing significantly. This data came out of our metering pipelines, and it made us wonder what was really happening, who was allocating more resources without utilizing them. By August, we launched Chargeback with the dollar figures, and this prompted a much deeper probe into what was really happening. It was really interesting to see that teams within Twitter were very proactive in seeking out those details and actually prioritizing efficiency work over other work. Once Chargeback launched, we saw a significant drop in quota but more utilized cores; that was over the next four months after the launch. Overall, we saw that visibility into the Chargeback metrics, in terms of both dollars and utilization, helped improve our overall core usage against quota by almost 33%. This is over the last year, and as you can see, in the first three months, when we had just launched the metering system, it was still going down, but once we started making things visible across the organization, we saw a significant improvement in the cores utilized against quota.

The other general impact is that Chargeback itself is true to the actual price. All the unit prices we calculate encompass the total cost of ownership on the ground, so there is no funny-money concept. It is real money, so people understand the value of it.
Chargeback numbers were also used for capacity planning and budgeting, so we could plan better and allocate resources efficiently. The visibility definitely enabled a lot more accountability than before; people were more engaged in talking about what they need, which helps us plan for events and other large project launches. It also helped improve utilization, as you saw, but interestingly, it also triggered some conversations comparing how well we are doing against the public cloud. I'm sure many companies really care about how their internal infrastructure compares to the public cloud from a cost perspective. We were able to do those kinds of comparisons with the numbers we had. I wouldn't say it was completely apples to apples, but it got us close in comparing with the public cloud. And finally, it helped with the biggest problem of all: it improved service ownership in the company. Given that we have a thousand-plus services, it is very easy for services to fall off people's radar and become unknown services. Chargeback, with its numbers, brought visibility to those unknown services, so people were willing to take ownership and we could figure out a path forward, whether deprecating a service or keeping it going, and things like that.

With that said, I want to talk about the future, and how Chargeback ties into the larger vision of our unified cloud platform. We like to call this platform Kite. It's a cloud-agnostic service lifecycle manager. Let me quickly explain what service lifecycle manager means in the context of Twitter. We want to enable people to write services in a quick, easy way, get the relevant resources for those services, launch and deploy them, and subsequently monitor and eventually kill them as well. So it captures the entire lifecycle of a service. Much of the tooling available outside focuses more on the management of, say, infrastructure or other things; we felt there was a need to manage the service lifecycle as a whole, and thus we are building a unified cloud platform at Twitter, a single place where every engineer in the company can view the services and projects they own as part of their team, request resources, request identifiers, and manage access control. We also let them view their builds and, subsequently, utilization reports for the infrastructure. And finally, we facilitate deployments: because we have so much metadata about a service, we can surface data about success rates and latencies and facilitate deployments across zones and things like that. Our goal is to onboard every infrastructure at Twitter, which then automatically provides the different aspects of the lifecycle manager to the customer without individual infrastructure owners building it themselves. For example, when we did Chargeback and onboarded many of those infrastructure services, their self-service tools became redundant, because we now have all the information about their infrastructure to generate a self-service provisioning system. That way, each of them can focus more on the reliability and availability of their service rather than on the management tooling side of things. So this is what we're building at Twitter; it's called Kite. And with that, I would like to end my presentation. I would like to thank all the people working on this amazing tool and making Twitter more efficient.
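As a very rough sketch of what a cloud-agnostic service lifecycle could look like as an abstraction, here is an illustrative Scala outline. None of these names are Kite's real API; they are assumptions to make the idea concrete.

```scala
// Hypothetical sketch of a cloud-agnostic service lifecycle, illustrative only.
sealed trait LifecycleStage
case object Registered     extends LifecycleStage // project + identifiers created
case object Provisioned    extends LifecycleStage // quota / resources granted
case object Deployed       extends LifecycleStage // running in some zone(s)
case object Decommissioned extends LifecycleStage

case class ServiceSpec(
  project: String,
  service: String,
  cloud: String,                 // "twitter-dc", "gcp", "aws", ...
  resources: Map[String, Double] // e.g. "cores" -> 200, "memory-gb" -> 512
)

// The lifecycle operations a unified platform could offer, regardless of
// which cloud actually backs the service.
trait ServiceLifecycle {
  def register(spec: ServiceSpec): LifecycleStage
  def provision(spec: ServiceSpec): LifecycleStage
  def deploy(spec: ServiceSpec, zones: Seq[String]): LifecycleStage
  def utilizationReport(spec: ServiceSpec): Map[String, Double] // feeds chargeback
  def decommission(spec: ServiceSpec): LifecycleStage
}
```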
All right, thanks. We are open for questions if anyone's interested. Yes. You might want to use the mic.

Yes, so the question was: we showed some reports at a very high level; do we have tooling for our engineers that gives details on the low-level utilization of resources? And yes, we do. For the custom reporting, which I didn't have a UI to show, we use Tableau. It's one of the worst tools, but we use Tableau to generate reports at that level. We can go from the granularity of a team, to the various projects they own, to even the specific roles which run on the likes of Mesos, Aurora, or Manhattan, and give them a detailed view of the utilization of resources and the quota associated with them. So we can show them charts of what was actually allocated and how much they've been utilizing over time. We do that at a daily granularity today.

Hey, so that is handled at the Aurora level. We have built a lot of account management policies on the Aurora side, and the information exported around quota, and to a large degree utilization, comes from Aurora itself, because it has a view of all the tasks running atop Mesos. We haven't yet migrated to using the Mesos quotas. Yes. Do you want to talk about it?

Can I repeat the question, please? I'm sorry. Currently, since we didn't have service identity, a lot of this was done manually, with reports coming from the systems. But once service identity is in place, from that point onwards any new services that are created will be identified through Kite, the unified cloud console, itself. We also have migration plans where we will import all the existing systems into the Kite console to do back-identification of the existing services. And to add more context to Vinu's answer, we do have an internal tool which creates and manages teams. We call it Roster internally, the bird-name pun, and we use the team and user management information from that tool in Kite to map all the services to the relevant teams and subsequently charge our bills.

Okay, I've got a couple of them. Yeah. So there are two portions to that question. The question is about how this inspired the engineering teams to be more efficient. In short, when we started this there was a lot of skepticism as to how it would work; yes, I believe that's what you're getting at. But over time we realized that while the dollar figures make sense for execs and the leaders of a group, for engineers it's important to show the utilization data in the form of resources they really care about. I'll tell you one funny story. Engineers really don't care about the dollar figure that much, but when they started seeing how inefficient their services were against the allocated resources, people proactively took on efficiency projects and tried to improve their systems. For example, I know one engineer who used to work on the ads team who went ahead and slashed almost 25% of the observability metrics written to our systems, which saved more than, I would say, $200,000 per month in TCO costs, and the engineer used that in their promo packet too. So that really incentivizes people; that story is one example of how people look at these numbers. I hope that answers your question.
Just to add to that: when engineers see that there are different resources offering almost similar things, and we show them the resource catalog and the cost around each, they are responsible enough to say, hey, if I have this information, I want to make better architecture choices so that, in the end, the services we build are cost-efficient. It's the natural responsibility of the engineers; they want to build services efficiently.

Do you want your services to contribute to an observatory? Right, so I'll try to answer this, and Vinu, please add on. We have a concept of onboarding a service into the Chargeback system, and that includes a series of steps: we expect the service owner to define how they manage tenants, we expect them to define what a resource means in the context of their service, and we expect them to integrate with the metering pipeline to send the relevant data for that computation. We know which team they belong to, so it's interesting: when we look at the Chargeback bill for that team, that essentially becomes their expense, and once that infrastructure, with all those details, charges out a holistic Chargeback cost, we can tie that back as income. That is one of the reports you saw, which I think I shared over here. So each of these infrastructure services belongs to a team, and we use the likes of Tableau to do that computation today. Do you want to add something to that?

Yeah, just to recap how we calculate the expense and the income. Income is what they get from charging those bills out, and expense is what they are charged. If you think about physical infrastructure as an infrastructure service, it gets the majority of its income from Aurora/Mesos, because most services run on the physical infrastructure, and that becomes an expense for Aurora/Mesos. Aurora/Mesos, in turn, has many other services running atop it, so when it was onboarded we said: if you are expensing that much, you should offer out your resources, and that becomes your income. That's how, as you see here, we net out income against expense and calculate the margin. While calculating the expense we also include the other infrastructure costs, the headcount, the SRE headcount, and any license costs, along with the headroom and everything. So expense is what they are charged, income is what they charge out for their resources, and we calculate the margin from that. Does that answer your question?
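To put illustrative numbers behind that answer, here is a tiny Scala sketch of the margin calculation being described; every figure here is made up.

```scala
// Hypothetical P&L sketch for one infrastructure service, illustrative only.
object InfraPnLSketch {
  // Income: what the infrastructure charges out to its customers.
  val allocatedCoreDays = 900000.0 // core-days allocated to customers that month
  val unitPrice         = 0.50     // made-up dollars per core-day
  val income            = allocatedCoreDays * unitPrice // $450,000

  // Expense: what it costs to run, including headroom and people.
  val serverDays      = 60000.0
  val serverTcoPerDay = 6.0        // made-up CapEx + OpEx per server-day
  val headcountCost   = 80000.0    // SRE / operations for the month
  val expense         = serverDays * serverTcoPerDay + headcountCost // $440,000

  // Margin: the goal is zero; a positive margin signals room to cut the
  // unit price, a negative one signals inefficiency or excess capacity.
  val margin = income - expense // $10,000
}
```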
Oh no, it's not budget; it's actually the Chargeback amount charged to other teams. We can then filter by an infrastructure to see what that infrastructure's income looks like. If I could open up Tableau I could probably show that. Yeah, it's pretty much the unit cost of their total utilization, charged out to the customers, that is their income. It's not that complicated; we want to keep this simple. We understand what their inventory looks like, so we know what Aurora's total capacity is; through their metering information we know how much of that is actually allocated to customers; and we have a unit price per resource. If you just multiply that allocation by the unit price, you get what their income looks like. I think you had another question here.

Is it open source, or is it an internal tool? I can't specifically comment on that, but we are getting more requests, so we are definitely trying to figure out a path towards open source.

I'm sorry? So we started off with a team of two. The challenge is really not building the system itself; the challenge is more in the design, catering to the needs of Twitter's infrastructure while at the same time accounting for the public cloud world as well. Most of the time was spent designing the entity models you saw. The second part where we spent a lot of time was the onboarding process, because many of these infrastructure services had to write specific adapters to integrate with Kite. That said, once we got past the first two or three services, which took about three to six months, we had an onboarding workflow and everything set up, which brought onboarding down to less than four weeks; you can onboard a new infrastructure in two to four weeks. Right now the team has grown beyond Chargeback into building Kite, so as you saw, we have about five people right now building the entirety of Kite. This involves building the other portions, the quota management, the deploys, and everything else per infrastructure as well, which is something we haven't tackled yet but are designing and working towards right now.

I'm sorry? Right, that's a great question: if we're onboarding the public cloud, why did we decide to build Kite, or even Chargeback, since it's already available as part of the public cloud? So, the difference between Kite and public cloud onboarding: Kite focuses on the service lifecycle manager, which is an abstraction on top of most of the public cloud dashboards you see, where you get direct access to resources like VMs and higher-level services. Kite captures what a project in your company actually means. So if you're building the next big backend system or UI project, that becomes a project in Kite, and Kite facilitates its lifecycle agnostic of whichever cloud you intend to run on. If you're running on Google Cloud, you can still see it as a project in Kite, along with the resources it's using and the chargeback for it, or on AWS, or whatever it is. And it works not only across the public clouds but also for the internal data centers. That's one of the reasons we had to build another layer of abstraction on top of the general public cloud resources. I hope that answers your question.
So regarding the number of metrics we capture: it varies quite a bit from one infrastructure to another. An infrastructure like Aurora might have fewer than ten metrics, if you think about CPU, GPU, core-days, memory, and so on, whereas an infrastructure like physical infrastructure or Manhattan can still have a simple metric like server-days but many offerings, clusters of machines or machine names, so the metrics can range from a few, in the tens, up to hundreds. Yeah, and I think it's compounded by the fact that you start multiplying offerings by offering measures; today I think we have more than 200 different metrics across nine infrastructures. Actually, sorry, I think it's more than that, 500-plus. Yeah, we have 500-plus metrics across around nine infrastructures.

No, we don't, actually; we store all of this in Vertica today for better analysis. Yeah, I think the key to what these folks have built is the entity model and the schema for how it all works. I believe we have more than 100 million rows in the Chargeback system at this point, because we collect meter data on almost a daily basis and then compute cost and everything based on that, per offering measure, per offering, and so on. But I would say it's pretty fast.

Cool. All right, thanks a lot, everyone. If you have any questions, we'll be available. So, yeah.