All right, thank you very much. Welcome to our talk, Evolving a Helper Script into a 180,000-Lines-of-Code Python Application. Misha has been at Google for about seven years; I've been there for nine months, so we have very different perspectives on the project. As a small caveat, I'll always say "we" when referring to the developers of this project, although naturally, since I've only been at Google for nine months, most of this happened before my time. It's also important to mention that we have a strong no-blame culture at Google. Although we'll be pointing out some pretty nasty design decisions, we're not pointing fingers at people. Things just happened, and there were motivations behind them. We'll focus on how we fixed them. As a last point, the project we work on is open source, so feel free to check it out on GitHub.

As a small intro: before 2009, Google had a bunch of workstations, and when security analysts got a signal that something fishy was going on, like a virus or a hacker attack, they needed to search for files on those workstations, fetch some files, and maybe inspect the Windows registry, because Windows is this big hog of security issues. The original solution was to ship a physical hard drive, which is really slow: you need to dismantle it from the computer and ship it somewhere else. A better way was to SSH into machines, but for that they need to be online, and they need to be Linux. Problems start piling up immediately. What if a machine is offline, maybe in a different time zone, or the person just closed their laptop? What if a machine is behind a firewall? What if it's not Linux but Windows or Mac? Especially on Windows you have the registry, the NTFS file system, all those specialties. What if I need to run some complex logic, say, check if a process is present, then fetch a file if it is and do nothing if it's not? And what if I want to do that on Windows, where I can only run .bat files and not a regular shell script? What if I want to run that logic on the whole fleet at the same time? And in a big investigation like that, what if I want to assess my preliminary results, maybe even in a nice web user interface?

In 2009 this was only wishful thinking; the tools for it were only starting to appear. We wanted to avoid shipping physical hard drives, dumping them, or SSHing into machines. So we built our own distributed live forensics software, because we're Google and we like to build our own stuff. The solution: security analysts wrote a little Python script that runs on workstations and talks to another little Python script that runs on a server. This is how GRR, our project, was born, and the proof of concept was this lean, clean, small, innocent 800-lines-of-code application.

Now, ten years later, it is still used at Google for incident response on a daily basis, and it's used widely in the industry because it's open source. The code base has seen 8,000 commits, over 30 contributors, and 180,000 lines of Python code alone, excluding all the JavaScript and other web frontend stuff. It's used to conduct really complex investigations. It deploys to all operating systems and has powerful OS-specific features, like accessing the Windows registry and raw file systems, including parsing NTFS ourselves, and analyzing process memory for signatures.
We have asynchronous task scheduling that allows scheduling tasks on a large fleet of laptops. We have a secure communication infrastructure designed for planet-scale deployment, because we have a couple of machines standing around. So, given a signal of a virus or something, we can search for forensic evidence on the whole fleet concurrently. And you can access all of this via a web user interface, a RESTful JSON API, and more.

How it got there was through experimental development. The original security analysts practiced this a lot: they launched quick experiments, like, hey, can we do this? Can we launch this? Can we build this? Python was an excellent choice for them, because Python is so good for rapid prototyping. What we have to say is that the original team members were security analysts, so they didn't necessarily have the biggest software engineering experience. Also, nobody knew where the development was going, hence the large number of experiments. Looking back now, we say: oh, these design decisions are really questionable. But that is due to the nature of the project; it was really experimental. And this is good, because some things that were tried as experiments have turned into industry-standard power features that weren't possible ten years ago but are now. But there's also a dark side: when an experiment lives longer than it should, you end up with experimental code in production that was never designed to live long.

As a small caveat, we're talking about Python development starting in 2009. That was the year of Avatar, the first big 3D movie released in cinemas. Python 3 had just been released. Type annotations would only be proposed five years later. GitHub was one year old. Travis CI and AppVeyor did not exist. Writing GRR today would look very different, because now you have great IDE tooling and static analysis, and all the frameworks are way better.

What we'll present now is a more or less chronological story of questionable decisions, why they were made, and how we fixed them. One prime example to start with is using pickle for long-term storage. The original software engineers faced the question: how can we dump and load an object, and get it back again, in our web server? With Python's batteries included, a really easy and tempting approach is pickle to the rescue. You simply import pickle, instantiate your data class, maybe we call it ClientInfo, it stores some info about clients, and you call pickle.dumps. It serializes most objects into a byte string that you can later unpickle again. It's all easy, and especially while your code base is small, it's very tempting to use.

Then we faced an issue: we had a data structure that has to be pickled, but it also has to be locked, because we need to limit concurrent access at times. Locks cannot be pickled, because they only exist in one process's memory, so pickle doesn't like them. What we did was add a small patch on top of the existing code base instead of revamping the whole system, because pickle is so easy to use: we implemented PickleableLock, a lock that can be pickled and is unlocked after unpickling. This is only a small thing. Looking back, it's an indicator of a flawed design, but if you pile small patches on top of each other, you might not notice.
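To make this concrete, here is a minimal sketch of both patterns. The names ClientInfo and PickleableLock follow the talk, but the code is a reconstruction for illustration, not GRR's actual implementation:

```python
import pickle
import threading


class ClientInfo:
    """Hypothetical data class holding some info about a client."""

    def __init__(self, hostname, os_version):
        self.hostname = hostname
        self.os_version = os_version


# Pickling "just works" for most objects, which is what made it so tempting.
blob = pickle.dumps(ClientInfo("workstation-42", "Windows 10"))
restored = pickle.loads(blob)  # back to a ClientInfo instance


class PickleableLock:
    """A lock that survives pickling: dropped on dump, recreated unlocked."""

    def __init__(self):
        self.lock = threading.Lock()

    def __getstate__(self):
        # threading.Lock only exists in one process's memory, so drop it.
        state = self.__dict__.copy()
        del state["lock"]
        return state

    def __setstate__(self, state):
        # Recreate a fresh, unlocked lock after unpickling.
        self.__dict__.update(state)
        self.lock = threading.Lock()
```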
Using pickle broadly in the code base leads to a sunk-cost effect: once you encounter bigger issues, you still keep piling small patches on top of each other. Unpickling is tied to the class definition, which means that if you move or rename a class and then unpickle old serialized data, you're in for a bad time, namely a runtime crash. We had many cases where production would melt down because a new developer renamed a class to a better name or moved it to a better location; pickle wouldn't find it and would just give up.

People helped themselves by applying another small patch on top of the existing pickle logic: we wrote our own unpickler, the robust unpickler, which monkey-patches the unpickling logic. If an object cannot be unpickled for any reason, we wrap the failure in a global try/except and just replace it with an UnknownObject. This solved the issue of production melting down, at the cost of showing in the user interface: hey, we have this unknown object, we cannot unpickle it. This was a neat fix, but things got worse, because we constantly had to tell users: sorry, you can't view your old data anymore, because we refactored our code base. To mitigate this, people started introducing aliases all over the code base, and soon one class had definitions in many different files referencing each other. It really turned into a mess.

To sum up: using pickle is an obvious hack, but it's really tempting and easy, and ultimately it's just impossible to maintain. The solution for us was a migration to protobuf, which gives you stronger guarantees at the cost of verbosity and less flexibility. This is how it looks now: you have a protobuf definition file where you define your structs, say, it has a name, it has a version, you define the fields, and then you code-generate the serialization and deserialization methods. The usage is similarly easy, but the main thing is that the protobuf language restricts you from making many incompatible changes, and this improves stability.
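The talk describes the robust unpickler as a monkey patch on pickle's internals; a subclass-based sketch of the same idea, with hypothetical names rather than GRR's actual code, looks like this:

```python
import io
import pickle


class UnknownObject:
    """Placeholder shown in the UI when a class can no longer be found."""

    def __init__(self, *args, **kwargs):
        pass


class RobustUnpickler(pickle.Unpickler):
    """Unpickler that swallows missing-class errors instead of crashing."""

    def find_class(self, module, name):
        try:
            return super().find_class(module, name)
        except (ImportError, AttributeError):
            # The class was renamed or moved since the data was pickled;
            # degrade gracefully instead of melting down in production.
            return UnknownObject


def robust_loads(blob):
    return RobustUnpickler(io.BytesIO(blob)).load()
```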
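And a sketch of the protobuf replacement. The message name and the generated module name are assumptions for illustration; the point is that the schema lives in a .proto file and the serialization code is generated from it:

```python
# Hypothetical schema, compiled with protoc into client_info_pb2.py:
#
#   message ClientInfo {
#     optional string name = 1;
#     optional uint64 version = 2;
#   }

from client_info_pb2 import ClientInfo  # generated module (assumed name)

info = ClientInfo(name="workstation-42", version=3141)
blob = info.SerializeToString()  # bytes, much like pickle.dumps
restored = ClientInfo()
restored.ParseFromString(blob)   # but renaming or moving the Python class
                                 # no longer breaks old serialized data
```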
Now, using pickle is so tempting because it's so easy, and Python metaprogramming is also really easy, so it's also really tempting. Around the same time, metaclasses were introduced. The original developers, around 2011, mostly used Vim or other command-line utilities that didn't provide the tooling that, say, PyCharm does now. Their annoyance was that many data classes were spread all over the code base; the project was new, so people moved things around all the time, and due to the nature of the project you need to reference the data classes from most other parts of the code base as well. Writing all these imports and refactoring them manually, without much IDE tooling, is annoying. So: maybe metaclasses to the rescue.

Please show your hands: who knows what a metaclass is? Oh, that's most of you. Who has written one? OK, not so many. Because we google things a lot, I have this great definition from Thomas on Stack Overflow, so thank you. A metaclass is the class of a class: a class defines how an instance of the class, an object, behaves, and the metaclass defines how the class itself behaves. What this allows you to do is run code at class definition time, to change the class or to change other things based on the class. And this is what we did.

We wrote this MetaclassRegistry. The code on the slide is a bit abbreviated; you might notice the base class is never defined, it's inferred from the bases in __init__. What MetaclassRegistry does is: if you inherit from it, it puts your class into a dictionary, referenced by its name. You see RDFValue, which is more or less the base class for all data types in our code base; it uses this metaclass. If you have a ClientInfo that inherits from RDFValue, you can easily access it by getting a reference to RDFValue.classes and asking for the name, and you get the class back. This is really easy to use and looks innocent.

What people really did, though, was not use any dictionary but the global namespace of rdfvalue.py itself. This enables this great usage where it looks like all your data classes live in rdfvalue: you just import one file and can instantiate them all. And now it looks less innocent, especially as the project grows and you get hundreds and hundreds of classes. Effectively, we turned one problem into two. Static dependency analysis is now impossible: the otherwise static module rdfvalue is a de facto dynamic dictionary, modified all the time during class definition in other files. Also, test startup time is vastly increased, because all the tests reference rdfvalue, which references every data class you have.

And it didn't stop there. MetaclassRegistry existed and was so easy to use, you could use it for everything. You have a class with twenty subclasses? Great, add MetaclassRegistry and you get a nice dictionary of all of them. Another example I'll show you is init hooks. Code from many different modules has to run at application startup, a plug-in-like system: the modules might not be closely related, but all their code has to run at startup. So what if we put all that code into a metaclass registry again? We reuse MetaclassRegistry and define InitHook, an interface with a Run method that is run at application startup. Say we need to initialize a result queue: great, just make it inherit from InitHook and it is automatically executed during startup. What could go wrong? Well, maybe we need to execute some code before other code. So instead of revamping the whole system and making it explicit, why don't we just add this small field, pre, that specifies a simple dependency graph? If you're wondering about the uppercase naming of Run, that's Google code style.

Now we've effectively turned one problem into n problems, because dependency management is hell. You forget to import your plug-in in the right file? Bam, your code is not executed: runtime crashes. Imports now have real side effects; they schedule code execution. The initialization order depends on the dependency graph, on the order of your imports, on the placement of classes in your files, and possibly even on the Python hash seed. Only God knows. We got whole new classes of obscure bugs, and we created a culture of unused imports, where you have to import a file just to schedule code execution.
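Here is a compact reconstruction of both patterns, the registry metaclass and the init-hook system built on top of it. This is a sketch in modern Python 3 syntax (the original was Python 2 era), and the hook names and the run-order logic are invented for illustration:

```python
class MetaclassRegistry(type):
    """Metaclass that records every subclass in a dict, keyed by class name."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if not hasattr(cls, "classes"):
            cls.classes = {}          # created once, on the registry root
        else:
            cls.classes[name] = cls   # every subclass registers itself


class RDFValue(metaclass=MetaclassRegistry):
    """Base class for our data types; subclasses register automatically."""


class ClientInfo(RDFValue):
    pass


# Any class is reachable by its string name. Convenient, but invisible to
# static analysis, and importing a module now has side effects.
client_info_cls = RDFValue.classes["ClientInfo"]


class InitHook(metaclass=MetaclassRegistry):
    """Plug-in interface: every registered subclass runs at startup."""

    pre = []  # names of hooks that must run before this one

    def Run(self):  # uppercase method names, per Google code style
        raise NotImplementedError


class StatsInit(InitHook):
    def Run(self):
        print("stats collectors ready")


class ResultQueueInit(InitHook):
    pre = ["StatsInit"]  # a string literal, so nothing checks it exists

    def Run(self):
        print("result queue ready")


def RunInitHooks():
    """Run whatever hooks happen to have been imported, in 'pre' order."""
    done, pending = set(), dict(InitHook.classes)
    while pending:
        runnable = [n for n, c in pending.items()
                    if all(dep in done for dep in c.pre)]
        if not runnable:
            raise RuntimeError("cyclic or missing 'pre' dependencies")
        for name in runnable:
            pending.pop(name)().Run()
            done.add(name)
```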
Then a new person joins the team, and their IDE says: hey, this file has a lot of unused imports. They think: OK, why don't I remove them? Bam, runtime crashes.

The solution is to move away from all the magic and make everything explicit and verbose. Server startup is now just a set of plain old Python functions executed in linear order, which makes everything really easy to trace. You also don't have unused imports anymore, because you have to access something from every file you import. We still need class registries, especially for the user interface, but now these are dictionaries, even better, frozen dictionaries, with explicit keys and explicit values. Everything is explicit, your IDE tooling works, static analysis works, and people can follow the code easily.

Hand in hand with this use of metaclasses goes the excessive use of string literals. Pickle and metaclasses are obvious Python power features; this one isn't as obvious, but if you use it too much, it creeps into being one. Python is so dynamic that you can access most things via their string names, and we used that a lot. For the dependency of an init hook, as you saw, we didn't reference the class, because then you'd have to import it; you just use its name, and the metaclass registry handles the rest. If you want to collect a metric, you just create a counter with a name, but you don't get an instance back; later, on a global singleton stats object, you call IncrementCounter with the same name again. Easy. We also have this concept of flows, which I'll get to in a second: in this get-master-boot-record flow we need to specify a callback, StoreMBR, but we don't point to the function, we just put its string name there. It's easy, it just works.

This might have been acceptable in 2009 when GRR started, but the whole community has since shifted toward static analysis, so people spot this as a bad pattern much earlier now; back in the day, we didn't. What happens is that your dependencies become implicit, because you don't import things, you just name them. Static analysis catches nothing: no error is caught before the unit tests. Code introspection is much harder, for both humans and machines, and refactoring is super error-prone: when you rename a class, you have to make sure you rename all occurrences of its name, and please don't rename too many if one class name happens to be a prefix of another.

The solution, again, is to use explicit references to classes, functions and enums, and Python's dynamic nature makes that easy, so please do it. This is how the code should have looked: we hold a reference to the dependency class; we get a counter instance and can call Increment on it; and in CallClient we specify the callback as a bound method. Python makes this easy, and we should use it. The previous slides said "from 2011 until now": we're still working on this, especially the stats part, which is work in progress.
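A sketch of what the explicit replacement looks like, with hypothetical names: startup becomes plain functions called in a fixed order, and the registry becomes a read-only mapping with explicit entries:

```python
import types


class FileFinder:       # stand-ins for real flow classes
    pass


class ProcessLister:
    pass


# A "frozen dictionary" registry: explicit keys, explicit values,
# read-only at runtime, and fully visible to static analysis.
FLOW_REGISTRY = types.MappingProxyType({
    "FileFinder": FileFinder,
    "ProcessLister": ProcessLister,
})


def InitStats():
    print("stats collectors ready")


def InitResultQueue():
    print("result queue ready")


def InitServer():
    """Startup is a plain function calling other plain functions in a
    fixed, linear order: trivial to trace, no import side effects."""
    InitStats()
    InitResultQueue()
```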
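And a sketch of the same idea applied to the string-literal problems: hold a real counter object instead of a name, and pass the callback as a bound method instead of a string. All names and signatures here are illustrative, not GRR's actual API:

```python
class Counter:
    """Minimal stand-in for a stats counter object."""

    def __init__(self, name):
        self.name, self.value = name, 0

    def Increment(self):
        self.value += 1


# Before: stats.IncrementCounter("mbr_collections"), a string lookup on a
# global singleton. After: an explicit reference you call methods on.
MBR_COLLECTIONS = Counter("mbr_collections")
MBR_COLLECTIONS.Increment()


class GetMBRFlow:
    """Sketch of a flow passing its callback as a bound method."""

    def CallClient(self, action, next_state):
        # Before: next_state="StoreMBR". With a real function object,
        # renames and refactors are caught by static analysis.
        next_state(b"fake master boot record bytes")

    def Start(self):
        self.CallClient("ReadLowLevel", next_state=self.StoreMBR)

    def StoreMBR(self, response):
        print("got %d bytes" % len(response))


GetMBRFlow().Start()
```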
Now a fresher example, from 2016 until now. Because GRR executes code on workstations, we have these flows on the server side. A flow is a very long-running process: a long-running, serializable state machine with asynchronous callbacks and complex domain logic. An example: the server instructs the client to collect a file. First, the server tells the client: please hash the file. The client hashes the file and sends back a response, which puts the server-side state machine into a particular state, runs a callback, and passes in the client's response. This goes back and forth a couple of times, and callbacks can take hours to complete, or maybe even days if it's a long weekend.

Because this can get quite complex, composition was introduced. Great: flows can delegate to other flows via this CallFlow method. For legacy reasons, CallFlow was really expensive for the data store. So instead of revamping the data store and making CallFlow fast, people said: why don't we just put all the state into one state machine, then we don't need to make the data store faster. For this, Python offers a great tool: mixins. Mixins to the rescue. While all the literature says composition over inheritance, we did inheritance instead of composition, and this is what you end up with. The FileFinder flow depends on fingerprint-file logic, multi-get-file logic and glob logic; all three are mixins, and all four classes depend on the flow base, some explicitly. And because we have to run code on the client as well, there is composition with client actions like HashFile, TransferBuffer, ReadBuffer, GetFileSet and Find. You get a vast class hierarchy that is both deep and wide: spaghetti code and lasagna code at the same time. This leads to tightly coupled logic across classes that implicitly expect other mixins to be present but never mention it in the class definition. The code is so fragile and complicated that nobody likes to touch it. CallFlow was slow; the solution should have been to make it fast, not to abolish good design patterns as a workaround. Mikhail will talk later about our data store and go a bit more into depth. This is still work in progress: now we have a fast CallFlow, and now we have to untangle the mess again.

All right, over to Mikhail. Hello, I'll take over from here. Before we get to the exciting story of revamping our data store, there's a smaller story of string type uncertainty. About two years ago we started migrating our code to be Python 3 compatible, and the most manual and hardest part of such a migration is migrating the code that deals with strings, because it cannot be automated. We found that we were particularly slowed down by the way we handled strings in our project. The problem was that inside the project we used a mix of unicode strings and byte strings, plus a few string-like objects of our own, and developers never knew, when they got an argument, what type a particular piece of string would have. So a little helper function was introduced, called SmartStr. It's called smart because it is smart about the type of its argument, or at least it tries to be: if it gets a unicode string, it encodes it as UTF-8, and anything else it stringifies. Here's an example: a rather innocent function gets an argument called cls, which is meant to be a class name. We don't know what it is; it may be a byte string, a unicode string, or one of our own objects. So we just convert it on the spot, because SmartStr guarantees that things will work. And it feels like a nice solution, until it doesn't, because effectively we turned one problem into two problems.
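Stepping back to the mixin tangle for a moment, here is a skeletal reconstruction. The class and action names loosely follow the slide, but the method bodies and their exact relationships are invented; the point is how implicit the coupling becomes:

```python
class FlowBase:
    def CallClient(self, action, **kwargs):
        print("scheduling client action:", action)


class GlobMixin(FlowBase):
    def GlobForPaths(self, paths):
        self.CallClient("Find", paths=paths)


class FingerprintFileMixin(FlowBase):
    def FingerprintFile(self, path):
        self.CallClient("HashFile", path=path)


class MultiGetFileMixin(FlowBase):
    def StartFileFetch(self, path):
        # Implicitly assumes a sibling mixin provides FingerprintFile;
        # nothing in this class definition says so.
        self.FingerprintFile(path)
        self.CallClient("TransferBuffer", path=path)


class FileFinder(GlobMixin, FingerprintFileMixin, MultiGetFileMixin):
    """Deep *and* wide: spaghetti and lasagna at the same time."""

    def Start(self, paths):
        self.GlobForPaths(paths)
        for path in paths:
            self.StartFileFetch(path)


FileFinder().Start(["/etc/passwd"])
```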
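And here is roughly what the SmartStr helper looked like, reconstructed from the description above. Note this is a Python 2 era sketch, since unicode does not exist as a type in Python 3:

```python
def SmartStr(obj):
    """Returns a byte string, no matter what comes in (Python 2)."""
    if isinstance(obj, unicode):
        return obj.encode("utf-8")
    return str(obj)


def RenderClassName(cls):
    # cls may be a str, a unicode, or one of our own string-like objects;
    # nobody knows, so we "just convert it on the spot".
    return SmartStr(cls)
```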
When we read this code later, we still don't know what the type was supposed to be: was it meant to be a UTF-8 byte string or a unicode string? But now we also don't know whether the developer maybe used SmartStr simply to stringify an object; because of SmartStr's behavior, it could have been anything. To understand the intention behind this code, we have to trace the full call chain and figure out what the caller of the function intended, and if there are multiple call chains, we have to trace them all. That is slow, and it is a lot of manual work. The solution is to remove SmartStr and have a clear policy on which strings you use where and when you convert. For us, this is an integral part of the Python 3 migration effort that we are currently finishing.

And now, the exciting story of our data store. We put it roughly in the final part of the presentation because the story of GRR's data store has all the bits we've already covered: pickle was used there for some time, it definitely used mixins, there was excessive use of string literals, and it relied heavily on metaclasses, both for the metaclass-registry use case we described and for some code generation and whatnot. It was a little monster that we created.

To give a bit of historical context: we needed a data store that would work on multiple backends. Initially that was MongoDB, then we switched to MySQL; we needed an in-memory implementation; and we also needed a Google-internal implementation based on Bigtable. Bigtable is a Google database technology where you essentially have a giant table in which all data is identified by a primary key, and you can query only by that primary key. That's rather limiting, but what you get is a distributed database that is very, very fast. Our design was broadly inspired by Bigtable, and also by the AFF4 forensics format. What we took from AFF4 was versioning: the idea is that you have a hierarchy of objects with attributes, you can define a flexible schema, and every attribute is versioned, which for forensics purposes makes total sense. Whenever you do an investigation, you want to preserve the full history.

The design of our AFF4 data store was such that objects were defined as Python classes, with the help of a somewhat metaclass-inspired Python domain-specific language we used to define schemas. These objects formed a hierarchical tree, and each was identified by a URN. At the bottom of the slide you see a sample of this tree: a client object can have a flow, and a flow can have a result collection. That's an example of the hierarchy.

What AFF4 was not designed for was storing a lot of data: each individual object was meant to be rather small and not have millions of children. It was also not designed for frequent object updates, because all objects were meant to be versioned. And it was not designed for queries on any attribute other than the primary key. It was, of course, used for everything it wasn't designed for very quickly, and what happened is that we developed a kind of inner-platform effect.
The inner-platform effect is what you get when your technology doesn't do what you need, and what you need is done by another technology, but instead of switching to that other technology you bend your own and manually reimplement all the things that the better-suited one does for you. In our case, indexes were such a thing: Bigtable didn't have them, so we started implementing them ourselves. We wrote a lot of code where, whenever an object is updated, we have to update an index; whenever an object is deleted, we have to update an index; and so on. That was fragile, error-prone and hard to maintain, and using something like MySQL would have given us the same thing much more easily.

Here is an example of how the code looked. You don't have to understand it; I'm not sure I understand it myself. But it shows a few interesting things. First, the amount of abstraction we introduced: there is a data store mutation pool, which is generally used to group multiple writes together, and probably does some other things too. Then there's MultiSet, and I guess you have to call MultiSet when you set more than one attribute, though I'm not sure about that. When you set an attribute, you don't simply provide the value, you provide a list, because you can probably set many values at the same time. And there's a replace=True keyword argument, which is probably there because we need to replace the attribute rather than preserve its history. Effectively, all this code is meant to do is update a user object when an API call is made to GRR.

So we replaced it with the new version. If you compare the two snippets, in the new data store abstraction the intention of the code is very clear, and the details of a leaky object-oriented data store abstraction no longer spill all over the code base. Because when you have this really powerful abstraction, this really powerful object-oriented database, that kind of code starts to spread everywhere: every object update has MultiSets and a lot of implicit knowledge. Another issue is that after working with this code for a long time, we became experts in it; we know its quirks by heart. But for people who join the team it is unintelligible, and since it's our own technology that we built ourselves, it has no docs. Code is the best documentation, except when it isn't: this code relies on metaclasses, code generation and whatnot, and is very hard to read, so for people joining the team, making sense of it is very, very hard.

So what we did is go from a very abstract data store, an attempt to build an object-oriented database, to a very concrete one. We switched from about 15 general-purpose methods to about 100 very narrowly focused ones. The reason this approach works well for us is that we still have to support multiple database backends, for example MySQL and Google Cloud Spanner, and even though both of these backends are SQL-like, they have vastly different performance characteristics. By keeping the methods narrowly scoped, we can make optimization part of each implementation.
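A hedged before-and-after sketch. The method names MultiSet, UpdateGRRUser and CountFlowResults echo the talk, but the signatures are invented for illustration, and a single stub class stands in for both generations of the data store:

```python
import contextlib
import time


class FakeDB:
    """Stub standing in for both data store generations (hypothetical)."""

    @contextlib.contextmanager
    def GetMutationPool(self):
        yield self  # real pools group writes; this one just prints

    def MultiSet(self, subject, values, replace=False):
        print("MultiSet", subject, values, replace)

    def UpdateGRRUser(self, username, last_request_time=None):
        print("UpdateGRRUser", username, last_request_time)

    def CountFlowResults(self, client_id, flow_id):
        return 0


db = FakeDB()

# Before: one generic, leaky abstraction. Every call site needs implicit
# knowledge: mutation pools, list-valued attributes, replace=True to
# suppress version history.
with db.GetMutationPool() as pool:
    pool.MultiSet("aff4:/users/alice",
                  {"aff4:user/last_request": [time.time()]},
                  replace=True)

# After: narrow, intention-revealing methods, so each backend (MySQL,
# Spanner, in-memory) can optimize its own implementation.
db.UpdateGRRUser("alice", last_request_time=time.time())
db.CountFlowResults(client_id="C.1000000000000000", flow_id="F:ABC42")
```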
The MySQL code will use the best technique for MySQL, the Spanner code will use the best technique for Spanner, and so on. The price we pay is verbosity: we have to implement a hundred methods, and every method has to be tested. Things tend to get repetitive, we have CountFlowResults, CountFlowLogEntries and the like, but the effect on the overall code base is great, because the code becomes much easier to follow and to reason about. Maintainability also increases: if your monitoring shows that CountHuntFlows became slow, you know where to look and roughly understand what the problem is. But if you see that, say, a generic CreateObject sometimes takes five seconds, you have to figure out which call triggered it, with what combination of arguments, or whether it was the mutation pool that affected it, and things quickly get out of hand.

A very important point: we were only capable of migrating to the new data store implementation thanks to comprehensive test coverage, because a migration like this touches effectively every file in the project, and without tests it would be impossible. And we do write a lot of tests. As you've seen in this presentation, some of our design decisions may have been questionable at times, but we do test them. Our tests may not be perfect, we have test methods that test more than one thing, or more than ten things, but we found there is a drastic difference between having no tests and having some tests. And what's important about Python is that it makes writing tests very, very easy, thanks to its dynamic nature.

To give you a flavor, one example is the regression tests for our API. We have to ensure that people who depend on the API can use it reliably, so we introduced regression tests. Every regression test has a Run method, and Run may perform one or multiple checks; each check is converted into an HTTP call, and the response is then compared with a golden file. An example golden file is on the slide: it pretty much has a bunch of metadata plus the response. The interesting thing is that the same code is used to generate or update these golden files.

Back to testing: we have multiple layers. We have unit tests. We have implementation-independent tests for the data store layer that exercise our implementations against the various backends. We have the API regression tests. We have Selenium tests for the UI that cover all the user stories. We have end-to-end black-box tests for GRR clients, because the clients have to run on all platforms, Windows, Mac and Linux. And, interestingly, we have self-contained end-to-end tests on a Windows VM; more on that in a moment, after a sketch of the golden-file pattern.
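To make the golden-file idea concrete, here is a minimal sketch of the pattern. The class shape, the file name, and the UPDATE_GOLDENS switch are assumptions for illustration, not GRR's actual test framework:

```python
import json
import os


class ApiRegressionTest:
    """Hypothetical sketch of a golden-file API regression test."""

    golden_path = "ListClients.golden.json"

    def HandleApiCall(self, method, path):
        # Stand-in for the real HTTP call against the API server.
        return {"method": method, "path": path, "response": {"items": []}}

    def Check(self, method, path):
        actual = self.HandleApiCall(method, path)
        if os.environ.get("UPDATE_GOLDENS"):
            # The same code path regenerates goldens on intentional changes.
            with open(self.golden_path, "w") as f:
                json.dump(actual, f, indent=2)
            return
        with open(self.golden_path) as f:
            golden = json.load(f)
        assert actual == golden, "API response drifted from the golden file"

    def Run(self):
        self.Check("GET", "/api/clients")
```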
About those self-contained Windows tests: we wanted to run end-to-end tests as part of our continuous integration pipeline, but for Windows that was hard, because you had to orchestrate multiple VMs. Our server is meant to run on Ubuntu Linux, and then you need a Windows VM to run the client, and so on. What we were able to do, thanks to GRR being written in Python, is run our server code on Windows without changing pretty much a single line. That allowed us to have the MySQL database plus the server plus the client all running on the same Windows VM, which makes continuous integration much easier, because you don't need to orchestrate anything. And regarding continuous integration: naturally, we use AppVeyor and Travis to run all the tests, including the Windows end-to-end tests.

An important thing to note here is the Python package split. Historically, GRR was a single package, one package with about 100,000 lines of code, and that was fine until it wasn't. What happens is that even if you have good test coverage, the fact that all your code lives in the same package creates a culture of everything referencing everything. Somebody creates a new data type and puts it into the client namespace even though it's not used in the client; or it gets refactored and isn't used there anymore; and it becomes a spaghetti of references. The consequence is that when you make a change, you may affect tests that should have nothing to do with that change: you change a line of client code, and boom, a Selenium test starts failing, which is something you generally don't expect. It took us quite a while to split the project into multiple independent pip packages. We have, for example, a client component, a server component, the admin UI, and so-called workers, and now each lives in its own pip package. The good thing is that this way you can clearly separate the code and ensure the components don't depend on each other: if the server pip package doesn't depend on the client package, then when you install it in a virtualenv and run all the tests, any attempt to import something from the client namespace will simply fail. It's a nice way of enforcing the separation.

A brief summary of what we talked about today. Python is a great language for prototyping, and it's also mature enough for production. GRR benefited greatly from Python's experimental nature, because GRR was not, and is not, a project that was designed on a blackboard and then implemented: it very much evolved, various directions were tried, and this is where Python really helped, because we could experiment fast and work fast. Python is also mature for production, especially today, with all the static analysis and type annotation features. Effectively, a lot of features that were introduced as experiments went through this process of productionization, where the crazier bits we've shown in the presentation were removed and replaced with explicit, clean code. And one thing: we only knew exactly what this explicit, clean code should do because we had run the experiment before; that's what allowed us to understand what a particular feature should do. So it's great that modern Python has these large-project-friendly capabilities.
It's interesting that these capabilities, static types in particular, do not play nicely with some power features like metaclasses and code generation. From our experience, power features have to be used with care: we had to eliminate almost all uses of power features like metaclass programming and the others we've shown here to make the code more readable, more maintainable and easier to follow. And last but not least: tests are absolutely necessary for a project's growth and maintenance. Without them, we wouldn't have been able to grow the project over these ten years. Thank you, and we're happy to answer questions.

We have about five minutes for questions, if anybody has one. There are microphones in the aisles.

Hi, thanks for the talk; it's interesting to see a large-scale Python project. Throughout this year's EuroPython I've heard a lot of talks about optimization, introducing compiled CPython extensions and so on. Since you run on so many machines, performance must matter for you too. How do you think about performance?

In the beginning we didn't think about it much, because Python was chosen not for performance but because the security analyst community knew the language well. In general, though, a lot of what we do delegates to C: the data store code does, and protobuf serialization and deserialization delegate to C, so that's not too slow, and it's something we can handle. We did have one bottleneck: the component that receives data from the clients has to sustain very, very high load. That component was rewritten in Go, because it was a constant critical point of failure for us. But the rest of the project stays in Python; Python's performance is generally enough for it, and we didn't rely on any specific optimization techniques, because it works for us.

Thank you. Two comments, more than questions. One is about the metaclasses used to register classes. This is one of the safest uses of metaclasses; I do it all the time, everybody does, so that's not your issue. Your issue is that you were using the strings directly as names, and that you were populating your global namespace with the classes; that was the real mistake, a big mistake. Having a registry built with a metaclass is good if you use it only for introspection purposes: you want to know the list of available classes and nothing more, maybe also to avoid having two classes with the same name, because that's confusing. But you should not use the registry for more than that; that, I think, was your problem. The second thing I wanted to say is about splitting a big monolithic project into pieces. We had a similar problem at work, and my boss said: no, we should even split it into different repositories, so you're sure that not everything imports everything from everywhere. What I did in the end was keep everything in a single repository under a global umbrella namespace, in our case openquake, with sub-modules underneath: a base library, a common library, and so on up to the applications, a layered set of about ten modules. And there are tests that check that you can only import from lower layers, never from upper ones; imports go in one direction only. That has worked very well, in my opinion.
I'll reply with two comments. That was precisely our logic with metaclasses: the global namespace stuff is a hack, definitely, but we thought, we just register the things we actually have to use. For example, we have these things called flows, and we have to display a list of them in the UI, right? So if we just use a metaclass and display everything, it's going to be fine. The problem is that once you have this registry that is effectively generated at import time, because classes get registered at import time, it becomes tempting to use it for a lot of things besides introspection, just because you have it. And the issue is: say you want to write a test, and you want only certain specific data in your registry for that test. That pretty much means you have to do a bunch of imports just so things get registered. You could maybe do it explicitly, but because importing is easier, that is what tends to happen, and then you start using imports for their side effects, which is precisely what we try to avoid.

That is really, really bad. Even without metaclasses, that is really bad; I agree.

Yeah, okay. I'm afraid we're out of time. Let's have a big hand for our speakers, Max and Mikhail!