Hello everyone. First, about me: my name is Christoph and I work in the QA department of our in-memory database, SAP HANA. This talk will give you a very short introduction to HANA itself, because it matters for the rest of the talk, and then show how we use Python to test SAP HANA.

So let's get started with SAP HANA. It's a relatively new SAP product, an in-memory database, which means we have storage engines, a column store and a row store, that are heavily optimized to run in the main memory of a server. With these optimized algorithms it's much faster to look things up, perform queries, and so on. It fits very well for online analytical processing, so you can run analytical queries on it, but you can also use it for transactional processing. Like a typical database you can install it on just one system, but you can also build scale-out systems: multiple nodes connected to each other to form one big database, across which you can then distribute your tables. For you as a Python developer the internals are actually not so interesting, because your interface is most of the time just SQL, so the very detailed insights don't matter that much, but who knows. HANA itself is written in C++, so not so interesting for us, but we deliver a lot of management tools and commands that are written entirely in Python, and one part of the HANA distribution is also a Python interpreter.

I said SAP HANA is an in-memory database, which means you need a lot of memory, but you can start small. We have, for example, a small express edition that you can run on your local notebook with just 16 gigabytes of memory. You can also scale out to systems with something like 48 terabytes of memory, and there are real customers running such systems, which is still very impressive.

If you look at how you can connect your Python application to SAP HANA, it's very straightforward and very simple. We have had a Python client basically since the beginning. With the next service package of HANA, the Python client will be fully supported and will also support Python 2.7 and Python 3. You interact with it over the typical DB API interface: you open a connection, open a cursor, run some SQL, and fetch the result. But since most Python developers don't write raw SQL anymore, and personally I can totally understand that, there are also open source projects for interacting with the database. There is a dialect for SQLAlchemy, so you can use SQLAlchemy very easily with HANA, which is a lot more fun than writing SQL, and there is another open source project that lets you use SAP HANA as a database backend for Django.
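As a rough sketch of that DB API flow, assuming the hdbcli client package and with placeholder host, port, and credentials, it looks like this:

```python
# Minimal sketch of the DB API 2.0 flow with the SAP HANA Python client.
# Hostname, port, and credentials are placeholders for illustration.
from hdbcli import dbapi

connection = dbapi.connect(
    address="hana.example.com",  # placeholder host
    port=30015,                  # placeholder SQL port
    user="MYUSER",
    password="MyPassword1",
)
cursor = connection.cursor()
cursor.execute("SELECT 'Hello' FROM DUMMY")
for row in cursor.fetchall():
    print(row)
cursor.close()
connection.close()
```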
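And a minimal sketch with the open-source SQLAlchemy dialect (the sqlalchemy-hana project; the connection URL here is a placeholder):

```python
# Sketch using the open-source SQLAlchemy dialect for SAP HANA
# (sqlalchemy-hana); the connection URL is a placeholder.
from sqlalchemy import create_engine, text

engine = create_engine("hana://MYUSER:MyPassword1@hana.example.com:30015")
with engine.connect() as conn:
    result = conn.execute(text("SELECT 'Hello' FROM DUMMY"))
    print(result.fetchall())
```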
So let's talk a little bit about testing a database. Testing a database is not so different from testing any software, because a database is just software. We have the very typical test pyramid: at the bottom, unit and component tests written in C++, because it's the main language of HANA and of all our developers. But once you start writing integration tests or very complex end-to-end tests, C++ doesn't fit very well anymore, and most of those tests are written in Python. One disadvantage is that the more complex integration tests are a bit slower and more expensive, while unit tests are typically much faster and much, much cheaper.

So let's take a look at our development process and how we integrate testing and quality assurance into it. A developer pushes a change into our Gerrit; Gerrit is a well-known Git code review system, and the whole HANA source code lives in one big Git repository. After the developer's push, we trigger quality assurance processes before the commit even reaches any kind of branch, so no commit gets into a branch without being built and tested. After the build and test processes are complete, along with other quality assurance steps like code analysis, style checkers, and sanitizers for the C++ code, there is a review by a dedicated team that looks at your test results and in the end votes: either it's good enough, or please try again and fix some failures. After the review, your change gets merged into the repository.

The landscape to build something like this is a very straightforward continuous integration setup, and a very common one; in 2010 it was the standard landscape. The developer pushes to Gerrit, Gerrit notifies our Jenkins CI server about the new change, Jenkins looks into its configuration to see whether there is a job configured for it, triggers that job, and places it in a queue; when a node with available resources is free, it grabs the job from the queue and executes it. Very straightforward.

Let's take a deeper look at what such a job looks like. It is basically divided into four parts: check out the latest source code, build the database from source, set up a complete database, and then run the tests. Very straightforward, and so far it looks like everyone basically does continuous integration. One special thing was already included in the 2010 version: because we are a database department, we have a central database where we store all of our test results, and developers can afterwards look at this data via a web UI. We can still access that old data, and I can still review test results from 2010. It's actually not so interesting anymore, but we are still able to do it.
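As an illustration only, such a four-phase job could be sketched as a small driver script; the helper scripts and the results-database call here are hypothetical placeholders, not our actual tooling:

```python
# Illustrative sketch of the four phases of a test job; the helper
# scripts and the results-database call are hypothetical placeholders.
import subprocess

def sh(*cmd):
    subprocess.run(cmd, check=True)  # abort the job on the first failure

def run_job(change_ref):
    sh("git", "fetch", "origin", change_ref)       # 1. check out the sources
    sh("git", "checkout", "FETCH_HEAD")
    sh("./build_database.sh")                      # 2. build from source
    sh("./setup_database.sh")                      # 3. set up a database
    result = subprocess.run(["./run_tests.sh"])    # 4. execute the tests
    store_results(change_ref, result.returncode)   # central results database

def store_results(change_ref, returncode):
    ...  # hypothetical: persist the outcome for the review web UI
```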
If you read the description of this talk, it is about scaling and how we scaled out the test infrastructure. So the main question is: why should I talk about scaling, when until now this looks like a very typical continuous integration system? Let me try to prove that we have some experience with scaling. Right now our system works for 600 developers, distributed across the globe. These developers push around 700 commits into the repository every day. We have more than 30 million lines of Python test code in our repository, and every day we perform 36,000 hours of testing; that is around four years of test runtime every single day. We do this on a landscape of roughly 1,300 Jenkins nodes, and these nodes are not small: in most cases we are talking about bare-metal servers, not small virtual machines, and we use around 408 terabytes of memory just for testing. We have much bigger systems in total, but 408 terabytes are used for testing alone.

So let's talk about how we did the scaling. I picked four topics. One interesting part is test runtime and how we optimized it. Test scheduling is also quite important. Artifacts are an interesting area, because you cannot move that much data around freely. And finally, you have to provide a very healthy test environment, especially if you test on bare-metal systems.

Let's start with test runtime. One slide shows the timeline of a Jenkins job of around eight hours; the job doesn't even fit on the slide anymore. So you push and then wait more than eight hours until your test result is available and you know whether everything works. From a developer's perspective that's not great. We started to optimize it by applying a very common pattern from computer science, divide and conquer, or in our area, divide and test. The first thing we did was separate the test job from the build job, which means we can now run the build on a machine optimized for building our product and the tests on a machine optimized for testing; a very common example is that build machines typically have more CPUs and test machines have more memory. This split actually increased the total time at first, because you now have communication overhead: you have to transfer the artifacts across the network to the other host. But it was a prerequisite, because now we can also split the test block into smaller test blocks. That has the additional benefit that if one test block fails, we can reschedule just that block and still keep the time until review at around seven or eight hours, which is good enough for now.

So let's talk a little bit more about test failures in our case. Tests can fail, and that is actually their purpose: you wrote something new, you didn't think about some seemingly unrelated component, and now a test is broken. When a test fails, our current strategy is to rerun it, to verify that this is a real regression, that you really broke something, and that the failure wasn't caused by something sporadic like network latency issues or general infrastructure problems. If the rerun is complete and still in a failed state, then we know it's a real regression, and someone has to look at the results and the traces and decide what the reason is.
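As a sketch, that rerun strategy boils down to something like this (run_test is a hypothetical callable returning True on success):

```python
# Sketch of the rerun strategy: a failure only counts as a real
# regression if a second, independent run fails as well.
# run_test is a hypothetical callable returning True on success.
def classify(run_test):
    if run_test():
        return "passed"
    if run_test():           # rerun once to rule out sporadic failures
        return "sporadic"    # e.g. network latency, infrastructure hiccups
    return "regression"      # failed twice: look at the traces
```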
So the main question is: who restarts failed tests? You can imagine a developer pushes something, then goes home, and really doesn't want to come into the office in the morning, see that a test failed, restart it, and wait another one to eight hours until the result is available. That is why we started to think about more intelligent test scheduling.

We thought more about test scheduling and found out that it is basically about two parts. The first part is configuration: which tests should run now, which tests are interesting for this change. For example, we have different configurations for our several hundred Git branches with different components. If you push to your topic branch, which is something like a feature branch, we run tests tailored to that particular feature. If you push against one of our integration branches, we run a huge suite of tests to avoid regressions in other components. We also added features like layered testing, which means we first run unit tests in our infrastructure, and only after those unit tests succeed do we run the expensive integration tests, and then the really expensive end-to-end tests. And since we have a large developer base and a large code base, it can happen that someone breaks a test; at this scale you cannot simply stop integrating new changes into your integration branches because of one broken test. That's why we also have features and ways to handle broken tests: you can, for example, move a test into quarantine, saying we know this test is currently unstable, there are some bugs in it, and it gets excluded from the execution to save runtime.

The other big part of test scheduling is observing the whole test run: if there is a failed test, reschedule it; if the run is complete, automatically perform a review of the results. And most importantly, after someone pushes something, they want to know when it's actually complete, because the tests run for eight hours. For this we implemented a more intelligent test scheduler, obviously in Python, because we really love Python. After a build, the build triggers our advanced test scheduler, which we call the waiter, and the waiter asks different systems about configuration and about the state of certain things. For example, if you push something that references a bug, we look at the state of that bug, and only if the bug is in a defined state, in process, do we start the test execution. After the waiter has decided which tests should run, it schedules them in Jenkins and keeps monitoring them; if there is a failure or a test block is missing, it reschedules it.

At such a scale you also have to talk about queuing and scheduling of tests. We have certain requirements: for example, our nightly tests should be finished by the morning; bug fixes are a bit more important than new features, so their tests should run at a higher priority; and we should prefer finishing the testing of commits that are not yet fully tested, for example when reruns are pending, over starting new ones. Jenkins only provides a first-in-first-out queue, and with FIFO it is really hard to implement such requirements. Our solution was, again, to implement it in Python, because we love Python. We built a prioritized test queue: the waiter puts tasks to run certain tests into this queue, the queue sorts them based on priority and on the content of the test task, and a processor fetches items from the prioritized queue and distributes them across Jenkins hosts and Jenkins masters. Distributing across multiple Jenkins masters is actually a required feature, because we just learned that Jenkins doesn't scale well beyond roughly 350 servers attached to one master.
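Coming back to the configuration part for a moment: a per-branch configuration with layered testing and a quarantine list could look roughly like this; the structure and all names are invented for illustration, this is not our actual format.

```python
# Hypothetical sketch of per-branch test configuration with layered
# testing and a quarantine list; structure and names are invented.
TEST_CONFIG = {
    "topic/my-feature": {
        # feature branches: only tests for this particular feature
        "layers": [["unit", "feature-tests"]],
        "quarantine": [],
    },
    "integration/main": {
        # cheap layers run first; the next layer only runs if the
        # previous one succeeded
        "layers": [["unit"], ["integration"], ["end-to-end"]],
        # known-unstable tests, excluded from execution to save runtime
        "quarantine": ["test_known_unstable.py"],
    },
}
```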
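And the prioritized queue itself can be sketched in a few lines with Python's heapq module; this is a minimal sketch, not our actual implementation. The priority rules (nightlies before morning, bug fixes over features, finishing started commits first) would be encoded in the priority number.

```python
# Minimal sketch of a prioritized test queue using heapq;
# a lower priority number means the task runs earlier.
import heapq
import itertools

class PrioritizedTestQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: keeps FIFO order
                                           # among equal priorities

    def put(self, priority, test_task):
        heapq.heappush(self._heap, (priority, next(self._counter), test_task))

    def get(self):
        _, _, test_task = heapq.heappop(self._heap)
        return test_task

queue = PrioritizedTestQueue()
queue.put(10, "feature-branch test block")
queue.put(1, "bug-fix rerun")   # jumps ahead of the feature tests
print(queue.get())              # -> "bug-fix rerun"
```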
So let's talk a little bit about artifacts. One thing is that our installer is a bit bigger than that of a typical software product or a typical Python package: it is still 15 gigabytes, and it lives on an NFS share. After a build is complete, the installer is placed on this share, and we install from there. We also have test data in various sizes, from something like four megabytes up to 800 gigabytes. Altogether we were doing nine petabytes of data transfer per week, just transferring the installer and test data to the hosts that actually run the tests.

To optimize this, we introduced some caching. We placed a very simple Python script, around 800 lines of code, in front of the installer call. It checks whether the installer is already available locally; on a cache miss, it fetches the installer from the NFS share, places it in the local cache, and then we can run the installer right away from there. We do the same for test data: when a test requests a specific test data artifact, we intercept the call, fetch the artifact from the central share to the local disk, and import it from there. These implementations are very straightforward and very easy, and with them we saved 66 percent of the traffic. We now transfer three petabytes of test data and artifacts per week across our network; that is still a lot, but it's better than nine petabytes.

The next very important thing, especially in such a test environment, is keeping the environment healthy. You have to make sure all your external dependencies are available: as I mentioned, we have these NFS shares with artifacts and test data, but we are also testing distributed systems, so your local host is not the only host that matters for your test run, and as we all know, external dependencies will always fail at some point. You also have to make sure your local system is in a healthy state. Since we run tests in parallel on the same host, we have to make sure there is no noisy neighbor around: a noisy test running on the same host that, for example, consumes all the available memory, in which case it's only logical that your test will fail, because there is no memory left. To solve this, we implemented a health check that runs before and after each test run and checks all these dependencies: availability of external services, local memory usage, CPU usage, and so on. Also implemented in Python, of course.
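The caching layer is conceptually very simple; here is a minimal sketch with placeholder paths (the real script is around 800 lines of code):

```python
# Minimal sketch of the installer cache with placeholder paths:
# check the local cache first and copy from the NFS share on a miss.
import shutil
from pathlib import Path

CACHE_DIR = Path("/var/cache/hana-installer")   # placeholder local cache
NFS_SHARE = Path("/mnt/build-share")            # placeholder NFS share

def cached_installer(build_id):
    local = CACHE_DIR / build_id
    if not local.exists():                      # cache miss
        shutil.copytree(NFS_SHARE / build_id, local)
    return local                                # install from the local copy
```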
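And the health check can be sketched along these lines; the concrete checks, hostnames, paths, and thresholds here are invented examples, not our actual checks:

```python
# Sketch of a health check run before and after a test: verify external
# dependencies and local resources; all values are invented examples.
import os
import socket

def check_share(path="/mnt/build-share"):
    return os.path.ismount(path) or os.path.isdir(path)

def check_service(host="artifact-server.example.com", port=443):
    try:
        with socket.create_connection((host, port), timeout=5):
            return True
    except OSError:
        return False

def check_memory(min_free_gb=32):
    # Linux-specific: MemAvailable in /proc/meminfo is given in kB
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / 1024**2 >= min_free_gb
    return False

def healthy():
    return check_share() and check_service() and check_memory()
```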
So what does it look like today? We still have Gerrit, but Gerrit no longer triggers Jenkins directly; there is no Gerrit trigger anymore. We now have dedicated infrastructure for building the database itself. After a build is complete, we get a notification and start our waiter, which schedules the test tasks, and then the test processors distribute them across our Jenkins landscape. The nice part is that if you look at what is currently running in our infrastructure, every blue box on this slide is now written in Python, and the build infrastructure is also heavily implemented in Python. And we still have our central database with all the results and the web UI to review them.

We are still heavy Python fans. It's great that the learning curve is so gentle that even non-developers can get started and write tests for the database. We are big fans of the community, so we rely heavily on open source tools like virtualenv, flake8, and pip, and we make heavy use of Sentry, storing a lot of exceptions there. We are big fans of the development velocity: performance is nice, but bringing a feature you had in mind in the morning into production by the afternoon is even better. And the fact that Python is platform independent allowed us to scale across architectures: we currently run tests on three different CPU architectures and more than ten different operating system versions.

Let me give you a very short outlook on what we are currently doing and trying to achieve. We are thinking about how to scale across 3,000 nodes, and we have some POCs running with Apache Mesos for a more resource-based scheduling approach. We are playing around a lot with Linux containers, currently Docker, though other container engines could also be interesting, to limit resources and ensure that a test run has enough resources locally available. And we are in the process of migrating to Python 3: a lot of the code base is still on Python 2, but some projects have already been migrated.

So, thank you very much. We are still hiring, so if you want to play around with a lot of memory, you should definitely talk to me; I think we have enough memory for everyone.

[Moderator] Thank you, Christoph. Are there questions?

[Audience] Hello, thank you for the presentation. I'm curious how you test failure scenarios. HANA has a distributed structure; do you test that on the Jenkins side, or in Python?

[Christoph] We test it in Python itself. With a distributed system like HANA, it's possible for developers to write test cases that say: now I would like to intercept certain network communication, for example, and then you can test how the system behaves without network connectivity between two nodes.

[Audience] Hello, thank you for your talk. One question: you mentioned you run the relevant tests first for topic branches. How do you do that? Does a developer have to define them, or are you able to determine that automatically?

[Christoph] Currently it's configured: the developer says, I would like to run these tests on my branch. But we are thinking about ways to map source code to test code so we can decide automatically what should run, and we also have some proof of concepts that use our coverage data for such things.

[Audience] Hello. How do you split up your test suites so that they run in under eight hours? Do you do it by hand, or do you have some interesting scheduling methods?

[Christoph] Currently mostly by hand. One test block contains multiple Python test scripts, and we try to pack them together to reach a runtime that is acceptable for us. We have some scripts that can generate these collections, but they are not very sophisticated.

[Moderator] More questions? We have a question.

[Audience] Are performance tests also part of the suite?

[Christoph] Yes, we also run performance tests on the same landscape. Many of the performance tests are written in Python as well, and we also do a lot of reporting and evaluation of the performance results with Python.

[Moderator] Cool. More questions? That's not the case, so let's thank Christoph again.