Hello, everyone. Hi, I'm Shyam, one of the organisers of Takila. First question: how many of you are attending Takila for the first time? Wow, a lot of new faces. Okay, and how many of you are visiting the Facebook office for the first time? Almost everyone. Cool.

If you don't know what Takila is, it stands for Test Automation and Quality Engineering Lah — yes, that's Singlish, lah. We started as the Singapore Appium Meetup back in 2018, doing meetups only about Appium, the mobile automation framework. Then our folks said they needed more than Appium: why don't we talk about the whole of quality engineering? So we moved to a broader perspective and began to cover all aspects of quality engineering. We got the name from our community — we ran an event for name suggestions, someone proposed Takila, which is quite a cool name, and we adopted it. That's the story behind Takila, and this is our 14th meetup in Singapore.

First of all, thank you, Facebook, for giving us the chance to be here. One request: please tweet about this meetup with the hashtag TakilaSG, because we won't be having a Kahoot quiz this time. The best five tweets get goodies — you may only receive them at the next meetup, but I do have some goodies here. You can tweet after the meetup; the best five tweets will win prizes. Also, in case you don't know, this talk is being recorded and will be publicly available after a few days. So I invite Raj to the stage, and Raj will talk to us about how Facebook does testing. Quite interesting, right? I'm so excited.

Thanks, Shyam. Welcome to Facebook Singapore. I'm excited to have Takila in our Facebook office for the first time — and if you like what we say, we can have more of it in the future. Today I'm going to talk about how Facebook moves fast with stable infrastructure, and I hope there will be some learnings you can take back to your work and apply there.

So what can you expect from this talk? At least three things. First, you'll find out what our team does at Facebook. I know there has been a lot of curiosity about what we're doing in our space within Facebook Singapore, and this talk should give you that 10,000-foot view — this is the first time we're talking about it externally, so I hope there's something new for you. Second, I'm going to share our philosophy about how we think of testing and test infrastructure. And last but not least, we'll share the top 10 learnings from our last one year of this journey. It was hard to narrow it down to 10, but I thought we should at least share the top ones. How does that sound? Are you excited? Then without further delay, I'll start.

We are the test infrastructure team for enterprise products, and we're part of a group called Productivity Labs. As the name indicates, we do a lot of fun experiments and take them to all our enterprise products — hence the name. And this is the talented team I have the privilege to support. The team is 100% built out of Singapore: it started in Singapore, it grew in Singapore, and believe it or not, this small team serves seven enterprise engineering locations across Facebook, all from Singapore. And there is Shreya — she's talking later, and to tell you the truth, I'm just warming you up for Shreya's talk.
So I'm sure you're waiting for that. Now, how many of you have seen this poster before? It's an integral part of Facebook culture: a few years back, this poster used to be on the walls of all our offices. Our engineers are extremely proud of the fail-fast culture, and I'd say a lot of credit has to be given to this mindset of taking risks and shipping faster — it has got us to where we are now.

But what happens when things break? At Facebook, when anything breaks, it breaks big. So big that people call the cops — and the poor authorities can't bring Facebook back up. So we have to be careful about what we break, because it gets all over the news, and nobody wants that kind of press. Over the years we learned that while we want to retain the "move fast" part of the culture — none of our engineers would want to compromise on that — we had to change our mindset: we can't break things anymore. When 2.8 billion users use your products monthly, you have a responsibility to keep those products stable. That's our new mantra: move fast with stable infrastructure. I took the liberty of adding "test" to it, because how can you move fast without breaking things if you don't have a stable test infrastructure? Test infrastructure is a subset of your overall infrastructure.

That led to our team's mission: provide stable test infrastructure for enterprise products that makes it easy for our engineers to write good tests that provide high-quality signals. I know it's a bit of a mouthful, but the key word is engineers: we build things for our engineers, giving them all the tooling they need to write tests with the least friction while still moving fast. And what happens when you do that? Quality improves automatically, because you've removed the obstacles that stood in the way of your engineers writing tests. That's our goal.

To see how the industry looks at this problem, here's a quote from the well-known book How Google Tests Software. You can see the similarity in the thinking: developers own testing, developers own quality, and teams like ours build the tools that enable them. The message is very clear — every product team is responsible for the development and testing of its own products, while teams like ours provide the engineering productivity tools so they can do a better job of it. So across the industry there's common thinking about how the future of this should look. We built a common infrastructure, on top of Facebook's existing state-of-the-art infrastructure, to support our enterprise use cases. Those use cases are quite different from consumer products, so we leverage the existing infrastructure and build on top of it for our enterprise goals.

At this point it's also worth talking about what our team doesn't do, because you have to be clear about what your team stands for so you can keep your focus. The number one thing: we don't own writing tests for our enterprise products.
That's not our ownership — our enterprise product engineers (we call them enterprise engineers, or software engineers) write the tests for their own products. Second, we don't just preach. If we don't write their tests for them, what do we do? We've taken a pragmatic approach: we work shoulder to shoulder with our enterprise engineers and get into the products. Each half we identify some high-impact products and get involved in writing tests along with their engineers — we pretty much pair with them. A small team goes in, understands their challenges — what's stopping them, what's slowing them down, is it a learning issue, a training issue, a tooling issue, a culture issue — and solves those problems. Because you can do all the tech talks, write all the wikis and blogs, and build all the infrastructure you like, but if nobody uses it, there's no point. It's very important that as we build things, the products adopt them and improve, and we achieve that by getting involved at a level where we understand their pain points and come up with solutions. And last, we don't limit our work to end-to-end testing. Engineers are often more inclined toward unit tests and integration tests and may not have a strong end-to-end focus, so we could have limited ourselves to filling that gap, but we try to look at it holistically — it isn't really about end-to-end versus unit versus integration. There's a lot of new work happening in this field, and I'll give a glimpse of it; we get involved in those kinds of projects too. Our scope is broad: everything related to testing and reliability falls under it.

With that, let's get to the most important part of the talk, where I want to spend most of the time: the learnings from this young team over the last one year. Some of these you may have heard before, and they may just be good reminders; others may challenge conventional wisdom, and you may not agree with them, which is fine. We'll have a Q&A at the end of the talks — bring your points of view. The whole idea is to learn from each other, and we can also talk offline about some of them.

Before we start, I want to make one point: writing good, reliable tests is an extremely hard job. A lot of people underplay this; they think writing tests is easy and anybody can do it, while writing product code is the complex part. If you're talking about writing bad tests — tests with a low probability of finding issues, tests that aren't stable or reliable — then sure, that work is easy. But writing really good, reliable tests is not straightforward; it requires solid engineering, and sometimes I've seen it be even harder than writing the product code itself. I think that's important for all of us to acknowledge, especially those of us in this space. So let me start with our first learning: one test equals one idea.
It's a pretty basic thought — nothing that immediately catches your eye — but the idea matters: make sure your test has just one goal, one objective. You're trying to get feedback on exactly one thing, and that should be the scope of your test. Don't test too many things in one test. Let's look at an example to illustrate the point.

Say you're sitting with your family after work, maybe watching a movie, and you get an alert on your phone: one of your end-to-end tests, "buy new product", has failed after running for 340 seconds. What can you make of that? All you know is that an important test failed — it doesn't give you much signal about what went wrong. What step failed? Did the landing page itself crash? Could users not search for the product? Could they not add it to the cart? Was checkout not working? Did the payment fail? Many things can go wrong in this one flow of buying a product, and with one big end-to-end test, all you know is that it failed. Now you have to go back to your Mac or Windows machine, pull up the logs, and start troubleshooting.

How about this instead? One test checks only the landing page: does it load properly? Another looks only at product search. One focuses only on add-to-cart functionality, one on payment, and one on the actual culprit — say, checkout. Now when the same alert arrives and you see "checkout test failed", it also gives you confidence that everything else did not fail: you can exclude the rest and know the problem is in checkout. Even without looking at the logs, you've got high-signal feedback you can act on.

And one test = one idea doesn't just give you high signal. It makes your code more modular: it becomes much easier for your peers to review your test code and give you quality comments, compared with reviewing a test that's 500 lines long. When you break tests down, they also run faster — instead of 340 seconds, maybe a few seconds each — and many small tests can run in parallel rather than one long test running in sequence. I'd also argue that smaller tests have fewer dependencies on each other and on previous steps, so they time out less often and have a higher chance of completing without issues. Overall, this learning is a reminder: when you write a test, be very clear about what it is you are trying to test.
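To make that split concrete, here is a minimal Jest-style sketch in TypeScript. The helpers (openLandingPage, searchFor, addToCart, checkoutOrder) are hypothetical app-driver functions, not anything from the talk — the point is only the shape: each test asserts exactly one idea, so a red test names the broken step by itself.

```ts
// Hypothetical app-driver helpers (declared for the sketch, not implemented).
declare function openLandingPage(): Promise<{ loaded: boolean }>;
declare function searchFor(query: string): Promise<string[]>;
declare function addToCart(name: string): Promise<{ items: string[] }>;
declare function checkoutOrder(orderId: string): Promise<{ status: string }>;

// One test = one idea: each test can fail for exactly one reason.
test('landing page loads', async () => {
  expect((await openLandingPage()).loaded).toBe(true);
});

test('search finds the product', async () => {
  expect(await searchFor('blue mug')).toContain('blue mug');
});

test('product can be added to the cart', async () => {
  expect((await addToCart('blue mug')).items).toHaveLength(1);
});

test('checkout completes for a prepared order', async () => {
  // relies on pre-built data, not on the add-to-cart test above
  expect((await checkoutOrder('draft-42')).status).toBe('confirmed');
});
```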
Let's look at the next one: invest in writing tests the right way. Now you may ask, what's the right way? There could be many ways. So let me rephrase it: when you write a test, think about the long-term life of that test. If it works fine on your machine right now, that doesn't mean it will work fine as part of CI/CD — and even if it works in CI/CD today, will it still work six months from now? Once you write a test and ship it, you shouldn't have to think about it or spend effort maintaining it; it should be robust enough to keep running for a long, long time, unless a product change requires you to update it. In short: don't cut corners.

So why do tests fail? One of the most common causes is the prerequisites — what we could call the setups: the users you log in with, the data your test relies on, a teardown that didn't complete successfully, an environment that wasn't set up the right way. All of these can make your test fail and give you a lot of noise. Let's look at an example. Say you're testing a feature for minors, to be used by someone who is 16 to 18 years old — that's your business use case. So what do you do? You create a permanent test user, say someone who is 17 years and 6 months old at creation time. Your test runs fine for six months. What happens after six months? The person turns 18, and that day your test starts failing. You had a perfectly stable test running fine for six months, you did all the "right" things, and one day it breaks. Now you have to figure out whether it's a false alarm or a real regression, and you spend effort on something that was avoidable. What could the solution have been? Every time you run the test, create a new user, and control the user's age — don't assume the user is going to remain under 18 forever. If you had gone that one extra step to control the age, you would never have fallen into this trap.

That's where the concept of data builders comes in. A data builder pre-creates the data before your test runs, as a setup activity, so you don't have to worry about the state of that data changing — even accidentally. With permanent data, somebody else could be using the same records and, as part of a different test, change their state and break your test; that's a collision. How do you avoid it? Don't rely on someone else's data; use your own data builders before the test. (If you create the data inside the test itself, the test gets slower — so there are ways to make the data cacheable and available up front.) The whole idea: never rely on existing data, always create your own, and keep full control of the data.
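Here is a minimal sketch of the age example, assuming a hypothetical test backend exposing createUser and a feature check canUseMinorFeature (declared, not implemented). The fix is to pin the age relative to "now" on every run, so the user can never silently turn 18.

```ts
interface TestUser { id: string; birthDate: Date; }

// Assumed test-backend APIs — hypothetical, for illustration only.
declare function createUser(opts: { birthDate: Date }): Promise<TestUser>;
declare function canUseMinorFeature(user: TestUser): Promise<boolean>;

function yearsAgo(years: number): Date {
  const d = new Date();
  d.setFullYear(d.getFullYear() - years);
  return d;
}

test('minor-only feature works for a 17-year-old', async () => {
  // built fresh on every run: the user is 17 today, and still 17 next year
  const user = await createUser({ birthDate: yearsAgo(17) });
  expect(await canUseMinorFeature(user)).toBe(true);
});
```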
Let me give another example where this is useful. Take the buy-a-product flow again: the landing page works, search works (given that the landing page works), but the add-to-cart test is failing — and let's say there's a genuine UI bug causing it. Without a data builder, your subsequent tests are blocked: with add-to-cart failing, you can't check out, you can't test payment, because one bug is blocking the flow for the rest of the use case. You're getting no signal on those last two tests, and there could be another issue hiding in there.

How could you overcome this with a data builder? Could you make your checkout test not depend on add-to-cart? Say the bug is in the add-to-cart UI, and you have a data builder — call it a draft-orders builder — that creates draft orders: it adds products to carts and keeps them there, in the database, ready for your test to pick up. You can do it through the backend, through APIs — basically bypassing the failure that's blocking you. Now your checkout test relies on the builder and runs with no dependency on the previous test's failure. There's a bonus, too: if I'm testing checkout and I also do search and add-to-cart inside the same test, I'm making the test slower with operations I don't need. Going back to the one test = one idea concept: why search for something and add it to the cart when your focus is checkout? Just do the checkout. Your test becomes less dependent on everything else, and you can leverage the builder everywhere.
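A sketch of what such a draft-orders builder could look like. The endpoint, payload shape, and checkoutOrder helper are all illustrative, not Facebook's API; the point is that the cart is seeded through the backend, so checkout stays testable even while the add-to-cart UI is broken.

```ts
// Hypothetical builder: seeds a cart via the backend API, bypassing the UI.
async function buildDraftOrder(productId: string): Promise<string> {
  const resp = await fetch('https://shop.example.test/api/draft-orders', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ items: [{ productId, quantity: 1 }] }),
  });
  const { orderId } = (await resp.json()) as { orderId: string };
  return orderId;
}

declare function checkoutOrder(orderId: string): Promise<{ status: string }>;

test('checkout succeeds for a pre-seeded draft order', async () => {
  const orderId = await buildDraftOrder('sku-123'); // no add-to-cart UI involved
  const receipt = await checkoutOrder(orderId);
  expect(receipt.status).toBe('paid');
});
```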
Okay, on to the next learning, which is related: make your tests isolated. It's common wisdom that tests are better when isolated, but it especially matters for end-to-end tests. A lot of people think end-to-end means a whole round trip with every component involved: one application talks to another, which triggers something back, and now you have a complicated flow jumping between different applications and products, possibly involving third parties. It's very important that even for an end-to-end test you define your boundary. If you take "end to end" literally and don't define a boundary, you'll have an extremely noisy, unreliable test that fails more often than not. It's fine to mock external interfaces you don't care about. I know a lot of people think: if it's end-to-end, why would I mock anything? Actually testing against the third party is the whole point! But you have to decide: are you testing your product, or your product's integration with some other product? If you're testing your product, limit the scope, define the boundary, and say: nothing goes outside our systems, so that we have more control over our environment.

Let me give an example on the same flow: checkout works fine, payment fails. Why is payment failing? Well, your payment service talks to, say, a Visa or Mastercard gateway; you make a real call to it and do a one-cent payment — that's how your test works. But the external service could be down, your ISP could be down, there could be an issue with your proxy servers. You're relying on too many variables, and any one of them failing will fail your payment test. When it fails, it tells you something is wrong somewhere in the end-to-end flow with that payment provider — but it doesn't tell you where the issue lies. If what you really want is to find issues in your payment business logic, you don't care whether that particular external service is down; you know your test would recover once the service comes back, so you don't even want your test to fail for that. Instead, I could implement a very intelligent mock that acts like the whole payment service. The key is where you cut: you don't mock the business logic part — it's very important at what point you mock. You literally simulate the payment gateway — your third party — inside your own environment, removing the network and all the other factors outside your control. You're not skipping any business logic; you have an intelligent payment-gateway mock, and if your payment test passes against it, you know the earlier failures were just the third party being down.

The one thing left is: how do you make sure the real third party is up and running? You don't have to check that in every payment test. You could have a separate monitoring test that only verifies your integration touch points are working. Keep your functional tests functional; don't complicate them with calls to third parties that are not in your control. Does that make sense?
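Here is a minimal sketch of such an in-environment gateway mock, written as a tiny Node HTTP server in TypeScript. Everything here — the port, the payload shape, the decline card number — is an assumption for illustration; the idea is that the mock honours the gateway's contract so none of your own business logic is skipped, while network, ISP, and third-party uptime drop out of the equation.

```ts
import * as http from 'http';

// Simulates the third-party payment gateway inside your own environment.
const mockGateway = http.createServer((req, res) => {
  let body = '';
  req.on('data', (chunk: Buffer) => (body += chunk));
  req.on('end', () => {
    const charge = JSON.parse(body) as { amountCents: number; card: string };
    // Behave like the real gateway's contract: approve the happy path,
    // decline a designated "bad" card so failure paths stay testable.
    const ok = charge.amountCents > 0 && charge.card !== '4000000000000002';
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ status: ok ? 'approved' : 'declined' }));
  });
});

// Point the product's (hypothetical) PAYMENT_GATEWAY_URL at this in tests.
mockGateway.listen(4242);
```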
Okay, so don't be the boy who cried wolf — that's my next point. If your test fails 8 out of 10 times, and most of those failures are noise — not real product issues, just something going wrong with your test — what happens? People start ignoring your failures. People start ignoring your test: "we can safely ignore it." And when it finally fails on a real issue, people will still ignore it. If you don't make your test reliable, if there's too much noise, people rely on it less and less — they assume it's probably a test issue and don't look — and you miss a genuine bug. That's the noise-versus-signal point. In our terms, noise is anything like a false alarm that leads to nothing productive — a waste of time. Signal is what you want the test to find — a bug is good signal. You want to reduce the noise and improve the signal. Now you may ask how. There are many ways; one very simple thing that helped us a lot is this: if your test fails, it should fail for a genuine reason. If it makes more sense that the test shouldn't even run, then skip it dynamically, on the fly — don't fail it. Skipping a test is much better than failing it and creating a lot of noise.

So decide what you want to do with your test. Do you want it to fail? To skip? The third option: you know the test is failing, it's not going to be fixed in two hours, there are dependencies to untangle before it can be properly fixed — then temporarily disable it. So you have three options. You want your test to fail when it genuinely should fail. When you know a failure would be pure noise, detect that and skip. And when your test is broken — failing for the wrong reasons and needing more work — temporarily disable it rather than letting it run red in every CI/CD pipeline while everyone says "oh yeah, we know about that one, let it fail". The signal you send with that attitude is that it's okay for tests to fail. Better to disable it and not run it at all — why waste resources and give people a false report when you know the test isn't doing what it's supposed to do?

Let me show this as an example: if your landing page test is failing, does it make any sense to run the rest? If it's a UI test and I know the landing page doesn't work, there's no point searching for something, no point adding to the cart, no point checking out. I could fail all five tests, raise a pile of alerts, and an engineer looks at it and says "oh, payment is failing" — when really the root cause is the landing page. What you can do, smartly, is let one test fail and skip all the others. People see exactly what's failing, they concentrate on that, and they can ignore the skips — and you have a good reason for each skip, and you can keep an eye on what's being skipped.
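One way this dynamic skip can look in Jest (a sketch, not Facebook's mechanism): assume a global setup step, not shown here, has already probed the landing page and exported LANDING_PAGE_UP. Dependent tests then register as skipped instead of failing.

```ts
// Precondition probed once, up front, by a (hypothetical) global setup step.
const landingPageUp = process.env.LANDING_PAGE_UP === 'true';

// Run a test only when its precondition holds; otherwise report it skipped.
const testIf = (cond: boolean) => (cond ? test : test.skip);

test('landing page loads', async () => {
  // this one is allowed to fail: it carries the genuine signal
});

testIf(landingPageUp)('search finds the product', async () => {
  // skipped, not failed, when the landing page is the real culprit
});

testIf(landingPageUp)('checkout completes', async () => {
  // same: one red test and the rest skipped, instead of five alerts
});
```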
Let me go to the next one, which is related to the previous one: if you don't prioritize fixing your disabled and failed tests, nobody else will. We have to set the example. Whether you're adding a new test, doing a sprint, or planning, you should stop what you're doing and fix that broken test or that disabled test. This is quite important, because the more broken and disabled tests you have, two things happen: first, those tests aren't doing what they're supposed to do, so you're missing bugs that slip through; second, your dashboard looks red, people get used to it, they think "it's always red, we can just ignore it" — you're creating a lot of tolerance for red in your dashboard. So my message is: focus on fixing your tests at the highest priority. That's how we've done it on our team. It's better to have quality tests than a large quantity of them. Prioritize keeping your house in order; keep the trunk always green. If your trunk isn't green and you have a lot of things failing on master, with what confidence are you releasing? You want your trunk always green, or as close to that as possible.

There are many ways to actually do this. One very common one is a big monitor flashing a dashboard: how many tests are broken, how many are flaky. That metric puts pressure on everyone in the team to work on them and fix them. The other thing that has worked well for us is automated rules that send alerts to people when their tests are broken. You find the owner of that particular issue: if it's a code issue, the engineer fixes it; if it's a test issue, the test owner fixes it — it could be the same person for both. You send automated periodic reminders: "your test has been broken for the last two days — what are you doing about it?" And you don't do it manually; you automate it: broken for one week, two weeks — are you going to fix it? A chat reminder or message — not naming and shaming people, just nagging them, poking them, reminding them it's important. You don't have to do everything by hand; use automation for the mechanical, boring stuff and still get people's attention.

Next one: delete tests. I'd say this is the most controversial one. We decided we should delete tests that have been disabled for more than 120 days — submit a code diff and just delete the test. A lot of people felt that was too extreme: "why are we talking about deleting tests? It's important, I just haven't had the time to work on it." But if somebody hasn't had time to work on it for four months, the chances are it's either not important, or nobody cares, or nobody is ever going to fix it. It's better to be honest with yourself than to live in the false hope that your disabled, broken tests will magically turn green one day — they won't, unless people work on them. So set a hard rule: we fix these tests within three or four months, and beyond that, if a test is still broken, we get rid of it. Constantly clean up your bad tests; that's how the quality of your suite improves. Otherwise, in a few years you'll have a suite where 30% of the tests are broken and people will say the whole suite is a waste. Get rid of the bad tests that aren't doing their job, and write them again. A bit extreme, but it works.
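Both the nagging and the 120-day deletion rule are mechanical enough to automate. A sketch of what such escalation rules could look like, assuming a hypothetical test-health store; the thresholds and message texts are illustrative, not Facebook's tooling.

```ts
// Hypothetical record shape from a test-health store.
interface TestRecord {
  name: string;
  owner: string;
  brokenDays: number;    // consecutive days failing
  disabledDays: number;  // consecutive days disabled
}

// Escalation ladder: chat nudge -> task -> delete diff, per the talk's rules.
function escalate(t: TestRecord): string | null {
  if (t.disabledDays > 120)
    return `submit a diff deleting ${t.name}: disabled for over 120 days`;
  if (t.brokenDays >= 14)
    return `file a task for ${t.owner}: ${t.name} broken for two weeks`;
  if (t.brokenDays >= 2)
    return `chat nudge to ${t.owner}: ${t.name} broken for two days`;
  return null; // healthy enough — no nag
}
```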
Going to the next one: end-to-end testing is not a silver bullet. We all know this — we've all heard about the test pyramid — but I still wanted it here as a reminder. If your real problem is something else — you don't have good unit test coverage, you don't have good integration test coverage — don't try to solve it with end-to-end tests. You won't solve that problem; you'll make it worse. I'd say don't even bother writing end-to-end tests if your unit and integration tests aren't solid, aren't doing what they're supposed to do, or don't have good coverage. It's very important that 80 to 90% of your tests are unit and integration tests. Keep your end-to-end tests for the critical few flows you care about most. Have that conversation with your team, and don't fall into the trap of filling the gaps in unit and integration coverage with end-to-end tests — plenty of teams have done it, and they've failed epically. End-to-end tests shouldn't be your first choice; they should be your last. Think of it as working up the pyramid.

Next one: there is a 70% fix rate by developers if test results are reported within 10 minutes for each pull request. Every time there's a pull request, a set of tests is run. You can't run all your tests — that's a test selection problem, deciding which tests to run, and it could be a talk of its own; maybe we'll do that in the future. But assuming you know which tests to run, if you can afford to run them within 10 minutes and give that feedback to the engineer while they're in the IDE, writing the code and trying to ship it, there's a 70% chance they'll fix the issue. This is based on data from within Facebook. Why 10 minutes and not an hour? After a lot of experiments, we landed on 10 minutes: your code is up for review, there's collaboration going on, a human is commenting on that code, and you may want to merge it into master within minutes, not hours. So your tests have to be fast enough — and smart enough — to tell the engineer "this test is broken, you can fix it" while they're still working through the human review comments. Your tests become far more valuable that way, because you're preventing problems from ever reaching master. That's the sweet spot — I've highlighted it in the box there: if your tests run as part of the pull request step, before the code merges to master, you get the maximum ROI.

What's the next best thing? Land-blocking and push-blocking tests. Say you've done the previous step and you still need a bit more coverage — a bit more of a safety net; that's where land- and push-blocking come in handy. Look at your CI/CD flow: land-blocking tests run when your code is being merged to master, to prevent bad code from landing on master; push-blocking tests prevent bad code from going out of your system to production. With both, you keep master from getting polluted and you stop bad code from shipping. One way to think about it: not every test can be land- or push-blocking — if you make every test blocking, your land and your push get too slow, and you can't afford that. You have to identify your high-quality, high-signal tests. High quality means you know for sure that if they run 1,000 times, 995 times they'll work fine — that level of confidence. High signal means the check matters so much that if it fails, you don't want the code to go through; you want to stop it right there. If you have that, then when a land-blocking test fails, the code doesn't get merged to master; when a push-blocking test fails, the code doesn't get deployed to your production tiers.
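A sketch of the promotion rule this describes, with illustrative numbers (the talk's "995 out of 1000" is roughly a 0.5% flakiness bound; the code shape is an assumption): only fast, reliable, high-signal tests gate landing, and the same reliability bar gates the push.

```ts
interface TestStats {
  flakinessPct: number;  // measured from repeated runs
  p95Seconds: number;    // typical duration
  highSignal: boolean;   // "if this fails, the code must not go through"
}

// Land-blocking: gates the merge to master, so it must also be fast.
const isLandBlocking = (s: TestStats) =>
  s.highSignal && s.flakinessPct < 0.5 && s.p95Seconds < 60;

// Push-blocking: gates the deploy to production tiers.
const isPushBlocking = (s: TestStats) =>
  s.highSignal && s.flakinessPct < 0.5;
```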
The last one is a quote from Abraham Lincoln, and it's the takeaway here: if you have six hours to cut down a tree, spend the first four sharpening the axe. Applied to our world: invest heavily in building your test infrastructure. That can mean a lot of things — your CI/CD pipeline, your frameworks, your fallback mechanisms, your smart mocks. If you do that, the effort of writing tests automatically goes down, because a lot of the time people are struggling with their infrastructure, not with their tests. Yes, follow all the best practices and the tips we talked about while writing your tests, but it's even more important to have a strong foundation for those tests to run on. Spend more time on that, and writing tests becomes easier.

With that, let me summarize what we learned today. One test, one idea: focus on one thing and get it right. Make tests isolated, end-to-end or not: get comfortable with taking out the variables that give you no signal and only make your tests less reliable; focus on testing your product within the scope of your boundary — another test, done a smarter way, can cover the rest. Write tests the right way: think long term, not "let me just get this test in and fix it later". Write quality tests, do the right setup and the right teardown, have full control of your environment, and build the test so it won't fail unless something genuinely changes in the system. Noise versus signal: this is about trust, and about productivity. You don't want people to start feeling these tests aren't good enough to rely on; and you don't want to slow people down — every time a test fails and you spend half an hour in the logs only to find it was a test-user issue, that's not time well spent; that's lost engineering productivity. Reduce the noise, and bubble up the good things your tests find — the bugs. Don't put your broken, failed, or disabled tests on the back burner: attack them, stay on top of them, set the example. When a test breaks late in the evening, treat it as something worth understanding, not "it's okay for tests to fail". And if you're still not fixing a test after a while — you can decide the right window for you — then delete it, and keep a genuinely high-quality suite where you can say: 99% of the things here work, and I have high confidence. Don't go overboard on end-to-end tests; as we discussed, that may not be the problem you're trying to solve — focus on the right problem. Run tests as early as you can — running them as part of the PR is great; give that early feedback, and engineers will appreciate it.
They can stop their own code from getting merged to master, and the fix rate will be much higher. Use land- and push-blocking tests: if you have good tests but they fail, or don't get run, and the code is released anyway, what's the point of having those tests? You can always say afterwards, "this issue happened in production; I had a test that failed; somebody ignored it, or somebody didn't run it, and it slipped into production" — but there's no point regretting it later. If you really care about a test, set the rule: if this test fails, the code stops. And if somebody overrides that, they're taking responsibility for why they think that's a good idea. If your test is reliable, you should be confident about it. And lastly, invest in building stable infrastructure; it goes a long way toward helping your teams write reliable tests with less friction.

With that — there's a lot of great work still to come. As we say at Facebook, the journey is 1% done. We've just started; there are so many things we could do. We haven't focused much on mobile platforms or performance yet — a lot of that is in our pipeline. There are also a lot of great tools out there — fuzz testing, mutation testing, tools that leverage artificial intelligence and machine learning algorithms — and we want to adopt them. What if you didn't have to write tests at all — if your tests were auto-generated and they were as good as your functional tests? You'd want that. You have to judge what to use where, and there's a lot of research work we're doing, exploring those frameworks and bringing them into our space. By the way, we are also hiring. There's a survey that will come to you, and if you have recommendations, you can always recommend people. We'll do the Q&A after Shreya finishes her session — so let me invite Shreya to the stage to talk about framework reliability.

Yep — as I've already been introduced, my name is Shreya Bhatt, and I work with Raj on the enterprise engineering team. As most of you have probably experienced, E2E has bad PR. Most people don't want to write end-to-end tests; most of the time it's "remind me tomorrow, I don't want to write end-to-end tests right now". One reason is that it can feel like running an actual marathon would be faster than your test run. Another is that it's like flipping a coin: a test failure could be a genuine product issue, or it could be a flaky test. And here's one more: unreadable code — for example, that unhelpful comment in the middle about why there's a sleep in the code. This is something we're all plagued with, and Facebook has these problems too: slow, unreadable E2E tests. That's why we have test frameworks that try to take the weight of all these issues off the test writer. We have a bunch of frameworks: an integration and unit test framework — one of them open source, which you may know, called Jest — plus backend-services end-to-end frameworks, mobile UI frameworks, and web frameworks, where we have a Selenium-based framework as well as something new and upcoming. For the sake of this talk, I'm going to use Jest E2E as the baseline for talking about framework reliability,
but the ideas I'll present can be used for any kind of framework — they're fairly generic. So what is Jest E2E? It's a framework written on top of Jest, the unit test framework I just mentioned, and it uses Puppeteer. For those who don't know, Puppeteer is an API for automating Google Chrome; it supports headless mode as well as headed mode for running these tests. One of Puppeteer's major advantages is the control it gives you — over what runs, over the network, the timing, the JavaScript rendering, all of that. How is this useful for us? I'll explain in the next few slides, but Puppeteer has been a really good foundation to start from.

The agenda for this talk is flakiness — something I think plagues all of you. We'll talk about what flakiness is, how we determine it, why we track it, the common reasons for it, and how we fix it. So, what is flakiness? It comes down to understanding test determinism. What's a deterministic test? If its failure tells you there's a genuine product or infrastructure issue, then it's a genuine test failure; if not, it's noise, like Raj was describing. Test determinism can be defined as a state where failures are repeatable until the issue is fixed. If nothing was fixed and the test starts passing again, it's of course not deterministic; and if there's no real issue and it's failing — again, not deterministic.

How do we track this determinism? We have something called a flakiness score. The theory is up there — the probability of a non-deterministic failure — and I'll get to what that actually means. But first, why do we even track a flakiness score, why do we care about determinism at all? This is a typical code cycle at Facebook: we write code, create a PR, merge to master; once the code is in master, we build and release it. Where do our tests run in this picture? Everywhere. Which means that if we don't track the flakiness score, we're giving the owners of all of those steps poor test signals — and we're losing resources, running tests we know aren't in a good state and simply wasting machines on them, while confusing the developers and the release engineering folks with poor signals. They might think something is genuinely wrong when there's nothing wrong with the product — your test is just flaky.

So how do you quantify flakiness? We run each test periodically — and by periodically I mean 3 to 4 times every hour. Why spend so many resources on that? Because every time a test runs, we collect key metrics about it. The three key metrics are the time taken, the test run results, and the flaky retries. The first two are fairly self-explanatory; the flaky retries are what determine the flakiness score at Facebook. Those periodic runs are called continuous runs — let's look at some examples. Run one: the test passes. Is it deterministic? Yes; the status of that test is pass. Next: the test fails, we rerun it, and it passes. Is it deterministic? No — it's termed flaky. Next: the test fails, we rerun it, it fails, we rerun it, it fails again. Deterministic — but the test is broken, obviously.
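The pass/flaky/broken classification from retried runs is easy to state in code. Here is a sketch using the talk's terminology; the implementation is illustrative, not Facebook's.

```ts
type Outcome = 'pass' | 'fail';

// A run sequence is the first attempt plus any retries of that test.
function classify(runs: Outcome[]): 'pass' | 'flaky' | 'broken' {
  if (runs[0] === 'pass') return 'pass';                // deterministic pass
  if (runs.every((r) => r === 'fail')) return 'broken'; // failure repeats until fixed
  return 'flaky';                                       // failed, then passed on retry
}

classify(['pass']);                 // 'pass'
classify(['fail', 'pass']);         // 'flaky'  — non-deterministic
classify(['fail', 'fail', 'fail']); // 'broken' — deterministic, but broken
```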
That was the pattern of a flaky test. Here's an actual dashboard view of one of our flaky tests, where each bar is one test run: blue is a timeout, yellow is skips, red is failures. This one had a flakiness score of about 40%, and you can see at a glance that the test is extremely flaky.

So what do we do with the cumulative flakiness score? All the metrics we collect get accumulated over a day — we have on the order of a hundred tests running — and we use them in our test selection process. How exactly does the flakiness score get used in selection? Whenever you create a pull request and want to merge to master, we make sure the flakiness score of each selected test is less than 10% — otherwise it won't run — the time taken is less than 60 seconds, and the status of the test is good. This narrows the set and makes sure only the best of the best run on your pull request. On master, it's a different story, because whatever is in master is what we use for our periodic runs: it doesn't matter what the flakiness score, time taken, or status is — we want to collect the metrics, so we run all of the tests, and we have thousands of them. But the frequency of those runs drops once a test is not in a good state: if it's flaky or broken, the frequency might reduce to about once a day, because we don't want to waste resources. Once it starts passing again, it automatically goes back into the good state and returns to running three times an hour. In the build-and-release stage we're even stricter, because there we're merging a lot of pull requests together, so we want a stricter measure: the flakiness score has to be less than 3%, the time taken still less than 60 seconds, and the status good. That's how we use the flakiness score to make sure we're moving fast while feeding only the best of the best tests into all of these processes.
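Putting those thresholds side by side as a sketch (the numbers are from the talk; the code shape is an assumption): stricter stages get stricter bars, and on master everything still runs, just less often when unhealthy.

```ts
interface TestHealth {
  flakinessPct: number; // cumulative flakiness score
  seconds: number;      // time taken
  statusGood: boolean;  // not broken or disabled
}

// Pull request: flakiness < 10%, under 60 seconds, status good.
const runsOnPullRequest = (t: TestHealth) =>
  t.statusGood && t.flakinessPct < 10 && t.seconds < 60;

// Build and release: stricter, flakiness < 3%.
const runsOnBuildAndRelease = (t: TestHealth) =>
  t.statusGood && t.flakinessPct < 3 && t.seconds < 60;

// Master: everything runs (we need the metrics), but unhealthy tests
// drop from ~3 runs per hour to about once a day.
const continuousRunsPerDay = (t: TestHealth) => (t.statusGood ? 3 * 24 : 1);
```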
Now, how exactly do we make our tests less flaky? The first technique Raj has already covered, but I'll go through it too: fail fast. Has the page your test needs even loaded? If not, skip your test; otherwise run it. What does that give us? It saves resources; it gives better signals — like Raj was saying, we know for sure it's the page load that failed and not your test's functionality — and it saves a lot of time.

The other one is pre-creating data. Why pre-create data? Because creating data can itself be flaky. Say you're writing to the database and there's rate limiting on your system: your writes will fail, so you have to retry them — flaky, right? If those writes can be done earlier, before your test, you've removed that flakiness from the test. Data propagation might be slow: it can happen that your write hasn't completed and you're already trying to read that data — flakiness again, which is why you have to make sure your writes have completed before your test does its reads. Creating data can simply be slow — this one's quite simple; we all know creating data can be slow, especially when I/O operations are involved. And creating data can involve many dependencies: you could depend on an external application that happens to be down at the moment of your test run — and why should your test fail to complete because an external application isn't running? That's why you pre-create all your data, and only then start your tests.

Now, how do we pre-create this data? There's something called object creation code, done using interfaces recognized by our test infra. You implement the interface — an implementation method that says "this is how you generate the data" — and you also give the test infra a structure it understands: "this is the shape of the data required". That's all there is to the object creation code; what does the test infra do with it? It calculates how many records you need, which is a fairly involved computation: say you have 10 tests using this object — the infra should know that — plus the continuous runs, plus local runs, plus runs on pull requests. You have to take all those runs into account and create that much data. The infra does that calculation, pre-creates the data, and caches it. When your test runs, it just calls something like data.get and receives a record from the cache. It's important to note that every test run gets a unique record from the cache: the data inside the record may be duplicated, but the record itself is unique — the cache is hashing every request, and whenever a record is handed to your test, it's removed from the cache. The wins: setup and teardown are no longer part of the test's run time, and your test only checks the functionality; tests run faster, of course, because you no longer have to worry about data creation; and the flakiness of data creation is no longer the flakiness of your tests.
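A sketch of such a record cache, with hypothetical types (this is the shape of the mechanism described, not Facebook's infra): records are built up front for the estimated number of runs, and each get() hands out a unique record and removes it, which is what prevents collisions between runs.

```ts
import { randomUUID } from 'crypto';

class RecordCache<T> {
  private records: T[] = [];

  constructor(build: () => T, expectedRuns: number) {
    // The real infra estimates demand across continuous, PR, and local runs;
    // here the caller just passes a count.
    for (let i = 0; i < expectedRuns; i++) this.records.push(build());
  }

  get(): T {
    const record = this.records.pop(); // unique per request, gone once handed out
    if (record === undefined) {
      throw new Error('cache exhausted: pre-create more records');
    }
    return record;
  }
}

// Usage: data creation happens before the tests, outside their run time.
const carts = new RecordCache(() => ({ id: randomUUID(), items: ['sku-123'] }), 30);
// Inside a test: const cart = carts.get(); ...assert checkout behaviour on cart...
```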
The last bit of framework reliability is making the framework do the work, and here I'll get into what Jest E2E specifically does — the new framework I mentioned earlier, Jest plus Puppeteer. Let me take the example of the registration flow you went through before you came in. There's a "welcome to Facebook" start-registration page; you click on Welcome, and then you click on the Social button. A typically good test should look like this: go to the URL, click Welcome, click Social. But we all know that's not how it works: we have to wait for the Welcome button to be visible, and we have to wait for the screen to be completely loaded. So we add something like a waitForNavigation — "wait 200 milliseconds for the page to load" — or, if we're smarter, we wait on a selector. Similarly, before clicking Social, we wait for the Social selector to come up. Facebook's idea, though, is that you should just provide the test steps, and the framework should handle the rest. Why should you have to say "wait for Welcome", or "wait for Social"? The framework should understand that.

So how do we do that? Let's go step by step. The first step: you go to the URL. You know a page isn't loaded while JavaScript activity is still going on and the page is still re-rendering. So Jest E2E detects page re-renders and background JS activity and waits for them to complete. The other signal is network traffic: only once network traffic is minimal can we say the page is loaded. Why minimal rather than zero? Because there could be polling activity going on — you can't demand no network activity at all. So we wait for network activity to die down before declaring a page completely ready. Say we're now looking at the page: the re-renders have completed, nothing is changing, network activity is minimal — good to go; the page is ready for interaction, and we can click Welcome. After we click Welcome, the page changes. Should we now worry about navigation, and tell our test to wait for the Social button to come up? No — the framework again checks: is the page ready, are re-renders happening, is there network activity? If not, good to go. Next is element readiness: we check that the element is ready for interaction — visible, enabled, and clickable — and only then do we interact. All of this is taken care of by the framework, so the test ends up looking like what we initially started with: go to the URL, click Welcome, click Social. Everything else the framework handles in the background.

To summarize what Jest E2E does: it watches all the network activity, the JS activity, and the page re-renders; and when it's interacting with elements, it auto-scrolls to them, waits for the selector to be visible and enabled, and auto-retries as well — because sometimes your framework might not be perfect at determining that a component is ready, and a retry is completely fine. The wins with this framework: there are no sleeps in the tests, no repeated whiles or waits, no stale element references (you only interact with an element once it's ready), and you get readable code. Interactions happen as soon as the element and the page are available — not after some fixed timeout — which means our tests run much, much faster, with less flakiness.
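Here is a sketch of the same idea using plain Jest-era Puppeteer APIs — this is not Facebook's Jest E2E, whose internals weren't shown; waitForSelector, the networkidle2 goto option, and page.click are real Puppeteer calls, while the wrapper and selectors are assumptions. The wrapper waits for visibility and retries once if the element went stale under a re-render, and 'networkidle2' approximates "network activity is minimal".

```ts
import puppeteer, { Page } from 'puppeteer';

// Wrap every interaction so the test author never writes waits or sleeps.
async function clickWhenReady(page: Page, selector: string): Promise<void> {
  await page.waitForSelector(selector, { visible: true }); // exists + visible
  try {
    await page.click(selector);
  } catch {
    // The element may have been re-rendered underneath us; retry once.
    await page.waitForSelector(selector, { visible: true });
    await page.click(selector);
  }
}

async function main() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // 'networkidle2' ~ "network activity is minimal": at most 2 in-flight requests.
  await page.goto('https://example.test/register', { waitUntil: 'networkidle2' });
  await clickWhenReady(page, '#welcome'); // the test reads as pure steps
  await clickWhenReady(page, '#social');
  await browser.close();
}

main();
```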
There's one other aspect. Okay, fine — the framework is in very good shape, but there's still a test that's flaky. Now what do I do? How do I debug this flakiness? Our alerting system provides some help here: whenever there's a flaky test — say its flakiness score goes beyond 30% — an issue gets raised against you, telling you that your test is going flaky. And what does the issue contain? It gives you a dashboard showing the most common failures we've seen — a table like "this failure appeared 101 times, and this one 20 times" — so you know which was your most common failure or flakiness issue. There are links to the flaky test runs too, so you can drill down deeper if you want. For a broken test, you'd also have the stack trace in it — because a broken test isn't an accumulation; multiple test runs have failed, and that's what marked it broken — so we get the stack trace, a link to the test run itself, and something called bisect, which goes through all the pull requests that landed recently and tries to find the one that might have caused the issue; a possible pull request gets linked to your task by the alerting bot. We also get a screenshot and a video — this, I think, Puppeteer already provides out of the box — so we get both whenever there's a failure. And here's something we built ourselves: the current result next to the last passing result, side by side, so you can compare the two and try to understand what changed. We're building something similar for the logs, where we show "this is what the last passing run had, and at this point we see the failure", so you know exactly what might have gone wrong. We also have additional logs — JavaScript error logs, sandbox console logs, test console logs — as well as the HTML dump: if, say, your selector wasn't matching properly, you can check against the HTML dump whether your selector still works.

To summarize how we keep flakiness under control across all our frameworks: we track test health metrics, we let the framework do the heavy lifting, and we help debug issues with ample logs. That's it from my side — I think we can open the floor for Q&A.