Okay, this one is okay, right? Just like this. Or do I have to do it like this? Like this. Okay. Okay.

Hello everyone. I'm Juniwi, and today I will be talking about my experience contributing to pandas for the first time. Of course, the first question is: what is pandas? But first, a little background. I'm a data engineer at ST Engineering. I come from a background in aerospace engineering and computational modelling. And of course, there's the topic of the talk, something that I love and contribute to, which is pandas.

So, what is pandas? Do we have anyone from data here? Okay, one, two. Ya, so you all know what pandas is, right? Okay. For those who don't know what pandas is, maybe this is what you think of. Maybe a big panda? A giant panda? Or you might be thinking of a red panda. Yes, they are really cute. I love them too, but we're not going to talk about these pandas. Instead, we'll be talking about the library called pandas.

pandas is an open source data analysis and manipulation library written for Python. Written for Python doesn't mean it is written entirely in Python, by the way. The core concept of pandas is that it uses data frames as the fundamental high-level building block for data analysis. R? Okay. I think in R you have a concept of data panels and so on. What pandas aims to achieve is something similar to what R can do, which is having some form of data panels. And this project has become so popular that it is now sponsored by NumFOCUS and it has over [inaudible] contributors. By the way, this logo might not look so familiar because it's the new logo for pandas.

So, why contribute to pandas? Number one, I'm a data engineer, so I use pandas in my daily work. Pretty much every single day, almost every hour. And when I use pandas to do my work, I spend time looking through the docs. So, I try to look at the docs.
But then, somehow, I still end up googling and finding my way to Stack Overflow. The docs are there, right? So why do I have to keep looking for answers on Stack Overflow? It's been bugging me for the past year. So, okay. What better way to mark my first year as a data engineer than by contributing to something that I use? Ya. It's something that I'm familiar with, so why not?

We all say, "I want to contribute to open source. I want to contribute to open source." But how do you get started? This is how it all started for me. I was looking for issues that I could work on in the pandas repo, and I saw this particular issue. Someone had a problem with MultiIndex levels and reported it as a bug. It said: help, it's not working the way I think it's intended to according to the docs. Can you fix this? Can anyone help?

But it turns out that no, MultiIndex levels were working as intended. The person had misunderstood how set_levels works. But why did he end up misunderstanding it? Unfortunately, as is usually the case, it was the docs that caused the confusion. So the core contributors decided: maybe we should keep the issue open and look for someone to help out.

I was looking through other issues and thinking, should I contribute? I use MultiIndex in my work, but I haven't contributed to pandas before. How do I start? Will they allow me to contribute? So I just said, "Okay, I could give improving these docs a try, if you're okay with it." That's it. Why not? And they replied: you're welcome to contribute. I saw that message and was like, yes! Time to get started!

But first, before I could work on the pull request, there are some rules for contributing to pandas.
They have this whole contributing.md, which is really long and has a lot of sections, and I had to keep jumping between sections to work out how to contribute. The first thing that I saw was, number one, version control. For version control we use Git, but Git is not so easy. And when we contribute to an open source project, we try to be a good code citizen; we try not to be a code git, you know. Ya. Because there are many, many open source contributors working on different issues, different branches, and so on. So you need some rules for version control.

Some rules. One branch, one feature. Maybe you want to work on multiple features, because you want to contribute a lot, but sorry: one branch, one feature, so that people know what you are working on in that branch.

And I said step zero. Ya, I didn't say step one. Step zero, before you start work: fork the repo. Fork the repo; not git clone, fork. You need a GitHub account to do that.

And the third, the master rule: never ever work directly on the master branch, because if you work on the master branch, you might break something. After you fork the repo, make sure that you clone your remote fork to your local machine, because that's how it works. We can't be contributing through the GUI on the GitHub page, going in and editing there. It's not very good practice.

Since it's a Python project, there are some style guides that we have to follow, as for any Python open source project. One is PEP 8, which is the style guide for Python code. If you code in Python, you must know PEP 8. If you code in Python and you don't know PEP 8, you don't know Python. PEP 8 is the grandfather of coding style guidelines for Python projects.
Different Python projects may have slightly different guidelines, but ultimately the objective goes back to PEP 8, which is consistency. You want consistency across the code. You have so many people working on the code base, on the docs and so on, so if I'm not consistent, it's going to be a huge mess and nobody's going to be happy. So you have to follow the style guide or you'll get death stares. Or in this case, digital death stares. Ya. Like that.

So we talked about PEP 8 as the style guide. But there's one other style guide that is easily overlooked: PEP 257, which is about docstring conventions. What is a docstring? A docstring is a string written directly in the code. But in this case, in Python, the docstring is actually very important because it also acts as the documentation for your scripts and modules. Let's say I use requests or pandas or another Python library, and I'm just starting out, so I may not be familiar with all the syntax, what inputs to pass in, and so on. If I'm using VS Code and I hover my mouse over that function, what I see is the docstring. So the docstring is very, very, very important. Ya.

Docstrings also have rules. It's not just typing a comment in long, long lines and that's it. You can do that for your personal project, but you can't do that for an open source project where there are so many eyes on you. The standard is that you must have a one-line summary. One line only. It cannot be two, three, or four lines. It must be one line. And then you will sometimes have a description and examples. The examples are actually the most important part of your docstring, because if I want to learn how to use requests and I don't know how, the examples help me understand how the function works.
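To make the docstring rules above concrete, here is a minimal sketch of a docstring with a one-line summary, a short description, and an Examples section. The function `clamp` is a hypothetical helper invented for illustration; it is not part of pandas, and the section layout is only an assumption about the general style described in the talk.

```python
def clamp(value, lower, upper):
    """
    Limit a value to the closed interval [lower, upper].

    Values below ``lower`` are raised to ``lower`` and values above
    ``upper`` are lowered to ``upper``.

    Examples
    --------
    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    # min() caps the value at upper, max() then lifts it to at least lower.
    return max(lower, min(value, upper))
```

Because the examples are written in doctest format, saving this in a file and running `python -m doctest` on that file checks that each example still produces the output shown.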
But there are a lot of things to remember. If you don't know, just refer to the contributing guidelines. Each project will have a contributing guideline; you just click on the section that you need, and all the commands are there already.

So, after all those rules, I make my first pull request. Before that, I've cloned the repo already, and I have to create an isolated development environment. You need to practice that for JavaScript too, but for Python it's especially important, because unfortunately Python is infamous for being really bad at managing dependencies. So make sure to build in an isolated environment, because you do not want dependency hell. Ya, you don't want dependency hell. It's not pleasant.

Why do you need an isolated development environment? Because you're not downloading pandas from PyPI, from conda, or anything. You are actually building and installing pandas from source, from the code base itself. And since pandas is not built entirely in Python, it actually has C extensions in it, you need a C compiler. After you install the C compiler, you finally install the optional dependencies. After all this, when you do import pandas in the environment, you should not see 0.25.2 or 0.25.3; you should be seeing a dev version. If you see 0.25.2 or 0.25.3 or anything other than a dev version, that means you've done something wrong.

Secondly, after I have the environment ready, I go into the environment and commit changes in a local feature branch. What do I mean by a local feature branch? On GitHub you have a remote repo, and at first you have a master. You clone it, so you have a local master branch. But before you do anything, because maybe you cloned a while ago and they might have made updates since, you don't really want to have merge conflicts.
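The version check described above could be sketched roughly like this. The helper `is_dev_build` is hypothetical, and the assumption that a build from source carries a ".dev" marker in its version string is mine, not something the talk states; the example version strings are made up for illustration.

```python
def is_dev_build(version: str) -> bool:
    """Return True if a version string looks like a development build."""
    # Assumption: builds from source include a ".dev" segment in the
    # version string, while released versions such as "0.25.3" do not.
    return ".dev" in version


# Hypothetical usage inside the development environment:
#     import pandas as pd
#     print(pd.__version__, is_dev_build(pd.__version__))
```

If this returns False for the pandas you import in the development environment, the environment is probably picking up a released copy rather than the one built from your clone.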
So you have to update your local master branch, because they actually make changes to the master repo every day. You have to keep updating, and once you have an updated local master branch, you start a new feature branch to build your changes on. Don't touch the master code, because the master branch is for production-ready code. If you mess with the master branch, good luck. And as mentioned earlier, we all love to do a lot of things at once, but stick to one branch for one feature, or people won't know what the branch is for.

After you make all your changes, you push them to your remote feature branch. Before you do that, make sure you git commit. Sometimes I forget that, and then I wonder why there is nothing to push when I've already made changes. I just forgot to commit. And when you commit, don't just write "update A", or "I did A", "I did B". I don't know what A, B, or C is. So please write meaningful commit messages, because people will see them.

And please keep style fixes in a separate commit. That's a sub-point. Let's say I edit my text, like the docs, but I have some formatting errors. Then I run black pandas and it says you have one trailing whitespace, you have one trailing whitespace. You cannot just push everything into one commit any old how, so please keep style fixes in a separate commit.

And now I have something on the remote feature branch. Before I make the pull request from my remote feature branch to the upstream branch, number one: always check for stylistic errors in the code. This is related to PEP 8. For that, pandas says to use black pandas or git diff, and usually it will pick up anything that's wrong in the code. This is in the contributing guide, and I tested it. Point number two, for your docstrings: you must check those as well. How do we check docstrings? Okay, there isn't really a tool that is
very openly available, but pandas has prepared a script for you to do that. In the repo, you just run the script to validate your docstrings, to check whether you are following the standard and whether your examples work. Sometimes they don't.

Last but not least, after you have done your PEP 8 and PEP 257 checks, you may think you have covered everything, but please review your code. You might have made some embarrassing spelling mistakes, or if you accidentally expose a secret in your code and push it up, then everybody knows your secret. So please review your code.

After I push, I usually get feedback from the maintainers, saying something like "I would like you to make some changes." It doesn't mean that you push and that's it, job done. It doesn't work that way. That's when the actual open source work begins, because that's when you start communicating with the contributors. For example, they will leave some comments and suggest certain changes to make. In this case, they added some comments: for example, my docstring summary was on two lines, so they said, "single line, please." Okay, noted.

So don't expect your initial pull request to be perfect. You may make some mistakes, but it's okay, because the contributors are your mentors. They are there to help you, and feedback is a gift. You learn from the core contributors, you learn more about the project, and getting feedback is actually a good thing because it means that someone in the project actually cares about your work and about contributions.

Number two: making lots and lots of changes to your pull request. An open source project is always developing quickly; every day somebody might make changes. So there might be a point where you have a merge conflict. You will have to resolve it manually, and if you don't, somebody will
complain. And besides merge conflicts, maybe one maintainer says, "I think this is a good idea," but another contributor might have a better approach. So just because one contributor says it's okay doesn't mean you are good to go.

Okay, so how do you update your pull request? Or actually, not so fast. Now I'm going to talk about this very horrible thing called nuking with git rebase. What happened is that in mid-October, Python 3.8 was released. Very good, right? People want to try Python 3.8, and they say, "pandas, can you please, please support 3.8 faster?" Good for those who want to play and try it out, but not so good for me, because pull requests now need to pass new tests for Python 3.8.

Mistake number one: I forgot to update from the master branch. I tried very desperately to fix my pull request, tried out some ways, ended up with a lot of merge conflicts, and blah blah blah. Somehow I found a way, and my PR finally passed the new tests. Very good, right? Tests passed, I can go to sleep. Ya, very good.

But the next morning I woke up to this email. Someone said, "I'm not sure what happened, but you've ended up with 200-plus commits showing up in your diff." And this is what happened, and everyone was like, okay, okay, this is bad. So I looked at the branch, and this is what it looked like. I have formatting changes, explicit levels, all those things. My feature branch is actually this blue one. By right, if I had done my merge properly, only these would be in the blue one. But look at the blue one: it has got everything from the master commits. To put it bluntly, I nuked my branch.

So what to do? It turns out that I made a very common mistake: I did a git rebase upstream, and oof, this is what happens. So I kind of started posting a bit of a rant and stuff
like, "I made this mistake, what should I do?" I posted on Twitter, because all the tech people are on Twitter. What it comes down to is that I shouldn't have been using git rebase. This is what Mark Garcia, one of the core devs, said: I should do a fetch first, to get the difference between my local branch and the project branch, and then merge that difference. That is what I was supposed to be doing. But because I just wanted to get the job done, I rebased. You can do that if you are in a local repo, but not when you are working on an open source project.

What actually happened is that there were some diverged end points, because while I was working on my feature, the project was constantly updating. So there are some diverged nodes here. If I do a proper git merge, I join the nodes, merging the changes in a new commit. It looks quite nice, and when people make further commits, they just keep adding on very nicely.

But if you do a git rebase, this is what happens: you end up adding the entire master branch to the experiment node. If the difference is only one or two commits, it's not so bad. But no, there were over 200 commits, so it's not good. Imagine adding this whole thing, 200 commits, to your branch. And then it gets worse. Let's say nobody catches it; you keep updating, and then somebody says, "Hey, your changes look good, let's merge them." Remember, C4 is over 200 commits, so the branch is not going to be very nice and neat. Then somebody else works on another feature, and it gets worse and worse. Look at this branch. Do you think it's a good branch? Imagine if someone had not caught this problem.

Thankfully, the pandas contributors are quite nice people, so I didn't get death stares for that. After all this, I went to look at what actually went wrong. So, from this book,
Pro Git, which I have read before but apparently have not internalized, the lesson is: do not rebase commits that you have pushed to a public repo. Never. You can do it in a local repo, but please don't do it in a public repo, because the whole world will see and they will start cursing at you.

Okay, so this is disastrous. How do I recover from the rebase? Anyone have any ideas? Delete the branch. Unfortunately, you have to start a new branch, and before you start a new branch, make sure your master branch is updated.

So after that whole disaster, I'm now making my first pull request, version 2. This time I've learned my lesson; I'm not going to do a git rebase or anything. The same whole thing happens again: commit the changes in your new local feature branch, the same old thing, make sure that I work on one feature in one branch, and then push the changes to my new remote feature branch. Still the same steps. And now I've taken account of the feedback from the maintainers, so it should be okay, right?

Not really, because after a week a merge conflict occurred. I'd been working on my branch for a week, and a lot of things happen within a week, so there was a merge conflict. How do I know? Because I received an email saying so. So please pay attention to whether your work has any merge conflicts. If not, somebody will complain to you that you have a merge conflict. Ya.

So, merge conflict, what to do? It turns out I have to resolve it manually. It is hard sometimes. Rather than trying all sorts of things, it's better to ask the community for help. Then I don't need to go through all the 200-plus commit errors again.

After I resolved the merge conflict, there was more feedback, because different core contributors have different opinions. So I make more changes; I have to update my pull request because I got feedback. Step 0, now I know: after I get feedback, I should update my local feature branch with the
changes in the upstream master. This is step 0; before I do that, I cannot do anything else. Then step 1: make changes in the local feature branch and review the code. I review properly, I run black pandas, I check PEP 8 and PEP 257, and then I push to my remote feature branch. And then the changes are approved, which means the pull request is added to a milestone. That doesn't mean the work ends here, because the maintainers still have to check whether it is ready for release. But the good thing is that most likely whatever you contribute will go into one of the future releases.

So, some key takeaways. First, contributing to the docs is actually a very good way to learn more about the project. Second, you get free feedback and mentorship from the core contributors. I currently use MultiIndex quite a bit, and by contributing to that part of the documentation, I now understand better what that particular function does. Thirdly, the contributing guidelines are effectively code etiquette. Please, please, please look through them; even though they are very long, they can save you some trouble. And last but not least, contributing to the docs is a very good way to learn Git by doing. You may not be contributing code or anything, but ultimately you will still need to use Git. So if you are getting started with contributing to a big project like pandas, you can contribute to the docs and learn Git at the same time.

If you liked my talk, or if you want to talk a bit more, you can reach out to me. I'm on LinkedIn, Twitter, maybe GitHub, and I have my own webpage. So, thank you.
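The "step 0" update described in the talk (fetch from upstream, then merge instead of rebase) could be sketched as the sequence below. The function names, the remote name "upstream", the remote name "origin" for the fork, and the branch name in the usage note are all assumptions made for illustration; only standard git subcommands are used, and nothing runs unless `dry_run` is switched off.

```python
import subprocess
from typing import List


def update_branch_commands(
    feature_branch: str,
    upstream: str = "upstream",
    main_branch: str = "master",
) -> List[List[str]]:
    """Build the git commands that bring a feature branch up to date by merging."""
    return [
        # Make sure we are on the feature branch, never on master.
        ["git", "checkout", feature_branch],
        # Download the latest commits from the project repository.
        ["git", "fetch", upstream],
        # Merge them in as a new commit; no rebase of pushed commits.
        ["git", "merge", f"{upstream}/{main_branch}"],
        # Publish the updated feature branch to the fork.
        ["git", "push", "origin", feature_branch],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

Calling `run_all(update_branch_commands("doc-fix"))` with a hypothetical branch name such as "doc-fix" only prints the four commands, which is a safe way to review the sequence before running it for real.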