This is basically going to be a list of resources and checks that you should think about when you're conducting a data science project. We're going to talk about how to define requirements when you're starting your data science project, how to outline or scope it, and some of the considerations you want to keep in mind to ensure your project is reproducible. We'll also cover some useful Python tools and tips on how to better present the findings of your data science project.
A little bit about me. My name is Sara Iris Garcia. I'm originally from Guatemala. My background is in computer science, but I currently work as a data analyst in the UK. I have worked as a data scientist before, and I can tell you I have made an incredible amount of mistakes along the way, and I still do today. So I can honestly say that I could be the least qualified person to be here and to give you any sort of advice on how to do data science. But given my past and present experience in data science, which by the way is very limited, I can tell you what not to do, which is basically a list of every single mistake that I've made, so you don't have to.
I intend to give you a walkthrough of a list of documents and checks for your data science project, but not really how to write any of them, given the time limitation here. Feel free to do further research and find the documents and steps that work best for your specific project, because every single data science project is different. I'm just giving you some of the things that you should really consider no matter what data science project you're conducting, the minimum things.
So basically we're going to talk about the things you should keep in mind before you start a data science project, during its execution, and after you finish it.
The first step, which is very, very important, is how to define the requirements. I can tell you this is really the most important step of your data science project. This is paramount, and it will by all means determine the success or the failure of your data science project. So you want to invest time in this and make sure you clearly understand what you are doing and why you are doing it: what the goals and objectives of the project are, how the deliverables you're going to provide when you finish will impact your organization, and how you're going to measure that impact. You want to define the metrics you're going to use, and also whether you foresee any challenges along the way. For example, do you have the right data already? Do you need more data to conduct the project? Everything like this goes in this step.
Like I said, there are a lot of resources out there. When you're starting your data science journey, you may think, okay, this is so obvious, you want to have a clear understanding of the project and why you are doing it. But it's not so obvious, and it was not so obvious for me. At the beginning I was not paying much attention to this and never used any documentation in this step. I can tell you this is very important, and if you do this you're really going to save yourself lots of headaches in the future. So I offer you these here. These are not mine.
I borrowed them from the internet, but I found these two documents specifically really good. The first one, here on the left, is a project poster template from Atlassian. What I like about it is that it's a very simple template. You can see here that in a few sentences you describe what the problem is, then why you are doing it, and how you are going to measure or validate your project. It's a very simple template, but I believe the simpler it is, the more understandable it is.
On the right there is a design doc template as well. I will give you all the resources in my GitHub repo, so you can have a look at this in more detail. What I like about this document is that you describe all the objectives of the overall project, and you also describe the deliverables you're going to provide. This is very useful in terms of communicating ahead of time with the higher managers or the ones that are in charge of the project.
So I'm offering you these two templates, but as I said, feel free to even make your own. What really matters is that you document the requirements, because trust me, the requirements may change in the future. Maybe you're already 40 or 60 percent into the project, and then your boss comes and tells you, "Hey, I also want this thing," or "Instead of this, let's switch to that." So make sure everything is clear so you don't have to live through this kind of experience, and better still if you get it signed off or send an email, so everything is written down. So, the next step. Let's see.
Like I said, there are a bunch of resources out there, but this is intended for beginners, for when you're really starting your data science journey. I'm even starting myself. So this is the minimum amount of resources that I think is going to be of help, I hope. You will find all these links in the documentation.
The next step is how to outline or scope the project. Once you have understood what you are going to do and why you are doing it, you want to make sure you outline a plan for how you are going to do it. This doesn't have to be very specific. The idea of this step is to formulate a plan or roadmap to help you stay on track, because during the execution of your project things may change. For example, you might find that there's another methodology or algorithm more suitable for the problem you're trying to solve, or maybe the model that you first planned to use can't really be implemented, for whatever reason. What is important here is that you define all the steps you're going to take, not in much technical detail yet. You can think about it like a construction plan: things can change along the way, but if you already have the plans you're going to follow, then it's easier for you to stay on track.
So, for example, in this step you want to describe the deliverables you're going to provide at the end of your project. Then you want to define the milestones, for example the EDA, implementing the baseline model, and so on and so forth. Then describe the timeline for each of the milestones, so you keep on track. And then you describe the data as well. Do you have the data already? What data are you going to use?
How are you going to source the data if you don't have it? Then come the resources and tools that you already have and the ones that you will need to complete the project. It's not necessary to be too technical, but you also want to describe the implementation that you plan to use.
For this as well, I borrowed these two templates, and what I like about these documents is, again, their simplicity. The first one is not really necessary for every single data science project, but I find it very useful for projects that are medium to big. If your project is small, I don't think you will really need it. It is a stakeholder analysis document, and what I like about it is that you identify what kind of stakeholders you have. For example, is your project going to fulfill, I don't know, one of the goals for the financial department? Then you're going to have very close communication with someone from finance, and their language is not technical. You want to identify all of these things because you want to speak their language, and you want to make sure you understand how this person or this team will be involved in the project. How are they going to help me, or what resources can they provide? You can put all of this here very simply, and this will save you time in the future, also when you're presenting.
On the right you also have a template for describing the analysis you intend to conduct during the execution of your project. This is also useful in case you're not the only one working on the data science project.
So if you have a team of colleagues, this is going to make sure that you all keep on track and that the discussions about what sort of analysis you want to conduct during the project are organized. All the documentation is in the GitHub repo as well, along with a couple of documents that you might find helpful for this step of the project.
The other thing that is very important during your data science project is to ensure the reproducibility of your results. There is a Nature survey from 2016, and it says that around 60% of biological studies can't be reproduced even by the people who presented the results; they cannot even reproduce their own experiments. So this is very important. If you cannot reproduce your own experiments, your project is meaningless. It's trash. So you want to make sure to pay a lot of attention to how to reproduce your results.
Just think about this scenario. You're working on a project, then your boss comes and tells you, "There is a change of plans. Put a pause on this and focus on this other project." So you go ahead, you pause this, and you work on another project. Then, one year later, your boss comes and tells you, "You remember that project? I want it in, I don't know, three months from now." Say you had already got 50 percent of the way through, and now, one year in the future, you want to make sure of this: I worked on this one year ago. Is it documented? Can you reproduce the results you got?
This is going to save you lots of future headaches, so invest time in this. For me it's really very tedious to spend time on documentation and all that. I personally don't like it, and I'm not fond of working on it, but I found that there are some tools that can save me time doing this.
There are a couple of checks that you want to work with. First and foremost, don't do things manually. Do not ever do things manually. For example, maybe you're working with an Excel file and you can clearly see there is a data entry mistake or an outlier. The easiest thing would be to manually edit the Excel file. Don't do it, because that is not reproducible. You want to make sure everything is automated. I don't know if you were here yesterday, but there was a workshop with Great Expectations, Pandera, and Pydantic. They are very good tools that can help you with your data validation process. Basically, what they let you do is define a list of checks for the data, and this can also help you identify further errors.
Also, and this is obvious, I suppose, you want to make sure you use version control tools. I believe we all know about Git and GitHub. And also software environments: Docker is your friend, conda environments, virtualenv, you name it. There are lots of tools to keep track of the software environment.
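To make the "don't do things manually" point concrete, here is a minimal sketch of the idea in plain Python. It deliberately does not use Great Expectations, Pandera, or Pydantic, so it makes no claims about their APIs. The column names, the sample rows, the thresholds, and the function names are all made up for illustration. The point is only that the raw data stays untouched, every correction lives in code, and the checks are written down as a list you can rerun.

```python
import csv
import io

# Hypothetical raw export; in practice this would be the untouched source file.
RAW_CSV = """customer_id,age,country
1,34,GT
2,-5,UK
3,41,uk
4,230,UK
"""


def load_rows(text):
    """Parse the raw CSV text into a list of dicts without modifying the source."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_row(row):
    """Apply every correction in code so it can be rerun and reviewed."""
    return {
        "customer_id": int(row["customer_id"]),
        "age": int(row["age"]),
        "country": row["country"].strip().upper(),
    }


# Each check is a (description, predicate) pair; the rules are illustrative assumptions.
CHECKS = [
    ("age is between 0 and 120", lambda r: 0 <= r["age"] <= 120),
    ("country is a two-letter code", lambda r: len(r["country"]) == 2),
]


def validate(rows):
    """Split rows into valid rows and (row, failed check descriptions) pairs."""
    valid, rejected = [], []
    for row in rows:
        failures = [name for name, check in CHECKS if not check(row)]
        if failures:
            rejected.append((row, failures))
        else:
            valid.append(row)
    return valid, rejected


def run_pipeline(text=RAW_CSV):
    """Load, clean, and validate in one repeatable step."""
    rows = [clean_row(r) for r in load_rows(text)]
    return validate(rows)


if __name__ == "__main__":
    valid, rejected = run_pipeline()
    print(f"{len(valid)} valid rows, {len(rejected)} rejected")
    for row, failures in rejected:
        print(row["customer_id"], failures)
```

Because the rejected rows are reported rather than silently fixed by hand, anyone rerunning the script on the same raw file gets the same cleaned data and the same list of problems.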
When things change, like your hardware, you can still reproduce your results if you know the versions of the packages you were using. There are also some test automation tools. I give you a couple here, but there are tons of resources on the internet. Also very important: set the seed number. Yes, set the seed number so you can reproduce your results. And here are some useful links.
Then some best practices for writing documentation. Writing the documentation is very, very important. It's very boring for me to do, but luckily there are some tools that can help you automate it. Some tips I can give you: use linters, agree on a code style, use docstrings to document your functions, and document your API as well. When you are exposing your model through an API you can use Swagger or Sphinx, or better, use FastAPI. It will help you save lots of time.
And some useful Python tools as well. I don't have time to discuss any of these, but here is a list. Cookiecutter is basically a template for building the structure of a data science project. You can also use MLflow, which is very good for tracking experiments. DVC is data version control, so it's like a GitHub for data. FastAPI, as I said, is very fast to implement, and Streamlit and Dash, for example, are there if you have to make a very quick dashboard to present your results.
In terms of presenting your findings, it is paramount to know your audience. You want to make sure you speak their language. It's good as well to present an executive summary; there's a template I give you there. Use bullet points. There are a couple of tips down there, for example: provide regular updates on the current status of your project. There is a guy.
I don't remember his name, but I like his tip. What he does is spend 15 minutes of his time writing a paragraph that his boss can read in less than five minutes. In that paragraph he gives updates on everything he's done in, like, a week. It's very good because, you know, higher managers don't read a lot. So if you use bullet points and just briefly summarize the results and the findings and how they translate into the business case, you will find that you are better presenting the benefits of your project.
In conclusion: structure from the very beginning. Be very organized from the beginning, document every stage of your project, and briefly report your findings and progress. Don't put much attention on the technical things when you're explaining; it's better to use visuals. Workflows, charts, or whatever visuals, they are your friends. I didn't put it here, I forgot it, but find yourself a good subject matter expert that can help you along the way. Code review as well: having another person review your code is the best you can do.
So yeah, thank you. And here is the GitHub repo with all the resources I shared today. Thank you.