Okay, I think we can get started. Thank you, everyone, for joining my talk today. A little bit about myself before we begin: my name is Ria, and I currently work as a software engineer at Microsoft. Prior to Microsoft, I worked at Flipkart, which is an e-commerce platform like Amazon, and I've also worked at American Express, which is a fintech company. Apart from that, I love teaching. I absolutely love teaching students about software engineering and open-source technologies, and I have mentored and taught over 10,000 students. I also run a small platform called algocam.io; you can check it out as well. We train students on relevant software engineering technologies, for people who are interested in a certain tech stack but don't have the right guidance or mentoring.

Fun fact: this is actually my third time at the Open Source Summit. I attended my first conference in 2019 as a scholarship recipient, invited on an all-expenses-paid trip. That's how I got to know this community and what people are doing at this conference, and I networked with a big bunch of people at that time. Then we had COVID, so I promised myself that the next time the conference happened offline, where I could interact with people, I would apply and come again. I did apply again in 2021, but the conference didn't happen in person, so I gave a virtual talk. Today I finally have the opportunity to give this talk in front of all of you.

Now, to get started, a little bit of context. They say there are two types of people in IT: those who do regular backups,
and those who wish they did. This talk will probably be more relevant to people of the second kind: those who wish they did, and who are now looking for platforms and ways to actually do it. I was talking to a bunch of people at the conference, and when I told them what I'd be speaking about, the first question was, "What do you think I should really do? We came across an issue a few days back, and we had multiple issues at that time." So this is the talk that will hopefully help you out, or at least give you a fundamental idea of why backup and recovery are necessary, why you should really be doing it, what good open-source tools are available for it, and what we can all do about it.

Now, whenever you think about data, think of data like your favorite pair of socks: you never really appreciate them until one of them goes missing.
Just like socks have an amazing way of vanishing in the laundry, data can also mysteriously disappear when you least expect it. That's where your trusty backups step in, ready to reunite you with your precious files, photos, spreadsheets, or any of your company's backed-up data that you've been pretending to work on.

So today we'll take a quick deep dive into the different types of backups and the different backup tools that exist. I will talk specifically about Bacula, which I have worked with a lot, and I will give a small overview of Amanda as well; both are open-source archiving and backup software. I'll give a small demo through screenshots of a backup I performed with Bacula through the Baculum web interface. The reason I chose Bacula here is that Bacula has a very good GUI you can actually use, while Amanda doesn't provide any GUI tool whatsoever. I like to work more with dashboards and less with console-based interfaces, so that's why I chose this demo. We'll also talk about a few best practices you should follow while doing backups, and a few common pitfalls that people usually run into (I ran into most of them) and that you can avoid.

So first, let's talk about why backup and recovery matter. Why does this concept really make sense? Both are crucial to any organization's IT strategy for several reasons. First, data loss statistics reveal that businesses regularly face the risk of data loss due to various factors: hardware failure, human error, or a cyber attack. Second, the cost of downtime associated with data loss is also high. Even in my own work, when we come across a data loss,
That's not the only problem: you also have to take downtime to get your servers up and running again, and that affects the user experience a lot. So the cost of data loss can be staggering in terms of its impact on your revenue, on your productivity at an individual level, and on the work you're doing. Finally, Linux systems, though known for their robustness, are not immune to vulnerabilities, making regular backups and effective recovery plans essential to safeguard your critical data and minimize potential disruptions to your operations.

Know that backups are always a cost-to-risk tradeoff that companies keep in mind. The decision to implement backups is fundamentally based on that relation: on one hand, the cost of establishing and maintaining backup systems may seem like an expenditure you're not ready for right now. However, this cost pales in comparison to the potential risk of not having backups at all: data loss, system failures, and multiple types of downtime. The investment in backups is a proactive measure that significantly reduces those potential costs and the disruptions that come along with them.

Now, when we talk about backups, you can choose between three different backup options. First, we have the full backup: as the name suggests, a full backup copies all the data on a system or a drive, creating an exact replica of the entire data set. That's a good way to go about it, but it can be very time-consuming and can require significant storage space on your systems and your off-site resources. Then we have incremental backups, which are sort of like a
versioning system. Incremental backups only save the changes made since the last backup, which greatly reduces your storage space and backup time. But know that when you're restoring from an incremental backup, the restore also has to go through those multiple versions in order to complete, so it can be a little time-consuming in that sense. The third option is the differential backup, which stores all the changes made since the last full backup. This is sort of a hybrid, making it quicker to restore than incremental backups but consuming more space; it lies in the middle between full and incremental backups.

While doing backups, make sure you follow the 3-2-1 backup strategy. It's a very robust approach, frequently and popularly used. The "3" stands for three copies: ensure you have three copies of your data, the original plus two backups. This redundancy guards against loss due to hardware failure or corruption. The "2" means two different media types: store those copies on at least two different types of storage media, for example a hard drive plus a cloud storage service. Using diverse media mitigates the risk of media-specific issues. And the "1": keep one of the backup copies off-site, ideally in a remote location or a cloud service of your choice. This protects your data against fires, theft, or anything else that could affect your on-site copies.
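As a rough illustration of the rule, the 3-2-1 check can be expressed as a tiny script. The `Copy` structure and the example media names here are invented for the sketch and are not tied to any particular backup tool:

```python
from dataclasses import dataclass

@dataclass
class Copy:
    media: str      # e.g. "disk", "tape", "cloud"
    offsite: bool   # stored away from the primary site?

def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """3 copies total (original + 2 backups), on at least 2 media
    types, with at least 1 copy stored off-site."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )

copies = [
    Copy("disk", offsite=False),   # original on the server
    Copy("tape", offsite=False),   # local tape backup
    Copy("cloud", offsite=True),   # off-site cloud backup
]
print(satisfies_3_2_1(copies))  # True for this set
```

Three copies of your data all sitting on the same disk in the same room would fail this check twice over, which is exactly the point of the rule.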
It ensures that no single event can lead to a catastrophic loss and that a recovery path always exists. So yes, this 3-2-1 backup strategy is very good: I come across a lot of companies that use it, it keeps them compliant, and it works like a charm.

Now I'll talk about Bacula, which is what I mostly work with, and I'd like to give a small introduction. Bacula is a fully featured open-source backup tool. It supports clients on multiple platforms, not just Linux: although we're talking about Linux systems here, it works very well on macOS as well as Ubuntu, CentOS, and other operating systems, and it's highly portable. It uses 100% free open-source software to implement a complete backup system, and this demo is based on the Bacula community version.

Here is a small diagram of Bacula's architecture. It has basically four components: a director, a catalog, a storage daemon, and your clients, each serving different functions. The director, as you see in the very middle, is the architectural component that orchestrates all the backup and restore jobs. The director process runs on Linux distributions; you could use Red Hat, Debian, or Ubuntu. The catalog, as the name suggests, is a component backed by a database, accessible through SQL updates and queries. The catalog stores the metadata for all backup activities: the information stored here includes file names, permissions, dates,
and storage coordinates for all the information you're storing, plus some global records if you have them. Then we have the storage daemon, which is the component that actually backs up your data and writes it to the different storage media. And then you have the clients: you can see the different clients here, which latch on to these functionalities in order to do the backup.

What you see here is the GUI I downloaded, and this is its dashboard. Feel free to use the console-based interface as well; it's also very easy to use, and you do get a bunch of extra operations when you're using the web interface. But I feel the GUI also looks very good, and Amanda does not provide any GUI, so that's another reason I personally prefer Bacula here.

To give you a small walkthrough, I have taken a bunch of screenshots that walk you through the entire process and show how quick, seamless, and user-friendly it is. To define a new backup job, you go to the job page: if you look on the left-hand side, there's a jobs window. You go to that option and define a new backup job, and a wizard comes up for creating the backup. For this demonstration, I chose the backup job option, which is this new wizard here. You type the new job name, optionally write a description, and then you decide what you need to back up. For example, I need to back up a bunch of files here, so I define the paths: I chose the client file system, I select the path in the drag-and-drop browser as you can see here, and then I can see exactly which files
I need to back up. Once that file set is ready and I've chosen what to back up, the next step is to select where to store the backup data. This could be a cloud service, a hard drive, or any storage service you like; you select a storage location and a volume pool. In the next step you have some job-specific options, such as whether you want a full, incremental, or differential backup. Bacula gives you the ability to do all three types, while Amanda only does full and incremental backups, so if you want all three, you would choose Bacula. You can also define the priority of the job.

Then on the next wizard page (you can see the progress bar on top moving from one step to another), you specify when to run this backup job. Backups are usually done periodically, so here you can choose a schedule: when do you really want to run this job? I'll also talk later about how frequently you should do these backups. But if you do not have a schedule, you can also choose to run it manually, whenever and however you like.

The last wizard page is a summary of all the values selected in the previous steps. You can review them here, and if they all look correct, you just go ahead and create the new job. At this point, your backup job is ready: whatever data you wanted backed up, a job has been created for it. Now what you also need to do is actually run that backup. Like I said, you can either run the job manually, if you don't have a fixed schedule for your backup jobs, or run it periodically.
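For reference, the schedule the wizard sets up corresponds to a Schedule resource in the Bacula director configuration. A typical weekly cycle, modeled on the stock bacula-dir.conf example (the resource name is illustrative), looks like this:

```conf
# Full backup on the first Sunday of the month,
# differentials on the remaining Sundays,
# incrementals every weeknight.
Schedule {
  Name = "WeeklyCycle"
  Run = Full 1st sun at 23:05
  Run = Differential 2nd-5th sun at 23:05
  Run = Incremental mon-sat at 23:05
}
```

This single resource encodes all three backup types we discussed earlier, which is one reason I like Bacula's configurability.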
There is a very useful capability in the window you see right now, which is the estimate job window. You can run this estimation to know in advance how much time the job will take, how many files and how many bytes of data will be backed up, and how much storage it will take on the end systems that will store the data. Then, after running the job, you move to a new view page where you can see the backup process from the client's perspective. By the way, this works exactly the same way on mobile: even from a mobile device you can run backups, which again works like a charm. Here you can also see the job progress. There's the director taking care of running these jobs, and the storage daemon actually copying the data. If you look at these bars here, you can see the transfer speed, how much time the job is going to take, and the current CPU usage on your systems. And that's it, your backup job is completed. Simple wizard steps to follow, and it's very, very straightforward.

So that's how you do a backup. Once you're done with the backup, the next step, obviously, is knowing how to restore the backed-up data. These are the two most important functions we use, and we do them using the Bacula APIs.
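Under the hood, the backup job the wizard assembles corresponds to director-side resources as well. A minimal hand-written sketch, modeled on the standard bacula-dir.conf layout (the names like `laptop-fd`, `HomeDirs`, and the paths are illustrative, and it assumes a Schedule resource named "WeeklyCycle" and a storage resource named "File1" already exist):

```conf
# What to back up
FileSet {
  Name = "HomeDirs"
  Include {
    Options {
      signature = MD5       # checksum each file for later verification
      compression = GZIP
    }
    File = /home
  }
}

# The backup job itself: ties a client, a file set,
# a schedule, and a target storage pool together
Job {
  Name = "BackupLaptop"
  Type = Backup
  Level = Incremental
  Client = laptop-fd
  FileSet = "HomeDirs"
  Schedule = "WeeklyCycle"
  Storage = File1
  Pool = Default
  Messages = Standard
}
```

Whether you write this by hand or let the Baculum wizard generate it, the resulting job is the same; the GUI just makes the assembly less error-prone.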
First there is the backup part, then the restore part. People also use copy jobs, where you just copy a backup from one storage pool to another; that functionality is available too. Here I will show you how to restore the backed-up data.

Baculum again provides a restore wizard, in the same sidebar menu you saw before. After opening it, you see a backup client selection: which backup client do you need to restore the data from? You select the client and go to the next step, where you can see all the backups that have been done previously. There is also an option to find a specific file by file name, with or without a path. Then you select the backup and go to the file section on the third restore wizard step. Here, in the file browser, you choose the directories and files to restore. Note that if a file has, say, four different versions, you can restore a very specific version of the file that exists in the backups. The next wizard step defines the destination where the restored data will be written. By default, the restore goes back to the location the backup originally came from, but you can change that and pick a different host as well. The next step shows a bunch more options; I skipped the fourth wizard step here, which has some simple restoration policies and options you can set if you want. I usually leave it untouched and go to the summary section, just to review what I'm about to restore, and if everything looks fine, I go ahead and run the restoration. This is the view that you get when
you're trying to restore. Just like with the backup job, you can see that the running restore job is in progress. After completion, there is a summary of the entire process: what was restored, and where it was stored. And that's it: in a few minutes you've performed, very simply, the two most important Bacula functions most people use.

First, the interface is very simple, and it enables the user to administer Bacula from any mobile device. This can be crucial when you're working outside the office and somebody from the organization sends you a text message like, "Hey, I deleted an important report file and I need it urgently. Can you restore it to my computer?" You can do the whole restoration process from a mobile phone, using the same wizard steps I just described. The second important function we get with Bacula is its multi-user interface. Multiple users can have several authentication methods: you can use local basic authentication, or LDAP for organizations. It enables a company's employees to use Baculum to back up and restore their own resources without requiring access to other utilities, and you can customize role-based access control for each group of users.

Of course, these options are just the tip of the iceberg regarding Bacula's capabilities. Bacula is really about being configurable. If you consider all the cloud-based backup options out there, and a lot of other backup options too, although they are quite flexible, I feel Bacula is especially configurable in terms of what exactly you want to back up and from where. You can use it the way you like, for your own purpose. This kind of helps give you a very
good view of your data and how it's being backed up, and it definitely makes your life a lot easier.

Next we have Amanda. The name stands for Advanced Maryland Automatic Network Disk Archiver. It is again open-source archiving software, and it follows a client-server model. The fundamental workings of the two are a little different: Amanda is schedule-based, with backups that start whenever the server contacts each client. It follows a tape-spanning methodology, where it divides your backup storage across multiple tapes and tries to fit the data onto them; if at any point the data doesn't fit, it splits it across multiple tapes. One of the good things about Amanda is its very intelligent scheduler: it will always try to optimize the use of your computing resources across different backup runs, so your backup process stays optimized and you don't have to focus on areas you don't want to focus on. The downside of Amanda is that it has no GUI, so you only have console-based options. Amanda also doesn't work with macOS systems and those storage devices, so that's another thing to keep in mind when choosing between Bacula and Amanda. But apart from that, both are very popular backup software; the community support and enterprise solutions for both are really good, and you don't face a lot of issues working with them.

Now, a few important practices to follow while working with any backup software or backup solution. First, do regular data backups. You
ensure there is a regular backup schedule based on the criticality of your data and how frequently it changes. For critical data, people do daily or even more frequent backups; less critical data can be backed up less frequently. You can also consider implementing versioning for important files and documents; this allows you to access previous versions of a file in case of accidental changes or corruption.

Second, choose secure backup storage, which I talked about with the 3-2-1 strategy. Storing backups off-site is crucial and a very good option for protecting against physical disasters like fires, floods, or theft on-site. Cloud storage is a very good option, or you can physically store backups in a secure off-site location. Make sure you implement strict access controls and authentication mechanisms for your backup storage: only authorized personnel should have access to your backups. Bacula has a very good feature here, where it can help you define many roles, far more than the usual ones, and you can define the very specific access a person has: not just access to the data, but also which particular backups they can perform. That's the level of configurability you get with Bacula.

Third, do regular backup testing. As important as it is to back up your data, it's just as important to test your backups. Periodically test the restoration process from your backups; ensure you can actually recover from them successfully and that the restored data is usable. It happens many times that people keep doing backups, but the moment they try to restore, it doesn't work, or the data is corrupted, or
something else happened and the backup didn't really work. So make sure you verify the integrity of your backup data to ensure it has not become corrupted over time. Some backup solutions already have this validation feature built in, where they test your backups frequently to confirm everything works.

Then we have monitoring and alerts, which is how I personally like to work. You can have monitoring tools that alert you in case of any backup failures or anomalies; early detection can help you address issues before they become critical. Bacula also has plugin functionality where you can set up monitoring and any type of logging you want in order to make sure everything works fine. And make sure you have a disaster recovery plan: just as you do backups, you do disaster recovery planning everywhere. Develop a comprehensive disaster recovery plan that outlines how you respond to different types of data loss scenarios, and include details about who is responsible, what the communication procedures are, and everything along with it.

Now, a few common pitfalls to avoid. The first is relying solely on one backup method. Do not do that; do not fall into a single-point-of-failure scenario. Depending on only one backup method can lead to a single point of failure: if that method fails or becomes compromised, you risk losing all your data. Instead, implement multiple backup techniques, such as local backups, off-site backups, and cloud backups; have all three of them so you can recover from any of them. This adds some redundancy and increases the chance of data recovery in various scenarios. The second pitfall is inadequate testing, which I
just focused on very heavily: failing to regularly test your backups can result in a very unpleasant surprise during a data loss event. Ensure that you can successfully recover data from your backups. The next pitfall is ignoring data retention policies. Know how much data you really need to keep backed up: keeping too much data for extended periods of time leads to a lot of storage inefficiency and increased cost. One good thing you can do is have cron jobs running that check how recently you accessed a particular backup file; if a certain amount of time has passed, and that's the benchmark for you, maybe it's a good time to let go of those backup files if you're not really going to use them anymore. The idea is to define clear data retention policies about what you need to back up and for how long.

For example, at Microsoft we work a lot with customer data. We deal with partners who sell Microsoft products and have a lot of purchase orders and invoices, and we have a big reporting platform, so we actually touch data that goes back 10, 20, even 30 years, because that's how the agreements with the different Microsoft tools work. For us, aggressive retention pruning does not make much sense, because we cannot let go of data we might need; everything is connected in terms of user experience. But there are cases, say your software is a video streaming platform or a networking tool, where you don't really have a use case for storing data for a very long time; in that case it's a good option to define strict data retention policies. And
the last pitfall is underestimating bandwidth and storage needs. This goes back to the cost-to-risk relationship I talked about. Don't underestimate the bandwidth and storage requirements you have for backups, especially in cloud environments, which can be really expensive; underestimating them can lead to performance issues and incomplete backups. Regularly assess your backups and know how many resources and allocations you really need, based on the criticality of your data.

All of these are points I came across very frequently, worked on, and found to be crucial when you're working with data, especially when you try to understand that data. Data is not only accessed from a database. For example, I work on a data platform team where the data is not just present in our source databases; it also goes through multiple ETL pipelines, where we harness and massage a lot of data and then present it to the end users. So not just at the source databases but at a lot of intermediate stages too, we want to make sure our data is always backed up, and if any stage leads to a loss of that data, we want to recover it from that particular stage itself. That's how I came to realize the importance of data backups and recovery, how we really work on this, and what view we should have on it.

So I hope this gives you a good idea about two very good open-source tools, Bacula and Amanda. I don't see a lot of people thinking about data backup tools very often; they usually only think about them when they run into some issue. But it's also important that
we introduce these methods very early in our projects to avoid any issues, and that we think about backups and recovery right when we start planning our applications, our work, and all the important data we work with. And yeah, that's it. That's what I wanted to talk about: an overview of backing up and restoring your data on Linux systems. Thank you.