Good afternoon. I'm crazy about backups for some reason: I back up to disk, I copy that to tape, then I copy the tapes and put them somewhere else, have them in a bank vault, have them in a friend's basement. So hopefully, if there's ever a disaster, I can restore. We're not going to cover installation, concurrent jobs, or lots of other things. This is basically an introduction to how I use Bacula, and how you might get clever ideas about how you can use it too. Bacula is a set of programs. They're independent modules; you can install them on different devices. It is client-server: you've got a server that talks to a client, which talks to a storage device, and it keeps going on from there. People are sometimes shocked that Bacula does not use tar. Don't be upset that it doesn't use tar; tar is good, but it's not good for everything. Copy all your conf files, and the SQL files that you dump out, somewhere else. Copy them to several different places, because if you lose your catalog, which we'll get to later, it's horrible and you have to use bextract. This is an old DEC DLT 7000 tape drive that I use; that machine died shortly thereafter. Actually it's a DLT 8000; I found the 7000s more reliable than the 8000s. This stores about 80 GB compressed. I back up to local disk: everything streams down to the local ZFS server, then gets copied to tape afterwards. I used to copy to tape every day; now I only copy to tape once a month, mainly because it's summertime. Terminology: the director is the thing that knows everything.
It's like your global process: it does everything that nobody else wants to do. The SD is your storage daemon. The DIR and the SD are often referred to together as the server. You want your DIR and your SD always to be the same version; never get them out of sync. Your FD can be behind, but never ahead. Here are the steps in running a quick backup. You get on the console; it contacts the Bacula director and you say, hey, I want to run a job. The director contacts the bacula-fd, which gets told it's supposed to back up these files to the storage daemon. The FD contacts the storage daemon with the files it's going to back up. The storage daemon puts those files on either disk or tape, and then sends the director a list of everything that got backed up, which gets stored in the catalog, because when it's time to restore, you use the catalog to find out where the stuff is. This is a typical setup: you have one director and a bunch of FDs that it talks to, and the key thing here is that the director always does the initiation; it contacts the FD and tells it it's time to back up. So the schedule isn't dictated by the clients, it's dictated centrally. There is an option for an FD to do a self-started backup, but I never use it. This is your usual starting point: you have a director, you have an SD, you have disk and/or tape, and you have a catalog server. The catalog server is the most important part of your backup; we'll get to that later. Now we're getting a little more advanced: one director with multiple FDs all contacting it, but also another Bacula director that contacts them as well. These can be identically configured, so that an FD thinks the two directors are identical, and they can all connect in. It's just the shared secrets that count when you're connecting; we'll get into that soon, too. And here you're going really crazy.
You've got one catalog, a director, another storage daemon over here, a whole bunch of stuff, two different catalogs. This is all very feasible. You can have multiple databases that it writes into; you can have one per client, for example, or one per pool. We'll get into pools later. This is key to keep track of: it's not based on cron. You can run a backup manually, and many configuration options can be set when you are running manually; when you run automatically, it just uses the defaults. Restores cannot be scheduled, but they can be automated, and by automated I mean something like echo "restore ..." piped into bconsole; you can automate it that way. Hot tip. That's running a job: you can run that on the client, it pipes the command to bconsole, and then it backs up that client. That "yes" is just an auto-answer saying no, I'm not going to prompt you. It'll connect to the director, the job gets run, it uses that catalog, and that's it, the job's running. It might take a few hours, but you've started it. These are a whole bunch of tools that come with Bacula.
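The automated run described above can be sketched roughly like this; the job name and schedule are invented for illustration, so substitute the names from your own director configuration:

```
# Hypothetical cron entry on the backup host: start a job
# non-interactively by piping a command into bconsole.
# The trailing "yes" answers the confirmation prompt so that
# bconsole does not stop to ask.
0 6 * * *  echo 'run job=BackupExampleHost yes' | bconsole
```

The same trick works for restores: build the restore command string and pipe it through bconsole instead of typing it interactively.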
bconsole is your main interface between your fingers and the rest of the system. It seems a bit primitive to have a program that you run in order to access everything; in that regard it's not a true command-line tool, but you can run bconsole from the command line, so you can script it. bconsole is actually really great, because when they're doing regression testing, bconsole is what all the commands are fed through. So if anything, bconsole is one of the best-tested tools in the suite. It's not important to memorize the commands. There is a utility that comes with it called btape. If you're going to use a tape drive, run it through btape; it makes sure that all the configuration settings you have are appropriate. Sometimes in the status you'll see old jobs and stuff like that, and the same thing with temp logs. I've seen so many people waste time trying to get rid of these; just move on. It's important to note that it's not all run as root. The director runs as bacula:bacula; the storage daemon also runs as bacula:bacula, and is often in the operator group so it can access the tape drives. bacula-fd has to run as root:wheel in order to read everything, and also to be able to restore everything. But you can configure bacula-fd so it is read-only and cannot do writes. This is a bit that confuses people all the time: passwords are shared. That's how you authenticate one daemon to another. So you have the same storage secret in two different locations.
You have it in the director's configuration file, and you have it in the storage daemon's configuration file. That's how they communicate. They are not encrypted passwords; you just secure the file in which these passwords are located. People get all uppity about that, but no: all communication is done over an encrypted channel. It's a shared secret. Remember this bit and you'll have a much better life when it comes to setting it up. A lot of the passwords look like they're encrypted, because people use random text, but literally you can just write Password = "this is my password", and that's your password, and you can use that. So in your bconsole.conf you say: I want to connect to this director. That's just a name; it's not an address or anything, just the identifier used by the Bacula director at that location. And that's the password that you use, so that same password will occur in the Bacula director's configuration. There it is. So it listens on a different port.
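As a sketch of how the shared secret pairs up, the same literal string appears once on each side; the names, address, and password here are invented for illustration:

```
# bconsole.conf -- how the console reaches the director.
Director {
  Name = example-dir            # an identifier, not a hostname
  DIRport = 9101
  Address = backup.example.org  # the actual network address
  Password = "this is my password"
}

# bacula-dir.conf -- the self-referential Director clause.
# This password is for things connecting TO the director,
# not for the director connecting elsewhere.
Director {
  Name = example-dir
  DIRport = 9101
  Password = "this is my password"   # must match bconsole.conf
  # (other director settings omitted)
}
```

If the two strings, or the two names, differ in any way, you get the "name and password do not match" failure discussed below.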
The ports are 9101, 9102, and 9103. This clause defines who you are, but this password is for people connecting to you, not for you connecting somewhere else. So whenever you have a clause that is self-referential, that's the password for contacting you, not for contacting someone else. Everyone struggles with this the first time through; even I get stuck with it. But there's a very good FAQ entry to go with it: when it says the name and password do not match, check and make sure the names and passwords match. So many people get stuck on this. Get right down to the TCP level and make sure that you're contacting the right host. Do a tcpdump, check your DNS, try replacing the fully qualified domain name with an IP address. Somewhere, something's wrong, and it's probably not your name and password. Well, it could be. Every SD and FD needs at least one entry like this, and that identifies who they will accept connections from. There can be multiple instances of this for each SD and FD, so basically: I'll talk to everyone that's listed in my configuration file. A FileSet: what do I back up? That's a FileSet. That's the list of files I'm going to back up. It can be a list of directories, it can be a list of explicit files, and it can be auto-generated at runtime. I have a script that says back up all these file sets, and all it does is a zfs list and defines the ones that I want to back up. I do that for the jails: each jail is a separate file system, and it may change over time, so when I add a new jail it automatically gets included in the backup for the jails. This part is important: one FileSet per job. Don't try to make a job that backs up multiple FileSets; either create multiple jobs or create one big FileSet. This is what a client resource looks like: a fully qualified domain name, the catalog that you're going to use, and the password for this client.
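A minimal FileSet along the lines just described might look like this; the paths are invented for illustration:

```
# A simple FileSet: a couple of directories in, one excluded.
FileSet {
  Name = "ExampleData"
  Include {
    Options {
      signature = MD5     # store a checksum for each file
      compression = GZIP  # FD compresses before sending
    }
    File = /usr/home
    File = /var/db/dumps
  }
  Exclude {
    File = /usr/home/scratch
  }
}
```

A FileSet like this is then referenced by name from exactly one job, per the one-FileSet-per-job rule above.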
This is specified in the director: for this particular client, the director would contact it with this password, supplying its own name. So this is a shared secret, which will also occur in the Bacula client's configuration file. Now, when I say client, I say FD, but the client is actually the host it's running on. If you really want to get technical, you can think of the bacula-fd as also being a server, but it's a client running on the client, which is a server, because you're backing up a server. Schedules are just so flexible. This is the one that I use: on the first Sunday of every month you run a full, every other Sunday is a differential, and all the other days are incremental. Pretty easy. I actually go one step further, though it's not shown here: all my full backups go into the full pool, and all the differential backups go into the differential pool. You can get really, really flexible. Here's a job. That's all you have to do to define a job: basically the name of the job, the JobDefs, and the FileSet I'm going to back up. All your jobs often have common definitions, and JobDefs is where those go; it's like an include file, so think of it that way. You can use the same FileSet for multiple clients. These are the important bits about a job. A job runs on just one client, so on one server you're backing up one box; you have one job. The job has only one FileSet; you can't specify multiple FileSets. The job backs up to exactly one storage location, but then you can use copy or migrate to move that job to other storage locations. There's just one schedule for the job, but that schedule can say that the job runs at different times, at different levels, on different days. So a schedule is not just like a crontab entry; it's much more flexible. You can have multiple jobs per client, and I do it that way.
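The monthly cycle described above, plus a minimal job, can be sketched like this; the resource names are invented for illustration:

```
# Full on the first Sunday, differential on the other Sundays,
# incremental every other day of the week.
Schedule {
  Name = "MonthlyCycle"
  Run = Level=Full          1st sun at 05:55
  Run = Level=Differential  2nd-5th sun at 05:55
  Run = Level=Incremental   mon-sat at 05:55
}

# A job is little more than a name plus references to other resources.
Job {
  Name = "BackupExampleHost"
  JobDefs = "DefaultJob"     # common settings live in the JobDefs
  Client = example-host-fd
  FileSet = "ExampleData"
  Schedule = "MonthlyCycle"
}
```

Note how the one schedule carries three different levels, which is exactly the flexibility a crontab entry cannot give you.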
I have different ZFS datasets, and I want them backed up at different times, on different media, and for different reasons. So don't limit yourself to just one job per client: one job per thing you want to back up, and there might be multiple things you want to back up on that client. This is the JobDefs I told you about, your include. Basically, you put all the common settings into a JobDefs and pull it into each job. I actually put my JobDefs in a separate file and then include that into the bacula-dir file, but this is just shorthand so you don't have to specify the same thing in every job. This is where I start saying things like: when I'm backing up to tape I want to back up into this pool, and when I'm backing up to disk I want to back up into that pool. You can put different priorities on jobs, so that all the stuff backing up to SSD, for example, runs last, because it's going to take less time and you're not worried about it; you want the hard drives you're backing up to run sooner, or whatever priority you want to apply. This bit about spooling data: if you're backing up to tape, you sometimes want to spool the data to a local drive and then write it to tape. And generally you want to spool attributes, so that at the end of the job it does all the inserting into the catalog database in one go; that speeds up backups generally. Job level: we sort of covered full, incremental, and differential before, but it's important to know the difference between the incremental and the differential. The incremental is relative to whatever last backup ran successfully, and that bit is important; that's the bit down there that dictates what a successful job is. The differential is just since the last full backup.
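A JobDefs carrying the shared defaults, per-level pools, attribute spooling, and a priority might be sketched like this; all the names are invented for illustration:

```
# Shared defaults pulled into every job via JobDefs = "DefaultJob".
JobDefs {
  Name = "DefaultJob"
  Type = Backup
  Level = Incremental
  Messages = Standard
  Storage = File-SD
  Pool = IncrPool
  Full Backup Pool = FullPool          # overrides Pool for fulls
  Differential Backup Pool = DiffPool  # ...and for differentials
  Spool Attributes = yes  # batch catalog inserts at end of job
  Priority = 10           # lower numbers run first
}
```

Individual jobs then only need to state what differs: client, FileSet, and perhaps a schedule or priority.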
The incremental is since the last backup, whatever it was. So generally, in a month, the worst-case restore at the end would be: the full from the beginning of the month, the differential from the last Sunday (because you can have five Sundays), and then all the incrementals from later in the week. Now, this will not give you an accurate restore, because sometimes you delete files and sometimes you add files; you'll get them all restored, even files that have since been deleted from disk. There is this thing called an accurate backup; we'll get to that next. When it's backing up, Bacula looks at ctime and mtime. That means that if you copy a whole tree from one location to another, it's not going to get backed up, because the mtime and ctime are the same. That annoys some people; it doesn't annoy me. This is where accurate backup comes in. Basically, the director sends the FD a list of files that were backed up last time, the FD compares that to what it finds, and then it ships over the files that aren't in that list. So if you do a move, it'll take note of that and send it over. Of course, it comes at a cost. Virtual backups: this is another feature that I've never used, but I've heard really good things about it. Every time you run a backup, you have a full backup as a result. What it does is jumble things around logically and say, okay, this is now a full backup; in effect, you're doing incrementals every day, and it's giving you a full at the end of the day. We talked about priority before; we talked about the schedule. Once you've got all that set up, they all run. For me it's at 05:55 UTC every day; there's no reason for that time.
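Turning on accurate mode is a single directive on the job; this sketch uses invented names, and remember the cost mentioned above, since the director must send the FD the previous file list:

```
# A job with accurate mode enabled, so deletes and moved
# files are detected instead of silently surviving restores.
Job {
  Name = "AccurateExample"
  JobDefs = "DefaultJob"
  Client = example-host-fd
  FileSet = "ExampleData"
  Accurate = yes
}
```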
It's just conveniently in the middle of the night for me. You can also have a schedule that says a job never runs: basically a schedule with no Run entries, and that's for jobs that you want to have around but only run manually; they never get run by bacula-dir. This is a thing that gets people all the time: if you make a change to your FileSet, it's effectively a new FileSet. The FileSet is actually MD5-checksummed into the database, and Bacula asks, is this the same as I had before? If not, it does a whole new full backup, because it is a different FileSet. You can get around that by saying ignore FileSet changes, but then, if you make a change, it's not reflected until the next full backup. A volume: think of a volume as a tape, and use that for the first little while of learning Bacula. A volume is a tape; it's a physical thing, and here it is, I've got this tape. Don't confuse it with a file system volume, because Bacula also backs up to disk; that's why I say, forget about backing up to disk for now and just think of backing up to tape. It's key to remember that disk and tape are treated much the same internally, and a backup can span multiple volumes: you have a 5 GB tape and a 20 GB backup, you need several tapes, maybe four. That's the concept to get in your mind first. When you're backing up to disk, it creates a file and backs up into that file; that file is a volume. It's not a file system volume, it's a Bacula volume. A pool is a collection of tapes. You've got all your full backup tapes on this shelf and all your incremental backup tapes on that shelf; you go and grab one from the pool. It's important to know that a pool is a set of specifications: this backup will be retained for three years, and every volume that is created based on that pool gets the same characteristics at creation time. You can change them manually later, but that's the bit to keep in your mind when you're using pools.
You can have multiple pools. I've got about 10 or 15 different pools, and that's a ridiculous amount given that it's just one person. A volume belongs to one pool, but it can change pools: after a volume has expired, it can be put back into the scratch pool and then be used by any other pool when that pool needs a new volume. These are a lot of common pool attributes. The most important one to remember is volume retention; we'll get to that later, and there are two other retention periods as well. Label Format isn't complex: it's basically the name you give to the volume, the logical volume name. It also happens to be the physical file name on disk when you're using disk-based backups. But don't get too complex with volume labels. You, as an operator, don't care about the volume label; it's just something for Bacula to use. You never really need to know what it is, except maybe when you're going to retrieve the volumes from the safe undisclosed location. So here's what my full-backup pool looks like. I keep everything for three years in that pool, and on disk it's 5 GB max per volume. Use whatever size you think makes sense; it doesn't make sense to have a maximum volume size that's way more or way less than your usual backup, so I have it at a multiple of that sort of thing. If you're doing 100 GB backups all the time, a 20 GB volume file is not unreasonable. The key to keep in mind is that when it comes time to recycle a volume, that three-year period doesn't start until the last time the volume was written to. So you can be using a volume over several months; the three years doesn't start until the last write, not the first write. So even if your older backups are no longer needed, it doesn't matter: they're not going to disappear until three years after the most recent backup on that volume.
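A full-backup pool along the lines just described might be sketched like this; the names and values are illustrative:

```
# Three-year retention, 5 GB disk volumes, auto-generated labels.
Pool {
  Name = FullPool
  Pool Type = Backup
  Volume Retention = 3 years   # clock starts at the LAST write
  Maximum Volume Bytes = 5G    # keep volumes smaller than a backup
  Label Format = "Full-"       # Bacula appends a volume number
  Recycle = yes                # expired volumes may be rewritten
  AutoPrune = yes              # prune catalog records automatically
}
```

The differential and incremental pools would look nearly identical, differing only in retention period and label prefix, exactly as described next.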
That's why I tend to keep my volume size less than my total backup size. Between the full pool and the differential pool there's not a lot of difference, just the time period and the names, and the same with the incremental; you can see the label format as well. So it's all very straightforward. Bacula will not label a volume which is already labeled, such as a used tape. So this is what you have to go through in order to clear the label off a tape, if you need to. Storage resources on disk are nothing very special: it's just, go and contact this host, there's something running there, back up to it. The storage resource is very simple: again, a password and a fully qualified domain name to go to. And then over on the storage daemon, which can be anywhere (it can be running on the same host, or on a host on a different VPN), it'll just contact it at that address. Now, in this case, the address on the storage daemon side means: you should only listen on this IP address; you may have multiple IP addresses on this host. On the SD you say: only this director can contact me, and it has to supply that password. Otherwise I'm just going to ignore it; I'm not going to listen to anyone who doesn't supply that special stuff. So here we're backing up to this directory on disk; the device type is File. Basically, this just sets it up.
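The two halves of that pairing can be sketched like this; hostnames, names, and the password are invented for illustration:

```
# Director side (bacula-dir.conf): where to send data.
# The password pairs with the SD's Director clause below.
Storage {
  Name = File-SD
  Address = sd.example.org
  SDPort = 9103
  Password = "storage shared secret"
  Device = FileStorage
  Media Type = File
}

# Storage daemon side (bacula-sd.conf): a simple disk-backed device.
Device {
  Name = FileStorage
  Media Type = File
  Archive Device = /backup/volumes  # directory holding volume files
  LabelMedia = yes                  # allow labeling new volumes
  Random Access = yes
  AutomaticMount = yes
}
```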
It has different meanings on a tape drive, but for a file-based backup this is sort of what it looks like. This is the most important thing: your catalog is everything. Protect it even more than your backups, because without your catalog you're not completely screwed, but you've got a very difficult time getting anything out. Your catalog defines what was backed up, where it was backed up from, and where it is now. There are ways of reproducing the catalog based on what's in your volumes, but it means reading every single one of your volumes and reproducing the catalog one at a time. It's a tedious task and I wish it upon no one. So with a catalog, what can you do? You can restore anything, from anywhere, to anywhere. You don't have to restore to the same server it was backed up from; you can restore to any client, anywhere, just from the comfort of your laptop, sitting in the hot tub. So, we talked about retention before. Retention does not refer to the volume itself, on disk or on tape; it refers to the catalog, and keep that in mind. It's how long the metadata is retained, not necessarily how long the backup is retained. You may wind up not recycling that backup data until much later than that: you can still have the backup data, but the metadata about that backup is gone. There are three main types of retention specifications: file, job, and volume. The file metadata allows you to say: I want that file, from that date, restored. The job metadata allows you to say: I want that job restored, from that date. And the volume metadata tells you what's on that volume. So we've covered this already; a little more on retention. Catalogs shrink and grow. If all you're ever doing is backing up, backing up, backing up, the catalog keeps getting bigger. But you may have older jobs that are no longer relevant, and what you do is go through a pruning process. You can do that manually or automatically; I do the pruning automatically.
I do the pruning automatically And what it does it says is there anything in this catalog that is past its retention date? Yes, there is okay get rid of it and it does it all automatically But keep in mind it's only getting rid of stuff in the catalog You can also purge manually which removes stuff from the catalog Irrespective of retention. You're just saying delete it. I don't care how long it was supposed to be in there Just delete it. We're always pruning follows retention rules You can do pruning manually or automatically I do it automatically Yeah, we talked briefly about a lost catalog don't do that I If you lose your catalog You didn't have a backup really of your catalog you need a backup of your catalog The extract you really don't want to do that, but it's horrible. It's absolutely horrible So what I do is every day after my backup jobs my last job that runs dumps of the catalog To disk then backs up that catalog in the bacula and then an our sync job comes along every day and our sinks that 30 gigabyte file to two or three other locations along with all the configuration files as well So do that every day because one day your backup server is going to disappear and You're not going well your catalog server is going to disappear and you're not gonna you're gonna have all your backups But you're not going to be able to do anything with them So yeah catalog is your best tool and seriously I'm trying to emphasize this so you take care with your catalog and The biggest demand I've seen for help is not the passwords because that's easy just go and read this but when they've lost their catalog Recycling we we mentioned that about recycling you have a tape You don't want to keep it forever after a while you want to rewrite what is on the tape? 
Recycling dictates when a tape can be reused. Apply that same logic to the volumes on disk, and that's what recycling is. But this is the important bit: Bacula will keep filling up your drive regardless of what you think it should be doing. It has no comprehension of how full your file system is; it'll just keep creating a new volume and writing there. There are directives you can use: you can put restrictions on pools, like the maximum number of volumes and the maximum size of a volume; you do the math and that's how much disk space it'll take up. But still monitor your disk space, because one day your jobs are going to fail because they ran out of space. There are the three types of retention we talked about; again, they refer to the catalog, not your backup. This is the way I handle retention: I put job, file, and volume retention all basically at three years, because that's the maximum time I keep any backup. But then, as I copy a backup to a different pool, that pool's retention overrides it, so as soon as one retention period has passed, the data can be removed from the catalog. We covered this already: passwords are plain text, just shared secrets. For the catalog, it can store in SQLite, MySQL, and PostgreSQL. For some odd reason, I recommend only PostgreSQL. Don't use SQLite, it's not big enough, and MySQL, if it fails, I'm sorry for your loss. Disk versus tape. Yes, some people, you know... tape is really good. I can easily transport a tape, carry it with me to Europe, and no one thinks anything of it. I can leave it at home, whatever.
It's easier than transporting a disk, in my opinion, and it's usually a little more stable, though some people would like to argue with that. There's not a lot of difference: use tape, use disk, whatever. I just like copying to tape afterwards. The disk space issue is mentioned here, but we covered that already. Running a job is pretty straightforward. This is the type of output that you're likely to get; the job gets queued and then runs in the background. You need one backup job per thing that you want to back up, but when restoring, you only need one restore job. Don't bother creating a restore job for each client, because all of the configuration items in your restore job can be overridden at runtime, and remember, restores are only done manually. So you're going to be doing it manually anyway, and you can worry about the details then. When you go to run a restore job, you get about a dozen different choices. The ones you'll probably use the most are "enter a list of files to restore" and "select a backup for a client before a specified time". Sometimes it's number five, but you just pick the one you need and go from there. We're not going to do a demo; these slides were originally for a three-hour tutorial. Tape drives: you don't need anything special. It just uses a SCSI file; well, generally it's a SCSI layer.
You don't need anything special. There is an mtx-changer script that deals with mtx directly; you can put any special handling you need in there. Yeah, run your btape tests, and especially run a backup that spans two tapes if you're backing up to tape, because sometimes the tape driver doesn't really like spanning tapes. Read these two articles, if you please; that's where I documented all the fun I had when we first got it going, because there was an error in the SCSI driver that we got fixed. Use sudo to switch to the bacula user when testing the Bacula commands, to imitate what Bacula is actually going to do; don't run it by hand as root. The biggest problem people run into when they're doing tape backups is that they haven't tested the process as the bacula user. They say, but it ran fine when I ran it from the command line. You ran it as root, which is not bacula; it's a different user. Yeah, we talked about FileSet changes. You can have multiple file systems, but you may not want to back up your NFS mounts. When a volume is recycled, you can tell it to truncate before running; and there's something obscure about DragonFly that somebody mentioned to me one day. Spooling, we talked about before. Backing up jails: in fact, any time you can, snapshot what you're going to back up. Otherwise you can have a file that was written at the beginning of the backup and gets written again at the end of the backup, and it's not consistent. I put a script on GitHub to do this sort of thing. Don't get very fancy with volume labels; just do something simple like that, because you never have to reference them, only Bacula has to reference them. Make it do the work. Questions? That was a lot of slides in a very short period of time.

Q: You are using it to back up files which are stored in ZFS. Suppose your system crashes, how do you...

I've got a tape.

Q: Yeah, that's for the files, but I mean, one of my servers that I backed up has crashed. How do I restore?
It hasn't happened yet, but what I would do is reinstall the OS and then restore all the data, because I only back up data; I don't back up applications or anything like that, I just back up the data. No, I don't back that up. So basically this is a file-level backup; it's not a file-system-level backup.

Q: What do you use to monitor Bacula, to see that backups are running, and storage, and stuff like that?

I have a Nagios check that looks to see how old the catalog backup is. Since the catalog backup runs last, it would be held up by any other job that couldn't run, so I just check to make sure that it's not more than 36 hours old. And that alert fires a lot on Sundays.

Q: Do you have any experience with encrypted backups? Because I think Bacula is able to deal with them.

Yeah, I thought a lot about encrypted backups, just like I thought about encrypted file systems, but I figured that losing my encryption key was more dangerous than losing my data. And by losing my data, I meant someone else getting access to my data. I figure my threat model is not such that someone's going to steal my backups, and when I send them off-site, they're not in the care of a third party. But if I was doing that, I would look at encryption. You can configure the file daemon to do the encryption, so the data is encrypted when it leaves the file daemon, and you can set it up so the FD only has the public key: you store the private key off the server and only bring it in when you need to do the restore. I think the file daemon can deal with multiple public keys, but I've not heard about anyone doing that; I've seen discussions, but I've not done it. The file daemon can also compress before it sends data over the wire. I think storage daemons can also compress, but I rely on ZFS compression to do that. Next. Backups are really boring; I'm surprised there are so many people here.
I'm very impressed, though, that you all came. Most of us can name the software developers who wrote this or that; most of you know the people that wrote Python or Perl. But nobody knows who wrote Bacula, right? Nobody. No, I wrote a small part. Kern Sibbald wrote most of it, and he took over from someone before that. You don't get famous writing backup software, unless it doesn't work. Yeah.

Q: I'm tempted to do all my backups with ZFS replication, which isn't supported at the moment in Bacula. Are there any thoughts of adding it? I think the only problem would be that the catalog wouldn't know what files were where, but if you analyzed the snapshot that you backed up with ZFS replication, you could add the list of files to the catalog. Has there been any progress in adding that to Bacula?

No, not at all, because Bacula is a file-based backup solution, not a file-system-based one. But what I would do, if you wanted to do that, is back up your replicated data, because that's read-only, and back that up into Bacula, so that then, if your replication failed or something, you still have a copy in Bacula which happens to be on another ZFS file system. And the beauty there is that when you go to restore, you can say: I want a file from this date, or I want all these files. What really appealed to me when I was first looking at Bacula was the fact that it had a catalog, and that Amanda, at the time, didn't. I was this close to deploying Amanda; I had it all installed and everything, and someone said, look here. And the first thing I did was give them a PostgreSQL back end, because I didn't like MySQL as much. Other questions? Yep.

Q: Regarding TLS and multiple clients, I've experienced it.
So it's possible. The thing is that you don't use a real PKI, because you just have public keys and private keys and no certificate authority that signs them, but the configuration is named PKI, and the TLS configuration is set up to use that PKI for authenticating everybody. It's quite messy. And I guess that for people here who have never used Bacula, the slides were maybe a bit confusing; it's true that it's a big piece of software.

There is a fork project, Bareos. We won't talk about that.

Q: Did you have experience with it? How is it?

When the fork happened, it wasn't a good situation. Some of the things they did really annoyed the people who did the work on Bacula; they did, you know, not-nice things, like removing authors' names from files and changing the copyright headers and stuff like that. And so that created a lot of animosity between the groups. I don't think there's a lot of difference between the two projects, but I prefer Bacula because that's where I started; and the bad blood, yeah. Basically, they're about the same. They have some different features, but I'm beginning to think that a lot of features are floating between the two. Any other questions? Yep.

Q: How do you go about backing up databases and suchlike? Can you have sort of pre-exec hooks for doing an SQL dump or something?

There is a feature called run-before and run-after, and it can run on the client or the server, the server being bacula-dir and the client being bacula-fd, so you can do whatever you want. And that's literally what I do: in the run-before job I do a pg_dump to disk. The examples that you see delete the dump file in the run-after job; I don't delete the file, I leave it there, because it's another copy. But yeah, that's my preference, pg_dump, just because I like text files for backing up databases. I don't trust backing up at the file system level.
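The pre-exec hook just described could be sketched like this; the client name, FileSet, and script path are invented for illustration:

```
# Dump PostgreSQL to a text file on the client before the backup
# runs, then back that file up. No run-after delete: the dump is
# deliberately left on disk as an extra copy.
Job {
  Name = "BackupDatabaseHost"
  JobDefs = "DefaultJob"
  Client = db-host-fd
  FileSet = "PgDumps"   # a FileSet covering the dump directory
  ClientRunBeforeJob = "/usr/local/bin/pg_dump_all.sh"
}
```

The same pattern works for any application that needs a consistent export: run the export in the before-hook, and back up its output rather than the live data files.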
Although many people claim that's just fine if you do it right, I don't like it. Any more questions? Thank you; you're all very brave. Thank you.