One of the most important software packages installed on any GNU/Linux distribution is GNU Coreutils, the GNU Core Utilities. The core utilities are the command-line tools you interact with on a regular basis any time you open a terminal: ls, cp, mv, rm, cat and all of that stuff. All of that is packaged up as one suite of software called GNU Coreutils. (grep and sed, which you use right alongside them, are actually separate GNU packages.) We also have a very important package called GNU Findutils, which includes the GNU find command, locate and xargs. I've done videos about many of these Coreutils and Findutils programs before, but today I want to talk about two very useful core utilities that not a lot of people know about: split and csplit.

So let me switch to my desktop, launch a terminal and zoom in. split and csplit are very similar programs, just slightly different. If I run man split, you can see it "splits a file into pieces." If I quit out of that and run man csplit, csplit "splits a file into sections determined by context lines." Basically, it accepts patterns: regular expressions, regex.

I call these lesser-known core utils because not a lot of people ever need to split files into pieces. But I have run into this problem; in fact, I knew about these commands before I was a full-time Linux desktop user. Many years ago, 15 or 20 years ago, when I was still running Windows on the desktop, I was already using Linux web servers, because I was one of those kids who loved building websites back in the early days, the mid-to-late 90s. Sometimes you couldn't transfer a large file, especially a large database file, to a remote web host, because the host had a limit on the size of the files it could handle in a transfer. What you had to do was split those files up into manageable chunks.
So if you had, I don't know, a 100-megabyte file, but the web server could only handle 10-megabyte files in a transfer, you'd take that 100-megabyte file and split it into ten separate 10-megabyte files. That way you could actually do the transfer.

So let me show you split and csplit in action. First I'm going to make a directory with mkdir test in my home directory. mkdir, of course, is also part of GNU Coreutils. Then I cd into the new test directory, and if I run ls, it's empty. I want some files to work with, so I'll copy over my config.fish, the config for the fish shell, and also my .bashrc, which we may split into some sections as well. And for an XML file, I know I have some Openbox configs on the system, so let's copy over my Openbox menu.xml too. Now we have some test files to play with.

The most basic way to use split is split -l, for a line count, followed by the line count, for example 100, and then the file, in this case .bashrc here in the test directory: split -l 100 .bashrc. If I hit Enter and run ls, I still have my .bashrc, but now I also have three new files: xaa, xab and xac. If I open xaa in Vim and go to the last line, it's line 100, because that's where we told split to break. If I open xab in Vim and go to the last line, that's also line 100. And if I open xac and go to the last line, it's only line 55, because there weren't 100 lines left: that was the final piece. So that's the -l flag for a line count.
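Here is a minimal sketch of that -l walkthrough, assuming GNU coreutils. The original .bashrc isn't available, so sample.txt is a hypothetical stand-in generated with seq to have exactly 255 lines, which reproduces the 100 + 100 + 55 split from the video.

```shell
# Work in a throwaway directory so the x* pieces don't clutter anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-in for ~/.bashrc: 255 numbered lines.
seq 255 > sample.txt

# Break the file every 100 lines; the pieces are named xaa, xab, xac.
split -l 100 sample.txt

# Print the last line of each piece, like jumping to the end in Vim.
# Because seq numbers the lines, this shows where each piece stops.
for piece in xa?; do
    printf '%s: last line is original line %s\n' "$piece" "$(tail -n 1 "$piece")"
done
```

Because seq writes the line number as each line's content, the loop makes it easy to see that the third piece carries only the 55 leftover lines.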
You don't actually have to give -l if you want a break every 1,000 lines, because that's the default: with no line count, split breaks after 1,000 lines. I don't have 1,000 lines in my .bashrc, so let me show you what happens. First let me remove the x files, then run split .bashrc. Because the default is 1,000 lines, watch what happens when I run ls: I have my .bashrc and one output file, xaa. Both are exactly the same size, 8.6 kilobytes, because xaa is basically an exact copy of .bashrc: there weren't even 1,000 lines to split up. So let me remove that with rm -rf, which forces it so it doesn't ask me the interactive yes-or-no question it asked when I removed the files before.

One thing I want to show you before I move on. Go back to split -l 100 .bashrc, where we split the file into 100-line chunks and I opened the pieces in Vim to verify that. You can verify this much more easily with wc, the word count program, which is another GNU core util. wc -l gives a line count, so wc -l xaa tells me xaa has 100 lines. xab has 100 lines, and xac has only 55. If I wanted all of that at once, I could just run wc -l x*, which gives me each file with its line count plus the total for all three. So let me remove all those split files.

Now I want to show you a new flag. Instead of splitting on a number of lines, you can tell split how many equal pieces you want.
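The default line count and the wc check can be sketched the same way, again assuming GNU coreutils and a hypothetical 255-line sample.txt in place of .bashrc. The long.txt part is an extra illustration, not from the video: with 2,500 lines the 1,000-line default becomes visible, and the optional second argument to split sets the output prefix.

```shell
cd "$(mktemp -d)" || exit 1
seq 255 > sample.txt

# No -l: split uses its default of 1000 lines per piece, so a 255-line
# file comes out as a single piece that is byte-for-byte identical.
split sample.txt
cmp sample.txt xaa && echo "xaa is an exact copy of sample.txt"
rm -f x*

# Back to 100-line pieces, then count every piece with one glob.
split -l 100 sample.txt
wc -l x*          # per-file line counts (100, 100, 55) plus a total

# Hypothetical larger file: the default now produces 1000/1000/500.
# The second argument is the prefix, so the pieces are long.aa, long.ab, long.ac.
seq 2500 > long.txt
split long.txt long.
wc -l long.a?
```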
If I run split -n 5 .bashrc, I'm telling split to divide .bashrc into five files of equal size. If I run ls, you see I now have xaa, xab, xac, xad and xae. One thing to know about the names in both split and csplit: think of the leading x as the prefix of the file name, and everything after it, such as aa, ab, ac, as the suffix. I mention that because we can change both the prefix and the suffix of these file names.

So that's splitting by the number of files. I've already deleted those files, but you could see from the last ls that it created five even pieces of about 1.7 kilobytes each. The way it achieves an exact split is that it no longer cares about line breaks: it will happily break in the middle of a line. That makes the individual pieces a little clunky to read, although you can still put them back together into one file later.

So you can split by line count, and you can split by number of files. You can also split by byte size. With split -b, for byte size, I can say split after 512 bytes, or I could use something like 100 kilobytes or 100 megabytes. My .bashrc isn't nearly that big, so let's just do split -b 512 .bashrc. If I run ls, you can see it created a bunch of files here, 17 or 18 of them, since that's 8.6 kilobytes cut into 512-byte pieces. If I open the first one in Vim, you can see it no longer cares about line breaks: there was more to this last line, but once it reached 512 bytes, that's where it cut the file and started the next one. If I quit out of this and open xab in Vim, you can see the very first line is the rest of the line from the previous file.
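A sketch of the byte-oriented modes, assuming GNU split. The hypothetical 255-line sample.txt from seq is 912 bytes (9 two-byte lines, 90 three-byte lines and 156 four-byte lines), which keeps the -b arithmetic easy to follow. The -n l/5 form is a GNU option not shown in the video: it still makes five pieces but only breaks at line endings. The last step shows that byte splits don't get in the way of reassembly: concatenating the pieces in suffix order restores the original, the same way chunks like the ones from the web-host story could be joined again.

```shell
cd "$(mktemp -d)" || exit 1
seq 255 > sample.txt                 # 912 bytes in total

# -n 5: five pieces of (nearly) equal size; lines can be cut in half.
split -n 5 sample.txt
ls -l x*
rm -f x*

# -n l/5: still five pieces, but GNU split only breaks between lines.
split -n l/5 sample.txt
rm -f x*

# -b 512: fixed 512-byte pieces (size suffixes such as 100K or 10M also work).
split -b 512 sample.txt              # xaa = 512 bytes, xab = the other 400

# The suffixes sort in order, so cat x* rejoins the pieces byte-for-byte.
cat x* > rejoined.txt
cmp sample.txt rejoined.txt && echo "rejoined.txt matches sample.txt"
```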
So again, it's kind of clunky to split on an exact byte size. Now, I mentioned earlier that the x is the prefix of the name and aa, ab, ac is the suffix. What happens if you're splitting a really large file into a bajillion pieces? Two letters may not be enough characters to accommodate that. So you can tell split exactly how many letters you want in the suffix: split -a 5, for example, tells split to use five letters in the suffix rather than the standard two. If I run split -a 5 .bashrc and then ls, the new piece is this one right here: xaaaaa, five a's. It only made the one file because I didn't specify a line count or anything, and remember, by default the line count is 1,000 lines. Let me run rm -rf x* to remove all the split files that begin with x, and clear the screen.

One other thing we can do: split -l 100 again, but this time with -d. -d tells split to use numerals, digits, for the suffix instead of letters. So let's do split -l 100 -d .bashrc. If I run ls, you see that instead of xaa, xab, I now have x00, x01, x02. And I could combine that with -a 5: split it again and you get x00000, x00001 and so on. So that's how those flags work: -a sets the number of characters in the suffix, and -d switches the suffix from alphabetic characters to digits. Let me clear the screen.

Now let's talk about the csplit command. If I run man csplit: csplit splits a file into sections determined by context lines. Essentially, it's pattern matching.
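The suffix flags can be sketched like this, assuming GNU split and the same hypothetical 255-line sample.txt standing in for .bashrc.

```shell
cd "$(mktemp -d)" || exit 1
seq 255 > sample.txt

# -a 5: five-character suffixes. No -l, so the default 1000-line limit
# leaves a single piece, named xaaaaa.
split -a 5 sample.txt
ls x*
rm -f x*

# -d: numeric suffixes instead of letters -> x00, x01, x02.
split -l 100 -d sample.txt
ls x*
rm -f x*

# Both together: five-digit numeric suffixes -> x00000, x00001, x00002.
split -l 100 -d -a 5 sample.txt
ls x*
```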
csplit also does a lot of the basic stuff split does. For example, I can split my config.fish at lines 5, 25 and 100: csplit config.fish 5 25 100. When I do that, I get some output telling me it created four new files and giving me the size in bytes of each of the four. If I run ls, you see we now have four split files: xx00, xx01, xx02 and xx03. So the prefix is different in csplit: split's prefix was a single x, while csplit defaults to a prefix of two x's. The suffix has also changed: where split defaulted to alphabetic suffixes, csplit defaults to numeric ones. Let me clear the screen (if I can type clear correctly).

Now, one cool thing you can do with csplit is run it on config.fish, tell it to split at line 5, and then add curly braces with an asterisk inside: csplit config.fish 5 '{*}'. The curly braces say how many more times to repeat the previous split, and the asterisk means repeat it as many times as possible, so you get a split every five lines. If I put a 0 inside the curly braces instead, that would repeat it zero more times, so it would split at line 5 just once.

I thought this command would probably fail, and it did. It gave us some output, but eventually the repetition ran past the end of the file, and that made csplit error out. (With a repeated line number this always happens eventually, and when csplit hits an error, it deletes the files it already created.) If I press the up arrow to get that last command back and add -k, so csplit -k config.fish 5 '{*}', that tells csplit to keep the files it created even when it hits that error. So now it goes through. If I run ls, you see I have 68 files that were split into five-line chunks.
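A sketch of the line-number forms, assuming GNU csplit and a hypothetical 340-line config.txt standing in for config.fish. It also records how many pieces are left after the failing {*} run, to show the cleanup that -k (--keep-files) turns off. The -s flag (--quiet) is used only to hide the byte counts csplit normally prints, and quoting '{*}' keeps the shell from treating the braces or asterisk specially.

```shell
cd "$(mktemp -d)" || exit 1
seq 340 > config.txt

# Break before lines 5, 25 and 100: four pieces, xx00 to xx03.
# csplit prints the size in bytes of each piece it writes.
csplit config.txt 5 25 100
rm -f xx*

# Repeat the 5-line break as many times as possible. The repetition
# eventually runs past the end of the input, so csplit reports an error
# and, by default, deletes the pieces it had already written.
csplit -s config.txt 5 '{*}' || echo "csplit reported an error"
left_behind=$(find . -name 'xx*' | wc -l)
echo "pieces left after the error: $left_behind"

# -k keeps the pieces even though the same error still occurs.
csplit -s -k config.txt 5 '{*}' || true
ls xx* | wc -l
```

Note that xx00 holds lines 1 to 4 here: a line number tells csplit where the next piece begins, so everything before line 5 goes into the first piece.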
If I open xx00 in Vim, you can see the first lines of my config.fish: lines 1 through 4, in fact, because a line number tells csplit to break before that line. Let me run rm -rf xx* to clean all of that up, and clear the screen.

One other cool thing: say I want to split my .bashrc at lines 5 and 26. I can give it the -f flag, which sets the prefix. Maybe I want the prefix to be, I don't know, ZZ instead of xx: csplit -f ZZ .bashrc 5 26. If I hit Enter and run ls, you see our output files: ZZ00, ZZ01, ZZ02. Let me run rm -rf ZZ* to get rid of those.

If I also want to change the suffix of the files, then in addition to -f ZZ for the prefix, I can add -b for the suffix format and give it '%02d.sh'. That says the suffix should be a two-digit number followed by .sh, so .sh gets added to each file name as an extension. Again we get the byte-size output, and if I run ls, the split file names are now ZZ00.sh, ZZ01.sh and so on. I specified two digits, which is the default. If I remove these, rerun the command with three digits, '%03d.sh', and run ls, you see ZZ000.sh and so on. Let me remove those files and clear the screen once again.

Now, as I mentioned, the big advantage of csplit over split is that it can do pattern matching: you can give it regex expressions. The basic form is csplit, the name of the file, and then a pattern between slashes inside single quotes. Let me show you this in action on my config.fish. I know I have several functions defined in my fish config, so I want csplit to split on every line that contains "function " (function followed by a space), because that's where I start defining a function. Then let's add the curly braces with the asterisk: csplit config.fish '/function /' '{*}'. So split every time you find "function " in a line. And it looks like it found about 15 of those.
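A sketch of the naming flags and the regex split, assuming GNU csplit. The config.fish below is a small hypothetical stand-in (one setting plus three functions), not the real file from the video.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-in for config.fish: one setting and three functions.
cat > config.fish <<'EOF'
set -g fish_greeting ""
function ll
    ls -l $argv
end
function la
    ls -la $argv
end
function lt
    ls -lt $argv
end
EOF

# -f sets the prefix and -b a printf-style suffix format (two digits + .sh).
# Breaking before line 5 gives ZZ00.sh (lines 1-4) and ZZ01.sh (lines 5-11).
csplit -s -f ZZ -b '%02d.sh' config.fish 5
ls ZZ*

# Regex split: start a new piece at every line containing "function ".
csplit -s config.fish '/function /' '{*}'
head -n 1 xx*    # xx00 is the setting; xx01 to xx03 each begin with "function"
```

With '%03d.sh' instead, the same command would produce ZZ000.sh and ZZ001.sh.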
If I run ls, it looks like it found 14 of them. If I open the very first file in Vim and scroll down, there are no lines beginning with function, because if I look at the second file, there's a function defined right here, and that's the line where it split. If I look at the third file, you can see it also starts with function. So every time I start defining a function, that's where it split my config.fish. Let me remove all the xx files.

Of course, you can get creative with the regex. I'm sure I've got plenty of blank lines in my config.fish. If I make the pattern the caret symbol, which is the start of the line, followed by the dollar symbol, which is the end of the line, that's the start and the end of a line with nothing in between: an empty line. So csplit config.fish '/^$/' '{*}' splits on every single empty line. That's probably going to be a whole bunch of files, and sure enough, it split my config.fish into 51 files (remember, the numbering starts at xx00). Let me run the rm -rf command to get rid of all of that.

One last thing I want to mention about regex pattern matching in csplit: you can offset it. For example, if I want to find every empty line and split there, I can offset the split by whatever number of lines I want, say plus one: '/^$/+1'. A one-line offset means don't break on the empty line you find, break on the line after it. If I run that, we get 49 split files this time. And if I open xx01, the second one, in Vim, you see it no longer starts with an empty line. The empty line is now the last line of the previous file, because csplit applied a one-line offset to every split. Let me quit out of that.
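A sketch of the empty-line pattern, the +1 offset, and the --elide-empty-files (-z) flag discussed next, assuming GNU csplit. notes.txt and funcs.fish are hypothetical inputs, and -f gives each run its own prefix so the results can be compared side by side.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical input: three short paragraphs separated by blank lines.
printf 'alpha\none\n\nbeta\ntwo\n\ngamma\nthree\n' > notes.txt

# '/^$/' matches an empty line (^ start, $ end, nothing in between).
# blank00 = alpha, one; blank01 and blank02 each begin with a blank line.
csplit -s -f blank notes.txt '/^$/' '{*}'

# '/^$/+1' breaks one line after each match, so each blank line ends the
# previous piece instead: after00 = alpha, one, blank; after01 starts at beta.
csplit -s -f after notes.txt '/^$/+1' '{*}'

# If the very first line matches, the piece before it is empty (0 bytes).
# --elide-empty-files (-z) suppresses such zero-byte pieces.
printf 'function a\n    echo a\nend\nfunction b\n    echo b\nend\n' > funcs.fish
csplit -s -z funcs.fish '/^function/' '{*}'
ls -l xx*        # two non-empty pieces, no zero-byte file
```

Without -z, the last command would also leave an empty xx00 behind, because funcs.fish matches on its very first line.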
The final thing I want to briefly mention: sometimes, when you're doing complicated regex pattern matching and csplit splits your files, you'll get empty files in the output, split files that are zero bytes. Of course, you don't want those. So you may want to add --elide-empty-files (-z for short) to any complicated csplit regex command you run, to make sure you don't end up with zero-byte files.

So that's the basics of the split and csplit commands. Actually, that's a pretty good chunk of what you can do with them; they're not very complicated commands, and there aren't a ton of flags. I've done several videos in the past highlighting some of these great GNU core utils. If you like these videos and want to see me highlight more of these command-line utilities, let me know in the comments down below.

Now, before I go, I need to thank the producers of this episode. I'm talking about Devin Gabe, James Matt, Michael Mitchell, Paul Scott, Wes Acami Allen, Lenox Ninja, Chuck Commander, Ingrid Kurt, Diokai, David Dillon, Gregory Heiko, Casca, Lee Maxim, Mike Nitrix, Erion, Alexander Peace, Archon Fedora, Polytech Raver, Red Prophet, Steven and Willie. These guys are my highest-tier patrons over on Patreon; without these guys, the episode you just watched would not have been possible. The show is also brought to you by all of these fine ladies and gentlemen, all these names you're seeing on the screen right now. These are all my supporters over on Patreon, because I don't have any corporate sponsors. I'm sponsored by you guys, the community. If you like my work and want to support me, please consider subscribing to DistroTube over on Patreon. All right guys, peace.

My love for the GNU Core Utils is the reason I can't switch to BSD.