All right, I guess we should do an intro to this as well. This is a free-form Q&A lecture where you, as in the two people sitting here, but also everyone at home who didn't come in person, get to ask questions. We have a bunch of questions that people asked in advance, but you can also ask additional questions during the lecture, for the two of you who are here. You can do it either by raising your hand, or you can submit it on the form and be anonymous; it's up to you. Regardless, what we're going to do is go through the questions that have been asked and try to give as helpful answers as we can, although they are unprepared on our side. I guess we go from most popular to least popular. Fire away. All right, so for our first question: any recommendations on learning operating-system-related topics like processes, virtual memory, interrupts, memory management, et cetera? I think this is an interesting question, because these are really low-level concepts that often do not matter unless you have to deal with them in some capacity. One instance where this matters is if you're writing really low-level code, like implementing a kernel or something like that, or you want to hack on the Linux kernel. It's rare otherwise that you need to work with virtual memory and interrupts and so on yourself. Processes are a more general concept that we've talked a little bit about in this class as well, with tools like htop and pgrep and kill and signals and that sort of stuff. In terms of learning this material, maybe one of the best ways is to take an introductory class on the topic. For example, MIT has a class called 6.828, where you essentially build and develop your own operating system based on some code that you're given. All of those labs, and all the resources for the class, are publicly available.
And so that is a good way to really learn these topics: by doing them yourself. There are also various tutorials online that guide you through how to write a kernel from scratch. Not necessarily an elaborate one, not one you would want to run any real software on, but enough to teach you the basics. So another thing to look up is "how do I write a kernel in" your language of choice. You will probably not find one that lets you do it in Python, but for C, C++, and Rust there are a bunch of tutorials like this. One other note on operating systems. Like John mentioned, MIT has the 6.828 class, but if you're looking for a more high-level overview, not necessarily programming an operating system but just learning about the concepts, another good resource is a book called Modern Operating Systems by Andy Tanenbaum. There's also a book called The Design and Implementation of the FreeBSD Operating System, which is really good. It doesn't go through Linux, but it goes through FreeBSD, and the BSD kernel is arguably better organized and better documented than the Linux one. So it might be a gentler introduction to some of those topics than trying to understand Linux. Do you want to check it as answered? Yes. Nice. Answered. For our next question: what are some of the tools you'd prioritize learning first? Maybe we can all go through and give our own opinion on this. I think learning your editor well just serves you in all capacities. Being efficient at editing files is a majority of what you're going to spend your time doing, and in general, using your keyboard more and your mouse less means you get to spend more of your time doing useful things and less of your time moving around. That would be my top priority. So I would say that what tools to prioritize will depend on what exactly you're doing.
But I think the core idea is that you should try to find the types of tasks that you are doing repetitively. For example, if you're doing some sort of machine learning workload and you find yourself using Jupyter notebooks like the one we presented yesterday a lot, then using a mouse for that might not be the best idea, and you want to familiarize yourself with the keyboard shortcuts. With pretty much anything, you will end up figuring out that there are some repetitive tasks you're doing on your computer, and it's worth thinking "there's probably a better way to do this", be it in a terminal, be it in an editor. It might be really interesting to learn to use some of the other topics that we have covered, but if they're not useful on an everyday basis, then it might not be worth prioritizing them. Out of the topics covered in this class, in my opinion, two of the most useful things are version control and text editors. And I think they're a little bit different from each other, in the sense that text editors are really useful to learn well, but it was probably the case that before you started using Vim and all its fancy keyboard shortcuts, you had some other text editor you were using, and you could edit text just fine, maybe a little bit inefficiently. Whereas version control is another really useful skill, and it's one where, if you don't really know the tool properly, it can actually lead to problems like loss of data, or just inability to collaborate properly with people. So I think version control is one of the first things that's worth learning well. Yeah, I agree with that. I think learning a tool like git is just going to save you so much heartache down the line.
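For anyone who hasn't tried it, the basic version-control loop is tiny. Here is a sketch in a throwaway directory (the file name and commit message are just for illustration):

```shell
# Create a scratch repository and record one snapshot of a file.
dir="$(mktemp -d)" && cd "$dir"
git init --quiet                  # start tracking this directory

echo 'some notes' > notes.txt
git add notes.txt                 # stage the change
git -c user.name="You" -c user.email="you@example.com" \
    commit --quiet -m "first commit"   # record a snapshot you can return to

git log --oneline                 # one line per recorded snapshot
```

From here, git log and git checkout let you revisit any snapshot, which is exactly the insurance against data loss being described.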
It also, to add onto that, really helps you collaborate with others. Anish touched a little bit on GitHub in the last lecture, and learning to use that tool well in order to work on larger software projects that other people are working on is an invaluable skill. For our next question: when do I use Python versus a Bash script versus some other language? This is tough. I think this comes down to what Jose was saying earlier, that it really depends on what you're trying to do. For me, Bash scripts are for automating running a bunch of commands. You don't want to write any other business logic in Bash. It is just for "I want to run these commands in this order", maybe with arguments, but even then it's unclear that you want a Bash script once you start taking arguments. Similarly, once you start doing any kind of text processing or configuration, reach for a more serious programming language than Bash. Bash is really for short one-off scripts, or ones that have a very well-defined use case on the terminal, in the shell. For a slightly more concrete guideline, you might say: write a Bash script if it's less than 100 lines of code or so, but once it gets beyond that point, Bash is kind of unwieldy and it's probably worth switching to a more serious programming language like Python. And to add to that, I've sometimes found myself writing scripts in Python because, if I've already solved some sub-problem that covers part of the problem in Python, I find it much easier to compose that previous solution in Python than to try to reuse Bash code, which I don't find as reusable. In the same way, it's nice that a lot of people have written Python libraries or Ruby libraries that do a lot of these things, whereas in Bash it's hard to have code reuse.
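As a concrete example of the short "run these commands in this order" kind of script that does belong in Bash, here is a sketch (the backup task and every name in it are invented for illustration):

```shell
#!/bin/bash
# A short, well-defined task: snapshot a directory into a dated tarball.
# Quoting every expansion guards against paths with spaces in them.
set -euo pipefail

src="${1:-$(mktemp -d)}"                  # directory to back up
stamp="$(date +%Y-%m-%d)"                 # e.g. 2020-01-28
out="$(mktemp -d)/backup-${stamp}.tar.gz" # write the archive elsewhere

tar -czf "$out" -C "$src" .
echo "wrote $out"
```

Once a script like this starts growing argument parsing, error handling, and text processing, that is the signal to move to Python.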
And in fact, I think to add to that, usually if you find a library in some language that helps with the task you're trying to do, use that language for the job. In Bash there are no libraries; there are only the programs on your computer. So you probably don't want to use it unless there's a program you can just invoke. Another thing worth remembering about Bash is that Bash is really hard to get right. It's very easy to get it right for the particular use case you're trying to solve right now, but things like "what if one of the file names has a space in it?" have caused so many bugs and so many problems in Bash scripts. If you use a real programming language, those problems just go away. Checked it. For our next question: what is the difference between sourcing a script and executing that script? Ooh. So we actually got this in office hours a while back as well: aren't they the same? Aren't they both just running the Bash script? And it is true, both of these will end up executing the lines of code that are in the script. The way they differ is that sourcing a script is telling your current Bash session to execute that program, whereas executing it is starting up a new instance of Bash and running the program there instead. And this matters for things like: imagine that script.sh tries to change directories. If you are running the script as ./script.sh, then the new process is going to change directories, but by the time that script exits and returns to your shell, your shell still remains in the same place. However, if the script does cd and you source it, your current instance of Bash is the one that ends up running it, and so it ends up cd-ing where you are. This is also why, if you define functions that you may want to use in your shell session, you need to source the script, not run it.
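To make the sourcing-versus-executing distinction concrete, here is a throwaway script you can try (source is the Bash spelling; a plain dot is the portable one):

```shell
# Write a small script that changes directory and defines a function.
cat > /tmp/demo.sh <<'EOF'
cd /tmp
greet() { echo "hello from demo.sh"; }
EOF

cd "$HOME"
bash /tmp/demo.sh   # executes in a child process: our directory is untouched
pwd                 # still $HOME, and `greet` is not defined here

source /tmp/demo.sh # executes in the current shell
pwd                 # now /tmp
greet               # the function is defined in this shell, too
```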
Because if you run it, that function will be defined in the instance of Bash that gets launched, but it will not be defined in your current shell. I think those are the two biggest differences between the two. Don't have much to add to that. Next question. What are the places where various packages and tools are stored, and how does referencing them work? What even is /bin or /lib? So as we covered in the first lecture, there is this PATH environment variable, which is a colon-separated list of all the places where your shell is going to look for binaries. If you do something like echo $PATH, you're going to get this list, and all these places are going to be consulted in order; it's going to go through all of them. And in fact, did we cover which? Yeah. So if you run which with a specific command, the shell is actually going to tell you where it's finding that command. Beyond that, there are some conventions: a lot of programs will install their binaries in /usr/bin, or at least they will include symlinks in /usr/bin so that you can find them. There's also /usr/local/bin. There are special directories; for example, /usr/sbin is for programs meant to be run by the superuser. And some of these conventions are slightly different between different distros. I know some distros, for example, install some libraries under /opt. I think that was what I wanted to say. Yeah, one thing, just to talk a little bit more about bin, and then Anish, maybe you can do the other folders. When it comes to bin, there are conventions, and the conventions are usually: /bin is for essential system utilities, /usr/bin is for installed user programs, and /usr/local/bin is for user-compiled programs. So things that you install, that you intend the user to run, are in /usr/bin, and things that a user has compiled themselves and stuck on the system probably go in /usr/local/bin.
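You can inspect how your shell finds programs directly:

```shell
echo "$PATH"     # colon-separated directories, searched left to right
which ls         # the first directory in $PATH containing an `ls`
command -v ls    # shell-builtin variant of the same lookup, more portable
type cd          # `cd` is a shell builtin, so it lives in no directory at all
```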
But again, this varies a lot from machine to machine and distro to distro. On Arch Linux, for example, /bin is just a symlink to /usr/bin; they are the same. And as was mentioned, there's also sbin, which is for programs that are intended to only be run as root. That also varies from distro to distro, whether you even have that directory, and on many systems /usr/local/bin might not even be in your PATH, or might not even exist. On BSD, on the other hand, /usr/local/bin is often used a lot more heavily. Yeah, and what we were talking about so far, these are all ways that files and folders are organized on Linux. Things vary a little bit between Linux or BSD and macOS or other platforms. For the specific locations, if you need to know exactly what each is used for, you can look it up, but some general patterns to keep in mind are: anything with bin in it has binaries, executable programs, and anything with lib in it has libraries, so things that programs can link against. Some other things that are useful to know: there's /etc on many systems, which has configuration files in it, and then there's /home, which contains each user's home directory underneath it. So on a Linux box, my username, say anish, will correspond to the home directory /home/anish. Yeah, I guess there are a couple of others. /tmp is usually a temporary directory that gets erased when you reboot. Not always, but sometimes; you should check on your system. There's /var, which often holds files that change over time: these are usually things like lock files for package managers, log files, files that keep track of process IDs. Then there's /dev, which holds devices. These are special files that correspond to devices on your system. We talked about /sys. Anish mentioned /etc.
/opt is a common one for third-party software, often from companies that ported their software to Linux but don't really understand what running software on Linux is like, so they just have a directory with all their stuff in it. When those get installed, they usually get installed into /opt. I think those are the ones off the top of my head. Yeah. And we will list these in our lecture notes, which we'll produce after this lecture. Next question. Should I apt-get install some Python package, or pip install that package? So this is a good question. At a higher level, this question is asking: should I use my system's package manager to install things, or should I use some other package manager, like in this case one that's specific to a particular language? And the answer here is also kind of: it depends. Sometimes it's nice to manage things using the system package manager, so everything can be installed and upgraded in a single place. But oftentimes whatever's available in the system repositories, the things that you can get via a tool like apt-get or something similar, might be slightly out of date compared to the more language-specific repository. So for example, for a lot of the Python packages I use, I really want the most up-to-date version, and so I use pip to install them. To extend on that, it is sometimes the case that the system packages might pull in other dependencies that you might not realize. And it also might be the case that on some systems, at least on Alpine Linux, there are no wheels for a lot of the Python packages, so installing them will take longer and use more space, because they have to be compiled from scratch. Whereas if you just go to pip, pip has binaries for a lot of different platforms, and those will probably work. You should also be aware that pip might not do the exact same thing on different computers.
So for example, if you are on a laptop or a desktop that is running x86 or x86-64, you probably have binaries, but if you're running something like a Raspberry Pi or some other embedded device, that is running on a different hardware architecture and you might not have binaries. I think that's also good to take into account. In that case, it might be worthwhile to use the system packages, just because it will be much quicker to get them than to compile the entire Python installation from scratch. Apart from that, I can't think of many cases where I would actually use the system packages instead of the Python-provided ones. Maybe virtualenvs and multiple versions? Oh yeah. So one other thing to keep in mind is that sometimes you will have more than one program on your computer, and you might be developing more than one program, and for some reason not all programs are always built with the latest version of things; sometimes they lag a little bit behind. And when you install something system-wide, it depends on your exact system, but often you just have one version. What pip lets you do, especially combined with something like Python's virtualenv, and similar concepts exist for other languages, npm does the same thing with its node_modules, for example, is say: I'm going to install the dependencies of this package in a sub-directory of its own, and all of the versions that it requires are going to be put in there. You can do this separately for separate projects, so if they have different dependencies, or the same dependencies at different versions, these are still kept separate. And that is one thing that's hard to achieve with system packages. Next question. What are the easiest and best profiling tools to use to improve performance of my code?
This is a topic we could talk about for a very long time. The easiest and best is to print stuff, using time. I'm not joking: very often the easiest thing is, at the top of your code, you figure out what the current time is, and then you do a binary search over your program: add a print statement that prints how much time has elapsed since the start of your program, and do that until you find the segment of code that took the longest. Then you go into that function and do the same thing again, and you keep doing this until you find roughly where the time was spent. It's not foolproof, but it is really easy and it gives you good information quickly. If you do need more advanced information, Valgrind has a tool called Cachegrind. Callgrind, Cachegrind, one of the two. And this tool lets you run your program and measure how long everything takes, and all of the call stacks, like which function called which function. What you end up with is a really neat annotation of your entire program source with the heat of every line, basically how much time was spent there. It does slow down your program by an order of magnitude or more, and it doesn't really support threads, but it is really useful if you can use it. If you can't, then tools like perf, or similar tools for other languages that usually do some kind of sampling profiling like we talked about in the profiling lecture, can give you pretty useful data quickly. You get a lot of data from these, but they're a little bit biased in what kind of things they highlight as a problem, and it can sometimes be hard to extract meaningful information about what you should change in response. Whereas the print approach very quickly tells you "this section of code is slow". That would be my answer. Flame graphs are great, too; they're a good way to visualize some of this information. Yeah, I just have one thing to add.
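To illustrate the checkpoint idea described above, here is a minimal sketch as a shell script; the sleeps stand in for real work:

```shell
#!/bin/bash
# Poor man's profiling: print elapsed time at a few checkpoints, then
# keep narrowing in on whichever section accounts for the most time.
SECONDS=0                     # bash builtin: seconds since it was reset

sleep 1                       # stand-in for "load data"
echo "${SECONDS}s elapsed: after loading data"

sleep 2                       # stand-in for "process data" (the hot spot)
echo "${SECONDS}s elapsed: after processing data"
```

Here the second checkpoint accounts for most of the elapsed time, so that is the region you would zoom into next.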
Oftentimes programming languages have language-specific tools for profiling, so figure out what the right tool is for your language. If you're doing JavaScript in the web browser, the web browser has a really nice tool for profiling; you should just use that. Or if you're using Go, for example, Go has a built-in profiler that is really good; you should just use that. The last thing to add is that sometimes, doing this binary search over time, you find where the time is going, but the time is being spent because you're waiting on the network, or waiting for some file. In that case, you want to sanity-check the wait: if you want to write a one-gigabyte file, or read a one-gigabyte file into memory, you want to check that the time spent there is close to the minimum amount of time you actually have to wait. If it's ten times longer, you should use some of the other tools that we cover in the debugging and profiling lecture to see why you're not utilizing all your resources, because that might be a lot of what's happening. For example, in my research, with machine learning workloads, a lot of the time is spent loading data, and you have to make sure that the time it takes to load data is actually the minimum it could be. And to build on that, there are actually specialized tools for doing things like analyzing wait times. Very often when you're waiting for something, what's really happening is you're issuing a system call, and that system call takes some amount of time to respond: you do a really large write or a really large read, or you do many of them. One thing that can be really handy here is to try to get information out of the kernel about where your program is spending its time.
And so there's a, it's not new, but relatively newly available thing called BPF, or eBPF, which is essentially kernel tracing. You can do some really cool things with it, and that includes tracing user programs. It can be a little bit awkward to get started with; there's a tool called bpftrace that I would recommend you look into if you need to do this kind of low-level performance debugging, but it is really good for this kind of stuff. You can get things like histograms over how much time was spent in particular system calls. It's a great tool. Next question: what browser plugins do you use? Ooh. I try to use as few as I can get away with, because I don't like things being in my browser, but there are a couple that are staples. The first one is uBlock Origin. uBlock Origin is one of many ad blockers, but it's a little bit more than an ad blocker; it is, what do they call it, a network filtering tool. So it lets you do more than just block ads. It also lets you block connections to certain domains, and block connections for certain types of resources. I have mine set up in what they call advanced mode, where basically you can disable all network requests, and not just network requests: I have disabled all inline scripts on every page, and all third-party images and resources. And then you can create a whitelist for every page. So it gives you really low-level tools for improving the security of your browsing. But you can also leave it out of advanced mode, and then it does much the same as a regular ad blocker would, although in a fairly efficient way. If you're looking for an ad blocker, it's probably the one to use, and it works on basically every browser. That would be my top pick, I think. I think probably the one that I use the most actively is one called Stylus, which lets you modify the CSS, the style sheets, that web pages have.
And it's pretty neat, because sometimes you're looking at a website and you want to hide some part of it that you don't care about, maybe an ad, maybe some sidebar you're not finding useful. The thing is, at the end of the day these things are displayed in your browser, and you have control over what code is executing; similar to what John was saying, you can customize this to no end. What I have for a lot of web pages is "hide this part", or also making dark modes for them. You can change pretty much the colors for every single website. And what's actually pretty neat is that there's a repository online of style sheets that people have contributed for websites. So someone probably has one for GitHub: "I want dark GitHub", and someone has already contributed one that makes it much more pleasing to browse. Apart from that, one that is not really fancy but that I have found incredibly helpful is one that just takes a screenshot of an entire website. It will scroll for you and make a compound image of the entire page, and that's really great, because trying to print a website is just terrible. Oh, interesting. Now that you mention things built into Firefox, another one that I really like about Firefox is Multi-Account Containers. Oh yes, I know. By default, a lot of web browsers, for example Chrome, have this notion of a session, where you have all your cookies and they are shared across the different websites: as you keep opening new tabs, unless you go into incognito, you have the same profile, and that profile is the same for all websites. There's this, it's an extension, or it's built in? It's a mix. It's a mix. It's complicated.
Okay, so I think you actually have to say you want to install it or enable it. And again, the name is Multi-Account Containers, and this lets you tell Firefox to have separate, isolated sessions. So for example, you can say: I want separate sessions for whenever I visit Google, or whenever I visit Amazon. And that can be pretty neat, because then, at the browser level, it's ensuring that no information sharing is happening between the two of them. And it's much more convenient than having to open an incognito window, where everything gets cleaned out all the time. One thing to mention is Stylus versus Stylish. Oh yeah, I forgot about that. One important thing about the browser extension for side-loading CSS style sheets: it's called Stylus, and that's different from another one called Stylish, because that one got bought at some point by some shady company that started using it not only for that functionality, but also to read your entire browser history and send it back to their servers so they could data-mine it. So people built this open-source alternative called Stylus, and that's the one we recommend. That said, I think the repository of styles is the same for the two of them, but I would have to double-check that. Do you have any browser plugins, Anish? Yes, so I also have some recommendations for browser plugins. I also use uBlock Origin, and I also use Stylus. But one other one that I'd recommend is integration with a password manager. This is a topic that we have in the lecture notes for the security lecture, but we didn't really get to talk about it in detail. Basically, password managers do a really good job of increasing your security when working with online accounts, and having browser integration with your password manager can save you a lot of time.
You can open up a website and it can auto-fill your login information for you, so you don't have to copy and paste it back and forth from a separate program that's not integrated with your web browser. This integration can also save you from certain attacks that would otherwise be possible if you were doing this manual copy-pasting, for example phishing attacks. Say you find a website that looks very similar to Facebook, and you go to log in with your Facebook credentials: you go to your password manager, copy-paste the correct credentials into this funny website, and now all of a sudden it has your password. But if you have browser integration, then the extension can automatically check: am I on facebook.com, or is it some other domain that maybe just looks similar? And it will not enter the login information if it's the wrong domain. So a browser extension for password managing is good. Oh, I agree. Next question. What are other useful data wrangling tools? So in yesterday's lecture I mentioned curl. curl is a fantastic tool for making web requests and dumping them to your terminal, and you can also use it for things like uploading files, which is really handy. In the exercises of that lecture we also talk about jq and pup, which are command-line tools that let you write queries over JSON and HTML documents, respectively. Those can be really handy. Other data wrangling tools: Perl. The Perl programming language is often referred to as a write-only programming language, because it's impossible to read even if you wrote it, but it is fantastic at doing straight-up text processing; nothing beats it there. So it's maybe worth learning some very rudimentary Perl just to write some of those scripts. It's often easier than writing some hacked-up combination of grep and awk and sed, and it will be much faster to just hack something up than to write it in Python, for example.
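For flavor, here is the shape of task this is all about, counting the most frequent words in a stream, as a classic tr/sort/uniq pipeline (an equivalent Perl or awk one-liner would be even shorter):

```shell
printf 'the cat saw the dog and the cat ran\n' |
  tr ' ' '\n' |   # one word per line
  sort |          # group identical words together
  uniq -c |       # count each group
  sort -rn |      # most frequent first
  head -n 3       # top three words
```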
But apart from that, other data wrangling tools? No, not off the top of my head, really. column -t: if you pipe any kind of whitespace-separated input into column -t, it will align the columns so that you get nicely aligned output. Yeah, and head and tail, but we've talked about those. A couple of additions to that, which I find myself using commonly. One is Vim. Vim can be pretty useful for doing data wrangling itself. Sometimes you might find that the operation you're trying to do is hard to express in terms of piping different operators together, but if you can just open the file and record a couple of quick Vim macros to do what you want, it might be much, much easier. And then the other one: if you're dealing with tabular data and you want to do more complex operations, like sorting by one column, then grouping, then computing some statistic, for a lot of those workloads I end up just using Python and pandas, because it's built for that. One of the pretty neat features that I find myself using is that it will export to many different formats. The intermediate state is its own pandas DataFrame object, but it can export to HTML, LaTeX, a lot of different table formats. So if your end product is some sort of summary table, then pandas is a fantastic choice. I would second Vim and also Python; I think those are two of my most-used data wrangling tools. For the Vim one, last year we had a demo, and this year it's in the lecture notes, but we didn't cover it in class: a demo of turning an XML file into a JSON version of that same data using only Vim macros. And I think that's actually the way I would do it in practice. I don't want to go find a tool that does this conversion.
It was actually simpler to encode as a Vim macro, so I just did it that way. And then also Python, especially in an interactive tool like a Jupyter notebook, is a really great way of doing data wrangling. A third tool I'd mention, which I don't remember if we covered in the data wrangling lecture or elsewhere, is a tool called Pandoc, which can do transformations between different text document formats. So you can convert from plain text to HTML, or HTML to Markdown, or LaTeX to HTML; it supports a large list of input formats and a large list of output formats. I think there's one last one, which I mentioned briefly in the lecture on data wrangling: the R programming language. I think it's an awful language to program in, and I would never use it in the middle of a data wrangling pipeline. But at the end, in order to produce pretty plots and statistics, R is great, because R is built for doing statistics and plotting. There's a library for R called ggplot, ggplot2 I guess technically, which is just amazing. It produces very nice visualizations, and it lets you very easily do things like: if you have a dataset that has multiple facets, not just X and Y but X, Y, Z and some other variable, and you want to plot the throughput grouped by all of those parameters at the same time in a single visualization, R very easily lets you do this, and I haven't seen anything else that lets you do it as easily. Next question. What's the difference between Docker and a virtual machine? What's the easiest way to explain this? So Docker starts something called containers, and Docker is not the only program that starts containers; there are many others. They usually rely on some feature of the underlying kernel. In the case of Docker, it uses something called LXC, Linux containers.
And the basic premise there is, if you want to start what looks like a virtual machine that is running roughly the same operating system as you are already running on your computer, then you don't really need to run another instance of the kernel. That other virtual machine can share a kernel. You can just use the kernel's built-in isolation mechanisms to spin up a program that thinks it's running on its own hardware, but in reality it's sharing the kernel. And so this means that containers can often run with much lower overhead than a full virtual machine would. But you should keep in mind that it also has somewhat weaker isolation, because you are sharing a kernel between the two. If you spin up a virtual machine, the only thing that's shared is the hardware and, to some extent, the hypervisor, whereas with a Docker container you're sharing the full kernel, and that is a different threat model that you might have to keep in mind. One other small note there: as John pointed out, to use containers, something like Docker, you need the underlying operating system to be roughly the same as whatever the program running on top of the container expects. And so if you're using macOS, for example, the way you use Docker is you run Linux inside a virtual machine and then you run Docker on top of Linux. So if you're going for containers in order to get better performance, trading isolation for performance, and you're running on macOS, that may not work out exactly as expected. And one last note is that there is a slight difference between Docker and containers in general, and I think one of the main gotchas that you have to be familiar with is that other containers are more similar to virtual machines in the sense that they will persist all the storage that you have, where Docker by default won't do that.
Docker is kind of meant to just run some software: I want to run something, I get the image, and it runs. And if you want to have any kind of persistent storage that links to the host system, you have to manually specify that, whereas a virtual machine is using some virtual disk that it's being provided. Next question. What are the advantages of each operating system, and how can we choose between them? For example, choosing the best Linux distribution for our purposes. I will say that for many, many tasks, the specific Linux distribution that you're running is not that important. It's just worth knowing that there are different groups of distributions. For example, there are some distributions that have really frequent updates, but they will break more easily. Arch Linux, for example, has a rolling-release way of pushing updates, where things might break, but they are fine with things being that way. Whereas if you have some really important web server that is hosting all your business analytics, you want that thing to have a much steadier stream of updates. That's, for example, why you will see distributions like Debian being much more conservative about what they push. Or, for example, Ubuntu makes a distinction between the long-term support releases, which they only update every two years, and the more periodic releases, of which they make two a year. So it's worth knowing that that difference exists. Apart from that, some distributions have different policies on providing binaries to you and how they organize their repositories. So I think a lot of the Red Hat Linux distributions don't want non-free drivers in their official repositories, where I think Ubuntu is fine with some of them.
Apart from that, a lot of what is core to most Linux distros is shared between them, and there's a lot of learning in the common ground, so you don't have to worry about the specifics. Keeping with the theme of this class being somewhat opinionated, I'm going to go ahead and say that if you're using Linux, especially for the first time, choose something like Ubuntu or Debian. Ubuntu is a Debian-based distribution, but maybe it's a little bit more friendly. Debian is a little bit more minimalist. I use Debian on all my servers, for example, and I use the Debian desktop on my desktop computers that run Linux. If you're maybe trying to learn more things, and you want a distribution that trades stability for having more up-to-date software, maybe at the expense of you having to fix a broken distribution every once in a while, then maybe you can consider something like Arch Linux. I think those are... Debian, Arch Linux. Gentoo. Or Slackware. Oh man, good old days. But I'd say that if you're installing Linux and just want to get work done, Debian's a great choice. Yeah, I think I would agree with that. The other observation is that you could consider BSD. BSD has come a long way from where it was. There's still a bunch of software you can't really get for BSD, but it gives you a very well-documented experience. And one thing that's different about BSD compared to Linux is that when you install BSD, you get a full operating system, mostly. Many of the programs are maintained by the same team that maintains the kernel, and everything is upgraded together, which is a little different from how things work in the Linux world. It does mean that things often move a little bit slower. I would not use it for things like gaming either, because driver support is lacking, but it is an interesting environment to look at.
And then for things like macOS and Windows: I think if you are a programmer, I don't know why you are using Windows unless you are building things for Windows, or you want to be able to do gaming and stuff, but in that case maybe try dual-booting, even though that's a pain too. macOS is a good middle point between the two, where you get a system that is relatively nicely polished for you, but you still have access to some of the lower-level bits, at least to a certain extent. It's also really easy to dual-boot macOS and Windows. That is not quite the case with macOS and Linux, or Linux and Windows. All right, for the rest of the questions, these are all zero-upvote questions, so maybe we can go through them quickly in the last five or so minutes of class. So the next one is Vim versus Emacs. Vim. Easy answer, but a more serious answer is: I think all three of us use Vim as our primary editor. I use Emacs for some research-specific stuff which requires Emacs, but at a higher level, both editors have interesting ideas behind them, and if you have the time, it's worth exploring both to see which fits you better. And also, you can use Emacs and run it in a Vim emulation mode. I actually know a good number of people who do that, so they get access to some of the cool Emacs functionality and some of the cool philosophy behind it. Emacs is programmable through a Lisp, which is kind of cool, much better than Vimscript, but people like Vim's modal editing, so there's an Emacs plugin called Evil mode which gives you Vim modal editing within Emacs. So it's not necessarily a binary choice. You can combine both tools if you want to, and it's worth exploring both if you have the time. Next question. Any tips or tricks for machine learning applications? Our resident machine learning person.
I think knowing a lot of these tools, mainly the data wrangling and shell tools, is really important, because a lot of what you're doing as a machine learning researcher is trying different things. But one core aspect of doing that, and of a lot of scientific work, is being able to have reproducible results and to log them in a sensible way. So for example, instead of trying to come up with really hacky solutions for how you name your folders to make sense of your experiments, maybe it's worth having, for example, what I do: a JSON file that describes the entire experiment and all the parameters within it, and then I can really quickly use the tools that we have covered to query for all the experiments that have some specific property, or that use some dataset, and things like that. Apart from that, the other side of this is, if you are running things like training jobs for machine learning applications, and you are not already using some sort of cluster that your university or your company provides, and you're just manually SSHing, which a lot of labs do because that's the easy way, it's worth automating a lot of that work. It might not seem like it, but manually doing a lot of these operations takes away a lot of your time, and also a lot of your mental energy for running these things. Any more Vim tips? I have one. So in the Vim lecture, we tried not to link you to too many different Vim plugins, because we didn't want that lecture to be overwhelming, but I think it's actually worth exploring Vim plugins, because there are lots and lots of really cool ones out there. One resource you can use is the different instructors' dotfiles. A lot of us, I think I use around two dozen Vim plugins, and I find a lot of them quite helpful and use them every day, and we all use slightly different subsets of them.
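To make the experiment-JSON idea above concrete, here's a small sketch; the directory layout, file names, and parameters are all invented for illustration, not anything prescribed in the lecture:

```shell
# Invented convention: one directory per experiment, each with a
# params.json recording every parameter of that run.
mkdir -p experiments/run1 experiments/run2
echo '{"dataset": "mnist",   "lr": 0.01}' > experiments/run1/params.json
echo '{"dataset": "cifar10", "lr": 0.1}'  > experiments/run2/params.json

# Which experiments used a given dataset? grep -l prints matching file names.
grep -l '"dataset": "mnist"' experiments/*/params.json
# prints: experiments/run1/params.json
```

With the parameters stored as structured data rather than encoded in folder names, the standard shell tools are enough to answer most "which runs did X" questions.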
So go look at what we use, or look at some of the other resources we've linked to, and you might find some stuff useful. To add to that, I don't think we went into a lot of detail in the lecture, correct me if I'm wrong, on getting familiar with the leader key, which is a special key that a lot of Vim plugins will use for their mappings. And for a lot of the common operations, Vim has its own ways of doing them, but you can figure out quicker versions of doing them. So for example, I know that you can do colon w q to save and exit, or that you can do capital Z Z, but what I actually do is just leader, which for me is the space key, and then w. And I have done that for a lot of common operations that I keep doing all the time, because just saving one keystroke on an extremely common operation saves you thousands a month. Yeah, just to expand a little bit on what the leader key is. So in Vim you can bind keys: I can make Ctrl-J do something, so holding one key and then pressing another, I can bind that to something, or I can bind a single keystroke to something. What the leader key lets you do is: you can assign any key to be the leader key, and then you can assign leader followed by some other key to some action. So for example, Jose's leader key is space, and then you can bind space followed by some other key to an arbitrary Vim command. And so it gives you yet another way of binding a whole set of key combinations, leader plus any key on the keyboard, to some functionality. I forget whether we covered macros in the Vim lecture, but Vim macros are worth learning. They're not that complicated, but knowing that they're there and knowing how to use them is going to save you so much time. The other one is something called marks.
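As a concrete sketch of the leader setup described above, in a vimrc it's just a couple of lines; space as leader and the w mapping are the ones mentioned, but any keys work:

```vim
" Use space as the leader key, then map leader+w to :w (save)
let mapleader = "\<Space>"
nnoremap <Leader>w :w<CR>
```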
So in Vim you can press m and then any letter on your keyboard to make a mark in that file, and then you can press apostrophe and the same letter to jump back to the same place. And this is really useful if you're moving back and forth between two different parts of your code, for example: you can mark one as a and one as b, and then jump between them with tick-a and tick-b. There's also Ctrl-O, which jumps to the previous place you were in the file, no matter what caused you to move. So for example, if I am on some line, and then I jump to b, and then I jump to a, Ctrl-O will take me back to b and then back to the place I originally was. This can also be handy for things like searches: the place where you started a search is part of that stack. So I can do a search, then step through the results and change them, and then Ctrl-O all the way back up to where I started the search. Ctrl-O also lets you move across files. So if I go from one file to somewhere else in a different file to somewhere else in the first file, Ctrl-O will move me back through that stack. And then there's Ctrl-I to move forward in that stack, so it's not as though you pop and it goes away forever. The command :earlier is really handy. So :earlier gives you an earlier version of the same file, and it does this based on time, not based on actions. So for example, if you press a bunch of undos and redos and make some changes and stuff, :earlier will take a literally earlier, as in time, version of your file and restore it to your buffer. This can sometimes be good if you undid something and then rewrote something and then realized you actually wanted the version that was there before you started undoing. :earlier lets you do this. And there's a plugin called undotree, or something like that; there are several of these, that let you actually explore the full tree of undo history that Vim keeps.
Because it doesn't just keep a linear history; it actually keeps the full tree, and letting you explore that might in some cases save you from having to retype stuff you typed in the past, or stuff where you forgot exactly what you had there that used to work and no longer works. And then there's one final one I want to mention, which is: we mentioned how in Vim you have verbs and nouns, right? So you have verbs like delete or yank, and then you have nouns like "next occurrence of this character", or percent to jump between matching brackets, and that sort of stuff. The search command is a noun. So you can do things like d-slash and then a string, and it will delete up to the next match of that pattern. This is extremely useful, and I use it all the time. One other neat addition on the undo stuff that I find really valuable on an everyday basis is that one of the built-in functionalities of Vim is that you can specify an undo directory. If you don't have this enabled, whenever you enter a file, your undo history is clean, there's nothing in there, and as you make changes and then undo them, you build up this history, but as soon as you exit the file, that's lost. But what? As soon as you exit Vim. Yeah, sorry, as soon as you exit Vim, that's lost. However, if you have an undo directory, Vim is going to persist all those changes into that directory. So no matter how many times you enter and leave, that history is persisted. And it's incredibly helpful. It can be very helpful for files that you modify often, because then you can keep the flow. But it's also really helpful when you've modified your .bashrc and something broke, and five days later you open Vim again and wonder, what did I actually change? If you don't have, say, version control, then you can just check the undo history and go, oh, that's actually what happened.
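The persistent-undo setup described above is a couple of lines in a vimrc; the path here is just a common choice, and the directory has to exist before Vim will write to it:

```vim
" Persist undo history across Vim sessions
set undofile
set undodir=~/.vim/undodir   " create this directory manually first
```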
And the last comment is that it's also really worth familiarizing yourself with registers and the different special registers Vim uses. So for example, when you copy and paste, that's going into a specific register, and if you want to, for example, use the OS clipboard, you should be copying, or yanking, and pasting from a different register. And there are a lot of them, and I think you should explore; there's a lot to know about registers. The next question is asking about two-factor authentication, and I'll just give a very quick answer to this one in the interest of time. So it's worth using two-factor auth for anything security-sensitive. I use it for my GitHub account and for my email and stuff like that. And there are a bunch of different types of two-factor auth, from SMS-based two-factor auth, where you get a special number texted to you when you try to log in and you have to type that number in, to other tools like universal second factor. This is like those YubiKeys that you plug into your computer and have to tap every time you log in. So not all, yeah, John is holding up a YubiKey. Not all two-factor auth is created equal, and you really want to be using something like U2F rather than SMS-based two-factor auth, or something based on one-time passcodes that you have to type in. We don't have time to get into the details of why some methods are better than others, but at a high level, use U2F, and the internet has plenty of explanations for why the other methods are not a great idea. Last question. Any comments on differences between web browsers? Yes. Differences between web browsers. There are fewer and fewer differences between web browsers these days. At this point, almost all web browsers are Chrome, either because you're using Chrome or because you're using a browser that's built on the same browser engine as Chrome. It's a little bit sad, one might say, but I think these days, Chrome is a great browser, for security reasons among others.
If you want something that's more customizable, or you don't want to be tied to Google, then use Firefox. Don't use Safari; it's a worse version of Chrome. The new Internet Explorer, Edge, is pretty decent, and it also uses the same browser engine as Chrome and is probably fine, although avoid it if you can, because it has some legacy modes you don't want to deal with. Yeah, I think that's... There's a cool new browser called Flow that you can't use for anything useful yet, but they're actually writing their own browser engine, and that's really neat. Firefox also has this project called Servo, where they're re-implementing their browser engine in Rust in order to make it super concurrent, and what they've done is they've started to take modules from that version and port them over to Gecko, or integrate them with Gecko, which is the main browser engine for Firefox, just to get those speedups there as well, and that's a neat thing to watch out for. That is all the questions. Hey, we did it. Nice. I guess thanks for taking the Missing Semester class, and let's do it again next year. All right.