====== UNIX 102 ======

====== Required Reading ======

  - [[unix_101|Unix 101]]
  - [[unix_commands|Unix Commands]]

====== Dotfiles ======

Dotfiles deserve a mention. These are files or directories whose names begin with a '.'. If you take a close look at the output of the ls command earlier in this document, you will notice that some invocations of ls listed these files and some did not. In particular, the command ''ls'' by itself did not list dotfiles. This is because dotfiles are "hidden files" under Unix. They are shown if you pass special command line arguments to ls (such as ''-a''), but otherwise ls will not show them to you.

Dotfiles typically hold configuration information: the shell and other programs use them to store your user-specific preferences and data.

Dotfiles are almost always plain text files. This means that one way to change the behavior of a program is to edit the dotfiles that the program uses, and you can do this with any editor you choose. A word of warning, however: incorrect content in a dotfile may result in unexpected behavior. You should always make a backup of the file you are about to change so that you can restore the original version should something go terribly wrong.

The easiest way to find out what dotfiles are used by a program is to check the program's manual, help, or info pages.

====== Parents, Children, Orphans, and Zombies ======

It is often the case that a program will execute another program, either to replace itself or to run alongside itself. In fact, this is so common that it's the first thing that happens on a UNIX machine on bootup. The very first program run by the system is usually called ''init'', and it takes care of starting up all of the other programs that make your system run.

  * Every program has an associated process identification number, called a PID. You can see a list of all of the processes running on a computer with the ''ps -ef'' command. The ''ps'' command shows you the **process table** -- a special table kept by the kernel to keep track of all of the processes currently running on the computer. If you only want to see the processes running in your shell, just type ''ps'' without the ''-ef'' command line arguments.
  * When a program executes another program, the executor is called the **parent**, and the executed program is called the **child**. The parent process owns the child process, and the child process inherits its execution environment from its parent. (Don't worry if you don't know what "execution environment" means.)
  * The ''init'' program is the great ancestor -- every program has ''init'' somewhere in its lineage. If a parent program dies, the child becomes owned by ''init''.
  * If a program exits, but it still has an entry in the process table, it's called a **zombie**. This will have almost no effect on you or your UNIX experience. Ever. I just wanted a reason to bring zombies into the conversation.

====== Managing Running Commands ======

When you run a command, that command takes over the terminal. It expects to read input from the keyboard and write output to the screen. In the meantime, your shell is sitting there waiting for the command to exit. But how do you regain control if a program goes haywire? Or what if your command is just taking a very long time to complete, and you want to get something else done while you wait?

  * The behavior where a command takes over the keyboard and screen is called "running a command in the **foreground**". This is the default behavior of commands unless told otherwise.
  * If you want to interrupt a command, you can hold down the control key and press the "C" key (''control-C''). This will kill the command. (NOTE: If the command is held up inside some kernel function, like performing I/O for instance, then the program cannot quit until the kernel returns control to it.)
  * If you expect a program to take a long time, then you can run it as a **background** process. Background processes should not expect keyboard input. They may be fed input from a file (see the section on redirection and pipes below). A process can be run in the background by appending a space and a ''&'' symbol at the end of the command line.
  * If you are running a program in the foreground, but it's taking too long, and you want to switch it to the background, you can do so by holding down the control key and pressing "Z" (''control-Z''). This will suspend the program without killing it. You can then type the command ''bg'' to tell the shell to resume running the program in the background. If you type control-Z and decide that you made a mistake, you can continue running the program in the foreground by typing ''fg''.
  * If you have one or more programs running in the background, you can view a list of background processes by typing ''jobs''.

For example, I'm going to use the ''sleep'' command to simulate a long-running process:

<code>
peek@catus:~$ # I can type something here and the shell will ignore it because
peek@catus:~$ # the line begins with a '#'. In shell scripting, a line beginning
peek@catus:~$ # with a '#' is called a comment line.
peek@catus:~$ # This lets me tell you what I'm doing without confusing the shell.
peek@catus:~$ # First, I'm going to start a program that's going to take a long time:
peek@catus:~$ sleep 300
^Z
[1]+  Stopped                 sleep 300
peek@catus:~$ # I just pressed control-Z to interrupt the sleep command.
peek@catus:~$ # The sleep command just sits and does nothing for the number of
peek@catus:~$ # seconds I tell it to.
peek@catus:~$ # In this case, the sleep command is going to sit there for 5 minutes
peek@catus:~$ # (300 seconds). I didn't want to wait 5 minutes, so I'm going to move
peek@catus:~$ # the command into the background with the 'bg' command.
peek@catus:~$ bg
[1]+ sleep 300 &
peek@catus:~$ # Now the sleep command is running again, but it's running in the
peek@catus:~$ # background, which frees up the shell for me to run other commands.
peek@catus:~$ # Now I'll run another one:
peek@catus:~$ sleep 600
^Z
[2]+  Stopped                 sleep 600
peek@catus:~$ # This command will run for 10 minutes (600 seconds).
peek@catus:~$ # I can see a list of background processes with 'jobs'
peek@catus:~$ jobs
[1]-  Running                 sleep 300 &
[2]+  Stopped                 sleep 600
peek@catus:~$ # Notice that the 'sleep 300' command is "running", but the 'sleep 600'
peek@catus:~$ # command is "stopped". I'll start the 'sleep 600' command running again:
peek@catus:~$ bg
[2]+ sleep 600 &
peek@catus:~$ # Now I can do other things.
peek@catus:~$ # But what if I want to connect with one of my background processes?
peek@catus:~$ # I can bring a background process to the foreground with the 'fg' command.
peek@catus:~$ # Using 'fg' by itself will bring to the foreground the last command
peek@catus:~$ # I put in the background.
peek@catus:~$ # If I want to bring some other process to the foreground, then I have to
peek@catus:~$ # use the numerical identifier displayed on the left in square brackets.
[1]-  Done                    sleep 300
peek@catus:~$ # Ah! Now I see that the 'sleep 300' command has finished!
peek@catus:~$ # Well, now there's no point in bringing it to the foreground -- it's
peek@catus:~$ # done and gone.
peek@catus:~$ # I'll create another 10-minute sleep...
peek@catus:~$ sleep 600 &
[3] 15157
peek@catus:~$ # I started this command off in the background right away. Notice that
peek@catus:~$ # it was given a new, unique job number, 3, even though there are now only
peek@catus:~$ # two processes running in the background.
peek@catus:~$ jobs
[2]-  Running                 sleep 600 &
[3]+  Running                 sleep 600 &
peek@catus:~$ # I'll bring the first process to the foreground.
peek@catus:~$ fg 2
sleep 600
^Z
[2]+  Stopped                 sleep 600
peek@catus:~$ # There. I did it. But then I got bored again, so I put it back in
peek@catus:~$ # the background. When these processes end, they will tell you so with
peek@catus:~$ # a "Done", just like the 'sleep 300' above. However, even after the
peek@catus:~$ # program exits, you won't see the "Done" line until you press RETURN
peek@catus:~$ # or enter another command.
peek@catus:~$ # Oh yeah, the job number is local to the shell. It's not the same thing
peek@catus:~$ # as the process identification number (PID). The PID is something the
peek@catus:~$ # kernel uses. The difference is like "I'm in apartment B" versus
peek@catus:~$ # "I live at 1234 Winston Way Apartments". That is, if you view a shell
peek@catus:~$ # as an apartment building, processes as residents, and the kernel as
peek@catus:~$ # the city the apartment building is in. But maybe that's just further
peek@catus:~$ # confounding an already confusing concept...?
[3]-  Done                    sleep 600
peek@catus:~$ # Ah, well done. I see that in the amount of time that it took for me
peek@catus:~$ # to type, the last sleep command finished.
peek@catus:~$ # Wait a second, wasn't there another sleep command that should have
peek@catus:~$ # finished before that one?
peek@catus:~$ jobs
[2]+  Stopped                 sleep 600
peek@catus:~$ bg
[2]+ sleep 600 &
peek@catus:~$ # Did you catch that mistake I made? I said earlier that I had put job
peek@catus:~$ # number 2 back into the background, but I forgot to actually type 'bg'.
peek@catus:~$ # I've just wasted valuable time that I could have been using for
peek@catus:~$ # something else.
[2]+  Done                    sleep 600
peek@catus:~$
</code>

====== Input, Output, and Redirection ======

In computer programming, standard streams are input and output communication channels between a computer program and its environment.
These streams are preconnected when the program begins its execution. There are three standard I/O channels that are available to every program: **standard input** (**stdin**), **standard output** (**stdout**), and **standard error** (**stderr**). Unless told otherwise, the operating system will assume that a program's stdin comes from the keyboard, and a program's stdout and stderr will go to the screen. However, it's often useful to redirect input and output, or to connect the output of one program to the input of another. This is called **redirection**.

^ Command Line Argument ^ Redirection Type ^
^ > | Standard output redirection -- Output from the program is sent to the given file, pipe, or specified destination. <code>$ ls -ald /etc/passwd > /tmp/ls-output.txt
$ cat /tmp/ls-output.txt
-rw-r--r-- 1 root root 2803 May 10 16:40 /etc/passwd</code> |
^ < | Standard input redirection -- Input to the program is read from the given file, pipe, or specified source. <code>$ cat < /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin</code> [**> The rest of this output removed for brevity <**] |
^ 2> | Standard error redirection -- Output written to standard error is instead written to the given file, pipe, or specified destination. <code>$ ls -ald /this-file-does-not-exist 2> /tmp/ls-output.txt
$ cat /tmp/ls-output.txt
ls: cannot access '/this-file-does-not-exist': No such file or directory</code> |
^ %%|%% | Pipe -- This is used to shuttle output from one command to another command's input. <code>$ cat /etc/passwd | wc -c
2803
$ cat /etc/passwd | wc -l
51
$ cat /etc/passwd | wc --max-line-length
87</code> (This shows that my ''/etc/passwd'' file contains 2,803 bytes and 51 lines. The longest line is 87 characters. These command line arguments and more can be found in the ''wc'' man page, or by typing ''wc %%--%%help'' (two dashes before "help").) |
^ 2>&1 | Standard error redirection -- Stderr is written to wherever stdout goes. For example, if writing output to a file, then this: <code>> logfile.txt 2> logfile.txt</code> is intended to do the same thing as this: <code>> logfile.txt 2>&1</code> The second form is the one to use. It makes both streams share a single open file, whereas the first form opens the file twice, so the two streams can overwrite each other's output. |
^ >&2 | Standard output redirection -- Stdout is written to wherever stderr goes. |

====== Variables and Environment Variables ======

A variable is simply a mapping between a string name and a value. In the shell, values can be strings or integers. (Fractions and decimal values are treated like strings.) Variables are created by naming the variable, followed immediately by an equal sign (no spaces), and the value. If the value is to be a string with spaces, then the value needs to be wrapped in single or double quotes. For example:

<code>
peek@catus:~/Documents/Software$ echo "${v}"

peek@catus:~/Documents/Software$ v="Hello World"
peek@catus:~/Documents/Software$ echo "${v}"
Hello World
</code>

**NOTE:**

  * Here I'm introducing a new command, ''echo''. This command will print out whatever you give it as an argument. In addition to printing out the value of a variable, it's also very useful inside of scripts for giving the user feedback about what the script is doing.
  * Also notice that while I assign the variable with ''='', I must access the variable by prepending a dollar sign to its name and using curly braces. Actually, the curly braces are optional, but if you do any scripting for very long, then you'll find that using curly braces keeps things clean and bug-free. So by introducing variable referencing to you with curly braces, I'm hoping you will avoid potential heartache down the road.

In the first command I use the ''echo'' command to try to print out the value of the ''v'' variable. The shell has no ''v'' variable defined, so the shell prints out an empty line. In the second command I assign a value to the variable ''v''. And in the third command I print out the value of ''v'' again -- this time it works.
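To make the curly-brace advice concrete, here is a minimal sketch of the kind of bug the braces prevent. The name ''v_backup'' is made up for this illustration and is deliberately never set; the rest follows the ''v'' example above.

```shell
# Make sure the made-up name really is unset, then assign v.
# The quotes are required because the value contains a space.
unset v_backup
v="Hello World"

# Without braces the shell looks for a variable named "v_backup",
# which does not exist, so this expands to an empty string.
without_braces="$v_backup"

# With braces the variable name ends at the closing brace, so
# "_backup" is treated as literal text appended to the value of v.
with_braces="${v}_backup"

echo "without braces: [${without_braces}]"
echo "with braces:    [${with_braces}]"
```

The first line of output shows empty brackets, and the second shows ''Hello World_backup'' inside the brackets. The shell gives no warning in the first case, which is exactly why this kind of mistake is hard to spot in a longer script.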
Environment variables are just variables that the shell shares with any program that it executes. To turn a variable into an environment variable, you only need to export it:

<code>
export v
</code>

Now, any program that is executed by the shell will be able to see and use the ''v'' variable.

If you haven't guessed already, the system has a set of standard environment variables that are defined automatically. Here's a list of the most common environment variables:

^ Variable Name ^ Description ^
^ HOME | The location of the user's home directory. <code>peek@catus:~$ echo "${HOME}"
/home/peek</code> |
^ PATH | A colon-separated list of directories in which to look for command programs. For every command you type, the shell will search each of these directories in turn until it finds the command you want. The first match is used, which means the order in which these directories appear is important. Here's my ''PATH'' (which will differ from yours): <code>peek@catus:~$ echo "${PATH}"
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/peek/usr/Linux/x86_64/bin:/home/peek/usr/Linux/bin:/home/peek/usr/bin</code> |
^ USER | Your user name. <code>peek@catus:~$ echo "${USER}"
peek</code> |

There are usually many, many more. Some are standard and used on nearly every Unix implementation that exists (like ''HOME'', ''USER'', and ''PATH''); others may be non-standard and only exist on a particular machine. If you want to see a list of all of the environment variables set in your shell, type the ''env'' command. (The ''set'' command lists even more: every shell variable, exported or not.) (Pro Tip: Pipe the output to something like ''less'' so that you can actually read it before it scrolls off the top of the screen.)

====== Combining Commands and Subshells ======

===== Combining Commands On A Single Line =====

Commands can be combined on the same line by separating each command with a semicolon, like so:

^ This ^ = ^ This ^
| <code>ls -1
cd ~/Desktop
df .</code> | | <code>ls -1 ; cd ~/Desktop ; df .</code> |

===== Splitting (A) Command(s) Across Multiple Lines =====

Just like it's possible to combine commands, it's also possible to split commands. Any line that ends with a ''\'' character is taken by the shell to mean that the command is incomplete, and that there will be more coming on the next line. Usually you wouldn't do this for commands that you type yourself, but it's handy when writing shell scripts as it makes your script easier to read and understand. For an example, see the command substitution section below.

===== Executing Commands In A Subshell =====

Commands can also be run in a subshell. This means that the shell runs a copy of itself, and the copy executes your command. Why would you want to do this? Well, here's an example. Say you want to time how long it takes to log into a set of remote machines with ssh and run a command:

<code>
peek@catus:~$ time ssh peek@alces01 "uptime" ; time ssh peek@alces02 "uptime" ; time ssh peek@alces03 "uptime"
 09:09:03 up 4 days, 18:11,  0 users,  load average: 0.06, 0.07, 0.06

real    0m0.591s
user    0m0.020s
sys     0m0.000s
 09:09:04 up 4 days, 18:31,  0 users,  load average: 0.01, 0.02, 0.05

real    0m0.667s
user    0m0.020s
sys     0m0.000s
 09:09:04 up 4 days, 19:19,  0 users,  load average: 0.00, 0.01, 0.05

real    0m0.673s
user    0m0.020s
sys     0m0.000s
</code>

That's all nice and fine, but if you want to know the total time to execute all three commands then you have to do some math. Another method would be to run the three commands in a subshell, and then time the subshell:

<code>
peek@catus:~$ time (ssh peek@alces01 "uptime" ; ssh peek@alces02 "uptime" ; ssh peek@alces03 "uptime")
 09:10:27 up 4 days, 18:12,  0 users,  load average: 0.12, 0.08, 0.06
 09:10:27 up 4 days, 18:32,  0 users,  load average: 0.00, 0.01, 0.05
 09:10:28 up 4 days, 19:20,  0 users,  load average: 0.04, 0.04, 0.05

real    0m1.788s
user    0m0.056s
sys     0m0.004s
</code>

===== Command Substitution =====

Command substitution allows the output of a command to replace the command itself.
There are two forms:

^ Old Form ^ New Form ^
| %%``%% | %%$()%% |

Why would you want to do this? Earlier you saw how to send the output of one command to the input of another command with pipes. But what if what you need is to take the output of one command and use it as a command line argument of another command? For example:

<code>
$( $() )
</code>

This may not seem like much right now, but it becomes very powerful when you get into shell scripting. Here's an example. **NOTE: Don't worry if you don't understand what the code does!** It might look intimidating for the uninitiated -- especially if this is your first trip into terminal-land. It's just hard to relay how useful some of the shell's functions are without going deeper. For now, just bask in it as a glorious example, and for those that want to know more, I'll go into details below:

<code>
peek@catus:~$ list_of_user_shells=$(for uid in $(seq 108 110); do grep "^[^:]*:[^:]*:${uid}:" /etc/passwd ; done | awk -F: '{print $7}' | sort | uniq)
peek@catus:~$ echo "${list_of_user_shells}"
/bin/false
/usr/sbin/nologin
</code>

This line is long and ugly to look at. I can break it up:

<code>
peek@catus:~$ list_of_user_shells=$(\
> for uid in $(seq 108 110); do \
>     grep "^[^:]*:[^:]*:${uid}:" /etc/passwd ; \
> done \
> | awk -F: '{print $7}' \
> | sort \
> | uniq \
> )
peek@catus:~$ echo "${list_of_user_shells}"
/bin/false
/usr/sbin/nologin
</code>

**NOTE:** The line breaks are there to make the code more readable. The ''>'' prompt is a sub-prompt printed by the shell, telling me that the shell understands that the ''\'' character at the end of my input means that I'm not done entering my command. You wouldn't actually type the ''>'' character yourself.

What does this command do? It searches ''/etc/passwd'' for any user with a user ID number between 108 and 110 inclusive, then pulls each user's login shell from their user record, puts the login shells into a list, sorts the list, and then removes duplicate entries.
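If that example is too much to take in at once, the same idea can be sketched with something much smaller. The path ''/usr/local/bin'' and the variable names below are only placeholders picked for illustration. The ''basename'' and ''dirname'' commands just chop up the text of a path, so the directory doesn't even need to exist.

```shell
# Old form: backticks. The output of basename becomes the value.
old_form=`basename /usr/local/bin`

# New form: $( ). Same result, and easier to read.
new_form=$(basename /usr/local/bin)

# The new form nests cleanly: the inner dirname runs first, and its
# output (/usr/local) becomes the argument of the outer basename.
nested=$(basename "$(dirname /usr/local/bin)")

echo "old form: ${old_form}"
echo "new form: ${new_form}"
echo "nested:   ${nested}"
```

Both single-level forms produce ''bin'', and the nested one produces ''local''. Nesting is where the new form earns its keep, since backticks inside backticks have to be escaped.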
Here's a breakdown of the ''list_of_user_shells'' command:

^ Command ^ Description ^
| <code>seq 108 110</code> | This command prints out all integers between the two integers given as its command line arguments, inclusive. Ex: <code>$ seq 108 110
108
109
110</code> |
| <code>for uid in $(seq 108 110); do \
    ... ; \
done</code> | This command reads in the integers output by the ''seq'' command and loops over each one, assigning each number to the variable ''uid'' and then executing the commands between ''do'' and ''done''. Ex: <code>$ for uid in $(seq 108 110); do \
    echo "PROCESSING UID: ${uid}" ; \
done
PROCESSING UID: 108
PROCESSING UID: 109
PROCESSING UID: 110</code> |
| <code>for uid in $(seq 108 110); do \
    grep "^[^:]*:[^:]*:${uid}:" /etc/passwd ; \
done</code> | This will run a ''grep'' command for every integer value of ''${uid}'' from 108 to 110. The ''grep'' command will pull out the user record for the user whose UID matches the value stored in ''${uid}''. Ex: <code>$ for uid in $(seq 108 110); do \
>     grep "^[^:]*:[^:]*:${uid}:" /etc/passwd ; \
> done
sshd:x:108:65534::/var/run/sshd:/usr/sbin/nologin
colord:x:109:116:colord colour management daemon,,,:
    /var/lib/colord:/bin/false
statd:x:110:65534::/var/lib/nfs:/bin/false</code> (Note: Line wrapped for readability) |
| <code>for uid in $(seq 108 110); do \
    grep "^[^:]*:[^:]*:${uid}:" /etc/passwd ; \
done \
| awk -F: '{print $7}'</code> | We only want to extract the shell from the user record. Here's where understanding the user record will come in handy. The format of ''/etc/passwd'' is such that each line is a separate record, and each field of the record is separated by a colon. The user's shell is stored in the 7th field of the record. The ''awk'' command here tells awk that the field separator is a colon, and that we want to print out field number 7. Ex: <code>$ for uid in $(seq 108 110); do \
>     grep "^[^:]*:[^:]*:${uid}:" /etc/passwd ; \
> done \
> | awk -F: '{print $7}'
/usr/sbin/nologin
/bin/false
/bin/false</code> |
| <code>for uid in $(seq 108 110); do \
    grep "^[^:]*:[^:]*:${uid}:" /etc/passwd ; \
done \
| awk -F: '{print $7}' \
| sort \
| uniq</code> | In building our list of shells, we don't want duplicate entries. There are two entries for ''/bin/false''. We can use ''sort'' and ''uniq'' to get rid of these extra entries. |
| <code>list_of_user_shells=$(\
for uid in $(seq 108 110); do \
    grep "^[^:]*:[^:]*:${uid}:" /etc/passwd ; \
done \
| awk -F: '{print $7}' \
| sort \
| uniq \
)</code> | Finally, this last bit wraps the entire command into a sub-shell command substitution. The shell will take the output from the entire command and place it into the variable ''list_of_user_shells'', which we can use later. |

====== Exit Codes ======

Whenever a program exits it returns an exit code to the shell. An exit code of 0 means that the program exited normally. A non-zero exit code means that an error occurred. This is useful information for building conditional commands that may change behavior depending on what errors arise. For instance, the ''make'' command will execute a list of commands in a file named ''Makefile'', and exit the first time it encounters an error. Makefiles are often used to generate programs and content. But for now, it's sufficient for you to know that exit codes exist and that they are useful.

====== Playing Around ======

^ Make a safe place to play around ||
^ Type: | <code>$ mkdir /tmp/playground
$ cd /tmp/playground</code> |
^ Get a text file to play around with ||
^ Type: | <code>$ wget -O file.txt 'http://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt'
--2016-06-07 09:25:57--  http://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt
Resolving ocw.mit.edu (ocw.mit.edu)... 23.15.135.8, 23.15.135.19
Connecting to ocw.mit.edu (ocw.mit.edu)|23.15.135.8|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5458199 (5.2M) [text/plain]
Saving to: ‘file.txt’

100%[======================================>] 5,458,199   1.58MB/s   in 3.3s

2016-06-07 09:26:00 (1.58 MB/s) - ‘file.txt’ saved [5458199/5458199]</code> |
^ How many lines are in the file? ||
^ Type: | <code>$ wc -l file.txt
124456 file.txt</code> |
^ How many words are in the file? ||
^ Type: | <code>$ wc -w file.txt
901325 file.txt</code> |
^ What are the first 10 lines of this file? ||
^ Type: | <code>$ head -10 file.txt
This is the 100th Etext file presented by Project Gutenberg, and
is presented in cooperation with World Library, Inc., from their
Library of the Future and Shakespeare CDROMS.  Project Gutenberg
often releases Etexts that are NOT placed in the Public Domain!!

Shakespeare

*This Etext has certain copyright implications you should read!*

<</code> |
^ What is the 3rd word on each line of the last ten lines? ||
^ Type: | <code>$ cat file.txt \
> | awk '{print $3}' \
> | tail -10
ONLY,
COMMERCIAL
CHARGES
this</code> |
^ What are the top 10 most frequently used words? ||
^ Type: | <code>$ cat file.txt \
> | awk '{a[$1]++}END{for(k in a)print a[k],k}' RS=" |\n" \
> | sort -nr \
> | head -10
517065
23242 the
19540 I
18297 and
15623 to
15544 of
12532 a
10824 my
9576 in
9081 you</code> |
^ NOTE: How would you know to do that!?!? The easiest way is to just search online for someone who's already done it, and then copy what they typed. There are several online forums for command line usage too. That's what I did. Awk is so powerful, I've only scratched the surface of it myself. ||
^ In the file '/etc/passwd', what is the 8th line? ||
^ Type: | <code>$ cat /etc/passwd | head -8 | tail -1
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin</code> |
^ The lines in /etc/passwd are fields separated by a colon. What is the value in the 5th field? ||
^ Type: | <code>$ cat /etc/passwd | head -8 | tail -1 | awk -F: '{print $5}'
lp</code> |
^ The 3rd field is the User ID number (UID). What is the sum of all UIDs in '/etc/passwd'? ||
^ Type: | <code>$ n=0
$ cat /etc/passwd \
> | awk -F: '{print $3}' \
> | while read d ; do let n=$(( $n + $d )) ; done
$ echo $n
0</code> |
^ | NOTE: That didn't work! Why? Because bash runs each part of a pipeline, including the while loop, in a subshell. A subshell gets its own copy of the parent shell's variables, not the originals. This means that when the subshell sums up values for 'n', that value is lost when the subshell exits. Since the parent's version of 'n' never changes, its value is still zero. |
^ So what's the correct way to do it? Here's one way that works: ||
^ Type: | <code>$ n=0
$ for d in $(cat /etc/passwd | awk -F: '{print $3}') ; do \
>     n=$(( $n + $d )) ; \
> done
$ echo $n
68740</code> |
^ | NOTE: The for loop doesn't execute in a subshell. How would you know this? Well, reading the bash manual is probably the best way. :-/ |
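If you would rather keep the ''while read'' loop, another option is to feed it through a redirection instead of a pipe. This is only a sketch, not the one true way: the three numbers stand in for the UID column, and the variable names ''numbers'', ''piped_total'', and ''total'' are made up for the example.

```shell
# Three made-up numbers stand in for the UID column of /etc/passwd.
numbers=$(printf '1\n2\n3\n')

# Pipeline version: in bash the loop runs in a subshell, so the sum
# is computed on a copy of piped_total and thrown away afterwards.
piped_total=0
echo "${numbers}" | while read -r d ; do
    piped_total=$(( piped_total + d ))
done
echo "piped loop:      ${piped_total}"

# Here-document version: the loop runs in the current shell and only
# its input is redirected, so the sum survives the loop.
total=0
while read -r d ; do
    total=$(( total + d ))
done <<EOF
${numbers}
EOF
echo "redirected loop: ${total}"
```

In bash the piped loop reports 0 and the redirected loop reports 6. Some other shells, such as ksh and zsh, run the last part of a pipeline in the current shell, so the first number can differ there. That is one more reason not to rely on a pipeline to carry a variable back out.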