UNIX 102

Required Reading

Dotfiles

Dotfiles are files and directories whose names begin with a '.'. If you take a close look at the output of the ls command earlier in this document, you will notice that some invocations of ls listed these files and some did not. In particular, the command ls by itself did not list dotfiles. This is because dotfiles are “hidden files” under Unix. They are shown if you pass special command line arguments to ls (such as -a), but otherwise ls will not show them to you. Dotfiles are typically used by the shell and other programs to store your user-specific preferences and data.

Dotfiles are almost always plain text files. This means that one way to change the behavior of a program is to edit the dotfiles that the program uses, and you can do this with any editor you choose. A word of warning, however: incorrect content in a dotfile may result in unexpected behavior. You should always make a backup of the file you are about to change so that you can restore the original version should something go terribly wrong.

The easiest way to find out what dotfiles are used by a program is to check the program's manual, help, or info pages.
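Here is a minimal sketch of the "hidden file" behavior and the backup habit described above. It works in a throwaway directory, and the file names (notes.txt, .exampleprogrc) are made up for illustration; they don't belong to any real program.

```shell
# Work in a throwaway directory so no real configuration is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Create one ordinary file and one dotfile (both names are made up).
touch notes.txt
printf 'color=blue\n' > .exampleprogrc

# Plain ls hides the dotfile; ls -A shows it (without the . and .. entries).
visible=$(ls)
all=$(ls -A)
echo "ls    shows: $visible"
echo "ls -A shows: $all"

# Back up the dotfile before editing it, then restore it if needed.
cp .exampleprogrc .exampleprogrc.bak
printf 'color=red\n' > .exampleprogrc      # the "edit"
cp .exampleprogrc.bak .exampleprogrc       # restore the original
restored=$(cat .exampleprogrc)
echo "After restoring: $restored"
```

The same cp-before-editing step applies to a real dotfile in your home directory.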

Parents, Children, Orphans, and Zombies

It is often the case that a program will execute another program to either replace itself or to run alongside itself. In fact, this is so common that it's the first thing that happens on a UNIX machine on bootup. The very first program run by the system is usually called init, and it takes care of starting up all of the other programs that make your system run.

  • Every running program has an associated process identification number, called a PID. You can see a list of all of the processes running on a computer with the ps -ef command. The ps command shows you the process table – a special table kept by the kernel to keep up with all of the processes currently running on the computer. If you only want to see the processes running in your shell, just type ps without the -ef command line arguments.
  • When a program executes another program, the executor is called the parent, and the executed the child. The parent process owns the child process, and the child process inherits its execution environment from its parent. (Don't worry if you don't know what “execution environment” means.)
  • The init program is the great ancestor – every program has init somewhere in its lineage. If a parent program dies, the child becomes owned by init.
  • If a program exits, but it still has an entry in the process table, it's called a zombie. This will have almost no effect on you or your UNIX experience. Ever. I just wanted a reason to bring zombies into the conversation.
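To make the parent/child relationship a little more concrete, here is a small sketch that can be run as a script. It relies on two standard shell variables, $$ (this shell's PID) and $PPID (the PID of a shell's parent). The two numbers it prints should match, assuming the script is run directly by a shell.

```shell
# $$ expands to the PID of the current shell.
shell_pid=$$
echo "This shell's PID:   $shell_pid"

# Start a child shell and have it report its parent's PID ($PPID).
# The output goes to a temporary file so the child is started directly
# by this shell rather than from inside a command substitution.
tmp_file=$(mktemp)
sh -c 'echo "$PPID"' > "$tmp_file"
child_parent_pid=$(cat "$tmp_file")
rm -f "$tmp_file"
echo "Child's parent PID: $child_parent_pid"

# Ask ps for this shell's own entry in the process table.
ps -o pid,ppid,comm -p "$shell_pid"
```

The PPID column in the ps output is the parent of this shell, one step further up the lineage toward init.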

Managing Running Commands

When you run a command, that command takes over the terminal. It expects to read input from the keyboard and write output to the screen. In the meantime, your shell is sitting there waiting for the command to exit. But how do you regain control if a program goes haywire? Or what if your command is just taking a very long time to complete, and you want to get something else done while you wait?

  • The behavior where a command takes over the keyboard and screen is called “running a command in the foreground”. This is the default behavior of commands unless told otherwise.
  • If you want to interrupt a command, you can hold down the control key and press the “C” key (control-c). This sends the command an interrupt signal, which will normally kill it. (NOTE: If the command is held up by some kernel function, like performing I/O for instance, then the program cannot quit until the kernel first returns control to it.)
  • If you expect a program to take a long time, then you can run it as a background process. Background processes should not expect keyboard input. They may be fed input from a file (see the section on redirection and pipes below). A process can be run in the background by appending a space and a & symbol at the end of the command line.
  • If you are running a program in the foreground, but it's taking too long, and you want to switch it to the background, you can do so by holding down the control key and pressing “Z” (control-Z). This will suspend the program without killing it. You can then type the command bg to tell the shell to resume running the program in the background. If you type control-Z and decide that you made a mistake, you can continue running the program in the foreground by typing fg.
  • If you have one or more programs running in the background, you can view a list of background processes by typing jobs.

For example, I'm going to use the sleep command to simulate a long-running process:

peek@catus:~$ # I can type something here and the shell will ignore it because
peek@catus:~$ # the line begins with a '#'.  In shell scripting, a line beginning
peek@catus:~$ # with a '#' is called a comment line.
peek@catus:~$ # This lets me tell you what I'm doing without confusing the shell
peek@catus:~$ # First, I'm going to start a program that's going to take a long time:
peek@catus:~$ sleep 300
^Z
[1]+  Stopped                 sleep 300
peek@catus:~$ # I just pressed control-Z to interrupt the sleep command.
peek@catus:~$ # The sleep command just sits and does nothing for the number of 
peek@catus:~$ # seconds I tell it to.
peek@catus:~$ # In this case, the sleep command is going to sit there for 5 minutes
peek@catus:~$ # (300 seconds). I didn't want to wait 5 minutes, so I'm going to move
peek@catus:~$ # the command into the background with the 'bg' command
peek@catus:~$ bg
[1]+ sleep 300 &
peek@catus:~$ # Now the sleep command is running again, but it's running in the 
peek@catus:~$ # background, which frees up the shell for me to run other commands.
peek@catus:~$ # Now I'll run another one:
peek@catus:~$ sleep 600
^Z
[2]+  Stopped                 sleep 600
peek@catus:~$ # This command will run for 10 minutes (600 seconds).
peek@catus:~$ # I can see a list of background processes with 'jobs'
peek@catus:~$ jobs
[1]-  Running                 sleep 300 &
[2]+  Stopped                 sleep 600
peek@catus:~$ # Notice that the 'sleep 300' command is "running", but the 'sleep 600'
peek@catus:~$ # command is "stopped".  I'll start the 'sleep 600' command running again:
peek@catus:~$ bg
[2]+ sleep 600 &
peek@catus:~$ # Now I can do other things.
peek@catus:~$ # But what if I want to connect with one of my background processes?
peek@catus:~$ # I can bring a background process to the foreground with the 'fg' command.
peek@catus:~$ # Using 'fg' by itself will bring to the foreground the last command
peek@catus:~$ # I put in the background.
peek@catus:~$ # If I want to bring some other process to the foreground, then I have to
peek@catus:~$ # use the numerical identifier displayed on the left in square brackets.
[1]-  Done                    sleep 300
peek@catus:~$ # Ah!  Now I see that the 'sleep 300' command has finished!
peek@catus:~$ # Well, now there's no point in bringing it to the foreground -- it's 
peek@catus:~$ # done and gone.
peek@catus:~$ # I'll create another 10-minute sleep...
peek@catus:~$ sleep 600 &
[3] 15157
peek@catus:~$ # I started this command off in the background right away.  Notice that
peek@catus:~$ # it was given a new, unique job number, 3, even though there are now only
peek@catus:~$ # two processes running in the background.
peek@catus:~$ jobs
[2]-  Running                 sleep 600 &
[3]+  Running                 sleep 600 &
peek@catus:~$ # I'll bring the first process to the foreground.
peek@catus:~$ fg 2
sleep 600
^Z
[2]+  Stopped                 sleep 600
peek@catus:~$ # There.  I did it.  But then I got bored again, so I put it back in
peek@catus:~$ # the background.  When these processes end, they will tell you so with 
peek@catus:~$ # a "Done", just like the 'sleep 300' above.  However, even after the 
peek@catus:~$ # program exits, you won't see the "Done" line until you press RETURN 
peek@catus:~$ # or enter another command.
peek@catus:~$ # Oh yeah, the job number is local to the shell.  It's not the same thing
peek@catus:~$ # as the process identification number (PID).  The PID is something the
peek@catus:~$ # kernel uses.  The difference is like "I'm in apartment B" versus
peek@catus:~$ # "I live at 1234 Winston Way Apartments".  That is, if you view a shell
peek@catus:~$ # as an apartment building, processes as residents, and the kernel as
peek@catus:~$ # the city the apartment building is in.  But maybe that's just further
peek@catus:~$ # confounding an already confusing concept...?
[3]-  Done                    sleep 600
peek@catus:~$ # Ah, well done.  I see that in the amount of time that it took for me
peek@catus:~$ # to type, the last sleep command finished.
peek@catus:~$ # Wait a second, wasn't there another sleep command that should have
peek@catus:~$ # finished before that one?
peek@catus:~$ jobs
[2]+  Stopped                 sleep 600
peek@catus:~$ bg
[2]+ sleep 600 &
peek@catus:~$ # Did you catch that mistake I made?  I said earlier that I had put job
peek@catus:~$ # number 2 back into the background, but I forgot to actually type 'bg'.
peek@catus:~$ # I've just wasted valuable time that I could have been using for 
peek@catus:~$ # something else.
[2]+  Done                    sleep 600
peek@catus:~$
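The session above is interactive: control-Z, bg, and fg are things you do at a prompt. Inside a script the usual tools are a little different, so here is a rough sketch of the script-friendly equivalents: & to start a background job, $! to get its PID, and wait to collect its exit code. The one-second sleep is just a stand-in for real work.

```shell
# Start a short "long-running" command in the background.
sleep 1 &
bg_pid=$!          # $! holds the PID of the most recent background command
echo "Started sleep in the background with PID $bg_pid"

# The shell is free to do other work while the job runs.
echo "Doing other work in the meantime..."

# kill -0 sends no signal; it only checks that the process still exists.
if kill -0 "$bg_pid" 2>/dev/null; then
    still_running=yes
else
    still_running=no
fi
echo "Still running? $still_running"

# wait blocks until the background job exits and returns its exit code.
wait "$bg_pid"
bg_status=$?
echo "Background job finished with exit code $bg_status"
```

Note that $! is a PID (the kernel's number), not a job number like the [1] and [2] shown by jobs.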

Input, Output, and Redirection

In computer programming, standard streams are input and output communications channels between a computer program and its environment. These streams are preconnected when the program begins its execution. There are three standard I/O channels that are available to every program: standard input (stdin), standard output (stdout), and standard error (stderr).

Unless told otherwise, the operating system will assume that a program's stdin comes from the keyboard, and a program's stdout and stderr will go to the screen. However, it's often useful to redirect input and output, or to connect the output of one program to the input of another. This is called redirection.

Command Line Argument Redirection Type
>
Standard output redirection – Output from the program is sent to the given file, pipe, or specified destination.
$ ls -ald /etc/passwd > /tmp/ls-output.txt
$ cat /tmp/ls-output.txt
-rw-r--r-- 1 root root 2803 May 10 16:40 /etc/passwd
<
Standard input redirection – Input to the program is read from the given file, pipe, or specified source.
$ cat < /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
[**> The rest of this output removed for brevity <**]
2>
Standard error redirection – Output written to standard error is instead written to the given file, pipe, or specified destination.
$ ls -ald /this-file-does-not-exist 2> /tmp/ls-output.txt
$ cat /tmp/ls-output.txt
ls: cannot access '/this-file-does-not-exist': No such file or directory
|
Pipe – This is used to shuttle output from one command to another command's input.
$ cat /etc/passwd | wc -c
2803
$ cat /etc/passwd | wc -l
51
$ cat /etc/passwd | wc --max-line-length
87

(This shows that my /etc/passwd file contains 2,803 bytes and 51 lines. The longest line is 87 characters. These command line arguments and more can be found in the wc man page, or by typing wc --help (two dashes before “help”).)

2>&1
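Standard error redirection – Stderr is written to wherever stdout currently goes. For example, to write both streams to the same log file:
<command> > logfile.txt 2>&1

The 2>&1 must come after the > logfile.txt, because redirections are processed from left to right. This is also safer than the similar-looking form below, which opens the file twice so that the two streams can overwrite each other's output:

<command> > logfile.txt 2> logfile.txt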
>&2
Standard output redirection – Stdout is written to wherever stderr goes.
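
The redirection operators above can be tried out safely with a sketch like the following. The emit function is a made-up helper that writes one line to stdout and one to stderr, and all files live in a temporary directory.

```shell
work_dir=$(mktemp -d)

# A small helper (hypothetical, for illustration) that writes one line
# to stdout and one line to stderr.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

# > captures stdout only; the stderr line is discarded here.
emit > "$work_dir/out.txt" 2> /dev/null

# 2> captures stderr only; the stdout line is discarded here.
emit 2> "$work_dir/err.txt" > /dev/null

# > file 2>&1 sends both streams to the same file.
emit > "$work_dir/both.txt" 2>&1

# Order matters: here 2>&1 copies the *current* stdout (the one captured
# by the command substitution) before stdout is pointed at /dev/null, so
# only the stderr line ends up in the variable.
swapped=$(emit 2>&1 > /dev/null)

out_content=$(cat "$work_dir/out.txt")
err_content=$(cat "$work_dir/err.txt")
both_lines=$(wc -l < "$work_dir/both.txt" | tr -d ' ')
echo "out.txt:  $out_content"
echo "err.txt:  $err_content"
echo "both.txt has $both_lines lines"
echo "swapped:  $swapped"
```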

Variables and Environment Variables

A variable is simply a mapping between a string name and a value. In the shell, values can be strings or integers. (Fractions and decimal values are treated like strings.) Variables are created by naming the variable, followed immediately by an equal sign (no spaces), and the value. If the value is to be a string with spaces, then the value needs to be wrapped in single or double quotes.

For example:

peek@catus:~/Documents/Software$ echo "${v}"

peek@catus:~/Documents/Software$ v="Hello World"
peek@catus:~/Documents/Software$ echo "${v}"
Hello World

NOTE:

  • Here I'm introducing a new command, echo. This command will print out whatever you give it as an argument. In addition to printing out the value of a variable, it's also very useful to use inside of scripts for giving the user feedback about what the script is doing.
  • Also notice that while I assign the variable as <variablename>=<value>, I must access the variable by prepending a dollar sign to its name and using curly braces. Actually, the curly braces are optional, but if you do any scripting for very long, then you'll find that using curly braces keeps things clean and bug-free. So by introducing variable referencing to you with curly braces, I'm hoping you will avoid potential heartache down the road.

In the first command I use the echo command to try to print the value of the v variable. The shell has no v variable defined, so it prints an empty line. In the second command I assign a value to the variable v. And in the third command I print the value of v again – this time it works.

Environment variables are just variables that the shell shares with any program that it executes. To turn a variable into an environment variable, you only need to export it:

export v

Now, any program that is executed by the shell will be able to see and use the v variable.
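
A quick way to see the difference export makes is to ask a child process for the variable before and after exporting it. This is only a sketch; it uses sh -c as a convenient child program, and the single quotes keep the parent shell from expanding ${v} itself.

```shell
# Start from a clean slate in case v already exists.
unset v

v="Hello World"

# A child process does not see a plain shell variable...
# (single quotes stop this shell from expanding ${v} itself)
before_export=$(sh -c 'echo "${v}"')

# ...but it does see the variable once it has been exported.
export v
after_export=$(sh -c 'echo "${v}"')

echo "before export: '${before_export}'"
echo "after export:  '${after_export}'"
```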

If you haven't guessed already, the system has a set of standard environment variables that are defined automatically. Here's a list of the most common environment variables:

Variable Name Description
HOME The location of the user's home directory.
peek@catus:~$ echo "${HOME}"
/home/peek
PATH A colon-separated list of directories in which to look for command programs. For every command you type, the shell will search each of these directories in turn until it finds the command you want. The first match is used, which means the order in which these directories appear is important. Here's my PATH (which will differ from yours):
peek@catus:~$ echo "${PATH}"
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/peek/usr/Linux/x86_64/bin:/home/peek/usr/Linux/bin:/home/peek/usr/bin
USER Your user name.
peek@catus:~$ echo "${USER}"
peek

There are usually many, many more. Some are standard and used on nearly every Unix implementation that exists (like HOME, USER, and PATH); others may be non-standard and only exist on a particular machine.

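If you want to see a list of all of the variables set in your shell, type the set command. Note that set lists every shell variable, exported or not (in bash it lists shell functions too); to see only the environment variables, use env or printenv. (Pro Tip: Pipe the output to something like less so that you can actually read it before it scrolls off the top of the screen.)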
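
To illustrate how the PATH search described above works, here is a rough sketch that walks the directories by hand looking for ls, and then asks the shell for its own answer with command -v. It assumes a Bourne-style shell such as sh or bash; the variable names are my own.

```shell
# Walk PATH by hand to find the first directory containing "ls".
target=ls
found=""

old_ifs=$IFS
IFS=:                      # split PATH on colons
for dir in $PATH; do
    if [ -x "$dir/$target" ] && [ ! -d "$dir/$target" ]; then
        found="$dir/$target"
        break              # first match wins, just as in the shell
    fi
done
IFS=$old_ifs

echo "Manual search found: $found"

# command -v asks the shell to do the same lookup for us.
shell_found=$(command -v "$target")
echo "The shell reports:   $shell_found"
```

If the two answers ever differ, the likely cause is an alias or shell function with the same name, which the shell checks before it searches PATH.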

Combining Commands and Subshells

Combining Commands On A Single Line

Commands can be combined on the same line by separating each command with a semicolon, like so:

These three commands, typed on separate lines:

ls -1
cd ~/Desktop
df .

are equivalent to this single line:

ls -1 ; cd ~/Desktop ; df .

Splitting (A) Command(s) Across Multiple Lines

Just like it's possible to combine commands, it's also possible to split commands. Any line that ends with a \ character is taken by the shell to mean that the command is incomplete, and that there will be more coming on the next line. Usually you wouldn't do this for commands that you type yourself, but it's handy to use when writing shell scripts as it makes your script easier to read and understand. For an example, see the command substitution section below.

Executing Commands In A Subshell

Commands can also be run in a subshell. This means that the shell runs a copy of itself, and the copy executes your command. Why would you want to do this? Well, here's an example. Say you want to time how long it takes to log into a set of remote machines with ssh and run a command:

peek@catus:~$ time ssh peek@alces01 "uptime" ; time ssh peek@alces02 "uptime" ; time ssh peek@alces03 "uptime"
 09:09:03 up 4 days, 18:11,  0 users,  load average: 0.06, 0.07, 0.06

real	0m0.591s
user	0m0.020s
sys	0m0.000s
 09:09:04 up 4 days, 18:31,  0 users,  load average: 0.01, 0.02, 0.05

real	0m0.667s
user	0m0.020s
sys	0m0.000s
 09:09:04 up 4 days, 19:19,  0 users,  load average: 0.00, 0.01, 0.05

real	0m0.673s
user	0m0.020s
sys	0m0.000s

That's all nice and fine, but if you want to know the total time to execute all three commands then you have to do some math. Another method would be to run the three commands in a subshell, and then time the subshell:

peek@catus:~$ time (ssh peek@alces01 "uptime" ; ssh peek@alces02 "uptime" ; ssh peek@alces03 "uptime")
 09:10:27 up 4 days, 18:12,  0 users,  load average: 0.12, 0.08, 0.06
 09:10:27 up 4 days, 18:32,  0 users,  load average: 0.00, 0.01, 0.05
 09:10:28 up 4 days, 19:20,  0 users,  load average: 0.04, 0.04, 0.05

real	0m1.788s
user	0m0.056s
sys	0m0.004s
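
Another handy property of subshells is that whatever happens inside the parentheses stays there. The sketch below changes directory and changes a variable inside a subshell, then shows that the original shell is unaffected. The variable name color is just an example.

```shell
start_dir=$(pwd)
color="blue"

# Everything inside ( ... ) runs in a subshell, a copy of this shell.
(
    cd / || exit 1
    color="red"
    echo "inside:  directory=$(pwd) color=$color"
)

# Back in the original shell, neither change has leaked out.
end_dir=$(pwd)
echo "outside: directory=$end_dir color=$color"
```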

Command Substitution

Command substitution allows the output of a command to replace the command itself. There are two forms:

Old form: `<commands>`
New form: $(<commands>)

Why would you want to do this? Earlier you saw how to send the output of one command to the input of another command with pipes. But what if what you need is to take the output of one command and use it as a command line argument of another command?
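
A minimal sketch of that idea, using only standard commands. The path /usr/local/bin/tool is just an example string; it doesn't need to exist, because dirname and basename only manipulate text.

```shell
# The output of date becomes part of the argument to echo.
echo "The current year is $(date +%Y)"

# Nested substitution: the inner command runs first and its output
# becomes an argument to the outer command.
# (/usr/local/bin/tool is just an example path; it need not exist.)
parent_dir=$(basename "$(dirname /usr/local/bin/tool)")
echo "Parent directory name: $parent_dir"

# The old backtick form gives the same result but is harder to nest,
# because the inner backticks have to be escaped.
parent_dir_old=`basename \`dirname /usr/local/bin/tool\``
echo "Using backticks:       $parent_dir_old"
```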

For example:

<command1> $(<command2> $(<command3>) )

This may not seem like much right now, but it becomes very powerful when you get into shell scripting. Here's an example. NOTE: Don't worry if you don't understand what the code does! It might look intimidating for the uninitiated – especially if this is your first trip into terminal-land. It's just hard to relay how useful some of the shell's functions are without going deeper. For now, just bask in it as a glorious example, and for those that want to know more, I'll go into details below:

peek@catus:~$ list_of_user_shells=$(for uid in $(seq 108 110); do grep "^[^:]*:[^:]*:${uid}:" /etc/passwd ; done | awk -F: '{print $7}' | sort | uniq)
peek@catus:~$ echo "${list_of_user_shells}"
/bin/false
/usr/sbin/nologin

This line is long and ugly to look at. I can break it up:

peek@catus:~$ list_of_user_shells=$(\
> for uid in $(seq 108 110); do \
>   grep "^[^:]*:[^:]*:${uid}:" /etc/passwd ; \
> done \
> | awk -F: '{print $7}' \
> | sort \
> | uniq \
> )
peek@catus:~$ echo "${list_of_user_shells}"
/bin/false
/usr/sbin/nologin

NOTE: The line breaks are to make the code more readable. The > prompt is a sub-prompt printed by the shell, telling me that the shell understands that my \ character on the end of my input denotes that I'm not done entering my command. You wouldn't actually type the > character yourself.

What does this command do? It searches /etc/passwd for any user with a user ID number between 108 and 110 inclusive, pulls the login shell from each matching user record, sorts the resulting list of login shells, and then removes duplicate entries. Here's a breakdown:

Command Description
seq 108 110
This command prints out all integers between the two integers given as its command line arguments, inclusive. Ex:
$ seq 108 110
108
109
110
for uid in $(seq 108 110); do \
   ... ; \
done
This command loops over the integers output by the seq command, assigning each number in turn to the variable uid and then executing the commands between do and done. Ex:
$ for uid in $(seq 108 110); do \
  echo "PROCESSING UID: ${uid}" ; \
done
PROCESSING UID: 108
PROCESSING UID: 109
PROCESSING UID: 110
for uid in $(seq 108 110); do \
  grep "^[^:]*:[^:]*:${uid}:" /etc/passwd ; \
done
This will run a grep command for every integer value of ${uid} from 108 to 110. The grep command will pull out the user record for the user whose UID matches the value stored in ${uid}. Ex:
$ for uid in $(seq 108 110); do \
>   grep "^[^:]*:[^:]*:${uid}:" /etc/passwd ; \
> done
sshd:x:108:65534::/var/run/sshd:/usr/sbin/nologin
colord:x:109:116:colord colour management daemon,,,:
/var/lib/colord:/bin/false
statd:x:110:65534::/var/lib/nfs:/bin/false

(Note: Line wrapped for readability)

for uid in $(seq 108 110); do \
  grep "^[^:]*:[^:]*:${uid}:" /etc/passwd ; \
done \
| awk -F: '{print $7}'
We only want to extract the shell from the user record. Here's where understanding the user record will come in handy. The format of /etc/passwd is such that each line is a separate record, and each field of the record is separated by a colon. The user's shell is stored in the 7th field of the record. The awk command here tells awk that the field separator is a colon, and that we want to print out field number 7. Ex:
$ for uid in $(seq 108 110); do \
>   grep "^[^:]*:[^:]*:${uid}:" /etc/passwd ; \
> done \
> | awk -F: '{print $7}'
/usr/sbin/nologin
/bin/false
/bin/false
for uid in $(seq 108 110); do \
  grep "^[^:]*:[^:]*:${uid}:" /etc/passwd ; \
done \
| awk -F: '{print $7}' \
| sort \
| uniq
In building our list of shells, we don't want duplicate entries. There are two entries for /bin/false. We can use sort and uniq to get rid of these extra entries. (uniq only removes duplicates that are next to each other, which is why the list is sorted first.)
list_of_user_shells=$(\
  for uid in $(seq 108 110); do \
    grep "^[^:]*:[^:]*:${uid}:" /etc/passwd ; \
  done \
  | awk -F: '{print $7}' \
  | sort \
  | uniq \
  )
Finally, this last bit wraps the entire command into a sub-shell command substitution. The shell will take the output from the entire command and place it into the variable list_of_user_shells, which we can use later.
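
Since your /etc/passwd will differ from mine, here is a sketch of the same pipeline run against a small sample file, so the result is reproducible. The sample records are copied from the output shown above, plus a root line taken from the earlier /etc/passwd listing. The second half is an alternative I'm adding for comparison, not the approach walked through above: it lets awk do the UID range test and uses sort -u in place of sort and uniq.

```shell
# Sample records copied from the output shown earlier; a stand-in for
# the real /etc/passwd.
sample_passwd=$(mktemp)
cat > "$sample_passwd" <<'EOF'
root:x:0:0:root:/root:/bin/bash
sshd:x:108:65534::/var/run/sshd:/usr/sbin/nologin
colord:x:109:116:colord colour management daemon,,,:/var/lib/colord:/bin/false
statd:x:110:65534::/var/lib/nfs:/bin/false
EOF

# The same pipeline as above, pointed at the sample file.
list_of_user_shells=$(
    for uid in $(seq 108 110); do
        grep "^[^:]*:[^:]*:${uid}:" "$sample_passwd"
    done \
    | awk -F: '{print $7}' \
    | sort \
    | uniq
)
echo "${list_of_user_shells}"

# An alternative (not the approach used above): let awk do the range
# test itself and let sort -u drop the duplicates.
alt_list=$(awk -F: '$3 >= 108 && $3 <= 110 {print $7}' "$sample_passwd" | sort -u)
echo "${alt_list}"

rm -f "$sample_passwd"
```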

Exit Codes

Whenever a program exits it returns an exit code to the shell. An exit code of 0 means that the program exited normally. A non-zero exit code means that an error occurred. This is useful information for building conditional commands that may change behavior depending on what errors arise. For instance, the make command will execute a list of commands in a file named Makefile, and exit the first time it encounters an error. Makefiles are often used to generate programs and content. But for now, it's sufficient for you to know that exit codes exist and that they are useful.
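
If you're curious, here is a small sketch of what working with exit codes looks like. It uses the standard commands true and false (which do nothing except exit with 0 and 1 respectively) and grep, which exits 0 when it finds a match and 1 when it doesn't.

```shell
# $? holds the exit code of the most recent command.
true
true_status=$?
false
false_status=$?
echo "true exited with $true_status, false exited with $false_status"

# grep exits 0 when it finds a match and 1 when it does not, which
# makes it usable directly as a condition.
if printf 'alpha\nbeta\n' | grep -q 'beta'; then
    found_beta=yes
else
    found_beta=no
fi
echo "Found beta? $found_beta"

# && runs the next command only on success; || only on failure.
result=$(false && echo "ran after success" || echo "ran after failure")
echo "$result"
```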

Playing Around

Make a safe place to play around
Type:
$ mkdir /tmp/playground
$ cd /tmp/playground
Get a text file to play around with
Type:
$ wget -O file.txt 'http://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt'
--2016-06-07 09:25:57--  http://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt
Resolving ocw.mit.edu (ocw.mit.edu)... 23.15.135.8, 23.15.135.19
Connecting to ocw.mit.edu (ocw.mit.edu)|23.15.135.8|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5458199 (5.2M) [text/plain]
Saving to: ‘file.txt’

100%[======================================>] 5,458,199   1.58MB/s   in 3.3s   

2016-06-07 09:26:00 (1.58 MB/s) - ‘file.txt’ saved [5458199/5458199]
How many lines are in the file?
Type:
$ wc -l file.txt
124456 file.txt
How many words are in the file?
Type:
$ wc -w file.txt
901325 file.txt
What are the first 10 lines of this file?
Type:
$ head -10 file.txt
This is the 100th Etext file presented by Project Gutenberg, and
is presented in cooperation with World Library, Inc., from their
Library of the Future and Shakespeare CDROMS.  Project Gutenberg
often releases Etexts that are NOT placed in the Public Domain!!

Shakespeare

*This Etext has certain copyright implications you should read!*

<<THIS ELECTRONIC VERSION OF THE COMPLETE WORKS OF WILLIAM
What is the 3rd word on each of the last ten lines?
Type:
$ cat file.txt \
> | awk '{print $3}' \
> | tail -10
ONLY,
COMMERCIAL
CHARGES



this


What are the top 10 most frequently used words?
Type:
$ cat file.txt \
> | awk '{a[$1]++}END{for(k in a)print a[k],k}' RS=" |\n" \
> | sort -nr \
> | head -10
517065 
23242 the
19540 I
18297 and
15623 to
15544 of
12532 a
10824 my
9576 in
9081 you
NOTE: How would you know to do that!?!? The easiest way is to just search online for someone who's already done it, and then copy what they typed. There are several online forums for command line usage too. That's what I did. Awk is so powerful, I've only scratched the surface of it myself.
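If awk feels like too much for now, the same kind of count can be sketched with simpler tools: tr, sort, and uniq. This is a different technique from the awk one-liner above, shown here on a tiny made-up sample so the answer is easy to check by eye. Because tr squeezes runs of whitespace, it also avoids counting empty "words" like the first line of the output above.

```shell
# A tiny stand-in for file.txt so the result is easy to check by eye.
sample_text=$(mktemp)
printf 'the cat and the dog\nand  the bird\n' > "$sample_text"

# 1. tr turns every run of whitespace into a single newline (one word per line).
# 2. sort groups identical words together.
# 3. uniq -c collapses each group into "count word".
# 4. sort -nr puts the largest counts first.
word_counts=$(tr -s '[:space:]' '\n' < "$sample_text" \
    | sort \
    | uniq -c \
    | sort -nr)
echo "$word_counts"

# The most frequent word is the second column of the first line.
top_word=$(echo "$word_counts" | head -n 1 | awk '{print $2}')
echo "Most frequent word: $top_word"

rm -f "$sample_text"
```

To try it on the Shakespeare text, replace "$sample_text" with file.txt and add | head -10 to the end of the pipeline.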
In the file '/etc/passwd', what is the 8th line?
Type:
$ cat /etc/passwd | head -8 | tail -1
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
The lines in /etc/passwd are fields separated by a colon. What is the value in the 5th field?
Type:
$ cat /etc/passwd | head -8 | tail -1 | awk -F: '{print $5}'
lp
The 3rd field is the User ID number (UID). What is the sum of all UIDs in '/etc/passwd'?
Type:
$ n=0
$ cat /etc/passwd \
> | awk -F: '{print $3}' \
> | while read d ; do let n=$(( $n + $d )) ; done
$ echo $n
0
NOTE: That didn't work! Why? Because in bash each command in a pipeline runs in its own subshell, and that includes the while loop. A subshell starts with a copy of the parent shell's variables, not the originals. This means that when the subshell sums up values for 'n', that value is lost when the subshell exits. Since the parent's version of 'n' never changes, its value is still zero.
So what's the correct way to do it? Here's one way that works:
Type:
$ n=0
$ for d in $(cat /etc/passwd | awk -F: '{print $3}') ; do \
> n=$(( $n + $d )) ; \
> done
$ echo $n
68740
NOTE: The for loop isn't part of a pipeline, so it doesn't execute in a subshell. How would you know this? Well, reading the bash manual is probably the best way. :-/
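Here are two more ways to do the sum, sketched against a small sample file so the total is easy to verify (108 + 109 + 110 = 327). The sample records are copied from output shown earlier, and the variable names in the read command are my own. The first option keeps the while loop but feeds it with input redirection instead of a pipe; the second hands the arithmetic to awk.

```shell
# Sample records (taken from output shown earlier) standing in for /etc/passwd.
sample_passwd=$(mktemp)
cat > "$sample_passwd" <<'EOF'
sshd:x:108:65534::/var/run/sshd:/usr/sbin/nologin
colord:x:109:116:colord colour management daemon,,,:/var/lib/colord:/bin/false
statd:x:110:65534::/var/lib/nfs:/bin/false
EOF

# Option 1: feed the loop with input redirection instead of a pipe.
# No pipeline means no subshell, so n keeps its value after the loop.
n=0
while IFS=: read -r name password uid rest; do
    n=$(( n + uid ))
done < "$sample_passwd"
echo "Sum via while loop: $n"

# Option 2: let awk do the arithmetic and capture its output.
awk_sum=$(awk -F: '{ sum += $3 } END { print sum }' "$sample_passwd")
echo "Sum via awk:        $awk_sum"

rm -f "$sample_passwd"
```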
unix_102.txt · Last modified: 2016/06/07 13:57 by peek