~/ajkule


Search and extract data from files

A file can be viewed with the cat command.

$ cat /etc/passwd

The cat command provides no way to move forward or backward through a file, so it is a poor choice for viewing large files. For larger files, use the less command instead. The less command displays one page at a time and lets you move forward and backward through the file with movement keys. The easiest way to move forward is to press the space key. Press b to move back one page and q to quit.

$ less /etc/group

The head and tail commands can be used to display only the first few or last few lines of a file. By default, the head and tail commands will display ten lines. The following example will display the first ten lines of the /etc/passwd file.

$ head /etc/passwd

In the next example, the last ten lines of the /etc/group file will be displayed.

$ tail /etc/group
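Both commands accept the -n option to change how many lines are shown. The following is a minimal sketch that reuses the same two files; the counts 3 and 2 are arbitrary values chosen for illustration.

```shell
# Show only the first three lines of /etc/passwd instead of the default ten.
head -n 3 /etc/passwd

# Show only the last two lines of /etc/group.
tail -n 2 /etc/group
```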

The pipe ('|') sends the output of one command to another. Instead of being printed to the screen, the output of the first command becomes the input of the next command. This is often used to refine the results of an initial command. The pipe is useful when listing a large directory such as /etc. Instead of displaying the full listing, the following command displays only the first ten lines.

$ ls /etc | head

In the previous example, the full output of the ls command is passed to the head command instead of being printed to the screen. The head command takes the output of ls as its input, and the output of head is then printed to the screen.

The sort command rearranges the lines of files or input in either dictionary or numeric order, based on the contents of one or more fields. The following example sorts the /etc/passwd file and uses the head command to show only the first ten lines of the result.

$ sort /etc/passwd | head
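The difference between dictionary and numeric order is easiest to see on a single field. The sketch below is only an illustration: the name:score lines are made-up sample data, -t sets the field separator, -k selects the field to sort on, and the n suffix requests numeric comparison.

```shell
# Hypothetical sample data: a name and a score separated by a colon.
# Dictionary order on field 2: "12" sorts before "30", which sorts before "4".
printf 'carol:30\nalice:4\nbob:12\n' | sort -t: -k2,2

# Numeric order on field 2: 4, then 12, then 30.
printf 'carol:30\nalice:4\nbob:12\n' | sort -t: -k2,2n
```

The same options work on /etc/passwd, where the third colon-separated field is the numeric user ID: sort -t: -k3,3n /etc/passwd | head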

The wc command counts the number of lines, words and bytes in each given file. In a plain ASCII text file, one byte corresponds to one character.

$ wc /etc/passwd
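Each count can also be requested on its own with the -l, -w and -c options. The sketch below uses two made-up lines of text fed through a pipe, so the numbers are easy to check by hand.

```shell
# Hypothetical sample text: two lines, three words, fourteen bytes
# (the newline at the end of each line counts as one byte).
printf 'one two\nthree\n' | wc -l   # lines: prints 2
printf 'one two\nthree\n' | wc -w   # words: prints 3
printf 'one two\nthree\n' | wc -c   # bytes: prints 14
```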

The cut command extracts columns of text from a file or input. Columns can be selected by field, character or byte position.
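As a rough illustration, the sketch below cuts columns out of a single line written in the /etc/passwd format. The line itself is a made-up sample; -d sets the field delimiter, -f selects fields and -c selects character positions.

```shell
# Hypothetical sample line in /etc/passwd format (fields separated by colons).
# Field 1 only: prints root
echo 'root:x:0:0:root:/root:/bin/bash' | cut -d: -f1

# Fields 1 and 7: prints root:/bin/bash
echo 'root:x:0:0:root:/root:/bin/bash' | cut -d: -f1,7

# Characters 1 to 4: prints root
echo 'root:x:0:0:root:/root:/bin/bash' | cut -c1-4
```

Applied to the real file, cut -d: -f1 /etc/passwd lists every user name.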

Input/Output (I/O) redirection allows the standard streams of a command to be connected to files instead of the keyboard and the screen. Standard input, or STDIN, is information normally entered by the user via the keyboard. Standard output, or STDOUT, is the normal output of commands. Standard error, or STDERR, consists of the error messages generated by commands. I/O redirection lets the user redirect STDIN so that data comes from a file, and STDOUT or STDERR so that output goes to a file. Redirection is achieved with the '<' and '>' characters. STDOUT can be redirected to a file using '>'.

$ echo 1st line > example
$ ls
example
$ cat example
1st line


The echo command displays no output because STDOUT was sent to the file example instead of the screen. The ls command shows the new file, and viewing it with the cat command shows that it contains the output of the echo command.

It is important to know that '>' overwrites any contents of an existing file. To preserve the existing contents, use '>>' instead. The output of the most recent echo command is then added to the bottom of the file.

$ echo 2nd line >> example
$ cat example
1st line
2nd line


The following command will produce an error because the junk directory does not exist.

$ ls /junk
ls: cannot access '/junk': No such file or directory


STDERR can be redirected to a file using '2>'.

$ ls /junk 2> error
$ cat error
ls: cannot access '/junk': No such file or directory


It is possible to redirect both STDOUT and STDERR at the same time. The following command will produce both STDOUT and STDERR because one of the specified directories exists and the other does not.

$ ls /junk /home

If only STDOUT is sent to a file with '>', STDERR is still printed to the screen. If only STDERR is sent to a file with '2>', STDOUT is still printed to the screen. In bash, both STDOUT and STDERR can be sent to the same file by using '&>'.

$ ls /junk /home &> all
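The '&>' form is a bash shorthand. A more portable way to get the same result is to redirect STDOUT with '>' and then point STDERR at the same place with '2>&1'. The sketch below is an illustration under stated assumptions: it works in a scratch directory, and the names junk, home and file.txt are stand-ins created only for this example.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# home stands in for a directory that exists; junk is never created.
mkdir home
touch home/file.txt

# '> all' sends STDOUT to the file, then '2>&1' sends STDERR to the same place.
# ls exits with a non-zero status because of the error, so '|| true'
# lets a script carry on afterwards.
ls junk home > all 2>&1 || true

# The file now holds both the error message and the directory listing.
cat all
```

The order matters: '2>&1' has to come after '> all', because it copies wherever STDOUT points at that moment.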

If you don't want STDERR and STDOUT to go to the same file, redirect them to different files by using both '>' and '2>'. For example:

$ ls /junk /home >example 2>error
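The examples so far redirect output. Redirecting STDIN with '<' works in the other direction: the command reads its input from a file instead of the keyboard. The sketch below is a minimal illustration; input.txt and sorted.txt are hypothetical file names created in a scratch directory.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# input.txt is a hypothetical sample file with three lines.
printf 'pear\napple\nfig\n' > input.txt

# sort reads its STDIN from input.txt instead of the keyboard.
sort < input.txt

# STDIN and STDOUT can be redirected in the same command.
sort < input.txt > sorted.txt
cat sorted.txt
```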

A regular expression is a sequence of characters that describes a pattern to match. Some characters have special meanings when used within patterns by commands such as grep. Basic regular expressions include '.', '[ ]' and '*'. The '.' character matches any single character. For example, a.c matches "abc", "axc", etc. In some cases you want to specify exactly which characters may match. For this, list the allowed characters inside '[ ]'. For example, [abc] matches a, b or c. The '*' character matches zero or more occurrences of the preceding character. For example, ab*c matches "ac", "abc", "abbbc", etc. To match zero or one occurrence of the preceding character, use '?'. The '?' character belongs to extended regular expressions, which grep enables with the -E option.
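These patterns can be tried out with grep, which prints every line of its input that contains a match. The sketch below is only an illustration; words.txt is a hypothetical sample file created in a scratch directory, and the patterns are quoted so the shell passes them to grep unchanged.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# words.txt is a hypothetical sample file with one word per line.
printf 'ac\nabc\nabbbc\naxc\nabd\n' > words.txt

# '.' matches any single character: prints abc and axc.
grep 'a.c' words.txt

# '[ ]' matches one of the listed characters: prints axc.
grep 'a[xy]c' words.txt

# '*' matches zero or more of the preceding character: prints ac, abc and abbbc.
grep 'ab*c' words.txt

# '?' matches zero or one of the preceding character and needs -E: prints ac and abc.
grep -E 'ab?c' words.txt
```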