This chapter gives an overview of awk syntax and some examples to show what kind of problems you could solve using awk. These features will be covered in depth in later chapters, but you shouldn’t skip this chapter.
Filtering
awk provides filtering capabilities like those supported by the grep and sed commands. As a programming language, awk offers additional nifty features as well. Similar to many command line utilities, awk can accept input from both stdin and files.
# sample stdin data
$ printf 'gate\napple\nwhat\nkite\n'
gate
apple
what
kite
# same as: grep 'at' and sed -n '/at/p'
# filter lines containing 'at'
$ printf 'gate\napple\nwhat\nkite\n' | awk '/at/'
gate
what
# same as: grep -v 'e' and sed -n '/e/!p'
# filter lines NOT containing 'e'
$ printf 'gate\napple\nwhat\nkite\n' | awk '!/e/'
what
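Since awk also accepts filenames as arguments, the same filters work on files. Here is a minimal sketch, assuming a throwaway file created with mktemp (the file and the variable names are illustrative and not part of the original examples):

```shell
# create a temporary input file (path chosen by mktemp; purely illustrative)
words_file=$(mktemp)
printf 'gate\napple\nwhat\nkite\n' > "$words_file"

# pass the filename as an argument instead of piping data via stdin
from_file=$(awk '/at/' "$words_file")
echo "$from_file"

# the same filter, with the same data supplied via stdin
from_stdin=$(awk '/at/' < "$words_file")
echo "$from_stdin"

rm -f "$words_file"
```

Both commands should print the same two lines, gate and what.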
By default, awk automatically loops over the input content line by line. You can then use programming instructions to process those lines. As awk is often used from the command line, many shortcuts are available to reduce the amount of typing needed.
In the above examples, a regular expression (defined by the pattern between a pair of forward slashes) has been used to filter the input. Regular expressions (regexp) will be covered in detail in the next chapter. Only string values without any special regexp characters are used in this chapter. The full syntax is string ~ /regexp/ to check if the given string matches the regexp and string !~ /regexp/ to check if it doesn’t match. When the string isn’t specified, the test is performed against the special variable $0, which has the contents of the input line. The correct term would be input record, but that’s a discussion for a later chapter.
Also, in the above examples, only the filtering condition was given. By default, when the condition evaluates to true, the contents of $0 are printed. Thus:
- awk '/regexp/' is a shortcut for awk '$0 ~ /regexp/{print $0}'
- awk '!/regexp/' is a shortcut for awk '$0 !~ /regexp/{print $0}'
# same as: awk '/at/'
$ printf 'gate\napple\nwhat\nkite\n' | awk '$0 ~ /at/{print $0}'
gate
what
# same as: awk '!/e/'
$ printf 'gate\napple\nwhat\nkite\n' | awk '$0 !~ /e/{print $0}'
what
In the above examples, {} is used to specify a block of code to be executed when the condition that precedes the block evaluates to true. One or more statements can be given, separated by the ; character. You’ll see such examples and learn more about awk syntax later.
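As a small preview of multiple statements in one block, here is a sketch that prints a label before each matching line. The "match:" label text and the variable name are made up for this illustration:

```shell
# two statements in one block, separated by ';'
# the block runs only for lines matching /at/
labelled=$(printf 'gate\napple\nwhat\nkite\n' | awk '/at/{print "match:"; print $0}')
echo "$labelled"
```

Each of the two matching lines (gate and what) should be preceded by a match: line.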
Idiomatic use of 1
In a conditional expression, non-zero numeric values and non-empty string values are evaluated as true. Idiomatically, 1 is used to denote a true condition in one-liners as a shortcut to print the contents of $0.
# same as: printf 'gate\napple\nwhat\nkite\n' | cat
# same as: awk '{print $0}'
$ printf 'gate\napple\nwhat\nkite\n' | awk '1'
gate
apple
what
kite
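To see the truthiness rule at work, the following sketch compares a few constant conditions. The sample helper function and the variable names are assumptions made for this illustration only:

```shell
# helper that emits the sample data used throughout this chapter
sample() { printf 'gate\napple\nwhat\nkite\n'; }

# 0 is false, so no line is printed
zero_out=$(sample | awk '0')

# any non-zero number is true, so every line is printed
num_out=$(sample | awk '2')

# a non-empty string constant is true, an empty one is false
str_out=$(sample | awk '"a"')
empty_out=$(sample | awk '""')

echo "$num_out"
```

Only the non-zero number and the non-empty string should behave like awk '1'.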
Substitution
awk has three functions to cover search and replace requirements. Two of them are shown below. The sub function replaces only the first match, whereas the gsub function replaces all the matching occurrences. By default, these functions operate on $0 when the input string isn’t provided. Both sub and gsub modify the input source on successful substitution.
# for each input line, change only the first ':' to '-'
# same as: sed 's/:/-/'
$ printf '1:2:3:4\na:b:c:d\n' | awk '{sub(/:/, "-")} 1'
1-2:3:4
a-b:c:d
# for each input line, change all ':' to '-'
# same as: sed 's/:/-/g'
$ printf '1:2:3:4\na:b:c:d\n' | awk '{gsub(/:/, "-")} 1'
1-2-3-4
a-b-c-d
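The examples above rely on $0 being the default target. As a hedged sketch of the other case, an optional third argument names the variable to modify, which leaves $0 untouched. These functions also return the number of substitutions made. The variable names s and n are chosen here only for illustration:

```shell
# third argument: substitute inside the variable s instead of $0
copy_out=$(printf '1:2:3:4\n' | awk '{s = $0; gsub(/:/, "-", s); print s; print $0}')
echo "$copy_out"

# the return value is the number of substitutions performed
count_out=$(printf '1:2:3:4\n' | awk '{n = gsub(/:/, "-"); print n, $0}')
echo "$count_out"
```

The first command should print the modified copy followed by the unchanged input line, and the second should print the count followed by the modified line.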
The first argument to the sub and gsub functions is the regexp to be matched against the input content. The second argument is the replacement string. String literals are specified within double quotes. In the above examples, sub and gsub are used inside a block as they aren’t intended to be used as a conditional expression. The 1 after the block is treated as a conditional expression as it is used outside a block. You can also use the variations presented below to get the same results:
- awk '{sub(/:/, "-")} 1' is the same as awk '{sub(/:/, "-"); print $0}'
- You can also just use print instead of print $0 as $0 is the default string
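The variations listed above can be checked side by side. In this sketch, the variable names are illustrative:

```shell
# three equivalent ways to print the line after the substitution
shortcut=$(printf '1:2:3:4\na:b:c:d\n' | awk '{sub(/:/, "-")} 1')
explicit=$(printf '1:2:3:4\na:b:c:d\n' | awk '{sub(/:/, "-"); print $0}')
implicit=$(printf '1:2:3:4\na:b:c:d\n' | awk '{sub(/:/, "-"); print}')

echo "$implicit"
```

All three variables should hold the same two lines, with only the first : of each line replaced.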
You might wonder why you should use or learn grep and sed when you can achieve the same results with awk. It depends on the problem you are trying to solve. Simple line filtering will be faster with grep compared to sed or awk because grep is optimized for such cases. Similarly, sed will be faster than awk for substitution cases. Also, not all features easily translate among these t