Awk can be used to select data from files and then to format the selected data into a report. We will use a three-step process to generate reports: prepare, sort, format.
The preparation step involves selecting data and perhaps performing computations on it to obtain the desired information.
The sort step is necessary if we want to order the data in some particular fashion. To perform this step we pass the output of the preparation program into the system sort command.
The formatting step is done by a second awk program that generates the desired report from the sorted data. In this section we will generate a few reports from the countries file of Chapter 2 to illustrate the approach.
1.1. A Simple Report
Suppose we want a report giving the population, area, and population density of each country. We would like the countries to be grouped by continent, and the continents to be sorted alphabetically; within each continent the countries are to be listed in decreasing order of population density, like this:
CONTINENT COUNTRY POPULATION AREA POP. DEN.
Asia Japan 120 144 833.3
Asia India 746 1267 588.8
Asia China 1032 3705 278.5
Asia USSR 275 8649 31.8
Europe Germany 61 96 635.4
Europe England 56 94 595.7
Europe France 55 211 260.7
North America Mexico 78 762 102.4
North America USA 237 3615 65.6
North America Canada 25 3852 6.5
South America Brazil 134 3286 40.8
The first two steps in preparing this report are done by the program prep1, which, when applied to the file countries.txt, determines the relevant information and sorts it:
prep1.awk
The output is a sequence of lines containing 5 fields, separated by colons, that give the continent, country, population, area, and population density:
We wrote prep1 to print directly into the Unix sort command.
The -t: argument tells sort to use a colon(:) as its field separator.
The +0-1 arguments make the first field the primary sort key, it specifies that sorting should be done based on the first field (+0) in reverse numeric order (-1)
The +4rn argument makes the fifth field, in reverse numeric order, the secondary sort key.
(In Section 6.3, we will show a sort-generator program that creates these lists of options from a description in words.)
Tips:
prep1-1.awk
sort just once:
We have completed the preparation and sort steps; all we need now is to format this information into the desired report. The program form1 does the job:
form1.awk
The desired report can be generated by typing the command line
The peculiar arguments to sort in prep1 can be avoided by having the program format its output so that sort doesn't need any arguments, and then having the formatting program reformat the lines.
By default, the sort command sorts its input lexicographically. In the final report, the output needs to be sorted alphabetically by continent and in reverse numerical order by population density. To avoid arguments to sort, the preparation program can put at the beginning of each line a quantity depending on continent and population density that, when sorted lexicographically, will automatically order the output correctly. One possibility is a fixed-width representation of the continent followed by the reciprocal(倒数,e.g., n's reciprocal value is1/n) of the population density, as in prep2:
prep2.awk
With the countries file as input, here is the output from prep2:
The format %-15s is wide enough for all the continent names, and %12. 8f covers a wide range of reciprocal densities.
The final formatting program is like form1 but skips the new second field. The trick of manufacturing a sort key that simplifies the sorting options is quite general. We'll use it again in an indexing program in Chapter 5.
If we would like a slightly fancier report in which only the first occurrence of each continent name is printed, we can use the formatting program form2 in place of form1:
form2.awk
The command line
generates this report:
The formatting program form2 is a "control-break" program. The variable prev keeps track of the value of the continent field; only when it changes is the continent name printed. In the next section, we will see a more complicated example of control-break programming.
# prep1 - prepare countries by continent and pop. den.
BEGIN { FS = "\t" }
{ printf("%s:%s:%d:%d:%.1f\n",
$4, $1, $3, $2, 1000*$3/$2) | "sort -t: +0 -1 +4rn"
}
cat countries.txt | awk -f prep1.awk
Asia:Japan:120:144:833.3
Asia:India:746:1267:588.8
Asia:China:1032:3705:278.5
Asia:USSR:275:8649:31.8
Europe:Germany:61:96:635.4
Europe:England:56:94:595.7
Europe:France:55:211:260.7
North America:Mexico:78:762:102.4
North America:USA:237:3615:65.6
North America:Canada:25:3852:6.5
South America:Brazil:134:3286:40.8
# prep1-1 - prepare countries by continent and pop. den.
BEGIN { FS = "\t" }
{ printf("%s:%s:%d:%d:%.1f\n",
$4, $1, $3, $2, 1000*$3/$2)
}
# prep2 - prepare countries by continent, inverse pop. den.
BEGIN { FS = "\t"}
{ den = 1000*$3/$2
printf("%-15s:%12.8f:%s:%d:%d:%.1f\n",
$4, 1/den, $1, $3, $2, den) | "sort"
}
Asia : 0.00120000:Japan:120:144:833.3
Asia : 0.00169839:India:746:1267:588.8
Asia : 0.00359012:China:1032:3705:278.5
Asia : 0.03145091:USSR:275:8649:31.8
Europe : 0.00157377:Germany:61:96:635.4
Europe : 0.00167857:England:56:94:595.7
Europe : 0.00383636:France:55:211:260.7
North America : 0.00976923:Mexico:78:762:102.4
North America : 0.01525316:USA:237:3615:65.6
North America : 0.15408000:Canada:25:3852:6.5
South America : 0.02452239:Brazil:134:3286:40.8
# form2 - format countries by continent, pop. den.
BEGIN { FS = ":"
printf("%-15s %-10s %10s %7s %12s\n",
"CONTINENT", "COUNTRY", "POPULATION",
"AREA", "POP. DEN.")
}
{ if ($1 != prev) {
print ""
prev = $1
} else
$1 = ""
printf("%-15s %-10s %7d %10d %10.1f\n",
$1, $2, $3, $4, $5)
}
CONTINENT COUNTRY POPULATION AREA POP. DEN.
Asia Japan 120 144 833.3
India 746 1267 588.8
China 1032 3705 278.5
USSR 275 8649 31.8
Europe Germany 61 96 635.4
England 56 94 595.7
France 55 211 260.7
North America Mexico 78 762 102.4
USA 237 3615 65.6
Canada 25 3852 6.5
South America Brazil 134 3286 40.8