The AWK book’s 60-line version of Make

By admin
The

AWK
ORG

book’s

60
CARDINAL

-line version of Make


September 2023
DATE

In the wonderful book

The AWK Programming Language
WORK_OF_ART

by

Aho, Weinberger
PERSON

, and

Kernighan
PERSON

, there are a few pages at the end of

chapter 7
LAW

that present a simplified version of the Make utility – written in a single page of

AWK
ORG

code.

Before we look at that, I want to mention that the

second
ORDINAL

edition of the

AWK
ORG

book is coming out

next month
DATE

.

Brian Kernighan
PERSON

’s done a great job of updating it, most notably with a new chapter on exploratory data analysis, and adding proper

CSV
ORG

support to

AWK
ORG

to enable this. I was honoured to be asked to review a draft of the

second
ORDINAL

edition.


AWK
ORG

still shines for exploring data in

2023
DATE

, especially with the new –csv option. CSV mode has also been added to

Gawk
PERSON

(GNU AWK), the most widely-installed version of

AWK
ORG

. My own

GoAWK
PERSON

implementation has had proper

CSV
ORG

support for some time, and I’ve added the –csv option to match the others.

The

second
ORDINAL

edition of the book still includes the Make program, though it’s been made more readable with the addition of some “spacing and bracing” – this took it from

50
CARDINAL

lines to

62
CARDINAL

lines.

This article presents the Make program, to show how

AWK
ORG

is not just great for

one
CARDINAL

-liners, but can be used as a scripting language too – though whether you should or not is another question.

I’m then going to compare what the same program would look like in

Python
ORG

, and briefly discuss when you’d choose

AWK
ORG

or Python for this kind of thing.

It should go without saying, but I intend this purely as a learning exercise (for me and my readers), not a program I’d recommend you use to build your projects!

Original

AWK
ORG

version

The

second
ORDINAL

edition of the book introduces the Make program as follows. (For what it’s worth, I find the term “target” confusing here – I think “source” or “dependency” would fit better.)

This section develops a rudimentary updating program, patterned after the Unix make command, that is based on the depth-

first
ORDINAL

search technique of the previous section. To use the updater, one must explicitly describe what the components of the system are, how they depend upon one another, and what commands are needed to construct them. We’ll assume these dependencies and commands are stored in a file, called a makefile , that contains a sequence of rules of the form name: t1 t2 … tn commands The

first
ORDINAL

line of a rule is a dependency relation that states that the program or file name depends on the targets t1, t2, …, tn where each ti is a filename or another name. Following each dependency relation may be

one
CARDINAL

or more lines of commands that list the commands necessary to generate name. Here is an example of a makefile for a small program with

two
CARDINAL

C files called

a.c
PRODUCT

and

b.c
PERSON

, and a yacc grammar file c.y , a typical program-development application. prog:

a.o b.o c.o gcc
PERSON

a.o b.o c.o -ly -o prog a.o: prog.h a.c gcc -c prog.h a.c b.o: prog.h b.c gcc -c prog.h b.c c.o: c.c gcc -c c.c c.c: c.y yacc c.y mv y.tab.c c.c print: pr prog.h a.c b.c c.y The

first
ORDINAL

line states that prog depends on the target files

a.o , b.o ,
ORG

and

c.o
PERSON

. The

second
ORDINAL

line says that prog is generated by using the C compiler command gcc to link

a.o , b.o , c.o
ORG

, and a

yacc
GPE

library y into the file prog . The next rule (

third
ORDINAL

line) states that a.o depends on the targets prog.h and

a.c
ORG

and is created by compiling these targets; b.o is the same. The file c.o depends on c.c , which in turn depends on c.y , which has to be processed by the

yacc
ORG

parser generator. Finally, the name print does not depend on any target; by convention, for targetless names make will always perform the associated action, in this case printing all the source files with the command pr . The dependency relations in the makefile can be represented by a graph in which there is an edge from node x to node y whenever there is a dependency rule with x on the left side and y one of the targets on the right. For a rule with no targets, a successorless node with the name on the left is created. For the makefile above, we have the following dependency graph: prog print / | \ / | \ a.o b.o c.o / \ / \ \ / \/ \ \

a.c prog.h
ORG

b.c c.c | c.y

It’s a highly-simplified version of Make, of course, but still has the core concepts of outputs, dependencies, and build commands.

Before we look at how it works, I’ve included the full source code below, as it appears in the

second
ORDINAL

edition of the

AWK
ORG

book. Click on the bold text to expand it, or skip down to “How it works” to see the code explained in detail.

The

AWK
ORG

book’s Make program (full source code). BEGIN { while ( getline < "makefile" > 0 ) { if ( $

0
MONEY

~ /^ [

A-Za-z
PERSON

] / ) {

# $1
MONEY

:

$2 $
MONEY

3 … sub ( /:/ , "" ) if ( ++ names [ nm = $

1
MONEY

] >

1
CARDINAL

) error ( nm " is multiply defined" ) for ( i =

2
CARDINAL

; i <= NF ; i ++ ) # remember targets slist [ nm , ++ scnt [ nm ]] = $i } else if ( $

0
MONEY

~ /^ \t / ) { # remember cmd for cmd [ nm ] = cmd [ nm ] $

0
MONEY

"

" # current name } else if ( NF > 0 ) { error ( "illegal line in makefile: " $

0
MONEY

) } } ages () # compute initial ages if ( ARGV [

1
CARDINAL

] in names ) { if ( update ( ARGV [

1
CARDINAL

]) == 0 ) print ARGV [

1
CARDINAL

] " is up to date" } else { error ( ARGV [

1
CARDINAL

] " is not in makefile" ) } } function ages ( f , n , t ) { for ( t =

1
CARDINAL

; ( "ls -t" | getline f ) >

0
CARDINAL

; t ++ ) age [ f ] = t # all existing files get an age close ( "ls -t" ) for ( n in names ) if ( ! ( n in age )) # if n has not been created age [ n ] = 9999 # make n really old } function update ( n , changed , i , s ) { if ( ! ( n in age )) error ( n " does not exist" ) if ( ! ( n in names )) return

0
CARDINAL

changed = 0 visited [ n ] =

1
CARDINAL

for ( i =

1
CARDINAL

; i <= scnt [ n ]; i ++ ) { if ( visited [ s = slist [ n , i ]] == 0 ) update ( s ) else if ( visited [ s ] ==

1
CARDINAL

) error ( s " and " n " are circularly defined" ) if ( age [ s ] <= age [ n ]) changed ++ } visited [ n ] =

2
CARDINAL

if ( changed || scnt [ n ] == 0 ) { printf ( "%s" , cmd [ n ]) system ( cmd [ n ]) # execute cmd associated with

n ages
ORG

() # recompute all ages age [ n ] = 0 # make n very new return

1
CARDINAL

} return

0
CARDINAL

} function error ( s ) { print "error: " s ; exit }

How it works

There’s an explanation of how the program works in the book, but I’ll explain it in my own words here, focussing on the aspects I find interesting.

The BEGIN block is the main entry point for a program like this. Unlike most

AWK
ORG

programs which implicitly read lines from standard input, this one uses an explicit loop with getline to read the makefile :

BEGIN { while ( getline < "makefile" > 0 ) { if ( $

0
MONEY

~ /^ [

A-Za-z
PERSON

] / ) {

# $1
MONEY

:

$2 $
MONEY

3 … sub ( /:/ , "" ) if ( ++ names [ nm = $

1
MONEY

] >

1
CARDINAL

) error ( nm " is multiply defined" ) for ( i =

2
CARDINAL

; i <= NF ; i ++ ) # remember targets slist [ nm , ++ scnt [ nm ]] = $i } else if ( $

0
MONEY

~ /^ \t / ) { # remember cmd for cmd [ nm ] = cmd [ nm ] $

0
MONEY

"

" # current name } else if ( NF > 0 ) { error ( "illegal line in makefile: " $

0
MONEY

) } } … }

The getline <filename is a redirect clause that opens makefile (the

first
ORDINAL

time) and reads it line-by-line until the end. If the line ( $

0
MONEY

) starts with a letter (

/^[A-Za-z]/
PERSON

), it’s considered a name: targets rule.

The sub(/:/, "") call removes the colon from the current line (the $

0
MONEY

is implicit in the

two
CARDINAL

-argument form of sub ).

We then ensure that this rule hasn’t already been defined by checking the names array. An

AWK
ORG

array is actually an associative array, an old-school term for a key-value map.

The inner for loop adds each target (or dependency) to the slist /

scnt
ORG

data structure. This is really a map of lists, but it’s flattened to work around the fact that

AWK
ORG

doesn’t support nested collections. The body of the loop is very terse:

for ( i =

2
CARDINAL

; i <= NF ; i ++ ) slist [ nm , ++ scnt [ nm ]] = $i

This loops through each dependency: every field $i from field

2
CARDINAL

to NF (the number of fields in the line).

For each dependency, it increments scnt[nm] , the count of sources for the current rule ( nm ). Then, store the dependency $i in slist , indexed by the multi-key name and count.

AWK
ORG

simulates multi-dimensional or multi-key arrays by creating a concatenated key where each key is separated by the

SUBSEP
ORG

separator (which defaults to "

\x1c
PERSON

" ).

After the loop, in the prog example we’d end up with slist and

scnt
ORG

looking like this:

slist a.o,1:

prog.h a.o,2: a.c
ORG

b.o,1: prog.h b.o,2: b.c c.c,1: c.y c.o,1: c.c prog,1: a.o prog,2: b.o prog,3:

c.o scnt a.o:
PERSON


2
CARDINAL

b.o:

2 c.c
QUANTITY

:

1
CARDINAL

c.o:

1
CARDINAL

prog:

3

CARDINAL

Coming back up, if the line starts with a tab, it’s a command, so we append it to the name’s command string:

cmd [ nm ] = cmd [ nm ] $

0
MONEY

"

"

Otherwise, if the line is not a blank line ( NF > 0 ), it’s a makefile error.

Finally, after reading the makefile in the while loop, we uses ages() to compute the ages of all files in the current directory, and then call update(ARGV[1]) to update the rule passed on the command line:

BEGIN { … ages () # compute initial ages if ( ARGV [

1
CARDINAL

] in names ) { if ( update ( ARGV [

1
CARDINAL

]) == 0 ) print ARGV [

1
CARDINAL

] " is up to date" } else { error ( ARGV [

1
CARDINAL

] " is not in makefile" ) } }

The ages function is where things start to get interesting:

function ages ( f , n , t ) { for ( t =

1
CARDINAL

; ( "ls -t" | getline f ) >

0
CARDINAL

; t ++ ) age [ f ] = t # all existing files get an age close ( "ls -t" ) for ( n in names ) if ( ! ( n in age )) # if n has not been created age [ n ] = 9999 # make n really old }

The parameter names f , n , and t are prefixed with a bunch of spaces to show they’re actually local variables, and not expected as arguments. This is an

AWK
ORG

quirk (which

Kernighan
PERSON

regrets): the only way to define local variables is as function parameters, and if a function is called with fewer arguments than it has parameters, the extras take on the default value (

0
CARDINAL

for numbers, "" for strings). So you’ll see these extra spaces a lot in

AWK
ORG

function definitions.

The next thing is quite neat:

AWK
ORG

supports shell-like | syntax to pipe the output of a program,

one
CARDINAL

getline at a time, to a variable (in this case f ). The ls -t command lists files in the current directory ordered by modification time, newest

first
ORDINAL

.

After the loop that’s assigned each file’s age to age[f] , we call close to close the ls -t pipe and avoid too many open file handles.

Finally, we loop through the rule names and assign an arbitrary large number to age[n] to pretend that files that haven’t been created are really old and need to be updated.


Next
ORG

is the recursive update function, where the meat of the algorithm lives:

function update ( n , changed , i , s ) { if ( ! ( n in age )) error ( n " does not exist" ) if ( ! ( n in names )) return

0
CARDINAL

changed = 0 visited [ n ] =

1
CARDINAL

for ( i =

1
CARDINAL

; i <= scnt [ n ]; i ++ ) { if ( visited [ s = slist [ n , i ]] == 0 ) update ( s ) else if ( visited [ s ] ==

1
CARDINAL

) error ( s " and " n " are circularly defined" ) if ( age [ s ] <= age [ n ]) changed ++ } visited [ n ] =

2
CARDINAL

if ( changed || scnt [ n ] == 0 ) { printf ( "%s" , cmd [ n ]) system ( cmd [ n ]) # execute cmd associated with

n ages
ORG

() # recompute all ages age [ n ] = 0 # make n very new return

1
CARDINAL

} return

0
CARDINAL

}

Once again you’ll note the parameter list: n is an expected argument (the name to update), and changed,i,s are the locals.

After initial checks, we loop through the list of dependencies by iterating from slist[n,

1
CARDINAL

] to slist[n, scnt[n]] . If we haven’t visited this dependency yet, we perform a depth-

first
ORDINAL

traversal of the dependency graph by recursively calling update to see if we need to update that dependency

first
ORDINAL

:

if ( visited [ s = slist [ n , i ]] == 0 ) update ( s )

The recursion is terminated by the if (!(n in names)) return

0
CARDINAL

block near the top. We stop when the file being updated isn’t in the list of rule names – which is a leaf node in the dependency graph.

The block if (age[s] <= age[n]) changed++ increments the changed count if any dependency is newer than the age of the current file being updated.

After the traversal loop, if any of the dependencies or sub-dependencies had changed, we run the associated command using system() , recompute the ages of all files, and return

1
CARDINAL

to the caller to indicate we did make an update.

The scnt[n] == 0 clause handles the case where the rule being updated doesn’t have any dependencies specified, like the print rule in the example. In that case, always re-run its command.

And there you have it! A minimalist Make in

one
CARDINAL

page of

AWK
ORG

.

Python version

For interest, I ported the book’s

AWK
ORG

Make to Python, and have included it below. Once again, click the bold text to expand the program.

My Python port of the Make program (full source code). import os , re , sys slist = {} # slist[name] is list of rule’s sources cmd = {} # cmd[name] is shell command to run for rule def main (): for line in open ( ‘makefile’ ): if re . match ( ‘[

A-Za-z
WORK_OF_ART

]’ , line ): line = line . replace ( ‘:’ , ” ) fields = line . split () nm = fields [ 0 ] if nm in slist : error ( f ‘ { nm } is multiply defined’ ) slist [ nm ] = fields [

1
CARDINAL

:] # remember targets elif line .

startswith
ORG

( ‘ \t ‘ ): # remember cmd for current name cmd [ nm ] = cmd . get ( nm , ” ) + line elif line . strip (): error ( f ‘illegal line in makefile: { line } ‘ ) if sys . argv [

1
CARDINAL

] in slist : if not update ( sys . argv [

1
CARDINAL

]): print ( sys . argv [

1
CARDINAL

], ‘is up to date’ ) else : error ( f ‘ { sys . argv [

1
CARDINAL

] } is not in makefile’ ) def mtime ( n ): try : return os . stat ( n ). st_mtime except FileNotFoundError : return

0 # mark
MONEY

as old if it doesn’t exist def update ( n , visited = {}): ntime =

mtime ( n
ORG

) if n not in slist and

ntime
PERSON

== 0 : error ( f ‘ { n } does not exist’ ) if n not in slist : return

0
CARDINAL

changed = False visited [ n ] =

1
CARDINAL

for s in slist . get ( n , []): if s not in visited : update ( s ) elif visited [ s ] ==

1
CARDINAL

: error ( f ‘ { s } and { n } are circularly defined’ ) if

mtime ( s )
ORG

> ntime : changed = True visited [ n ] =

2
CARDINAL

if changed or len ( slist . get ( n , [])) == 0 : print ( cmd [ n ], end = ” ) os .

system
PERSON

( cmd [ n ]) # execute cmd associated with n return

1
CARDINAL

return 0 def error ( msg ): print ( ‘error:’ , msg , file = sys . stderr ) sys . exit (

1
CARDINAL

) if __name__ == ‘__main__’ : main ()

It’s very similar in structure to the original

AWK
ORG

version, though I made

two
CARDINAL

simplifications which I think make it somewhat easier to understand:

Simpler data structures to avoid the slist /

scnt quirkiness
PERSON

– in Python we can just use a dictionary of lists. (See diff.) Determine ages more directly using os.stat() to fetch file modification times (mtimes), rather than using the ls -t trick. This also removes the need for the age map and the ages function. (See diff.)

I didn’t plan for this, but even if you include the import line and the if __name__ == ‘__main__’ dance, it’s

58
CARDINAL

lines of code – basically the same length as the

AWK
ORG

program.

When making the

Python
ORG

version, I realized we could simplify the

AWK
ORG

version in a similar way:

It’s conceptually simpler to store the slist directly as an

AWK
ORG

array: a key-value map where the key is the rule name and the value is the list of dependencies as a space-separated string (just like in the makefile ). We can use split as needed to turn the dependencies string into a list (an array from

1
CARDINAL

to the number of dependencies). This avoids the need for

scnt
GPE

and names altogether. (See diff.) Similar to the

Python
ORG

version, we can get the mtime directly by shelling out to stat , instead of listing all files in age order with ls -t . I’ve used stat –format %y to do this. I believe this is a

GNU
ORG

extension, so it’s not as portable as ls -t , but it’s simpler and avoids the need for recomputing the age array. (See diff.)

Update:

Volodymyr Gubarkov
PERSON

pointed out that the stat version “adds multiple external process invocations”, and he’s quite right. It might be more direct, but it is significantly slower.

For what it’s worth, the modified version is

four
CARDINAL

lines shorter than the original. I think the simpler slist is clearer, and I like the more direct approach to fetching mtimes, though I realize the lack of portability of stat –format is a downside (macOS’s stat looks quite different).

Conclusion

The

AWK
ORG

Make program is a neat little piece of code that shows how useful a language

AWK
ORG

is, even for medium-sized scripts.

However,

Python
ORG

is definitely a nicer language for this kind of thing: it has much richer data types, better tools like os.stat , and local variables without quirky syntax.

I consider

AWK
ORG

amazing, but I think it should remain where it excels: for exploratory data analysis and for

one
CARDINAL

-liner data extraction scripts.

As the author of

GoAWK
PERSON

, which has had native

CSV
ORG

support for a while, I’m especially pleased to see both

Kernighan
PERSON

’s “

one
CARDINAL

true

AWK
ORG

” and

Gawk
PERSON

gain proper

CSV
ORG

support in the form of the –csv option.

Kernighan
PERSON

’s

AWK
ORG

updates will be merged soon, and

Gawk
PERSON

will include this feature in version

5.3.0
CARDINAL

, which is coming out soon.

You can also view my awkmake repo on

GitHub
ORG

, which contains the full source for both the

AWK
ORG

book’s Make program and my Python version, as well as a runnable example project based on the example in the

AWK
ORG

book.