The AWK book’s 60-line version of Make

Created on November 12, 2023 at 11:23 am

The AWK ORG book’s 60 CARDINAL -line version of Make

September 2023 DATE

In the wonderful book The AWK Programming Language WORK_OF_ART by Aho, Weinberger PERSON , and Kernighan PERSON , there are a few pages at the end of chapter 7 LAW that present a simplified version of the Make utility – written in a single page of AWK ORG code.

Before we look at that, I want to mention that the second ORDINAL edition of the AWK ORG book is coming out next month DATE . Brian Kernighan PERSON ’s done a great job of updating it, most notably with a new chapter on exploratory data analysis, and adding proper CSV ORG support to AWK ORG to enable this. I was honoured to be asked to review a draft of the second ORDINAL edition.

AWK ORG still shines for exploring data in 2023 DATE , especially with the new –csv option. CSV mode has also been added to Gawk PERSON (GNU AWK), the most widely-installed version of AWK ORG . My own GoAWK PERSON implementation has had proper CSV ORG support for some time, and I’ve added the –csv option to match the others.

The second ORDINAL edition of the book still includes the Make program, though it’s been made more readable with the addition of some “spacing and bracing” – this took it from 50 CARDINAL lines to 62 CARDINAL lines.

This article presents the Make program, to show how AWK ORG is not just great for one CARDINAL -liners, but can be used as a scripting language too – though whether you should or not is another question.

I’m then going to compare what the same program would look like in Python ORG , and briefly discuss when you’d choose AWK ORG or Python for this kind of thing.

It should go without saying, but I intend this purely as a learning exercise (for me and my readers), not a program I’d recommend you use to build your projects!

Original AWK ORG version

The second ORDINAL edition of the book introduces the Make program as follows. (For what it’s worth, I find the term “target” confusing here – I think “source” or “dependency” would fit better.)

This section develops a rudimentary updating program, patterned after the Unix make command, that is based on the depth- first ORDINAL search technique of the previous section. To use the updater, one must explicitly describe what the components of the system are, how they depend upon one another, and what commands are needed to construct them. We’ll assume these dependencies and commands are stored in a file, called a makefile , that contains a sequence of rules of the form name: t1 t2 … tn commands The first ORDINAL line of a rule is a dependency relation that states that the program or file name depends on the targets t1, t2, …, tn where each ti is a filename or another name. Following each dependency relation may be one CARDINAL or more lines of commands that list the commands necessary to generate name. Here is an example of a makefile for a small program with two CARDINAL C files called a.c PRODUCT and b.c PERSON , and a yacc grammar file c.y , a typical program-development application. prog: a.o b.o c.o gcc PERSON a.o b.o c.o -ly -o prog a.o: prog.h a.c gcc -c prog.h a.c b.o: prog.h b.c gcc -c prog.h b.c c.o: c.c gcc -c c.c c.c: c.y yacc c.y mv y.tab.c c.c print: pr prog.h a.c b.c c.y The first ORDINAL line states that prog depends on the target files a.o , b.o , ORG and c.o PERSON . The second ORDINAL line says that prog is generated by using the C compiler command gcc to link a.o , b.o , c.o ORG , and a yacc GPE library y into the file prog . The next rule ( third ORDINAL line) states that a.o depends on the targets prog.h and a.c ORG and is created by compiling these targets; b.o is the same. The file c.o depends on c.c , which in turn depends on c.y , which has to be processed by the yacc ORG parser generator. Finally, the name print does not depend on any target; by convention, for targetless names make will always perform the associated action, in this case printing all the source files with the command pr . The dependency relations in the makefile can be represented by a graph in which there is an edge from node x to node y whenever there is a dependency rule with x on the left side and y one of the targets on the right. For a rule with no targets, a successorless node with the name on the left is created. For the makefile above, we have the following dependency graph: prog print / | \ / | \ a.o b.o c.o / \ / \ \ / \/ \ \ a.c prog.h ORG b.c c.c | c.y

It’s a highly-simplified version of Make, of course, but still has the core concepts of outputs, dependencies, and build commands.

Before we look at how it works, I’ve included the full source code below, as it appears in the second ORDINAL edition of the AWK ORG book. Click on the bold text to expand it, or skip down to “How it works” to see the code explained in detail.

The AWK ORG book’s Make program (full source code). BEGIN { while ( getline < "makefile" > 0 ) { if ( $ 0 MONEY ~ /^ [ A-Za-z PERSON ] / ) { # $1 MONEY : $2 $ MONEY 3 … sub ( /:/ , "" ) if ( ++ names [ nm = $ 1 MONEY ] > 1 CARDINAL ) error ( nm " is multiply defined" ) for ( i = 2 CARDINAL ; i <= NF ; i ++ ) # remember targets slist [ nm , ++ scnt [ nm ]] = $i } else if ( $ 0 MONEY ~ /^ \t / ) { # remember cmd for cmd [ nm ] = cmd [ nm ] $ 0 MONEY "

" # current name } else if ( NF > 0 ) { error ( "illegal line in makefile: " $ 0 MONEY ) } } ages () # compute initial ages if ( ARGV [ 1 CARDINAL ] in names ) { if ( update ( ARGV [ 1 CARDINAL ]) == 0 ) print ARGV [ 1 CARDINAL ] " is up to date" } else { error ( ARGV [ 1 CARDINAL ] " is not in makefile" ) } } function ages ( f , n , t ) { for ( t = 1 CARDINAL ; ( "ls -t" | getline f ) > 0 CARDINAL ; t ++ ) age [ f ] = t # all existing files get an age close ( "ls -t" ) for ( n in names ) if ( ! ( n in age )) # if n has not been created age [ n ] = 9999 # make n really old } function update ( n , changed , i , s ) { if ( ! ( n in age )) error ( n " does not exist" ) if ( ! ( n in names )) return 0 CARDINAL changed = 0 visited [ n ] = 1 CARDINAL for ( i = 1 CARDINAL ; i <= scnt [ n ]; i ++ ) { if ( visited [ s = slist [ n , i ]] == 0 ) update ( s ) else if ( visited [ s ] == 1 CARDINAL ) error ( s " and " n " are circularly defined" ) if ( age [ s ] <= age [ n ]) changed ++ } visited [ n ] = 2 CARDINAL if ( changed || scnt [ n ] == 0 ) { printf ( "%s" , cmd [ n ]) system ( cmd [ n ]) # execute cmd associated with n ages ORG () # recompute all ages age [ n ] = 0 # make n very new return 1 CARDINAL } return 0 CARDINAL } function error ( s ) { print "error: " s ; exit }

How it works

There’s an explanation of how the program works in the book, but I’ll explain it in my own words here, focussing on the aspects I find interesting.

The BEGIN block is the main entry point for a program like this. Unlike most AWK ORG programs which implicitly read lines from standard input, this one uses an explicit loop with getline to read the makefile :

BEGIN { while ( getline < "makefile" > 0 ) { if ( $ 0 MONEY ~ /^ [ A-Za-z PERSON ] / ) { # $1 MONEY : $2 $ MONEY 3 … sub ( /:/ , "" ) if ( ++ names [ nm = $ 1 MONEY ] > 1 CARDINAL ) error ( nm " is multiply defined" ) for ( i = 2 CARDINAL ; i <= NF ; i ++ ) # remember targets slist [ nm , ++ scnt [ nm ]] = $i } else if ( $ 0 MONEY ~ /^ \t / ) { # remember cmd for cmd [ nm ] = cmd [ nm ] $ 0 MONEY "

" # current name } else if ( NF > 0 ) { error ( "illegal line in makefile: " $ 0 MONEY ) } } … }

The getline <filename is a redirect clause that opens makefile (the first ORDINAL time) and reads it line-by-line until the end. If the line ( $ 0 MONEY ) starts with a letter ( /^[A-Za-z]/ PERSON ), it’s considered a name: targets rule.

The sub(/:/, "") call removes the colon from the current line (the $ 0 MONEY is implicit in the two CARDINAL -argument form of sub ).

We then ensure that this rule hasn’t already been defined by checking the names array. An AWK ORG array is actually an associative array, an old-school term for a key-value map.

The inner for loop adds each target (or dependency) to the slist / scnt ORG data structure. This is really a map of lists, but it’s flattened to work around the fact that AWK ORG doesn’t support nested collections. The body of the loop is very terse:

for ( i = 2 CARDINAL ; i <= NF ; i ++ ) slist [ nm , ++ scnt [ nm ]] = $i

This loops through each dependency: every field $i from field 2 CARDINAL to NF (the number of fields in the line).

For each dependency, it increments scnt[nm] , the count of sources for the current rule ( nm ). Then, store the dependency $i in slist , indexed by the multi-key name and count. AWK ORG simulates multi-dimensional or multi-key arrays by creating a concatenated key where each key is separated by the SUBSEP ORG separator (which defaults to " \x1c PERSON " ).

After the loop, in the prog example we’d end up with slist and scnt ORG looking like this:

slist a.o,1: prog.h a.o,2: a.c ORG b.o,1: prog.h b.o,2: b.c c.c,1: c.y c.o,1: c.c prog,1: a.o prog,2: b.o prog,3: c.o scnt a.o: PERSON

2 CARDINAL b.o: 2 c.c QUANTITY : 1 CARDINAL c.o: 1 CARDINAL prog: 3

CARDINAL Coming back up, if the line starts with a tab, it’s a command, so we append it to the name’s command string:

cmd [ nm ] = cmd [ nm ] $ 0 MONEY "

"

Otherwise, if the line is not a blank line ( NF > 0 ), it’s a makefile error.

Finally, after reading the makefile in the while loop, we uses ages() to compute the ages of all files in the current directory, and then call update(ARGV[1]) to update the rule passed on the command line:

BEGIN { … ages () # compute initial ages if ( ARGV [ 1 CARDINAL ] in names ) { if ( update ( ARGV [ 1 CARDINAL ]) == 0 ) print ARGV [ 1 CARDINAL ] " is up to date" } else { error ( ARGV [ 1 CARDINAL ] " is not in makefile" ) } }

The ages function is where things start to get interesting:

function ages ( f , n , t ) { for ( t = 1 CARDINAL ; ( "ls -t" | getline f ) > 0 CARDINAL ; t ++ ) age [ f ] = t # all existing files get an age close ( "ls -t" ) for ( n in names ) if ( ! ( n in age )) # if n has not been created age [ n ] = 9999 # make n really old }

The parameter names f , n , and t are prefixed with a bunch of spaces to show they’re actually local variables, and not expected as arguments. This is an AWK ORG quirk (which Kernighan PERSON regrets): the only way to define local variables is as function parameters, and if a function is called with fewer arguments than it has parameters, the extras take on the default value ( 0 CARDINAL for numbers, "" for strings). So you’ll see these extra spaces a lot in AWK ORG function definitions.

The next thing is quite neat: AWK ORG supports shell-like | syntax to pipe the output of a program, one CARDINAL getline at a time, to a variable (in this case f ). The ls -t command lists files in the current directory ordered by modification time, newest first ORDINAL .

After the loop that’s assigned each file’s age to age[f] , we call close to close the ls -t pipe and avoid too many open file handles.

Finally, we loop through the rule names and assign an arbitrary large number to age[n] to pretend that files that haven’t been created are really old and need to be updated.

Next ORG is the recursive update function, where the meat of the algorithm lives:

function update ( n , changed , i , s ) { if ( ! ( n in age )) error ( n " does not exist" ) if ( ! ( n in names )) return 0 CARDINAL changed = 0 visited [ n ] = 1 CARDINAL for ( i = 1 CARDINAL ; i <= scnt [ n ]; i ++ ) { if ( visited [ s = slist [ n , i ]] == 0 ) update ( s ) else if ( visited [ s ] == 1 CARDINAL ) error ( s " and " n " are circularly defined" ) if ( age [ s ] <= age [ n ]) changed ++ } visited [ n ] = 2 CARDINAL if ( changed || scnt [ n ] == 0 ) { printf ( "%s" , cmd [ n ]) system ( cmd [ n ]) # execute cmd associated with n ages ORG () # recompute all ages age [ n ] = 0 # make n very new return 1 CARDINAL } return 0 CARDINAL }

Once again you’ll note the parameter list: n is an expected argument (the name to update), and changed,i,s are the locals.

After initial checks, we loop through the list of dependencies by iterating from slist[n, 1 CARDINAL ] to slist[n, scnt[n]] . If we haven’t visited this dependency yet, we perform a depth- first ORDINAL traversal of the dependency graph by recursively calling update to see if we need to update that dependency first ORDINAL :

if ( visited [ s = slist [ n , i ]] == 0 ) update ( s )

The recursion is terminated by the if (!(n in names)) return 0 CARDINAL block near the top. We stop when the file being updated isn’t in the list of rule names – which is a leaf node in the dependency graph.

The block if (age[s] <= age[n]) changed++ increments the changed count if any dependency is newer than the age of the current file being updated.

After the traversal loop, if any of the dependencies or sub-dependencies had changed, we run the associated command using system() , recompute the ages of all files, and return 1 CARDINAL to the caller to indicate we did make an update.

The scnt[n] == 0 clause handles the case where the rule being updated doesn’t have any dependencies specified, like the print rule in the example. In that case, always re-run its command.

And there you have it! A minimalist Make in one CARDINAL page of AWK ORG .

Python version

For interest, I ported the book’s AWK ORG Make to Python, and have included it below. Once again, click the bold text to expand the program.

My Python port of the Make program (full source code). import os , re , sys slist = {} # slist[name] is list of rule’s sources cmd = {} # cmd[name] is shell command to run for rule def main (): for line in open ( ‘makefile’ ): if re . match ( ‘[ A-Za-z WORK_OF_ART ]’ , line ): line = line . replace ( ‘:’ , ” ) fields = line . split () nm = fields [ 0 ] if nm in slist : error ( f ‘ { nm } is multiply defined’ ) slist [ nm ] = fields [ 1 CARDINAL :] # remember targets elif line . startswith ORG ( ‘ \t ‘ ): # remember cmd for current name cmd [ nm ] = cmd . get ( nm , ” ) + line elif line . strip (): error ( f ‘illegal line in makefile: { line } ‘ ) if sys . argv [ 1 CARDINAL ] in slist : if not update ( sys . argv [ 1 CARDINAL ]): print ( sys . argv [ 1 CARDINAL ], ‘is up to date’ ) else : error ( f ‘ { sys . argv [ 1 CARDINAL ] } is not in makefile’ ) def mtime ( n ): try : return os . stat ( n ). st_mtime except FileNotFoundError : return 0 # mark MONEY as old if it doesn’t exist def update ( n , visited = {}): ntime = mtime ( n ORG ) if n not in slist and ntime PERSON == 0 : error ( f ‘ { n } does not exist’ ) if n not in slist : return 0 CARDINAL changed = False visited [ n ] = 1 CARDINAL for s in slist . get ( n , []): if s not in visited : update ( s ) elif visited [ s ] == 1 CARDINAL : error ( f ‘ { s } and { n } are circularly defined’ ) if mtime ( s ) ORG > ntime : changed = True visited [ n ] = 2 CARDINAL if changed or len ( slist . get ( n , [])) == 0 : print ( cmd [ n ], end = ” ) os . system PERSON ( cmd [ n ]) # execute cmd associated with n return 1 CARDINAL return 0 def error ( msg ): print ( ‘error:’ , msg , file = sys . stderr ) sys . exit ( 1 CARDINAL ) if __name__ == ‘__main__’ : main ()

It’s very similar in structure to the original AWK ORG version, though I made two CARDINAL simplifications which I think make it somewhat easier to understand:

Simpler data structures to avoid the slist / scnt quirkiness PERSON – in Python we can just use a dictionary of lists. (See diff.) Determine ages more directly using os.stat() to fetch file modification times (mtimes), rather than using the ls -t trick. This also removes the need for the age map and the ages function. (See diff.)

I didn’t plan for this, but even if you include the import line and the if __name__ == ‘__main__’ dance, it’s 58 CARDINAL lines of code – basically the same length as the AWK ORG program.

When making the Python ORG version, I realized we could simplify the AWK ORG version in a similar way:

It’s conceptually simpler to store the slist directly as an AWK ORG array: a key-value map where the key is the rule name and the value is the list of dependencies as a space-separated string (just like in the makefile ). We can use split as needed to turn the dependencies string into a list (an array from 1 CARDINAL to the number of dependencies). This avoids the need for scnt GPE and names altogether. (See diff.) Similar to the Python ORG version, we can get the mtime directly by shelling out to stat , instead of listing all files in age order with ls -t . I’ve used stat –format %y to do this. I believe this is a GNU ORG extension, so it’s not as portable as ls -t , but it’s simpler and avoids the need for recomputing the age array. (See diff.)

Update: Volodymyr Gubarkov PERSON pointed out that the stat version “adds multiple external process invocations”, and he’s quite right. It might be more direct, but it is significantly slower.

For what it’s worth, the modified version is four CARDINAL lines shorter than the original. I think the simpler slist is clearer, and I like the more direct approach to fetching mtimes, though I realize the lack of portability of stat –format is a downside (macOS’s stat looks quite different).

Conclusion

The AWK ORG Make program is a neat little piece of code that shows how useful a language AWK ORG is, even for medium-sized scripts.

However, Python ORG is definitely a nicer language for this kind of thing: it has much richer data types, better tools like os.stat , and local variables without quirky syntax.

I consider AWK ORG amazing, but I think it should remain where it excels: for exploratory data analysis and for one CARDINAL -liner data extraction scripts.

As the author of GoAWK PERSON , which has had native CSV ORG support for a while, I’m especially pleased to see both Kernighan PERSON ’s “ one CARDINAL true AWK ORG ” and Gawk PERSON gain proper CSV ORG support in the form of the –csv option. Kernighan PERSON ’s AWK ORG updates will be merged soon, and Gawk PERSON will include this feature in version 5.3.0 CARDINAL , which is coming out soon.

You can also view my awkmake repo on GitHub ORG , which contains the full source for both the AWK ORG book’s Make program and my Python version, as well as a runnable example project based on the example in the AWK ORG book.

Connecting to blog.lzomedia.com... Connected... Page load complete