Simple Way to Document Code with Markdown, grep, and pandoc

Submitted by Richard Zach on Thu, 05/29/2014 - 2:26pm

Here's a simple way to pretty-print documentation included as comments in a source file (I'm mainly interested in LaTeX code), with or without the intervening code. It's useful if you don't want to bother with a more complicated solution such as LaTeX's docstrip + ltxdoc. It uses the ubiquitous command-line tools grep and cut (available on Linux and probably (?) on Mac OS) plus John MacFarlane's pandoc, which you might have to install separately.

Include your documentation in the source file as comments, i.e., in LaTeX, on lines which start with %. Make sure they in fact start with "% ", i.e., % followed by a space. Your documentation can use any format pandoc understands, but Markdown is probably the simplest. Your comments will be easily readable in the source file, but you can include markup, e.g., # headers, *italicized* or **boldface**, `code`, $math$, itemized lists, etc. Because your documentation is a regular comment that doesn't have to be stripped for the file to compile, you can use/compile your source file as you ordinarily would without running it through a pre-processor. To get the documentation, you discard the non-comment lines, remove the comment signs from the rest, and run the result through pandoc.

First, we have to pick out the comment lines from the source file and throw away the rest. This can be done using grep:

    grep -e "^%" -e "^$"

The first filter "^%" matches any line beginning with a %, the second "^$" matches empty lines (so you get paragraph breaks between documentation blocks in the output).

Then you want to strip the initial comment character; in fact, we can just throw away the first two characters of every remaining line (the % and the space after it), using the cut command:

    cut --bytes=3-

The result is a file in Markdown format which you can now run through pandoc to create your favorite output, e.g., HTML or PDF.

    pandoc -f markdown -t html
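If you want to see what the first two stages produce before pandoc gets involved, here is a small sketch. The file name sample.tex and its contents are made up for illustration. I've written the cut option in its short form, -b 3-, which means the same as --bytes=3- and should also work with cut versions that don't know the long option.

```shell
#!/bin/bash
# Hypothetical sample file, made up for this illustration.
cat > sample.tex <<'EOF'
% # The foo package
%
% Use *foo* to typeset a `bar`.

\newcommand{\foo}{bar}
% That's all.
EOF

# Keep comment lines and empty lines, then drop the first two characters.
markdown=$(grep -e '^%' -e '^$' sample.tex | cut -b 3-)
printf '%s\n' "$markdown"
```

This should print the header, the two paragraphs, and the empty lines between them, but not the \newcommand line. A comment line consisting of just % comes out as an empty line, so it works as a paragraph break too.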

You can put all of this together into a pipe, or make a bash script, or use it in a Makefile target. For instance, if you save the following as "makedoc"

    #!/bin/bash

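    grep -e '^%' -e '^$' "$1.tex" | cut --bytes=3- | pandoc -f markdown -o "$1.pdf"

you can use "./makedoc <mytexfile>" to produce a PDF of the documentation included in <mytexfile>.tex in <mytexfile>.pdf. (Make sure you make the file executable, e.g., via chmod u+x makedoc. If you put the command into a Makefile recipe instead, write the $ in the pattern as $$, since make would otherwise treat it as the start of a variable reference. In a bash script, a double $$ inside double quotes would expand to the shell's process ID, so there a single $ in single quotes is what you want.)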

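To set the title, author, and date of the documentation, include them in the first three lines of your file (in that order), each preceded by an extra "% ", i.e., each of these lines starts with "% % ". Once the first two characters are stripped, what remains are three lines starting with "% ", which is what pandoc reads as a title block.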
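As a quick check of the title lines, the following sketch shows what pandoc gets to see. The file name titled.tex and the title, author, and date in it are placeholders I made up.

```shell
#!/bin/bash
# Hypothetical file; title, author, and date are placeholders.
cat > titled.tex <<'EOF'
% % The foo package
% % A. N. Author
% % May 2014
%
% This package defines a macro called foo.
\newcommand{\foo}{bar}
EOF

# After the first two characters are stripped, the first three lines
# each start with "% ", the form pandoc expects for a title block.
header=$(grep -e '^%' -e '^$' titled.tex | cut -b 3- | head -n 3)
printf '%s\n' "$header"
```

The three lines printed should be "% The foo package", "% A. N. Author", and "% May 2014".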

The procedure above will extract the comments and turn them into the documentation, e.g., a user guide or something like that. You can also set it up to print a documented source by beginning and ending every block of code with a commented "code fence", i.e., a line that contains "% ```" (%, space, three backticks). Then, instead of using grep and cut, we run the whole file through sed and tell it to remove the "% " at the beginning of every line that has one, so that the code itself passes through untouched:

    sed 's/^% *//' "$1.tex" | pandoc -f markdown -o "$1.pdf"
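Here is a small sketch of what the sed step does to a file with one fenced block of code. The file name fenced.tex and its contents are made up, and the fence is built with printf only so that the listing doesn't contain three literal backticks of its own.

```shell
#!/bin/bash
# Three backticks, spelled as octal escapes so that this listing can
# itself be placed inside a fenced code block.
fence=$(printf '\140\140\140')

# Hypothetical file: two comment lines, then one fenced block of code.
printf '%s\n' \
  '% The macro foo expands to bar.' \
  '%' \
  "% $fence" \
  '\newcommand{\foo}{bar}' \
  "% $fence" > fenced.tex

# Remove "%" plus any following spaces at the start of a line;
# lines that don't start with % are left alone.
listing=$(sed 's/^% *//' fenced.tex)
printf '%s\n' "$listing"
```

The output should be the sentence, an empty line, and then the \newcommand line sitting between two bare code fences, which pandoc will typeset as a code block. Note that the pattern removes all spaces after the %, so indentation inside your comments is lost.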

Extra geek points to whoever comes up with a sed command that prints every commented line and all lines between code fences, but removes code not between code fences!

UPDATE: The extra geek points go to Mark van Atten who sent in this solution using awk:

    awk '/^% ```/   {print substr($0,3,length($0)-2); printcode=1-printcode; next}
         printcode  {print; next}
         /^% /      {print substr($0,3,length($0)-2); next}
         /^[ \t]*$/ {print}' "$1.tex" | pandoc -f markdown -o "$1.pdf"
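To try out the filtering without pandoc, here is a sketch that runs a slight variant of the awk program on a made-up file, partial.tex. The variant is my own adaptation, not Mark's original: it gets the fence passed in through a variable and compares it with index() instead of using a regular expression, only so that the listing doesn't have to contain three literal backticks. Otherwise the rules are the same.

```shell
#!/bin/bash
# Three backticks, spelled as octal escapes so that this listing can
# itself be placed inside a fenced code block.
fence=$(printf '\140\140\140')

# Hypothetical file: one fenced definition and one unfenced definition.
printf '%s\n' \
  '% Only the definition of foo is shown.' \
  '' \
  "% $fence" \
  '\newcommand{\foo}{bar}' \
  "% $fence" \
  '\newcommand{\hidden}{secret}' > partial.tex

filtered=$(awk -v fence="% $fence" '
  # A fence line: print it without the "% " and toggle code printing.
  index($0, fence) == 1 { print substr($0, 3); printcode = 1 - printcode; next }
  # Between fences, print everything as is.
  printcode             { print; next }
  # Outside fences, print comment lines without the "% ".
  /^% /                 { print substr($0, 3); next }
  # Keep blank lines as paragraph breaks.
  /^[ \t]*$/            { print }
' partial.tex)
printf '%s\n' "$filtered"
```

The definition of \foo should show up between the two fences, while the \hidden line, which is not between fences, is dropped. One thing to watch out for: a comment line consisting of just %, with no space after it, matches none of the rules outside a fence and is dropped as well, so use really empty lines for paragraph breaks.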