University of Calgary

Simple Way to Document Code with Markdown, grep, and pandoc

Submitted by Richard Zach on Thu, 05/29/2014 - 2:26pm

Here's a simple way to pretty-print documentation included as comments in a source file (I'm mainly interested in LaTeX code), with or without the intervening code.  It's useful if you don't want to bother with a more complicated solution such as LaTeX's docstrip + ltxdoc.  It uses the ubiquitous command-line tools grep and cut (available on Linux and on Mac OS) plus John MacFarlane's pandoc, which you might have to install separately.

Include your documentation in the source file as comments, i.e., in LaTeX, on lines which start with %. Make sure they in fact start with "% ", i.e., % followed by a space. Your documentation can use any format pandoc understands, but Markdown is probably the simplest.  Your comments will be easily readable in the source file, but you can include markup, e.g., # headers, *italicized* or **boldface**, `code`, $math$, itemized lists, etc. Because your documentation is a regular comment that doesn't have to be stripped for the file to compile, you can use/compile your source file as you ordinarily would without running it through a pre-processor. To get the documentation, you filter out the non-commented lines, remove the comment signs, and run what is left through pandoc.

First, we have to extract the comment lines from the source file and throw away the rest. This can be done using grep:

    grep -e "^%" -e "^$"

The first filter "^%" matches any line beginning with a %, the second "^$" matches empty lines (so you get paragraph breaks between documentation blocks in the output).
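
To see what this filter keeps, here is a small sketch. The file sample.tex and its contents are made up for illustration, and I use single quotes around the patterns so the shell is guaranteed to leave the $ alone.

```shell
# Hypothetical sample file: two documentation comments, one empty line, two lines of code.
workdir=$(mktemp -d)
cat > "$workdir/sample.tex" <<'EOF'
% # A heading
\newcommand{\foo}{bar}

% Some *documentation* text.
\foo
EOF

# Keep comment lines and empty lines; everything else (the code) is dropped.
kept=$(grep -e '^%' -e '^$' "$workdir/sample.tex")
printf '%s\n' "$kept"

rm -rf "$workdir"
```

The two comment lines and the empty line between them survive; the \newcommand and \foo lines do not.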

Then you want to strip the initial comment character; in fact, we can just throw away the first two characters of every remaining line, using the cut command:

    cut --bytes=3-
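
A quick sketch of this step on some made-up input. As far as I know, the long option --bytes is a GNU spelling, so the sketch uses the short form -b 3-, which is the POSIX one and should also work with the cut that ships with Mac OS. It also shows why the space after the % matters: cut removes two characters no matter what they are.

```shell
# Lines as they might come out of the grep step (hypothetical content).
stripped=$(printf '%s\n' '% # A heading' '' '% Some text.' | cut -b 3-)
printf '%s\n' "$stripped"

# A comment written without the space after % loses its first real character.
mangled=$(printf '%s\n' '%oops' | cut -b 3-)
printf '%s\n' "$mangled"
```

Empty lines pass through unchanged, so the paragraph breaks are preserved.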

The result is Markdown text, which you can now run through pandoc to create your favorite output format, e.g., HTML or PDF:

    pandoc -f markdown -t html

You can put all of this together into a pipe, or make a bash script, or use it in a Makefile target. For instance, if you save the following as "makedoc"

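    #!/bin/bash

    grep -e '^%' -e '^$' "$1.tex" | cut --bytes=3- | pandoc -f markdown -o "$1.pdf"

you can use "./makedoc <mytexfile>" to produce a PDF of the documentation included in <mytexfile>.tex in <mytexfile>.pdf. (Make sure you make the file executable, e.g., via chmod u+x makedoc. Note that in a bash script the pattern for empty lines takes a single $, since bash replaces $$ with the process ID of the script. If you put the command in a Makefile recipe instead, you do have to double it, i.e., write "^$$", because make treats a single $ specially.)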

To set title, author, and date of the documentation, include them, preceded by an extra "% " in the first three lines of your file (in that order).
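
Here is a sketch of what that looks like. The file name, title, author, and date are placeholders I made up. After the grep and cut steps, the first three lines each still begin with "% ", which is the form pandoc expects for a title block in Markdown.

```shell
# Hypothetical source file whose first three lines carry title, author, and date.
workdir=$(mktemp -d)
cat > "$workdir/mypackage.tex" <<'EOF'
% % My Package
% % A. N. Author
% % May 29, 2014
%
% # Introduction
\ProvidesPackage{mypackage}
EOF

# After filtering and stripping, one "% " is left on each of the three header lines.
header=$(grep -e '^%' -e '^$' "$workdir/mypackage.tex" | cut -b 3- | head -n 3)
printf '%s\n' "$header"

rm -rf "$workdir"
```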

The procedure above will filter out the comments and turn them into the documentation, e.g., a user guide or something like that.  You can also set it up to print a documented source by beginning and ending every block of code with a commented "code fence", i.e., a line that contains "% ```" (%, space, three backticks). Then we run the file through sed, telling it to remove the "% " at the beginning of each line, before passing it on to pandoc:

    sed "s/^% *//" $1.tex | pandoc -f markdown -o $1.pdf
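
To check what sed hands to pandoc, here is a sketch on a made-up file. The sample is written with printf, and the three backticks are produced from their octal code, only so that this snippet contains no literal fence of its own; in a real source file you would simply type them.

```shell
fence=$(printf '\140\140\140')   # three backticks, i.e. the Markdown code fence
workdir=$(mktemp -d)
{
  # Hypothetical documented source: comments, then one fenced block of code.
  printf '%s\n' '% # The foo macro' '%' '% This macro expands to *bar*.' '%'
  printf '%s\n' "% $fence" '\newcommand{\foo}{bar}' "% $fence"
} > "$workdir/sample.tex"

# Remove a leading % and any spaces after it; code lines are left untouched.
documented=$(sed 's/^% *//' "$workdir/sample.tex")
printf '%s\n' "$documented"

rm -rf "$workdir"
```

The comment lines turn into ordinary Markdown, and the code ends up between two bare fences, so pandoc typesets it verbatim.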

Extra geek points for the person to come up with a sed command to print any commented line and all lines between code fences but remove code not between code fences!

UPDATE: The extra geek points go to Mark van Atten who sent in this solution using awk:

    awk '/^% ```/ { print substr($0, 3); printcode = 1 - printcode; next }
         printcode { print; next }
         /^% / { print substr($0, 3); next }
         /^[ \t]*$/ { print }' $1.tex | pandoc -f markdown -o $1.pdf
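
Here is a sketch that tries the same logic on a made-up file, without the pandoc step. It is a variant, not Mark van Atten's command verbatim: the fence is passed in as an awk variable and tested with index() instead of being written into the pattern, so that the snippet contains no literal backticks. The file name and contents are hypothetical.

```shell
fence=$(printf '\140\140\140')   # three backticks
workdir=$(mktemp -d)
{
  # One comment, one line of code outside any fence, an empty line, one fenced block.
  printf '%s\n' '% Documentation for the macro.' '\usepackage{xcolor}' ''
  printf '%s\n' "% $fence" '\newcommand{\foo}{bar}' "% $fence"
} > "$workdir/sample.tex"

result=$(awk -v fence="$fence" '
  # A commented fence: print it without the "% " and toggle the in-code flag.
  index($0, "% " fence) == 1 { print substr($0, 3); printcode = 1 - printcode; next }
  # Inside a fenced block: print the code line as is.
  printcode                  { print; next }
  # Ordinary documentation comment: strip the "% ".
  /^% /                      { print substr($0, 3); next }
  # Keep blank lines as paragraph breaks; any other line is dropped.
  /^[ \t]*$/                 { print }
' "$workdir/sample.tex")
printf '%s\n' "$result"

rm -rf "$workdir"
```

The \usepackage line sits outside the fences and is dropped, while the \newcommand line between the fences is kept.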