The mdoc functions produce an abstract syntax tree (AST) describing input in a regular form. It may be reviewed at any time with mdoc_nodes(); however, if it is reviewed before mdoc_endparse() has been called, or after mdoc_endparse() or mdoc_parseln() has failed, it may be incomplete.
This AST is governed by the ontological rules dictated in mdoc(7) and takes its terminology from there. What mdoc(7) calls “in-line” elements are referred to here simply as “elements”.
The AST is composed of struct mdoc_node nodes with block, head, body, element, root and text types, as declared by the type field. Each node also provides its parse point (the line, sec and pos fields), its position in the tree (the parent, child, nchild, next and prev fields) and some type-specific data; in particular, nodes generated from macros record the generating macro in the tok field.
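The node links described above can be illustrated with a small, self-contained C sketch. It does not include mdoc.h: struct node, enum node_type, node_init(), append_child() and walk() are hypothetical stand-ins that mirror only the fields named above, not the library's actual declarations. The sketch shows how parent, child, nchild, next and prev stay consistent when children are appended, and how a pre-order walk visits every node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct mdoc_node: only the fields named in
 * the text are mirrored; the real mdoc.h declaration differs. */
enum node_type {
	NODE_ROOT, NODE_BLOCK, NODE_HEAD, NODE_BODY, NODE_ELEM, NODE_TEXT
};

struct node {
	enum node_type	 type;	/* root, block, head, body, element, text */
	int		 line;	/* parse point: input line */
	int		 sec;	/* parse point: enclosing section */
	int		 pos;	/* parse point: column */
	struct node	*parent;
	struct node	*child;	/* first child */
	int		 nchild;	/* number of children */
	struct node	*next;	/* next sibling */
	struct node	*prev;	/* previous sibling */
	int		 tok;	/* generating macro; -1 for root and text */
};

static const char *
type_name(enum node_type t)
{
	static const char *const names[] = {
		"root", "block", "head", "body", "elem", "text"
	};
	return names[t];
}

/* Initialise an unlinked node (hypothetical helper). */
static void
node_init(struct node *n, enum node_type type, int line, int pos, int tok)
{
	n->type = type;
	n->line = line;
	n->sec = 0;
	n->pos = pos;
	n->parent = n->child = n->next = n->prev = NULL;
	n->nchild = 0;
	n->tok = tok;
}

/* Append c as the last child of p, keeping all links consistent. */
static void
append_child(struct node *p, struct node *c)
{
	struct node *last;

	assert(p != NULL && c != NULL);
	c->parent = p;
	c->next = NULL;
	c->prev = NULL;
	if ((last = p->child) == NULL) {
		p->child = c;
	} else {
		while (last->next != NULL)
			last = last->next;
		last->next = c;
		c->prev = last;
	}
	p->nchild++;
}

/* Pre-order walk of n and its following siblings; prints one line per
 * node, indented by depth, and returns the number of nodes visited. */
static int
walk(const struct node *n, int depth)
{
	int count = 0;

	for (; n != NULL; n = n->next) {
		printf("%*s%s (line %d, pos %d)\n", depth * 2, "",
		    type_name(n->type), n->line, n->pos);
		count += 1 + walk(n->child, depth + 1);
	}
	return count;
}
```

For example, a root holding one block with a head and a body, where the body holds a single text node, yields five visited nodes. With the real library, such a walk would presumably start from the node obtained through mdoc_nodes().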
The tree itself is arranged according to the following normal form, where capitalised non-terminals represent nodes.
      mnode   ← BLOCK | ELEMENT | TEXT
      BLOCK   ← HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
      BODY    ← mnode* [ENDBODY mnode*]
      TEXT    ← [[:printable:],0x1e]*
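To make the BLOCK production concrete, the following sketch checks whether a sequence of child node kinds matches HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]. The enum kind values and block_children_ok() are hypothetical names used only for illustration. Because every optional TEXT follows a distinct HEAD, BODY or TAIL token, a single greedy left-to-right scan suffices and no backtracking is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node kinds that may appear as children of a BLOCK. */
enum kind { K_HEAD, K_BODY, K_TAIL, K_TEXT };

/* Returns true if k[0..n) matches
 *   HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
 * Each optional TEXT holds punctuation following its part. */
static bool
block_children_ok(const enum kind *k, size_t n)
{
	size_t i = 0, bodies = 0;

	assert(k != NULL || n == 0);
	if (n == 0 || k[i] != K_HEAD)
		return false;
	i++;
	if (i < n && k[i] == K_TEXT)		/* punctuation after HEAD */
		i++;
	while (i < n && k[i] == K_BODY) {	/* one or more BODY parts */
		bodies++;
		i++;
		if (i < n && k[i] == K_TEXT)	/* punctuation after BODY */
			i++;
	}
	if (bodies == 0)
		return false;
	if (i < n && k[i] == K_TAIL) {		/* optional TAIL */
		i++;
		if (i < n && k[i] == K_TEXT)	/* punctuation after TAIL */
			i++;
	}
	return i == n;				/* nothing may follow */
}
```

A block with several BODY parts in a row, as described for column lists, is accepted; a block lacking a BODY, or with a BODY after the TAIL, is rejected.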
Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of the BLOCK production: these carry punctuation marks. Furthermore, although a TEXT node generally has a non-zero-length string, in the specific case of ‘.Bd -literal’ an empty line produces a zero-length string. Multiple body parts are found only in invocations of ‘.Bl -column’, where each new body introduces a new phrase.