The Document AST

Laika decouples the semantics of the various markup formats from those of the supported output formats by representing every document in a generic AST between parsing and rendering. This makes it possible to add custom processing logic that operates only on the document structure itself, so that it works unchanged with all supported input and output formats.

This chapter gives a brief overview of the hierarchy of the main base traits for AST nodes, followed by short listings covering about 80% of the available node types.

Trait Hierarchy

All AST nodes extend one of the base traits from the hierarchy and can optionally mix in various additional traits.

Providing a rich collection of traits that assist with classification and optional functionality helps with developing generic processing logic that does not need to know all available concrete types. You can, for example, collectively process all BlockContainer or ListContainer nodes without caring about their concrete implementation.
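As a minimal sketch of such generic processing, the following builds a small AST by hand and matches on the TextContainer and SpanContainer traits instead of on concrete node types. It assumes that the collect method for depth-first traversal is available on container elements; the node types used (RootElement, Paragraph, Text, Emphasized, Strong) are taken from laika.ast.

```scala
import laika.ast._

// A small hand-built AST: one paragraph mixing plain, emphasized and strong text.
val root: RootElement = RootElement(Seq(
  Paragraph(Seq(
    Text("Hello "),
    Emphasized(Seq(Text("generic"))),
    Text(" "),
    Strong(Seq(Text("world")))
  ))
))

// Match on the TextContainer trait rather than on a concrete type such as Text.
val texts: List[String] = root.collect { case tc: TextContainer => tc.content }

// Count every span container (here: Paragraph, Emphasized, Strong) in one pass.
val spanContainerCount: Int = root.collect { case sc: SpanContainer => sc }.size
```

Because the partial functions only refer to traits, the same logic keeps working when custom node types that mix in these traits are added to the tree.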

The traits are not sealed as the model is designed to be extensible.

Base Traits

At the top of the hierarchy the AST contains the following node types:

Base Traits of the Document AST

Containers

Most element types in the AST are containers.

Special Types

The first group of types are traits that can be mixed in to a concrete type:

Finally there is a group of concrete types with special behaviour:

Container Elements

Since the document AST is a recursive structure, most elements are either a container of other elements or a TextContainer. The few that are neither are listed in Other Elements.

The following lists describe the majority of the available types, only leaving out some of the more exotic options.

Lists

There are four different list types in the core AST model. All types listed below have a corresponding ListItem container, e.g. BulletList nodes hold BulletListItem children.

Block Containers

Links and References

Most links and references are also a SpanContainer (for the link text), but a few are not. Span containers that are not links or references are listed in the next section.

The Link types differ from the Reference types in that the former are fully resolved, "ready to render", whereas the latter still need to be resolved based on the surrounding content or other documents.

Reference nodes only appear in the AST before the AST transformation step, where each of them is translated either to a corresponding Link node or, in case of errors, to an Invalid node.

For details on this functionality from a markup author's perspective, see Navigation.

Fully resolved Link nodes:

Reference nodes that need to be resolved during AST rewriting:

Span Containers

Text Containers

Other Elements

This section lists the block and span elements that are not containers and the special elements representing templates.

Block Elements

Most block elements are either Span Containers or Block Containers, but a few are neither:

Span Elements

Most span elements are either Span Containers or Text Containers, but a few are neither:

Template Spans

Templates get parsed into an AST in much the same way as markup documents, but the model is much simpler. The most important element types are listed below.

AST Element Companions

Laika includes base traits called BlockContainerCompanion and SpanContainerCompanion for adding companion objects that provide convenient shortcuts for creating containers.

They allow you to shorten constructor invocations for simple use cases, e.g. Paragraph(Seq(Text("hello"))) can be changed to Paragraph("hello"). They also add an empty constructor and a vararg constructor for passing the content.
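The shortcuts can be compared side by side in a short sketch. The String-based form is the one described above; the vararg form and the member name empty are assumptions about how the companion exposes the other two constructors.

```scala
import laika.ast._

// Full constructor: every span is wrapped explicitly.
val verbose: Paragraph = Paragraph(Seq(Text("hello")))

// String shortcut: the companion wraps the text in a Text node.
val short: Paragraph = Paragraph("hello")

// Vararg shortcut: spans are passed directly instead of inside a Seq.
val mixed: Paragraph = Paragraph(Text("hello "), Emphasized("world"))

// Empty container (assumed to be exposed as `empty` on the companion).
val none: Paragraph = Paragraph.empty
```

Since all of these are plain case class instances, short and verbose compare as equal.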

If you create a custom element type you can use these base traits to get all these shortcuts with a few lines:

import laika.ast._

// A custom block element that holds other blocks.
case class MyElement(content: Seq[Block], options: Options = Options.empty) extends Block
    with BlockContainer {
  type Self = MyElement
  def withContent(newContent: Seq[Block]): MyElement = copy(content = newContent)
  def withOptions(options: Options): MyElement       = copy(options = options)
}

// Extending the companion trait adds the shortcut constructors described above.
object MyElement extends BlockContainerCompanion {
  type ContainerType = MyElement
  override protected def createBlockContainer(blocks: Seq[Block]): ContainerType =
    MyElement(blocks)
}

Document Trees

So far we have dealt with Element types, which represent the content of a single markup document or template.

When a transformation processes the inputs from an entire directory, the content gets assembled into a DocumentTree, consisting of nested DocumentTree nodes and leaf Document nodes.

The Document Type

This is the signature of the Document class:

import laika.ast.{Path, RootElement, Element, Config, TreePosition, DocumentStructure, TreeContent}

case class Document (
  path: Path,
  content: RootElement,
  fragments: Map[String, Element] = Map.empty,
  config: Config = Config.empty,
  position: TreePosition = TreePosition.orphan
) extends DocumentStructure with TreeContent
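
Going by the signature above, a document can also be assembled by hand, which is mostly useful in tests. The following is a minimal sketch that relies only on the two mandatory parameters; the path intro.md is a made-up example value.

```scala
import laika.ast._
import laika.ast.Path.Root

// Only path and content are required; all other parameters have defaults.
val doc: Document = Document(
  path    = Root / "intro.md",
  content = RootElement(Seq(Paragraph(Seq(Text("Hello")))))
)

// The root element holds the top-level blocks of the document.
val topLevelBlocks: Seq[Block] = doc.content.content

// No fragments were passed, so the default empty map is used.
val hasFragments: Boolean = doc.fragments.nonEmpty
```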

The DocumentStructure mixin provides additional methods as shortcuts for selecting content from the document:

The DocumentTree Type

In a multi-input transformation all Document instances get assembled into a recursive structure of DocumentTree instances.

The API provides additional methods as shortcuts for selecting content from the tree:

Cursors

During an AST transformation, a rewrite rule might require access to more than just the AST node passed to it. A rule that resolves references or creates a table of contents, for example, also needs access to the AST of other documents.

The Cursor type provides this access. In contrast to the recursive DocumentTree, which is a classic tree structure, it represents the tree from the perspective of the current document, with methods to navigate to parents, siblings or the root tree.

An instance of DocumentCursor is passed to rewrite rules for AST transformations (see AST Rewriting), and to directive implementations that request access to it (see Implementing Directives).
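
The difference between a plain tree and a cursor can be illustrated with a toy model. The types ToyDoc, ToyTree and ToyCursor below are hypothetical stand-ins written for this sketch and are not Laika's DocumentTree or DocumentCursor API; they only show how pairing a node with its chain of ancestors enables upward and sideways navigation.

```scala
// Hypothetical stand-ins for illustration only, NOT Laika's API.
sealed trait ToyContent { def name: String }
final case class ToyDoc(name: String) extends ToyContent
final case class ToyTree(name: String, children: List[ToyContent]) extends ToyContent

// A cursor pairs a node with the ancestors leading to it, nearest ancestor first.
final case class ToyCursor(target: ToyContent, ancestors: List[ToyTree]) {

  // One step up, or None when the cursor already points at the root.
  def parent: Option[ToyCursor] = ancestors match {
    case p :: rest => Some(ToyCursor(p, rest))
    case Nil       => None
  }

  // Walks up until there is no parent left.
  def root: ToyCursor = parent.map(_.root).getOrElse(this)

  // All other children of the direct parent.
  def siblings: List[ToyContent] =
    ancestors.headOption.toList.flatMap(_.children.filterNot(_ == target))

  // Cursors for the children, each remembering the path back up.
  def children: List[ToyCursor] = target match {
    case t: ToyTree => t.children.map(c => ToyCursor(c, t :: ancestors))
    case _: ToyDoc  => Nil
  }
}

val tree = ToyTree("root", List(
  ToyDoc("index"),
  ToyTree("guide", List(ToyDoc("intro"), ToyDoc("setup")))
))

val rootCursor = ToyCursor(tree, Nil)
val intro      = rootCursor.children(1).children.head

val parentName   = intro.parent.map(_.target.name) // Some("guide")
val siblingNames = intro.siblings.map(_.name)      // List("setup")
val rootName     = intro.root.target.name          // "root"
```

The plain ToyTree has no way to answer these questions for a given document, because its nodes hold no reference to their parents.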

Let's look at some of its properties:

AST Transformation Phases

In most transformations the AST moves through three different phases between parsing and rendering:

1) The first shape will be the AST produced by the parsers for text markup or templates. Since parsers do not have access to the surrounding nodes or the configuration, some parsers for elements like links or navigation structures need to insert temporary node types.

2) After parsing of all participating documents and templates completes, the first AST transformation is performed. Among other things, it resolves links, variables and footnotes, builds the document's section structure and generates tables of contents. These transformations are defined in rewrite rules. The library contains a basic set of rules internally for linking and navigation, but users can provide additional, custom rules. See AST Rewriting for details.

3) Finally the resolved AST representing the markup document is applied to the AST of the template. This step is merely the insertion of one AST at a particular node in another AST.

The result obtained from step 3 is then passed to renderers. When rendering the same content to multiple output formats, steps 1 and 2 are executed only once. Only step 3 has to be repeated for each output format, as each format comes with its own templates.
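
The cost profile of the three phases can be sketched with a toy pipeline. Everything below is hypothetical and written for this illustration: the types, the functions parse, rewrite and applyTemplate, and the @content placeholder are not part of Laika's API. The counters only make visible that parsing and rewriting run once, while template application runs once per output format.

```scala
// Hypothetical stand-ins for the three phases, NOT Laika's API.
final case class UnresolvedAst(source: String)              // step 1: parser output
final case class ResolvedAst(body: String)                  // step 2: after rewrite rules
final case class FinalAst(format: String, content: String)  // step 3: merged into a template

var parseCount   = 0
var rewriteCount = 0

def parse(input: String): UnresolvedAst = {
  parseCount += 1
  UnresolvedAst(input)
}

def rewrite(ast: UnresolvedAst): ResolvedAst = {
  rewriteCount += 1
  ResolvedAst(ast.source.trim) // stands in for resolving links, sections etc.
}

// Inserts the resolved AST at one particular node of the template.
def applyTemplate(format: String, template: String, ast: ResolvedAst): FinalAst =
  FinalAst(format, template.replace("@content", ast.body))

val templates = Map(
  "html" -> "<body>@content</body>",
  "epub" -> "<section>@content</section>"
)

// Steps 1 and 2 run once ...
val resolved = rewrite(parse("  Hello  "))

// ... and only step 3 is repeated for each output format.
val results = templates.map { case (format, tpl) =>
  format -> applyTemplate(format, tpl, resolved)
}
```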