Thomas Tuerk

Advanced Documentation And Testing Tool

I'm currently developing a tool for advanced testing and documentation in my spare time. This is still work in progress. However, here some preliminary information can already been found. If you are interested in learning details or even in playing around with very early, unfinished versions, please feel free to contact me.

Presentation at ARW 2018

Some talk and corresponding short abstract from early stages in April 2018 are available.

Very high level description

Advanced Documentation and Testing Tool (Adatt) is a tool for writing formal, executable specifications. It is intended to be used by both domain and formal-method experts. Different users with different backgrounds can contribute different parts of a specification. One core idea is using the same Adatt documents for multiple purposes and allowing different users with different backgrounds to contribute to different parts of a specification.

Superficially, Adatt looks like a functional programming language with special support for writing documentation and specifications. Once finished, one will be able to translate Adatt specifications to

high quality human readable documentation (Latex, HTML)
executable code (Ocaml, SML, Haskell, Javascript, ...)
formal specifications for interactive theorem provers (HOL 4, Isabelle/HOL, Coq)

One will be able to start with writing human readable documentation without any formal content. Declaring types or constants (including functions), which are referred to in the natural language text, can improve the documentation by e.g. ensuring there are no spelling mistakes. The documentation can be further improved by adding some type definitions, constant signatures or simple test-cases.

At this stage, the documention / specification is perhaps already detailed enough to generate little bits of executable code or at least code stubs. Such code can be used e.g. for test-oracles. By adding more test cases and simple executable specifications the generated executable code becomes more complete and thereby useful.

Even if not all types and constants are defined yet, one can already use formal methods to reason about the existing specifications and test-cases. However, by adding non-executable properties and by asking theorem prover experts to add more complicated specifications in the end a good formal specification can be written that maps well to theories already present in various interactive theorem provers.

Current Status (May 2021)

The main language of Adatt and its typechecking are finished. Specifications that look similar to a functional program can be written in this language, parsed and checked. There is a Language Server Protocol server for Adatt and a Visual Studio Code plugin using this server. While editing files, only the affected parts of the currently opened files are (re)-typechecked. Unaffected parts already typechecked are just kept. Together with a concurrent implementation that can use multiple cores effectively, this leads to very rapid feedback (usually within a few milliseconds).

Typechecking might sound like a very early stage. Notice however, that in constrast to nearly all compilers for programming languages, Adatt outputs results at the same level of abstraction as the input. Therefore, mst of the work that needs to be performed by Adatt is already performed during the typechecking stage. This includes:

pattern compilation, in particular
- checking whether pattern matches are complete
- find overlapping or redundant patterns
- automatically add missing cases
resolving of type-class intances
a built-in template language for generating Adatt code based on e.g. type definitions
- mainly used for automatically generating type-class instances
- also used for generating recognizers and lenses
- intended to be used later for backends as well
producing auxiliary information about the analysed code
- annotations like the type of locally used variables
- automatic code fixes (i.e. automatic edits like adding a declaration of a constant)
- links like jumping from an usage point to the definition point of a constant
enforce user-configurable code styles
- naming conventions
- detection of similar and therefore potentially problematic names for different concepts
build-up detailled internal datastructures suitable for producing output later

The output needs to be highly configurable. For human readable output like html, the preferred style depends a lot on the intended usage and also personal taste. For output of executable code, programming languages and especially libraries evolve over time. Moreover, if the generated code is used by hand-written code, fine tuning the generated code might be very useful to ease usage and integrating in existing code-bases. These same issues also exist for interactive theorem provers. In this (as in many other) respect they are similar to programming languages.

To provide this needed flexibility, the output is generated by user-changeable LUA-scripts. The simplest case is the output of human readable documentation. In this simple case, Adatt loads a project and executes user-written LUA scripts in this context of this project. There are commands for typechecking files and modules available. One can check for errors and warnings and report these. One can also get information about the checked files and modules. This includes information like the list of seen files or modules, the constants, types, type-classes etc. One can access information like where a constant is defined or used or which instances are known for a type-class. Most importantly, one can access the input (both whole files and just parts) enriched with annotations collected during type-checking. This enriched input is the main source for generating human readable output. Adatt comes with specialised LUA libraries which allow to easily access all this functionality of Adatt. Moreover, there are example LUA scripts that use this libray to e.g. generate HTML output. These example scripts an be used as they are or as a template for writing own LUA-scripts.

The support for generating human readable output is essentially finished. A few convenience functions could still be added and the output tuned a bit, but the main functionality is all there.

Outputting executable code or specifications is more difficult and unluckily still work in progress. For these cases, information needs to be passed from LUA to Adatt when processing a project. It should be possible to e.g. state inside an Adatt module, how to output a given Adatt constant for a given backend. This can be specified using Adatt's template language. More interesting, it should be possible to add backend-specific annotations. For example, most theorem proving backends need to prove that recursive constants are well-defined (which in practice is very similar to a termination argument) while defining such constants. A backend-specific annotation might either state that some built-in termination-proof-method should be tried or provide a termination argument manually. It should be possible to add such backend-specific annotations in Adatt files next to the definition of the recursive constant. To support this, Lua-written backends need to pass information to Adatt before even parsing files.

Another complication is that programming language and theorem prover backends need to be able to ask Adatt to perform inlining, (partial) pattern compilation and other code-operations. Thus, these backends require much more infrastructure than human readable documentation ones. Currently, I'm working on implementing this infrastructure.

Rough plan for next steps

continue implementing infrastructure for theorem proving and programming language backends
implement SML, HOL and Coq backends
add annotations to Adatt library for mapping Adatt definitions to commonly used backend libraries
play around with some small, very simple examples
consider releasing a very first, alpha version of Adatt
implement some bigger, more interesting example
add further backends (Isabelle/HOL, OCaml, Haskell, Java-Script, ...)
drastically improve support for writing natural language documentation
- more work on comments in Adatt, e.g.
  - links to Adatt entities
  - allow subset of markdown in comments
  - allow special comments with Latex contents
  - like Haddock associate certain comments with types, constants, fields, arguments, ...
- add additional input file format which supports some kind of literal programming style
add additional features like perhaps
- inductive relations
- type-invariants
- indexed types
- ...

Some documents

So far, I have been pretty lazy writing documentation. Here are some notes describing mostly some language features and the reasoning behind them. This needs a lot(!) of work before one can even think of calling it documentation. Nevertheless, it might already provide a rough idea about Adatt. For the same reason, the pretty-printed library of Adatt is provided. The pretty-printing is done using Adatt itself via a LUA-written HTML and Latex backends. Adatt's library still needs lots of work. However, it already provides an impression, how Adatt code looks like.