TxlProp - Detect Emergent Properties in TXL Programs

Currently detects properties that arise within a single rule, hence aren't very "emergent". Produces pretty displays of TXL programs, with annotations on the crisis points. Uses assertion-propagation (but for now only within single rules) to determine that about half the crisis points are "obviously safe".

Download the whole thing.

Some small example files:

File	Source	Annotated			What it shows
File	Source	Just parsed	With crisis points	With assertions	What it shows
`wacky`	.Txl	.xml	.xml	.xml	Parsing of fancy token-types.
`loopy`	.Txl	.xml	.xml	.xml *	Detection of crisis-points in fancy loops.
`dopey`	.Txl	.xml	.xml	.xml *	Dopey programming errors, automatically detected.

(* means "has interesting crisis-points")

The project:

Some component files for the project as a whole:

File Source Annotated What it does

tp-final .Txl .xml Delete “uninteresting” crisis-points at end of run.

tp-run .sh Run the whole thing. Tie all stages together.

tp-selftest .sh Use tp-run.sh to generate all .xml files shown here.

tp-format .css Shared stylesheet imported by all stages.

tp-basegrammar .Txl Shared grammar of "TXL in XML", used by Stage 2 and beyond.

AVL .Module TXL module for balanced-tree indexing. Written by Jim Cordy; see website.

File	Source	Annotated	What it does
`tp-final`	.Txl	.xml	Delete “uninteresting” crisis-points at end of run.
`tp-run`	.sh		Run the whole thing. Tie all stages together.
`tp-selftest`	.sh		Use `tp-run.sh` to generate all `.xml` files shown here.
`tp-format`	.css		Shared stylesheet imported by all stages.
`tp-basegrammar`	.Txl		Shared grammar of "TXL in XML", used by Stage 2 and beyond.
`AVL`	.Module		TXL module for balanced-tree indexing. Written by Jim Cordy; see website.

STAGE 1: The parser.

Function: Produces XML versions of target TXL programs, with type-annotations on the literals. Comments on the TXL source are mostly carried through. Using the provided stylesheet, the target program is still readable but somewhat cubist in appearance.

Status: A few odds and ends remain unhandled, such as external statements, -escchar '"', and the use of [?rule] for match-only behaviour.

Difficulty: TXL is not the easiest language to parse. Three different tokenizing methods are used for various parts of the same source file (line-oriented, space-separated, and the usual define/rule method). User token-types, quoted literals of user types, context-dependent self-quoting identifiers, and the pragmas for changing the set of characters used in identifiers all contribute to TXL's difficulty, but also to its programming power.

Components of the parser stage:

File Source Annotated What it does

tpp-directives .Txl .xml * Preprocessor #-directives and include statements. Pragmas ⇒ side file.

tpp-lex .Txl .xml * comments, tokens, compounds, and keys statements ⇒ side file.

tpp-toxml .Txl .xml define and rule statements ⇒ XML format. Mark unquoted identifiers. Rename variables defined by deconstruct not.

tpp-xmlfix .Txl .xml * Fix up & ⇒ & and < ⇒ <.

tpp-xmlclean .Txl .xml * Delete extraneous markup emitted by TXL's -xml option. Delete syntactic keywords and punctuation marks; detect whole-line comments.

tpp-merge .Txl .xml Merge pragmas and lex-tokens from side files; annotate uses of keywords.

tpp-common .Txl
Common subroutines.

tpp-run .sh
Tie all of above together.

tpp-format .css
Stylesheet to make output look pretty. Based on tp-format.css but with support for things like <redefine/> that exist only at this stage.

File	Source	Annotated	What it does
`tpp-directives`	.Txl	.xml *	Preprocessor `#`-directives and `include` statements. Pragmas ⇒ side file.
`tpp-lex`	.Txl	.xml *	`comments`, `tokens`, `compounds`, and `keys` statements ⇒ side file.
`tpp-toxml`	.Txl	.xml	`define` and `rule` statements ⇒ XML format. Mark unquoted identifiers. Rename variables defined by `deconstruct not`.
`tpp-xmlfix`	.Txl	.xml *	Fix up & ⇒ `&` and < ⇒ `<`.
`tpp-xmlclean`	.Txl	.xml *	Delete extraneous markup emitted by TXL's `-xml` option. Delete syntactic keywords and punctuation marks; detect whole-line comments.
`tpp-merge`	.Txl	.xml	Merge pragmas and lex-tokens from side files; annotate uses of keywords.
`tpp-common`	.Txl		Common subroutines.
`tpp-run`	.sh		Tie all of above together.
`tpp-format`	.css		Stylesheet to make output look pretty. Based on `tp-format.css` but with support for things like `<redefine/>` that exist only at this stage.

STAGE 2: Mark the crisis points.

Function: Detects "crisis points", which are hot-spots where the program might exhibit unsafe behaviour. Any crisis points not detected in this stage will never be found later, so a missing result here would lead to inconsistency (i.e., TxlProp says your program is safe but actually it isn't).

Status: Done for now. Apparently detects all crisis points in rules, but doesn't detect any in defines—an ambiguous grammar with competing [repeat xxx] clauses can produce infinite parses!

Difficulty: This stage is very easy in TXL, which has no function pointers, signals, function overloading, or pointer aliasing. It seems there are only five or six unsafe things you can do in TXL: infinite parse, infinite rule-recursion (implicit or explicit), writing output files, executing shell commands, and run-time errors. For now the only run-time errors that TxlProp detects are assert statement failures.

Components of the mark-up stage:

File Source Annotated What it does

tpm-grammar .Txl .xml Combine define/redefine, bunch all defines together at top.

tpm-callers .Txl .xml Produce calls-list and callers-list for each rule. Delete rules not reachable from main.

tpm-crisis .Txl .xml Detect loops and other crisis points. Insert annotations for them.

tpm-run .sh
Tie all of above together.

File	Source	Annotated	What it does
`tpm-grammar`	.Txl	.xml	Combine `define`/`redefine`, bunch all defines together at top.
`tpm-callers`	.Txl	.xml	Produce calls-list and callers-list for each rule. Delete rules not reachable from `main`.
`tpm-crisis`	.Txl	.xml	Detect loops and other crisis points. Insert annotations for them.
`tpm-run`	.sh		Tie all of above together.

STAGE 3: Add assertions.

Function: Creates assertions describing what must be true in order for a crisis to occur at a crisis-point. Propagates these to the tops of their rules, taking into account the effects of preceding statements.

Status: Barely works well enough to prove itself correct. Contains only about 70 expression-simplification rules (other systems have over 2000). Contains only about 10 safe-looping paradigms (surely there are many many more). Not handled: recursions where multiple matches are possible on same scope, patterns that require backtracking, [integernumber] and [floatnumber].

Difficulty: This stage depends heavily on TXL's unifier. I've tried writing versions of this program in other languages such as Lisp, but could never get all the bugs out because so much pattern-matching code needed to be written. In TXL my rules often work on the first try!

Documentation: See Appendix A for the notation of assertions, Appendix B for the uses of assertion-splits, and Appendix C for safe-looping paradigms.

Components of the assert stage:

File Source Annotated What it does

tpa-assert .Txl .xml * Generate assertions at crisis-points. See Appendix A for more info.

tpa-migrate .Txl .xml * Migrate assertions to tops of rules.

tpa-construct .Txl .xml * Expand pattern-constructors to fully elaborated form.

tpa-match .Txl .xml * Substitute replacements for patterns in assertions.

tpa-expr .Txl .xml * Simplify assertion-expressions.

tpa-paradigm .Txl .xml * Detect safe-looping paradigms. See Appendix C for more info.

tpa-split .Txl .xml * Split disjunctions into separate timelines. See Appendix B for more info.

tpa-join .Txl .xml Roll up proven and failed timelines.

tpa-cleanup .Txl .xml Delete undecided assertions for decided timelines. Delete typedefs for deleted variables.

tpa-common .Txl
Common subroutines.

tpa-minmax .Txl
Subroutines for determining the value range for an expression.

tpa-run .sh
Tie all of above together. Note: this shell script contains a loop! It calls expr, split, join repeatedly until done.

File	Source	Annotated	What it does
`tpa-assert`	.Txl	.xml *	Generate assertions at crisis-points. See Appendix A for more info.
`tpa-migrate`	.Txl	.xml *	Migrate assertions to tops of rules.
`tpa-construct`	.Txl	.xml *	Expand pattern-constructors to fully elaborated form.
`tpa-match`	.Txl	.xml *	Substitute replacements for patterns in assertions.
`tpa-expr`	.Txl	.xml *	Simplify assertion-expressions.
`tpa-paradigm`	.Txl	.xml *	Detect safe-looping paradigms. See Appendix C for more info.
`tpa-split`	.Txl	.xml *	Split disjunctions into separate timelines. See Appendix B for more info.
`tpa-join`	.Txl	.xml	Roll up proven and failed timelines.
`tpa-cleanup`	.Txl	.xml	Delete undecided assertions for decided timelines. Delete typedefs for deleted variables.
`tpa-common`	.Txl		Common subroutines.
`tpa-minmax`	.Txl		Subroutines for determining the value range for an expression.
`tpa-run`	.sh		Tie all of above together. Note: this shell script contains a loop! It calls `expr, split, join` repeatedly until done.

STAGE 4: Raise Assertions to Callers.

Status: Nothing to show yet.
Foretaste: A previous rev of this document suggested two test cases in tpp-directives.xml: the crisis point labelled T18 (which is actually a serious safety issue with this program) vs. the one called T4 (not really a problem at all). Progress so far:

T4 TxlProp has determined that this crisis-point is unsafe if the From parameter to [quoteXML] is an empty string. It needs to propagate this assertion back to quoteXML's callers, each of which passes a non-empty string literal as the value for From.

T18 TxlProp has determined that this crisis point is unsafe if the scope-string for [testFileExists] contains shell meta-characters. It needs to propagate this assertion all the way back, to determine that nowhere in this program does it ever test the filenames for safety!