TxlProp - Detect Emergent Properties in TXL Programs

Currently detects properties that arise within a single rule, hence aren't very "emergent".  Produces pretty displays of TXL programs, with annotations on the crisis points.  Uses assertion-propagation (but for now only within single rules) to determine that about half the crisis points are "obviously safe".

Download the whole thing.

Some small example files:
File Source Annotated What it shows
Just parsed With crisis points With assertions
wacky .Txl .xml .xml .xml Parsing of fancy token-types.
loopy .Txl .xml .xml .xml * Detection of crisis-points in fancy loops.
dopey .Txl .xml .xml .xml * Dopey programming errors, automatically detected.
(* means "has interesting crisis-points")

The project:

Some component files for the project as a whole:
File Source Annotated What it does
tp-final .Txl .xml Delete “uninteresting” crisis-points at end of run.
tp-run .sh Run the whole thing.  Tie all stages together.
tp-selftest .sh Use tp-run.sh to generate all .xml files shown here.
tp-format .css Shared stylesheet imported by all stages.
tp-basegrammar .Txl Shared grammar of "TXL in XML", used by Stage 2 and beyond.
AVL .Module TXL module for balanced-tree indexing.  Written by Jim Cordy; see website.
 

STAGE 1: The parser.

Function:  Produces XML versions of target TXL programs, with type-annotations on the literals.  Comments on the TXL source are mostly carried through.  Using the provided stylesheet, the target program is still readable but somewhat cubist in appearance.
 
Status:  A few odds and ends remain unhandled, such as external statements, -escchar '"', and the use of [?rule] for match-only behaviour.
 
Difficulty:  TXL is not the easiest language to parse.  Three different tokenizing methods are used for various parts of the same source file (line-oriented, space-separated, and the usual define/rule method).  User token-types, quoted literals of user types, context-dependent self-quoting identifiers, and the pragmas for changing the set of characters used in identifiers all contribute to TXL's difficulty, but also to its programming power.
 
Components of the parser stage:
File Source Annotated What it does
tpp-directives .Txl .xml * Preprocessor #-directives and include statements.  Pragmas ⇒ side file.
tpp-lex .Txl .xml * comments, tokens, compounds, and keys statements ⇒ side file.
tpp-toxml .Txl .xml define and rule statements ⇒ XML format.  Mark unquoted identifiers.  Rename variables defined by deconstruct not.
tpp-xmlfix .Txl .xml * Fix up &&amp; and <&lt;.
tpp-xmlclean .Txl .xml * Delete extraneous markup emitted by TXL's -xml option.  Delete syntactic keywords and punctuation marks; detect whole-line comments.
tpp-merge .Txl .xml Merge pragmas and lex-tokens from side files; annotate uses of keywords.
tpp-common .Txl
Common subroutines.
tpp-run .sh
Tie all of above together.
tpp-format .css
Stylesheet to make output look pretty.  Based on tp-format.css but with support for things like <redefine/> that exist only at this stage.
 

STAGE 2: Mark the crisis points.

Function: Detects "crisis points", which are hot-spots where the program might exhibit unsafe behaviour.  Any crisis points not detected in this stage will never be found later, so a missing result here would lead to inconsistency (i.e., TxlProp says your program is safe but actually it isn't).
 
Status: Done for now.  Apparently detects all crisis points in rules, but doesn't detect any in defines—an ambiguous grammar with competing [repeat xxx] clauses can produce infinite parses!
 
Difficulty: This stage is very easy in TXL, which has no function pointers, signals, function overloading, or pointer aliasing.  It seems there are only five or six unsafe things you can do in TXL: infinite parse, infinite rule-recursion (implicit or explicit), writing output files, executing shell commands, and run-time errors.  For now the only run-time errors that TxlProp detects are assert statement failures.
 
Components of the mark-up stage:
File Source Annotated What it does
tpm-grammar .Txl .xml Combine define/redefine, bunch all defines together at top.
tpm-callers .Txl .xml Produce calls-list and callers-list for each rule.  Delete rules not reachable from main.
tpm-crisis .Txl .xml Detect loops and other crisis points.  Insert annotations for them.
tpm-run .sh
Tie all of above together.
 

STAGE 3: Add assertions.

Function: Creates assertions describing what must be true in order for a crisis to occur at a crisis-point.  Propagates these to the tops of their rules, taking into account the effects of preceding statements.
 
Status: Barely works well enough to prove itself correct.  Contains only about 70 expression-simplification rules (other systems have over 2000).  Contains only about 10 safe-looping paradigms (surely there are many many more).  Not handled: recursions where multiple matches are possible on same scope, patterns that require backtracking, [integernumber] and [floatnumber].
 
Difficulty: This stage depends heavily on TXL's unifier.  I've tried writing versions of this program in other languages such as Lisp, but could never get all the bugs out because so much pattern-matching code needed to be written.  In TXL my rules often work on the first try!
 
Documentation: See Appendix A for the notation of assertions, Appendix B for the uses of assertion-splits, and Appendix C for safe-looping paradigms.
 
Components of the assert stage:
File Source Annotated What it does
tpa-assert .Txl .xml * Generate assertions at crisis-points.  See Appendix A for more info.
tpa-migrate .Txl .xml * Migrate assertions to tops of rules.
tpa-construct .Txl .xml * Expand pattern-constructors to fully elaborated form.
tpa-match .Txl .xml * Substitute replacements for patterns in assertions.
tpa-expr .Txl .xml * Simplify assertion-expressions.
tpa-paradigm .Txl .xml * Detect safe-looping paradigms.  See Appendix C for more info.
tpa-split .Txl .xml * Split disjunctions into separate timelines.  See Appendix B for more info.
tpa-join .Txl .xml Roll up proven and failed timelines.
tpa-cleanup .Txl .xml Delete undecided assertions for decided timelines.  Delete typedefs for deleted variables.
tpa-common .Txl
Common subroutines.
tpa-minmax .Txl
Subroutines for determining the value range for an expression.
tpa-run .sh
Tie all of above together.  Note: this shell script contains a loop!  It calls expr, split, join repeatedly until done.
 

STAGE 4: Raise Assertions to Callers.

Status: Nothing to show yet.
Foretaste: A previous rev of this document suggested two test cases in tpp-directives.xml: the crisis point labelled T18 (which is actually a serious safety issue with this program) vs. the one called T4 (not really a problem at all).  Progress so far:

©2005 by Jonathan Yavner.  AVL module ©1999 by Jim Cordy.