File | Source | Annotated | What it shows | ||
---|---|---|---|---|---|
Just parsed | With crisis points | With assertions | |||
wacky | .Txl | .xml | .xml | .xml | Parsing of fancy token-types. |
loopy | .Txl | .xml | .xml | .xml * | Detection of crisis-points in fancy loops. |
dopey | .Txl | .xml | .xml | .xml * | Dopey programming errors, automatically detected. |
Some component files for the project as a whole:
File Source Annotated What it does tp-final .Txl .xml Delete “uninteresting” crisis-points at end of run. tp-run .sh Run the whole thing. Tie all stages together. tp-selftest .sh Use tp-run.sh to generate all .xml files shown here. tp-format .css Shared stylesheet imported by all stages. tp-basegrammar .Txl Shared grammar of "TXL in XML", used by Stage 2 and beyond. AVL .Module TXL module for balanced-tree indexing. Written by Jim Cordy; see website.
Function: Produces XML versions of target TXL programs, with type-annotations on the literals. Comments on the TXL source are mostly carried through. Using the provided stylesheet, the target program is still readable but somewhat cubist in appearance.Status: A few odds and ends remain unhandled, such as external statements, -escchar '"', and the use of [?rule] for match-only behaviour.Difficulty: TXL is not the easiest language to parse. Three different tokenizing methods are used for various parts of the same source file (line-oriented, space-separated, and the usual define/rule method). User token-types, quoted literals of user types, context-dependent self-quoting identifiers, and the pragmas for changing the set of characters used in identifiers all contribute to TXL's difficulty, but also to its programming power.Components of the parser stage:
File Source Annotated What it does tpp-directives .Txl .xml * Preprocessor #-directives and include statements. Pragmas ⇒ side file. tpp-lex .Txl .xml * comments, tokens, compounds, and keys statements ⇒ side file. tpp-toxml .Txl .xml define and rule statements ⇒ XML format. Mark unquoted identifiers. Rename variables defined by deconstruct not. tpp-xmlfix .Txl .xml * Fix up & ⇒ & and < ⇒ <. tpp-xmlclean .Txl .xml * Delete extraneous markup emitted by TXL's -xml option. Delete syntactic keywords and punctuation marks; detect whole-line comments. tpp-merge .Txl .xml Merge pragmas and lex-tokens from side files; annotate uses of keywords. tpp-common .Txl Common subroutines. tpp-run .sh Tie all of above together. tpp-format .css Stylesheet to make output look pretty. Based on tp-format.css but with support for things like <redefine/> that exist only at this stage.
Function: Detects "crisis points", which are hot-spots where the program might exhibit unsafe behaviour. Any crisis points not detected in this stage will never be found later, so a missing result here would lead to inconsistency (i.e., TxlProp says your program is safe but actually it isn't).Status: Done for now. Apparently detects all crisis points in rules, but doesn't detect any in defines—an ambiguous grammar with competing [repeat xxx] clauses can produce infinite parses!Difficulty: This stage is very easy in TXL, which has no function pointers, signals, function overloading, or pointer aliasing. It seems there are only five or six unsafe things you can do in TXL: infinite parse, infinite rule-recursion (implicit or explicit), writing output files, executing shell commands, and run-time errors. For now the only run-time errors that TxlProp detects are assert statement failures.Components of the mark-up stage:
File Source Annotated What it does tpm-grammar .Txl .xml Combine define/redefine, bunch all defines together at top. tpm-callers .Txl .xml Produce calls-list and callers-list for each rule. Delete rules not reachable from main. tpm-crisis .Txl .xml Detect loops and other crisis points. Insert annotations for them. tpm-run .sh Tie all of above together.
Function: Creates assertions describing what must be true in order for a crisis to occur at a crisis-point. Propagates these to the tops of their rules, taking into account the effects of preceding statements.Status: Barely works well enough to prove itself correct. Contains only about 70 expression-simplification rules (other systems have over 2000). Contains only about 10 safe-looping paradigms (surely there are many many more). Not handled: recursions where multiple matches are possible on same scope, patterns that require backtracking, [integernumber] and [floatnumber].Difficulty: This stage depends heavily on TXL's unifier. I've tried writing versions of this program in other languages such as Lisp, but could never get all the bugs out because so much pattern-matching code needed to be written. In TXL my rules often work on the first try!Documentation: See Appendix A for the notation of assertions, Appendix B for the uses of assertion-splits, and Appendix C for safe-looping paradigms.Components of the assert stage:
File Source Annotated What it does tpa-assert .Txl .xml * Generate assertions at crisis-points. See Appendix A for more info. tpa-migrate .Txl .xml * Migrate assertions to tops of rules. tpa-construct .Txl .xml * Expand pattern-constructors to fully elaborated form. tpa-match .Txl .xml * Substitute replacements for patterns in assertions. tpa-expr .Txl .xml * Simplify assertion-expressions. tpa-paradigm .Txl .xml * Detect safe-looping paradigms. See Appendix C for more info. tpa-split .Txl .xml * Split disjunctions into separate timelines. See Appendix B for more info. tpa-join .Txl .xml Roll up proven and failed timelines. tpa-cleanup .Txl .xml Delete undecided assertions for decided timelines. Delete typedefs for deleted variables. tpa-common .Txl Common subroutines. tpa-minmax .Txl Subroutines for determining the value range for an expression. tpa-run .sh Tie all of above together. Note: this shell script contains a loop! It calls expr, split, join repeatedly until done.
Status: Nothing to show yet.
Foretaste: A previous rev of this document suggested two test cases in tpp-directives.xml: the crisis point labelled T18 (which is actually a serious safety issue with this program) vs. the one called T4 (not really a problem at all). Progress so far:
- T4 TxlProp has determined that this crisis-point is unsafe if the From parameter to [quoteXML] is an empty string. It needs to propagate this assertion back to quoteXML's callers, each of which passes a non-empty string literal as the value for From.
- T18 TxlProp has determined that this crisis point is unsafe if the scope-string for [testFileExists] contains shell meta-characters. It needs to propagate this assertion all the way back, to determine that nowhere in this program does it ever test the filenames for safety!