§Dada Parser Overview
This document provides an architectural overview of the Dada parser, designed to help contributors (both human and AI) understand the parsing system and make informed modifications.
§Architecture
The Dada parser is a recursive descent parser with the following key characteristics:
- Token-based: Source text is first tokenized, then parsed from the token stream
- Error-recovering: The parser accumulates diagnostics and continues parsing invalid code
- Speculative: Can fork state and backtrack when needed
- Deferred: Complex nested structures (like function bodies) are parsed lazily
§Key Components
§Entry Point
The main entry point is the parse method from the SourceFileParse trait, which:
- Tokenizes the source text
- Creates a Parser instance
- Parses the module structure
- Returns an AST and any diagnostics
§Core Abstractions
§The Parse Trait
All parseable constructs implement the Parse trait. See the trait documentation for the full interface and method signatures.
§Commitment and Error Recovery
The parser follows a “commitment” model:
- If no relevant tokens are found → parsing hasn’t started
- If we start consuming tokens → we’re committed to parsing that construct
- If parsing fails after commitment → must report an error
This enables robust error recovery: the parser always knows whether it is in the middle of a construct or has not yet started one.
§Parse vs Eat Methods
The parser distinguishes between two parsing approaches based on the commitment model:
§parse methods
- Purpose: Attempt to parse an element, but it’s OK if it’s not there
- Returns:
  - Ok(Some(T)) - Successfully parsed T
  - Ok(None) - No instance found (no tokens consumed, no commitment)
  - Err(ParseFail) - Started parsing but encountered an error (tokens were consumed, committed)
- Key principle: Only returns Err after committing to parse (consuming tokens)
§eat methods
- Purpose: Parse an element that must be present
- Returns:
  - Ok(T) - Successfully parsed T
  - Err(ParseFail) - Element not found or malformed
- Special case: Keywords only have eat methods since they're either present or not
Important: Returning Err does NOT directly report an error to the user. Errors are accumulated in the parser and may be reported later depending on context.
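The parse/eat contract above can be sketched as a standalone miniature. The names MiniParser, peek, parse_number, and eat_keyword are illustrative inventions; only the ParseFail name and the Result shapes mirror the description above.

```rust
// Hypothetical miniature of the parse/eat contract; not the real Dada API.
#[derive(Debug, PartialEq)]
struct ParseFail(String);

struct MiniParser {
    tokens: Vec<String>,
    cursor: usize,
}

impl MiniParser {
    fn peek(&self) -> Option<&str> {
        self.tokens.get(self.cursor).map(|s| s.as_str())
    }

    /// parse-style method: Ok(None) means "not present, nothing consumed".
    fn parse_number(&mut self) -> Result<Option<i64>, ParseFail> {
        match self.peek() {
            Some(tok) if !tok.is_empty() && tok.chars().all(|c| c.is_ascii_digit()) => {
                let n = tok.parse().unwrap();
                self.cursor += 1; // committed: a token was consumed
                Ok(Some(n))
            }
            _ => Ok(None), // no commitment; the caller may try another construct
        }
    }

    /// eat-style method: the element must be present, otherwise Err.
    fn eat_keyword(&mut self, kw: &str) -> Result<(), ParseFail> {
        if self.peek() == Some(kw) {
            self.cursor += 1;
            Ok(())
        } else {
            Err(ParseFail(format!("expected `{kw}`")))
        }
    }
}
```

Note how parse_number can only fail after consuming tokens, while eat_keyword fails immediately when its keyword is absent, matching the commitment model.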
§Token Stream
The tokenizer (tokenize in tokenizer.rs) produces a flat vector of tokens. Each token includes:
- Token type (keyword, identifier, operator, etc.)
- Span information
- Whether it follows a newline (for layout-sensitive parsing)
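As a rough sketch, a token record with those three pieces of information might look like the following; the actual type and field names in tokenizer.rs may differ.

```rust
// Illustrative token shape; field and variant names are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum TokenKind {
    Keyword(&'static str),
    Identifier(String),
    Operator(char),
}

#[derive(Debug, Clone, PartialEq)]
struct Span {
    start: usize, // byte offset of the token's first character
    end: usize,   // byte offset just past its last character
}

#[derive(Debug, Clone, PartialEq)]
struct Token {
    kind: TokenKind,
    span: Span,
    follows_newline: bool, // supports layout-sensitive parsing decisions
}
```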
§Parsing Strategy
§Expression Parsing
Expressions use precedence climbing for binary operators:
- Operators have defined precedence levels
- The parser handles left/right associativity
- Postfix operations (field access, calls) are parsed iteratively
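The precedence-climbing strategy can be shown with a minimal arithmetic evaluator over single-digit operands. This is a generic sketch of the technique, not the Dada expression parser; the operator table and left-associativity handling are the parts that carry over.

```rust
// Minimal precedence climbing over +, -, *, / with single-digit operands.
fn prec(op: char) -> Option<u8> {
    match op {
        '+' | '-' => Some(1),
        '*' | '/' => Some(2),
        _ => None,
    }
}

fn parse_expr(tokens: &[char], pos: &mut usize, min_prec: u8) -> i64 {
    // Operand: a single digit, for simplicity.
    let mut lhs = (tokens[*pos] as i64) - ('0' as i64);
    *pos += 1;
    // Climb while the next operator binds at least as tightly as min_prec.
    while *pos < tokens.len() {
        let op = tokens[*pos];
        let Some(p) = prec(op) else { break };
        if p < min_prec {
            break;
        }
        *pos += 1;
        // Left-associative operators recurse with precedence p + 1.
        let rhs = parse_expr(tokens, pos, p + 1);
        lhs = match op {
            '+' => lhs + rhs,
            '-' => lhs - rhs,
            '*' => lhs * rhs,
            '/' => lhs / rhs,
            _ => unreachable!(),
        };
    }
    lhs
}
```

Recursing with p + 1 is what makes the operators left-associative; a right-associative operator would recurse with p instead.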
§Statement/Item Disambiguation
The parser uses speculative parsing to distinguish between:
- Items (functions, classes) that can appear at module level
- Statements that must be wrapped in a function
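The fork-and-backtrack idea behind this disambiguation can be sketched as follows. The real parser exposes parser.fork(); everything else here, including the Cursor type and the keyword set, is an illustrative stand-in.

```rust
// Sketch of speculative parsing: try one interpretation on a forked
// cursor and commit only if it succeeds.
#[derive(Clone)]
struct Cursor {
    tokens: Vec<String>,
    pos: usize,
}

impl Cursor {
    fn next(&mut self) -> Option<String> {
        let t = self.tokens.get(self.pos).cloned();
        if t.is_some() {
            self.pos += 1;
        }
        t
    }
}

/// Returns "item" if the input starts like an item, else "statement".
fn classify(cursor: &mut Cursor) -> &'static str {
    // Speculation consumes tokens on the fork only.
    let mut fork = cursor.clone();
    match fork.next().as_deref() {
        Some("fn") | Some("class") => {
            *cursor = fork; // commit the speculative progress
            "item"
        }
        _ => "statement", // original cursor untouched: we backtracked for free
    }
}
```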
§Error Recovery
The parser employs several recovery strategies:
- Skipping to known synchronization points
- Inserting synthetic tokens for common mistakes
- Continuing with partial ASTs when possible
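The first strategy, skipping to a synchronization point, amounts to discarding tokens until one that reliably begins a new construct. The sync set below is an assumption for illustration, not Dada's actual one.

```rust
// Sketch of "skip to a synchronization point" error recovery: after a
// parse error, advance past the broken construct's remaining tokens.
fn recover_to_sync(tokens: &[&str], mut pos: usize) -> usize {
    // Hypothetical sync set: tokens that plausibly start or end a construct.
    const SYNC: [&str; 3] = ["fn", "class", "}"];
    while pos < tokens.len() && !SYNC.contains(&tokens[pos]) {
        pos += 1; // discard a token belonging to the malformed construct
    }
    pos
}
```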
§Adding New Syntax
When adding new syntax to the parser:
- Update the tokenizer if new tokens are needed
- Define the AST node in dada-ir-ast
- Implement Parse for the new construct
- Update parent parsers to recognize the new syntax
- Add error recovery for common mistakes
§Design Decisions
§Why Token-Based?
Pre-tokenization simplifies the parser and enables:
- Clean separation between lexical and syntactic analysis
- Efficient lookahead and backtracking
- Better error messages with token expectations
§Why Deferred Parsing?
Deferring nested content parsing:
- Improves incremental compilation performance
- Allows parallel parsing of independent functions
- Simplifies the parser state machine
§Comment Handling
Comments use # (not //) and are filtered during tokenization. This means the parser never sees comments, simplifying grammar rules.
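A stripped-down version of that filtering step, assuming a # comment always runs to end of line and ignoring complications like # inside string literals, could look like this; the real tokenizer does this while also tracking spans.

```rust
// Sketch of dropping `#` comments before the parser ever sees them.
// Assumes `#` starts a comment everywhere (no string-literal handling).
fn strip_comments(source: &str) -> String {
    source
        .lines()
        .map(|line| line.split('#').next().unwrap().trim_end())
        .collect::<Vec<_>>()
        .join("\n")
}
```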
§Common Patterns
§Parsing Lists
The Parse trait provides helper methods for parsing multiple items, comma-separated lists, delimited sequences, and arbitrary separators. See how these are used in practice in generics.rs for type parameter lists.
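A comma-separated-list helper in this spirit might be structured as below. The signature and names are illustrative, not the Parse trait's real combinators; the key behavior is stopping cleanly at the first token that is neither an element nor a separator.

```rust
// Sketch of a generic comma-separated-list helper.
fn parse_comma_list<T>(
    tokens: &[String],
    pos: &mut usize,
    parse_item: impl Fn(&str) -> Option<T>,
) -> Vec<T> {
    let mut items = Vec::new();
    loop {
        // Stop if the next token is not a list element.
        let Some(tok) = tokens.get(*pos) else { break };
        let Some(item) = parse_item(tok) else { break };
        items.push(item);
        *pos += 1;
        // A comma continues the list; anything else ends it.
        if tokens.get(*pos).map(String::as_str) == Some(",") {
            *pos += 1;
        } else {
            break;
        }
    }
    items
}
```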
§Optional Syntax
Many constructs have optional components. See functions.rs for examples of optional return types and visibility modifiers.
§Keyword-Initiated Constructs
Constructs that begin with keywords should check for the keyword before attempting to parse. See module_body.rs for how items like class and fn are dispatched based on their leading keyword.
§Debugging Tips
- Enable trace logging to see parser decisions
- Use parser.fork() to experiment without consuming tokens
- Check parser.cursor to see current position
- Examine accumulated diagnostics for error context
Modules§
- classes 🔒
- expr 🔒
- functions 🔒
- generics 🔒
- miscellaneous 🔒
- module_body 🔒
- prelude
- square_bracket_args 🔒
- tokenizer 🔒
- types 🔒
Structs§
- Parser 🔒
Enums§
Traits§
- Parse 🔒 - Parse an instance of Self from the given Parser.
- ParseGuard 🔒