Compiler Overview

The Thagore compiler is fully self-hosted: it is written in Thagore (.tg files) and compiles itself. Its source lives in the src/ directory and is organized as a classic multi-stage pipeline.

Compiler Pipeline

Source (.tg file)
┌──────────────────────────────────────────┐
│ 1. Lexer (tokenize_native)               │
│    Source text → Token stream            │
│                                          │
│ 2. Parser (parse_to_ast)                 │
│    Token stream → AST (ProgramAstNative) │
│                                          │
│ 3. Typechecker (typecheck_to_typed)      │
│    AST → Typed-IR (ProgramT)             │
│                                          │
│ 4. Lowering (transform)                  │
│    Typed-IR → Core-IR (CoreProgram)      │
│                                          │
│ 5. LLVM Emitter (emit_program_core)      │
│    Core-IR → LLVM IR (.ll file)          │
│                                          │
│ 6. Clang/LLVM (external)                 │
│    LLVM IR → Native binary               │
└──────────────────────────────────────────┘
Native Executable (.exe / ELF binary)
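
The diagram above is a chain in which each stage consumes the previous stage's output. As a rough illustration only, the following Python sketch (not Thagore, and not the compiler's actual code) models that chain. The stage function names and type names are taken from the diagram; the function bodies, the fields on the placeholder classes, and the clang_command helper are assumptions made for this example.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable, List

# Placeholder result types. The names come from the diagram; the fields are
# assumptions made only for this illustration.
@dataclass
class ProgramAstNative:
    tokens: List[str]

@dataclass
class ProgramT:
    ast: ProgramAstNative

@dataclass
class CoreProgram:
    typed: ProgramT

def tokenize_native(source: str) -> List[str]:
    # Stand-in lexer: whitespace splitting instead of real tokenization.
    return source.split()

def parse_to_ast(tokens: List[str]) -> ProgramAstNative:
    return ProgramAstNative(tokens)

def typecheck_to_typed(ast: ProgramAstNative) -> ProgramT:
    return ProgramT(ast)

def transform(typed: ProgramT) -> CoreProgram:
    return CoreProgram(typed)

def emit_program_core(core: CoreProgram) -> str:
    # Stand-in emitter: returns an LLVM-style comment line, not real LLVM IR.
    return "; tokens: " + str(len(core.typed.ast.tokens))

# Stages 1-5 in diagram order.
STAGES: List[Callable[[Any], Any]] = [
    tokenize_native,
    parse_to_ast,
    typecheck_to_typed,
    transform,
    emit_program_core,
]

def run_pipeline(source: str) -> str:
    # Feed each stage's output into the next stage.
    return reduce(lambda value, stage: stage(value), STAGES, source)

def clang_command(ll_path: str, output_path: str) -> List[str]:
    # Stage 6 is external. Clang accepts a .ll file as input; any runtime
    # library the real driver links in is omitted because it is not named here.
    return ["clang", ll_path, "-o", output_path]
```

Keeping the stages in a list makes the order explicit and lets a driver stop early (for example, after type checking) by slicing the list, though whether the real driver works this way is not stated here.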

Source Organization

src/
├── syntax/  # Stage 1-2: Lexer & Parser
│   ├── lexer/  # Token definitions
│   ├── parser/  # Header parsing utilities
│   ├── native/
│   │   ├── lexer.tg  # Main lexer (tokenizer)
│   │   └── parser.tg  # Main parser (AST builder)
│   └── ast/  # AST node definitions
├── semantics/  # Stage 3: Type checking & validation
│   ├── typecheck/
│   │   └── program.tg  # Main typecheck entry point
│   ├── typedir/
│   │   └── nodes.tg  # ProgramT type definition
│   ├── intent/
│   │   └── matcher.tg  # Intent goal validation
│   ├── pass/
│   │   └── program.tg  # Semantic preflight checks
│   └── traits/  # Trait resolution
├── lowering/  # Stage 4: IR transformation
│   ├── coreir/
│   │   └── nodes.tg  # CoreProgram definition
│   └── transform/
│       ├── program.tg  # Main lowering entry
│       ├── comptime/  # Compile-time evaluation
│       └── preprocess/  # Source preprocessing
├── codegen/  # Stage 5: Code generation
│   ├── native/
│   │   ├── emitter.tg  # Main emitter entry point
│   │   └── emit/  # Emission sub-modules
│   │       ├── source.tg  # Source-level emission
│   │       ├── expr.tg  # Expression emission
│   │       ├── stmt.tg  # Statement emission
│   │       ├── flow.tg  # Control flow emission
│   │       ├── string.tg  # String handling
│   │       ├── helpers.tg  # Utility functions
│   │       └── text.tg  # Text processing
│   ├── llvm/  # LLVM API bindings
│   │   ├── entry.tg
│   │   └── api/
│   │       ├── core.tg
│   │       └── target.tg
│   └── interpreter/
│       └── eval.tg  # Interpreter mode
├── driver/  # Stage 6: CLI & tooling
│   ├── cli/
│   │   ├── main.tg  # CLI entry point
│   │   ├── compat.tg  # Compatibility shim
│   │   ├── common/core.tg  # Shared utilities
│   │   ├── build/flow.tg  # Build command
│   │   ├── test/  # Test runner
│   │   ├── fix/flow.tg  # Autofix command
│   │   └── install/  # Toolchain installer
│   └── autofix/  # Autofix engine
│       ├── engine.tg
│       ├── run/
│       ├── suggest/
│       ├── rules/
│       ├── report/
│       └── common/
├── thagore.tg  # Compiler entry point
└── thg.tg  # Compat entry point

Entry Point

The compiler’s entry point is src/thagore.tg:

# src/thagore.tg
import "src/driver/cli/main.tg" as cli

func main() -> i32:
    return cli.main()

The entry point does no work of its own: it delegates to the CLI driver, which parses the command-line arguments and orchestrates the build pipeline.
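
To make the driver's role concrete, here is a minimal Python sketch of subcommand dispatch for an invocation shaped like thagore build input.tg -o output.exe. It is an illustration under assumptions, not the contents of src/driver/cli/main.tg: only the build subcommand is modeled, treating -o as required is a guess, and the printed message stands in for running the real pipeline.

```python
import argparse
from typing import List, Optional

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical argument layout modeled on: thagore build input.tg -o output.exe
    parser = argparse.ArgumentParser(prog="thagore")
    sub = parser.add_subparsers(dest="command", required=True)
    build = sub.add_parser("build", help="compile a .tg source file")
    build.add_argument("input", help="path to the .tg source file")
    # Assumption: the output path is mandatory in this sketch.
    build.add_argument("-o", dest="output", required=True, help="output binary")
    return parser

def main(argv: Optional[List[str]] = None) -> int:
    # argv=None makes argparse read sys.argv[1:], as a real CLI would.
    args = build_parser().parse_args(argv)
    if args.command == "build":
        # A real driver would run the compiler stages here.
        print(f"would compile {args.input} -> {args.output}")
        return 0
    return 1
```

Returning an integer from main mirrors the i32 return type of the Thagore entry point, so the value can be used directly as the process exit code.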

Key Data Structures

| Structure | File | Purpose |
|---|---|---|
| ProgramAstNative | src/syntax/native/parser.tg | Parser output: AST with feature counters |
| ProgramT | src/semantics/typedir/nodes.tg | Typed IR after type checking |
| CoreProgram | src/lowering/coreir/nodes.tg | Core IR after lowering, ready for emission |

Build Process

When you run thagore build input.tg -o output.exe, the compiler:

  1. Reads the source file
  2. Strips comments (# and //)
  3. Tokenizes the source with an indentation-aware lexer
  4. Parses to AST, extracting functions, structs, enums, imports
  5. Type-checks against feature edges and semantic constraints
  6. Lowers Typed-IR to Core-IR
  7. Emits LLVM IR (.ll file)
  8. Invokes Clang to compile LLVM IR → native binary, linking against the runtime ABI
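
Steps 2 and 3 can be sketched in a few lines. The Python below is an illustration under assumptions, not the logic of src/syntax/native/lexer.tg: it assumes comments run to the end of the line, that comment markers inside double-quoted strings are ignored, and that block structure is reported with INDENT and DEDENT tokens in the style of Python's own lexer. The function names strip_comment and tokenize_indented are hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text)

def strip_comment(line: str) -> str:
    """Drop a trailing # or // comment, ignoring markers inside double quotes."""
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and in_string:
            i += 2  # skip the escaped character inside a string
            continue
        if ch == '"':
            in_string = not in_string
        elif not in_string and (ch == "#" or line.startswith("//", i)):
            return line[:i].rstrip()
        i += 1
    return line.rstrip()

def tokenize_indented(source: str) -> List[Token]:
    """Turn leading spaces into INDENT/DEDENT tokens around LINE tokens."""
    tokens: List[Token] = []
    levels = [0]  # stack of currently open indentation widths
    for raw in source.splitlines():
        line = strip_comment(raw)
        if not line.strip():
            continue  # blank and comment-only lines do not affect indentation
        width = len(line) - len(line.lstrip(" "))
        if width > levels[-1]:
            levels.append(width)
            tokens.append(("INDENT", ""))
        # This sketch does not reject a dedent to a width that was never opened.
        while width < levels[-1]:
            levels.pop()
            tokens.append(("DEDENT", ""))
        tokens.append(("LINE", line.strip()))
    # Close any blocks still open at the end of the input.
    tokens.extend(("DEDENT", "") for _ in levels[1:])
    return tokens
```

A real lexer would split each LINE further into identifiers, literals, and operators; the sketch stops at line granularity to keep the indentation handling visible.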