TECHNICAL WHITE PAPER

Mermaid Diagram Driven Development: Verifiable Semantic Networks for Agentic Software Engineering

Abbreviation: MDD
Date: May 2026
Classification: Open Technical Standard Proposal
Target Audience: AI Systems Architects, Agent Runtime Teams, Formal Methods Engineers, Software Engineering Researchers

Abstract

Current LLM-based software engineering workflows rely heavily on ambiguous Markdown documents, loosely structured prompts, and post-hoc tests. These artifacts are readable but weakly verifiable. They allow semantic drift between requirements, diagrams, agent plans, code, tests, and explanations.

Mermaid Diagram Driven Development (MDD) is a graph-first methodology for transforming source documents into auditable, verifiable development networks. It also serves two further roles. As a knowledge refinement method, it turns domain-specific knowledge into explicit state/transition structures. As a deployment pattern, it lets the heavy toolchain compile reviewed diagrams once, while downstream environments receive only the lightweight mdd CLI plus compiled .mddbundle artifacts. MDD treats Mermaid diagrams as the first structured abstraction above source material. Mermaid nodes and edges preserve provenance back to original text so humans and agents can audit architectural, behavioral, and explanatory claims.

Mermaid is not the final formal system. Mermaid is compiled into a typed intermediate representation, MDD-IR, which becomes the canonical substrate for deterministic verification:

LLMs extract semantics, propose finite domains, identify ambiguity, explain counterexamples, and suggest graph repairs.
TLA+ currently checks finite state integrity and deadlock freedom when TLC is supplied; richer temporal, safety/liveness, and refinement obligations remain roadmap work.
Z3 currently checks bounded reachability/path expectations over finite MDD-IR transitions.
Lean4 currently checks generated transition source/target obligations; richer semantic extrapolations and refinement claims remain proof obligations.
mdd-engine performs heavyweight compilation and verification.
mdd-agent orchestrates replayable generation, review, source-grounding, verification, bundling, and runtime traversal lanes.
mdd is the lightweight deterministic traversal CLI for agents using prebuilt verified bundles.

MDD reframes autonomous software development and domain-knowledge operationalization as a pipeline of source-backed graph abstraction, knowledge refinement, state-space checking, bounded path solving, explicit proof obligations, evidence-carrying compilation, and deterministic runtime traversal.

1. Core Thesis

Mermaid diagrams are the first auditable abstraction from source reality. LLMs provide semantic interpretation against linked evidence. Deterministic tools verify the consequences, hashes, selectors, and traversal constraints of that interpretation.

The core split is:

LLM        = semantics
Mermaid    = auditable graph abstraction
MDD-IR     = typed canonical graph
TLA+       = state integrity
Z3         = verifiable paths
Lean4      = semantic extrapolation/proof
mdd-engine = heavyweight compiler/verifier
mdd-agent  = replayable orchestration/review lane
mdd        = lightweight runtime traversal CLI

Compactly:

LLMs give states meaning.
Mermaid makes meaning visible.
MDD-IR makes meaning typed.
TLA+ keeps finite states honest when TLC runs.
Z3 checks whether bounded paths exist.
Lean4 checks generated transition obligations.
mdd keeps agents inside the compiled, reviewed design space.

MDD should not claim to prove a whole implementation correct. It verifies a design-level abstraction and emits artifacts that guide tests, assertions, explanations, and agent traversal.

2. System Architecture

Source docs / code / issues / logs / policies
        ↓
Source manifest: stable IDs, hashes, selectors, spans
        ↓
LLM semantic extraction with provenance
        ↓
Mermaid audit graph
        ↓
Typed MDD-IR
        ↓
TLA+ state integrity checks
        ↓
Z3 path witnesses / forbidden-route checks
        ↓
Lean4 semantic proof obligations
        ↓
Verification-manifest .mddbundle
        ↓
mdd runtime traversal for AI agents

The reference package exposes three command-line entrypoints while preserving a two-surface compiler/runtime split:

mdd-engine = heavy semantic/formal compiler
mdd-agent  = orchestration and review lane over mdd-engine + mdd
mdd        = lightweight verified graph traversal VM

The compiler/runtime split is essential. Full verification may require LLMs, Java/TLA+, Apalache, Z3, Lean4, Lake, theorem-proving libraries, and expensive model-checking runs. Many AI agent environments cannot install or run all of that. They can often run a small CLI and sometimes a Python Z3 package. Therefore mdd-engine performs knowledge refinement and verification once, mdd-agent provides a replayable orchestration/review workflow for agents and operators, and mdd lets many downstream agents safely reuse the emitted portable .mddbundle without shipping the full toolchain.

3. Mermaid as Auditable Abstraction

Mermaid is valuable because it is compact, readable, diffable, and LLM-friendly. It gives humans a reviewable surface between prose and formal models.

However:

Valid Mermaid syntax is not semantic correctness.

The reference compiler intentionally supports only a strict Mermaid subset until the full MDD parser is implemented: graph/flowchart headers, NodeId[Label] declarations, Source -- event --> Target, Source --|event|> Target, and Source --> Target edges, plus comments and fenced-code markers. Unsupported non-comment syntax must produce line-numbered diagnostics and fail closed by default rather than silently dropping semantics. Exploratory workflows may pass --allow-lossy; in that mode the compiler emits the supported subset but carries lossy_parse: true and parse_diagnostics in MDD-IR.

Current parser-compatible edge labels should be deterministic transition identifiers rather than prose captions:

flowchart TD
Draft[Draft]
Gold[Gold]
Draft -- all_edges_approved --> Gold

Common Mermaid pipe-label syntax such as Draft -->|all edges approved| Gold is valid Mermaid, but it is not accepted by the current fail-closed MDD compiler. Normalize that form to Draft -- all_edges_approved --> Gold until pipe-label support is implemented.
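The fail-closed behavior described above can be illustrated with a minimal Python sketch. It is not the reference compiler: the regular expressions, the function name parse_strict, and the diagnostic fields are assumptions made for illustration. Only the accepted line forms, the lossy_parse and parse_diagnostics keys, and the fail-closed default come from the text above.

```python
import re

FENCE = "`" * 3  # fenced-code marker, built this way so the example stays fence-safe

# Hypothetical patterns for the strict subset; the real grammar may differ.
HEADER = re.compile(r"^(graph|flowchart)(\s+\w+)?$")
NODE = re.compile(r"^(\w+)\[(.+)\]$")
EDGE_FORMS = [
    re.compile(r"^(?P<src>\w+)\s+--\s+(?P<event>\w+)\s+-->\s+(?P<dst>\w+)$"),
    re.compile(r"^(?P<src>\w+)\s+--\|(?P<event>\w+)\|>\s+(?P<dst>\w+)$"),
    re.compile(r"^(?P<src>\w+)\s+-->\s+(?P<dst>\w+)$"),
]


def parse_strict(text, allow_lossy=False):
    """Parse the supported subset; unsupported lines become line-numbered diagnostics."""
    nodes, edges, diagnostics = {}, [], []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Blank lines, Mermaid comments, and fenced-code markers carry no semantics.
        if not line or line.startswith("%%") or line.startswith(FENCE):
            continue
        if HEADER.match(line):
            continue
        node = NODE.match(line)
        if node:
            nodes[node.group(1)] = node.group(2)
            continue
        edge = next((m for m in (p.match(line) for p in EDGE_FORMS) if m), None)
        if edge:
            parts = edge.groupdict()
            edges.append({"from": parts["src"], "event": parts.get("event"), "to": parts["dst"]})
            continue
        diagnostics.append({"line": lineno, "text": line})
    if diagnostics and not allow_lossy:
        # Fail closed: refuse to emit a graph that silently drops semantics.
        raise ValueError(f"unsupported Mermaid syntax: {diagnostics}")
    return {"nodes": nodes, "edges": edges,
            "lossy_parse": bool(diagnostics), "parse_diagnostics": diagnostics}
```

In this sketch the pipe-label line Draft -->|all edges approved| Gold matches none of the patterns, so it raises by default and is reported under parse_diagnostics only when allow_lossy is set.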

Every graph element should preserve provenance:

node:
  id: Authenticated
  label: Authenticated Session
  provenance:
    file: docs/auth.md
    lines: [42, 47]
    status: source_explicit

edge:
  id: refresh_invalid_logout
  from: TokenExpired
  to: Unauthenticated
  event: Refresh token invalid, revoked, replayed, or malformed
  guard: refresh_case != valid
  provenance:
    file: docs/auth.md
    lines: [52, 55]
    status: source_inferred

Required provenance labels:

source_explicit  = directly supported by source
source_inferred  = inferred from source, not directly stated
llm_hypothesis   = proposed by LLM; needs review/checking
human_confirmed  = confirmed by reviewer/user

Every diagram is therefore a claim about a larger source reality. Existing Mermaid syntax can be ingested deterministically, but syntax does not establish semantic authority. Source-linked MDD must preserve the evidence that makes a node, edge, guard, action, or refinement a faithful abstraction:

{
  "schema": "mdd-source-map/v0",
  "sources": [
    {
      "id": "auth_spec",
      "kind": "requirements",
      "path": "docs/auth.md",
      "sha256": "..."
    }
  ],
  "links": [
    {
      "target_type": "edge",
      "target_id": "TokenExpired__refresh_token_valid__Authenticated",
      "source_id": "auth_spec",
      "selector": {"type": "line_span", "start": 42, "end": 56},
      "claim": "source supports this transition"
    }
  ]
}

mdd-agent with Qwen can then review each linked excerpt and emit structured semantic findings such as supported, contradicted, missing, oversimplified, or ambiguous. This review is auditable evidence, not deterministic proof. Deterministic tooling should check that source files exist, hashes match, selectors resolve, required node/edge coverage exists, and stale source hashes force conservative status downgrades until remapping and review are repeated.
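The deterministic half of that check can be sketched in a few lines of Python. This is an illustrative stand-in, not the source-check implementation: the function name, the finding kinds, and the in-memory source_texts argument (used instead of reading files from disk) are assumptions. The sources, links, sha256, and line_span selector fields follow the source-map example above.

```python
import hashlib


def check_source_map(source_map, source_texts, edge_ids):
    """Return findings for missing or stale sources, bad selectors, and uncovered edges.

    source_texts maps source id -> current text (a stand-in for reading files).
    """
    findings = []
    for src in source_map["sources"]:
        text = source_texts.get(src["id"])
        if text is None:
            findings.append({"kind": "missing_source", "source_id": src["id"]})
        elif hashlib.sha256(text.encode("utf-8")).hexdigest() != src["sha256"]:
            findings.append({"kind": "stale_source", "source_id": src["id"]})
    covered = set()
    for link in source_map["links"]:
        text = source_texts.get(link["source_id"])
        line_count = len(text.splitlines()) if text is not None else 0
        sel = link["selector"]
        # A line_span selector resolves only if 1 <= start <= end <= line count.
        resolves = sel["type"] == "line_span" and 1 <= sel["start"] <= sel["end"] <= line_count
        if not resolves:
            findings.append({"kind": "unresolved_selector", "target_id": link["target_id"]})
        elif link["target_type"] == "edge":
            covered.add(link["target_id"])
    # Optional coverage requirement: every edge needs at least one resolving link.
    for edge_id in sorted(set(edge_ids) - covered):
        findings.append({"kind": "uncovered_edge", "target_id": edge_id})
    return findings
```

An empty result means hashes match, selectors resolve, and every listed edge has a link. Nothing here judges whether the excerpt actually supports the edge; that remains the reviewed semantic step.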

The first deterministic source-grounding commands are:

mdd-engine source-link diagram.ir.json \
  --source docs/auth.md \
  --mapping auth-source-map.json \
  --out diagram.sourced.ir.json

mdd-engine source-check diagram.sourced.ir.json \
  --require-edge-coverage \
  --out diagram.source-check.json

mdd-agent review-source-map diagram.sourced.ir.json \
  --llm-source-review-file fixtures/auth-source-review.json \
  --out diagram.source-review.json \
  --prompt-out diagram.source-review-prompt.md

# Or let mdd-agent process run source-link, source-check, source-review, bundle inclusion,
# and runtime traversal as one replayable source-grounded lane:
mdd-agent process docs/auth.md \
  --out-dir build/auth-agent-source-grounded \
  --source-map auth-source-map.json \
  --require-source-edge-coverage \
  --llm-source-review-file fixtures/auth-source-review.json \
  --start-state TokenExpired

mdd-engine approve-edge diagram.sourced.ir.json \
  --edge refresh_token_valid \
  --approver jackie.eth \
  --basis source_linked_review \
  --require-source-review \
  --source-review diagram.source-review.json \
  --out diagram.reviewed.ir.json

mdd-engine bundle diagram.reviewed.ir.json \
  --source-check diagram.source-check.json \
  --source-review diagram.source-review.json \
  --out diagram.mddbundle

mdd view diagram.mddbundle --state TokenExpired

source-link writes source records and link claims into ir["source_map"] while preserving the compiler's original parser provenance under ir["source_map"]["parser_source_map"]. source-check emits mdd-source-check/v0 with stale-source, selector, target-ID, and optional coverage findings. mdd-agent review-source-map builds exact source excerpts and writes mdd-source-review/v0; replay mode accepts --llm-source-review-file for deterministic tests.

mdd-agent process --source-map wires these same artifacts into the full process lane: it compiles, source-links, source-checks, and source-reviews, then uses the source-linked IR for spec/explore/introspect/verify/bundle/runtime and includes source-check.json plus source-review.json in the bundle.

approve-edge --require-source-review fails closed unless each approved edge has supported source-review evidence covering all current source_map links, and it blocks any non-supported finding for that edge before recording the review IDs on the Gold approval.

mdd-engine bundle --source-check --source-review packages those manifests and hashes them in bundle.json. mdd view exposes bundle-level plus per-operation source-grounding status so deployment agents can traverse compiled diagrams without hiding missing, stale, contradicted, oversimplified, or ambiguous evidence. Source review remains semantic evidence, not deterministic proof.

Autoform-inspired source-to-target lifecycle

MDD also includes an Autoform-inspired source-to-target lifecycle that adapts the useful prior-art loop (source corpus, extracted targets, dependency DAG, deterministic gates, evaluation, and trace learning) into MDD-native artifacts. Autoform Bot is non-commercial prior art only; Mermaid Engine does not vendor or copy its code, prompts, docs, or test data. The product boundary remains deterministic CLI artifacts with replay files for live LLM-dependent steps.

mdd-agent extract-targets docs/requirements \
  --out-dir build/targets \
  --llm-extractions-file fixtures/extractions.json

mdd-agent obligations init \
  --targets build/targets/mdd-targets.json \
  --out build/run/obligations.json

mdd-agent obligations list build/run/obligations.json --status all
mdd-agent eval-run build/run --out build/run/eval-report.json
mdd-agent learn failure build/run --out-dir build/run/learned

mdd-agent ingest docs/requirements/auth.md \
  --task-type generic_domain_ingest \
  --extract-targets \
  --llm-extractions-file fixtures/extractions.json \
  --obligation-dag build/run/obligations.json \
  --eval-report \
  --out-dir build/run \
  --llm-diagrams-file fixtures/diagrams.md

extract-targets chunks source corpora, validates replayed mdd-target-extractions/v0, and emits mdd-targets/v0 plus source-map drafts. obligations persists a typed mdd-obligation-dag/v0 for source review, formal backend, bundle, runtime smoke, and Gold-approval work items. eval-run emits mdd-eval-report/v0 support-cone reports and fails closed on missing IR, required source-map gaps, stale source checks, unsupported source-review findings, verification overclaims, bundle manifest gaps, or Gold claims without approval evidence. learn failure creates advisory guides from failed traces; those guides may help future runs but cannot override parser/source/verification/backend/Gold guardrails.
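To make the obligation DAG concrete, the following Python sketch orders work items by dependency using the standard-library graphlib module. The dependency edges shown are hypothetical; the text above names the kinds of work items but does not specify how they depend on each other, and the real mdd-obligation-dag/v0 schema is not reproduced here.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each work item maps to the items it waits on.
OBLIGATIONS = {
    "source_review": set(),
    "formal_backend": {"source_review"},
    "bundle": {"formal_backend"},
    "runtime_smoke": {"bundle"},
    "gold_approval": {"source_review", "runtime_smoke"},
}


def ready_order(obligations):
    """Return a dependency-respecting order; graphlib raises CycleError on a cycle."""
    return list(TopologicalSorter(obligations).static_order())


def blocked_by(obligations, done):
    """Map each unfinished item to the prerequisites it is still waiting on."""
    return {item: sorted(deps - done)
            for item, deps in obligations.items() if item not in done}
```

With source_review finished, blocked_by reports that formal_backend has no open prerequisites while gold_approval still waits on runtime_smoke, which is the kind of gate a fail-closed evaluation step would refuse to skip.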

4. MDD-IR

MDD-IR is the canonical typed representation. It separates Mermaid syntax from formal semantics.

MDD-IR should contain:

source artifacts, source hashes, and selector/span maps
source-review findings and stale-source status
graph kind: state machine, workflow, dependency graph, protocol, explanation graph
stable node IDs and edge IDs
labels and human descriptions
source spans and provenance status
actors/events
guards, preconditions, postconditions
finite semantic domains
environment actions
safety invariants
liveness candidates and fairness assumptions
forbidden transitions
coverage obligations
backend projection hints for TLA+, Z3, Lean4, tests, and runtime
edge-level approval status and approval evidence

Metal status is computed from reviewed artifacts and edge approvals:

Draft  = generated/compiled candidate graph; source grounding may be unknown
Source-linked Draft = source artifacts are linked by stable IDs, hashes, and selectors
Bronze = current source-linked TLA+ state/node/edge mapping has been proposed and reviewed
Silver = semantic state exploration has been proposed and reviewed against current linked source reality
Gold edge = a user or supervising AI agent has approved that formalized edge using current source-linked review evidence
Gold diagram = every edge in the diagram is Gold and linked source evidence is current

Gold is intentionally edge-level. A diagram can be Silver with some Gold edges while remaining non-Gold overall. The whole diagram becomes Gold only when every edge carries status: gold plus an approval record backed by current supported source-review evidence. If linked source changes, the lifecycle model treats existing mapping, exploration, source-review, and Gold approval for that edge as stale; the diagram drops back to Draft until the source is refreshed, remapped, re-explored, source-reviewed, and re-approved. If semantic-frontier introspection expands a Silver/Gold diagram with new edges, those new edges do not inherit mapping, exploration, source-review support, or Gold approval; the evolved diagram must re-enter Bronze/Silver review and edge approval.
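The rollup rule can be sketched as a small Python function. This is an illustration under stated assumptions, not the engine's status logic: the function names and the reviewed_level argument are hypothetical, and the sketch collapses Bronze/Silver review into a single caller-supplied value. The status and approval.source_review_ids fields follow the Gold edge metadata example in this section.

```python
def edge_is_gold(edge):
    """An edge counts as Gold only with status gold plus recorded review evidence."""
    approval = edge.get("approval") or {}
    return edge.get("status") == "gold" and bool(approval.get("source_review_ids"))


def diagram_status(edges, source_current, reviewed_level="Silver"):
    """Roll edge approvals up to a diagram-level status.

    source_current is False when any linked source hash is stale.
    reviewed_level is the highest non-Gold level already reached by review.
    """
    if not source_current:
        # Stale source invalidates prior evidence: back to Draft.
        return "Draft"
    if edges and all(edge_is_gold(e) for e in edges):
        return "Gold"
    return reviewed_level
```

A newly introduced edge carries no approval record, so adding one to an all-Gold diagram makes diagram_status fall back to the reviewed level until that edge is approved.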

MDD should also preserve semantic compression. The top-level diagram should express the solid idea, not every hidden dynamic. A coarse parent edge may remain compressed and still be approved at the abstract level. When detail matters, MDD attaches an edge-refinement subdiagram rather than flattening all internal states into the parent diagram:

Parent diagram:  A -- important_transition --> B
Subdiagram:      A -- micro_step_1 --> X -- micro_step_2 --> B
Projection:      A .. X .. B collapses back to important_transition

Refinement is a contract, not decoration. An attached subdiagram must preserve the parent edge's entry/exit meaning, classify every terminal path, and pass projection/refinement checks. Parent Gold does not automatically transfer downward: subedges start below Gold, and the expanded parent refinement becomes Gold only when the subdiagram mapping, exploration, projection check, and subedge approvals are complete.
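One narrow part of the projection check can be sketched in Python: every terminal node reachable from the parent edge's source inside the subdiagram must be the parent edge's target. The function name and the tuple encoding of subedges are assumptions, and the sketch deliberately ignores cycles that never terminate, guards, and approval state, all of which a full refinement check would need to cover.

```python
def projection_holds(sub_edges, entry, exit_):
    """True if the only terminal node reachable from entry is exit_.

    sub_edges is a list of (source, event, target) tuples for the subdiagram.
    """
    successors = {}
    for src, _event, dst in sub_edges:
        successors.setdefault(src, []).append(dst)
    seen, stack, terminals = set(), [entry], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        nxt = successors.get(node, [])
        if not nxt:
            # No outgoing subedges: this is where a path through the subdiagram ends.
            terminals.add(node)
        stack.extend(nxt)
    return terminals == {exit_}
```

For the subdiagram A -- micro_step_1 --> X -- micro_step_2 --> B the check passes; adding an unclassified exit such as X -- fail --> Err makes it fail, because that path no longer collapses back to the parent edge.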

The lifecycle itself is modeled in TLA+ as specs/DiagramEvolution.tla. The bounded config specs/DiagramEvolution.cfg checks that:

lifecycle state remains well-typed;
source-link/current evidence gates mapping, and source-review support gates approval after exploration;
approval evidence is chained after current source review, semantic exploration, and mapping;
compressed edges stay in the parent diagram instead of forcing every detail into the main graph;
edge refinement subdiagrams attach only to compressed parent edges;
subdiagram refinement evidence is chained through mapping, exploration, projection validity, and subedge approval;
Gold diagram status is exactly the rollup where every current edge is approved with current source evidence and every attached refinement is approved;
Gold cannot appear without current source, source-review, and exploration evidence;
source staleness removes current evidence, resets frontier review, and returns the diagram to Draft until remapped/re-reviewed;
expansion below full mapping returns the diagram to Draft until the new edges are reviewed.

Run it with:

java -jar /path/to/tla2tools.jar -tool -config specs/DiagramEvolution.cfg specs/DiagramEvolution.tla

Example Gold edge metadata:

{
  "id": "refresh_invalid_logout",
  "from": "TokenExpired",
  "to": "Unauthenticated",
  "event": "refresh_invalid_logout",
  "status": "gold",
  "approval": {
    "approved_by": "jackie.eth",
    "supervisor_kind": "user",
    "approved_at": "2026-05-23T00:00:00+00:00",
    "basis": ["semantic_state_exploration", "source_linked_review", "manual_domain_review"],
    "source_review_ids": ["source-review:refresh_invalid_logout:v1"],
    "notes": "Invalid refresh must remain a rejection/logout path."
  }
}

Example finite domain:

refresh_case:
  values: [valid, expired, revoked, replayed, malformed]
properties:
  - revoked token never reaches Authenticated
  - replayed token reaches Unauthenticated
  - valid token may re-enter Authenticating

This finite-domain step is crucial. Vague prose such as “handle auth edge cases” is not checkable. MDD turns it into bounded state/path obligations.
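Because the domain is finite, the properties can be checked by exhaustive enumeration. The Python sketch below does this for the refresh_case example. The transition function refresh_target and the encoding of each property as a predicate are assumptions for illustration, and the "may re-enter" property is simplified to an equality because the sketch's transition function is deterministic.

```python
from enum import Enum


class RefreshCase(Enum):
    VALID = "valid"
    EXPIRED = "expired"
    REVOKED = "revoked"
    REPLAYED = "replayed"
    MALFORMED = "malformed"


def refresh_target(case):
    """Hypothetical transition under test: only a valid refresh re-enters Authenticating."""
    return "Authenticating" if case is RefreshCase.VALID else "Unauthenticated"


# Each property is a predicate over (case, target) that must hold for every value.
PROPERTIES = {
    "revoked token never reaches Authenticated":
        lambda c, t: c is not RefreshCase.REVOKED or t != "Authenticated",
    "replayed token reaches Unauthenticated":
        lambda c, t: c is not RefreshCase.REPLAYED or t == "Unauthenticated",
    "valid token may re-enter Authenticating":
        lambda c, t: c is not RefreshCase.VALID or t == "Authenticating",
}


def check_properties(transition):
    """Enumerate the whole domain and return (property, case) pairs that fail."""
    failures = []
    for case in RefreshCase:
        target = transition(case)
        for name, holds in PROPERTIES.items():
            if not holds(case, target):
                failures.append((name, case.name))
    return failures
```

An empty list means every domain value satisfies every property; a non-empty list names the exact case that breaks, which is the bounded obligation that the vague prose could not express.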

5. LLM Role

LLMs are semantic assistants, not proof engines.

Good LLM responsibilities:

extract candidate states, events, actors, and edges
map source text to graph elements
identify ambiguity and missing cases
propose finite edge-case domains
propose safety/liveness properties
explain model-checker counterexamples
suggest graph repairs
generate human-facing documentation from checked artifacts

Forbidden LLM responsibilities:

inventing unsupported requirements without provenance labels
treating Mermaid syntax as proof
declaring a path valid without deterministic checking
treating a semantic extrapolation as verified without Lean4 or another checker
bypassing mdd when a transition is rejected

Prompt rule:

Your output is a semantic proposal, not proof.
Preserve provenance and produce typed obligations for deterministic checking.

6. TLA+ for State Integrity

TLA+ is the state-space integrity layer. MDD projects MDD-IR into TLA+ when the graph describes behavior over time. The current reference implementation includes a deterministic TLC subset:

mdd-engine verify IR --tla --tla-check --tla-out-dir DIR --tla2tools-jar /path/to/tla2tools.jar

This command emits a finite MDDDiagram.tla / MDDDiagram.cfg pair from the compiled IR and checks TypeOk plus deadlock freedom over the emitted transition relation. The generated module includes exact MDD_NODE_COVERAGE and MDD_EDGE_COVERAGE comments. This is real TLC execution for finite state integrity, but it does not yet encode rich guards, domains, liveness/fairness, refinement obligations, or temporal requirements beyond the emitted transition model.

Mapping:

Mermaid nodes       → finite States
Mermaid edges       → actions in Next
current node        → state variable
previous node       → prev variable when needed
guards/domains      → constants and variables
failure events      → environment actions
safety claims       → invariants
liveness claims     → temporal properties/fairness

TLA+ checks questions like:

Can the system deadlock?
Can a forbidden state be reached?
Can a terminal state have outgoing transitions?
Can a retry loop continue forever under stated assumptions?
Can an agent deploy without test success?
Does every allowed path preserve the declared invariant?

TLA+ should not model everything. Following the AWS lesson, the model should capture the smallest dangerous state space that can expose design bugs.
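For intuition about what a type invariant plus deadlock freedom mean on a finite transition relation, here is a plain Python analogue. It is not TLA+ and not what TLC does internally; it is a graph search under the assumptions that transitions are (source, event, target) tuples and that terminal states are declared explicitly. The function name and finding labels are hypothetical.

```python
def check_state_integrity(states, terminal, transitions, start):
    """Report undeclared states, terminal states with exits, and reachable deadlocks."""
    findings = set()
    successors = {}
    for src, _event, dst in transitions:
        # Analogue of a type invariant: both endpoints must be declared states.
        for node in (src, dst):
            if node not in states:
                findings.add(("undeclared_state", node))
        if src in terminal:
            findings.add(("terminal_has_outgoing", src))
        successors.setdefault(src, set()).add(dst)
    seen, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        nxt = successors.get(node, set())
        # A reachable non-terminal state with no successor is a deadlock.
        if not nxt and node not in terminal:
            findings.add(("deadlock", node))
        for dst in nxt - seen:
            seen.add(dst)
            frontier.append(dst)
    return sorted(findings)
```

The search visits only states reachable from start, which mirrors the idea that a small model need only cover the state space where the property can fail.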

7. Z3 for Verifiable Paths

Z3 handles bounded path satisfiability and route constraints. The current reference implementation executes a deterministic subset with z3-solver: bounded reachability over the finite MDD-IR transition relation. It records solver version, bound, expectation, reachable/satisfied booleans, and a witness path when one exists. This is real bounded Z3 execution, but it is not yet full guard evaluation or liveness/fairness proof.

Z3 answers:

Is there a path from A to B under these guards?
Is this forbidden route UNSAT?
What witness path covers this semantic case?
What minimal path set covers all domain values?
Can an agent reach the requested target without violating constraints?

Example result:

{
  "query": "cover refresh_case = replayed",
  "result": "sat",
  "witness_path": ["Authenticated", "TokenExpired", "Unauthenticated"],
  "edges": ["ttl_exceeded", "refresh_invalid_logout"]
}

Z3 is also useful in mdd when lightweight runtime path checks are needed and the certified bundle allows bounded queries.
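The shape of a bounded reachability query and its result record can be sketched without a solver. The Python below uses breadth-first search as a stand-in for the Z3 encoding, so it illustrates the question being asked rather than the reference implementation, which the text above says runs z3-solver. The function name and record keys other than witness_path and edges are assumptions, and guards are ignored.

```python
from collections import deque


def bounded_reachability(transitions, start, target, bound, expect="reachable"):
    """Search paths of at most `bound` edges and return a result record.

    transitions is a list of (source, event, target) tuples.
    """
    queue = deque([(start, [start], [])])
    seen = {start}
    witness = None
    while queue:
        node, path, events = queue.popleft()
        if node == target:
            witness = {"witness_path": path, "edges": events}
            break
        if len(events) == bound:
            continue  # do not extend paths beyond the bound
        for src, event, dst in transitions:
            if src == node and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [dst], events + [event]))
    reachable = witness is not None
    result = {"bound": bound, "expectation": expect, "reachable": reachable,
              "satisfied": reachable == (expect == "reachable")}
    if witness:
        result.update(witness)
    return result
```

Because breadth-first search finds shortest paths first, a target is reported reachable exactly when some path of length at most bound exists. A forbidden-route check is the same query with expect set to "unreachable".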

8. Lean4 for Semantic Extrapolation

LLMs often generalize from examples:

All non-valid refresh token cases log out.

MDD treats such statements as proof obligations, not conclusions. Lean4 checks whether the generalized claim follows from formalized definitions. The current reference implementation includes a deterministic Lean4 subset:

mdd-engine verify IR --lean --lean-check --lean-out-dir DIR --lean-bin /path/to/lean

This command emits a finite MDDObligations.lean file from MDD-IR and runs Lean4 over state/edge definitions plus one source/target soundness theorem per edge. This is real Lean4 execution for generated finite transition obligations, but it does not yet prove rich semantic extrapolations, runtime guard semantics, refinement obligations, or liveness/fairness.

Example obligation:

inductive RefreshCase where
  | valid | expired | revoked | replayed | malformed
  deriving DecidableEq, Repr

inductive AuthState where
  | unauthenticated | authenticating | authenticated | tokenExpired
  deriving DecidableEq, Repr

def refreshTransition (r : RefreshCase) : AuthState :=
  match r with
  | RefreshCase.valid => AuthState.authenticating
  | _ => AuthState.unauthenticated

theorem non_valid_refresh_logs_out
  (r : RefreshCase)
  (h : r ≠ RefreshCase.valid) :
  refreshTransition r = AuthState.unauthenticated := by
  cases r <;> simp [refreshTransition] at *

Lean4 is not a semantic oracle. It checks only formalized definitions and theorems. The LLM may propose those definitions, but deterministic tooling must check them.
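For contrast with the extrapolation theorem above, a per-edge source/target obligation is much smaller. The Lean4 sketch below shows one plausible shape for such an obligation. The type names, the Edge structure, and the theorem name are assumptions; the actual contents of a generated MDDObligations.lean file are not specified in this paper.

```lean
-- Hypothetical finite state type for a three-state fragment of the auth graph.
inductive State where
  | tokenExpired | authenticated | unauthenticated
  deriving DecidableEq, Repr

-- An edge records only its endpoints in this sketch.
structure Edge where
  source : State
  target : State

def refreshInvalidLogout : Edge :=
  { source := State.tokenExpired, target := State.unauthenticated }

-- Source/target soundness: the definition has exactly the declared endpoints.
theorem refreshInvalidLogout_sound :
    refreshInvalidLogout.source = State.tokenExpired ∧
    refreshInvalidLogout.target = State.unauthenticated :=
  ⟨rfl, rfl⟩
```

A theorem of this kind holds by unfolding the definition, which is why it confirms that the emitted definitions are internally consistent but says nothing about whether the edge is the right abstraction of the source.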

9. AWS-Derived Principles

The AWS paper “How Amazon Web Services Uses Formal Methods” (Newcombe et al., CACM 2015, https://cacm.acm.org/research/how-amazon-web-services-uses-formal-methods/) teaches that formal methods work in industry when framed as debugging designs.

AWS found that prose, diagrams, reviews, tests, and fault injection were necessary but not sufficient. Subtle bugs often lived in the design itself and appeared only under rare interleavings of failures, retries, crashes, recoveries, message reorderings, and operator actions.

MDD adopts this lesson:

MDD is not diagram drawing.
MDD is source-backed, exhaustively testable design debugging for agentic systems.

9.1 Start with “what must go right?”

Before drawing the happy path, MDD prompts should ask for:

safety invariants
liveness candidates
forbidden transitions
recovery obligations
environment assumptions
source provenance for each claim

Prompt block:

Before producing Mermaid, identify what must go right.
List safety properties, liveness candidates, forbidden transitions, and assumptions.
Mark every item as source_explicit, source_inferred, llm_hypothesis, or human_confirmed.

9.2 Model the environment with the system

A model containing only the application happy path is misleading. Environment actions must be first-class.

For agentic software, the environment includes:

network_timeout
message_loss
message_duplicate
message_reorder
process_crash
process_restart
external_service_error
operator_action
user_correction
permission_denial
tool_failure
malformed_tool_output
stale_memory
conflicting_instruction
llm_invalid_intent
retry/backoff
repair/recovery

Prompt block:

Expand the graph with environment actions.
Do not model only the happy path.
For each environment action, specify actor, trigger, affected state, guard, postcondition, and provenance.

9.3 Prefer small bounded dangerous models

AWS found serious bugs in relatively small models. MDD should not reproduce every implementation detail. It should capture the essential state space where correctness can fail.

Prompt block:

Build the smallest finite model that can expose the dangerous behavior.
Omit implementation details unless they affect a stated property.
State every abstraction decision and omission.

9.4 Counterexamples are product artifacts

MDD should translate verifier results into useful artifacts:

TLA+/Z3 counterexample
  → Mermaid sequence/state trace
  → violated-property explanation
  → graph repair proposal
  → regression test
  → runtime assertion
  → updated bundle after re-verification

Counterexamples should be classified as:

product_design_bug
graph_bug
formalization_bug
abstraction_gap
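The first step of that pipeline, rendering a verifier trace back into Mermaid, can be sketched in Python. The function name and the choice to record the classification as a Mermaid comment are assumptions; the output uses only the node and edge forms from the strict subset described in Section 3, and the four classification labels are the ones listed above.

```python
CLASSIFICATIONS = {"product_design_bug", "graph_bug", "formalization_bug", "abstraction_gap"}


def trace_to_mermaid(path, events, classification):
    """Render a state path and its events as a strict-subset Mermaid flowchart."""
    if classification not in CLASSIFICATIONS:
        raise ValueError(f"unknown classification: {classification}")
    if len(events) != len(path) - 1:
        raise ValueError("a path of n states needs exactly n - 1 events")
    lines = ["flowchart TD", f"%% classification: {classification}"]
    # dict.fromkeys keeps first-seen order while dropping repeated states.
    for node in dict.fromkeys(path):
        lines.append(f"{node}[{node}]")
    for src, event, dst in zip(path, events, path[1:]):
        lines.append(f"{src} -- {event} --> {dst}")
    return "\n".join(lines)
```

Emitting the trace in the same subset the compiler accepts means a counterexample diagram can be reviewed, diffed, and recompiled like any other graph.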

9.5 Generate tests and assertions

Formal models do not prove the implementation matches the model. They do help generate better assertions and edge-case tests.

MDD should emit:

tests from Z3 witnesses
tests from TLA+ counterexamples
tests from coverage gaps
assertions from verified invariants
runtime checks for transition legality

9.6 Know the scope

MDD is strongest for:

state validity
path validity
forbidden transitions
workflow correctness
recovery logic
agent traversal constraints
concurrency/interleaving risks
semantic edge-case coverage
source-backed design documentation

MDD is weaker unless separately modeled:

emergent performance collapse
queueing/load dynamics
hard real-time latency
probabilistic availability
economic/adversarial incentives
implementation correctness without tests/assertions/instrumentation

10. Reusable LLM Prompt Preamble

You are assisting with Mermaid Diagram Driven Development (MDD).
Your job is to convert source material into auditable graph semantics that can be checked by deterministic tools.
Do not treat your output as proof. Your output is a semantic proposal that must preserve provenance and be compiled into typed MDD-IR, then checked by TLA+, Z3, Lean4, or deterministic CLI logic.

Prefer small bounded models that capture dangerous state/path behavior over large implementation-level models.
Always model the system together with its environment: failures, retries, crashes, restarts, external services, operator actions, user corrections, tool errors, and LLM decision points.
Start from “what must go right?” before enumerating “what might go wrong?”

Extraction prompt:

Given the source material, extract an MDD graph proposal.
Return:
1. Mermaid diagram with stable node/edge IDs.
2. Node table with type, label, terminal flag, and source provenance.
3. Edge table with event, actor, guard, postcondition, and source provenance.
4. Environment actions.
5. Safety invariants.
6. Liveness candidates and fairness assumptions.
7. Forbidden transitions.
8. Finite semantic domains.
9. Ambiguities and assumptions.
10. Suggested TLA+/Z3/Lean obligations.

Rules:
- Do not invent unsupported requirements.
- Mark unsupported proposals as llm_hypothesis.
- Prefer finite domains and explicit guards over vague prose.
- Model the environment together with the system.
- Mermaid is audit syntax, not proof.

Runtime prompt for agents using mdd:

You are an LLM agent operating under MDD runtime constraints.
Use the verified .mddbundle as the authority for legal traversal.
You may propose an intent, but mdd determines whether the transition is allowed.
If mdd rejects a transition, do not bypass it. Explain the rejection, inspect allowed alternatives, and choose a legal next step or ask for human review.

Before each action:
1. State current graph node/state.
2. State intended transition and why.
3. Ask mdd whether the transition is allowed.
4. If allowed, proceed and record trace.
5. If rejected, use the rejection reason and choose a legal path.

11. mdd-engine

mdd-engine is the heavyweight compiler/verifier.

Responsibilities:

ingest source docs, code, tickets, and diagrams
deterministically ingest existing .mmd / .mermaid files and folders with manifests
call LLMs for semantic extraction
generate Mermaid audit graphs
enforce provenance requirements
parse Mermaid deterministically
compile typed MDD-IR
generate TLA+, Z3, and Lean4 artifacts
run model checking and solver queries
collect counterexamples and witnesses
render counterexamples back into Mermaid
emit tests and assertions
sign and package .mddbundle

Example:

mdd-engine ingest docs/auth.md --out graphs/auth.mmd
# Or ingest existing Mermaid files/folders without LLM generation:
mdd-engine ingest graphs --out build/ingested
mdd-engine compile graphs/auth.mmd --out build/auth.ir.json
mdd-engine verify build/auth.ir.json \
  --tla --tla-check --tla-out-dir build/tla --tla2tools-jar /path/to/tla2tools.jar \
  --z3 --z3-start TokenExpired --z3-target Authenticated --z3-bound 1 --z3-expect reachable \
  --lean --lean-check --lean-out-dir build/lean --lean-bin /path/to/lean \
  --out build/auth.verification.json
mdd-engine semantic-index build/auth.ir.json \
  --mapping build/auth-semantic-mapping.json \
  --out build/auth-semantic-index.json
mdd-engine bundle build/auth.ir.json \
  --verification build/auth.verification.json \
  --semantic-index build/auth-semantic-index.json \
  --out dist/auth.mddbundle
mdd-engine sign dist/auth.mddbundle --cert dist/auth.mddcert

For existing Mermaid sources, mdd-engine ingest is deterministic: file ingest copies the .mmd/.mermaid source, writes a compiled IR next to it, and writes an adjacent mdd-ingest-manifest/v0; folder ingest preserves relative paths under OUT/diagrams/, writes IRs under OUT/ir/<relative>.ir.json, and records every diagram in OUT/mdd-ingest-manifest.json. Non-Mermaid source files keep the Qwen-backed generation path.

The output is a proof-carrying bundle containing compiled graph data, verification metadata, transition tables, path constraints, provenance, tests, assertions, and optional audit artifacts.

12. mdd

mdd is the lightweight runtime traversal CLI. It is not a full verifier. It consumes already-verified bundles and enforces deterministic traversal.

Responsibilities:

load .mddbundle
validate bundle hash/signature
expose current states, nodes, edges, and allowed actions
expose one unified semantic/logical operating view for coding agents
check proposed transitions
reject invalid paths
explain rejection reasons
record runtime traces
optionally run bounded Z3 queries if available and certified
resolve approved fuzzy utterances through bundled semantic indexes without inventing transitions
return machine-readable JSON to agents

Example commands:

mdd view dist/auth.mddbundle \
  --state TokenExpired

mdd state dist/auth.mddbundle

mdd next dist/auth.mddbundle \
  --state TokenExpired

mdd check-transition dist/auth.mddbundle \
  --from TokenExpired \
  --to Authenticated \
  --event refresh_token_valid

mdd explain dist/auth.mddbundle \
  --edge refresh_invalid_logout

mdd resolve dist/auth.mddbundle \
  --state TokenExpired \
  --utterance "renew session"

mdd trace dist/auth.mddbundle \
  --input runtime-trace.jsonl

Runtime rule:

The LLM proposes intent.
mdd computes legal transitions.
The LLM chooses among legal transitions.
mdd records the trace.
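The rule above can be sketched as a lookup over a finite transition table. This is an illustrative model only: the Edge type, the result fields, the edge id refresh_valid_reauth, and the event refresh_token_invalid are hypothetical and are not taken from the mdd bundle format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    edge_id: str
    source: str
    target: str
    event: str


# Hypothetical transition table. refresh_valid_reauth and refresh_token_invalid
# are illustrative names, not identifiers defined by the paper.
EDGES = [
    Edge("refresh_valid_reauth", "TokenExpired", "Authenticated", "refresh_token_valid"),
    Edge("refresh_invalid_logout", "TokenExpired", "Unauthenticated", "refresh_token_invalid"),
]


def legal_transitions(edges, state):
    """Return every edge whose source is the current state."""
    return [e for e in edges if e.source == state]


def check_transition(edges, source, target, event, trace):
    """Approve or reject a proposed transition and append the decision to trace."""
    legal = legal_transitions(edges, source)
    match = next((e for e in legal if e.target == target and e.event == event), None)
    if match is not None:
        result = {"allowed": True, "edge": match.edge_id}
    else:
        result = {
            "allowed": False,
            "reason": f"no edge {source} -> {target} on event {event}",
            "allowed_alternatives": [e.edge_id for e in legal],
        }
    # Rejections are recorded too, so the trace shows what the agent attempted.
    trace.append({"from": source, "to": target, "event": event, **result})
    return result


trace = []
ok = check_transition(EDGES, "TokenExpired", "Authenticated", "refresh_token_valid", trace)
bad = check_transition(EDGES, "TokenExpired", "Authenticated", "refresh_token_invalid", trace)
```

The rejected proposal carries the legal alternatives in its result, so an agent can choose among them rather than retrying the invalid path.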

If mdd rejects a transition, an agent must not bypass it. It should inspect allowed alternatives, explain the rejection, repair the graph through mdd-engine, or ask for human review.

mdd view is the preferred agent entry point when the agent needs one JSON object containing current state, available operations, verification level, edge approval status, and approved aliases.

Semantic fuzziness is deterministic and fail-closed: mdd-engine semantic-index accepts only approved mappings that cite human approval or named/versioned external policies such as translation, abbreviation, acronym, domain glossary, or controlled vocabulary. Runtime mdd resolve checks exact legal edges first, then uses the bundled semantic-index.json to select among currently legal outgoing transitions only; ambiguity returns candidates instead of choosing arbitrarily.
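The resolution order described above (exact legal edges first, then approved aliases restricted to legal outgoing edges, with ambiguity returned rather than guessed) might look like the following sketch. The data shapes, the status values, and the alias entries are assumptions made for illustration; the real semantic-index.json schema is not specified here.

```python
def _decide(candidates, via):
    """Resolve only when exactly one candidate remains; never pick arbitrarily."""
    if len(candidates) == 1:
        return {"status": "resolved", "edge": candidates[0], "via": via}
    if candidates:
        return {"status": "ambiguous", "candidates": candidates, "via": via}
    return {"status": "unresolved", "candidates": [], "via": via}


def resolve(edges, semantic_index, state, utterance):
    """Map an utterance to one currently legal outgoing edge, or fail closed."""
    legal = [e for e in edges if e["from"] == state]

    # Step 1: exact match against ids and events of currently legal edges.
    exact = [e["id"] for e in legal if utterance in (e["id"], e["event"])]
    if exact:
        return _decide(exact, "exact")

    # Step 2: approved aliases only, filtered to currently legal edges.
    approved = semantic_index.get(utterance.strip().lower(), [])
    candidates = [e["id"] for e in legal if e["id"] in approved]
    return _decide(candidates, "semantic-index")


# Hypothetical bundle contents; only the state names, refresh_token_valid,
# refresh_invalid_logout, and "renew session" appear in the paper.
EDGES = [
    {"id": "refresh_valid_reauth", "from": "TokenExpired",
     "to": "Authenticated", "event": "refresh_token_valid"},
    {"id": "refresh_invalid_logout", "from": "TokenExpired",
     "to": "Unauthenticated", "event": "refresh_token_invalid"},
]
INDEX = {
    "renew session": ["refresh_valid_reauth"],
    "refresh": ["refresh_valid_reauth", "refresh_invalid_logout"],
}

renew = resolve(EDGES, INDEX, "TokenExpired", "renew session")
vague = resolve(EDGES, INDEX, "TokenExpired", "refresh")
```

An alias never introduces a transition: if the aliased edge is not legal from the current state, the result is unresolved even though the alias itself is approved.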

13. `.mddbundle`

A .mddbundle is the portable verified artifact emitted by mdd-engine and consumed by mdd.

14. Proof-Carrying Semantic Coverage

MDD introduces proof-carrying semantic coverage: each meaningful edge case is tracked from source text through graph representation, formal checks, generated tests, and runtime traces.

Example:

{
  "semantic_case": "refresh_case = replayed",
  "source": "docs/auth.md:52-55",
  "mermaid_edge": "refresh_invalid_logout",
  "tla_status": "safe",
  "z3_witness_path": ["Authenticated", "TokenExpired", "Unauthenticated"],
  "lean_theorem": "non_valid_refresh_logs_out",
  "generated_test": "refresh_replayed_logs_out.test.ts",
  "runtime_assertion": "no_authenticated_after_replayed_refresh",
  "status": "covered"
}

This is stronger than ordinary code coverage. It connects meaning to source, verification, tests, and runtime behavior.
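One way to make such a record checkable is to require that every link in the chain is present before the case counts as covered. The sketch below assumes that rule; the REQUIRED_LINKS tuple and the "incomplete" status are hypothetical and are not part of a published MDD schema.

```python
# Links that must all be present for a semantic case to count as covered
# (an assumed rule for illustration).
REQUIRED_LINKS = (
    "source",
    "mermaid_edge",
    "tla_status",
    "z3_witness_path",
    "lean_theorem",
    "generated_test",
    "runtime_assertion",
)


def missing_links(record):
    """Return the required links that are absent or empty in a coverage record."""
    return [key for key in REQUIRED_LINKS if not record.get(key)]


def coverage_status(record):
    """A case is covered only when the whole source-to-runtime chain is linked."""
    return "covered" if not missing_links(record) else "incomplete"


record = {
    "semantic_case": "refresh_case = replayed",
    "source": "docs/auth.md:52-55",
    "mermaid_edge": "refresh_invalid_logout",
    "tla_status": "safe",
    "z3_witness_path": ["Authenticated", "TokenExpired", "Unauthenticated"],
    "lean_theorem": "non_valid_refresh_logs_out",
    "generated_test": "refresh_replayed_logs_out.test.ts",
    "runtime_assertion": "no_authenticated_after_replayed_refresh",
}

# The same case with its generated test removed is no longer covered.
without_test = {k: v for k, v in record.items() if k != "generated_test"}
```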

15. Related Work and Differentiation

MDD sits at the intersection of several existing areas.

Spec-driven AI development

Projects such as GitHub Spec Kit, OpenSpec, Kiro Specs, and spec-workflow-mcp validate the demand for structured AI-native development. They organize requirements, designs, and tasks, but generally remain Markdown/workflow systems rather than graph-compiled formal verification systems.

MDD difference:

Spec tools organize intent.
MDD verifies and executes intent as state/path constraints.

Agent graph runtimes

LangGraph, XState, and Stately Agent show the value of graph/state-machine control for agents and applications.

MDD difference:

Graph runtimes execute workflows.
MDD verifies source-provenanced semantic graphs before agents traverse them.

Formal methods

TLA+, Apalache, P, FizzBee, Alloy, Lean4, and related tools provide the verification backbone. AWS’s industrial use of TLA+ demonstrates that formal design models find important bugs missed by conventional methods.

MDD difference:

Formal tools require formal authoring.
MDD provides a source-doc → Mermaid → MDD-IR → formal backend → runtime bundle pipeline.

LLMs for formalization

Research on LLM-assisted formal specification, TLA+ proof generation, Lean theorem proving, and proof-carrying code completions shows that LLMs can help produce formal artifacts.

MDD difference:

LLM formalization work often targets specs/proofs directly.
MDD inserts an auditable graph layer and produces runtime traversal constraints for agents.

GraphRAG and knowledge graphs

GraphRAG extracts document graphs for retrieval and synthesis.

MDD difference:

GraphRAG answers questions from document graphs.
MDD verifies and executes state/path semantics from document graphs.

16. Contribution Claims

MDD contributes:

Source-provenanced Mermaid abstraction: a human-auditable layer between natural language and formal models.
Typed MDD-IR: a canonical graph representation with guards, domains, invariants, coverage obligations, and provenance.
Multi-backend formal verification: TLA+ for state integrity, Z3 for path witnesses, Lean4 for semantic extrapolation.
Proof-carrying semantic coverage: edge cases linked from source text to verification, tests, assertions, and runtime traces.
Engine/runtime split: mdd-engine performs expensive verification; mdd enforces lightweight deterministic traversal.
Agent traversal protocol: LLMs may propose intent, but deterministic runtime logic approves or rejects transitions.
Environment-aware modeling discipline: failures, retries, tool errors, user corrections, operator actions, and LLM invalid intents are modeled with the system.

Novelty statement:

Unlike spec-driven development tools, MDD does not stop at structured Markdown artifacts. Unlike graph runtimes, MDD does not merely execute workflows. Unlike formal methods tools, MDD does not require engineers to directly author raw formal specifications. MDD introduces a source-provenanced Mermaid layer and typed graph IR that compile into formal backends and lightweight runtime bundles for AI agents.

17. Limitations

MDD has important limits:

A verified model is not the real system.
Source documents may be incomplete, stale, contradictory, or wrong.
Source-linked semantic review can find unsupported abstractions, but Qwen review evidence is not deterministic proof.
LLM extraction may mislabel semantics.
Formal checks only cover encoded assumptions and bounded domains.
Lean4 checks formalized definitions, not informal intent.
mdd verifies transitions under a bundle, not global reality.
Implementation correctness still needs code review, tests, assertions, monitoring, and integration discipline.
Performance collapse, queueing dynamics, probabilistic availability, and economic/adversarial behavior must be explicitly modeled or marked out of scope.

MDD’s mature claim is:

MDD verifies design-level state/path semantics and constrains agent traversal under declared assumptions.

Not:

MDD proves the entire software system correct.

18. Implementation Roadmap

Phase 1: Mermaid ingestion and deterministic parsing
Phase 2: typed MDD-IR schema
Phase 3: source manifests, source maps, source hash/selector checks, and stale-source downgrades
Phase 4: Qwen-backed source-map review for node/edge/guard/refinement abstraction correctness
Phase 5: provenance and audit UI
Phase 6: TLA+ backend for state integrity
Phase 7: Z3 backend for path witnesses
Phase 8: Lean4 proof obligations for semantic extrapolation
Phase 9: .mddbundle packaging, source-review manifests, and signatures
Phase 10: lightweight mdd runtime traversal
Phase 11: counterexample-to-Mermaid rendering
Phase 12: generated tests, assertions, and CI integration

CI gate example:

mdd-engine ci

It should fail if:

Mermaid cannot parse
graph elements lack provenance or source links
linked source hashes are stale or selectors no longer resolve
source-review findings contradict a Gold edge
MDD-IR schema validation fails
TLA+ invariants fail
forbidden paths are satisfiable (Z3 returns SAT)
required coverage obligations lack witnesses
Lean4 proof obligations fail
generated bundle is stale relative to source hashes
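The failure conditions above can be read as a fail-closed aggregation over per-check findings. The sketch below is one possible shape, not the behavior of mdd-engine ci: the report keys are hypothetical, and treating an unreported check as a failure is an assumption.

```python
# Hypothetical report keys paired with the failure conditions listed above.
GATES = [
    ("mermaid_parse_errors", "Mermaid cannot parse"),
    ("missing_provenance", "graph elements lack provenance or source links"),
    ("stale_sources", "linked source hashes are stale or selectors no longer resolve"),
    ("gold_edge_contradictions", "source-review findings contradict a Gold edge"),
    ("ir_schema_errors", "MDD-IR schema validation fails"),
    ("tla_invariant_failures", "TLA+ invariants fail"),
    ("forbidden_paths_sat", "forbidden paths are SAT"),
    ("coverage_without_witness", "required coverage obligations lack witnesses"),
    ("lean_failures", "Lean4 proof obligations fail"),
    ("stale_bundle", "generated bundle is stale relative to source hashes"),
]


def failed_gates(report):
    """Return one message per failing gate; each report value is a list of findings."""
    failures = []
    for key, description in GATES:
        if key not in report:
            # Assumption: a check that did not run must not count as a pass.
            failures.append(f"{description} (check not reported)")
        elif report[key]:
            failures.append(f"{description}: {len(report[key])} finding(s)")
    return failures


def exit_code(report):
    """Non-zero exit status when any gate fails, as a CI step would expect."""
    return 1 if failed_gates(report) else 0


clean = {key: [] for key, _ in GATES}
dirty = dict(clean, forbidden_paths_sat=[["TokenExpired", "Authenticated"]])
```

With this shape, an empty report fails all ten gates, so a misconfigured pipeline cannot pass by silently skipping a backend.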

19. Conclusion

MDD transforms ambiguous prose into auditable graphs, graphs into typed formal models, models into verified tests and obligations, verified artifacts into runtime bundles, and runtime traversal into logic-guided agent behavior.

The essential idea is:

mdd-engine compiles proof-carrying Mermaid diagrams.
mdd lets LLM agents traverse those diagrams without violating verified state/path constraints.

MDD is best understood as:

Mermaid-based, source-provenanced, exhaustively testable design debugging for AI-assisted software engineering.

The goal is not prettier diagrams. The goal is to find design bugs that prose, reviews, tests, and ordinary diagrams miss, and then to give agents a verified design space they can traverse without inventing invalid paths.
