Durable Execution — Martha, Temporal, CloudEvents, submissions
Atelier's execution plane ("what can happen") splits durable work into two tiers that meet in this subsystem. The BFF runs its own Temporal worker that executes exactly one workflow — NotificationWorkflow on the admin-notifications queue — and offloads everything else durable to a shared Martha/Temporal backend over a fire-and-forget CloudEvents bridge. The durable-create primitive is the submission: a thin, vertical-agnostic ledger row written the instant an operator-declared execution_mode: workflow action is accepted, so the 202 is durable before any entity exists — even if the ontology or Martha is down. Martha's one generic submission_create workflow later calls back to materialise the real entity. This is how a declared action becomes a durable, retryable, human-in-the-loop process without any vertical-specific BFF code, which is the whole point: operators declare the interesting 20% (the contract), the platform provides the durable 80%.
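The accept-then-callback lifecycle described above can be sketched with in-memory stand-ins. This is a minimal illustration, not the BFF's code: `accept_submission`, `create_entity_callback`, `LEDGER` and `ENTITIES` are hypothetical names, and the status check only demonstrates idempotent re-entry (exactly-once creation is the workflow id's job, not the ledger's).

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Submission:
    """One thin ledger row; field names follow the data model table."""
    id: str
    entity_type: str
    payload: dict
    status: str = "accepted"
    entity_id: Optional[str] = None


LEDGER: dict[str, Submission] = {}  # hypothetical stand-in for the BFF-postgres ledger
ENTITIES: dict[str, dict] = {}      # hypothetical stand-in for the ontology


def accept_submission(entity_type: str, body: dict, tenant_id: str) -> tuple[int, str]:
    """Write the ledger row first, then answer 202. No entity exists yet."""
    row = Submission(
        id=str(uuid.uuid4()),
        entity_type=entity_type,
        payload={"body": body, "metadata": {"tenant_id": tenant_id}},
    )
    LEDGER[row.id] = row
    return 202, row.id


def create_entity_callback(submission_id: str) -> str:
    """Materialise the entity later; re-entry on a created row creates nothing new."""
    row = LEDGER[submission_id]
    if row.status == "created" and row.entity_id is not None:
        return row.entity_id
    row.status = "creating"
    entity_id = f"{row.entity_type}-{len(ENTITIES) + 1}"
    ENTITIES[entity_id] = row.payload["body"]
    row.status, row.entity_id = "created", entity_id
    return entity_id
```

Calling `create_entity_callback` twice with the same submission id returns the same entity id and leaves a single entity behind, which is the retry-safe behaviour the callback is expected to have.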
Data model
| Name | Purpose | Plane | Key fields |
|---|---|---|---|
| action_submissions (ledger) | One thin durable row per accepted submission: accept-when-ontology-down, tracker-by-reference, duplicate quarantine. Exactly-once creation is not its job. | BFF-postgres | id, application_code, entity_type, source, status (accepted / creating / created / rejected_duplicate / failed), payload (jsonb: body + reporter + metadata.tenant_id + metadata.submission_contract), duplicate_result, entity_id, correlation_id, actor_id, error |
| action_type.submission_contract | Operator/seed JSON declaring the durable-create steps as data. Stashed onto the ledger row at accept time so the callback can't re-resolve the placement. | template/tenant | normalize, validate, duplicates, prepare_create (registered fn names), reference {field,scope,format,prefix_from,year_from}, attachments {entity_type,parent_field,file_field}, audit_policy {mode,actor_field} |
| notification_rule + notification_template | Subscribe to a notification_event; fan out the in-process NotificationWorkflow per matching rule; tenant-scoped with per-tenant precedence. | template/tenant | event__id, is_active, organization (NULL=global-within-tenant), channels, filter, template FK, recipient_rule FK, order |
| action_side_effect | Declared emits from an action (notification_event refs); fired at engine-execute or at submission accept time. | template/tenant | event_type, action_type_id |
| Martha Connection (per-tenant) | Holds the per-tenant martha-callback secret in credential_value; Martha attaches it as X-Martha-Service-Auth on callback. | runtime (Martha-owned, BFF-provisioned) | integration_name, name, auth_type=apikey, credential_value, scope=tenant, scope_ref |
| CloudEvent envelope | CloudEvents 1.0 structured JSON; the bridge contract to Martha triggers. Submission events use {app}.{entity}.submitted. | runtime | specversion, type, source, id, subject, time, tenantid (=settings.martha_tenant_id), data{...} |
| Temporal workflow id | The real exactly-once primitive. notification.{event}.{rule_id}.{entity}.{edit_event_id} in-process; submission_create dedup resolves from submission_id. | runtime | deterministic on replay/retry; doubles as the Resend idempotency_key |
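As a rough illustration of the last two rows, the envelope and the in-process workflow id could be built as follows. The field list and the id pattern come from the table; the function names, the `source` default, the use of the submission id as `subject`, and the shape of `data` are assumptions made for the sketch.

```python
import uuid
from datetime import datetime, timezone


def build_submitted_event(app: str, entity: str, submission_id: str,
                          tenant_id: str, source: str = "/admin-bff") -> dict:
    """CloudEvents 1.0 structured-JSON envelope for a {app}.{entity}.submitted event."""
    return {
        "specversion": "1.0",
        "type": f"{app}.{entity}.submitted",
        "source": source,                      # assumed value, not from the source doc
        "id": str(uuid.uuid4()),
        "subject": submission_id,              # assumption: subject carries the ledger id
        "time": datetime.now(timezone.utc).isoformat(),
        "tenantid": tenant_id,                 # settings.martha_tenant_id in the BFF
        "data": {"submission_id": submission_id},  # assumed minimal payload
    }


def notification_workflow_id(event: str, rule_id: str, entity: str,
                             edit_event_id: str) -> str:
    """Deterministic id: the same inputs always yield the same workflow id."""
    return f"notification.{event}.{rule_id}.{entity}.{edit_event_id}"
```

Because the id is a pure function of its inputs, a retried dispatch produces the same string, which is what lets it double as an idempotency key.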
How it's declared
Durable execution is authored as data on the action_type and its children, never as code. An operator declares a durable action by setting execution_mode: workflow and authoring its submission_contract JSON. The function-valued step keys (normalize, validate, duplicates, prepare_create) each name a registered function from app/actions/functions.py, never inline code; the remaining keys (reference, attachments, audit_policy) are plain configuration objects. One generic function serves every vertical via the {function, config} role shape at app/submissions/intake.py. Engine-mode actions are the synchronous sibling (see The Action Engine); open_phase is engine-mode, not a Martha workflow. Adding a durable side-effect emit is the 2-step operator path (CLAUDE.md #43): declare the emit via /settings/actions/<id> → Effects → New side effect (pick a notification_event), then author at least one notification_rule at /settings/notification-rules, with templates at /settings/notification-templates. The Martha workflow definitions themselves (the generic submission_create chain, the *.*.submitted trigger, the global callback function) are platform artifacts authored via Martha admin REST or the martha-cli (see the martha-cli skill) and kept as workflows/*.yaml + scripts/seed_martha.py. This is a decided design fork: they are deliberately not moved into sheet grammar because the binding is generic (one wildcard trigger serves every vertical).
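To make "steps as data" concrete, here is a minimal sketch of a contract and a dispatcher that resolves step names against a registry. Everything in it is hypothetical: the registry, the two step functions, the `(payload, config)` calling convention, and every value inside `CONTRACT` are invented for illustration and are not taken from app/actions/functions.py or app/submissions/intake.py. Only the key names and the {function, config} shape follow the text.

```python
from typing import Callable

StepFn = Callable[[dict, dict], dict]
REGISTRY: dict[str, StepFn] = {}  # hypothetical stand-in for the registered functions


def register(name: str) -> Callable[[StepFn], StepFn]:
    """Register a step function under the name a contract refers to."""
    def wrap(fn: StepFn) -> StepFn:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("strip_strings")
def strip_strings(payload: dict, config: dict) -> dict:
    return {k: v.strip() if isinstance(v, str) else v for k, v in payload.items()}


@register("require_fields")
def require_fields(payload: dict, config: dict) -> dict:
    missing = [f for f in config.get("fields", []) if not payload.get(f)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return payload


# Illustrative contract: function steps use the {function, config} role shape,
# while reference is a plain config object. All values are made up.
CONTRACT = {
    "normalize": {"function": "strip_strings", "config": {}},
    "validate": {"function": "require_fields", "config": {"fields": ["title"]}},
    "reference": {"field": "reference", "scope": "tenant", "format": "{prefix}-{year}-{seq}",
                  "prefix_from": "entity_type", "year_from": "created_at"},
}

FUNCTION_STEPS = ("normalize", "validate", "duplicates", "prepare_create")


def run_function_steps(contract: dict, payload: dict) -> dict:
    """Run each declared function step in order; undeclared steps are skipped."""
    for step in FUNCTION_STEPS:
        role = contract.get(step)
        if role is None:
            continue
        payload = REGISTRY[role["function"]](payload, role.get("config", {}))
    return payload
```

With this shape, adding behaviour for a new vertical means registering one more function and naming it in a contract; the dispatcher itself does not change.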
How it's provisioned
A tenant gets durable execution along two seams. (1) Config: submission_contract lives on the action_type, and notification_rule/notification_template/action_side_effect are template-plane rows copied and FK-rewired by the one-shot full-catalog fork from template_municipality (see Provisioning & the Fork), so a forked tenant's workflow-mode actions and notification fan-out work with zero special-case code. (2) Martha-side wiring: POST /tenants runs the tenant saga's MarthaStep (app/martha_client.py; an idempotent get-or-create across the Martha row, KC client, realm group and admin user) plus a VaultStep minting a per-tenant callback secret at secret/data/admin-bff/tenants/{code}/martha-callback, and a Martha Connection holding that secret. The provisioning client authenticates via Martha's super-admin password grant because Martha's require_super_admin rejects service accounts and demands a human identity. In practice this is still shared-Martha: the default martha_tenant_id (550e8400-..., app/config.py) and one global callback function are shared, and per-tenant Connections override only the credential, never the workflow or URL.
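The secret-and-Connection half of that wiring might look roughly like this. It is a sketch under stated assumptions: `VAULT` is an in-memory stand-in for Vault, the get-or-create behaviour of the secret step is assumed (the text only states it for MarthaStep), and the `integration_name`, `name` and `scope_ref` values are invented. The path template and the Connection field names are the ones given in this document.

```python
import secrets

VAULT: dict[str, str] = {}  # hypothetical in-memory stand-in for Vault


def callback_secret_path(tenant_code: str) -> str:
    """Per-tenant path for the martha-callback secret."""
    return f"secret/data/admin-bff/tenants/{tenant_code}/martha-callback"


def get_or_create_callback_secret(tenant_code: str) -> str:
    """Mint the secret once; re-running provisioning returns the same value."""
    path = callback_secret_path(tenant_code)
    if path not in VAULT:
        VAULT[path] = secrets.token_urlsafe(32)
    return VAULT[path]


def connection_for(tenant_code: str) -> dict:
    """Shape of the per-tenant Martha Connection that holds the secret."""
    return {
        "integration_name": "admin-bff",            # assumed value
        "name": f"martha-callback-{tenant_code}",   # assumed naming scheme
        "auth_type": "apikey",
        "credential_value": get_or_create_callback_secret(tenant_code),
        "scope": "tenant",
        "scope_ref": tenant_code,                   # assumption: scoped by tenant code
    }
```

Making the mint idempotent means a re-run saga does not rotate a secret that an existing Connection already depends on.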
Extension points (the growth vector)
- New durable-create vertical: declare an `action_type` with `execution_mode=workflow` + a `submission_contract`. The generic intake + the `*.*.submitted` trigger + the generic create-entity callback handle it with zero new BFF code (the single-vertical seam is dead).
- New contract-step behavior: register a function in `app/actions/functions.py`, reference it by name in the contract.
- New durable notification: author an `action_side_effect` + `notification_rule`; engine actions fire it post-execute, submission actions at accept time.
- New Martha-reactive workflow: author the trigger + workflow in Martha keyed off an existing/new CloudEvent type; the BFF only emits the event.
- New in-process durable workflow: register it in `worker.py`'s `Worker(workflows=[...])` and start it from a dispatcher. Note that the BFF deliberately runs only `NotificationWorkflow`; heavier orchestration belongs in Martha.
- Harden callback auth per-tenant: flip `martha_require_per_tenant_secret=True` once per-tenant Vault secrets + Connections genuinely exist (dormant today).
Invariants
- CloudEvent emission is fire-and-forget and never raises. Verified: `emit_cloud_event` wraps the POST in `try/except Exception: logger.exception(...)` (`app/actions/cloudevents.py`). A Martha/webhook outage must not fail an action.
- The ledger row (the 202) is durable before any entity create; accept-when-ontology-down is a core property.
- Exactly-once creation belongs to the Temporal/workflow id (the dedup_key), not the ledger status guard. `transition_submission_to_creating`'s own docstring disclaims being a claim protocol (`app/submissions/ledger.py`); it exists for observability + idempotent re-entry.
- A callback's tenant is bound to the ledger row's `metadata.tenant_id`, never the caller-supplied `X-Tenant-Id`; a tenant-less row is refused 422 (#390, verified at `app/routers/submissions.py` `create_entity_from_submission_impl`).
- The callback is retry-safe: re-entry on a `created` row only re-runs idempotent post-create effects (verified, same function).
- Owner/actor fields are identity-derived from the validated token only, never the body; `is_service` identities are transport, never reporters.
- Dispatcher and worker agree on the task queue by construction: both read `settings.temporal_task_queue` (verified: `worker.py` registers `NotificationWorkflow` + 5 activities on it).
- Notification-rule lookup must be tenant-scoped; without `X-Tenant-Id` the ontology returns every tenant's NULL-org global rules and one event fans out via every tenant's rules (live-caught).
- CloudEvent data never carries secret-shaped keys (`_drop_secret_fields` guards every builder); service-auth compare is constant-time.
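The first and last invariants above can be illustrated with a stdlib-only sketch. This is not the code in `app/actions/cloudevents.py`: `emit_event`, `drop_secret_keys`, `service_auth_matches` and the `SECRET_MARKERS` list are hypothetical, and `urllib` stands in for whatever HTTP client the BFF actually uses. The 5-second timeout matches the emit timeout stated in this document.

```python
import hmac
import json
import logging
import urllib.request

logger = logging.getLogger(__name__)

SECRET_MARKERS = ("secret", "password", "token", "api_key")  # assumed key patterns


def drop_secret_keys(data: dict) -> dict:
    """Remove top-level keys whose names look like credentials."""
    return {k: v for k, v in data.items()
            if not any(marker in k.lower() for marker in SECRET_MARKERS)}


def emit_event(url: str, event: dict, timeout: float = 5.0) -> bool:
    """Fire-and-forget POST: every failure is logged and swallowed, never raised."""
    try:
        safe = {**event, "data": drop_secret_keys(event.get("data", {}))}
        request = urllib.request.Request(
            url,
            data=json.dumps(safe).encode("utf-8"),
            headers={"Content-Type": "application/cloudevents+json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except Exception:
        logger.exception("cloud event emit failed; the action continues")
        return False


def service_auth_matches(presented: str, expected: str) -> bool:
    """Constant-time comparison of a presented service-auth value."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

The boolean return is a convenience of the sketch; the property that matters is that a caller never sees an exception, so an outage on the receiving side cannot fail the action that triggered the emit.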
Gaps & open edges
- Shared-Martha, not per-tenant. One `martha_tenant_id` and one global callback function span all tenants; per-tenant Connections override only the credential. True per-tenant durable isolation does not exist today.
- Per-tenant callback-secret matching is dormant. #441 built Vault per-tenant secrets; #390 stripped it because under shared Martha the global function sends one static `auth.value`, making matching structurally unreachable. `martha_require_per_tenant_secret` is default-False dead config.
- No human-in-the-loop primitive locally. Only the ledger states + the duplicate-quarantine `force_submit` modal exist; longer human-pause workflows live entirely inside Martha, undocumented here.
- Best-effort offload, no outbox. The BFF is not a general durable-orchestration host: a swallowed `emit_cloud_event` can silently drop a `submitted` event, so the ledger row persists but the create is never triggered. Emit timeout is 5s with no retry/backoff.
- Per-tenant queues on dev machines (e.g. `admin-notifications-valongo-c`): a default-queue worker won't drain a tenant's queue, so notifications silently stick if the matching worker isn't running.
- Single-secret service-auth in steady state, with no rotation overlap window.
- Occurrence durable-create stays blocked on fork-fidelity: a forked `action_placement.target_config` pins a template `occurrence_type` UUID the fork doesn't rewire; only the civic path is fork-clean.
- Stale-doc correction: the prior worker-topology note's "4 activities / `record_deliveries` not registered" is wrong. `worker.py` imports and registers 5 activities including `record_deliveries`.