commit aa10881dc18e29131a9aef0b79c0302d9f4b0c89
parent 96cacd24bd851bd39908c7d8aebabca08528df86
Author: Michael Camilleri <[email protected]>
Date: Tue, 16 Jun 2026 12:23:59 +0900
Add development notes
Diffstat:
| M | .gitignore | | | 1 | - |
| A | Notes/AppServices.md | | | 157 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| A | Notes/Engagements.md | | | 89 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| A | Notes/FriendColours.md | | | 90 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| A | Notes/GameView.md | | | 194 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| A | Notes/Journal.md | | | 189 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| A | Notes/MacVersion.md | | | 126 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| A | Notes/PresenceTracking.md | | | 92 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| A | Notes/PushWorker.md | | | 214 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| A | Notes/ReadHorizons.md | | | 165 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| A | Notes/ServerSecurity.md | | | 237 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| A | Notes/Snapshots.md | | | 143 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| A | Notes/SyncRedesign.md | | | 186 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
13 files changed, 1882 insertions(+), 1 deletion(-)
diff --git a/.gitignore b/.gitignore
@@ -10,6 +10,5 @@ CLAUDE.md
cookies.txt
Generated/
examples/
-notes/
private_keys/
symlinks/
diff --git a/Notes/AppServices.md b/Notes/AppServices.md
@@ -0,0 +1,157 @@
+# AppServices Architecture Note
+
+I would not try to "remove the god object" by scattering construction
+everywhere. `AppServices` is doing something valuable: it gives the app one
+obvious ownership point for long-lived services, keeps SwiftUI environment setup
+simple, and makes startup ordering explicit. Those are real advantages.
+
+The thing I would change is not the centralisation itself, but the number of
+responsibilities inside that central class.
+
+Right now `AppServices` is both:
+
+1. The composition root: constructs `PersistenceController`, `GameStore`,
+ `SyncEngine`, `NYTAuthService`, etc.
+2. The lifecycle coordinator: handles startup, foreground/background sync,
+ remote notifications, share acceptance, reset, and import URLs.
+3. The wiring layer: connects `GameStore` callbacks to `SyncEngine`, connects
+ `SyncEngine` callbacks back to `GameStore`, installs tracing, and refreshes
+ identity.
+4. A small service locator used by views.
+
+That is why it feels dense. The problem is less "many properties" and more
+"many behavior loops in one file."
+
+## Recommendation
+
+Keep `AppServices` as the composition root.
+
+I would still have something like:
+
+```swift
+@MainActor
+final class AppServices {
+ let persistence: PersistenceController
+ let store: GameStore
+ let sync: SyncCoordinator
+ let sharing: SharingCoordinator
+ let imports: ImportCoordinator
+ let nytAuth: NYTAuthService
+ let nytFetcher: NYTPuzzleFetcher
+ let driveMonitor: DriveMonitor
+}
+```
+
+So the app still has one root object. SwiftUI can still receive `services`.
+Tests and previews still have a single place to swap dependencies later.
+Startup is still discoverable.
+
+## Extract Orchestration By Domain
+
+The first extraction I would make is a `SyncCoordinator`.
+
+It would own the current sync lifecycle behavior:
+
+- `start(preferences:)`
+- `syncOnForeground()`
+- `syncOnBackground()`
+- `handleRemoteNotification()`
+- identity refresh
+- `SyncMonitor` tracing
+- `SyncEngine` callback wiring
+- `MoveBuffer` construction/wiring, if you want to group move flushing with sync
+
+That would remove the densest closure wiring from `AppServices` while preserving
+the existing behavior.
+
+Then `AppServices.start(...)` becomes mostly:
+
+```swift
+func start(appDelegate: AppDelegate, preferences: PlayerPreferences) async {
+ guard !started else { return }
+ started = true
+
+ nytAuth.loadStoredSession()
+ driveMonitor.start()
+
+ appDelegate.onRemoteNotification = { await self.sync.handleRemoteNotification() }
+ appDelegate.onAcceptShare = { await self.sharing.acceptShare(metadata: $0) }
+
+ await sync.start(preferences: preferences)
+}
+```
+
+That is a good shape for a composition root: it says what starts, not how every
+subsystem talks internally.
+
+## Extract CloudKit Share And Reset Operations
+
+`acceptShare(metadata:)`, `deleteAllPrivateZones()`, and `leaveAllSharedZones()`
+are a coherent CloudKit-sharing/admin concern. I would put those in something
+like `SharingCoordinator` or `CloudKitMaintenanceService`.
+
+That class would depend on:
+
+- `CKContainer`
+- `SyncEngine`
+- `SyncMonitor`
+- maybe `GameStore` for reset
+
+This keeps the dangerous operations, especially reset and zone deletion, out of
+the general app service bag.
+
+## Extract URL Import
+
+`handleOpenURL(_:)` is also separable. It coordinates:
+
+- file security scope
+- XD file loading
+- Drive import
+- duplicate game lookup
+- game creation
+
+That could become:
+
+```swift
+@MainActor
+final class GameImportService {
+ func importGame(from url: URL) -> UUID?
+}
+```
+
+This is a nice extraction because it has a clear input/output and little
+lifecycle complexity.
+
+## Avoid Over-Abstracting The Simple Services
+
+I would not introduce protocols everywhere yet. For example, I would not
+immediately create `GameStoreProtocol`, `SyncEngineProtocol`, `DriveMonitoring`,
+`PuzzleFetching`, etc. unless tests or previews need them. That would make the
+code feel more "architected" without reducing the actual complexity.
+
+I would also avoid splitting into tiny factories like `PersistenceFactory`,
+`SyncFactory`, or `AuthFactory`. The construction code is not the main problem.
+The callback graph is.
+
+## Target Shape
+
+A reasonable end state would be:
+
+- `AppServices`: owns and exposes app-wide services; starts coordinators.
+- `SyncCoordinator`: owns sync lifecycle, move buffering, sync callbacks,
+ identity, and monitoring.
+- `SharingCoordinator`: accepts shares and maybe handles CloudKit zone
+ cleanup/reset support.
+- `GameImportService`: handles `.xd` URLs and Drive import.
+- `PlayerNamePublisher`: stays as-is.
+- `GameStore`: still stores callback hooks, though over time those could become
+ a more explicit event sink.
+
+The important design rule: `AppServices` can know about everything, but
+everything should not require new business logic inside `AppServices`.
+
+So I would treat the current class as acceptable but past the point where new
+responsibilities should keep landing there. The next time you touch sync
+lifecycle, share acceptance, reset, or file import, extract that domain then. I
+would not do a big architecture-only rewrite unless you already have bugs around
+startup ordering or sync callbacks.
diff --git a/Notes/Engagements.md b/Notes/Engagements.md
@@ -0,0 +1,89 @@
+# Engagements Design Note
+
+## Context
+
+CloudKit push delivers updates in roughly 1-3 seconds. That is fine for durable
+game state, but it is too laggy for two live-play interactions:
+
+1. Collaborator cursor/selection movement.
+2. In-progress letter taps before the authoritative `Move` records land.
+
+An **engagement** is Crossmate's opportunistic low-latency channel for those
+live interactions. It is distinct from a play session and is never the source of
+truth. CloudKit remains authoritative.
+
+## Architecture
+
+- **Transport:** native iOS WebSockets using `URLSessionWebSocketTask`.
+- **Relay:** Cloudflare Worker + one Durable Object per room.
+- **Bootstrap:** `.hail` pings in the shared game zone carry room credentials.
+- **Payload:** existing `EngagementMessage` JSON envelopes: debug text, realtime
+ cell edits, and selection updates.
+- **Fallback:** any socket failure silently returns to CloudKit-only behavior.
+
+## Hail Payload
+
+`.hail` now carries a v2 room bootstrap payload:
+
+```json
+{
+ "ver": 2,
+ "roomID": "<uuid>",
+ "secret": "<base64url 256-bit secret>",
+ "createdAt": "<date>",
+ "expiresAt": "<date>"
+}
+```
+
+Only CloudKit share participants can read the hail. The secret is a room secret,
+not a durable account credential.
+
+## Room Auth
+
+The client connects to:
+
+```text
+/rooms/:roomID/socket
+```
+
+with:
+
+- `authorID`
+- `deviceID`
+- `timestamp`
+- `nonce`
+- `secret`
+- `signature = HMAC-SHA256(secret, roomID|authorID|deviceID|timestamp|nonce)`
+
+The Durable Object verifies the timestamp, rejects nonce replay, verifies the
+HMAC, and stores only a hash of the room secret. This first implementation sends
+the room secret during the WSS upgrade so the object can verify the HMAC; if that
+becomes uncomfortable, move auth into a short-lived token minting step.
+
+## Lifecycle
+
+1. Existing peer-presence logic notices another player in the puzzle.
+2. Deterministic initiator creates `roomID + secret`, writes a directed `.hail`,
+ and connects.
+3. Recipient receives the hail, validates addressee/expiry, connects, and
+ deletes the hail.
+4. `EngagementCoordinator` marks the game live only after socket open.
+5. Socket close clears transient engagement selection state and may retry via
+ the normal presence path.
+
+Coordinator states are intentionally small:
+
+- `idle`
+- `connecting`
+- `live`
+
+## Worker Duties
+
+- Authenticate socket upgrades.
+- Bind one Durable Object to each room ID.
+- Broadcast incoming messages to every other socket in the room.
+- Keep nonce replay state with a short TTL.
+- Expire rooms and close sockets after the room TTL.
+
+The Worker does not know puzzle contents, CloudKit tokens, or durable game
+state.
diff --git a/Notes/FriendColours.md b/Notes/FriendColours.md
@@ -0,0 +1,90 @@
+# Plan: Manually set a colour for a friend
+
+## Status
+
+Not implemented. This document records the research for a future pass —
+estimated at roughly a day of work, with no architectural blockers.
+
+## How friend colours work today
+
+A friend's colour is **derived, never stored**. The whole scheme rests on
+`PlayerColor.stableColor(forAuthorID:reserved:)` (`Models/PlayerColor.swift`),
+a pure function: FNV-1a hash of the `authorID` into the nine-colour palette,
+then a linear probe forward to the first colour that is more than
+`similarHueThreshold` (90°) of hue away from every reserved colour, degrading
+to exact-match avoidance and finally to the raw hashed colour when the
+palette is too crowded.
+
+Two consumers:
+
+1. **Avatars** — `FriendAvatar.avatar(for:)` (`Views/FriendAvatarView.swift`)
+ colours the friend-list/picker/share-sheet avatar with
+ `stableColor(forAuthorID:reserved: [])`. The symbol is hashed off a
+ distinct seed string (`"symbol-\(authorID)"`) so symbol and colour vary
+ independently.
+2. **In-game roster** — `PlayerRoster.applyRoster`
+ (`Models/PlayerRoster.swift`) walks the participants in sorted-authorID
+ order, threading a running `taken` set seeded with the local user's
+ preferred colour (`PlayerPreferences.color`, synced across the user's
+ devices via iCloud KVS). The result is a pure function of
+ (participant set, preferred colour), both identical across the user's
+ devices — so every device derives the same colour for a friend **with no
+ persisted or synced mapping**, and a friend keeps their colour across
+ games unless a lower-sorted collaborator collides with them.
+
+That "no persisted colour state" invariant is the one real cost of the
+feature: once overrides exist, roster output depends on local data plus sync
+timing, so two devices can briefly disagree after a pin until the sync lands.
+
+## The pieces
+
+### Easy
+
+- **Storage.** Add an optional `colorID: String` attribute to `FriendEntity`
+ (`CrossmateModel.xcdatamodeld`). An added optional attribute is handled by
+ automatic lightweight migration.
+- **Picker UI.** A colour submenu in the friend row's ellipsis menu in
+ `FriendsView`. `PlayerColor.palette` is already picker-ordered (sorted by
+ hue) and `PuzzleView` (~line 947) has an existing colour-picker menu to
+ crib from, including the checkmark-on-current convention.
+- **Avatar override.** `FriendAvatar.avatar(for:)` takes an optional
+ override colour; the call sites that should honour it (`FriendsView`,
+ `FriendPickerView`, `GameShareItem`) all have the `FriendEntity` in hand.
+
+### Substantive
+
+- **Roster pinning.** `applyRoster` fetches overrides from Core Data and
+ treats them as *pinned*: seed `taken` with the local colour plus all pins,
+ assign pinned friends their colour directly, and let the existing probe
+ walk the unpinned friends around them. Design decisions to settle:
+ - An explicit pin should win even when it collides with the local user's
+ own colour or another friend's pin — the user did it on purpose.
+ - Two friends pinned to the same colour is the user's prerogative; don't
+ fight it.
+ - Pinning one friend deterministically bumps some auto-assigned friends to
+ new colours (their probe now starts from a different `taken` set). This
+ is inherent and fine, but worth knowing.
+- **Cross-device sync.** Without sync, iPhone and iPad disagree about a
+ friend's colour. The precedent is the block flag: it syncs across the
+ user's own devices as a `Decision` record in the account zone
+ (`SyncEngine.enqueueDecision(kind: "block", key: authorID)`), projected
+ back onto `FriendEntity` by the `kind` switch in
+ `RecordSerializer.applyDecisionRecord`. A `"color"` kind is maybe 30 lines
+ following that template:
+ - `enqueueDecision` already accepts a `payload: String?` — carry the
+ `colorID` there.
+ - The Decision record name is kind+key, so a re-pin for the same friend
+ overwrites the prior decision — last write wins, which is the behaviour
+ we want.
+ - Clearing a pin (back to automatic) can ride
+ `enqueueDecisionDeletion(kind:key:)`, which also already exists.
+
+## Suggested order
+
+1. Model attribute + `applyDecisionRecord` `"color"` case (so the projection
+ exists before anything writes it).
+2. Roster pinning in `applyRoster`, with `PlayerRosterTests` coverage for
+ the pin-wins and bump-the-others cases.
+3. Avatar override plumbing.
+4. Picker UI in `FriendsView`, writing the entity and enqueuing the
+ Decision.
diff --git a/Notes/GameView.md b/Notes/GameView.md
@@ -0,0 +1,194 @@
+# Per-keystroke hitches on populated databases
+
+## Symptom
+
+Solving a puzzle on iPhone 14 Pro Max produces visible hitches as letters
+appear in the grid. The same build on a freshly-set-up iPhone 13 Mini (new
+Apple Account, empty store) is smooth. Disabling iCloud sync in Settings does
+not help. Disabling ProMotion (removing `CADisableMinimumFrameDurationOnPhone`
+from `Info.plist`) does not help. Running "Reset Database" from the
+diagnostics screen restores responsiveness immediately.
+
+The shared property of the three things that *did* affect responsiveness
+(brand-new account, reset database) is **how much data the local Core Data
+store contains**, not network or display behaviour.
+
+## Diagnosis
+
+Per-keystroke trace:
+
+1. The user types a letter. `GameMutator.setLetter` updates the in-memory
+ `Game` and calls `emitMove`, which schedules
+ `MoveBuffer.enqueue(...)` on the `MoveBuffer` actor.
+2. If the new edit targets a different cell than the previous one,
+ `enqueue` triggers `performFlush()` for Lamport ordering.
+3. `performFlush` calls `persistAndAssignLamports`, which on a background
+ context inserts a `MoveEntity`, updates the `GameEntity`
+ (`lamportHighWater`, `updatedAt`), updates a `CellEntity`, and saves.
+4. `viewContext.automaticallyMergesChangesFromParent = true`
+ (`PersistenceController.swift:39`) merges the background save into the
+ main context.
+5. `GameListView` declares
+ `@FetchRequest<GameEntity>(sortDescriptors: [], animation: .default)`
+ (`GameListView.swift:11-15`). The merge invalidates the fetch request,
+ which marks `GameListView` dirty.
+6. SwiftUI recomputes `GameListView.body`. The body reads two computed
+ properties:
+
+ ```swift
+ private var inProgress: [GameSummary] {
+ games.compactMap(GameSummary.init(entity:))
+ .filter { $0.completedAt == nil }
+ .sorted { ... }
+ }
+ private var completed: [GameSummary] {
+ games.compactMap(GameSummary.init(entity:))
+ .filter { $0.completedAt != nil }
+ .sorted { ... }
+ }
+ ```
+
+ Each calls `games.compactMap(GameSummary.init(entity:))` independently.
+7. `GameSummary.init(entity:)` (`GameStore.swift:34-77`) calls
+ `XD.parse(source)`, constructs a full `Puzzle`, and walks every
+ `CellEntity` to build the thumbnail array.
+
+So per keystroke, on the main thread:
+
+```
+GameCount × 2 × (XD.parse + Puzzle init + thumbnail build)
+```
+
+With one game in the store the cost is invisible. With many games it is
+proportional to library size and lands on every cell-change keystroke.
+
+## Why this matches every observation
+
+- **Empty DB after reset → fast.** The `@FetchRequest` returns nothing, so
+ the body has no per-game work to do.
+- **iCloud disabled didn't help.** The work is local: a Core Data save in
+ the view context is enough to invalidate the fetch request. Network
+ state is irrelevant.
+- **ProMotion disabled didn't help.** The cost is real and on the main
+ thread. Capping the refresh rate only changes whether the dropped frame
+ is visible, not whether the work happens.
+- **iPhone 13 Mini (new account) was fast.** Same code, but the library was
+ empty, so the FetchRequest had nothing to iterate.
+- **Got worse over time.** Every game added to the library multiplies the
+ per-keystroke cost.
+
+## Why `GameListView` is involved while `PuzzleView` is on screen
+
+`NavigationStack` does **not** unmount the root view when a destination is
+pushed; it only covers it visually. The root remains in the view hierarchy,
+its property wrappers stay live, and SwiftUI continues to invalidate and
+re-evaluate its body whenever a tracked dependency changes. `@FetchRequest`
+is a Core Data observer; it fires on context changes regardless of whether
+the view that owns it is currently visible. SwiftUI does not skip body
+evaluation for "covered" views, because visibility is a render-tree concept
+while body evaluation is a data-graph concept — it has to compute the new
+body to know whether anything visible changed (toolbar, title, navigation
+state, transitions, etc.).
+
+`animation: .default` on the fetch request also wraps each invalidation in
+an animation transaction, which can amplify the perceived hitch.
+
+## Secondary issue (not the root cause of the hitch)
+
+`MoveEntity` rows are only pruned after CloudKit confirms the snapshot save.
+`AppServices.swift:71` wraps the entire `afterFlush` callback in
+`guard isEnabled` — meaning that with iCloud sync disabled, snapshots are
+never created and moves are never pruned. The move table grows unbounded.
+This is not what is producing the per-keystroke hitch (the dominant cost is
+the FetchRequest + `XD.parse`), but it is a real correctness/space concern
+worth fixing separately so users running with sync off don't accumulate
+forever.
+
+## Verification
+
+Drop a one-line probe at the top of `GameListView.body`:
+
+```swift
+let _ = print("GameListView body")
+```
+
+Run a build against the populated database. If the print fires on every
+cell-change keystroke, the diagnosis is correct.
+
+## Fixes
+
+### Recommended: gate `@FetchRequest` on visibility
+
+Stop doing the work entirely while the puzzle is on screen. The
+`@FetchRequest` only observes Core Data because the view that owns it is
+live. Removing the owning view from the hierarchy tears down the underlying
+`NSFetchedResultsController`. `NavigationStack` fires `.onDisappear` on the
+root when a destination is pushed, which is the signal we need.
+
+```swift
+struct GameListView: View {
+ let store: GameStore
+ let shareController: ShareController
+ @Binding var navigationPath: NavigationPath
+
+ @State private var isActive = true
+
+ var body: some View {
+ Group {
+ if isActive {
+ GameListContent(
+ store: store,
+ shareController: shareController,
+ navigationPath: $navigationPath
+ )
+ } else {
+ Color.clear
+ }
+ }
+ .onAppear { isActive = true }
+ .onDisappear { isActive = false }
+ }
+}
+
+private struct GameListContent: View {
+ // current GameListView body, including:
+ @FetchRequest(sortDescriptors: [], animation: .default)
+ private var games: FetchedResults<GameEntity>
+ // ...
+}
+```
+
+Push `PuzzleView` → `.onDisappear` → `isActive = false` →
+`GameListContent` is removed → `@FetchRequest` is torn down → keystrokes no
+longer have a SwiftUI subscriber to invalidate. Pop back → `.onAppear` →
+`isActive = true` → fresh fetch, list reflects whatever changed.
+
+Caveats:
+
+1. `.onDisappear` also fires when a sheet covers the list (`NewGameSheet`,
+ `SettingsView`). That's fine — list is not visible then either — but
+ quick sheet dismissals will flap `isActive`. If that becomes a problem,
+ gate on debounce or on `navigationPath.count > 0` instead.
+2. The first frame after popping back has to fetch and build summaries.
+ For a small-to-moderate library this should be imperceptible; if it
+ isn't, render a cached lightweight snapshot in the inactive branch.
+3. This addresses the "while typing" case only. Opening the library on a
+ populated database still triggers the same `XD.parse`-per-game work
+ during scrolling and resizing. If that becomes a problem, the longer-
+ term fix is to cache derived summary fields (title, publisher, date,
+ gridWidth, gridHeight, blockMask) directly on `GameEntity` so the list
+ path never parses XD.
+
+### Alternative: cache derived puzzle data on `GameEntity`
+
+Pre-compute and store everything `GameSummary` needs (title, publisher,
+date, width, height, block mask) on `GameEntity` at creation time. `XD.parse`
+disappears from the hot path entirely. This is a bigger change (schema +
+migration + initialisation) but fixes both the "while typing" case and the
+"opening a populated library" case.
+
+### Cheap partial: deduplicate the iteration
+
+`inProgress` and `completed` both iterate `games.compactMap(...)`. Compute
+the summaries once per body and partition. Halves the constant but stays
+O(games) per keystroke; not a real fix.
diff --git a/Notes/Journal.md b/Notes/Journal.md
@@ -0,0 +1,189 @@
+# Move Journal — Design
+
+A local, append-only log of every grid move, plus undo/redo built on top of it.
+Phase 1 is entirely local (Core Data only, no CloudKit). Phase 2a uploads each
+device's journal at game completion so the whole game can be replayed; it is
+**built**. Phase 2b (the replay viewer that merges all devices' journals) and
+the undo completion-lockout are still deferred — see "Remaining work" below.
+
+## Goals
+
+1. **Undo / redo during play.** A player can walk back their own moves, with no
+ fixed depth limit. Undo applies a *forward* mutation through the normal
+ `GameMutator` path, so it syncs to collaborators like any other edit — it is
+ not a time machine.
+2. **Replay after completion (Phase 2).** Merging every device's uploaded
+ journal by timestamp reconstructs the full game, including corrections.
+
+**Every grid-changing move is recorded** (so replay is faithful), but only a
+subset is **undoable**: letter `input` (adds/single-cell deletes) and `clear`
+(the bulk clear gesture). `check` and `reveal` are recorded for replay but never
+offered as undo steps — they're help actions, not edits you rewind.
+
+## Why a new structure (not the Moves record)
+
+`MovesValue.cells` (`Sync/Moves.swift`) is a *compacted current-state* map —
+`[GridPosition: TimestampedCell]`, only the latest touch per cell survives. It
+carries no history, so neither undo nor replay can be reconstructed from it.
+Snapshotting the whole Moves value per move is O(moves × cells) — tens of MB for
+a single game — and is also collaboratively wrong (restoring a snapshot rewrites
+every cell, clobbering peers). Instead we log **one entry per changed cell**, in
+the Moves cell shape, and recover the "before" value via a back-pointer.
+
+## Data model — `JournalEntity`
+
+Local Core Data entity, cascade-deleted with its `GameEntity`. Append-only;
+rows are never mutated or deleted in normal operation.
+
+| Field | Type | Purpose |
+|---|---|---|
+| `gameID` | UUID | game scope |
+| `seq` | Int64 | monotonic per game; stable local ordering + undo walk |
+| `timestamp` | Date | wall-clock; Phase 2 replay ordering + cross-device merge |
+| `row`, `col` | Int16 | cell |
+| `letter` | String | **after**-state letter |
+| `markCode` | Int16 | after-state `CellMark` as one lossless code (see `CellMarkCodec.code`) |
+| `cellAuthorID` | String? | preserved letter author (mirrors `emitMove`) |
+| `actingAuthorID` | String? | iCloud user who made the move |
+| `kind` | Int16 | 0 input / 1 check / 2 reveal / 3 clear / 4 undo / 5 redo (`input` + `clear` are undoable) |
+| `targetSeq` | Int64? | for undo/redo, the entry it acts on |
+| `batchID` | UUID? | groups one gesture (bulk check/reveal/clear, or one undo/redo op) into a single undo step |
+| `prevSeqAtCell` | Int64? | seq of the previous entry at this `(row,col)`; `nil` = cell was empty before |
+
+There is **no stored before-state**. To undo entry `E`, follow `E.prevSeqAtCell`
+to one prior entry and apply *its* after-state (or empty if `nil`). O(1) keyed
+lookup, no scan; after-states stay the single source of truth and the pointer
+can never go stale because the log is immutable.
+
+Adding the entity is an additive, automatic lightweight migration. Unlike the
+synced Moves wire format (which flattens the mark into `markKind` + two check
+bools), the journal stores the whole `CellMark` as one `markCode` — the
+two-bool form is a wart we chose not to propagate into new local storage. The
+shared `CellMarkCodec` now owns both encodings.
+
+## `MovesJournal`
+
+`@MainActor` class, a shared singleton beside `MovesUpdater`. Holds the current
+game's entries in memory (loaded lazily from Core Data on first use, so undo
+survives relaunch), persists each new entry on a background context (the
+in-memory list is authoritative for the session, so async persistence is fine).
+
+- `record(...)` — assigns `seq` (= max+1), looks up & updates a
+ `[GridPosition: Int64]` last-seq map to set `prevSeqAtCell`, appends in memory,
+ persists.
+- `planUndo(gameID:)` / `planRedo(gameID:)` — derive the next step and return
+ per-cell restore instructions; pure, no mutation of state.
+- `canUndo` / `canRedo`.
+
+### Undo/redo derivation
+
+One pass over the in-memory entries (seq order), grouping consecutive entries
+that share a `batchID`/kind into *operations*, then a stack machine:
+
+```
+for op in operations:
+ userEdit -> redoStack = []; liveStack.append(op) // new edit cuts redo branch
+ undo -> if let s = liveStack.popLast() { redoStack.append(s) }
+ redo -> if let s = redoStack.popLast() { liveStack.append(s) }
+```
+
+`liveStack.last` is the next thing to undo; `redoStack.last` the next to redo.
+Undone entries are never deleted — replay keeps the true history — the
+derivation just stops offering them once a new edit cuts the branch.
+
+### The superseded guard
+
+Each restore instruction carries the entry's after-state as `expectedCurrent`,
+which `GameMutator` compares (by **letter only**) to the live `game` cell. If
+the letter differs, a collaborator (or a later edit) changed the cell since, so
+we **skip that cell** — this stops undo from ever clobbering someone else's
+current letter. Only the letter is compared so that a peer's check/reveal (a
+mark-only change) doesn't block undoing one's own letter. In a batch, superseded
+cells are skipped individually; the step still counts as undone.
+
+## `GameMutator` / `Game` hooks
+
+- `emitMove` gains `journalKind` + `batchID` + `targetSeq` and, alongside the
+ existing sync enqueue, calls `movesJournal.record(...)` with the after-state.
+ A move is recorded only when the cell's state actually changed.
+- `setLetter`/`clearLetter` → `kind = .input`, `batchID = nil` (one step each).
+ `checkCells`/`revealCells`/`clearCells` → `kind = .check`/`.reveal`/`.clear`,
+ one shared `batchID`. Only `.input` and `.clear` are undoable; `.check`/
+ `.reveal` are recorded for replay but inert to the stack machine.
+- `undo()` / `redo()`: ask the journal for a plan, apply surviving instructions
+ via a new low-level `Game.applyCellState(...)` (sets entry + mark + author
+ directly; can override the reveal lock), emitting each as `kind = .undo`/
+ `.redo` under one op `batchID`.
+- `canUndo` / `canRedo` drive the UI control's enabled state.
+
+## Open questions / notes
+
+- **Undo control UI.** Buttons in `PuzzleView` are deferred to follow-up.
+- **`recordedEntries(gameID:)`** exposes the log read-only; it's a seam Phase 2b
+ replay can read locally (the cross-device merge reads uploaded assets).
+
+## Phase 2a — journal upload (built)
+
+At game completion (win or resign, in `GameStore`), each device encodes its
+whole local `JournalEntity` log via `JournalCodec` (JSON) and pushes it as a
+`CKAsset` on a `Journal` record — one per `(game, authorID, deviceID)`, named
+`journal-<gameID>-<authorID>-<deviceID>`, written into the game's existing zone.
+It mirrors the `Moves` record path and is **write-once with no system-fields
+archive** (like `Ping`/`Decision`): `buildRecord` reconstructs the asset from the
+durable `JournalEntity` rows, so a pending upload survives an app kill, and
+`MovesJournal.flush()` is awaited before enqueue to beat the journal's async
+persistence. Inbound `Journal` records are deliberately ignored (`case
+"Journal": break`) — 2b fetches them on demand instead of applying peers' logs.
+
+`JournalCodec` carries the mark as the single lossless `markCode` (matching
+`JournalEntity`), not the `Moves` two-bool flattening, and decodes optionals
+leniently for forward-compat. Key files: `Persistence/Journal.swift`,
+`Sync/RecordSerializer.swift`, `Sync/RecordBuilder.swift`, `Sync/SyncEngine.swift`
+(`enqueueJournalUpload`), `Persistence/GameStore.swift`, `Services/AppServices.swift`.
+
+CloudKit schema: the `Journal` record type (`authorID` String, `deviceID`
+String, `updatedAt` Date/Time, `entries` Asset) was added manually and deployed
+to Development and Production on 2026-05-30. No new zone.
+
+## Phase 2b — cross-device replay data layer (built)
+
+The read side of replay. `SyncEngine.fetchReplay(forGameID:)` (in
+`Sync/CloudQuery.swift`) runs two zone-scoped `CKQuery`s against the finished
+game's zone — one for `Journal` records (decoding each `entries` asset via
+`JournalCodec`) and one for `Moves` records (keys only) — and returns a
+`JournalReplayFetch { journals, expectedDevices }`. It's a plain query, so it
+never disturbs the sync engine's change token; inbound `Journal` records stay
+ignored in the delegate path.
+
+`Persistence/JournalReplay.swift` holds the pure, unit-tested core:
+- `ReplayTimeline(merging:)` flattens every device's log and orders it by
+ `timestamp` (ties break on `actingAuthorID` then `seq`, so fetch order can't
+ change the result). `state(through:)` folds the first *n* steps last-write-
+ per-cell into `[GridPosition: JournalCellState]`. Forward replay needs only
+ each entry's after-state, so device-local `seq`/`prevSeqAtCell` are unused;
+ undo/check/reveal rows replay naturally as timestamped restores.
+- `ReplayAssembler.assemble(fetch:localKey:localEntries:)` overlays this
+ device's *live* log over any uploaded copy of itself, then enforces **strict
+ completeness**: `.ready(timeline)` only when a journal is present for every
+ expected device, else `.waiting(missing:)`. `AppServices.loadReplay(gameID:)`
+ composes `fetchReplay` + `GameStore.localReplaySource` + the assembler and
+ maps an unreachable zone to `.unavailable`.
+
+**Strict completeness needs late devices to upload.** Completion learned purely
+via sync used to set `completedAt` without uploading the local journal, so a
+device away at finish time would block replay forever. `applyGameRecord` now
+signals a not-completed → completed transition (`onCompletedTransition`), and
+both inbound apply paths enqueue this device's journal upload — so any
+contributing device that ever syncs the finished game converges. Residual: a
+device that wrote cells and is then permanently gone keeps replay `.waiting`
+indefinitely (accepted trade-off; a "replay anyway" affordance could relax it).
+
+## Remaining work
+
+- **Phase 3 — scrubber UI.** A scrubber inside the finish banner (above the
+ victory image + scoreboard in `SuccessPanel`), head starting hard-right at the
+ finished grid and dragging left to rewind. Drives the main `GridView` from
+ `ReplayTimeline.state(through:)`; disabled with a "waiting to sync history"
+ overlay while `loadReplay` returns `.waiting`.
+- **Completion lockout.** Gate undo/redo off once the game's `completedAt` is
+ set, so a finished game can't be rewound.
diff --git a/Notes/MacVersion.md b/Notes/MacVersion.md
@@ -0,0 +1,126 @@
+# Bringing Crossmate to the Mac
+
+Notes from an exploration in June 2026. We did not ship anything — we evaluated
+two routes, prototyped the Catalyst one far enough to launch, then decided it
+was not worth pursuing right now. This records what we learned so a future
+attempt does not start from zero.
+
+## The two routes
+
+**Mac Catalyst** — almost free to get building, a poor long-term fit.
+**Native macOS (SwiftUI)** — the right shape, but a multi-week effort.
+
+Catalyst is genuinely ~2 lines of `project.yml` plus a couple of fixups. Native
+is a real project: a Mac UI layer, sharing rewritten off `UICloudSharingController`,
+the notification extension re-targeted, NYT re-plumbed. The honest framing is
+that Crossmate is a collaborative, touch-first crossword app, and the Mac
+audience may not justify either the Catalyst compromises or the native cost.
+That is why we stopped.
+
+## What we confirmed about the codebase (the encouraging part)
+
+The engine is already cleanly separated and almost entirely UIKit-free, which is
+the precondition for *either* route:
+
+- `Crossmate/Models` (21 files), `Crossmate/Sync` (21), `Crossmate/Persistence`
+ (5): **zero `import UIKit`**. The few `import SwiftUI` (PlayerColor,
+ PlayerPreferences, SyncEngine) are fine — SwiftUI is cross-platform.
+- Core puzzle subviews `CellView.swift`, `GridView.swift`, `ClueList.swift` are
+ **pure SwiftUI** — reusable on macOS as-is.
+- UIKit coupling is concentrated in `Crossmate/Views` and 4 services. The
+ background-task coupling (`UIApplication.beginBackgroundTask` /
+ `UIBackgroundTaskIdentifier`) is already abstracted behind closures in
+ `PuzzleSession` (the `beginBackgroundAssertion`/`endBackgroundAssertion`
+ pair), with the UIApplication implementations in `SessionCoordinator` and
+ `AppServices.ensureInBackground`. On macOS the replacement is
+ `ProcessInfo.performExpiringActivity`.
+- Engine constructors are light: `SyncEngine(container:persistence:)` and
+ `GameStore(persistence:movesUpdater:authorIDProvider:…closures)`. A minimal
+ Mac composition root is feasible **without** reusing the monolithic
+ `AppServices` (which eagerly builds push/NYT/engagement and `import UIKit`s).
+
+`../Listless` is the working reference for the native route — same
+CloudKit/CKShare stack, shipped natively on macOS.
+
+## Catalyst route — what it actually took
+
+For the record, getting Catalyst to *build* was:
+
+1. `SUPPORTS_MACCATALYST: YES` in `project.yml` settings.
+2. Remove `LSSupportsOpeningDocumentsInPlace: false` from the iOS Info.plist —
+ macOS rejects an explicit `NO`; absent defaults to the same behaviour.
+3. Guard three iOS-26-only SwiftUI APIs with `#available(iOS 26.0, *)` +
+ fallbacks: `glassEffect` (in `SuccessPanel` ×2 and `PuzzleView` pencil
+ button) and `ToolbarSpacer` (in `GameListView`). These guards are worth
+ keeping regardless of route — they make the version-gating explicit.
+
+### The deployment-target trap (important, non-obvious)
+
+Catalyst derives its **minimum macOS from the iOS deployment target** via a fixed
+Apple mapping (iOS 26 → macOS 26 Tahoe; iOS 18 → macOS 15). Crossmate targets
+iOS 26, so a stock Catalyst build demands macOS 26 and Xcode refuses "My Mac" as
+a run destination on anything older.
+
+- To run on macOS 15 you must lower the **iOS-equivalent availability floor**,
+ not the link-time minimum. The lever is
+ `IPHONEOS_DEPLOYMENT_TARGET[sdk=macosx*] = 18.0` (Catalyst compiles under the
+ `macosx` SDK), which keeps the real iOS app at 26 while the Catalyst variant
+ targets iOS 18 / macOS 15.
+- **Do not** instead set `MACOSX_DEPLOYMENT_TARGET = 15.0`. That silences the
+ compiler (it still uses the iOS-26 floor for availability) so iOS-26 API
+ compiles, but the binary then *crashes at runtime* on macOS 15 when it hits a
+ symbol that does not exist there. We verified this: it "builds" with zero
+ errors and would have shipped a crash.
+- Verify the result with `vtool -show-build <binary>`: a Catalyst binary reports
+ `platform MACCATALYST` and `minos` in **iOS** numbers (18.0 == macOS 15).
+
+With the floor lowered, "My Mac (Mac Catalyst)" became an eligible destination
+and the only remaining blocker was code signing (no Mac Catalyst provisioning
+profiles — a portal/account step, not code).
+
+### The runtime wall we hit
+
+Catalyst launched and ran. Opening the **puzzle picker** (`NewGameSheet`) then
+crashed:
+
+```
+No Observable object of type NYTAuthService found.
+```
+
+`NewGameSheet` reads `@Environment(NYTAuthService.self)`, injected once at the
+root in `CrossmateApp`. On iOS, sheets inherit that environment; **on Catalyst
+the `@Observable` object did not propagate across the sheet boundary.** Several
+sheets read root-injected observables (`SettingsView` reads `nytAuth` too), so
+this is systemic, not one view. The fix would be to re-inject the needed
+observables on each presented sheet from the presenting side (you cannot read
+them inside the sheet — that is the broken environment). We did not finish it;
+this quirk is a good example of why Catalyst is "runs but fights you."
+
+## If we revisit: recommended shape (native)
+
+We had settled on, and still recommend, a **scaffolding-slice** first increment:
+
+- SwiftUI App lifecycle (not a full AppKit shell like ListlessMac — far less
+ code; add AppKit only where needed, e.g. sharing).
+- **XcodeGen source globs, no file moves**: a new `Crossmate macOS` target whose
+ `sources` curate the clean engine folders + a whitelist of portable views +
+ a new `CrossmateMac/` UI layer. Prefer **excluding** iOS-only files over
+ `#if`-guarding them — keeps the diff small and the engine untouched.
+- A **minimal Mac composition root** wiring `PersistenceController →
+ MovesUpdater → GameStore (+ SyncEngine)` with no-op change closures, rather
+ than reusing `AppServices`. Factor `AppServices` into a shareable base only
+ later.
+- Set `deploymentTarget.macOS` to match whatever the dev host runs (we used
+ 15.0 against a macOS 15.7 host) so the native target builds **and runs/tests
+ locally** — the native route sidesteps the Catalyst deployment-target trap
+ entirely.
+- First slice = launches + game list + **read-only** puzzle (reuse `GridView`/
+ `CellView`/`ClueList`). Defer, each as its own increment: CKShare sharing UI
+ (`UICloudSharingController` → `ShareLink`/AppKit), push + `NotificationService`
+ extension, NYT import/auth, and text entry (no on-screen keyboard on Mac —
+ input is already hardware-keyboard-driven via `HardwareKeyboardInputView` +
+ `UIKeyCommand`, with `InputMonitor` hiding the custom keyboard when a hardware
+ keyboard is present, which on a Mac is always).
+
+The full draft plan lived only in the planning tool and was not saved to the
+repo; this note is the durable summary.
diff --git a/Notes/PresenceTracking.md b/Notes/PresenceTracking.md
@@ -0,0 +1,92 @@
+# Plan: Split the presence lease from the read watermark
+
+## Status
+
+Implemented. This document records the design and the rationale.
+
+## The problem
+
+A single field — the CloudKit `Player.readAt` (locally
+`GameEntity.lastReadOtherMoveAt`, peer copy `PlayerEntity.readAt`) — was read
+with two incompatible meanings:
+
+1. **Presence lease.** While a puzzle is on screen the app writes
+ `readAt = now + 10 min` (a forward-dated horizon). `PeerPresence.isPresent`,
+ the engagement-room teardown, `soonestPresentPeer` and `hasPresentPeer` all
+ treat a future `readAt` as "this player is here right now."
+2. **Read watermark.** The session-end ("stopped solving — filled N") push and
+ the unread badge treat `readAt` as "this account has *seen* moves through
+ T."
+
+The forward-dating makes both work *while you're looking*. It breaks the
+moment you stop: the lease stays in the future for up to 10 minutes after you
+background, so
+
+- a **peer** computing what you've missed sees your future `readAt`, concludes
+ you've already read their later moves, and **suppresses the summary push**;
+ and
+- your **own unread badge** compares "latest other move" against the future
+ `readAt` and stays clear, **swallowing moves that arrived after you left.**
+
+Observed in a diagnostics log: Bob solved (+17 fills) at 00:26 while Alice was
+backgrounded with a lease valid to 00:34. The moves synced in and surfaced only
+as the in-app session-summary banner on reopen — no push, no badge. The lease
+frequently does *not* collapse cleanly on background (`self … readAt=00:34:20`
+long after leaving), so "collapse the lease harder" is not a fix. The watermark
+has to be its own value.
+
+## The fix
+
+Two separate fields:
+
+| Concept | Local (`GameEntity`) | Peer (`PlayerEntity`) | Wire (`Player`) |
+|---|---|---|---|
+| Presence lease (forward-dated) | `lastReadOtherMoveAt` | `readAt` | `readAt` |
+| Read watermark (never future) | `readThroughAt` | `readThrough` | `readThrough` |
+
+- **`readAt` keeps its exact current meaning and behaviour** — the forward-dated
+ presence lease. Presence is the fragile, real-time subsystem and is what
+ behaves correctly today, so it is left untouched.
+- **`readThrough` is new**: the latest other-author move time the account has
+ *actually observed*. Monotonic, never leased into the future.
+
+### Who writes the watermark
+- **On open** (`GameStore.markOtherMovesRead`): advance to `latestOtherMoveAt`.
+- **While viewing** (`GameStore.noteIncomingMovesUpdate` lockstep): advance to
+ the latest inbound move, gated on `NotificationState.isSuppressed` (eyes on
+ the grid).
+- **On leave/background** (`publishReadCursor(.currentTime)`): advance to `now`
+ — present right up to here — *in lockstep with* collapsing the lease.
+
+`publishReadCursor(.activeLease)` keeps writing only the lease. The two no
+longer fight over one field.
+
+### Who reads the watermark
+- **Session-end push**: `pushPlan` sources `PushRecipient.readThrough` from the
+ peer's `PlayerEntity.readThrough`; `SessionPushPlanner` windows the tally on
+ `max(readThrough, notifiedThrough)`. Presence reads are untouched.
+- **Unread badge**: `GameSummary.computeHasUnread` compares `latestOtherMoveAt`
+ against `readThroughAt`.
+
+### Wire + multi-device
+- `RecordBuilder`/`RecordSerializer` send both fields; `CloudQuery` desiredKeys
+ request `readThrough`; `RecordApplier` adopts the peer copy, and for our own
+ record coming back from a sibling device advances the shared
+ `GameEntity.readThroughAt` monotonically (so the badge clears on every device
+ once any device reads). The watermark rides the Player record sync; the fast
+ `accountSeen` push still carries only the lease.
+
+## Compatibility
+
+The app is unreleased (two testers), so **no backward compatibility was kept**:
+old records simply have a `nil` `readThrough` (tallies their whole backlog,
+self-healing on the next read). Core Data migration is lightweight-additive
+(two optional `Date` attributes); `PersistenceController.recreateStore` is the
+fallback if a store can't be opened.
+
+## Follow-up (v4)
+
+`readAt` is now purely a presence horizon, so its name is misleading. A `TODO(v4)`
+note sits at `GameStore.setReadCursor` and `RecordSerializer.playerRecord`:
+rename the `readAt` CloudKit field (and `lastReadOtherMoveAt`) to something like
+`presenceLeaseUntil` when the schema next breaks.
diff --git a/Notes/PushWorker.md b/Notes/PushWorker.md
@@ -0,0 +1,214 @@
+# Push Worker + Sync Simplification
+
+## Context
+
+After the engagement Worker landed (collaborative live-solving over WebSockets, 2026-05), it became natural to use the same Cloudflare runtime for push notifications. Pings were unreliable for time-critical user-facing events, and the receiver-side heuristics that had grown around them (`SessionMonitor.presentBegins`, the 3-minute quiescence-window scheduler) were brittle. This pass moves the user-facing event notifications onto a second Cloudflare Worker that signs APNs JWTs and pushes alerts directly, while folding the per-cell state for check / reveal / resign into the Moves stream so it propagates collaboratively.
+
+## What landed (2026-05-27)
+
+## App Attest migration (2026-06)
+
+The push worker no longer relies on a build-baked `PUSH_BEARER` for normal
+clients. Updated apps enroll a per-install App Attest key through
+`GET /attest/challenge` + `POST /attest/register`, then sign every
+`/register`, `/register DELETE`, and `/publish` request with that key.
+
+The worker stores only the attested public key and assertion counter keyed by
+`deviceID` + `keyID`. This preserves the no-Crossmate-account model: CloudKit
+still owns collaboration identity, and the worker only proves that a request
+came from an attested Crossmate install. Existing games keep working because
+their CloudKit `pushAddress` values are unchanged; updated clients simply
+re-register the same addresses with App Attest request auth.
+
+Deploy requirements:
+
+- `APP_TEAM_ID`, `APP_BUNDLE_ID`, and `APP_ATTEST_ENVIRONMENT` are configured
+ in `wrangler.push.toml`.
+- `APP_ATTEST_ROOT_CERT_PEM` must be set as a Worker secret or dashboard
+ variable.
+- `ALLOW_LEGACY_PUSH_BEARER = "1"` may be set temporarily for rollback, but
+ normal deployment should leave it unset.
+
+### Worker
+
+- New `Workers/push-worker.js` + `Workers/wrangler.push.toml`.
+- Single Durable Object class `PushRegistry`, sharded by `idFromName("registry")` (one global instance — Crossmate's scale doesn't need per-author sharding).
+- Three push endpoints, all App-Attest-auth gated:
+ - `POST /register` — `{deviceID, token, environment, addresses}` upsert. Idempotent.
+ - `DELETE /register` — `{deviceID, addresses}` — unregisters dropped address bindings.
+ - `POST /publish` — `{kind, addressees, gameID, fromAuthorID, title, alertBody}` — fans out to every addressee's registered devices.
+- APNs JWT (ES256) signed in-Worker via Web Crypto. JWT cached in DO instance memory and refreshed every ~40 min (APNs' rate-limit floor is ~20 min; the 1-hour ceiling is the upper bound).
+- Per-token `environment` field picks sandbox vs production endpoint; iOS reports `"sandbox"` for Debug builds and `"production"` for TestFlight/App Store.
+- Deployed at the push Worker's `*.workers.dev` URL with `preview_urls = false`.
+
+### Secrets
+
+Loaded via `wrangler secret put --config wrangler.push.toml`:
+
+- `APNS_KEY` — the `.p8` (piped via `< AuthKey_*.p8` stdin redirect; not pasted, because the interactive prompt eats newlines).
+- `APNS_KEY_ID`, `APNS_TEAM_ID` — both 10-char.
+- `APP_ATTEST_ROOT_CERT_PEM` — public Apple App Attest root certificate used
+ by the worker to verify the attestation chain.
+
+APNs key is topic-restricted to the app's bundle ID, both environments enabled. Apple caps at 2 APNs keys per Team, so a single key serving both endpoints leaves headroom.
+
+### iOS plumbing
+
+- `Crossmate/Services/PushClient.swift` — `@MainActor` class. Reads
+ `CrossmatePushBaseURL` from `Bundle.main` and signs worker requests via
+ `PushRequestAuthenticator`.
+- Idempotent: `updateAPNsToken(_:)` on every `didRegisterForRemoteNotificationsWithDeviceToken` callback, `updateAuthorID(_:)` on every `identity.refresh(...)` site. Worker side dedupes unchanged triples.
+- Account switch: explicit `DELETE /register` for the previous `(authorID, deviceID)` before re-registering.
+- Failures go to `syncMonitor.note(...)` for diagnostics; never user-surfaced.
+
+### Push kinds (sender-formatted strings, no `loc-key` — Crossmate is English-only)
+
+| Kind | Trigger | Body |
+|------|---------|------|
+| `win` | `sendCompletionPings(resigned: false)` after the local player completes a puzzle | "Alice solved the puzzle 'X'" |
+| `resign` | `sendCompletionPings(resigned: true)` after the local player gives up | "Alice resigned the puzzle 'X'." |
+| `play` | `PuzzleDisplayView.updateActiveNotificationPuzzleID` when scenePhase becomes `.active` | "Alice is solving the puzzle 'X'" |
+| `pause` | Same handler when scenePhase leaves `.active`, plus the selection-clear block at puzzle close | "Alice added 12 letters and cleared 3 letters in the puzzle 'X'" |
+
+Pause uses `LocalSessionTracker` (`Crossmate/Services/LocalSessionTracker.swift`) — a small `@MainActor` counter wired into `store.onLocalCellEdit` that resets on every `begin(gameID:)` and drains on `consume(gameID:)`. The "skip if both counts are zero" guard inside `publishSessionEndPush` handles dedup between scenePhase-background and puzzle-close naturally.
+
+### Pings that stay
+
+- `.friend` — friendship handshake.
+- `.invite` — re-invite to a game (uses `addressee`; surfaces in the Invited section).
+- `.hail` — engagement room bootstrap.
+- `.join` — the *invite-accept* one (one-shot, when a collaborator first accepts a share). Distinct from the new session-start APN, which is the `play` kind on the push side specifically to avoid this naming collision.
+
+`PingScope` and `PingScopePayload` are gone entirely (only `.check`/`.reveal` used them).
+
+### Check/reveal/resign in Moves
+
+- New `Crossmate/Models/CheckResult.swift` — `enum CheckResult: Sendable, Equatable, Codable { case right, case wrong }`. Used as `CheckResult?`; `nil` is the canonical "unchecked" value (no `.none` case).
+- `CellMark.pen` / `.pencil` now carry `checked: CheckResult?` instead of `checkedWrong: Bool`.
+- `Game.checkCells` now stamps `.right` on correct entries (previously erased them to `.none`). Wrong entries continue to stamp `.wrong` — pen-vs-pencil style is preserved across the check.
+- Wire format and Core Data keep two parallel `checkedRight` / `checkedWrong` booleans (translation lives in `GameMutator.encodeMark` / `GameStore.decodeMark`). `MovesCodec.Payload.Entry` gets a custom `init(from:)` that defaults `checkedRight` to false so records written by older clients still decode cleanly.
+- Resign reveals propagate via the existing Moves reveal mechanism — peer grids fill in cooperatively. No new `Game.resignedBy` field; a resigned game and a reveal-all-puzzle game look identical at cold start (accepted trade-off — the moment-of-resign signal lives in the APN).
+
+### SessionMonitor cleanup
+
+Slimmed to what the on-open catch-up banner actually needs:
+
+- Kept: `ingest(_ deltas:)` (bucket accumulation), `consumeOnOpen(gameID:)`, `cancel(gameID:authorID:)`, `bodyText(playerName:puzzleTitle:added:cleared:)`.
+- Removed: `presentBegins`, `scheduleEnd`, the 3-minute quiescence timer, `Bucket.scheduledFor`, `SessionNotificationScheduling` protocol + `RecordingNotificationScheduler` test recorder, `bodyText(for: Session)`, the end-/begin-identifier helpers, the `notificationCenter` and `notificationAuthorization` constructor params.
+
+The on-open banner still says "Alice added 12 letters; Bob added 5 letters" — that path is independent of the new APNs and remains useful as a catch-up when the user opens an unread puzzle.
+
+## Design decisions
+
+- **Sender-formatted strings, not `loc-key`/`loc-args`.** Crossmate has no `Localizable.strings` / `.lproj` yet. Setting up the localization scaffolding just to feed receiver-side formatting that would produce the same English text is more work than it's worth for a single-locale app. Switch to `loc-key` when a second language ships.
+- **One Durable Object, not per-author sharding.** Single DO instance can comfortably hold thousands of token registrations; `/publish` resolves targets in one `state.storage.list({prefix: ...})` per addressee.
+- **JWT caching is mandatory.** APNs rejects rapidly-rotated provider tokens with `TooManyProviderTokenUpdates`. Cache in DO instance memory (sticky region) — no storage RTTs.
+- **`CellMark` as enum, parallel-Bool storage.** The conceptual model is the optional `CheckResult` enum; the on-disk encoding is two booleans. Translation at the boundary keeps the API clean without requiring a Core Data migration model.
+- **No `Game.resignedBy` distinguisher.** Cold-state conflation of resign vs reveal-all-puzzle is acceptable; the moment-of-resign signal lives in the APN body.
+- **No `lastAction` field on Moves; no Actions record.** Replay is a future feature that will design its own log format with the consuming UX in mind. Designing the granularity now without the consumer would be a guess.
+- **Push registration is fire-and-fetch-error.** Worker is idempotent, so the next launch's APNs callback retries naturally. Failures log to the diagnostics monitor but don't surface to the user.
+- **`preview_urls = false`.** Each preview URL carries the same secret bindings; gratuitous attack surface for no benefit.
+
+## Badge and sibling-device read horizons (2026-06)
+
+Badge state is now split between durable-ish local truth and best-effort push
+transport:
+
+- The Notification Service Extension and the app share an App Group
+ `BadgeState` ledger. It is a per-game horizon map, not a set: `unreadAt`
+ advances when a badge-worthy APN is delivered, `seenAt` advances when the
+ user opens/clears the game, and a game counts only when `unreadAt > seenAt`.
+ This lets stale APNs lose to a newer open without needing CloudKit to arrive
+ first.
+- The app icon badge is computed as `BadgeState.unreadGameIDs()` unioned with
+ Core Data's unread-other-moves query. Core Data unread games are also seeded
+ back into the ledger as `unreadAt` horizons, each stamped with that game's
+ newest unseen other-author move time (`latestOtherMoveAt`). The NSE can't
+ reach Core Data, so without this seed a push landing on a suspended app would
+ re-stamp the badge from the ledger alone and drop any game whose unread state
+ arrived purely via CloudKit sync. The seed is safe precisely because the
+ ledger is a horizon map and not a set: re-seeding a game the user has since
+ opened is a no-op, because its newer `seenAt` still wins. (The old set-based
+ ledger couldn't express that, which is why the write-back was dropped when
+ the horizon map first landed — and why it is safe to reinstate now.)
+- Ledger entries are maintained by lifecycle event, not reconciled by absence
+ (a ledger-only game is ambiguous: it could be push-ahead-of-sync, which must
+ count, or a deleted game, which must not). So `BadgeState.forget(gameID:)` is
+ called on the two hard-removal hooks — local `onGameDeleted` and sync-driven
+ `onGameRemoved` — and `BadgeState.reset()` on the diagnostics data wipe. This
+ matters because a deleted game has nothing left to open, so a stale
+ `unreadAt > seenAt` entry could never be cleared by a seen horizon and would
+ badge forever.
+- The ledger key is `badge.ledger.v2`. The bump from `v1` is a deliberate
+ one-shot discard: the `v1` migration off the pre-horizon set stamped
+ `unreadAt = now` for every legacy entry, fabricating fresh-unread phantoms
+ that no read state could defeat (a future-dated `unreadAt` beats any past
+ `seenAt`). `loadLedger` now drops the legacy stores without fabricating, and
+ rebuilds from Core Data via the seed above.
+- Each account has an account-scoped push address, published as the existing
+ `Decision` record `decision-account-pushAddress` with `kind = account` and
+ the address in the payload. This avoids adding CloudKit schema fields while
+ giving all devices on the same iCloud account a shared silent-push route.
+- When a device joins a shared game, it sends `accountJoined` to the account
+ address. Sibling devices ignore self-sends, fetch/sync, mint their own
+ per-game push address if needed, and register with the Worker so they can
+ receive that game's future pushes.
+- When a device clears/sees a game, it sends `accountSeen` with
+ `senderDeviceID` and `readAt`. Sibling devices ignore self-sends, apply the
+ supplied `readAt` to their local read cursor and badge ledger, withdraw any
+ matching delivered notifications, and refresh the app badge. Inbound
+ `accountSeen`, CloudKit `Player.readAt` updates and ping deletions are
+ non-echoing so the account push path remains one-hop rather than a feedback
+ loop.
+- If the game is actively visible, the seen push uses the same future
+ read-lease horizon as `Player.readAt`. That prevents an account-seen message
+ from racing after an active-lease update and accidentally collapsing
+ presence on another device.
+
+The push server is still not durable truth for badges. It only transports
+same-account hints quickly. If a silent push is dropped, the next app launch or
+CloudKit fetch recomputes from local Core Data plus whatever the NSE delivered
+on that device.
+
+## Open items / gotchas
+
+- **Schema deployment is not needed.** Moves records ship the cells as an opaque binary blob; the new `checkedRight` boolean rides inside that blob. No CloudKit dashboard field to add (unlike, say, `Player.readAt`). The `CellEntity.checkedRight` Core Data attribute auto-migrates locally on first launch (Boolean default `NO`).
+- **No visual indicator UI yet.** The cell state is now propagating; rendering it on the grid is separate UX work. Plan was tri-state: subtle dot for `.right`, distinct marker for `.wrong`, cleared when the cell's letter changes.
+- **The bearer in the binary is the only abuse gate.** That matches your stated comfort level (users aren't adversarial). If the binary is ever reverse-engineered the bearer leaks; rotate via `wrangler secret put` if that happens.
+- **Background-time publish.** Backgrounding while in a puzzle fires `publishSessionEndPush` from a `Task` that runs as the app suspends. iOS gives ~5–10 seconds; in practice a small POST to a Cloudflare Worker completes well within that window. If we see drops in the field, switch to a `BGTaskScheduler` background task or a background `URLSession`.
+- **`onPlayerEvent` removed from `PlayerSession`** — the 6 fire sites that used to enqueue check/reveal pings are gone. Anyone re-introducing peer-visible per-action notifications would need to wire a new path; the existing one no longer exists.
+- **Mixed-version peers.** `MovesCodec.Payload.Entry.init(from:)` defaults `checkedRight` to false, so a TestFlight build that's behind on the schema can still decode records from the Simulator. Older builds drop the new flag when re-encoding (they just don't see it), so a check-correct stamped on the new build that round-trips through an older device's re-write would lose its `.right` state. Acceptable transient: the next check on the new device re-stamps.
+
+## File-level inventory
+
+Added:
+
+- `Workers/push-worker.js`, `Workers/wrangler.push.toml`
+- `Crossmate/Services/PushClient.swift`
+- `Crossmate/Services/LocalSessionTracker.swift`
+- `Crossmate/Models/CheckResult.swift`
+
+Modified:
+
+- `project.yml`, `Crossmate/Info.plist`, `Crossmate/Crossmate.entitlements` — push base URL and App Attest entitlement/configuration
+- `Crossmate/Services/PushRequestAuthenticator.swift` — per-install App Attest enrollment and request signing
+- `Crossmate/Models/CellMark.swift` — enum-based `checked: CheckResult?`
+- `Crossmate/Models/Game.swift` — `setLetter` / `checkCells` updated for new mark API
+- `Crossmate/Models/CrossmateModel.xcdatamodeld` — `CellEntity.checkedRight` attribute
+- `Crossmate/Models/PlayerSession.swift` — `onPlayerEvent` + 6 fire sites removed
+- `Crossmate/Sync/Moves.swift` — `TimestampedCell` / `GridCell` / `RealtimeCellEdit` / `Payload.Entry` gain `checkedRight`
+- `Crossmate/Sync/Presence.swift` — `PingKind` trimmed, `PingScope` / `PingScopePayload` removed, `Ping.scope` removed
+- `Crossmate/Sync/SyncEngine.swift` — `enqueuePing` loses `scope`; `PingPayload` loses `scope`
+- `Crossmate/Sync/RecordSerializer.swift` — `pingRecord(scope:)` removed
+- `Crossmate/Sync/RecordBuilder.swift` — drop `payload.scope` passthrough
+- `Crossmate/Sync/SessionMonitor.swift` — slimmed to bucket accumulation + `consumeOnOpen`
+- `Crossmate/Sync/MovesUpdater.swift` — `enqueue(checkedRight:)`, `Pending.checkedRight`
+- `Crossmate/Sync/GridStateMerger.swift` — propagate `checkedRight`
+- `Crossmate/Persistence/GameMutator.swift` — `encodeMark` returns the triple; cell-edit path passes `checkedRight`
+- `Crossmate/Persistence/GameStore.swift` — `decodeMark` takes the triple; `applyCellCache` / `applyRealtimeCellEdit` updated
+- `Crossmate/CrossmateApp.swift` — `AppDelegate.onAPNsToken` callback; `updateActiveNotificationPuzzleID` fires the play/pause pushes; selection-clear block fires the pause push; `session.onPlayerEvent` callback removed
+- `Crossmate/Services/AppServices.swift` — `PushClient` init wired; `publishCompletionPush` (win + resign), `publishSessionStartPush`, `publishSessionEndPush`, `recipientsAndTitle`; `sendPlayerEventPings` removed; `bodyText(for: Ping)` trimmed; `presentBegins` call site removed
+
+Tests touched: `RecordSerializerMovesTests`, `PendingEditFlagTests`, `GridStateMergerTests`, `MovesUpdaterTests`, `GameStoreUnreadMovesTests`, `Sync/MovesInboundTests`, `Sync/AuthorDeltaTests`, `Sync/EngagementCoordinatorTests`, `Sync/PendingChangeReapTests`, `Sync/SessionMonitorTests`, `GameMutatorTests`, `XDAcceptTests`, `PuzzleNotificationTextTests`, `RecordSerializerTests`.
+
+All unit tests pass. No CloudKit Dashboard changes required.
diff --git a/Notes/ReadHorizons.md b/Notes/ReadHorizons.md
@@ -0,0 +1,165 @@
+# Problem: cross-device read-horizon / presence collapse
+
+This document captures an unresolved design problem in the read-cursor /
+unread-badge sync path. It is written to be picked up cold in a new session.
+**No fix is committed for it yet that works** — see "Current state" below.
+
+## TL;DR
+
+The account's read horizon (`GameEntity.lastReadOtherMoveAt`) and its
+"a device is actively present" lease are squeezed into a **single
+last-writer-wins scalar** carried on **one Player record per (game, account)**.
+That structure cannot represent "device A has left but device C is still
+present," so any attempt to stop a leaving device from collapsing another
+device's active lease is unsolvable with the data we currently store. Two
+successive fixes attempted it and both were wrong. We need to decide between
+(a) accepting the bounded, self-healing limitation, or (b) making presence
+per-device, which is a schema + sync change.
+
+## Background: how the read horizon works today
+
+- `GameEntity.lastReadOtherMoveAt` is the per-account "how far we've read other
+ authors' moves" horizon. The unread badge / session nudge fires when a peer
+ (different iCloud account) has a `MovesEntity.updatedAt` later than this
+ horizon. See `GameStore.unreadOtherMovesGameCount()` and `GameSummary`.
+- A **future-dated** horizon is an *active-session lease*: "a device of this
+ account is in the puzzle right now, suppress badges/nudges until this time."
+ `publishReadCursor(.activeLease)` writes `now + readLeaseDuration` (10 min)
+ and refreshes only when less than `readLeaseRefreshFloor` (5 min) remains
+ (`AppServices.swift`). Exit/background writes `now` (`.currentTime`),
+ intentionally closing the lease.
+- The horizon is shipped to other devices on the Player record's `readAt`
+ field. **Outbound `readAt` is built from `entity.game?.lastReadOtherMoveAt`**
+ (`RecordBuilder.swift:135`) — i.e. every device publishes the single shared
+ account scalar, not its own lease.
+- Inbound: `RecordApplier.applyPlayerRecord` adopts `readAt` (and the
+ `sessionSnapshot` catch-up baseline) **above** the selection `updatedAt` LWW
+ guard but **under** the per-record etag freshness guard, then calls
+ `onReadCursor` → `GameStore.noteIncomingReadCursor` for our own account's
+ records only (`authorID == localAuthorID`). This hoist was commit `3835f0f`
+ ("Converge the catch-up baseline regardless of cursor recency"), which fixed a
+ real duplicate-catch-up-banner bug.
+
+## The decisive fact (why the recent fixes fail)
+
+**Player records are one per `(game, author)`, with no device component:**
+
+```
+RecordSerializer.swift:62
+ static func recordName(forPlayerInGame gameID: UUID, authorID: String) -> String {
+ "player-\(gameID.uuidString)-\(authorID)" // <-- no deviceID
+ }
+```
+
+All of an account's devices share the *same* `authorID`, so they all write the
+*same* Player record. Consequences:
+
+1. CloudKit resolves concurrent device writes by **last-writer-wins**. The
+ record's `readAt` is simply whoever wrote most recently.
+2. Locally there is exactly **one** `PlayerEntity` row per `(game, authorID)`.
+ The per-record etag guard (`applyPlayerRecord`, the
+ `incomingIsAtLeastAsFresh` check) rejects older CloudKit server snapshots
+ than the local etag, so the `readAt` that reaches `noteIncomingReadCursor`
+ is from the accepted/current server version of that one Player record. That
+ does **not** mean its `updatedAt` value is semantically freshest; the
+ catch-up-baseline bug existed precisely because the accepted record could
+ carry a stale `updatedAt`.
+3. There is **no stored representation of an individual device's lease.** "A
+ left, C is still here" is unrepresentable; the scalar holds only the last
+ write.
+
+## The original concern and why it is inherent, not a bug we introduced
+
+Red-team round 1 worried that a leaving device's record could move
+`lastReadOtherMoveAt` backward and collapse another device's active future
+lease, briefly reopening unread badges / pause nudges. With the one-record LWW
+model this is just **last-writer-wins on a shared scalar**: if device A closes
+(writes `now`) after device C leased `now+10m`, the record now says `now`. Other
+devices adopt `now`. It is bounded and **self-healing**: the still-active device
+re-leases within ≤5 min (`readLeaseRefreshFloor`), and a foreground device marks
+inbound moves read on arrival anyway. It cannot be "fixed" without representing
+C's lease separately from A's close.
+
+## What was attempted (both wrong)
+
+1. **`0ccbe90` — `protectActiveLease` floor (committed).** Made
+ `noteIncomingReadCursor` refuse any incoming `readAt` below an unexpired
+ future horizon. **Broke clean closes:** a real close from another device is
+ indistinguishable from a stale echo (both are "a value below the current
+ future horizon"), so legitimate closes were ignored and a stale future lease
+ was held — suppressing badges/nudges for up to the 10-min lease.
+
+2. **`e2d863a` — recompute from sibling rows (committed).** Replaced the
+ floor with `horizon = max(incoming, min(current, now), max sibling
+ PlayerEntity.readAt)`. **Two fatal flaws, both correct per red-team:**
+ - **P1:** "max over sibling `PlayerEntity` rows" assumes per-device rows that
+ do not exist (one row per author). In production it degenerates to the
+ single LWW row and distinguishes nothing. The accompanying test fabricated
+ `player-A`/`player-B` rows with the same author — impossible.
+ - **P1:** `min(current, Date())` floors at wall-clock *processing* time, not
+ the actual departure time. A late-processed close (device left 10:02,
+ received 10:05, old lease 10:10) advances the horizon to 10:05 and marks
+ peer moves made 10:02–10:05 as read even though no device saw them.
+
+## Current state
+
+- Committed: `3835f0f` (good, keep), `0ccbe90` (broken floor), and `e2d863a`
+ (broken recompute).
+- `PLAN.md` is untracked documentation. The working tree may otherwise be clean
+ depending on whether this file has been added.
+- The unit suite currently passes, but only because the test was written to the
+ fictional model. Passing is not meaningful here.
+
+## Options to decide between
+
+**Option A — Revert to the plain-adopt baseline (recommended as the floor).**
+Restore `noteIncomingReadCursor` to simply
+`_ = setReadCursor(gameID: gameID, readAt: readAt)` (the pre-`0ccbe90` behavior),
+i.e. revert both `0ccbe90` and `e2d863a` back to the `3835f0f` state. This is
+the LWW-consistent choice: trust the account's latest resolved
+`readAt`, allow backward movement. It has **no false-negative** (a close to
+10:02 yields horizon 10:02; later moves badge correctly — Finding 2 gone) and
+**no fictional data dependency** (Finding 1 gone). Its only cost is the inherent
+round-1 collapse, which is bounded (≤5 min) and self-healing. The catch-up
+baseline fix from `3835f0f` is preserved.
+
+**Option B — Make presence per-device (the real fix, larger).** Represent each
+device's lease separately so the account horizon can be a true max over *live*
+device leases. Sketch:
+- Give the Player record a `deviceID` (like Moves/Journal/Ping already do:
+ `moves-<game>-<author>-<device>`), so each device owns its own row, OR add a
+ separate lightweight per-device presence/lease record.
+- Compute the suppression horizon as the max **unexpired** lease across the
+ account's own device rows; let truly-read progress be a separate monotonic
+ value that never moves backward.
+- This is what the round-2/round-3 attempts *wanted* to do but couldn't, because
+ the rows didn't exist. It is a schema + sync-path change with migration
+ considerations and should be scoped deliberately. Note CloudKit production
+ cannot be inspected/edited via the dashboard, so verify via on-device
+ diagnostics.
+
+**Option C — Status quo (`0ccbe90`/`e2d863a` as-is).** Not recommended: the
+committed follow-up still relies on per-device Player rows that do not exist
+and can mark unseen moves read via the wall-clock processing-time floor. This
+is worse than the occasional spurious nudge Option A allows.
+
+## Recommendation
+
+Take **Option A now** to get back to a correct, simple baseline, then decide
+separately whether the round-1 collapse is worth **Option B**. My read: the
+collapse is minor and self-healing, so Option A may be sufficient on its own and
+Option B is only worth it if real-world reports show spurious badges/nudges
+while a sibling is demonstrably active.
+
+## Key references
+
+- `Crossmate/Sync/RecordSerializer.swift:62` — Player record name (the root fact)
+- `Crossmate/Sync/RecordBuilder.swift:135` — outbound `readAt` = shared scalar
+- `Crossmate/Sync/RecordApplier.swift` (`applyPlayerRecord`) — etag guard, the
+ `3835f0f` hoist, `onReadCursor`
+- `Crossmate/Persistence/GameStore.swift` — `noteIncomingReadCursor`,
+ `setReadCursor`, `noteIncomingMovesUpdate`, `markOtherMovesRead`
+- `Crossmate/Services/AppServices.swift` — `publishReadCursor`,
+ `readLeaseDuration` (10 min), `readLeaseRefreshFloor` (5 min), the
+ `accountSeen` push path (~2208), the incoming-cursor drain (~486)
+- Commits: `3835f0f` (keep), `0ccbe90` and `e2d863a` (revert)
diff --git a/Notes/ServerSecurity.md b/Notes/ServerSecurity.md
@@ -0,0 +1,237 @@
+# Crossmate Audit
+
+Date: 2026-06-12 (updated 2026-06-16)
+
+Scope: Crossmate iOS/iPadOS app, shared app/extension code, tests, scripts, and
+Cloudflare Workers used by Crossmate. Crossmake was intentionally excluded.
+
+## Summary
+
+The highest-risk issue was the push worker's authentication model. The app
+previously shipped a shared push bearer in its bundle configuration, and the
+worker treated that bearer as the only authorization gate for registering
+devices and publishing pushes. Anyone who extracted that value from a real app
+build could use the worker as Crossmate. That issue has been addressed by
+replacing shared bearer authentication with per-install App Attest enrollment and
+signed worker requests. A temporary legacy-bearer fallback behind an
+environment flag was subsequently removed and the legacy push-bearer secret
+deleted from the worker (see Finding 1), so App Attest is
+now the only authentication path.
+
+The rest of the audit found lower-risk operational and robustness issues around
+push environment handling, local secret storage, engagement room authentication,
+and destructive local-store recovery behavior. All of these have since been
+addressed. The most substantive was the engagement room authentication design
+(Finding 4): the secret-in-URL symptom sat on top of a trust-on-first-use
+room-auth scheme whose HMAC signature added no security. Rooms are now
+registered with the worker ahead of connecting (secret in a POST body, never a
+URL), and the connect signature is verified against the worker-held secret.
+
+## Findings
+
+### 1. Push worker used a bearer secret embedded in the app
+
+Status: Fixed.
+
+Impact: High.
+
+The app read `CrossmatePushBearer` from its bundle configuration and sent it to
+the push worker. The Cloudflare worker accepted that shared value as sufficient
+authorization for `/register`, `/publish`, and unregister operations. Because the
+same bearer was present in every distributed build, extracting one app binary
+would expose the credential for all clients.
+
+The implemented replacement keeps Crossmate account-free while removing the
+global shared secret. Updated clients now enroll a per-install App Attest key
+with the worker, store the key ID locally, and sign each protected push request.
+The worker verifies Apple App Attest attestation, stores only the public key and
+assertion counter, and accepts push operations only when signed by the registered
+key. Existing collaborative games remain compatible for updated clients because
+CloudKit game state and push address derivation did not change; clients simply
+re-register the same push addresses under the new authentication scheme.
+
+The worker initially retained a legacy bearer fallback behind an
+environment flag as a rollback window. Resolved
+2026-06-12: the fallback branch was removed from `Workers/push-worker.js`, the
+updated worker was redeployed, and the legacy push-bearer secret was
+deleted from the worker. App Attest is now the only authentication path.
+
+Deployment notes:
+
+- `APP_ATTEST_ROOT_CERT_PEM` has been uploaded to Cloudflare from
+ Apple's published App Attest root certificate.
+- `crossmate-push` has been deployed with App Attest environment and app identity
+ variables.
+- The regenerated `Crossmate iOS Distribution` provisioning profile includes
+ `com.apple.developer.devicecheck.appattest-environment`.
+- App Attest enrollment and signed worker requests have been validated on a real
+ client after the initial implementation session.
+
+### 2. Push environment handling can diverge between APNs entitlement and client registration
+
+Status: Fixed.
+
+Impact: Low (debug builds only).
+
+`Crossmate.entitlements` hardcoded `aps-environment: production`, while
+`PushClient` selected `sandbox` under `#if DEBUG` and `production` otherwise.
+TestFlight and App Store builds are release-signed, so both sides agreed on
+production there; the mismatch only affected local device debug builds, where the
+token's APNs environment and the environment registered with the worker could
+disagree and pushes silently fail. That failure mode was a diagnosability
+annoyance during development rather than a user-facing risk.
+
+Resolved 2026-06-12: a single `APS_ENVIRONMENT` build setting in `project.yml`
+(`development` for Debug, `production` for Release) is now substituted into both
+the `aps-environment` entitlement and a `CrossmateAPSEnvironment` Info.plist
+key. `PushClient` reads the Info.plist key instead of inferring from
+`#if DEBUG`, and disables push (failable init, logged to diagnostics) if the
+value is missing or unrecognised. Because the entitlement and the worker
+registration now derive from the same variable, they cannot diverge. This also
+corrects the Debug entitlement itself: development-signed builds previously
+claimed `production` while their provisioning profile said `development`.
+
+### 3. Account push secret storage and generation path is weakly guarded
+
+Status: Fixed.
+
+Impact: Low.
+
+The account push secret is stored in local defaults, and random generation
+failure was not treated as a hard failure: `generatePushSecret()` in
+`AccountPushCoordinator` discarded the `SecRandomCopyBytes` result, so on a
+(vanishingly unlikely) failure the minted secret would be the base64 of 32 zero
+bytes.
+
+Moving the local copy to the Keychain would change little, because the secret is
+deliberately replicated across the account's devices as a CloudKit `Decision`
+record — the private CloudKit database is already its dominant home. The
+severity is therefore Low rather than Medium.
+
+The boundary this finding originally documented — the secret is the HMAC key
+from which per-game push addresses are derived, those addresses acted as bearer
+capabilities, and the worker's `handlePublish` did not check that a caller
+participates in the game it pushed to, so any App Attest-enrolled client that
+learned a derived address could publish to that account — has since been closed
+(participation gating, below).
+
+Resolved 2026-06-12 (secret hardening): `generatePushSecret()` now checks the
+`SecRandomCopyBytes` status with a `precondition` (crashing is preferable to
+durably converging a predictable HMAC key across the account's devices) and
+reuses the existing `Data.base64URLEncodedString()` helper.
+
+Resolved 2026-06-16 (participation gating): publishing to a game now requires
+proving participation, mirroring the engagement room-secret scheme (Finding 4).
+Each shared game carries a per-game push credential (`GamePushCredentials` — an
+unguessable `credID` plus a 256-bit secret) in the Game record's `notification`
+field, synced only to CKShare participants. Devices register their per-game push
+addresses with the worker keyed under that `credID`, and a publish is signed
+with the game secret (HMAC over the App Attest request's already-validated body
+hash, timestamp, and nonce). The worker verifies the signature against the
+secret it holds for the credID and resolves targets only among addresses
+registered under it — so a caller can reach a game's participants only if it
+holds that game's secret, i.e. it is a participant. A signature alone would not
+have sufficed (the worker can't tell which addresses belong to which game), so
+the device address registration is credID-bound too; that binding is the part
+that actually closes the hole. Account-scoped sibling pushes
+(accountJoined/accountSeen) carry no credID and stay on the App-Attest-only
+path; their addresses derive from the account secret in the private database and
+were never participant-spoofable. Deployment: the Game record gains a
+`notification` String field (Development + Production) and the updated push
+worker must be deployed; older clients lose game pushes (the durable CloudKit
+path is unaffected), which is acceptable pre-release.
+
+### 4. Engagement room authentication is trust-on-first-use and its signature adds no security
+
+Status: Fixed.
+
+Impact: Medium.
+
+The secret-in-URL exposure originally flagged here was the symptom; the room
+authentication design underneath was the actual weakness.
+
+The client sent both the room secret and an HMAC-SHA256 signature over
+`roomID|authorID|deviceID|timestamp|nonce` as query parameters. The worker
+verified that signature using the secret supplied in the same request
+(`room-worker.js`), so anyone could mint a valid signature for any secret
+they chose — the HMAC added nothing. The real gate was a trust-on-first-use
+`secretHash` stored in the Durable Object: the first client to connect to a
+room defined its secret, and later connections had to match it. The secret
+itself was the bearer credential, and it travelled in the query string, where
+it could appear in logs, diagnostics, proxies, and tooling surfaces.
+
+Resolved 2026-06-12 (code): the worker gained a `POST /rooms/{id}/register`
+endpoint that stores the room secret from the request body — first write wins,
+re-registration with the same secret is an idempotent success, and a different
+secret is refused with 409. The websocket connect no longer accepts a `secret`
+parameter at all; the worker verifies the HMAC signature against the
+registered secret and refuses unregistered rooms. Timestamp-freshness and
+nonce-single-use checks already existed on the worker and are unchanged — with
+a server-held key they now carry real authentication. Clients register
+idempotently before every connect, which preserves the previous de-facto
+behavior that an idle-expired Durable Object room is resurrected by whichever
+creds-holder returns first, and removes any ordering race between the minting
+peer and peers that receive the creds via CloudKit. A 409 (room ID registered
+under a different secret, e.g. squatted after idle expiry) is surfaced as
+`EngagementReconcileOutcome.roomRejected`, and `EngagementLifecycle` clears the
+Game record's `engagement` field so a fresh room is minted and LWW converges
+the peers onto it.
+
+Deployed 2026-06-12: the updated worker refuses TOFU connects, so clients
+running builds that predate this change lose the live engagement channel
+(graceful — the durable CloudKit Moves path is unaffected). It was deployed
+alongside the updated clients.
+
+### 5. Core Data store recovery can erase local state
+
+Status: Fixed.
+
+Impact: Medium.
+
+The local persistence recovery path could wipe the Core Data store when store
+load or migration fails. The in-code rationale
+(`PersistenceController.recreateStore`) was that the store is a rebuildable
+cache of CloudKit, so a wipe is recoverable via refetch. That rationale does not
+hold in two cases:
+
+- iCloud sync is user-toggleable (`isICloudSyncEnabled`). For a user with sync
+ disabled, the local store is the *only* copy of their games, and the wipe is
+ total, permanent data loss.
+- Even with sync on, offline edits not yet uploaded through the sync engine are
+ lost.
+
+Resolved 2026-06-12: `recreateStore` now moves the failing store files
+(including the `-wal`/`-shm` sidecars) aside under a `.broken-<timestamp>`
+suffix before destroying and rebuilding, so the data stays inspectable and
+recoverable instead of being erased. The recovery event — including the original
+load error and the preserved file names — is surfaced through the diagnostics
+`EventLog`. `PersistenceController` gained an injectable store URL for tests,
+and targeted tests now cover both the failure path (broken store preserved
+byte-for-byte, empty store rebuilt) and the healthy path (reopening an existing
+store triggers no recovery). Preserved backups are not auto-pruned; the path
+requires a failed migration, so accumulation is not a practical concern.
+
+## Verification Performed
+
+After the App Attest push-auth changes, these checks passed:
+
+- `node --check Workers/push-worker.js`
+- `xcodegen generate`
+- `bash Scripts/build.sh`
+- `bash Scripts/test-unit.sh`
+
+Cloudflare was also updated operationally:
+
+- `APP_ATTEST_ROOT_CERT_PEM` secret uploaded.
+- `crossmate-push` deployed to its workers.dev URL.
+
+## Release Validation
+
+The App Attest challenge/registration path and signed worker requests have been
+validated on a real client. The remaining release checks are the broader
+end-to-end collaboration and push-delivery behaviors:
+
+- Fresh install registers for notifications.
+- Two updated clients can continue an existing collaborative game.
+- Background push delivery still works.
+- Worker logs remain clean during normal registration and publish traffic.
diff --git a/Notes/Snapshots.md b/Notes/Snapshots.md
@@ -0,0 +1,143 @@
+# Snapshots in Shared Games
+
+Snapshots are a move-log compaction and replay acceleration mechanism. Instead
+of reconstructing a game by replaying every move from an empty grid, a client can
+start from a stored grid snapshot and replay only the moves not included in that
+snapshot.
+
+The current scalar `upToLamport` model is not safe for shared games. It assumes
+there is one global move sequence:
+
+```text
+Snapshot(upToLamport: 42)
+```
+
+Replay then skips every move whose Lamport is `<= 42`. That is only correct if
+all devices assign Lamports from a single globally ordered stream. In a shared
+or offline-capable game, two devices can independently create moves with the
+same or overlapping Lamport values. A snapshot from device A at Lamport 42 may
+not include device B's move at Lamport 41. If replay treats B's move as covered,
+that move can disappear from the reconstructed grid.
+
+## Proposed Model
+
+Use per-replica move clocks and snapshot coverage vectors.
+
+Each move should identify the replica that produced it and that replica's local
+sequence number:
+
+```text
+Move(replicaID: "device-a", sequence: 12)
+Move(replicaID: "device-b", sequence: 7)
+```
+
+A snapshot should store the grid plus the highest included sequence for each
+replica:
+
+```text
+Snapshot(
+ grid: ...,
+ coverage: [
+ "device-a": 12,
+ "device-b": 7
+ ]
+)
+```
+
+Replay can then skip a move only when the selected snapshot explicitly covers
+that move:
+
+```swift
+let coveredSequence = snapshot.coverage[move.replicaID] ?? 0
+let isCovered = move.sequence <= coveredSequence
+```
+
+This prevents one device's snapshot from accidentally hiding another device's
+move. A move is skipped only if the snapshot proves that the move was folded
+into the snapshot grid.
+
+## Replay and Conflict Ordering
+
+Snapshot coverage answers one question: which moves are included in this
+snapshot?
+
+Conflict ordering answers a separate question: when multiple moves affect the
+same square, which move wins?
+
+Those should not be represented by the same scalar value. Replay should:
+
+1. Choose a snapshot.
+2. Start from the snapshot grid.
+3. Apply only moves not covered by that snapshot.
+4. Use a deterministic conflict ordering for same-cell writes.
+
+The conflict ordering can be last-writer-wins with a stable tie-breaker, for
+example `(createdAt, replicaID, sequence)` or another explicit ordering chosen
+for product semantics. The important part is that coverage remains per-replica.
+
+The shipped tie-break (`GridStateMerger.shouldReplace`) is
+`(updatedAt, authorID, deviceID)`: per-cell last-writer-wins on the writer's
+wall-clock `updatedAt`, then lex-min `authorID`, then lex-min `deviceID`.
+
+### Accepted trade-off: cross-device wall-clock skew
+
+The merge, the completion seal (`merge(notAfter: completedAt)`), the pause-push
+diff window, and presence all compare raw device wall clocks across users —
+there is no logical clock and no server-time normalisation on apply. A device
+whose clock runs fast therefore wins every cell conflict regardless of real
+typing order, and its legitimate pre-win letters can be dropped by the
+completion seal when their future-dated `updatedAt` reads as past the latch.
+
+This is accepted as-is rather than mitigated, because the realistic incidence is
+low. iOS defaults to "Set Automatically" (NTP / cellular network time), which
+holds devices within sub-second of true time; quartz drift on a merely-offline
+device is seconds per week. Sub-second skew almost never inverts a merge outcome,
+since that requires two writers to the *same* cell within a real-time gap smaller
+than the skew, and the losing letter in such a tie is usually plausible anyway —
+it is a coin-flip on a shared cell, not silent destruction of unique work. The
+one regime where skew actually bites is a user who has turned off automatic time
+or set the clock manually (minutes to days off); that is self-inflicted
+misconfiguration, and since `CKShare` participants are invited friends, a
+malicious far-future clock is griefing by someone you chose to play with, not a
+cheating-for-advantage vector — there is nothing zero-sum to win in a cooperative
+solve.
+
+If field reports ever surface a "my answer disappeared" trace to a wrong clock,
+the cheap mitigation is to clamp each inbound cell's `updatedAt` to
+`record.modificationDate + ε` on apply, bounding every writer's skew by
+CloudKit's server clock (which all devices share). It is deliberately not built
+speculatively: it adds a permanent "inbound timestamps are rewritten on apply"
+behaviour to reason about, for an edge no user is known to hit.
+
+## Pruning
+
+Pruning old moves is the risky part. With multiple replicas, snapshots can be
+incomparable:
+
+```text
+Snapshot A covers device-a: 20, device-b: 5
+Snapshot B covers device-a: 10, device-b: 30
+```
+
+Neither snapshot fully replaces the other. If moves are deleted merely because
+they are covered by some snapshot, a later replay from a different snapshot may
+miss required moves.
+
+The conservative implementation path is:
+
+1. Add per-replica coverage to snapshots.
+2. Use snapshots only as replay accelerators.
+3. Do not prune moves initially.
+4. Add pruning later only for moves covered by a retained canonical snapshot.
+
+A newer snapshot can safely replace a canonical snapshot when its coverage
+dominates the previous one: for every known replica, the newer snapshot covers at
+least as high a sequence number. Until that rule exists and is tested, retaining
+the move log is safer than compacting it incorrectly.
+
+## Practical Recommendation
+
+For now, disable snapshot pruning for shared games. It is also reasonable to
+disable shared-game snapshot creation entirely until the snapshot format records
+per-replica coverage. The cost is a larger move log and slower replay for very
+long games; the benefit is avoiding data loss or disappearing squares.
diff --git a/Notes/SyncRedesign.md b/Notes/SyncRedesign.md
@@ -0,0 +1,186 @@
+# Sync Redesign Plan
+
+## Context
+
+On 2026-04-21, completing a puzzle on device surfaced two distinct failures:
+
+1. **Data loss**: after filling every cell and seeing the win screen, the puzzle list's preview was missing a square. Re-opening the puzzle confirmed a previously-filled cell had reverted to blank.
+2. **CloudKit throttling**: diagnostic logs showed the container hitting a device-level throttle (`x-cloudkit-error=THROTTLED`, `CKRetryAfter=22`, `ThrottleLabel=Device throttle: container=iCloud.net.inqk.crossmate.v2, type=210/211`) with five distinct `OperationID`s inside a ~2-second window.
+
+These are separate problems with a shared amplifier.
+
+## Status (as of 2026-04-26)
+
+**Phases 1, 2, 3a, 3b+3c, and 4 are complete.** Only the conditional Phase 5 (WebRTC) remains, gated on measured live-latency being inadequate.
+
+### Done
+
+- **Phase 1** — _was_: fetch-side LWW guard on `RecordSerializer.applyCellRecord`. Now deleted: Cell records no longer exist in the move-log model.
+- **Phase 2** — _was_: `PushDebouncer`, single-flight push, `CKRetryAfter` handling. Now deleted: replaced by `CKSyncEngine`'s built-in batching and retry.
+- **Phase 3a** — data model and replay plumbing: `MoveEntity`, `SnapshotEntity`, `GameEntity.lamportHighWater`, `MoveLog.swift`, `RecordSerializer` Move/Snapshot builders. Tests in `MoveLogTests.swift`.
+- **Phase 3b+3c** — full move-log + CKSyncEngine adoption (all 9 sub-tasks done):
+ 1. `MoveBuffer.swift` — coalescing actor, lamport assignment, `MoveEntity` persistence, debounce. Tests in `MoveBufferTests.swift`.
+ 2. `SyncEngine.swift` — rewritten around `CKSyncEngine`. Delegate handles incoming `Move`/`Snapshot` records, replays `CellEntity` cache, fires `onRemoteMoves`.
+ 3. `GameMutator.swift` — now emits `Move` to `MoveBuffer`; no direct `CellEntity` writes; `RemoteCellChange` / `Origin` / `applyRemoteCell` deleted.
+ 4. `GameStore.swift` — restores from move-log replay; `createGame` no longer seeds `CellEntity`; `repairSyncedGamesIfNeeded` / `applyRemoteChanges` deleted.
+ 5. `AppServices.swift` — constructs `MoveBuffer`, wires into `GameStore`, wires `onRemoteMoves` → `store.refreshCurrentGame()`.
+ 6. `CrossmateApp.swift` — no changes needed; `syncOnBackground()` already calls `moveBuffer.flush()`.
+ 7. `OutboxRecorder.swift`, `PushDebouncer.swift`, `PendingChangePayload.swift`, `PendingChange+Helpers.swift` deleted; `PendingChangeEntity` and obsolete token fields removed from data model.
+ 8. `PushDebouncerTests.swift`, `PendingChangeTests.swift` deleted; `ApplyCellRecordLWWTests` / `PendingChangePayloadTests` suites removed from `RecordSerializerTests.swift`; `GameMutatorTests.swift` pruned to in-memory mutation tests only.
+ 9. `MoveBufferTests.swift` — 7 tests all passing.
+
+- **Phase 4** — CKShare via per-game zones and dual sync engines, plus shared player identity:
+ 1. Per-game zones (`game-<UUID>`) with one `CKShare` rooted on the `Game` record; `Move`/`Snapshot`/`Player` records sit in the same zone via `parent` reference, so share inclusion is automatic.
+ 2. Dual sync engines: one driving `privateCloudDatabase` (owned games), one driving `sharedCloudDatabase` (joined games), parameterised through the same delegate path.
+ 3. `ShareController` for invitation creation; share-accept handler in `AppServices` accepts via `CKAcceptSharesOperation` and wires the new zone into the shared engine.
+ 4. `AuthorIdentity` stamps every `Move` with the local iCloud user record name, so multi-author replay attributes correctly.
+ 5. Share revocation: `markAccessRevoked(gameID:)` flips the local game read-only when the participant is removed; resolves the open question from earlier drafts.
+ 6. **Player identity** (deviated from the original `namesByAuthor`-on-Game JSON-blob plan in `.claude/plans/shared-player-identity.md`): names ride on per-(game, author) `Player` records (`player-<gameID>-<authorID>`) through the same engine. Each author owns their own record name → no write-write conflicts, no JSON merge primitive, no "must start from latest server record" hazard. `NameBroadcaster` fans out renames to every shared/joined game (debounced 250 ms); `PlayerRoster` resolves entries via the local `PlayerEntity` rows → `CKShare.Participant` name components → email → `"Player"` fallback chain. Colours are device-local in `GamePlayerColorStore`, with collision reassignment when the local user picks a remote colour.
+ 7. Snapshot pruning correctness fix landed during Phase 4: covered moves are now deleted only after `CKSyncEngine` confirms the snapshot save (`needsPruning` flag on `SnapshotEntity`), preventing data loss when an upload fails permanently.
+
+### Design decisions already locked in conversation
+
+- **Lamport timestamps** — simple per-game monotonic counter stored in `GameEntity.lamportHighWater`. Bumped on local emit and on receiving a move with a higher lamport. No cross-participant total order; undo-across-participants stays out of scope.
+- **Snapshot cadence** — on game completion + opportunistic when the move log exceeds 200 moves. Starting point; tunable later.
+- **No migration** — user deletes existing games before launching the new version, so the new code can assume the move-log schema from first run.
+- **`CellEntity` stays** — as a replay-driven cache, not source of truth. Views that read `CellEntity` keep working.
+
+### Gotchas for a fresh session
+
+- The Xcode project is generated by XcodeGen from `project.yml`. Run `xcodegen generate` after adding new `.swift` files or the build target won't see them.
+- The test target is named `Crossmate Unit Tests` (with spaces). Use `-only-testing:'Crossmate Unit Tests/<SuiteName>'`.
+- SourceKit diagnostics in-session regularly claim "Cannot find type X in scope" for Core Data generated classes and cross-file types. These are noise — trust `xcodebuild build` for ground truth.
+- Build target: `'platform=iOS Simulator,name=iPhone 17 Pro'`.
+- The user commits changes themselves — never run `git commit` unsolicited.
+
+### Root cause: data loss
+
+`RecordSerializer.applyCellRecord` (`Crossmate/Sync/RecordSerializer.swift:129`) unconditionally overwrites the local `CellEntity`'s `letter`, `markKind`, `checkedWrong`, `updatedAt`, `letterAuthorID` with the server's values. There is no LWW check against local `updatedAt` and no check for an outstanding `PendingChangeEntity`. Only `handlePushError`'s `.serverRecordChanged` branch does LWW; the plain fetch path does not.
+
+Reproduction:
+1. User types final letter; local `letter="A"`, pending change queued.
+2. Push is throttled; pending change remains in the outbox.
+3. Interleaved fetch (log: `fetch: changedZoneIDs count=1`) pulls the server's stale copy of that cell (`letter=""`).
+4. `applyCellRecord` overwrites local `letter="A"` with `""`. UI renders blank.
+5. Pending change would eventually re-push `"A"`, but during the window the cell genuinely reads blank.
+
+### Root cause: throttling
+
+The sync engine writes one `CKRecord` per cell and fires pushes aggressively:
+- No client-side debounce: every keystroke enqueues a `PendingChangeEntity`.
+- No serialisation: multiple `pushChanges()` invocations can produce overlapping `CKModifyRecordsOperation`s (hence the five distinct `OperationID`s in one second).
+- No `CKRetryAfter` handling: on 429, `handlePushError`'s default branch silently leaves the pending change in the outbox, and the next trigger immediately hits CloudKit again, tightening the device cooldown.
+
+### Why data model is central
+
+Throttling is ultimately about request rate and payload volume, not record shape per se. But the per-cell-record shape makes debouncing and coalescing hard (each cell has its own identity and history), and it doesn't compose with any real-time side-channel — each keystroke is always a record write. A better-chosen data shape removes the hardest constraint before we start tuning knobs.
+
+## Decision
+
+**Adopt a CloudKit + move-log + CKShare architecture, with WebRTC as deferred optional future work.**
+
+### Data model: append-only move log
+
+The source of truth for a game's cell state becomes an append-only collection of `Move` records, each describing a single cell mutation:
+
+- `recordName`: `move-<gameUUID>-<lamportOrUUID>`
+- `parent`: reference to the parent `Game` record (so CKShare inclusion is automatic)
+- Fields: `row`, `col`, `letter`, `markKind`, `checkedWrong`, `authorID`, `createdAt`, and a monotonic ordering key (Lamport timestamp or similar).
+
+The grid displayed in the UI is the deterministic replay of the move log. `CellEntity` becomes a local cache populated by replaying moves into Core Data; it is not the source of truth.
+
+Periodic **snapshots** keep the move log bounded: every N moves (or on game completion), write a `Snapshot` record capturing the full grid state and delete moves older than the snapshot. New joiners load the latest snapshot plus the tail of moves since it.
+
+### Why move log over per-cell-LWW or whole-game blob
+
+| Model | Throttling | Multiplayer correctness | Composes with real-time | Conflict code |
+|---|---|---|---|---|
+| Per-cell records (current) | Bad: per-keystroke write, hard to coalesce | Requires LWW on fetch (not present today) | Awkward: same keystroke would need to flow over two transports | Must implement on every client |
+| Whole-game blob with LWW | Good: coalesces trivially | **Broken**: Bob's blob without Alice's write wipes her cells | Awkward | N/A (wrong) |
+| Whole-game blob with per-cell LWW merge | Good | Works if merge is disciplined | Awkward | Non-trivial merge code on every client |
+| **Move log + snapshots** | **Good**: moves are small, append-only, debouncable | **Works**: each author writes only their own records, no `serverRecordChanged` races on shared cells | **Clean**: the same move unit ships over CloudKit or any side-channel | Deterministic replay; no merge code |
+
+The move log's key property for CKShare is **no write-write conflicts between participants** — Alice and Bob only ever create new records, each with a unique `recordName`. That removes the hardest correctness problem from the table.
+
+### How coalescing works in the move log
+
+"Coalescing" in this design is really a debounced in-memory buffer that dedupes by cell before anything becomes a record. A keystroke does **not** become a `Move` record at the moment it happens — it sits in an in-memory buffer keyed by `(gameID, row, col)`. A new keystroke in the same cell replaces the pending entry's value. At flush time, each buffer entry becomes one `Move` record with a fresh Lamport timestamp, and all records from one flush go into a **single** `CKModifyRecordsOperation`.
+
+So if the user types `A` → `B` → `C` in the same cell inside the debounce window, only one `Move` record is written, with `letter = "C"`. The intermediate states never exist as records. Flush triggers are: the debounce timer (1–2s), cell-change (flush the old cell before the new one starts accumulating, so the log's causal order matches the user's editing order), app background, and game completion.
+
+This is safe specifically because moves are append-only and each author only writes their own records — local coalescing never has to reason about server state or another participant's writes, which is what breaks the equivalent coalescing in the per-cell-record model.
+
+### Sync mechanics: CKShare for both async and sync
+
+- Each `Game` record is the root of a `CKShare`. Participants accept the share URL via `UICloudSharingController` and the OS's share-accept handler.
+- `Move` and `Snapshot` records set `parent = Game`, so they are part of the share automatically.
+- The engine drives both `privateCloudDatabase` (owner) and `sharedCloudDatabase` (participant) through the same code paths, parameterised by database handle.
+- `CKDatabaseSubscription` on both databases delivers silent pushes on change; the app wakes, fetches, replays moves into the local cache.
+- Async solving: partner's moves arrive on next fetch; 1–5s latency after push is fine.
+- Sync solving: same mechanism; feels like near-real-time at 1–3s in good conditions.
+
+### Throttle-safety rules (apply regardless of model)
+
+These are properties the engine must have, independent of the data model choice:
+
+1. **Debounce writes.** Buffer move events in memory; flush at most every 1–2 seconds (or on cell change / app background).
+2. **Single in-flight push.** Serialise `pushChanges()` via an actor or explicit lock so overlapping triggers coalesce rather than fan out.
+3. **Respect `CKRetryAfter`.** On `CKError.serviceUnavailable` / `.requestRateLimited` / `.zoneBusy`, sleep for the returned delay before the next push attempt. Do not break out of drain on transient errors.
+4. **Adopt `CKSyncEngine`.** It handles batching, retry-after, and send-state tracking natively. Re-implementing these primitives has been the source of the current bugs. The minimum deployment target for this redesign is iOS 18 (or iOS 26), so `CKSyncEngine` — including its shared-database support — is available unconditionally.
+
+### Deferred: WebRTC data channel
+
+If 1–3s live latency proves inadequate in practice, add a WebRTC data channel between live participants:
+
+- Signaling: small records written to the shared CloudKit zone (Alice writes SDP offer; Bob fetches via push, writes answer; ICE candidates exchanged the same way). Signaling latency is irrelevant since it's one-time per session.
+- STUN: public free server (`stun:stun.l.google.com:19302` or similar).
+- TURN: free tier (Cloudflare Calls, Xirsys) or paid (~$0.40/GB, Twilio) for the ~15–20% of sessions that fail direct P2P. Graceful fallback: if P2P fails, use the existing CloudKit move-log path.
+- Implementation: `WKWebView`-hosted WebRTC using WebKit's built-in `RTCPeerConnection` / `RTCDataChannel`. Avoids the ~20 MB `libwebrtc` bundle cost. Swift ↔ JS bridge via `WKScriptMessageHandler` + `evaluateJavaScript`. The app already ships `WKWebView` for NYT auth, so the framework cost is zero.
+
+This is **out of scope for the initial redesign.** CloudKit + move log + CKShare should be shipped and evaluated first. Add WebRTC only if measured live latency is genuinely unacceptable.
+
+### Rejected alternatives
+
+- **GameKit** (`GKMatch` / `GKTurnBasedMatch`). Requires Game Center authentication, which adds user friction, fails for users who have dismissed the sign-in prompt, and introduces a second identity layer separate from iCloud. The transport advantages do not outweigh the user-facing costs.
+- **Custom backend** (Firebase, Supabase, Convex, Cloudflare Durable Objects, Ably). Violates the no-backend constraint. Kept in mind as a future option if pure-CloudKit proves inadequate at scale.
+- **SharePlay as the primary transport.** Requires an active FaceTime/Messages session, which is not guaranteed. Still a candidate as a future augmentation for call-based live sessions, but cannot replace the async path.
+
+## Scope of the redesign
+
+### In scope (initial implementation)
+
+- New `Move` and `Snapshot` record types, with serialisation and replay logic.
+- `CellEntity` becomes a derived cache populated by replay; not the source of truth.
+- No migration. The user is the sole participant and will manually delete existing games before launching the new version. The new codebase is free to assume the move-log schema from the first run.
+- `SyncEngine` rewritten around `CKSyncEngine`, driving both the private and shared databases.
+- Dual-database support: `privateCloudDatabase` for owned games, `sharedCloudDatabase` for joined games.
+- CKShare creation, invitation flow via `UICloudSharingController`, share-accept handler in the app delegate.
+- Per-record `authorID` from `CKUserIdentity` (iCloud identity), so moves attribute correctly in multiplayer.
+- Fix the fetch-side overwrite bug: `applyCellRecord` (or its replacement in the move-log model) must not clobber state without LWW.
+
+### Out of scope (explicitly deferred)
+
+- WebRTC data channel and signaling.
+- SharePlay integration.
+- MultipeerConnectivity for co-located play.
+- Presence indicators (who is currently in the puzzle) and cursor tracking.
+- Operation-based undo across participants.
+- Conflict UX for the pathological "both type in the same cell in the same second" case — accept Lamport-ordered LWW silently for now.
+
+## Open questions
+
+_All originally listed questions have been resolved:_
+- _Share revocation: resolved by `markAccessRevoked(gameID:)` — local replay-derived state stays readable, no new moves accepted._
+- _Snapshot cadence and move ordering: resolved (see "Design decisions already locked")._
+
+## Phasing
+
+Proposed, subject to refinement:
+
+- **Phase 1: Fix the immediate data-loss bug in place.** Teach `applyCellRecord` to check for a pending outbox change or LWW `updatedAt` before overwriting. Keep the per-cell record model. Ships as a patch release.
+- **Phase 2: Throttle-safety.** Implement debounce, serialise pushes, respect `CKRetryAfter` on top of the existing hand-rolled `CKModifyRecordsOperation` engine. Still per-cell records. Ships as the next release; validates that pure-CloudKit with good hygiene is acceptable. The hand-rolled throttle code is understood to be short-lived — Phase 3 replaces it wholesale with `CKSyncEngine`.
+- **Phase 3: Move-log data model + `CKSyncEngine` adoption.** _Landed._ New `Move`/`Snapshot` records, `CellEntity` as cache, replay engine. Transport rewritten around `CKSyncEngine` so batching, retry-after, and send-state tracking come from the framework rather than hand-rolled primitives.
+ - **3a.** _Landed._ Data model & replay plumbing.
+ - **3b + 3c.** _Landed._ Live wiring + Phase 2 primitive deletion.
+- **Phase 4: CKShare.** _Landed._ Per-game zones, dual-engine sync (private + shared), invitation flow via `ShareController`, share-accept handler, per-author `Player` records for identity, share-revocation handling.
+- **Phase 5 (deferred, conditional):** WebRTC if measured live latency is inadequate. Not justified yet — needs real-world signal.