crossmate

A collaborative crossword app for iOS

PushWorker.md (17642B)


      1 # Push Worker + Sync Simplification
      2 
      3 ## Context
      4 
      5 After the engagement Worker landed (collaborative live-solving over WebSockets, 2026-05), it became natural to use the same Cloudflare runtime for push notifications. Pings were unreliable for time-critical user-facing events, and the receiver-side heuristics that had grown around them (`SessionMonitor.presentBegins`, the 3-minute quiescence-window scheduler) were brittle. This pass moves the user-facing event notifications onto a second Cloudflare Worker that signs APNs JWTs and pushes alerts directly, while folding the per-cell state for check / reveal / resign into the Moves stream so it propagates collaboratively.
      6 
      7 ## What landed (2026-05-27)
      8 
      9 ## App Attest migration (2026-06)
     10 
     11 The push worker no longer relies on a build-baked `PUSH_BEARER` for normal
     12 clients. Updated apps enroll a per-install App Attest key through
     13 `GET /attest/challenge` + `POST /attest/register`, then sign every
     14 `POST /register`, `DELETE /register`, and `POST /publish` request with that key.
     15 
     16 The worker stores only the attested public key and assertion counter keyed by
     17 `deviceID` + `keyID`. This preserves the no-Crossmate-account model: CloudKit
     18 still owns collaboration identity, and the worker only proves that a request
     19 came from an attested Crossmate install. Existing games keep working because
     20 their CloudKit `pushAddress` values are unchanged; updated clients simply
     21 re-register the same addresses with App Attest request auth.
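
The counter bookkeeping described above might look like the following minimal sketch. The storage key format, the record shape, and both function names are assumptions made for illustration, not the worker's actual code; verifying the assertion signature against the stored public key is omitted.

```javascript
// Hypothetical storage key: one record per attested install key.
function attestKey(deviceID, keyID) {
  return `attest:${deviceID}:${keyID}`;
}

// Accept an assertion only if its counter is strictly greater than the last
// one stored, so a captured request cannot be replayed.
// `record` is { publicKey, counter }; returns the updated record or null.
function acceptAssertion(record, assertionCounter) {
  if (!record) return null; // unknown key: install never enrolled
  if (!Number.isInteger(assertionCounter)) return null;
  if (assertionCounter <= record.counter) return null; // replayed or stale
  return { ...record, counter: assertionCounter };
}
```

A caller would persist the returned record before handling the request, so that two requests carrying the same counter cannot both pass.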
     22 
     23 Deploy requirements:
     24 
     25 - `APP_TEAM_ID`, `APP_BUNDLE_ID`, and `APP_ATTEST_ENVIRONMENT` are configured
     26   in `wrangler.push.toml`.
     27 - `APP_ATTEST_ROOT_CERT_PEM` must be set as a Worker secret or dashboard
     28   variable.
     29 - `ALLOW_LEGACY_PUSH_BEARER = "1"` may be set temporarily for rollback, but
     30   normal deployment should leave it unset.
     31 
     32 ### Worker
     33 
     34 - New `Workers/push-worker.js` + `Workers/wrangler.push.toml`.
     35 - Single Durable Object class `PushRegistry`, addressed via `idFromName("registry")` (one global instance; Crossmate's scale doesn't need per-author sharding).
     36 - Three push endpoints, all App-Attest-auth gated:
     37   - `POST /register` — `{deviceID, token, environment, addresses}` upsert. Idempotent.
     38   - `DELETE /register` — `{deviceID, addresses}` — unregisters dropped address bindings.
     39   - `POST /publish` — `{kind, addressees, gameID, fromAuthorID, title, alertBody}` — fans out to every addressee's registered devices.
     40 - APNs JWT (ES256) signed in-Worker via Web Crypto. The JWT is cached in DO instance memory and refreshed every ~40 min, which sits inside the window APNs allows: it rejects provider tokens regenerated more often than once every ~20 min and tokens older than 1 hour.
     41 - Per-token `environment` field picks sandbox vs production endpoint; iOS reports `"sandbox"` for Debug builds and `"production"` for TestFlight/App Store.
     42 - Deployed at the push Worker's `*.workers.dev` URL with `preview_urls = false`.
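
As a rough illustration of the `/publish` fan-out and the per-token endpoint choice, here is a minimal sketch. The `addr:<address>:<deviceID>` key layout, the stored value shape, and the function names are assumptions, not the worker's actual schema; only the two APNs host names and the `/3/device/<token>` path are Apple's.

```javascript
// APNs HTTP/2 hosts; the per-token `environment` field picks one.
const APNS_HOSTS = {
  sandbox: "https://api.sandbox.push.apple.com",
  production: "https://api.push.apple.com",
};

function apnsUrl(environment, token) {
  const host = APNS_HOSTS[environment];
  if (!host) throw new Error(`unknown APNs environment: ${environment}`);
  return `${host}/3/device/${token}`;
}

// Hypothetical key layout: "addr:<address>:<deviceID>" -> { token, environment }.
// `storage` is anything with an async list({ prefix }) that resolves to a Map,
// which is the shape Durable Object storage exposes.
async function resolveTargets(storage, addressees) {
  const byToken = new Map(); // dedupe a device bound to several addressees
  for (const address of addressees) {
    const bindings = await storage.list({ prefix: `addr:${address}:` });
    for (const { token, environment } of bindings.values()) {
      byToken.set(token, apnsUrl(environment, token));
    }
  }
  return [...byToken.values()];
}
```

Keying the result by token means a device registered under two addressees of the same publish receives one alert rather than two.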
     43 
     44 ### Secrets
     45 
     46 Loaded via `wrangler secret put --config wrangler.push.toml`:
     47 
     48 - `APNS_KEY` — the `.p8` (piped via `< AuthKey_*.p8` stdin redirect; not pasted, because the interactive prompt eats newlines).
     49 - `APNS_KEY_ID`, `APNS_TEAM_ID` — both 10-char.
     50 - `APP_ATTEST_ROOT_CERT_PEM` — public Apple App Attest root certificate used
     51   by the worker to verify the attestation chain.
     52 
     53 APNs key is topic-restricted to the app's bundle ID, both environments enabled. Apple caps at 2 APNs keys per Team, so a single key serving both endpoints leaves headroom.
     54 
     55 ### iOS plumbing
     56 
     57 - `Crossmate/Services/PushClient.swift` — `@MainActor` class. Reads
     58   `CrossmatePushBaseURL` from `Bundle.main` and signs worker requests via
     59   `PushRequestAuthenticator`.
     60 - Idempotent: `updateAPNsToken(_:)` is called on every `didRegisterForRemoteNotificationsWithDeviceToken` callback, and `updateAuthorID(_:)` at every `identity.refresh(...)` site. The Worker dedupes unchanged triples.
     61 - Account switch: explicit `DELETE /register` for the previous `(authorID, deviceID)` before re-registering.
     62 - Failures go to `syncMonitor.note(...)` for diagnostics; never user-surfaced.
     63 
     64 ### Push kinds (sender-formatted strings, no `loc-key` — Crossmate is English-only)
     65 
     66 | Kind | Trigger | Body |
     67 |------|---------|------|
     68 | `win` | `sendCompletionPings(resigned: false)` after the local player completes a puzzle | "Alice solved the puzzle 'X'" |
     69 | `resign` | `sendCompletionPings(resigned: true)` after the local player gives up | "Alice resigned the puzzle 'X'." |
     70 | `play` | `PuzzleDisplayView.updateActiveNotificationPuzzleID` when scenePhase becomes `.active` | "Alice is solving the puzzle 'X'" |
     71 | `pause` | Same handler when scenePhase leaves `.active`, plus the selection-clear block at puzzle close | "Alice added 12 letters and cleared 3 letters in the puzzle 'X'" |
     72 
     73 Pause uses `LocalSessionTracker` (`Crossmate/Services/LocalSessionTracker.swift`) — a small `@MainActor` counter wired into `store.onLocalCellEdit` that resets on every `begin(gameID:)` and drains on `consume(gameID:)`. The "skip if both counts are zero" guard inside `publishSessionEndPush` handles dedup between scenePhase-background and puzzle-close naturally.
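
The reset-on-begin, drain-on-consume behaviour and the zero-count guard can be sketched as follows. The real tracker is a Swift `@MainActor` class; this version is written in JavaScript, the Worker's language, purely to show the logic. `noteEdit`, `sessionEndPush`, and the payload shape are hypothetical names.

```javascript
// Per-game edit counter: reset on begin, drained on consume.
class LocalSessionTracker {
  constructor() {
    this.counts = new Map(); // gameID -> { added, cleared }
  }
  begin(gameID) {
    this.counts.set(gameID, { added: 0, cleared: 0 });
  }
  // Hypothetical hook: called once per local cell edit.
  noteEdit(gameID, { cleared }) {
    const c = this.counts.get(gameID);
    if (!c) return; // no active session for this game
    if (cleared) c.cleared += 1;
    else c.added += 1;
  }
  // Returns the tallies and zeroes them, so a second call sees nothing.
  consume(gameID) {
    const c = this.counts.get(gameID) ?? { added: 0, cleared: 0 };
    this.counts.set(gameID, { added: 0, cleared: 0 });
    return c;
  }
}

// Returns the payload to publish, or null when there is nothing to report.
function sessionEndPush(tracker, gameID) {
  const { added, cleared } = tracker.consume(gameID);
  if (added === 0 && cleared === 0) return null; // second trigger is a no-op
  return { kind: "pause", gameID, added, cleared };
}
```

Because `consume` drains the counts, whichever of the two triggers fires first publishes and the other finds zeros and skips, with no separate dedup flag.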
     74 
     75 ### Pings that stay
     76 
     77 - `.friend` — friendship handshake.
     78 - `.invite` — re-invite to a game (uses `addressee`; surfaces in the Invited section).
     79 - `.hail` — engagement room bootstrap.
     80 - `.join` — the *invite-accept* one (one-shot, when a collaborator first accepts a share). Distinct from the new session-start APN, which is the `play` kind on the push side specifically to avoid this naming collision.
     81 
     82 `PingScope` and `PingScopePayload` are gone entirely (only `.check`/`.reveal` used them).
     83 
     84 ### Check/reveal/resign in Moves
     85 
     86 - New `Crossmate/Models/CheckResult.swift`: `enum CheckResult: Sendable, Equatable, Codable { case right, wrong }`. Used as `CheckResult?`; `nil` is the canonical "unchecked" value (no `.none` case).
     87 - `CellMark.pen` / `.pencil` now carry `checked: CheckResult?` instead of `checkedWrong: Bool`.
     88 - `Game.checkCells` now stamps `.right` on correct entries (previously erased them to `.none`). Wrong entries continue to stamp `.wrong` — pen-vs-pencil style is preserved across the check.
     89 - Wire format and Core Data keep two parallel `checkedRight` / `checkedWrong` booleans (translation lives in `GameMutator.encodeMark` / `GameStore.decodeMark`). `MovesCodec.Payload.Entry` gets a custom `init(from:)` that defaults `checkedRight` to false so records written by older clients still decode cleanly.
     90 - Resign reveals propagate via the existing Moves reveal mechanism — peer grids fill in cooperatively. No new `Game.resignedBy` field; a resigned game and a reveal-all-puzzle game look identical at cold start (accepted trade-off — the moment-of-resign signal lives in the APN).
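
The boundary translation between the optional check result and the two stored booleans can be sketched like this. The app's own translation lives in Swift (`GameMutator.encodeMark` / `GameStore.decodeMark`); the JavaScript below only illustrates the mapping, and the function names plus the tie-break when both flags are set are assumptions.

```javascript
// `checked` is "right", "wrong", or null (unchecked).
function encodeChecked(checked) {
  return {
    checkedRight: checked === "right",
    checkedWrong: checked === "wrong",
  };
}

// Records from older clients carry no `checkedRight`; treat it as false.
function decodeChecked(entry) {
  const right = entry.checkedRight ?? false;
  const wrong = entry.checkedWrong ?? false;
  if (wrong) return "wrong"; // assumption: wrong wins if both flags are set
  if (right) return "right";
  return null;
}
```

A record written before the flag existed, such as `{ checkedWrong: false }`, decodes to `null`, which matches the "unchecked" meaning of `nil` on the Swift side.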
     91 
     92 ### SessionMonitor cleanup
     93 
     94 Slimmed to what the on-open catch-up banner actually needs:
     95 
     96 - Kept: `ingest(_ deltas:)` (bucket accumulation), `consumeOnOpen(gameID:)`, `cancel(gameID:authorID:)`, `bodyText(playerName:puzzleTitle:added:cleared:)`.
     97 - Removed: `presentBegins`, `scheduleEnd`, the 3-minute quiescence timer, `Bucket.scheduledFor`, `SessionNotificationScheduling` protocol + `RecordingNotificationScheduler` test recorder, `bodyText(for: Session)`, the end-/begin-identifier helpers, the `notificationCenter` and `notificationAuthorization` constructor params.
     98 
     99 The on-open banner still says "Alice added 12 letters; Bob added 5 letters" — that path is independent of the new APNs and remains useful as a catch-up when the user opens an unread puzzle.
    100 
    101 ## Design decisions
    102 
    103 - **Sender-formatted strings, not `loc-key`/`loc-args`.** Crossmate has no `Localizable.strings` / `.lproj` yet. Setting up the localization scaffolding just to feed receiver-side formatting that would produce the same English text is more work than it's worth for a single-locale app. Switch to `loc-key` when a second language ships.
    104 - **One Durable Object, not per-author sharding.** Single DO instance can comfortably hold thousands of token registrations; `/publish` resolves targets in one `state.storage.list({prefix: ...})` per addressee.
    105 - **JWT caching is mandatory.** APNs rejects rapidly-rotated provider tokens with `TooManyProviderTokenUpdates`. Cache in DO instance memory (sticky region) — no storage RTTs.
    106 - **`CellMark` as enum, parallel-Bool storage.** The conceptual model is the optional `CheckResult` enum; the on-disk encoding is two booleans. Translation at the boundary keeps the API clean without requiring a Core Data migration model.
    107 - **No `Game.resignedBy` distinguisher.** Cold-state conflation of resign vs reveal-all-puzzle is acceptable; the moment-of-resign signal lives in the APN body.
    108 - **No `lastAction` field on Moves; no Actions record.** Replay is a future feature that will design its own log format with the consuming UX in mind. Designing the granularity now without the consumer would be a guess.
    109 - **Push registration is fire-and-forget, with errors logged.** The Worker is idempotent, so the next launch's APNs callback retries naturally. Failures log to the diagnostics monitor but don't surface to the user.
    110 - **`preview_urls = false`.** Each preview URL carries the same secret bindings; gratuitous attack surface for no benefit.
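
To make the JWT-caching decision concrete, here is a minimal sketch of an ES256 provider token signed with Web Crypto and held in instance memory. The function names, the config object shape, and the 40-minute constant are assumptions for illustration; the JWT header and claims (`alg`, `kid`, `iss`, `iat`) follow Apple's token-based APNs format.

```javascript
const MAX_AGE_MS = 40 * 60 * 1000; // refresh every ~40 min

const b64url = (bytes) =>
  btoa(String.fromCharCode(...bytes))
    .replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
const b64urlJson = (obj) =>
  b64url(new TextEncoder().encode(JSON.stringify(obj)));

// Sign an APNs provider token. `p8Pem` is the PKCS#8 PEM text of the .p8 key.
async function signProviderToken({ p8Pem, keyId, teamId }, nowMs) {
  const der = Uint8Array.from(
    atob(p8Pem.replace(/-----[^-]+-----|\s/g, "")), (c) => c.charCodeAt(0));
  const key = await crypto.subtle.importKey(
    "pkcs8", der, { name: "ECDSA", namedCurve: "P-256" }, false, ["sign"]);
  const input =
    `${b64urlJson({ alg: "ES256", kid: keyId })}.` +
    `${b64urlJson({ iss: teamId, iat: Math.floor(nowMs / 1000) })}`;
  // Web Crypto returns the raw r||s signature, which is what ES256 expects.
  const sig = await crypto.subtle.sign(
    { name: "ECDSA", hash: "SHA-256" }, key, new TextEncoder().encode(input));
  return `${input}.${b64url(new Uint8Array(sig))}`;
}

// In-memory cache: reuse the pending or resolved token until it is too old.
// `sign(nowMs)` returns a promise for a token; `now` is injectable for tests.
function createTokenCache(sign, now = () => Date.now()) {
  let cached = null; // { promise, mintedAt }
  return function getToken() {
    const t = now();
    if (!cached || t - cached.mintedAt >= MAX_AGE_MS) {
      const entry = { promise: sign(t), mintedAt: t };
      // Do not keep a failed signing attempt around for 40 minutes.
      entry.promise.catch(() => { if (cached === entry) cached = null; });
      cached = entry;
    }
    return cached.promise;
  };
}
```

In a Durable Object this would be wired roughly as `createTokenCache((t) => signProviderToken(config, t))`, with `config` read from the Worker's secrets. Caching the promise rather than the string means concurrent publishes share one signing operation.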
    111 
    112 ## Badge and sibling-device read horizons (2026-06)
    113 
    114 Badge state is now split between durable-ish local truth and best-effort push
    115 transport:
    116 
    117 - The Notification Service Extension and the app share an App Group
    118   `BadgeState` ledger. It is a per-game horizon map, not a set: `unreadAt`
    119   advances when a badge-worthy APN is delivered, `seenAt` advances when the
    120   user opens/clears the game, and a game counts only when `unreadAt > seenAt`.
    121   This lets stale APNs lose to a newer open without needing CloudKit to arrive
    122   first.
    123 - The app icon badge is computed as `BadgeState.unreadGameIDs()` unioned with
    124   Core Data's unread-other-moves query. Core Data unread games are also seeded
    125   back into the ledger as `unreadAt` horizons, each stamped with that game's
    126   newest unseen other-author move time (`latestOtherMoveAt`). The NSE can't
    127   reach Core Data, so without this seed a push landing on a suspended app would
    128   re-stamp the badge from the ledger alone and drop any game whose unread state
    129   arrived purely via CloudKit sync. The seed is safe precisely because the
    130   ledger is a horizon map and not a set: re-seeding a game the user has since
    131   opened is a no-op, because its newer `seenAt` still wins. (The old set-based
    132   ledger couldn't express that, which is why the write-back was dropped when
    133   the horizon map first landed — and why it is safe to reinstate now.)
    134 - Ledger entries are maintained by lifecycle event, not reconciled by absence
    135   (a ledger-only game is ambiguous: it could be push-ahead-of-sync, which must
    136   count, or a deleted game, which must not). So `BadgeState.forget(gameID:)` is
    137   called on the two hard-removal hooks — local `onGameDeleted` and sync-driven
    138   `onGameRemoved` — and `BadgeState.reset()` on the diagnostics data wipe. This
    139   matters because a deleted game has nothing left to open, so a stale
    140   `unreadAt > seenAt` entry could never be cleared by a seen horizon and would
    141   badge forever.
    142 - The ledger key is `badge.ledger.v2`. The bump from `v1` is a deliberate
    143   one-shot discard: the `v1` migration off the pre-horizon set stamped
    144   `unreadAt = now` for every legacy entry, fabricating fresh-unread phantoms
    145   that no existing read state could defeat (an `unreadAt` stamped at
    146   migration time beats any `seenAt` recorded before it). `loadLedger` now
    147   drops the legacy stores without fabricating, and rebuilds from Core Data via the seed above.
    148 - Each account has an account-scoped push address, published as the existing
    149   `Decision` record `decision-account-pushAddress` with `kind = account` and
    150   the address in the payload. This avoids adding CloudKit schema fields while
    151   giving all devices on the same iCloud account a shared silent-push route.
    152 - When a device joins a shared game, it sends `accountJoined` to the account
    153   address. Sibling devices ignore self-sends, fetch/sync, mint their own
    154   per-game push address if needed, and register with the Worker so they can
    155   receive that game's future pushes.
    156 - When a device clears/sees a game, it sends `accountSeen` with
    157   `senderDeviceID` and `readAt`. Sibling devices ignore self-sends, apply the
    158   supplied `readAt` to their local read cursor and badge ledger, withdraw any
    159   matching delivered notifications, and refresh the app badge. Inbound
    160   `accountSeen`, CloudKit `Player.readAt` updates and ping deletions are
    161   non-echoing so the account push path remains one-hop rather than a feedback
    162   loop.
    163 - If the game is actively visible, the seen push uses the same future
    164   read-lease horizon as `Player.readAt`. That prevents an account-seen message
    165   from racing after an active-lease update and accidentally collapsing
    166   presence on another device.
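
The horizon-map rule from the first bullets can be sketched as below. The real `BadgeState` ledger is Swift code shared through an App Group; this JavaScript version only illustrates the comparison logic, and the class and method names other than `forget` and `unreadGameIDs` are hypothetical.

```javascript
// Per-game horizon map: gameID -> { unreadAt, seenAt } (ms timestamps).
class BadgeLedger {
  constructor() {
    this.entries = new Map();
  }
  entry(gameID) {
    if (!this.entries.has(gameID)) {
      this.entries.set(gameID, { unreadAt: 0, seenAt: 0 });
    }
    return this.entries.get(gameID);
  }
  // Horizons only move forward, so stale or repeated events are no-ops.
  // Seeding from Core Data is the same call with `latestOtherMoveAt`.
  markUnread(gameID, at) {
    const e = this.entry(gameID);
    e.unreadAt = Math.max(e.unreadAt, at);
  }
  markSeen(gameID, at) {
    const e = this.entry(gameID);
    e.seenAt = Math.max(e.seenAt, at);
  }
  // Hard removal: a deleted game must not linger as unread.
  forget(gameID) {
    this.entries.delete(gameID);
  }
  unreadGameIDs() {
    return [...this.entries]
      .filter(([, e]) => e.unreadAt > e.seenAt)
      .map(([id]) => id);
  }
}
```

With this shape, a push stamped at time 150 that arrives after the user opened the game at time 200 leaves the game read, which is the "stale APNs lose to a newer open" property described above.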
    167 
    168 The push server is still not durable truth for badges. It only transports
    169 same-account hints quickly. If a silent push is dropped, the next app launch or
    170 CloudKit fetch recomputes from local Core Data plus whatever the NSE delivered
    171 on that device.
    172 
    173 ## Open items / gotchas
    174 
    175 - **Schema deployment is not needed.** Moves records ship the cells as an opaque binary blob; the new `checkedRight` boolean rides inside that blob. No CloudKit dashboard field to add (unlike, say, `Player.readAt`). The `CellEntity.checkedRight` Core Data attribute auto-migrates locally on first launch (Boolean default `NO`).
    176 - **No visual indicator UI yet.** The cell state is now propagating; rendering it on the grid is separate UX work. Plan was tri-state: subtle dot for `.right`, distinct marker for `.wrong`, cleared when the cell's letter changes.
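    177 - **The build-baked bearer was the only abuse gate at landing.** That was accepted on the premise that users aren't adversarial. Since the App Attest migration (2026-06), updated clients no longer rely on it, and the legacy bearer matters only while `ALLOW_LEGACY_PUSH_BEARER` is set for rollback. If the binary is ever reverse-engineered the bearer leaks; rotate it via `wrangler secret put` if that happens.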
    178 - **Background-time publish.** Backgrounding while in a puzzle fires `publishSessionEndPush` from a `Task` that runs as the app suspends. iOS gives ~5–10 seconds; in practice a small POST to a Cloudflare Worker completes well within that window. If we see drops in the field, switch to a `BGTaskScheduler` background task or a background `URLSession`.
    179 - **`onPlayerEvent` removed from `PlayerSession`** — the 6 fire sites that used to enqueue check/reveal pings are gone. Anyone re-introducing peer-visible per-action notifications would need to wire a new path; the existing one no longer exists.
    180 - **Mixed-version peers.** `MovesCodec.Payload.Entry.init(from:)` defaults `checkedRight` to false, so records written by a build that predates the flag (say, a TestFlight build that is behind the Simulator) still decode on the new build. Older builds don't see the flag when decoding and drop it when re-encoding, so a check-correct stamped on the new build that round-trips through an older device's re-write would lose its `.right` state. Acceptable transient: the next check on the new device re-stamps.
    181 
    182 ## File-level inventory
    183 
    184 Added:
    185 
    186 - `Workers/push-worker.js`, `Workers/wrangler.push.toml`
    187 - `Crossmate/Services/PushClient.swift`
    188 - `Crossmate/Services/LocalSessionTracker.swift`
    189 - `Crossmate/Models/CheckResult.swift`
    190 
    191 Modified:
    192 
    193 - `project.yml`, `Crossmate/Info.plist`, `Crossmate/Crossmate.entitlements` — push base URL and App Attest entitlement/configuration
    194 - `Crossmate/Services/PushRequestAuthenticator.swift` — per-install App Attest enrollment and request signing
    195 - `Crossmate/Models/CellMark.swift` — enum-based `checked: CheckResult?`
    196 - `Crossmate/Models/Game.swift` — `setLetter` / `checkCells` updated for new mark API
    197 - `Crossmate/Models/CrossmateModel.xcdatamodeld` — `CellEntity.checkedRight` attribute
    198 - `Crossmate/Models/PlayerSession.swift` — `onPlayerEvent` + 6 fire sites removed
    199 - `Crossmate/Sync/Moves.swift` — `TimestampedCell` / `GridCell` / `RealtimeCellEdit` / `Payload.Entry` gain `checkedRight`
    200 - `Crossmate/Sync/Presence.swift` — `PingKind` trimmed, `PingScope` / `PingScopePayload` removed, `Ping.scope` removed
    201 - `Crossmate/Sync/SyncEngine.swift` — `enqueuePing` loses `scope`; `PingPayload` loses `scope`
    202 - `Crossmate/Sync/RecordSerializer.swift` — `pingRecord(scope:)` removed
    203 - `Crossmate/Sync/RecordBuilder.swift` — drop `payload.scope` passthrough
    204 - `Crossmate/Sync/SessionMonitor.swift` — slimmed to bucket accumulation + `consumeOnOpen`
    205 - `Crossmate/Sync/MovesUpdater.swift` — `enqueue(checkedRight:)`, `Pending.checkedRight`
    206 - `Crossmate/Sync/GridStateMerger.swift` — propagate `checkedRight`
    207 - `Crossmate/Persistence/GameMutator.swift` — `encodeMark` returns the triple; cell-edit path passes `checkedRight`
    208 - `Crossmate/Persistence/GameStore.swift` — `decodeMark` takes the triple; `applyCellCache` / `applyRealtimeCellEdit` updated
    209 - `Crossmate/CrossmateApp.swift` — `AppDelegate.onAPNsToken` callback; `updateActiveNotificationPuzzleID` fires the play/pause pushes; selection-clear block fires the pause push; `session.onPlayerEvent` callback removed
    210 - `Crossmate/Services/AppServices.swift` — `PushClient` init wired; `publishCompletionPush` (win + resign), `publishSessionStartPush`, `publishSessionEndPush`, `recipientsAndTitle`; `sendPlayerEventPings` removed; `bodyText(for: Ping)` trimmed; `presentBegins` call site removed
    211 
    212 Tests touched: `RecordSerializerMovesTests`, `PendingEditFlagTests`, `GridStateMergerTests`, `MovesUpdaterTests`, `GameStoreUnreadMovesTests`, `Sync/MovesInboundTests`, `Sync/AuthorDeltaTests`, `Sync/EngagementCoordinatorTests`, `Sync/PendingChangeReapTests`, `Sync/SessionMonitorTests`, `GameMutatorTests`, `XDAcceptTests`, `PuzzleNotificationTextTests`, `RecordSerializerTests`.
    213 
    214 All unit tests pass. No CloudKit Dashboard changes required.