Embeddings versioning

Status

Spec only. Canthus v1 ships a deterministic fuzzy-only suggestion pipeline. This page captures the contract a future embeddings reranker must satisfy. Treat the acceptance criteria as binding for any PR that turns the reranker on.

Why embeddings might be added later

Fuzzy matching is strong on titles that share tokens. It is weak on paraphrases (“walking the pup” vs “walk dog”) and on cross-lingual or compound activities. An on-device sentence encoder can rerank top-K fuzzy candidates to recover those cases. The reranker is opt-in: the suggestion pipeline calls a Reranker interface that defaults to identity. Turning embeddings on is a one-line wiring change once the assets and storage are in place.

Constraints

Fully offline. The model and tokenizer ship in the app binary.
Deterministic. Same inputs produce the same vector.
Bounded latency. P95 reranking of top-25 candidates under 80 ms on mid-range Android.
No regression for the fuzzy-only path when the reranker is off.

Storage

Model identity

A model is identified by a single string:

modelId = "{architecture}-{revision}-{quant}"

Example: "all-minilm-l6-v2-r4-q8". The active modelId is stored in shared_preferences under the key embeddingModelId. Missing or unrecognised values force a re-encode at next launch.
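The format can be sketched as a small value type. This is an illustration only: the `ModelId` class and the `needsReencode` helper are hypothetical names, and splitting off the last two hyphen-separated segments is an assumption made because the architecture part may itself contain hyphens (as in the example above).

```dart
/// Hypothetical value type for the modelId string; not part of the spec.
class ModelId {
  const ModelId(this.architecture, this.revision, this.quant);

  final String architecture; // may contain hyphens, e.g. "all-minilm-l6-v2"
  final String revision; // e.g. "r4"
  final String quant; // e.g. "q8"

  /// Parses "{architecture}-{revision}-{quant}".
  /// Assumption: the last two segments are revision and quant.
  /// Returns null for missing or malformed values.
  static ModelId? tryParse(String? raw) {
    if (raw == null) return null;
    final parts = raw.split('-');
    if (parts.length < 3 || parts.any((p) => p.isEmpty)) return null;
    final n = parts.length;
    return ModelId(parts.sublist(0, n - 2).join('-'), parts[n - 2], parts[n - 1]);
  }

  @override
  String toString() => '$architecture-$revision-$quant';
}

/// True when the stored id is missing, malformed, or differs from the
/// bundled id, i.e. when a re-encode should be scheduled.
bool needsReencode({required String? stored, required String bundled}) =>
    ModelId.tryParse(stored) == null || stored != bundled;
```

In an app, the `stored` value would come from shared_preferences under the embeddingModelId key; the sketch takes it as a plain argument so it stays free of plugin dependencies.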

Cached embeddings

Add a Drift table:

CREATE TABLE template_embeddings (
  model_id    TEXT NOT NULL,
  template_id TEXT NOT NULL REFERENCES template_tasks(id) ON DELETE CASCADE,
  vector      BLOB NOT NULL,
  dim         INTEGER NOT NULL,
  encoded_at  INTEGER NOT NULL,
  PRIMARY KEY (model_id, template_id)
);

CREATE INDEX template_embeddings_model_idx ON template_embeddings (model_id);

The reranker reads only rows where model_id matches the active embeddingModelId. Vectors from older models are never mixed.
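The spec does not say how a vector is laid out inside the vector BLOB. One possible layout, shown here purely as an assumption, is packed little-endian float32, with the dim column used as a sanity check on read. The function names `encodeVector` and `decodeVector` are hypothetical.

```dart
import 'dart:typed_data';

/// Packs a vector as little-endian float32 bytes (assumed layout).
Uint8List encodeVector(List<double> vector) {
  final bytes = ByteData(vector.length * 4);
  for (var i = 0; i < vector.length; i++) {
    bytes.setFloat32(i * 4, vector[i], Endian.little);
  }
  return bytes.buffer.asUint8List();
}

/// Unpacks a BLOB written by [encodeVector], validating it against `dim`.
Float32List decodeVector(Uint8List blob, int dim) {
  if (blob.lengthInBytes != dim * 4) {
    throw FormatException(
        'Expected ${dim * 4} bytes for dim=$dim, got ${blob.lengthInBytes}');
  }
  final view = ByteData.sublistView(blob);
  final out = Float32List(dim);
  for (var i = 0; i < dim; i++) {
    out[i] = view.getFloat32(i * 4, Endian.little);
  }
  return out;
}
```

Fixing the byte order explicitly keeps the stored bytes identical across devices, which fits the determinism constraint. Values are rounded to float32 precision on write.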

Bundled assets

Models live under app/assets/models/:

app/assets/models/
  {architecture}-{revision}-{quant}/
    model.tflite           # or model.onnx
    tokenizer.json
    config.json

pubspec.yaml lists the directory under flutter.assets.

Upgrade path

On app launch:

Read bundled modelId from config.json of the active asset directory.
Read stored modelId from shared_preferences.
If equal: do nothing.
If different or missing: schedule a background re-encode job.
- Encode every template_tasks row.
- Insert new rows under the new modelId with encoded_at = now.
- Once complete, atomically swap embeddingModelId to the new id.
- Mark the old model’s rows as stale (a background sweep deletes them after one launch cycle).

Until the swap completes, the reranker continues to read the old modelId. The fuzzy-only path always works as a fallback.

The user is never blocked by re-encoding. If the app process is killed mid-encode, the swap simply does not happen and re-encoding resumes at the next launch.
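The launch-time steps above can be sketched with in-memory stand-ins. Everything here is an assumption for illustration: `UpgradeState` replaces shared_preferences and the Drift table, `runUpgrade` is a hypothetical name, the encoder is synchronous, and encoded_at and the stale-row sweep are left out. A real job would run asynchronously in the background.

```dart
/// In-memory stand-ins (assumptions): `activeModelId` plays the role of the
/// embeddingModelId preference, `cache` the role of template_embeddings.
class UpgradeState {
  String? activeModelId;
  final Map<String, List<double>> cache = {}; // "modelId/templateId" -> vector
}

/// Runs the launch-time check. Returns true if the active model id changed.
bool runUpgrade(
  UpgradeState state,
  String bundledModelId,
  Map<String, String> templates, // template id -> title
  List<double> Function(String title) encode,
) {
  // Equal ids: nothing to do.
  if (state.activeModelId == bundledModelId) return false;

  for (final entry in templates.entries) {
    final key = '$bundledModelId/${entry.key}';
    // Rows written by an interrupted earlier run are kept, so work resumes
    // rather than restarts.
    state.cache.putIfAbsent(key, () => encode(entry.value));
  }

  // Swap only after every row exists. If encode throws above, this line is
  // never reached and the old id stays active.
  state.activeModelId = bundledModelId;
  return true;
}
```

Because the id is written last, a crash or an encoder error at any earlier point leaves the previous modelId in place, which matches the behaviour described above without needing a separate recovery step.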

Rollback

Old model rows are retained for at least one full launch cycle after a swap.
A failed swap (encode error, schema mismatch) reverts embeddingModelId to the previous value.
If a release ships with a regressed model, replacing the bundled asset directory in a hotfix triggers the same upgrade path. There is no separate “rollback flow”.

Compatibility rules

Vector dimensionality is fixed by the modelId, so vectors from different models are never compared or combined. Cross-model arithmetic is forbidden.
The reranker treats rows with an unknown modelId as absent.
Templates inserted at runtime are encoded lazily: on first lookup, if no row exists for the active model.
Stored embeddings are never serialized off-device.
The reranker must compose with the fuzzy candidate generator; it never bypasses fuzzy filtering.

Reranker interface

The suggestion pipeline (features/tasks/domain/suggestion/cost_suggester.dart) exposes:

abstract interface class Reranker {
  Future<List<RankedCandidate>> rerank({
    required String query,
    required List<RankedCandidate> candidates, // top-K from fuzzy
  });
}

Default implementation is IdentityReranker (returns input unchanged). The embedding reranker, when implemented, replaces this binding without touching call sites.
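A minimal sketch of both bindings follows. Only `Reranker` and `IdentityReranker` are named in this spec; the fields of `RankedCandidate`, the `EmbeddingReranker` class, its two callbacks, and the use of cosine similarity are assumptions made for illustration.

```dart
import 'dart:math' as math;

/// Hypothetical shape; the real RankedCandidate lives in the pipeline.
class RankedCandidate {
  const RankedCandidate(this.templateId, this.score);
  final String templateId;
  final double score; // fuzzy score
}

abstract interface class Reranker {
  Future<List<RankedCandidate>> rerank({
    required String query,
    required List<RankedCandidate> candidates,
  });
}

/// Default binding: returns the fuzzy ordering untouched.
class IdentityReranker implements Reranker {
  const IdentityReranker();

  @override
  Future<List<RankedCandidate>> rerank({
    required String query,
    required List<RankedCandidate> candidates,
  }) async => candidates;
}

/// Cosine similarity of two equal-length vectors; 0 if either has zero norm.
double cosine(List<double> a, List<double> b) {
  if (a.length != b.length) throw ArgumentError('dimension mismatch');
  var dot = 0.0, na = 0.0, nb = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na == 0 || nb == 0) return 0;
  return dot / (math.sqrt(na) * math.sqrt(nb));
}

/// Sketch of an embedding reranker. It only reorders the fuzzy candidates
/// and never adds new ones.
class EmbeddingReranker implements Reranker {
  EmbeddingReranker(this.encodeQuery, this.vectorFor);

  final List<double> Function(String query) encodeQuery;
  // Returns null when no row exists for the active model.
  final List<double>? Function(String templateId) vectorFor;

  @override
  Future<List<RankedCandidate>> rerank({
    required String query,
    required List<RankedCandidate> candidates,
  }) async {
    final q = encodeQuery(query);
    double sim(RankedCandidate c) {
      final v = vectorFor(c.templateId);
      return v == null ? double.negativeInfinity : cosine(q, v);
    }

    final scored = <(int, RankedCandidate, double)>[
      for (var i = 0; i < candidates.length; i++)
        (i, candidates[i], sim(candidates[i])),
    ];
    // List.sort is not guaranteed stable, so ties fall back to fuzzy order.
    scored.sort((a, b) {
      final bySim = b.$3.compareTo(a.$3);
      return bySim != 0 ? bySim : a.$1.compareTo(b.$1);
    });
    return [for (final s in scored) s.$2];
  }
}
```

Breaking ties on the original index keeps the output deterministic for identical inputs, and candidates without a cached vector sink to the end in their fuzzy order rather than being dropped.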

Performance budget

Operation	Budget
Encode a single query	P95 under 30 ms on mid-range Android
Rerank top-25 candidates	P95 under 80 ms on mid-range Android
Cold-start re-encode of 500 templates	under 4 s, running in the background
Binary size cost of bundled model	25 MB or less (alarm above 25 MB; hard cap 50 MB)

These numbers are gates for any PR that turns the reranker on.
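The spec does not define how P95 is computed. One common choice, assumed here, is the nearest-rank method over a list of measured durations. The helper names `percentile` and `meetsRerankBudget` are hypothetical, and "under 80 ms" is read as a strict inequality.

```dart
/// Nearest-rank percentile: the smallest sample such that at least
/// [p] percent of all samples are less than or equal to it.
Duration percentile(List<Duration> samples, int p) {
  if (samples.isEmpty) throw ArgumentError('no samples');
  if (p < 1 || p > 100) throw ArgumentError('p must be in 1..100');
  final sorted = [...samples]..sort();
  // Integer ceiling of p * n / 100, as a 1-based rank.
  final rank = (p * sorted.length + 99) ~/ 100;
  return sorted[rank - 1];
}

/// Checks the rerank gate from the table: P95 under 80 ms.
bool meetsRerankBudget(List<Duration> samples) =>
    percentile(samples, 95) < const Duration(milliseconds: 80);
```

With 100 samples of 1 ms through 100 ms, the nearest-rank P95 is the 95th sorted value, 95 ms, so that run would fail the 80 ms gate.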

Acceptance criteria (ENG-39)

When the reranker is implemented, all of the following must hold: