Cost suggestion pipeline

Purpose

When you add a task, Canthus suggests a starting relativeCost and durationMinutes. The pipeline must be deterministic, offline, fast, and never prescriptive. This page is the contract.

Vocabulary

Term	Meaning
Query	The trimmed, normalized title you typed
Candidate	A `TemplateTask` row that survived candidate generation
Score	A number in `[0, 1]` representing match strength
Confidence	The score of the top-ranked candidate
Suggestion	A `CostSuggestion` value object returned to the UI

Stages

Normalize the query: lowercase, trim, ascii-fold, strip punctuation, collapse whitespace, split into tokens.
Content reduction: drop a fixed English stopword list (a, the, had, did, went, …) and apply a conservative suffix stripper (groceries -> grocery, emails -> email, running -> run). The same reduction is applied to indexed template titles so query and target are compared on equal terms.
Candidate generation by fuzzy similarity against pre-tokenized template titles.
Rerank (optional, deferred): if an embedding reranker is plugged in, it reorders the top-K candidates.
Threshold decision: pick prefill, suggest, or fallback path based on top score.
Construct CostSuggestion: bundle template, computed relativeCost, durationMinutes, and confidence for the UI.
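Stages 1 and 2 can be sketched as below. This is a minimal illustration under stated assumptions, not the shipped normalizer. The ascii-fold table covers only a few Latin-1 letters. The stopword set holds only the examples named above. The suffix rules in the hypothetical `stripSuffix` were picked to reproduce the three listed examples. `normalizeQuery` and `reduceTokens` are illustrative names.

```dart
/// Partial ascii-fold table: an illustration only, not a full Unicode fold.
const Map<String, String> _asciiFold = {
  'à': 'a', 'á': 'a', 'â': 'a', 'ä': 'a',
  'ç': 'c',
  'è': 'e', 'é': 'e', 'ê': 'e', 'ë': 'e',
  'ì': 'i', 'í': 'i', 'î': 'i', 'ï': 'i',
  'ñ': 'n',
  'ò': 'o', 'ó': 'o', 'ô': 'o', 'ö': 'o',
  'ù': 'u', 'ú': 'u', 'û': 'u', 'ü': 'u',
};

/// Only the stopwords named in this spec; the real list is longer.
const Set<String> _stopwords = {'a', 'the', 'had', 'did', 'went'};

/// Stage 1: lowercase, ascii-fold, strip punctuation, trim, collapse
/// whitespace, and split into tokens.
List<String> normalizeQuery(String raw) {
  final folded =
      raw.toLowerCase().split('').map((c) => _asciiFold[c] ?? c).join();
  // Assumption: punctuation (and anything still outside [a-z0-9] after
  // folding) becomes a space, so "walk-dog" -> "walk dog".
  final cleaned = folded.replaceAll(RegExp(r'[^a-z0-9\s]'), ' ').trim();
  if (cleaned.isEmpty) return const [];
  // Splitting on runs of whitespace also collapses them.
  return cleaned.split(RegExp(r'\s+'));
}

/// Conservative suffix stripper (hypothetical rules).
String stripSuffix(String t) {
  if (t.length > 4 && t.endsWith('ies')) {
    return '${t.substring(0, t.length - 3)}y'; // groceries -> grocery
  }
  if (t.endsWith('ing') && t.length - 3 >= 4) {
    var stem = t.substring(0, t.length - 3);
    final last = stem[stem.length - 1];
    // Undouble a final consonant (runn -> run), but keep ll, ss, zz.
    if (stem[stem.length - 2] == last && !'lsz'.contains(last)) {
      stem = stem.substring(0, stem.length - 1);
    }
    return stem; // running -> run
  }
  if (t.length > 3 && t.endsWith('s') && !t.endsWith('ss')) {
    return t.substring(0, t.length - 1); // emails -> email
  }
  return t;
}

/// Stage 2: drop stopwords, then strip suffixes. The same function would be
/// applied to template titles at index time.
List<String> reduceTokens(List<String> tokens) => tokens
    .where((t) => !_stopwords.contains(t))
    .map(stripSuffix)
    .toList();
```

The minimum-stem-length checks keep short words such as `string` or `bus` intact. A conservative stripper trades some missed merges for fewer false ones.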

Candidate generation

The fuzzy score is a weighted combination of three normalized similarity functions (the base score), with a token-containment floor that rescues short queries whose tokens are a strict subset of a longer template title's tokens:

base = 0.5 * tokenSortRatio
     + 0.3 * jaroWinkler
     + 0.2 * normalizedLevenshtein

containment = max(
    |query intersect target| / |query union target|,
    |query intersect target| / |query|,
)

fuzzyScore = max(base, containment)
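A minimal Dart sketch of the combination step follows. It assumes the three base components are already computed and normalized to `[0, 1]`. It also assumes query and target token sets come from the same normalize and reduce stages. `containment` and `fuzzyScore` are hypothetical names. Since `|query| <= |query union target|`, the second ratio can never be smaller than the first. The code keeps both terms only to mirror the formula as written.

```dart
import 'dart:math' as math;

/// Token containment: max(Jaccard, query coverage).
double containment(Set<String> query, Set<String> target) {
  // Empty queries are handled upstream (NoCandidates); guard anyway.
  if (query.isEmpty) return 0.0;
  final shared = query.intersection(target).length;
  final union = query.union(target).length;
  final jaccard = shared / union;
  final coverage = shared / query.length; // never smaller than jaccard
  return math.max(jaccard, coverage);
}

/// Weighted base score, floored by token containment.
double fuzzyScore({
  required double tokenSortRatio,
  required double jaroWinkler,
  required double normalizedLevenshtein,
  required Set<String> queryTokens,
  required Set<String> targetTokens,
}) {
  final base = 0.5 * tokenSortRatio +
      0.3 * jaroWinkler +
      0.2 * normalizedLevenshtein;
  return math.max(base, containment(queryTokens, targetTokens));
}
```

For example, the query `{tea}` against a title reducing to `{make, tea}` has containment 1.0. That score lifts it past the prefill threshold however low the base score is.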

Function	Why it is included
`tokenSortRatio`	Robust to word reorder (`walk dog` vs `dog walk`)
`jaroWinkler`	Strong on short strings and shared prefixes
`normalizedLevenshtein`	Penalizes character-level edits
`tokenContainment`	Rescues short queries (`tea`, `groceries`) that are fully contained in a longer template title
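The three base functions can be sketched as below. This is an illustration only; the spec does not fix their exact definitions. Assumptions:

- `jaroWinkler` and `normalizedLevenshtein` run on the reduced tokens rejoined with spaces.
- `tokenSortRatio` reuses normalized Levenshtein as its ratio. Fuzzy-matching libraries often use an Indel-based ratio instead, which scores differently.
- Jaro-Winkler uses the common prefix scale 0.1 with the prefix capped at 4 characters.

```dart
import 'dart:math' as math;

/// Levenshtein edit distance using two rolling rows.
int levenshtein(String a, String b) {
  var prev = List<int>.generate(b.length + 1, (j) => j);
  for (var i = 1; i <= a.length; i++) {
    final curr = List<int>.filled(b.length + 1, 0);
    curr[0] = i;
    for (var j = 1; j <= b.length; j++) {
      final cost = a[i - 1] == b[j - 1] ? 0 : 1;
      curr[j] = math.min(
        math.min(prev[j] + 1, curr[j - 1] + 1), // deletion, insertion
        prev[j - 1] + cost, // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

/// 1 - distance / longer length, in [0, 1].
double normalizedLevenshtein(String a, String b) {
  final longest = math.max(a.length, b.length);
  if (longest == 0) return 1.0;
  return 1.0 - levenshtein(a, b) / longest;
}

/// Sort tokens, rejoin, then compare, so word order does not matter.
double tokenSortRatio(List<String> a, List<String> b) {
  String sorted(List<String> t) => ([...t]..sort()).join(' ');
  return normalizedLevenshtein(sorted(a), sorted(b));
}

/// Jaro similarity with the Winkler common-prefix boost.
double jaroWinkler(String s1, String s2, {double prefixScale = 0.1}) {
  if (s1 == s2) return 1.0;
  if (s1.isEmpty || s2.isEmpty) return 0.0;
  final window = math.max(0, math.max(s1.length, s2.length) ~/ 2 - 1);
  final matched1 = List<bool>.filled(s1.length, false);
  final matched2 = List<bool>.filled(s2.length, false);
  var matches = 0;
  for (var i = 0; i < s1.length; i++) {
    final lo = math.max(0, i - window);
    final hi = math.min(s2.length - 1, i + window);
    for (var j = lo; j <= hi; j++) {
      if (matched2[j] || s1[i] != s2[j]) continue;
      matched1[i] = true;
      matched2[j] = true;
      matches++;
      break;
    }
  }
  if (matches == 0) return 0.0;
  // Matched characters that appear in a different order count as
  // half-transpositions.
  var k = 0;
  var halfTranspositions = 0;
  for (var i = 0; i < s1.length; i++) {
    if (!matched1[i]) continue;
    while (!matched2[k]) {
      k++;
    }
    if (s1[i] != s2[k]) halfTranspositions++;
    k++;
  }
  final m = matches.toDouble();
  final jaro =
      (m / s1.length + m / s2.length + (m - halfTranspositions / 2) / m) / 3;
  var prefix = 0;
  while (prefix < 4 &&
      prefix < s1.length &&
      prefix < s2.length &&
      s1[prefix] == s2[prefix]) {
    prefix++;
  }
  return jaro + prefix * prefixScale * (1 - jaro);
}
```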

Weights are an opinionated starting point. Tuning happens through the evaluation harness in cost-suggestion-evaluation.mdx.

Ranking

All templates are scored. For 500 entries this is well under the latency budget.
Tie-break: stable lex order on templateId so identical scores produce a deterministic ranking.
Candidates with fuzzyScore < 0.40 are dropped before threshold evaluation; this prunes obvious noise.
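The ranking rules can be sketched as a filter followed by a sort. `ScoredCandidate` and `rankCandidates` are hypothetical names, and template IDs are assumed to be unique strings. Dart's `List.sort` is not guaranteed stable. The comparator still gives a deterministic order because it is total over unique IDs.

```dart
/// Hypothetical scored-candidate record for illustration.
class ScoredCandidate {
  const ScoredCandidate(this.templateId, this.fuzzyScore);
  final String templateId;
  final double fuzzyScore;
}

const double noiseFloor = 0.40;

/// Drops candidates below the noise floor, then sorts by score (descending),
/// breaking ties by templateId in lexicographic order.
List<ScoredCandidate> rankCandidates(Iterable<ScoredCandidate> scored) {
  final kept = scored.where((c) => c.fuzzyScore >= noiseFloor).toList();
  kept.sort((a, b) {
    final byScore = b.fuzzyScore.compareTo(a.fuzzyScore);
    return byScore != 0 ? byScore : a.templateId.compareTo(b.templateId);
  });
  return kept;
}
```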

Thresholds

Top score	Path	UX behaviour
`>= 0.85`	Prefill	Use the top template’s `netMET` to compute `relativeCost`. Show the template name as “Looks like X”. User may dismiss or edit.
`>= 0.60`, `< 0.85`	Suggest	Show the top 3 candidates as chips. No prefill. User picks or dismisses.
`< 0.60`	Fallback	No template hit. Ask the user for a 1-5 rating. Compute `relativeCost` from the rating.

Thresholds are tunable via the evaluation harness. Any change to thresholds must be traceable to an evaluation run.
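The threshold table maps to a small pure function. `prefillThreshold` echoes the name in the Outputs section. `suggestThreshold`, `SuggestionPath`, and `choosePath` are hypothetical. Passing `null` is an assumed convention for "no candidate survived the 0.40 noise floor".

```dart
enum SuggestionPath { prefill, suggest, fallback }

const double prefillThreshold = 0.85;
const double suggestThreshold = 0.60;

/// Maps the top candidate's score to a UX path. Lower bounds are inclusive:
/// exactly 0.85 prefills, exactly 0.60 suggests.
SuggestionPath choosePath(double? topScore) {
  if (topScore == null || topScore < suggestThreshold) {
    return SuggestionPath.fallback;
  }
  return topScore >= prefillThreshold
      ? SuggestionPath.prefill
      : SuggestionPath.suggest;
}
```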

`relativeCost` derivation

Template path

For a template with activity-only MET value netMET:

\text{relativeCost} = \max(\text{netMET} - 1.0,\ 0.1)

This matches the task-costing spec. The -1.0 removes the resting metabolic baseline; the max(_, 0.1) floor keeps the cost above zero.
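In code, the template path is a single clamped subtraction. `relativeCostFromTemplate` and `relativeCostFloor` are hypothetical names.

```dart
import 'dart:math' as math;

const double relativeCostFloor = 0.1;

/// relativeCost = max(netMET - 1.0, 0.1).
double relativeCostFromTemplate(double netMET) =>
    math.max(netMET - 1.0, relativeCostFloor);
```

For example, a `netMET` of 3.5 yields 2.5. Any `netMET` at or below 1.1 lands on the 0.1 floor.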

Rating fallback path

When confidence is below the suggest threshold, the user picks a 1-5 rating, which is converted using the user’s current personalCoefficient:

\text{impliedCoefficient} = 0.8 \times 1.8^{(\text{rating} - 1)}

\text{relativeCost} = \frac{\text{impliedCoefficient}}{\text{personalCoefficient}}

The same `relativeCost` floor of 0.1 applies here too.
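The rating path can be sketched as follows. The function names are hypothetical. Rejecting ratings outside 1-5 and non-positive coefficients is an added assumption; the spec does not say how those cases are handled.

```dart
import 'dart:math' as math;

const double relativeCostFloor = 0.1;

/// impliedCoefficient = 0.8 * 1.8^(rating - 1), for rating in 1..5.
double impliedCoefficient(int rating) {
  if (rating < 1 || rating > 5) {
    throw RangeError.range(rating, 1, 5, 'rating');
  }
  return 0.8 * math.pow(1.8, rating - 1).toDouble();
}

/// relativeCost = impliedCoefficient / personalCoefficient, floored at 0.1.
double relativeCostFromRating(int rating, double personalCoefficient) {
  // Assumption: a valid personalCoefficient is always positive.
  if (personalCoefficient <= 0) {
    throw ArgumentError.value(
        personalCoefficient, 'personalCoefficient', 'must be > 0');
  }
  return math.max(
      impliedCoefficient(rating) / personalCoefficient, relativeCostFloor);
}
```

Each rating step multiplies the implied coefficient by 1.8. Rating 1 maps to 0.8 and rating 5 to 0.8 × 1.8⁴ = 8.39808, before dividing by the personal coefficient.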

`durationMinutes` derivation

Source	Behaviour
Template hit with `durationMinutes` set	Use the template duration.
Template hit with no duration	Default `10`.
Rating fallback	Default `10`.

Users can edit the duration in the More options section of the add-task sheet.
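The duration table reduces to a null-coalescing default. The names are hypothetical. `null` stands for both "template without a duration" and "rating fallback".

```dart
const int defaultDurationMinutes = 10;

/// Uses the template duration when present; otherwise the 10-minute default.
int suggestedDurationMinutes({int? templateDurationMinutes}) =>
    templateDurationMinutes ?? defaultDurationMinutes;
```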

Caching

An in-memory LRU memo holding up to 64 suggestions, keyed by normalized query, lives for the lifetime of the process.
No persistent cache: the fuzzy index is cheap to rebuild, and not persisting suggestions avoids cache-invalidation concerns when the template set changes.
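The memo can be sketched on top of Dart's default `Map`, which is a `LinkedHashMap` that iterates in insertion order. Re-inserting a key on read makes it the most recent entry. The first key is then always the least recently used. `LruMemo` and `getOrCompute` are hypothetical names. Values are non-nullable so that a `null` read can only mean a miss.

```dart
/// Minimal LRU memo keyed by normalized query.
class LruMemo<V extends Object> {
  LruMemo({this.capacity = 64}) : assert(capacity > 0);

  final int capacity;
  final Map<String, V> _entries = {}; // insertion-ordered LinkedHashMap

  V? get(String key) {
    final value = _entries.remove(key);
    if (value != null) _entries[key] = value; // re-insert as most recent
    return value;
  }

  void put(String key, V value) {
    _entries.remove(key);
    _entries[key] = value;
    if (_entries.length > capacity) {
      _entries.remove(_entries.keys.first); // evict least recently used
    }
  }

  V getOrCompute(String key, V Function() compute) {
    final cached = get(key);
    if (cached != null) return cached;
    final value = compute();
    put(key, value);
    return value;
  }

  int get length => _entries.length;
}
```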

Determinism

All inputs are local: query plus template index.
No randomness. No network. No timers.
Same query against same template index always produces the same suggestion.

Latency budget

Device class	Target
Mid-range Android	P95 < 50 ms per query
iPhone (last 4 generations)	P95 < 30 ms per query

The evaluation harness asserts these budgets.
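One way to read "P95 < 50 ms" is the nearest-rank 95th percentile over per-query latency samples. The harness's actual percentile method is not specified here, so this is an assumption. `p95` and `meetsBudget` are hypothetical names. The rank uses integer arithmetic to avoid floating-point rounding in `ceil(0.95 * n)`.

```dart
/// Nearest-rank P95 over latency samples in milliseconds.
double p95(List<double> samplesMs) {
  if (samplesMs.isEmpty) throw ArgumentError('no samples');
  final sorted = [...samplesMs]..sort();
  // ceil(95 * n / 100), computed exactly with integers (1-based rank).
  final rank = (95 * sorted.length + 99) ~/ 100;
  return sorted[rank - 1];
}

bool meetsBudget(List<double> samplesMs, double budgetMs) =>
    p95(samplesMs) < budgetMs;
```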

Safety framing rules

These rules are non-negotiable. They protect users from feeling judged or prescribed-to.

Never auto-apply. A suggestion is a starting point. The user always confirms.
Tentative copy. Use phrasing like “looks like”, “we’d guess”, “might be similar to”. Never “this costs X” or “you should rate it Y”.
Editable everywhere. relativeCost, durationMinutes, and any rating are editable before submit and after creation (cost_override_sheet).
No silent learning. A suggestion does not write data on its own. Only confirmed task creation persists.
Reversible. Every prefill offers a clear “back” or “use my own” affordance.
No ranking implications. Suggestion confidence is not exposed as a numerical score in the UI; it only drives presentation choice.
No fallback shaming. When no match is found, copy must be neutral: “Tell us how heavy this feels”. Never “this isn’t in our list” or similar.

These rules cross-reference copy-system.mdx and the user contract.

“No good match” behaviour

When topScore < 0.60:

The suggestion area collapses to a 1-5 rating row with neutral copy.
The user can submit immediately after picking a rating; duration defaults to 10 minutes.
The form does not block submission to wait for a “better” match.

Outputs

The pipeline returns a sealed CostSuggestion:

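// TemplateTask is the template-index row type, defined elsewhere.
sealed class CostSuggestion {
  const CostSuggestion();
}

class TemplateMatch extends CostSuggestion {
  const TemplateMatch({
    required this.template,
    required this.alternates,
    required this.relativeCost,
    required this.durationMinutes,
    required this.bodyWeight,
    required this.mindWeight,
    required this.confidence,
    required this.prefill,
  });

  final TemplateTask template;
  final List<TemplateTask> alternates; // size 0..2
  final double relativeCost;
  final int durationMinutes;
  final double bodyWeight;
  final double mindWeight;
  final double confidence; // top score, [0, 1]
  final bool prefill; // true if confidence >= prefillThreshold
}

class RatingFallback extends CostSuggestion {
  // No candidate reached the suggest threshold; UI must show the rating selector.
  const RatingFallback();
}

class NoCandidates extends CostSuggestion {
  // Query is empty or below minimum length. UI shows nothing.
  const NoCandidates();
}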
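Because `CostSuggestion` is sealed, UI code can switch over it exhaustively; this requires Dart 3 or later. The sketch below is self-contained, so it uses trimmed stand-ins for the types above. `TemplateTask` keeps only a hypothetical `name`, and `TemplateMatch` keeps only `template` and `prefill`. The headline strings follow the tentative copy examples from the safety framing rules. `headline` itself is a hypothetical helper.

```dart
// Trimmed stand-ins for the Outputs types (hypothetical fields).
class TemplateTask {
  const TemplateTask(this.name);
  final String name;
}

sealed class CostSuggestion {
  const CostSuggestion();
}

class TemplateMatch extends CostSuggestion {
  const TemplateMatch({required this.template, required this.prefill});
  final TemplateTask template;
  final bool prefill;
}

class RatingFallback extends CostSuggestion {
  const RatingFallback();
}

class NoCandidates extends CostSuggestion {
  const NoCandidates();
}

/// Tentative headline copy per suggestion type; null means show nothing.
/// The switch is exhaustive over the sealed hierarchy, so adding a new
/// subtype without handling it here is a compile error.
String? headline(CostSuggestion s) => switch (s) {
      TemplateMatch(prefill: true, :final template) =>
        'Looks like ${template.name}',
      TemplateMatch() => 'Might be similar to one of these',
      RatingFallback() => 'Tell us how heavy this feels',
      NoCandidates() => null,
    };
```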

Acceptance criteria mapping

Criterion (ENG-91)	Where it lives
Pipeline spec defines stages and thresholds	This page (Stages, Thresholds)
Behaviour defined when no good match exists	This page (No good match behaviour)
Safety framing rules are explicit and enforced in UI	This page (Safety framing rules) plus widget tests
