From Conversations to Clarity: Applying LLM Candidate Distillation to Product Backlog Generation

Product discovery is fundamentally a process of interpreting ambiguity. Stakeholder conversations—whether held in meetings, captured in emails, or reflected in support tickets—are rich in context but often diffuse, overlapping, and unstructured. Transforming this raw input into a coherent, prioritized product backlog remains one of the most cognitively intensive tasks in product management.

A recent research paper from the University of Washington and the Allen Institute, titled “Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation”, introduces a compelling solution to this class of problem. The authors propose CanDist, a two-phase framework that separates candidate generation (by a large language model, or LLM) from decision distillation (by a smaller, targeted model, or SLM). Rather than forcing an LLM to commit prematurely to a single answer, CanDist encourages it to surface multiple plausible interpretations, which are then refined downstream.

This decomposition of tasks reflects how experienced product teams often operate—brainstorming broadly before converging on a concise definition of value. The CanDist framework offers a blueprint for systematizing this process at scale.

Adapted CanDist Flow for Backlog Generation

Candidate Generation
  • LLM (Teacher): Generate multiple plausible user stories/features from stakeholder input (e.g., meeting transcripts, emails, notes)
  • SLM (Student): Collect and organize candidate stories with metadata (e.g., intent, urgency, dependencies)

Candidate Evaluation
  • LLM (Teacher): Provide multiple priority rationales (e.g., "critical for MVP", "regulatory need")
  • SLM (Student): Synthesize and score priority across competing rationales

Distillation & Selection
  • LLM (Teacher): Summarize, deduplicate, and select the best story candidates
  • SLM (Student): Rank backlog by value, urgency, and feasibility
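
A minimal sketch of how these three stages might be wired together. The `CandidateStory` record and the `llm.complete` and `slm.score` client methods are illustrative assumptions, not interfaces from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class CandidateStory:
    text: str                                            # candidate story from the teacher LLM
    rationales: list[str] = field(default_factory=list)  # priority rationales attached downstream
    score: float = 0.0                                   # composite priority assigned by the student

def generate_candidates(llm, source_text: str) -> list[CandidateStory]:
    """Stage 1: the teacher LLM surfaces multiple plausible user stories."""
    # `llm.complete` is a hypothetical client method returning a list of strings.
    stories = llm.complete(f"List every plausible user story implied by:\n{source_text}")
    return [CandidateStory(text=s) for s in stories]

def score_candidates(slm, candidates: list[CandidateStory]) -> list[CandidateStory]:
    """Stage 2: the student synthesizes a score across competing rationales."""
    for c in candidates:
        c.score = slm.score(c.text, c.rationales)  # `slm.score` is likewise hypothetical
    return candidates

def distill(candidates: list[CandidateStory], top_k: int = 5) -> list[CandidateStory]:
    """Stage 3: deduplicate and keep the highest-value candidates."""
    unique = {c.text: c for c in candidates}  # naive dedup on exact text; use embeddings in practice
    return sorted(unique.values(), key=lambda c: c.score, reverse=True)[:top_k]

def build_backlog(llm, slm, transcript: str) -> list[CandidateStory]:
    """Run all three stages end to end."""
    return distill(score_candidates(slm, generate_candidates(llm, transcript)))
```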

Applied Opportunities in Product Management

1. User Story Generation from Ambiguous Conversations

Stakeholder input is frequently underspecified or expressed in divergent ways. The LLM can generate multiple candidate interpretations, ensuring that no intent is prematurely excluded. The SLM then distills these options into a single, coherent user story that aligns with system constraints, prior roadmap decisions, and organizational goals.
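
A sketch of this candidate-generation step, assuming the OpenAI Python SDK and a chat-capable model; the model name and prompt wording below are illustrative choices, not taken from the paper:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def candidate_user_stories(transcript: str, n: int = 5) -> list[str]:
    """Ask the teacher LLM for several plausible readings of the same input."""
    response = client.chat.completions.create(
        model="gpt-4o",  # any capable chat model would do
        messages=[
            {"role": "system",
             "content": "You extract candidate user stories from stakeholder input. "
                        "Return one story per line, in 'As a <role>, I want <goal>' form. "
                        "Include every plausible interpretation; do not pick a winner."},
            {"role": "user", "content": transcript},
        ],
        temperature=0.8,  # higher temperature encourages diverse candidates
    )
    lines = response.choices[0].message.content.splitlines()
    return [line.strip("-• ").strip() for line in lines if line.strip()][:n]
```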

2. Capturing Stakeholder Intent Without Overfitting

By preserving a range of candidate formulations, the system avoids over-indexing on any individual stakeholder’s language, assumptions, or priorities. This reduces bias and supports multi-stakeholder alignment.

3. Prioritization Based on Multi-Factor Criteria

LLMs can propose justifications across various dimensions (e.g., "critical for compliance," "low engineering complexity," "high NPS impact"). The SLM evaluates and reconciles these dimensions into a structured priority ranking, using composite scoring or decision trees informed by historical delivery data.
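
A minimal composite-scoring sketch of this reconciliation step; the dimension names and weights are illustrative assumptions, and in practice the weights would be tuned against historical delivery data as described above:

```python
# Illustrative weights; real values would be calibrated from delivery history.
WEIGHTS = {
    "compliance_critical": 0.4,
    "engineering_complexity": -0.2,  # higher complexity lowers priority
    "nps_impact": 0.3,
    "strategic_alignment": 0.3,
}

def composite_priority(scores: dict[str, float]) -> float:
    """Reconcile per-dimension scores (each 0..1) into a single ranking value."""
    return sum(WEIGHTS.get(dim, 0.0) * value for dim, value in scores.items())

# Example: a compliance-driven story with moderate engineering complexity.
print(composite_priority({
    "compliance_critical": 1.0,
    "engineering_complexity": 0.5,
    "nps_impact": 0.2,
    "strategic_alignment": 0.7,
}))  # 0.57
```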

4. Auto-Tagging and Thematic Grouping

The LLM can suggest multiple thematic tags per story (e.g., usability, data integrity, internationalization), while the SLM normalizes tags into a predefined taxonomy. This ensures consistent backlog segmentation for team ownership, reporting, and roadmap planning.
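
One lightweight way the SLM side of this step could normalize free-form tags is fuzzy matching against the taxonomy, sketched below with Python's standard-library difflib; the taxonomy entries and cutoff are illustrative:

```python
from difflib import get_close_matches

# Illustrative taxonomy; in practice this comes from your backlog tooling.
TAXONOMY = ["usability", "data-integrity", "internationalization",
            "performance", "compliance", "logistics"]

def normalize_tags(raw_tags: list[str]) -> list[str]:
    """Map free-form LLM tags onto the predefined taxonomy, dropping unknowns."""
    normalized = []
    for tag in raw_tags:
        match = get_close_matches(tag.lower().replace(" ", "-"),
                                  TAXONOMY, n=1, cutoff=0.6)
        if match and match[0] not in normalized:
            normalized.append(match[0])
    return normalized

print(normalize_tags(["Data integrity", "i18n", "Usability issues"]))
# ['data-integrity', 'usability']  ('i18n' has no close string match at this cutoff)
```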

5. Change Management and Scope Evolution

As stakeholder priorities shift, the LLM can produce alternative versions of affected backlog items. The SLM compares these changes to the existing backlog and resolves conflicts, maintaining coherence while adapting to change.
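
A sketch of the comparison step, using standard-library SequenceMatcher as a simple stand-in for the embedding-based similarity a production pipeline would likely use; the threshold is an assumption:

```python
from difflib import SequenceMatcher

def find_conflicts(revised: str, backlog: list[str],
                   threshold: float = 0.6) -> list[str]:
    """Return existing backlog items similar enough to a revised story to need reconciliation."""
    return [item for item in backlog
            if SequenceMatcher(None, revised.lower(), item.lower()).ratio() >= threshold]

existing_backlog = [
    "As a customer, I want real-time shipping updates.",
    "As an admin, I want to export monthly billing reports.",
]
revised_story = "As a customer, I want real-time shipping and delay updates."
print(find_conflicts(revised_story, existing_backlog))
# ['As a customer, I want real-time shipping updates.']
```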

Example Workflow

Input: Transcript from a product strategy meeting.

LLM Output (candidate user stories):

  • "As a customer, I want real-time shipping updates."
  • "As an ops manager, I want visibility into warehouse delays."
  • "As a logistics partner, I want alerts on SLA breaches."

SLM Tasks:

  • Merge overlapping intents into a unified feature.
  • Evaluate urgency and strategic alignment based on current themes (e.g., delivery experience, SLA compliance).
  • Annotate with metadata: business impact, technical dependencies, stakeholder ownership.

Final Backlog Item:

User Story: "As a stakeholder, I want real-time shipping and delay notifications across the logistics chain."
Tags: logistics, real-time, SLA
Priority: High
Dependencies: Alerting service upgrade, warehouse API access

Tooling Considerations for Implementation

To operationalize this architecture:

  • Candidate Generation (LLM): Use a general-purpose model such as OpenAI GPT-4, Anthropic Claude, or Google Gemini to extract candidate stories from raw stakeholder inputs.
  • Distillation (SLM): Use a fine-tuned lightweight model (e.g., DistilBERT, LLaMA 2, or Mistral) or hybridize with heuristic scoring and vector similarity for context-aware resolution.
  • Integration: Connect with backlog tools such as Jira, Linear, or Azure DevOps via API to push validated stories, tags, and metadata directly into your planning system (a Jira sketch follows this list).
  • Feedback Loop: Incorporate delivery data, sprint retrospectives, and stakeholder feedback to fine-tune both LLM prompts and SLM scoring over time.
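
For the integration step above, a minimal sketch of pushing a distilled story into Jira via the Jira Cloud REST API (v2 issue-creation endpoint); the domain, project key, and credential variables are placeholders:

```python
import os
import requests

JIRA_BASE = "https://your-domain.atlassian.net"  # placeholder domain

def push_story(summary: str, description: str, labels: list[str]) -> str:
    """Create a Jira Story for a distilled backlog item and return its issue key."""
    response = requests.post(
        f"{JIRA_BASE}/rest/api/2/issue",
        json={"fields": {
            "project": {"key": "PROD"},       # placeholder project key
            "summary": summary,
            "description": description,
            "issuetype": {"name": "Story"},
            "labels": labels,                 # Jira labels may not contain spaces
        }},
        auth=(os.environ["JIRA_USER"], os.environ["JIRA_API_TOKEN"]),
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["key"]
```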

Conclusion

The CanDist framework formalizes a highly effective pattern for transforming unstructured input into structured, decision-ready artifacts. For product managers, it represents a path toward scalable, auditable, AI-assisted backlog generation—without sacrificing stakeholder nuance or delivery quality.

At Forte Group, this approach aligns directly with our Concerto framework for AI-augmented delivery, in which humans orchestrate and supervise multi-agent AI systems to deliver faster, more traceable outcomes. The CanDist model is not simply an annotation technique—it is a design pattern for enterprise-grade reasoning at scale.

In the age of LLMs, the backlog should no longer be the product of manual synthesis alone. It can—and should—be the result of structured orchestration between models and humans, working in concert.
