Skeleton-of-Thought: Build the Skeleton First, Then Fill in the Flesh—Making Long Answers Fast and Well-Structured

A prompt technique that boosts both speed and structure, with a principle so simple it's easy to overlook

When asking a model to write a long answer, two common issues arise: either the structure is loose and rambling, or the generation is painfully slow. Skeleton-of-Thought (SoT) tackles both problems at once.

The idea is remarkably simple: first, have the model list an outline of the answer (the skeleton), then expand each outline point individually (the flesh).

Why This Works Better

Clearer structure: With a skeleton in place, the model is far less likely to drift off-topic as it writes, because each paragraph corresponds to one clear point.

Faster too: This is the counterintuitive part. Once the skeleton is laid out, the points can be expanded independently of one another, so the expansions can be generated in parallel. A long text that would normally be written sequentially is split into several segments that run concurrently, which reduces total wall-clock time. Speedup was the original selling point of the SoT paper.
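A rough way to see where the time goes is to compare the two totals. The sketch below is back-of-the-envelope arithmetic, not a measurement: the timings are made-up numbers, the function names are hypothetical, and it assumes the expansion requests really do run fully concurrently (no rate limiting or queueing).

```python
def sequential_latency(skeleton_s: float, expansion_s: list[float]) -> float:
    """Skeleton first, then every point expanded one after another."""
    return skeleton_s + sum(expansion_s)


def parallel_latency(skeleton_s: float, expansion_s: list[float]) -> float:
    """Skeleton first, then all points expanded at once: the slowest one sets the pace."""
    return skeleton_s + max(expansion_s, default=0.0)


# Hypothetical timings in seconds, for illustration only.
skeleton_s = 2.0
expansion_s = [4.0, 5.0, 3.0, 6.0, 4.0]

print(sequential_latency(skeleton_s, expansion_s))  # 24.0
print(parallel_latency(skeleton_s, expansion_s))    # 8.0
```

Under these assumptions the parallel total is governed by the slowest single expansion rather than by the sum of all of them, which is why the gain grows with the number of points.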

Two Phases

Phase 1: Build the skeleton


For the question "{question}", first list only the key points of the answer as an outline,
each point in one sentence, no expansion, 3-7 points.

The model might return:


  • Definition and background
  • Core advantages
  • Main limitations
  • Applicable scenarios
  • Practical tips
Phase 2: Fill in each point

For each skeleton point, send a separate request to expand it (these requests can be sent concurrently):

    
    For the point "{skeleton point}", expand it in 2-3 sentences in the context of the question "{question}".
    

    Finally, concatenate the expanded content in the order of the skeleton to form a complete, well-structured answer.
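The two phases can be wired together roughly as follows. This is a minimal sketch, not a reference implementation: `call_model` stands for whatever function sends one prompt to your model and returns its text, `fake_model` is a stand-in so the code runs offline, and `parse_skeleton` assumes the outline comes back as one bullet or numbered item per line. All of these names are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

SKELETON_PROMPT = (
    'For the question "{question}", first list only the key points of the answer '
    "as an outline, each point in one sentence, no expansion, 3-7 points."
)
EXPAND_PROMPT = (
    'For the point "{point}", expand it in 2-3 sentences '
    'in the context of the question "{question}".'
)
# Leading bullet ("•", "-", "*") or numbering ("1.", "2)") on an outline line.
_BULLET = re.compile(r"^\s*(?:[•\-*]|\d+[.)])\s*")


def parse_skeleton(text: str) -> list[str]:
    """Turn the model's outline into a list of points, one per non-empty line."""
    points = []
    for line in text.splitlines():
        cleaned = _BULLET.sub("", line).strip()
        if cleaned:
            points.append(cleaned)
    return points


def skeleton_of_thought(question: str, call_model: Callable[[str], str]) -> str:
    # Phase 1: a single request for the outline.
    skeleton = parse_skeleton(call_model(SKELETON_PROMPT.format(question=question)))
    if not skeleton:
        return ""
    # Phase 2: one request per point, sent concurrently.
    prompts = [EXPAND_PROMPT.format(point=p, question=question) for p in skeleton]
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        # map() returns results in input order, so the answer follows the skeleton.
        expansions = list(pool.map(call_model, prompts))
    return "\n\n".join(f"{p}\n{e}" for p, e in zip(skeleton, expansions))


def fake_model(prompt: str) -> str:
    """Offline stand-in for a real API call (assumption, for demonstration only)."""
    if prompt.startswith("For the question"):
        return "• Definition and background\n• Core advantages\n• Main limitations"
    return "Expanded: " + prompt.split('"')[1]


print(skeleton_of_thought("What is Skeleton-of-Thought?", fake_model))
```

Threads are used here because the work is I/O-bound waiting on network calls; an async client would serve the same purpose. Either way, the results need to be reassembled in skeleton order, not completion order.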

    When to Use

  • Structured long answers: evaluations, comparisons, plans, checklists.
  • Latency-sensitive yet long content: leverage parallelism to reduce time.
  • Scenarios requiring clarity: reports, documentation, knowledge organization.
When Not to Use

    Avoid for strong logical chains. For mathematical proofs or step-by-step reasoning problems, the points are not independent; splitting them in parallel can break the logic. Stick with chain-of-thought for those.

    Short answers don't need it. For one- or two-sentence replies, building a skeleton is overkill.

    Parallelism adds engineering complexity. To truly benefit from speedup, you need to concurrently call the API multiple times in code and then concatenate in order. If you're just using it manually in a chat interface, you only get the "better structure" benefit, not the speedup.

    A Simplified Version

    If you don't want to deal with parallelism, you can still apply the SoT idea in a single prompt:

    
    When answering "{question}", please:
    1) First list 3-5 key points as an outline, one line per point;
    2) Then expand each point one by one.
    

    Although there's no parallel speedup, the structural constraint of "skeleton first, then expansion" alone can make the answer significantly more organized.
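If the single-prompt version is used from code rather than typed by hand, a small helper keeps the template in one place. This is only a sketch: `build_single_prompt` and its `low`/`high` parameters are hypothetical names, and the template reads the outline instruction as "one line per point", which is an interpretation of the prompt above.

```python
SINGLE_PROMPT = (
    'When answering "{question}", please:\n'
    "1) First list {low}-{high} key points as an outline, one line per point;\n"
    "2) Then expand each point one by one."
)


def build_single_prompt(question: str, low: int = 3, high: int = 5) -> str:
    """Fill the single-prompt SoT template for one question."""
    if not 1 <= low <= high:
        raise ValueError("expected 1 <= low <= high")
    return SINGLE_PROMPT.format(question=question, low=low, high=high)


print(build_single_prompt("How do I choose a message queue?"))
```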

    Combining with Other Techniques

    SoT handles "structure," while chain-of-thought handles "reasoning." They solve different problems. When both structure and reasoning are needed, you can first use SoT to build the framework, then apply CoT for points that require reasoning. To build a solid foundation in prompt engineering, check out Prompt Engineering 101.
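One way to combine the two techniques is to add a chain-of-thought instruction only to the expansion prompts that need it. The sketch below assumes the caller decides which points require reasoning; the `needs_reasoning` flag, the `COT_SUFFIX` wording, and the example skeleton are all hypothetical.

```python
EXPAND_PROMPT = (
    'For the point "{point}", expand it in 2-3 sentences '
    'in the context of the question "{question}".'
)
# Hypothetical wording for the chain-of-thought instruction; tune it for your model.
COT_SUFFIX = " Work through the reasoning step by step before giving the conclusion."


def expansion_prompt(point: str, question: str, needs_reasoning: bool = False) -> str:
    """Build the Phase 2 prompt, adding a CoT instruction only where it is needed."""
    prompt = EXPAND_PROMPT.format(point=point, question=question)
    return prompt + COT_SUFFIX if needs_reasoning else prompt


# Hypothetical skeleton: the caller marks which points call for reasoning.
skeleton = [("Definition and background", False), ("Cost comparison", True)]
for point, needs_reasoning in skeleton:
    print(expansion_prompt(point, "Should we migrate to service X?", needs_reasoning))
```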

    Summary

    Skeleton-of-Thought delivers on both "structure" and "speedup." Used manually, you get the structural benefit; engineered with parallel calls, you also get the speedup.
