5 Prompt Engineering Patterns We Use at TaskWeaver

Mar 2025
Updated May 2025
Reading time: 14 min
RICK TSUI

AI Researcher & Developer


When we started building TaskWeaver, prompt engineering felt like alchemy—mysterious incantations that sometimes worked, often didn’t, and rarely worked the same way twice. After a year of rigorous testing and hundreds of prompt iterations, we’ve identified stable patterns that significantly improve reliability.

Here are five prompt engineering patterns we now use consistently across our systems.

1. The Meta-Cognitive Frame

The most powerful pattern we’ve discovered is what we call the “meta-cognitive frame”—structuring prompts to make the model reflect on its own reasoning process.

Before:

Generate a plan to analyze the sales data in the attached CSV file.

After:

Generate a plan to analyze the sales data in the attached CSV file.
Before finalizing your plan, identify potential assumptions you're making and validate whether they're correct based on the information provided.

This simple addition forces the model to double-check its own reasoning and has reduced “hallucinated steps” by nearly 60% in our internal benchmarks.
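In code, applying the frame is just a matter of appending the reflection instruction to whatever base prompt you have. Here's a minimal sketch; the helper name and call_llm are our own stand-ins, not any particular library's API:

def apply_metacognitive_frame(base_prompt):
    """Append the reflection instruction that asks the model to surface and
    validate its own assumptions before answering."""
    reflection = (
        "Before finalizing your answer, identify potential assumptions "
        "you're making and validate whether they're correct based on the "
        "information provided."
    )
    return f"{base_prompt}\n{reflection}"

prompt = apply_metacognitive_frame(
    "Generate a plan to analyze the sales data in the attached CSV file."
)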

2. The Chain-of-Thought Escalation

While standard chain-of-thought prompting is well-documented, we’ve refined it into what we call “chain-of-thought escalation”—starting with simple reasoning and progressively increasing complexity.

Before:

Look at the provided Python code and identify any security vulnerabilities.

After:

Look at the provided Python code. First, list all external inputs the code processes. Second, for each input, trace how it flows through the program. Third, identify points where these inputs could cause security issues. Finally, categorize each vulnerability by severity and suggest fixes.

The gradual increase in reasoning complexity produces more thorough analyses and fewer missed issues.
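In practice we keep the escalation steps as an ordered list and render them into the prompt. A rough sketch of that idea (the helper is hypothetical, written for this post):

def build_escalating_prompt(task, steps):
    """Render an ordered list of reasoning steps, from simple to complex,
    into a single chain-of-thought escalation prompt."""
    labels = ["First", "Second", "Third", "Fourth", "Fifth"]
    parts = []
    for i, step in enumerate(steps):
        label = "Finally" if i == len(steps) - 1 else labels[i]
        parts.append(f"{label}, {step}")
    return task + " " + " ".join(parts)

security_steps = [
    "list all external inputs the code processes.",
    "for each input, trace how it flows through the program.",
    "identify points where these inputs could cause security issues.",
    "categorize each vulnerability by severity and suggest fixes.",
]
prompt = build_escalating_prompt("Look at the provided Python code.", security_steps)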

3. The Context Window Manager

Large prompts often push against context window limits. Our “context window manager” pattern strategically allocates the limited context space:

  1. Core Instructions: 10-15% of context window
  2. Critical Examples: 20-25%
  3. Input Data: Remaining space, prioritized by relevance

We also use progressive summarization techniques when handling large documents:

For document analysis:
1. First pass: Identify key sections
2. Second pass: Focus on those sections only
3. Third pass: Analyze relationships between extracted information

This approach has let us process documents 3-4x larger than our nominal context window.
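A simplified sketch of the budget split looks like this; count_tokens stands in for your tokenizer, and the percentages are the rough targets above, not hard limits:

def allocate_context(window_tokens, instructions, examples, chunks, count_tokens):
    """Assemble a prompt within the token budget: core instructions first,
    then critical examples (capped at ~25%), then input chunks ordered by
    relevance until the window is full."""
    parts = [instructions]
    used = count_tokens(instructions)  # aim to keep this around 10-15%

    example_budget = int(window_tokens * 0.25)
    spent_on_examples = 0
    for example in examples:
        cost = count_tokens(example)
        if spent_on_examples + cost > example_budget:
            break
        parts.append(example)
        spent_on_examples += cost
    used += spent_on_examples

    for chunk in chunks:  # assumed pre-sorted, most relevant first
        cost = count_tokens(chunk)
        if used + cost > window_tokens:
            break
        parts.append(chunk)
        used += cost

    return "\n\n".join(parts)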

4. The Output Structure Enforcer

One of our earliest struggles was getting consistent output formats. The “output structure enforcer” pattern solved this:

Before:

Analyze these user reviews and provide insights.

After:

Analyze these user reviews and provide insights in the following JSON format:
{
"main_themes": [string],
"positive_points": [string],
"improvement_areas": [string],
"sentiment_score": number,
"key_action_items": [string]
}
For each array, provide 3-5 items. Ensure the sentiment_score is between 0 and 10.

Interestingly, we found that being extremely specific about format requirements improves the semantic quality of responses, not just their structure. Our theory is that the format constraint forces the model to think more systematically.
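On our side we pair the format instruction with a validation step before the output goes anywhere else. A minimal sketch, assuming the model returns bare JSON (the schema check here is illustrative, not TaskWeaver's actual validator):

import json

REQUIRED_FIELDS = {
    "main_themes": list,
    "positive_points": list,
    "improvement_areas": list,
    "sentiment_score": (int, float),
    "key_action_items": list,
}

def parse_review_insights(raw_response):
    """Parse the model's JSON reply and check it against the enforced schema."""
    data = json.loads(raw_response)  # raises ValueError on malformed JSON
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data or not isinstance(data[field], expected_type):
            raise ValueError(f"Missing or mistyped field: {field}")
    if not 0 <= data["sentiment_score"] <= 10:
        raise ValueError("sentiment_score must be between 0 and 10")
    return data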

5. The Two-Stage Review Pattern

Finally, our favorite pattern for critical applications is the “two-stage review”: a generation pass followed by a review-and-revise pass.

Stage 1: Generate a proposed solution to [problem].
Stage 2: You are an expert reviewer evaluating the solution above. Identify potential issues or edge cases it doesn't address, then revise the solution to resolve them.

By having the model critique its own work before finalizing, we’ve seen error rates drop by 40-50% compared to one-shot generation.
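As a sketch, the two stages are simply two chained model calls, with the draft fed back in for review (call_llm again stands in for whatever client you use):

def two_stage_review(problem):
    """Generate a draft solution, then have the model critique and revise it."""
    draft = call_llm(f"Stage 1: Generate a proposed solution to: {problem}")
    revised = call_llm(
        "Stage 2: You are an expert reviewer evaluating the solution below. "
        "Identify potential issues or edge cases it doesn't address, "
        "then revise the solution to resolve them.\n\n"
        f"Problem: {problem}\n\nProposed solution:\n{draft}"
    )
    return revised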

Implementation in Code

Here’s a simplified example of how we implement these patterns in our Python code:

import json

def generate_task_plan(task_description, available_tools, examples=None):
    # Apply the Meta-Cognitive Frame and the Output Structure Enforcer
    prompt = f"""
You are TaskWeaver's planning agent. Generate a plan to accomplish: {task_description}

Available tools: {json.dumps(available_tools)}

{examples if examples else ''}

Generate your plan following these steps:
1. Identify the key components needed to accomplish this task
2. For each component, determine which available tools to use
3. Create a sequence of steps with specific tool calls
4. Review your plan for missing steps or incorrect assumptions
5. Revise the plan based on your review

Output your final plan in JSON format:
{{
  "steps": [
    {{"name": "step_name", "tool": "tool_name", "parameters": {{...}} }}
  ],
  "reasoning": "explanation of your approach"
}}
"""
    # call_llm wraps our model client; swap in your own LLM call here
    return call_llm(prompt)
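Calling it looks something like this (the tool list is made up for illustration):

available_tools = [
    {"name": "read_csv", "description": "Load a CSV file into a dataframe"},
    {"name": "plot_chart", "description": "Render a chart from tabular data"},
]
plan = generate_task_plan("Analyze Q3 sales by region", available_tools)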

Conclusion

Prompt engineering has evolved from unpredictable art to structured engineering at TaskWeaver. These five patterns have become part of our standard development process, dramatically improving both reliability and capability.

What’s most fascinating is how these patterns interact with model improvements. As newer models emerge, our patterns continue to provide benefits—suggesting they’re tapping into fundamental aspects of how language models reason.

If you’re building systems with language models, I’d love to hear which patterns have worked for you.