When we started building TaskWeaver, prompt engineering felt like alchemy—mysterious incantations that sometimes worked, often didn’t, and rarely worked the same way twice. After a year of rigorous testing and hundreds of prompt iterations, we’ve identified stable patterns that significantly improve reliability.
Here are five prompt engineering patterns we now use consistently across our systems.
1. The Meta-Cognitive Frame
The most powerful pattern we’ve discovered is what we call the “meta-cognitive frame”—structuring prompts to make the model reflect on its own reasoning process.
Before:
Generate a plan to analyze the sales data in the attached CSV file.
After:
Generate a plan to analyze the sales data in the attached CSV file. Before presenting the plan, review each step and confirm it is necessary, achievable with the data provided, and correctly ordered; revise any step that fails this check.
This simple addition forces the model to double-check its own reasoning and has reduced “hallucinated steps” by nearly 60% in our internal benchmarks.
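In code, the frame is just a suffix appended to whatever task prompt we send. A minimal sketch (the suffix wording and the `call_model` client in the usage note are placeholders, not a fixed API):

```python
# Sketch only: the suffix wording is illustrative; call_model is a placeholder client.
META_COGNITIVE_SUFFIX = (
    "\n\nBefore presenting your answer, review each step of your reasoning. "
    "Confirm that every step is necessary, feasible with the information given, "
    "and correctly ordered, and revise anything that fails this check."
)

def with_meta_cognitive_frame(prompt: str) -> str:
    """Append the self-review instruction to any task prompt."""
    return prompt + META_COGNITIVE_SUFFIX

# Usage:
# plan = call_model(with_meta_cognitive_frame(
#     "Generate a plan to analyze the sales data in the attached CSV file."))
```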
2. The Chain-of-Thought Escalation
While standard chain-of-thought prompting is well-documented, we’ve refined it into what we call “chain-of-thought escalation”—starting with simple reasoning and progressively increasing complexity.
Before:
Look at the provided Python code and identify any security vulnerabilities.
After:
Look at the provided Python code. First, list all external inputs the code processes. Second, for each input, trace how it flows through the program. Third, identify points where these inputs could cause security issues. Finally, categorize each vulnerability by severity and suggest fixes.
The gradual increase in reasoning complexity produces more thorough analyses and fewer missed issues.
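Because the steps follow the same simple-to-complex shape across tasks, the escalation can be generated programmatically. A rough sketch of that idea (the helper is illustrative, not a library call):

```python
# Illustrative sketch: compose an escalation prompt from simple to complex reasoning steps.
def build_escalation_prompt(task: str, steps: list[str]) -> str:
    """Prefix a task with numbered reasoning steps that grow in complexity."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return f"{task}\n\nWork through the following steps in order:\n{numbered}"

security_review_steps = [
    "List all external inputs the code processes.",
    "For each input, trace how it flows through the program.",
    "Identify points where these inputs could cause security issues.",
    "Categorize each vulnerability by severity and suggest fixes.",
]

prompt = build_escalation_prompt("Look at the provided Python code.", security_review_steps)
```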
3. The Context Window Manager
Large prompts often push against context window limits. Our “context window manager” pattern strategically allocates the limited context space (a budgeting sketch follows the list):
- Core Instructions: 10-15% of context window
- Critical Examples: 20-25%
- Input Data: Remaining space, prioritized by relevance
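A small token-budgeting helper keeps the split explicit. This is a sketch only; the default shares mirror the allocation above, and token counting is left to whatever tokenizer you use:

```python
# Sketch: divide a context budget according to the allocation above.
def allocate_context(total_tokens: int,
                     instructions_share: float = 0.15,
                     examples_share: float = 0.25) -> dict[str, int]:
    """Return token budgets for core instructions, critical examples, and input data."""
    instructions = int(total_tokens * instructions_share)
    examples = int(total_tokens * examples_share)
    data = total_tokens - instructions - examples  # input data gets the remaining space
    return {"instructions": instructions, "examples": examples, "data": data}

# Usage with an 8,192-token window:
# allocate_context(8192)  # {'instructions': 1228, 'examples': 2048, 'data': 4916}
```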
We also use progressive summarization techniques when handling large documents: each chunk is summarized as it is read, and the accumulated summary is carried forward as context for the next chunk.
This approach has allowed us to process documents 3-4x larger than our nominal context window would typically allow.
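The mechanics are simple: only the running summary and one chunk need to fit in the window at any point. A minimal sketch, assuming a `summarize` callable that wraps your model client:

```python
# Sketch: progressive summarization over a document larger than the context window.
def progressive_summarize(chunks: list[str], summarize) -> str:
    """Fold a running summary over document chunks, carrying it forward as context."""
    running_summary = ""
    for chunk in chunks:
        prompt = (
            f"Summary of the document so far:\n{running_summary}\n\n"
            f"Next section:\n{chunk}\n\n"
            "Update the summary to incorporate the next section. Keep it concise."
        )
        running_summary = summarize(prompt)
    return running_summary
```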
4. The Output Structure Enforcer
One of our earliest struggles was getting consistent output formats. The “output structure enforcer” pattern solved this:
Before:
Analyze these user reviews and provide insights.
After:
Analyze these user reviews and provide insights in the following JSON format:
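The schema we attach varies by task; for review analysis it looks something along these lines (the field names here are illustrative, not a fixed spec):

```python
# Illustrative only: the schema and field names are an example, not a fixed spec.
REVIEW_SCHEMA = """{
  "overall_sentiment": "positive | neutral | negative",
  "key_themes": [
    {"theme": "...", "frequency": "...", "representative_quote": "..."}
  ],
  "actionable_recommendations": ["..."]
}"""

prompt = (
    "Analyze these user reviews and provide insights in the following JSON format:\n"
    + REVIEW_SCHEMA
    + "\nReturn only valid JSON, with no commentary outside the object."
)
```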
Interestingly, we found that being extremely specific about format requirements actually improves the semantic quality of responses, not just their structure. Our theory is that the format constraint forces the model to think more systematically.
5. The Two-Stage Review Pattern
Finally, our favorite pattern for critical applications is the “two-stage review”:
Stage 1: Generate a proposed solution to [problem].
Stage 2: Review the proposed solution critically, identify any errors or weaknesses, and produce a revised final answer.
By having the model critique its own work before finalizing, we’ve seen error rates drop by 40-50% compared to one-shot generation.
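In practice this is just two model calls, with the first output fed back in for critique. A minimal sketch, with `call_model` standing in for whatever client you use:

```python
# Sketch: two-stage review. Generate a draft, then critique and revise it.
def two_stage_review(problem: str, call_model) -> str:
    """Generate a solution, then have the model critique and revise its own work."""
    draft = call_model(f"Generate a proposed solution to the following problem:\n{problem}")
    review_prompt = (
        f"Problem:\n{problem}\n\n"
        f"Proposed solution:\n{draft}\n\n"
        "Review the proposed solution critically. Identify any errors, gaps, or "
        "unstated assumptions, then produce a revised final answer."
    )
    return call_model(review_prompt)
```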
Implementation in Code
Here’s a simplified example of how we implement these patterns in our Python code:
def generate_task_plan(task_description, available_tools, examples=None):
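A stripped-down sketch of how that function ties the patterns together; the `call_model` stub and the JSON schema below are placeholders rather than production code:

```python
# Sketch only: call_model and the schema are placeholders, not production code.
def call_model(prompt: str) -> str:
    """Placeholder for whatever LLM client you use."""
    raise NotImplementedError

def generate_task_plan(task_description, available_tools, examples=None):
    """Build a planning prompt using the patterns above and return the model's plan."""
    # Output structure enforcer: pin down the response format up front.
    format_spec = (
        "Respond in JSON: "
        '{"steps": [{"description": "...", "tool": "...", "inputs": ["..."]}]}'
    )
    # Context window manager: core instructions first, then a few critical examples.
    sections = [
        f"You are a task planner. Available tools: {', '.join(available_tools)}.",
        format_spec,
    ]
    if examples:
        sections.append("Examples:\n\n" + "\n\n".join(examples[:3]))  # cap the example budget
    sections.append(f"Task: {task_description}")
    # Meta-cognitive frame: ask the model to review its own plan before answering.
    sections.append(
        "Before answering, review each step of your plan and confirm it is necessary, "
        "uses only the available tools, and is correctly ordered."
    )
    prompt = "\n\n".join(sections)
    # Two-stage review: draft, then critique and revise.
    draft = call_model(prompt)
    return call_model(
        f"{prompt}\n\nProposed plan:\n{draft}\n\n"
        "Critique the proposed plan, fix any problems, and return the corrected JSON."
    )
```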
Conclusion
Prompt engineering has evolved from unpredictable art to structured engineering at TaskWeaver. These five patterns have become part of our standard development process, dramatically improving both reliability and capability.
What’s most fascinating is how these patterns interact with model improvements. As newer models emerge, our patterns continue to provide benefits—suggesting they’re tapping into fundamental aspects of how language models reason.
If you’re building systems with language models, I’d love to hear which patterns have worked for you.