review this spec critically as a framework for evaluating tools and formal systems:
VIBES-RFC-001: LLM Ergonomics
VALIDATION_SCOPE = "Tested with: GPT-4.5, Claude 4 Opus, Gemini 2.5 Pro, DeepSeek V2"
1. Introduction
VIBES provides a structured framework for evaluating and improving the ergonomics of tools and expression languages designed for LLM use. As LLM-driven development becomes mainstream, the cost of poor tool ergonomics compounds through failed attempts and workarounds.
Core Insight: LLMs and humans need fundamentally different tools. Just as we don't expect humans to write assembly code or CPUs to parse English, we shouldn't force LLMs to use human-optimized interfaces. The most effective approach is building purpose-specific tools for each type of user.
The Framework: VIBES uses a 3-axis qualitative system that embraces LLM strengths—pattern recognition and natural language understanding—rather than computational metrics. It treats models as black boxes, measuring processing friction rather than internal states.
Why It Works: VIBES describes patterns that already exist in well-engineered code. Every principle maps to established wisdom (type safety, functional programming, loose coupling). Future LLMs will naturally understand VIBES because they are trained on codebases embodying these principles.
2. The Three Axes
VIBES Quick Reference
| Axis | States | What It Measures |
|------|--------|------------------|
| Expressive | 🙈 👓 🔍 🔬 | How many valid ways to express ideas |
| Context Flow | 🌀 🧶 🪢 🎀 | How tangled dependencies are |
| Error Surface | 🌊 💧 🧊 💠 | When errors can occur in lifecycle |
Emoji Logic:
- Expressive: From blindness (🙈) to microscopic precision (🔬)
- Context: From chaotic swirl (🌀) to neat bow (🎀)
- Error: From vast ocean (🌊) to crystallized/frozen (💠)
Notation: <Expressive/Context/Error>, e.g., <🔍🪢💠>
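As a rough illustration of the notation, the sketch below parses a rating string into its three axes. The `Rating` class and `parse_rating` function are hypothetical names introduced here, not part of the spec; the only thing taken from the spec is the symbol order per axis.

```python
from dataclasses import dataclass

# Symbols per axis, ordered from weakest to strongest state (Quick Reference table).
AXES = {
    "expressive": ["🙈", "👓", "🔍", "🔬"],
    "context": ["🌀", "🧶", "🪢", "🎀"],
    "error": ["🌊", "💧", "🧊", "💠"],
}


@dataclass(frozen=True)
class Rating:
    """Hypothetical container for one VIBES rating."""
    expressive: str
    context: str
    error: str

    def __str__(self) -> str:
        return f"<{self.expressive}{self.context}{self.error}>"


def parse_rating(text: str) -> Rating:
    """Parse '<🔍🪢💠>' into a Rating; raise ValueError on anything else."""
    body = text.strip()
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError(f"rating must be wrapped in <>: {text!r}")
    rest = body[1:-1]
    found = {}
    # Axes are consumed in the fixed order Expressive, Context, Error.
    for axis, symbols in AXES.items():
        match = next((s for s in symbols if rest.startswith(s)), None)
        if match is None:
            raise ValueError(f"expected a {axis} symbol in {text!r}")
        found[axis] = match
        rest = rest[len(match):]
    if rest:
        raise ValueError(f"unexpected trailing text in {text!r}")
    return Rating(**found)
```

Because the axis order is fixed, a string such as `<🪢🔍💠>` is rejected rather than silently reordered.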
2.1 Validation Methodology
Framework developed through iterative testing of multiple patterns across GPT-4.5, Claude 4 Opus, Gemini 2.5 Pro, and DeepSeek V2. VIBES ratings represent consensus patterns: a pattern receives a rating when at least 3 of the 4 models agree on it.
Critical Distinction:
- VIBES Assessment (Qualitative): LLMs rate patterns based on interaction experience
- Impact Validation (Quantitative): Humans measure retry rates, completion times to verify correlation
Example Divergence: GPT-4o rated Redux components 🧶 (Coupled), Claude rated 🪢 (Pipeline); resolved by documenting both perspectives—external state management creates coupling even with unidirectional flow.
See calibration/CALIBRATION_CORPUS.md for the complete validation suite with consensus ratings.
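The 3-of-4 agreement rule from this section could be sketched as follows. The function name `consensus` and the model labels in the usage note are placeholders, and the votes shown are illustrative, not results from the calibration corpus.

```python
from collections import Counter
from typing import Mapping, Optional


def consensus(votes: Mapping[str, str], required: int = 3) -> Optional[str]:
    """Return the symbol chosen by at least `required` models, else None."""
    if not votes:
        return None
    symbol, count = Counter(votes.values()).most_common(1)[0]
    return symbol if count >= required else None
```

With hypothetical votes `{"model_a": "🧶", "model_b": "🪢", "model_c": "🧶", "model_d": "🧶"}` the function returns `"🧶"`; a 2/2 split returns `None`, which would correspond to the "document both perspectives" outcome described above.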
3. Axis Definitions
3.1 Expressive Power: 🙈→👓→🔍→🔬
Measures how well a system allows expression of valid computations while constraining invalid ones.
Real Impact: GitHub Copilot and similar tools generate more successful completions with APIs supporting multiple natural expressions.
🙈 Noise: Cannot express needed computations. Constraints block valid expressions.
- Example: Stringly-typed API rejecting valid but differently-formatted inputs
👓 Readable: Single rigid path. One way to express each operation.
- Example: add_floats(2.0, 2.0) - functional but inflexible
🔍 Structured: Multiple natural ways to express ideas with meaningful constraints.
- Example: Supporting both users.filter(active) and filter(users, active)
🔬 Crystalline: Rich expressiveness with precise semantic guarantees. Multiple aliases for same operation.
- Example: SQL DSL accepting WHERE x > 5, FILTER(x > 5), and x.gt(5) - all compile to same AST
- "Many ways" = 6+ different valid syntaxes with identical semantics
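To make "all compile to same AST" concrete, here is a minimal sketch in which three surface forms produce one identical node. Everything in it (`Compare`, `parse_where`, `parse_filter`, `Column`) is a hypothetical toy, limited to integer comparisons, and is not taken from any real SQL library.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Compare:
    """Toy AST node shared by every surface syntax."""
    column: str
    op: str
    value: int


_COMPARISON = re.compile(r"^\s*(\w+)\s*(>|<|=)\s*(-?\d+)\s*$")


def _parse_comparison(text: str) -> Compare:
    m = _COMPARISON.match(text)
    if m is None:
        raise ValueError(f"unsupported comparison: {text!r}")
    return Compare(m.group(1), m.group(2), int(m.group(3)))


def parse_where(clause: str) -> Compare:
    """Surface form 1: 'WHERE x > 5'."""
    keyword, _, rest = clause.strip().partition(" ")
    if keyword.upper() != "WHERE":
        raise ValueError(f"expected WHERE clause: {clause!r}")
    return _parse_comparison(rest)


def parse_filter(call: str) -> Compare:
    """Surface form 2: 'FILTER(x > 5)'."""
    m = re.match(r"^\s*FILTER\((.*)\)\s*$", call, re.IGNORECASE)
    if m is None:
        raise ValueError(f"expected FILTER(...): {call!r}")
    return _parse_comparison(m.group(1))


class Column:
    """Surface form 3: Column('x').gt(5)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def gt(self, value: int) -> Compare:
        return Compare(self.name, ">", value)
```

All three entry points return `Compare("x", ">", 5)` for the example in the text, so downstream code only ever sees one representation.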
3.2 Context Flow: 🌀→🧶→🪢→🎀
Measures dependency structure and traversal constraints.
Real Impact: The Heartbleed vulnerability, a missing bounds check in OpenSSL's heartbeat extension, went unnoticed in a large, tightly coupled codebase (🧶) for about two years, affecting millions of systems.
🌀 Entangled: Circular dependencies with feedback loops. Order changes results.
- Example: Spreadsheet with circular references
🧶 Coupled: Complex dependencies without cycles. Hidden state mutations.
- Example: React components with shared context and effects
- Key distinction: Multiple interacting paths with shared mutable state
- Decision guide: Can you trace a single path? → 🪢. Multiple paths affecting each other? → 🧶
🪢 Pipeline: Linear dependencies, immutable during traversal.
- Example: data |> validate |> transform |> save
🎀 Independent: No dependencies between components. Any access order works.
- Example: (name, age, email) - change any without affecting others
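The 🪢 pipeline example can be approximated in Python, which has no `|>` operator, by composing functions left to right. The `pipeline` helper and the `validate`, `transform`, and `save` steps below are hypothetical stand-ins; each step returns a new value rather than mutating its input.

```python
from functools import reduce
from typing import Any, Callable

Step = Callable[[Any], Any]


def pipeline(*steps: Step) -> Step:
    """Compose steps left to right, like `data |> a |> b |> c`."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def validate(record: dict) -> dict:
    if "name" not in record:
        raise ValueError("record needs a name")
    return record


def transform(record: dict) -> dict:
    # Build a new dict so the input is left untouched during traversal.
    return {**record, "name": record["name"].strip().title()}


saved: list = []  # stand-in for a real datastore


def save(record: dict) -> dict:
    saved.append(record)
    return record


process = pipeline(validate, transform, save)
```

Calling `process({"name": "  ada lovelace "})` walks the three steps in order, and there is exactly one path to trace when something goes wrong.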
3.3 Error Surface: 🌊→💧→🧊→💠
Measures when errors can occur in the system lifecycle.
Real Impact: The Therac-25 radiation overdoses (six known accidents, several of them fatal) stemmed in part from race conditions (🌊), a class of error that stronger compile-time guarantees (💠) aim to rule out.
🌊 Ocean: Errors cascade unpredictably. One failure triggers system-wide effects.
- Example: window.APP.state.user = null // Crashes everywhere
💧 Liquid: Errors handled at runtime. Explicit error handling required.
- Example: Result<User, Error> = fetchUser(id)
🧊 Ice: Errors caught at startup/initialization. Fail fast at boundaries.
- Example: Dependency injection validates all requirements at boot
💠 Crystal: Errors caught at compile/parse time. Invalid states cannot be constructed.
- Example: divide :: Int -> NonZeroInt -> Int - division by zero impossible
- Rule of thumb: 💠 when invalid states cannot be expressed
Error Progression:
- 💧: if (denominator != 0) result = numerator / denominator
- 🧊: assert(denominator != 0); result = numerator / denominator
- 💠: divide(numerator: Int, denominator: NonZeroInt)
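The 💠 end of this progression can only be approximated in Python, since the check still runs at runtime. The sketch below, with a hypothetical `NonZeroInt` wrapper, moves the check into construction so that `divide` itself has no failure path; a static type checker would additionally flag callers that pass a plain `int`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NonZeroInt:
    """Hypothetical wrapper: an instance holding 0 cannot be constructed."""
    value: int

    def __post_init__(self) -> None:
        if self.value == 0:
            raise ValueError("NonZeroInt cannot wrap 0")


def divide(numerator: int, denominator: NonZeroInt) -> int:
    # No zero check here: the type of `denominator` already rules it out.
    return numerator // denominator.value
```

The only place a zero can be rejected is `NonZeroInt(...)`, typically at the boundary where untrusted input enters the program.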
4. Practical Application
4.1 Assessment Guide
Expressive Power: Count syntactically different but semantically identical ways to accomplish a task.
- 0 ways → 🙈
- 1 way → 👓
- 2-5 ways → 🔍
- 6+ ways with precise constraints → 🔬
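The thresholds above translate directly into a small lookup. The function name and the `precise_constraints` flag are assumptions made for this sketch; the spec does not say how to rate 6+ forms without precise constraints, so the sketch keeps that case at 🔍.

```python
def rate_expressive(equivalent_forms: int, precise_constraints: bool = False) -> str:
    """Map a count of semantically identical syntaxes to an Expressive rating."""
    if equivalent_forms < 0:
        raise ValueError("count cannot be negative")
    if equivalent_forms == 0:
        return "🙈"
    if equivalent_forms == 1:
        return "👓"
    if equivalent_forms >= 6 and precise_constraints:
        return "🔬"
    # 2-5 forms, or 6+ without precise constraints (assumption, see lead-in).
    return "🔍"
```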
Context Flow: Trace dependencies between components.
- Circular dependencies → 🌀
- Complex branches with shared state → 🧶
- Single linear path → 🪢
- Independent components → 🎀
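One way to mechanize the Context Flow checks is to treat components as a dependency graph. The sketch below is an approximation under stated assumptions: "shared state" is modeled as any branching or merging in the graph, which is cruder than the spec's wording, and `has_cycle` and `rate_context` are hypothetical names.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # component -> components it depends on


def has_cycle(graph: Graph) -> bool:
    """Depth-first search; reaching a node still on the stack means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[str, int] = {}

    def visit(node: str) -> bool:
        color[node] = GREY
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                return True
            if state == WHITE and visit(dep):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in graph)


def rate_context(graph: Graph) -> str:
    if has_cycle(graph):
        return "🌀"
    if all(not deps for deps in graph.values()):
        return "🎀"
    # Count how many components depend on each node.
    dependents: Dict[str, int] = {}
    for deps in graph.values():
        for dep in deps:
            dependents[dep] = dependents.get(dep, 0) + 1
    linear = all(len(deps) <= 1 for deps in graph.values()) and all(
        count <= 1 for count in dependents.values()
    )
    return "🪢" if linear else "🧶"
```

A graph analysis like this cannot see hidden state mutations, so it is best read as a first pass that a reviewer then confirms.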
Error Surface: Identify when failures can occur.
- Cascading runtime failures → 🌊
- Handled runtime errors → 💧
- Startup/initialization failures → 🧊
- Compile-time prevention → 💠 (invalid states cannot be expressed)
4.2 Common Transformations
Transformation Order: Stabilize Errors First → Untangle Dependencies → Increase Expressiveness (prevents building flexibility on unstable foundations)
Callback Hell → Promise Pipeline (<👓🌀💧> → <🔍🪢🧊>)
// Before: Nested callbacks with circular deps
getUserData(id, (err, user) => {
  if (err) handleError(err);
  else getUserPosts(user.id, (err, posts) => {
    // More nesting...
  });
});
// After: Linear promise chain
getUserData(id)
  .then(user => getUserPosts(user.id))
  .then(posts => render(posts))
  .catch(handleError);
Global State → Module Pattern (<👓🌀🌊> → <🔍🎀🧊>)
// Before: Global mutations everywhere
window.APP_STATE = { user: null };
function login(user) { window.APP_STATE.user = user; }
// After: Isolated module with clear boundaries
const UserModule = (() => {
  let state = { user: null };
  return {
    login: (user) => { state.user = user; },
    // Defensive copy; returns null (not {}) when nobody is logged in
    getUser: () => (state.user ? { ...state.user } : null)
  };
})();
4.2.1 Boundary Examples
👓→🔍 (Rigid to Structured)
from typing import List, Sequence

# Before (👓): Single rigid syntax
def process_data(data: List[int]) -> int:
    return sum(data)

# After (🔍): Multiple valid approaches
def process_data(data: Sequence[int]) -> int:
    return sum(data)  # Now accepts list, tuple, or any sequence
💧→🧊 (Runtime to Initialization)
// Before (💧): Runtime config errors (looked up on each use)
function getConfig(key: string): string {
  const value = process.env[key];
  if (!value) throw new Error(`Missing ${key}`);
  return value;
}
// After (🧊): Initialization-time validation
// Each required variable is checked once, when this module loads.
// (A bare `process.env.API_URL!` would only silence the type checker.)
const config = {
  apiUrl: getConfig("API_URL"),
  apiKey: getConfig("API_KEY"),
} as const;
// Errors surface at startup, not during request handling
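The same 🧊 idea, sketched in Python: validate every required setting once at startup and report all gaps together. The `Config` class and `load_config` function are hypothetical, and the setting names simply mirror the example above.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Config:
    api_url: str
    api_key: str


def load_config(env: Mapping[str, str]) -> Config:
    """Validate every required variable once, reporting all gaps together."""
    required = ("API_URL", "API_KEY")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Config(api_url=env["API_URL"], api_key=env["API_KEY"])
```

Calling `load_config(os.environ)` once at process start (with `import os`) means a misconfigured deployment fails before it serves its first request. Taking the environment as a parameter also keeps the function easy to test.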
4.3 Context-Dependent Priorities
Not all axes deserve equal weight in every domain:
Interactive Tools (REPLs, CLIs): Prioritize Expressive Power (🔍→🔬)
- Target: <🔬🪢💧>
- Maximum flexibility for experimentation
Infrastructure & Configuration: Prioritize Error Surface (🧊→💠)
- Target: <🔍🎀💠>
- Predictability over flexibility
Data Pipelines: Prioritize Context Flow (🪢→🎀)
- Target: <🔍🪢🧊>
- Clear data flow for debugging
Safety-Critical Systems: Error Surface is non-negotiable
- Target: <👓🎀💠> or <🔍🎀💠> depending on domain constraints
Priority Decision Rules:
- Human lives at stake → Error Surface (💠) first
- Iteration speed critical → Expressive Power (🔬) first
- Debugging time dominates → Context Flow (🎀) first
- When in doubt → Balance all three at 🔍🪢🧊
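Read as code, the decision rules might look like the sketch below. The spec does not say which rule wins when several apply, so checking them in the listed order is an assumption of this sketch, as are the function and parameter names.

```python
def priority_axis(
    lives_at_stake: bool = False,
    iteration_speed_critical: bool = False,
    debugging_dominates: bool = False,
) -> str:
    """Return the axis to prioritize; rules are checked in the listed order."""
    if lives_at_stake:
        return "Error Surface (💠)"
    if iteration_speed_critical:
        return "Expressive Power (🔬)"
    if debugging_dominates:
        return "Context Flow (🎀)"
    return "Balanced (🔍🪢🧊)"
```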
4.4 Anti-Pattern Quick Fixes
Everything Object (<🙈🌀🌊>): Extract modules → Define interfaces → Add type guards
Magic String Soup (<🙈🧶🌊>): Use enums → Add types → Parse once
Global State Mutation (<👓🌀🌊>): Isolate state → Use immutability → Add boundaries
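As an illustration of the Magic String Soup fix (use enums, add types, parse once), the sketch below converts a raw string into an enum at the boundary so that downstream code never compares strings. The `OrderStatus` domain and function names are invented for this example.

```python
from enum import Enum


class OrderStatus(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"


def parse_status(raw: str) -> OrderStatus:
    """Parse once at the boundary; everything downstream receives the enum."""
    try:
        return OrderStatus(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(s.value for s in OrderStatus)
        raise ValueError(f"unknown status {raw!r}; expected one of: {allowed}") from None


def can_cancel(status: OrderStatus) -> bool:
    # Typed parameter: a misspelled string literal cannot reach this comparison.
    return status is OrderStatus.PENDING
```

A typo such as `"shiped"` now fails at the parse step with a message listing the valid values, instead of silently taking the wrong branch later.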
4.5 Good to Great: Excellence Patterns
VIBES isn't just for fixing problems—it guides the journey from functional to exceptional:
API Evolution (<🔍🪢💧> → <🔬🪢💠>)
// Good: Basic typed API (functional but limited)
function query(table: string, filter: object): Promise<any[]>
// Great: Type-safe DSL with compile-time validation
const users = await db
  .from(tables.users)
  .where(u => u.age.gt(18))
  .select(u => ({ name: u.name, email: u.email }));
// SQL injection impossible, r