Add a skill for profiling and optimization

This commit is contained in:
Ohad Livne 2026-02-12 08:42:45 +02:00
parent 2c3014dcc1
commit bc162d3c4e
Signed by: libohad-dev
GPG key ID: 34FDC68B51191A4D

---
name: performance-optimization-expert
description: Scientific performance optimization workflow for backend and full-stack systems using profiling-first investigation, bottleneck isolation, targeted fixes, and before/after verification. Use when requests involve slow endpoints, high p95/p99 latency, low throughput, CPU saturation, memory growth, I/O stalls, DB query slowness, lock/contention issues, or explicit requests to benchmark/profile/optimize code.
---
# Performance Optimization Expert
Use this workflow on the changed path first, then expand scope only if evidence requires it.
## Operating Rules
1. Measure before optimizing.
2. Optimize the biggest bottleneck first (Amdahl's Law).
3. Verify that "slow" code is on the hot path.
4. State tradeoffs (complexity, readability, risk, correctness).
5. Report concrete deltas (p95, throughput, memory, CPU, query time).
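Rule 2 can be made concrete: Amdahl's Law bounds the overall speedup you can get from optimizing one fraction of the runtime. A minimal sketch (the helper name is illustrative):

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of total runtime is sped up by `local_speedup`x."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# Optimizing 10% of runtime by 100x barely moves the needle:
print(round(amdahl_speedup(0.10, 100.0), 3))  # 1.11
# Halving the 80% bottleneck helps far more:
print(round(amdahl_speedup(0.80, 2.0), 3))    # 1.667
```

This is why the workflow insists on finding the biggest bottleneck before writing any fix.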
## Workflow
### 1) Define the performance problem
Capture:
- Slow operation/endpoint/function.
- Current numbers (or explicit "not measured yet").
- Trigger conditions (data size, concurrency, environment).
- Target/SLO.
If the numbers are unknown but measurable, gather measurements first. Do not guess.
### 2) Build baseline
Collect:
- Latency p50/p95/p99.
- Throughput (req/s or ops/s).
- CPU, RSS/heap, I/O wait, network.
- Error/timeout rate.
If direct measurement is unavailable, estimate complexity and I/O/memory behavior from code and mark it as estimated.
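The latency side of this baseline can be collected with the standard library alone; a minimal sketch, where `op` stands in for the real endpoint or function under test:

```python
import statistics
import time

def measure_latency(op, runs: int = 200) -> dict:
    """Time `op` repeatedly and summarize p50/p95/p99 in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        op()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points: index 49 is p50, 94 is p95, 98 is p99.
    q = statistics.quantiles(samples, n=100)
    return {"p50_ms": q[49], "p95_ms": q[94], "p99_ms": q[98]}

baseline = measure_latency(lambda: sum(i * i for i in range(10_000)))
```

Record the dataset and environment alongside the numbers so the verification step can repeat the exact same measurement.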
### 3) Classify bottleneck
Classify the bottleneck as one or more of:
- CPU-bound.
- I/O-bound (DB/network/disk).
- Memory-bound.
- Contention-bound (locks/pools/starvation/rate limits).
### 4) Locate root cause
Trace the hot path and name the exact files, functions, or queries causing the most cost.
Prefer profiler traces, flamegraphs, or query plans over intuition.
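For Python, `cProfile` (listed under Tool Selection below) gives the function-level attribution this step needs. A minimal sketch, with `workload` standing in for the suspected hot path:

```python
import cProfile
import io
import pstats

def workload():
    # Stand-in hot path: string concatenation in a loop dominates.
    out = ""
    for i in range(2_000):
        out += str(i)
    return out

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()  # names the exact functions costing the most time
```

The report names files, line numbers, and call counts, which is exactly the evidence this step requires instead of intuition.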
### 5) Apply targeted fix
Pick the option with the highest impact-to-effort ratio first:
- Algorithm/data structure improvement.
- Query/index/N+1 reduction.
- Caching with explicit invalidation and TTL.
- Async/batching/streaming for I/O.
- Allocation/copy reduction and memory-safe iteration.
- Concurrency model adjustment (pool sizing, lock granularity, asyncio vs multiprocessing).
Avoid speculative micro-optimizations unless hot path evidence supports them.
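For the caching option, "explicit invalidation and TTL" can be as small as this sketch (a real service would likely reach for Redis or `cachetools`; the class and key names here are illustrative):

```python
import time

class TTLCache:
    """Dict-backed cache with per-entry expiry and explicit invalidation."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy expiry on read
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def invalidate(self, key):
        # Explicit invalidation: call this on every write to the underlying data.
        self._store.pop(key, None)

cache = TTLCache(ttl_seconds=30.0)
cache.set("user:42", {"name": "Ada"})
```

The `invalidate` hook is the point: caching without a defined invalidation path is flagged as an anti-pattern below.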
### 6) Verify and guard
Measure with the same method and dataset as baseline.
Report before/after metrics and check:
- Correctness.
- Regression on related paths.
- Resource side effects (memory, CPU, error rate).
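The before/after comparison can be automated so the same harness also guards against regressions; a sketch, assuming both metric dicts were collected with the same method (metric names and numbers here are illustrative):

```python
def report_delta(before: dict, after: dict, budget: dict) -> list[str]:
    """Print before/after deltas and return the metrics that miss their budget."""
    failures = []
    for metric, limit in budget.items():
        delta = after[metric] - before[metric]
        print(f"{metric}: {before[metric]:.1f} -> {after[metric]:.1f} ({delta:+.1f})")
        if after[metric] > limit:
            failures.append(metric)
    return failures

before = {"p95_ms": 840.0, "rss_mb": 512.0}
after = {"p95_ms": 190.0, "rss_mb": 530.0}
failed = report_delta(before, after, budget={"p95_ms": 250.0, "rss_mb": 600.0})
```

Checking a budget for memory alongside latency catches the resource side effects called out above, such as a fix that trades CPU time for unbounded RSS growth.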
## Tool Selection
Use tools that match the stack:
- Python CPU: `cProfile`, `py-spy`, `line_profiler`, flamegraphs.
- Python memory: `tracemalloc`, `memory_profiler`, `scalene`.
- DB: `EXPLAIN ANALYZE`, query logs, ORM query counting.
- FastAPI/async: event-loop blocking checks, sync I/O detection.
- Distributed systems: tracing/APM spans for cross-service latency.
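Of the Python memory tools above, `tracemalloc` ships with the standard library and attributes live allocations to source lines; a minimal sketch:

```python
import tracemalloc

tracemalloc.start()
leaky = [bytes(1024) for _ in range(1_000)]  # stand-in for suspected growth
snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics("lineno")[:3]  # biggest allocation sites first
for stat in top:
    print(stat)  # file:line plus size and count of live allocations
tracemalloc.stop()
```

`py-spy` and `scalene` work on unmodified running processes, which makes them better fits when the slow code cannot be instrumented in place.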
## Anti-Patterns to Flag
- Premature optimization without baseline data.
- Micro-optimizing non-hot code while major I/O bottlenecks remain.
- Complex caching without robust invalidation.
- Ignoring database query plans and index strategy.
- Blocking synchronous calls inside async request handlers.
- Loading full datasets when pagination/streaming is sufficient.
- Recomputing expensive values instead of reuse/memoization.
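The blocking-call anti-pattern and its fix can be sketched with a stand-in for the sync I/O (`time.sleep` here plays the role of a synchronous DB or HTTP call):

```python
import asyncio
import time

def blocking_io():
    # Stand-in for a sync DB/HTTP call; time.sleep blocks the whole event loop.
    time.sleep(0.05)
    return "row"

async def handler_bad():
    return blocking_io()  # anti-pattern: stalls every other in-flight request

async def handler_good():
    # Fix: push the sync call onto a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_io)

async def main():
    # Two good handlers overlap, so total wall time stays near one call's cost.
    start = time.perf_counter()
    results = await asyncio.gather(handler_good(), handler_good())
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
```

Under load the difference compounds: every blocked handler delays all concurrent requests, which is why this pattern dominates p99 latency in async services.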
## Required Response Format
Always structure findings as:
```markdown
## Performance Analysis
### Current State
- Metric: ...
- Target: ...
- Gap: ...
### Bottleneck Identified
...
### Root Cause
...
### Recommended Fix
...
### Implementation
...
### Verification Plan
...
```
If profiling data is missing, explicitly say so and make "gather profile/baseline" the first action.