Add a skill for profiling and optimization

2026-02-12 08:42:45 +02:00 · 2026-02-12 08:42:45 +02:00 · 57e8b815be
commit 57e8b815be
parent 6d96354652
1 changed files with 120 additions and 0 deletions
--- a/.claude/skills/performance-optimization/SKILL.md
+++ b/.claude/skills/performance-optimization/SKILL.md
@ -0,0 +1,120 @@
 ---
 name: performance-optimization-expert
 description: Scientific performance optimization workflow for backend and full-stack systems using profiling-first investigation, bottleneck isolation, targeted fixes, and before/after verification. Use when requests involve slow endpoints, high p95/p99 latency, low throughput, CPU saturation, memory growth, I/O stalls, DB query slowness, lock/contention issues, or explicit requests to benchmark/profile/optimize code.
 ---
 # Performance Optimization Expert
 Use this workflow on the changed path first, then expand scope only if evidence requires it.
 ## Operating Rules
 1. Measure before optimizing.
 2. Optimize the biggest bottleneck first (Amdahl's Law).
 3. Verify that "slow" code is on the hot path.
 4. State tradeoffs (complexity, readability, risk, correctness).
 5. Report concrete deltas (p95, throughput, memory, CPU, query time).
 ## Workflow
 ### 1) Define the performance problem
 Capture:
 - Slow operation/endpoint/function.
 - Current numbers (or explicit "not measured yet").
 - Trigger conditions (data size, concurrency, environment).
 - Target/SLO.
 If unknown and measurable, gather measurements first. Do not guess.
 ### 2) Build baseline
 Collect:
 - Latency p50/p95/p99.
 - Throughput (req/s or ops/s).
 - CPU, RSS/heap, I/O wait, network.
 - Error/timeout rate.
 If direct measurement is unavailable, estimate complexity and I/O/memory behavior from code and mark it as estimated.
 ### 3) Classify bottleneck
 Classify one or more:
 - CPU-bound.
 - I/O-bound (DB/network/disk).
 - Memory-bound.
 - Contention-bound (locks/pools/starvation/rate limits).
 ### 4) Locate root cause
 Trace hot path and name exact files/functions/queries causing most cost.
 Prefer profiler traces, flamegraphs, or query plans over intuition.
 ### 5) Apply targeted fix
 Pick highest impact-to-effort first:
 - Algorithm/data structure improvement.
 - Query/index/N+1 reduction.
 - Caching with explicit invalidation and TTL.
 - Async/batching/streaming for I/O.
 - Allocation/copy reduction and memory-safe iteration.
 - Concurrency model adjustment (pool sizing, lock granularity, asyncio vs multiprocessing).
 Avoid speculative micro-optimizations unless hot path evidence supports them.
 ### 6) Verify and guard
 Measure with the same method and dataset as baseline.
 Report before/after metrics and check:
 - Correctness.
 - Regression on related paths.
 - Resource side effects (memory/CPU error rate).
 ## Tool Selection
 Use tools that match the stack:
 - Python CPU: `cProfile`, `py-spy`, `line_profiler`, flamegraphs.
 - Python memory: `tracemalloc`, `memory_profiler`, `scalene`.
 - DB: `EXPLAIN ANALYZE`, query logs, ORM query counting.
 - FastAPI/async: event-loop blocking checks, sync I/O detection.
 - Distributed systems: tracing/APM spans for cross-service latency.
 ## Anti-Patterns to Flag
 - Premature optimization without baseline data.
 - Micro-optimizing non-hot code while major I/O bottlenecks remain.
 - Complex caching without robust invalidation.
 - Ignoring database query plans and index strategy.
 - Blocking synchronous calls inside async request handlers.
 - Loading full datasets when pagination/streaming is sufficient.
 - Recomputing expensive values instead of reuse/memoization.
 ## Required Response Format
 Always structure findings as:
 ```markdown
 ## Performance Analysis
 ### Current State
 - Metric: ...
 - Target: ...
 - Gap: ...
 ### Bottleneck Identified
 ...
 ### Root Cause
 ...
 ### Recommended Fix
 ...
 ### Implementation
 ...
 ### Verification Plan
 ...
 ```
 If profiling data is missing, explicitly say so and make "gather profile/baseline" the first action.