Add a skill for profiling and optimization
This commit is contained in:
parent
6d96354652
commit
57e8b815be
1 changed files with 120 additions and 0 deletions
120
.claude/skills/performance-optimization/SKILL.md
Normal file
@@ -0,0 +1,120 @@
---
name: performance-optimization-expert
description: Scientific performance optimization workflow for backend and full-stack systems using profiling-first investigation, bottleneck isolation, targeted fixes, and before/after verification. Use when requests involve slow endpoints, high p95/p99 latency, low throughput, CPU saturation, memory growth, I/O stalls, DB query slowness, lock/contention issues, or explicit requests to benchmark/profile/optimize code.
---

# Performance Optimization Expert

Use this workflow on the changed path first, then expand scope only if evidence requires it.

## Operating Rules

1. Measure before optimizing.
2. Optimize the biggest bottleneck first (Amdahl's Law).
3. Verify that "slow" code is on the hot path.
4. State tradeoffs (complexity, readability, risk, correctness).
5. Report concrete deltas (p95, throughput, memory, CPU, query time).
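
Rule 2 can be made concrete with a small calculation. The sketch below applies Amdahl's Law to a hypothetical request; the 80% / 5% time split and the function name `amdahl_speedup` are illustrative assumptions, not measurements from any real system.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of total time becomes `local_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    # Amdahl's Law: the untouched part (1 - fraction) still costs full time.
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Hypothetical profile: 80% of request time in DB queries, 5% in JSON serialization.
db_fix = amdahl_speedup(0.80, 4.0)     # make queries 4x faster
json_fix = amdahl_speedup(0.05, 10.0)  # make serialization 10x faster
print(f"DB fix: {db_fix:.2f}x overall, JSON fix: {json_fix:.2f}x overall")
```

Under these assumed numbers, a modest 4x gain on the dominant cost yields about 2.5x overall, while a 10x gain on the minor cost yields about 1.05x.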

## Workflow

### 1) Define the performance problem

Capture:
- Slow operation/endpoint/function.
- Current numbers (or explicit "not measured yet").
- Trigger conditions (data size, concurrency, environment).
- Target/SLO.

If current numbers are unknown but measurable, gather measurements first. Do not guess.

### 2) Build baseline

Collect:
- Latency p50/p95/p99.
- Throughput (req/s or ops/s).
- CPU, RSS/heap, I/O wait, network.
- Error/timeout rate.

If direct measurement is unavailable, estimate complexity and I/O/memory behavior from code and mark it as estimated.
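
For an in-process operation, the latency and throughput part of a baseline might be collected as sketched below. The helper name `measure_baseline`, the run counts, and the sample workload are assumptions for illustration; a real endpoint would normally be measured with a load-testing tool against a realistic dataset.

```python
import statistics
import time
from typing import Callable


def measure_baseline(operation: Callable[[], object], runs: int = 200, warmup: int = 20) -> dict:
    """Time `operation` repeatedly and summarize latency percentiles and throughput."""
    for _ in range(warmup):  # warm caches before recording samples
        operation()
    samples_ms = []
    started = time.perf_counter()
    for _ in range(runs):
        t0 = time.perf_counter()
        operation()
        samples_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed = time.perf_counter() - started
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "throughput_ops_s": runs / elapsed,
        "runs": runs,
    }


# Hypothetical workload standing in for the slow operation.
baseline = measure_baseline(lambda: sorted(range(10_000), reverse=True), runs=100, warmup=10)
print(baseline)
```

Keeping the returned dictionary alongside the dataset and run count makes it possible to repeat exactly the same measurement after a fix.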

### 3) Classify bottleneck

Classify one or more:
- CPU-bound.
- I/O-bound (DB/network/disk).
- Memory-bound.
- Contention-bound (locks/pools/starvation/rate limits).
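
One rough first cut at classification is to compare CPU time with wall-clock time for the same work. The sketch below is a heuristic under stated assumptions: `cpu_fraction` is a hypothetical helper, `time.process_time()` counts CPU time for the whole process and excludes sleeping, and a low ratio only says the code is waiting, not whether it waits on I/O or on a lock.

```python
import time


def cpu_fraction(operation, repeats: int = 5) -> float:
    """Return CPU time divided by wall time for `repeats` calls of `operation`."""
    wall0 = time.perf_counter()
    cpu0 = time.process_time()
    for _ in range(repeats):
        operation()
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    return cpu / wall if wall > 0 else 0.0


def busy_work():
    # Stand-in for CPU-bound code: pure computation.
    return sum(i * i for i in range(300_000))


def waiting_work():
    # Stand-in for I/O-bound or contention-bound code: mostly waiting.
    time.sleep(0.03)


print(f"busy:    {cpu_fraction(busy_work):.2f}")     # a ratio near 1 suggests CPU-bound
print(f"waiting: {cpu_fraction(waiting_work):.2f}")  # a ratio near 0 suggests waiting
```

A ratio near 0 should be followed up with traces or query plans to tell I/O waits from contention.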

### 4) Locate root cause

Trace hot path and name exact files/functions/queries causing most cost.
Prefer profiler traces, flamegraphs, or query plans over intuition.
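
As a minimal sketch of locating the hot path with the standard-library profiler, the snippet below wraps one call in `cProfile` and prints the functions with the highest cumulative time. `profile_top`, `handle_request`, `find_duplicates`, and `checksum` are hypothetical names invented for this example.

```python
import cProfile
import io
import pstats


def profile_top(func, *args, limit: int = 10, **kwargs):
    """Run `func` under cProfile and return (result, text report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return result, buffer.getvalue()


def checksum(rows):
    # Cheap linear pass.
    return sum(r % 97 for r in rows)


def find_duplicates(rows):
    # Hypothetical hot spot: slicing plus membership test makes this quadratic.
    return [r for i, r in enumerate(rows) if r in rows[:i]]


def handle_request(n: int):
    rows = [i % (n // 2) for i in range(n)]
    return len(find_duplicates(rows)), checksum(rows)


result, report = profile_top(handle_request, 2000)
print(report)
```

In the report, `find_duplicates` should account for nearly all of the cumulative time of `handle_request`, which is the kind of evidence this step asks for before any fix is chosen.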

### 5) Apply targeted fix

Pick highest impact-to-effort first:
- Algorithm/data structure improvement.
- Query/index/N+1 reduction.
- Caching with explicit invalidation and TTL.
- Async/batching/streaming for I/O.
- Allocation/copy reduction and memory-safe iteration.
- Concurrency model adjustment (pool sizing, lock granularity, asyncio vs multiprocessing).

Avoid speculative micro-optimizations unless hot path evidence supports them.
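
To illustrate one item from the list, caching with explicit invalidation and TTL, here is a minimal in-process sketch. The class name `TTLCache` and its methods are hypothetical, it is not thread-safe, and it has no size bound; a production cache would need to address both.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny cache where entries expire after `ttl_seconds` or on explicit invalidation."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive load
        value = loader()
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        # Call this on every write path that changes the underlying data.
        self._entries.pop(key, None)


cache = TTLCache(ttl_seconds=30.0)
profile = cache.get_or_load("user:42", lambda: {"id": 42, "name": "example"})
cache.invalidate("user:42")  # e.g. after the user record is updated
```

The TTL bounds how long a missed invalidation can serve stale data, which is why the list above pairs the two.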

### 6) Verify and guard

Measure with the same method and dataset as baseline.
Report before/after metrics and check:
- Correctness.
- Regression on related paths.
- Resource side effects (memory, CPU, error rate).
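
A before/after check might look like the sketch below, which runs both versions on the same input, refuses to report a speedup if the results differ, and times each with the same method. `verify_fix`, `best_of`, and the two dedupe functions are hypothetical examples, and best-of-N timing is only one reasonable choice.

```python
import time


def best_of(func, data, repeats: int = 5) -> float:
    """Return the fastest of `repeats` timed calls, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - t0)
    return best


def verify_fix(before, after, data, repeats: int = 5) -> dict:
    """Check that both versions agree on `data`, then compare timings."""
    if before(data) != after(data):
        raise AssertionError("optimized version changes the result")
    t_before = best_of(before, data, repeats)
    t_after = best_of(after, data, repeats)
    speedup = t_before / t_after if t_after > 0 else float("inf")
    return {"before_s": t_before, "after_s": t_after, "speedup": speedup}


def dedupe_quadratic(items):
    seen = []
    for item in items:
        if item not in seen:  # linear scan per item
            seen.append(item)
    return seen


def dedupe_linear(items):
    return list(dict.fromkeys(items))  # preserves first-seen order


dataset = list(range(1500)) * 2  # same dataset for both measurements
print(verify_fix(dedupe_quadratic, dedupe_linear, dataset))
```

This covers correctness and latency on one path only; regressions on related paths and resource side effects still need their own checks.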

## Tool Selection

Use tools that match the stack:
- Python CPU: `cProfile`, `py-spy`, `line_profiler`, flamegraphs.
- Python memory: `tracemalloc`, `memory_profiler`, `scalene`.
- DB: `EXPLAIN ANALYZE`, query logs, ORM query counting.
- FastAPI/async: event-loop blocking checks, sync I/O detection.
- Distributed systems: tracing/APM spans for cross-service latency.
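
As one example from the Python memory row, the sketch below uses `tracemalloc` to compare peak traced allocations of two equivalent functions. `peak_memory_kib` and the two workloads are hypothetical, and `tracemalloc` only sees allocations made through Python's allocator, so the numbers are not the same as process RSS.

```python
import tracemalloc


def peak_memory_kib(func, *args, **kwargs):
    """Run `func` and return (result, peak traced memory in KiB during the call)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return result, peak / 1024.0


def total_with_list(n: int) -> int:
    return sum([i * i for i in range(n)])  # materializes every value first


def total_with_generator(n: int) -> int:
    return sum(i * i for i in range(n))  # holds one value at a time


list_total, list_peak = peak_memory_kib(total_with_list, 100_000)
gen_total, gen_peak = peak_memory_kib(total_with_generator, 100_000)
print(f"list: {list_peak:.0f} KiB peak, generator: {gen_peak:.0f} KiB peak")
```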

## Anti-Patterns to Flag

- Premature optimization without baseline data.
- Micro-optimizing non-hot code while major I/O bottlenecks remain.
- Complex caching without robust invalidation.
- Ignoring database query plans and index strategy.
- Blocking synchronous calls inside async request handlers.
- Loading full datasets when pagination/streaming is sufficient.
- Recomputing expensive values instead of reusing or memoizing them.
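
The async anti-pattern in this list is easy to demonstrate. In the sketch below, `blocking_io` stands in for any synchronous driver or HTTP call; `bad_handler` and `good_handler` are hypothetical handlers, and `asyncio.to_thread` (Python 3.9+) is one possible remedy, with a native async client being another.

```python
import asyncio
import time


def blocking_io(delay: float = 0.05) -> str:
    time.sleep(delay)  # stand-in for a synchronous DB or HTTP call
    return "done"


async def bad_handler() -> str:
    # Anti-pattern: the sync call blocks the event loop for every other request.
    return blocking_io()


async def good_handler() -> str:
    # The sync call runs in a worker thread, so the event loop stays responsive.
    return await asyncio.to_thread(blocking_io)


async def time_concurrent(handler, count: int = 5) -> float:
    """Run `count` handlers concurrently and return the total wall time in seconds."""
    t0 = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(count)))
    return time.perf_counter() - t0


async def main():
    blocked = await time_concurrent(bad_handler)
    offloaded = await time_concurrent(good_handler)
    return blocked, offloaded


if __name__ == "__main__":
    blocked, offloaded = asyncio.run(main())
    print(f"blocking: {blocked:.2f}s, offloaded: {offloaded:.2f}s")
```

With the blocking handler the five calls run one after another, so total time grows with the number of concurrent requests; with the offloaded handler they overlap.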

## Required Response Format

Always structure findings as:

```markdown
## Performance Analysis

### Current State
- Metric: ...
- Target: ...
- Gap: ...

### Bottleneck Identified
...

### Root Cause
...

### Recommended Fix
...

### Implementation
...

### Verification Plan
...
```

If profiling data is missing, explicitly say so and make "gather profile/baseline" the first action.