Add a skill for profiling and optimization
This commit is contained in:
parent 17286e6f60
commit c219b95dbd
1 changed file with 120 additions and 0 deletions

.claude/skills/performance-optimization/SKILL.md (new file, 120 lines)

@@ -0,0 +1,120 @@
---
name: performance-optimization-expert
description: Scientific performance optimization workflow for backend and full-stack systems using profiling-first investigation, bottleneck isolation, targeted fixes, and before/after verification. Use when requests involve slow endpoints, high p95/p99 latency, low throughput, CPU saturation, memory growth, I/O stalls, DB query slowness, lock/contention issues, or explicit requests to benchmark/profile/optimize code.
---

# Performance Optimization Expert

Use this workflow on the changed path first, then expand scope only if the evidence requires it.

## Operating Rules

1. Measure before optimizing.
2. Optimize the biggest bottleneck first (Amdahl's Law).
3. Verify that "slow" code is on the hot path.
4. State tradeoffs (complexity, readability, risk, correctness).
5. Report concrete deltas (p95, throughput, memory, CPU, query time).

## Workflow

### 1) Define the performance problem

Capture:

- The slow operation/endpoint/function.
- Current numbers (or an explicit "not measured yet").
- Trigger conditions (data size, concurrency, environment).
- The target/SLO.

If the numbers are unknown but measurable, gather them first. Do not guess.

### 2) Build baseline

Collect:

- Latency p50/p95/p99.
- Throughput (req/s or ops/s).
- CPU, RSS/heap, I/O wait, network.
- Error/timeout rate.

If direct measurement is unavailable, estimate complexity and I/O/memory behavior from the code and mark it as estimated.
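The latency percentiles above can be gathered with a small stdlib harness before touching any code. A minimal sketch; `slow_operation` is a hypothetical stand-in for the function under test:

```python
import time
import statistics

def slow_operation():
    # Hypothetical stand-in for the endpoint/function being measured.
    total = 0
    for i in range(10_000):
        total += i * i
    return total

def baseline(fn, runs=200):
    """Run fn repeatedly and report p50/p95/p99 latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    q = statistics.quantiles(samples, n=100)  # 99 cut points: q[49]=p50, q[94]=p95, q[98]=p99
    return {"p50": q[49], "p95": q[94], "p99": q[98]}

metrics = baseline(slow_operation)
print({k: round(v, 3) for k, v in metrics.items()})
```

Record the run count and dataset alongside the numbers so step 6 can repeat the exact same measurement.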
### 3) Classify bottleneck

Classify one or more:

- CPU-bound.
- I/O-bound (DB/network/disk).
- Memory-bound.
- Contention-bound (locks/pools/starvation/rate limits).
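One quick classification signal is the ratio of process CPU time to wall time: near 1.0 suggests CPU-bound, near 0 suggests the code is mostly waiting on I/O or locks. A rough sketch (the 0.7 threshold is an illustrative heuristic, not a standard):

```python
import time

def classify(fn):
    """Rough single-call signal: CPU-bound vs I/O/wait-bound."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    ratio = cpu / wall if wall > 0 else 0.0
    return "cpu-bound" if ratio > 0.7 else "io/wait-bound"

print(classify(lambda: sum(i * i for i in range(2_000_000))))  # busy loop
print(classify(lambda: time.sleep(0.2)))                       # pure waiting
```

This only separates the first two categories; memory and contention issues need the profilers listed under Tool Selection.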
### 4) Locate root cause

Trace the hot path and name the exact files, functions, or queries responsible for most of the cost.
Prefer profiler traces, flamegraphs, or query plans over intuition.
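With stdlib `cProfile`, "name the exact functions" falls out of sorting by cumulative time. A sketch; `handle_request` and `parse_rows` are hypothetical names standing in for the real entry point and its callees:

```python
import cProfile
import io
import pstats

def parse_rows(rows):
    # Hypothetical hot helper on the request path.
    return [int(r) * 2 for r in rows]

def handle_request():
    # Hypothetical entry point being investigated.
    rows = [str(i) for i in range(50_000)]
    return sum(parse_rows(rows))

profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

out = io.StringIO()
# Top entries print file:line(function) for the costliest call paths.
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

For code you cannot instrument, `py-spy dump`/`py-spy record` gives the same view by sampling a running process.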
### 5) Apply targeted fix

Pick the highest impact-to-effort option first:

- Algorithm/data structure improvement.
- Query/index/N+1 reduction.
- Caching with explicit invalidation and TTL.
- Async/batching/streaming for I/O.
- Allocation/copy reduction and memory-safe iteration.
- Concurrency model adjustment (pool sizing, lock granularity, asyncio vs multiprocessing).

Avoid speculative micro-optimizations unless hot-path evidence supports them.
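For the caching option, "explicit invalidation and TTL" can be as small as the sketch below; `load_user` is a hypothetical expensive loader:

```python
import time

class TTLCache:
    """Minimal TTL cache with explicit per-key invalidation."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def get(self, key, loader):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and entry[1] > now:
            return entry[0]                       # fresh hit
        value = loader(key)                       # miss or expired: recompute
        self._store[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        self._store.pop(key, None)                # call on every write to key

calls = 0
def load_user(user_id):
    # Hypothetical expensive loader (DB query, RPC, ...).
    global calls
    calls += 1
    return {"id": user_id}

cache = TTLCache(ttl_seconds=60)
cache.get("u1", load_user)
cache.get("u1", load_user)   # served from cache; loader ran once
cache.invalidate("u1")       # e.g. after an update to u1
cache.get("u1", load_user)   # recomputed
print(calls)                 # -> 2
```

The invalidation hook is the point of the tradeoff statement in the Operating Rules: without it, the cache trades latency for staleness bugs.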
### 6) Verify and guard

Measure with the same method and dataset as the baseline.
Report before/after metrics and check:

- Correctness.
- Regression on related paths.
- Resource side effects (memory, CPU, error rate).
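The verify step can be encoded as a guard: re-run the same harness on both versions and assert the delta, plus a correctness check. A sketch with an illustrative before/after pair (list scan vs set lookup):

```python
import time

def timed(fn, runs=50):
    """Average wall time per call, same method for both versions."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

data = list(range(5_000))
def before():
    return 4_999 in data        # O(n) membership scan

data_set = set(data)
def after():
    return 4_999 in data_set    # O(1) membership check

baseline_s = timed(before)
optimized_s = timed(after)
assert after() == before(), "optimization changed behavior"
assert optimized_s < baseline_s, "optimization regressed"
print(f"speedup: {baseline_s / optimized_s:.1f}x")
```

Keeping such an assertion in the test suite turns the one-off measurement into a standing regression guard.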
## Tool Selection

Use tools that match the stack:

- Python CPU: `cProfile`, `py-spy`, `line_profiler`, flamegraphs.
- Python memory: `tracemalloc`, `memory_profiler`, `scalene`.
- DB: `EXPLAIN ANALYZE`, query logs, ORM query counting.
- FastAPI/async: event-loop blocking checks, sync I/O detection.
- Distributed systems: tracing/APM spans for cross-service latency.
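Of the memory tools listed, `tracemalloc` ships with the stdlib; a minimal sketch that attributes live allocations to source lines (`build_report` is a hypothetical allocation-heavy path):

```python
import tracemalloc

def build_report():
    # Hypothetical allocation-heavy path under investigation.
    return [str(i) * 10 for i in range(100_000)]

tracemalloc.start()
report = build_report()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

top = snapshot.statistics("lineno")[:3]
for stat in top:
    print(stat)  # file:line with size/count of allocations still alive
```

Taking two snapshots and calling `snapshot.compare_to(earlier, "lineno")` isolates growth between two points, which is the usual way to chase a leak.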
## Anti-Patterns to Flag

- Premature optimization without baseline data.
- Micro-optimizing non-hot code while major I/O bottlenecks remain.
- Complex caching without robust invalidation.
- Ignoring database query plans and index strategy.
- Blocking synchronous calls inside async request handlers.
- Loading full datasets when pagination/streaming is sufficient.
- Recomputing expensive values instead of reuse/memoization.
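The "blocking synchronous calls inside async handlers" anti-pattern, and its standard fix, sketched with plain `asyncio` (the handler and fetch names are hypothetical):

```python
import asyncio
import time

def fetch_report_sync():
    time.sleep(0.2)  # stands in for a blocking HTTP/DB call
    return "report"

async def handler_bad():
    # Anti-pattern: blocks the event loop for 0.2 s per request.
    return fetch_report_sync()

async def handler_good():
    # Fix: off-load the blocking call so other coroutines keep running.
    return await asyncio.to_thread(fetch_report_sync)

async def main():
    start = time.perf_counter()
    await asyncio.gather(handler_good(), handler_good(), handler_good())
    good = time.perf_counter() - start  # ~0.2 s: calls overlap

    start = time.perf_counter()
    await asyncio.gather(handler_bad(), handler_bad(), handler_bad())
    bad = time.perf_counter() - start   # ~0.6 s: calls serialize
    print(f"blocking: {bad:.2f}s, off-loaded: {good:.2f}s")
    return bad, good

bad, good = asyncio.run(main())
```

In frameworks like FastAPI the same symptom shows up as every concurrent request queueing behind one handler's sync I/O.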
## Required Response Format

Always structure findings as:

```markdown
## Performance Analysis

### Current State
- Metric: ...
- Target: ...
- Gap: ...

### Bottleneck Identified
...

### Root Cause
...

### Recommended Fix
...

### Implementation
...

### Verification Plan
...
```

If profiling data is missing, explicitly say so and make "gather profile/baseline" the first action.