| name | description |
|---|---|
| performance-optimization-expert | Scientific performance optimization workflow for backend and full-stack systems using profiling-first investigation, bottleneck isolation, targeted fixes, and before/after verification. Use when requests involve slow endpoints, high p95/p99 latency, low throughput, CPU saturation, memory growth, I/O stalls, DB query slowness, lock/contention issues, or explicit requests to benchmark/profile/optimize code. |
# Performance Optimization Expert
Use this workflow on the changed path first, then expand scope only if evidence requires it.
## Operating Rules
- Measure before optimizing.
- Optimize the biggest bottleneck first (Amdahl's Law).
- Verify that "slow" code is on the hot path.
- State tradeoffs (complexity, readability, risk, correctness).
- Report concrete deltas (p95, throughput, memory, CPU, query time).
## Workflow
### 1) Define the performance problem
Capture:
- Slow operation/endpoint/function.
- Current numbers (or explicit "not measured yet").
- Trigger conditions (data size, concurrency, environment).
- Target/SLO.
If unknown and measurable, gather measurements first. Do not guess.
### 2) Build baseline
Collect:
- Latency p50/p95/p99.
- Throughput (req/s or ops/s).
- CPU, RSS/heap, I/O wait, network.
- Error/timeout rate.
If direct measurement is unavailable, estimate complexity and I/O/memory behavior from code and mark it as estimated.
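When no APM or load-test tooling is wired up yet, a baseline can be gathered with a minimal timing harness. The sketch below (the `measure_latency` name and the `sum(range(...))` workload are illustrative stand-ins, not part of this workflow's tooling) collects repeated samples and reports p50/p95/p99 in milliseconds:

```python
import statistics
import time

def measure_latency(fn, runs=200):
    """Call fn repeatedly and report latency percentiles in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    # quantiles(n=100) yields 99 cut points; index 49/94/98 = p50/p95/p99
    cuts = statistics.quantiles(samples, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

# Stand-in workload; replace with the real operation under investigation.
baseline = measure_latency(lambda: sum(range(10_000)))
```

Reuse the same harness, runs count, and dataset in step 6 so the before/after comparison is like-for-like.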
### 3) Classify bottleneck
Classify one or more:
- CPU-bound.
- I/O-bound (DB/network/disk).
- Memory-bound.
- Contention-bound (locks/pools/starvation/rate limits).
### 4) Locate root cause
Trace hot path and name exact files/functions/queries causing most cost. Prefer profiler traces, flamegraphs, or query plans over intuition.
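For Python code paths, a quick way to name the exact functions carrying the cost is the standard-library profiler. A minimal sketch (the `handler`/`hot_function` names are hypothetical examples, not a real codebase):

```python
import cProfile
import io
import pstats

def hot_function():
    # Dominates the handler's cost in this toy example.
    return sum(i * i for i in range(50_000))

def handler():
    hot_function()
    return "ok"

profiler = cProfile.Profile()
profiler.enable()
handler()
profiler.disable()

# Sort by cumulative time so the hot path surfaces at the top.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
```

The report names files, line numbers, and functions ranked by cumulative time, which is exactly the evidence this step asks for.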
### 5) Apply targeted fix
Pick highest impact-to-effort first:
- Algorithm/data structure improvement.
- Query/index/N+1 reduction.
- Caching with explicit invalidation and TTL.
- Async/batching/streaming for I/O.
- Allocation/copy reduction and memory-safe iteration.
- Concurrency model adjustment (pool sizing, lock granularity, asyncio vs multiprocessing).
Avoid speculative micro-optimizations unless hot path evidence supports them.
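As one example from the list above, "caching with explicit invalidation and TTL" can be sketched in a few lines. This is an illustrative in-process sketch (the `TTLCache` class is invented for this example; production systems usually reach for Redis or `functools.lru_cache` plus a TTL layer):

```python
import time

class TTLCache:
    """Minimal cache sketch: time-based expiry plus explicit invalidation."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: drop and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic())

    def invalidate(self, key):
        # Explicit invalidation hook for writes that change the cached value.
        self._store.pop(key, None)

cache = TTLCache(ttl_seconds=60)
cache.set("user:1", "profile-data")
hit = cache.get("user:1")
cache.invalidate("user:1")
miss = cache.get("user:1")
```

The `invalidate` call on every write path is the part that makes this safe; TTL alone only bounds staleness, it does not prevent it.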
### 6) Verify and guard
Measure with the same method and dataset as baseline. Report before/after metrics and check:
- Correctness.
- Regression on related paths.
- Resource side effects (memory, CPU, error rate).
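Reporting the delta can be as simple as a percentage change per metric; negative is an improvement for latency-style metrics. A sketch with hypothetical numbers (the `report_delta` helper and the metric values are made up for illustration):

```python
def report_delta(before, after):
    """Percentage change per metric between baseline and post-fix runs."""
    return {
        metric: round((after[metric] - before[metric]) / before[metric] * 100, 1)
        for metric in before
    }

# Hypothetical baseline vs. post-fix measurements.
delta = report_delta(
    {"p95_ms": 420.0, "rss_mb": 512.0},
    {"p95_ms": 180.0, "rss_mb": 540.0},
)
```

Note the second metric: p95 improved, but RSS grew, which is exactly the kind of resource side effect this step exists to catch.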
## Tool Selection
Use tools that match the stack:
- Python CPU: `cProfile`, `py-spy`, `line_profiler`, flamegraphs.
- Python memory: `tracemalloc`, `memory_profiler`, `scalene`.
- DB: `EXPLAIN ANALYZE`, query logs, ORM query counting.
- FastAPI/async: event-loop blocking checks, sync I/O detection.
- Distributed systems: tracing/APM spans for cross-service latency.
## Anti-Patterns to Flag
- Premature optimization without baseline data.
- Micro-optimizing non-hot code while major I/O bottlenecks remain.
- Complex caching without robust invalidation.
- Ignoring database query plans and index strategy.
- Blocking synchronous calls inside async request handlers.
- Loading full datasets when pagination/streaming is sufficient.
- Recomputing expensive values instead of reuse/memoization.
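The "blocking synchronous calls inside async request handlers" anti-pattern is worth showing side by side with its fix, since it is invisible in a single-request test and only hurts under concurrency. A minimal sketch (the handler names and the `time.sleep` stand-in for a sync DB/HTTP call are illustrative):

```python
import asyncio
import time

def blocking_io():
    # Stand-in for a synchronous DB or HTTP call.
    time.sleep(0.05)
    return "result"

async def bad_handler():
    # Anti-pattern: blocks the event loop for the full call duration,
    # stalling every other coroutine on the same loop.
    return blocking_io()

async def good_handler():
    # Fix: offload the blocking call to a thread so the loop stays free.
    return await asyncio.to_thread(blocking_io)

async def main():
    # With good_handler, concurrent requests overlap instead of serializing.
    return await asyncio.gather(good_handler(), good_handler())

results = asyncio.run(main())
```

Under `bad_handler`, N concurrent requests take roughly N times the call duration; under `good_handler`, they overlap. That gap is what the event-loop blocking checks in Tool Selection are meant to detect.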
## Required Response Format
Always structure findings as:
## Performance Analysis
### Current State
- Metric: ...
- Target: ...
- Gap: ...
### Bottleneck Identified
...
### Root Cause
...
### Recommended Fix
...
### Implementation
...
### Verification Plan
...
If profiling data is missing, explicitly say so and make "gather profile/baseline" the first action.