Add a skill for profiling and optimization

This commit is contained in:
Ohad Livne 2026-02-12 08:42:45 +02:00
parent 2c3014dcc1
commit bc162d3c4e
Signed by: libohad-dev
GPG key ID: 34FDC68B51191A4D

---
name: performance-optimization-expert
description: Scientific performance optimization workflow for backend and full-stack systems using profiling-first investigation, bottleneck isolation, targeted fixes, and before/after verification. Use when requests involve slow endpoints, high p95/p99 latency, low throughput, CPU saturation, memory growth, I/O stalls, DB query slowness, lock/contention issues, or explicit requests to benchmark/profile/optimize code.
---
# Performance Optimization Expert
Use this workflow on the changed path first, then expand scope only if evidence requires it.
## Operating Rules
1. Measure before optimizing.
2. Optimize the biggest bottleneck first (Amdahl's Law).
3. Verify that "slow" code is on the hot path.
4. State tradeoffs (complexity, readability, risk, correctness).
5. Report concrete deltas (p95, throughput, memory, CPU, query time).
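Rule 2 can be made concrete: Amdahl's Law bounds the overall speedup you can get from optimizing one fraction of the runtime. A minimal sketch (the helper name is illustrative):

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of total runtime is sped up by `local_speedup`x."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# Optimizing 10% of runtime by 100x barely moves the needle:
print(round(amdahl_speedup(0.10, 100.0), 3))  # 1.11
# Halving the 80% bottleneck helps far more:
print(round(amdahl_speedup(0.80, 2.0), 3))    # 1.667
```

This is why the workflow insists on finding the biggest bottleneck before writing any fix.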
## Workflow
### 1) Define the performance problem
Capture:
- Slow operation/endpoint/function.
- Current numbers (or explicit "not measured yet").
- Trigger conditions (data size, concurrency, environment).
- Target/SLO.
If the numbers are unknown but measurable, gather measurements first. Do not guess.
### 2) Build baseline
Collect:
- Latency p50/p95/p99.
- Throughput (req/s or ops/s).
- CPU, RSS/heap, I/O wait, network.
- Error/timeout rate.
If direct measurement is unavailable, estimate complexity and I/O/memory behavior from code and mark it as estimated.
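The latency side of this baseline can be collected with the standard library alone; a minimal sketch, where `op` stands in for the real endpoint or function under test:

```python
import statistics
import time

def measure_latency(op, runs: int = 200) -> dict:
    """Time `op` repeatedly and summarize p50/p95/p99 in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        op()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points: index 49 is p50, 94 is p95, 98 is p99.
    q = statistics.quantiles(samples, n=100)
    return {"p50_ms": q[49], "p95_ms": q[94], "p99_ms": q[98]}

baseline = measure_latency(lambda: sum(i * i for i in range(10_000)))
```

Record the dataset and environment alongside the numbers so the verification step can repeat the exact same measurement.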
### 3) Classify bottleneck
Classify the bottleneck as one or more of:
- CPU-bound.
- I/O-bound (DB/network/disk).
- Memory-bound.
- Contention-bound (locks/pools/starvation/rate limits).
### 4) Locate root cause
Trace the hot path and name the exact files, functions, or queries causing the most cost.
Prefer profiler traces, flamegraphs, or query plans over intuition.
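For Python, `cProfile` (listed under Tool Selection below) gives the function-level attribution this step needs. A minimal sketch, with `workload` standing in for the suspected hot path:

```python
import cProfile
import io
import pstats

def workload():
    # Stand-in hot path: string concatenation in a loop dominates.
    out = ""
    for i in range(2_000):
        out += str(i)
    return out

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()  # names the exact functions costing the most time
```

The report names files, line numbers, and call counts, which is exactly the evidence this step requires instead of intuition.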
### 5) Apply targeted fix
Pick the option with the highest impact-to-effort ratio first:
- Algorithm/data structure improvement.
- Query/index/N+1 reduction.
- Caching with explicit invalidation and TTL.
- Async/batching/streaming for I/O.
- Allocation/copy reduction and memory-safe iteration.
- Concurrency model adjustment (pool sizing, lock granularity, asyncio vs multiprocessing).
Avoid speculative micro-optimizations unless hot path evidence supports them.
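For the caching option, "explicit invalidation and TTL" can be as small as this sketch (a real service would likely reach for Redis or `cachetools`; the class and key names here are illustrative):

```python
import time

class TTLCache:
    """Dict-backed cache with per-entry expiry and explicit invalidation."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy expiry on read
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def invalidate(self, key):
        # Explicit invalidation: call this on every write to the underlying data.
        self._store.pop(key, None)

cache = TTLCache(ttl_seconds=30.0)
cache.set("user:42", {"name": "Ada"})
```

The `invalidate` hook is the point: caching without a defined invalidation path is flagged as an anti-pattern below.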
### 6) Verify and guard
Measure with the same method and dataset as baseline.
Report before/after metrics and check:
- Correctness.
- Regression on related paths.
- Resource side effects (memory, CPU, error rate).
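The before/after comparison can be automated so the same harness also guards against regressions; a sketch, assuming both metric dicts were collected with the same method (metric names and numbers here are illustrative):

```python
def report_delta(before: dict, after: dict, budget: dict) -> list[str]:
    """Print before/after deltas and return the metrics that miss their budget."""
    failures = []
    for metric, limit in budget.items():
        delta = after[metric] - before[metric]
        print(f"{metric}: {before[metric]:.1f} -> {after[metric]:.1f} ({delta:+.1f})")
        if after[metric] > limit:
            failures.append(metric)
    return failures

before = {"p95_ms": 840.0, "rss_mb": 512.0}
after = {"p95_ms": 190.0, "rss_mb": 530.0}
failed = report_delta(before, after, budget={"p95_ms": 250.0, "rss_mb": 600.0})
```

Checking a budget for memory alongside latency catches the resource side effects called out above, such as a fix that trades CPU time for unbounded RSS growth.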
## Tool Selection
Use tools that match the stack:
- Python CPU: `cProfile`, `py-spy`, `line_profiler`, flamegraphs.
- Python memory: `tracemalloc`, `memory_profiler`, `scalene`.
- DB: `EXPLAIN ANALYZE`, query logs, ORM query counting.
- FastAPI/async: event-loop blocking checks, sync I/O detection.
- Distributed systems: tracing/APM spans for cross-service latency.
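Of the Python memory tools above, `tracemalloc` ships with the standard library and attributes live allocations to source lines; a minimal sketch:

```python
import tracemalloc

tracemalloc.start()
leaky = [bytes(1024) for _ in range(1_000)]  # stand-in for suspected growth
snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics("lineno")[:3]  # biggest allocation sites first
for stat in top:
    print(stat)  # file:line plus size and count of live allocations
tracemalloc.stop()
```

`py-spy` and `scalene` work on unmodified running processes, which makes them better fits when the slow code cannot be instrumented in place.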
## Anti-Patterns to Flag
- Premature optimization without baseline data.
- Micro-optimizing non-hot code while major I/O bottlenecks remain.
- Complex caching without robust invalidation.
- Ignoring database query plans and index strategy.
- Blocking synchronous calls inside async request handlers.
- Loading full datasets when pagination/streaming is sufficient.
- Recomputing expensive values instead of reuse/memoization.
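The blocking-call anti-pattern and its fix can be sketched with a stand-in for the sync I/O (`time.sleep` here plays the role of a synchronous DB or HTTP call):

```python
import asyncio
import time

def blocking_io():
    # Stand-in for a sync DB/HTTP call; time.sleep blocks the whole event loop.
    time.sleep(0.05)
    return "row"

async def handler_bad():
    return blocking_io()  # anti-pattern: stalls every other in-flight request

async def handler_good():
    # Fix: push the sync call onto a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_io)

async def main():
    # Two good handlers overlap, so total wall time stays near one call's cost.
    start = time.perf_counter()
    results = await asyncio.gather(handler_good(), handler_good())
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
```

Under load the difference compounds: every blocked handler delays all concurrent requests, which is why this pattern dominates p99 latency in async services.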
## Required Response Format
Always structure findings as:
```markdown
## Performance Analysis
### Current State
- Metric: ...
- Target: ...
- Gap: ...
### Bottleneck Identified
...
### Root Cause
...
### Recommended Fix
...
### Implementation
...
### Verification Plan
...
```
If profiling data is missing, explicitly say so and make "gather profile/baseline" the first action.