Reconnaissance
Reconnaissance
Reconnaissance observes the target system after effectuation — collecting metrics, checking health, and gathering the data needed to evaluate fitness. It's the eyes of the optimization loop.
Role in the Optimization Loop
┌─────────────────────────────────────────────────────────────┐
│ Optimization Loop │
│ │
│ Effectuator ──▶ Target System ──▶ Reconnaissance │
│ (apply) (changed) (observe) │
│ │
│ │ │
│ ▼ │
│ Metrics & │
│ Observations │
│ │
└─────────────────────────────────────────────────────────────┘
Reconnaissance answers: "What happened after we made that change?"
Reconnaissance Interface
┌─────────────────────────────────────────────────────────────┐
│ Reconnaissance │
│ │
│ Input: Trial context (what was applied) │
│ │
│ Action: Collect observations from target │
│ │
│ Output: { metric_a: 123.4, metric_b: "ok", ... } │
│ │
└─────────────────────────────────────────────────────────────┘
| Phase | Responsibility |
|---|---|
| Wait | Allow system to reach steady state |
| Collect | Gather metrics from sources |
| Aggregate | Combine multiple observations |
| Report | Return structured data |
Built-in Reconnaissance Sources
| Source | What it provides | Protocol |
|---|---|---|
| Prometheus | Time-series metrics | PromQL |
| HTTP | Health checks, stats | REST/HTTP |
| Custom scripts | Arbitrary observations | Exec |
Prometheus Reconnaissance
reconnaissance:
type: prometheus
endpoint: http://prometheus:9090
metrics:
- name: latency_p99
query: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[1m]))
- name: throughput
query: sum(rate(http_requests_total[1m]))
- name: error_rate
query: sum(rate(http_errors_total[1m])) / sum(rate(http_requests_total[1m]))
Use for: Cloud-native systems, Kubernetes, microservices
HTTP Reconnaissance
reconnaissance:
type: http
endpoint: https://api.example.com/stats
method: GET
extract:
latency_p99: ".metrics.latency.p99"
active_connections: ".connections.active"
Use for: Services with stats endpoints, health APIs
Script Reconnaissance
reconnaissance:
type: script
command: /opt/scripts/collect_metrics.sh
# Script outputs JSON to stdout
parse: json
Use for: Legacy systems, databases, custom metrics
Timing and Sampling
Reconnaissance timing affects data quality:
┌─────────────────────────────────────────────────────────────────┐
│ Reconnaissance Timeline │
│ │
│ Effectuation Reconnaissance Window │
│ │ ├─────────────────┤ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ─────●───────────────────●─────────────────●──────────────▶ │
│ │ │ │ │
│ 0s wait: 30s duration: 60s │
│ │
│ Wait for steady state, then observe for duration │
└─────────────────────────────────────────────────────────────────┘
| Parameter | Purpose |
|---|---|
| wait | Delay before first observation (propagation time) |
| duration | How long to observe |
| interval | How often to sample within duration |
reconnaissance:
timing:
wait: 30s # Let changes propagate
duration: 60s # Observe for 1 minute
interval: 5s # Sample every 5 seconds
Aggregation
Multiple samples → single observation:
| Aggregation | Use Case |
|---|---|
| mean | Typical value |
| median | Robust to outliers |
| p95/p99 | Tail behavior |
| max | Worst case |
| min | Best case |
reconnaissance:
metrics:
- name: latency
query: ...
aggregation: p99 # Report 99th percentile
Multi-Source Reconnaissance
Combine observations from multiple sources:
reconnaissance:
sources:
- name: prometheus
type: prometheus
endpoint: http://prometheus:9090
metrics:
- latency_p99
- throughput
- name: app_stats
type: http
endpoint: http://app:8080/stats
extract:
- active_threads
- queue_depth
All sources contribute to the trial's observation data.
Handling Missing Data
Reconnaissance may fail to collect some metrics:
| Scenario | Handling |
|---|---|
| Metric unavailable | Mark trial failed |
| Partial data | Continue with available metrics |
| Timeout | Fail or use partial data |
reconnaissance:
on_missing: fail # or: ignore, default_value
timeout: 30s
Noise and Stability
Real metrics have noise:
┌─────────────────────────────────────────────────────────────┐
│ Noisy Observations │
│ │
│ True value: 50ms │
│ Observed: [48, 52, 47, 55, 49, 51, 48, 53, ...] │
│ │
│ Aggregation smooths noise for fitness evaluation │
│ │
└─────────────────────────────────────────────────────────────┘
Mitigation strategies:
| Strategy | How it helps |
|---|---|
| Longer duration | More samples = less variance |
| Robust aggregation | Median resists outliers |
| Multiple trials | Average over repeated evaluations |
| Replication | Run same config multiple times |
Reconnaissance vs Objectives
| Reconnaissance | Objectives |
|---|---|
| Collects raw metrics | Computes fitness |
| System-specific | Problem-specific |
| Multiple values | Single/directional |
| Descriptive | Evaluative |
Reconnaissance: { latency_p99: 45ms, throughput: 8000, error_rate: 0.001 }
│
▼
Objective: minimize latency_p99 ──▶ fitness = 45
Summary
| Aspect | What it means |
|---|---|
| Role | Observe system after effectuation |
| Sources | Prometheus, HTTP, scripts |
| Timing | Wait for steady state, then sample |
| Aggregation | Combine samples into observations |
| Output | Metrics dict used for fitness and guardrails |
See Also
- Effectuator — What reconnaissance observes
- Guardrails — Check reconnaissance data against limits
- Breeder — Orchestrates the optimization loop