feature/lab-monitoring #6

Merged
torjus merged 7 commits from feature/lab-monitoring into master 2026-02-04 22:48:23 +00:00

Summary

New MCP server (lab-monitoring) for querying Prometheus and Alertmanager HTTP APIs. Unlike the existing nixpkgs/options servers, this one queries live APIs directly — no database or indexing required.

Tools (8 total)

| Tool | API | Description |
|------|-----|-------------|
| list_alerts | Alertmanager | List alerts with optional filters (state, severity, receiver) |
| get_alert | Alertmanager | Get full details for a specific alert by fingerprint |
| search_metrics | Prometheus | Search metric names with substring filter, enriched with metadata |
| get_metric_metadata | Prometheus | Get type, help text, and unit for a specific metric |
| query | Prometheus | Execute instant PromQL query |
| list_targets | Prometheus | List scrape targets with health status |
| list_silences | Alertmanager | List active/pending silences |
| create_silence | Alertmanager | Create a silence (gated behind --enable-silences flag) |
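
For reviewers unfamiliar with the Prometheus HTTP API, the query tool's upstream call can be sketched roughly as below. This is an illustration only, not the client from internal/monitoring/: instantQuery, queryResponse and newStubPrometheus are hypothetical names, and the stub server stands in for a real Prometheus so the example runs offline (the same httptest approach the PR's tests are described as using). Only the /api/v1/query endpoint and its JSON envelope come from Prometheus itself.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// queryResponse mirrors the envelope returned by Prometheus' /api/v1/query.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]any            `json:"value"` // [unix timestamp, value as string]
		} `json:"result"`
	} `json:"data"`
}

// instantQuery is a hypothetical helper (not the PR's client) that runs an
// instant PromQL query against the Prometheus instance at baseURL.
func instantQuery(baseURL, promQL string) (*queryResponse, error) {
	u := baseURL + "/api/v1/query?" + url.Values{"query": {promQL}}.Encode()
	resp, err := http.Get(u)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	var out queryResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	if out.Status != "success" {
		return nil, fmt.Errorf("query failed with status %q", out.Status)
	}
	return &out, nil
}

// newStubPrometheus returns a fake server that answers every instant query
// with one fixed sample, so the example needs no real Prometheus.
func newStubPrometheus() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/api/v1/query" || r.URL.Query().Get("query") == "" {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"up","job":"node"},"value":[1738708000,"1"]}]}}`)
	}))
}

func main() {
	srv := newStubPrometheus()
	defer srv.Close()

	res, err := instantQuery(srv.URL, "up")
	if err != nil {
		panic(err)
	}
	for _, sample := range res.Data.Result {
		fmt.Println(sample.Metric["job"], sample.Value[1])
	}
}
```

Pointing instantQuery at a real Prometheus base URL instead of srv.URL should behave the same way, since nothing is cached or indexed locally.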

MCP core changes (internal/mcp/server.go)

  • Added ModeCustom server mode for MCP servers without database dependencies
  • Added NewGenericServer(logger, config) constructor (no store parameter)
  • Added RegisterTool(tool, handler) method for external tool registration
  • Added InstructionsFunc callback on ServerConfig for dynamic per-session instructions

Dynamic alert instructions

On MCP initialization, the server calls AlertSummary() to check for active non-silenced alerts. If any are firing, the count and severity breakdown are included in the server instructions so the LLM can proactively inform the user.
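
A minimal sketch of how such a summary could be folded into the instructions follows. The alert type and buildInstructions function are hypothetical, not the PR's AlertSummary() implementation. It assumes that "non-silenced" can be approximated by Alertmanager's status state being "active" (as opposed to "suppressed" or "unprocessed") and that severity is carried in a severity label.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// alert is a hypothetical, trimmed-down view of an Alertmanager alert.
type alert struct {
	Severity string // value of the "severity" label, may be empty
	State    string // "active", "suppressed", or "unprocessed"
}

// buildInstructions appends a count and severity breakdown of active alerts
// to the base instructions. With no active alerts it returns base unchanged.
func buildInstructions(base string, alerts []alert) string {
	counts := map[string]int{}
	total := 0
	for _, a := range alerts {
		if a.State != "active" {
			continue // skip silenced or inhibited alerts
		}
		sev := a.Severity
		if sev == "" {
			sev = "unknown"
		}
		counts[sev]++
		total++
	}
	if total == 0 {
		return base
	}

	// Sort severities so the output is deterministic.
	sevs := make([]string, 0, len(counts))
	for s := range counts {
		sevs = append(sevs, s)
	}
	sort.Strings(sevs)

	parts := make([]string, 0, len(sevs))
	for _, s := range sevs {
		parts = append(parts, fmt.Sprintf("%d %s", counts[s], s))
	}
	return fmt.Sprintf("%s\n\nActive alerts: %d (%s).", base, total, strings.Join(parts, ", "))
}

func main() {
	alerts := []alert{
		{Severity: "critical", State: "active"},
		{Severity: "warning", State: "active"},
		{Severity: "warning", State: "suppressed"}, // silenced, not counted
	}
	fmt.Println(buildInstructions("Query Prometheus and Alertmanager.", alerts))
}
```

Returning the base text unchanged when nothing is firing keeps the instructions quiet in the common case.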

Security: create_silence opt-in

The create_silence tool is a write operation that can suppress alerts. It is disabled by default and requires explicit opt-in:

  • CLI: lab-monitoring serve --enable-silences
  • NixOS module: services.lab-monitoring.enableSilences = true;
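
The gating can be pictured as a conditional registration step, sketched below. toolNames is a hypothetical helper, and the standard library flag package is used here only for brevity; the PR's CLI uses a serve subcommand whose flag parsing may work differently. The sketch assumes the seven read-only tools are always registered and only create_silence is conditional.

```go
package main

import (
	"flag"
	"fmt"
)

// toolNames is a hypothetical helper that returns the tools to register.
// The write tool is appended only when the operator opted in.
func toolNames(enableSilences bool) []string {
	names := []string{
		"list_alerts",
		"get_alert",
		"search_metrics",
		"get_metric_metadata",
		"query",
		"list_targets",
		"list_silences",
	}
	if enableSilences {
		names = append(names, "create_silence")
	}
	return names
}

func main() {
	// Defaults to false, so create_silence stays disabled unless requested.
	enable := flag.Bool("enable-silences", false, "register the create_silence write tool")
	flag.Parse()

	fmt.Println(len(toolNames(*enable)), "tools registered")
}
```

Because the tool is never registered when the flag is off, a client cannot reach it, rather than reaching it and being refused at call time.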

New files

  • internal/monitoring/ — Full package with types, Prometheus client, Alertmanager client, markdown formatters, MCP handlers
  • cmd/lab-monitoring/main.go — CLI with serve, alerts, query, targets, metrics commands
  • nix/lab-monitoring-module.nix — NixOS module (DynamicUser, no state directory needed)

Test coverage

  • prometheus_test.go — httptest-based tests for all Prometheus client methods
  • alertmanager_test.go — httptest-based tests for all Alertmanager client methods
  • handlers_test.go — End-to-end handler tests through the MCP server, including tool count verification with and without silences enabled

Other changes

  • Version bump: existing binaries 0.2.0 → 0.2.1 (shared internal/mcp/server.go change)
  • Added lab-monitoring package and lab-monitoring-mcp NixOS module to flake.nix
  • Updated .mcp.json with dev config entry
  • Updated CLAUDE.md and TODO.md

Stats

19 files changed, ~2700 lines added

torjus added 3 commits 2026-02-04 22:25:42 +00:00
New MCP server that queries live Prometheus and Alertmanager HTTP APIs
with 8 tools: list_alerts, get_alert, search_metrics, get_metric_metadata,
query (PromQL), list_targets, list_silences, and create_silence.

Extends the MCP core with ModeCustom and NewGenericServer for servers
that don't require a database. Includes CLI with direct commands
(alerts, query, targets, metrics), NixOS module, and comprehensive
httptest-based tests.

Bumps existing binaries to 0.2.1 due to shared internal/mcp change.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add InstructionsFunc callback to ServerConfig, called during each
initialize handshake to generate dynamic instructions. The lab-monitoring
server uses this to query Alertmanager and include a count of active
non-silenced alerts, so the LLM can proactively inform the user.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The create_silence tool is a write operation that can suppress alerts.
Disable it by default and require explicit opt-in via --enable-silences
CLI flag (or enableSilences NixOS option) as a safety measure.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
torjus added 2 commits 2026-02-04 22:33:53 +00:00
torjus added 2 commits 2026-02-04 22:46:41 +00:00
torjus merged commit b491a60105 into master 2026-02-04 22:48:23 +00:00
torjus deleted branch feature/lab-monitoring 2026-02-04 22:48:24 +00:00
This repo is archived. You cannot comment on pull requests.

Reference: torjus/labmcp#6