Compare commits
3 Commits
fe80ec3576...4d33018285

| Author | SHA1 | Date |
|---|---|---|
| | 4d33018285 | |
| | 678fd3d6de | |
| | 9d74aa5c04 | |

@@ -154,6 +154,10 @@ For each stateful host, the procedure is:
6. Verify Zigbee devices are still paired and communicating
7. Decommission old VM

**Note:** ha1 currently has 2 GB RAM, which is consistently tight. Average memory usage has
climbed from ~57% (30-day avg) to ~70% currently, with a 30-day low of only 187 MB free.
Consider increasing to 4 GB when reprovisioning to allow headroom for additional integrations.

**Note:** ha1 is the highest-risk migration due to Zigbee device pairings. The Zigbee
coordinator state in `/var/lib/zigbee2mqtt` should preserve pairings, but verify during a
non-critical time window.

@@ -79,6 +79,33 @@ These services have adequate alerting and/or scrape targets:
| Nix Cache (Harmonia, build-flakes) | Via Caddy | 4 alerts |
| CA (step-ca) | Yes (port 9000) | 4 certificate alerts |

## Per-Service Resource Metrics (systemd-exporter)

### Current State

No per-service CPU, memory, or IO metrics are collected. The existing node-exporter systemd collector only provides unit state (active/inactive/failed), socket stats, and timer triggers. While systemd tracks per-unit resource usage via cgroups internally (visible in `systemctl status` and `systemd-cgtop`), this data is not exported to Prometheus.

### Available Solution

The `prometheus-systemd-exporter` package (v0.7.0) is available in nixpkgs with a ready-made NixOS module:

```nix
services.prometheus.exporters.systemd.enable = true;
```

**Options:** `enable`, `port`, `extraFlags`, `user`, `group`

This exporter reads cgroup data and exposes per-unit metrics including:
- CPU seconds consumed per service
- Memory usage per service
- Task/process counts per service
- Restart counts
- IO usage
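
As a rough sketch of how the options listed above might be combined in a shared module, the fragment below enables the exporter with an explicit port and one extra flag. The port value `9558` and the `--systemd.collector.enable-restart-count` flag are assumptions taken from upstream defaults rather than from this repository, so both should be checked against the packaged version before use.

```nix
# Hypothetical shared module, e.g. imported by every host from `system/`.
{ ... }:
{
  services.prometheus.exporters.systemd = {
    enable = true;
    # Assumed default port of the exporter; stated explicitly so the
    # scrape job and any firewall rule have a single value to reference.
    port = 9558;
    # Assumed upstream flag that enables restart-count metrics; verify
    # with `systemd_exporter --help` for the version in nixpkgs.
    extraFlags = [ "--systemd.collector.enable-restart-count" ];
  };
}
```

Pinning the port in the module, even if it matches the default, keeps the host side and the scrape side from drifting apart silently.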

### Recommendation

Enable on all hosts via the shared `system/` config (same pattern as node-exporter). Add a corresponding scrape job on monitoring01. This would give visibility into resource consumption per service across the fleet, useful for capacity planning and diagnosing noisy-neighbor issues on shared hosts.
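
The scrape job on monitoring01 could look roughly like the following. This is a minimal sketch: the host list, the job name, and the port are placeholders chosen for illustration, and a real configuration would presumably derive its targets the same way the existing jobs do.

```nix
# Hypothetical fragment for monitoring01's configuration.
{ ... }:
let
  # Placeholder host list, not taken from the repository.
  hosts = [ "ha1" "monitoring01" ];
  # Assumed exporter port; must match the port set on each host.
  systemdExporterPort = 9558;
in
{
  services.prometheus.scrapeConfigs = [
    {
      job_name = "systemd-exporter";
      static_configs = [
        # One "host:port" target per entry in `hosts`.
        { targets = map (h: "${h}:${toString systemdExporterPort}") hosts; }
      ];
    }
  ];
}
```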

## Suggested Priority

1. **PostgreSQL** - Critical infrastructure, easy to add with existing nixpkgs module

31
docs/plans/zigbee-sensor-battery-monitoring.md
Normal file
@@ -0,0 +1,31 @@
# Zigbee Sensor Battery Monitoring

## Problem

Three Aqara Zigbee temperature sensors report `battery: 0` in their MQTT payload, making the `hass_sensor_battery_percent` Prometheus metric useless for battery monitoring on these devices.

Affected sensors:
- **Temp Living Room** (`0x54ef441000a54d3c`), area: living_room
- **Temp Office** (`0x54ef441000a547bd`), area: office
- **temp_server**, area: server_room

The **Temp Bedroom** sensor (`0x00124b0025495463`) is a different model and reports battery correctly (69% at time of investigation).

## Findings

- All three sensors are actively reporting temperature, humidity, and pressure data, so they are not dead.
- The Zigbee2MQTT payload includes a `voltage` field in millivolts (e.g., `2707` = 2.707 V), which indicates healthy battery levels (~40-60% for a CR2032 coin cell).
- CR2032 voltage reference: ~3.0 V fresh, ~2.7 V mid-life, ~2.1 V dead.
- The `voltage` field is not exposed as a Prometheus metric; it exists only in the MQTT payload.
- This is a known firmware quirk with some Aqara sensors that always report 0% battery.

## Possible Solutions

### 1. Expose voltage as a Prometheus metric
Enable the voltage sensor entities in Home Assistant (they may exist but be disabled by default). The HA Prometheus integration would then export them automatically.

### 2. Calculate battery from voltage in Zigbee2MQTT
Override the battery calculation using the voltage field. Approximate formula, with `voltage` in millivolts: `(voltage - 2100) / (3000 - 2100) * 100`.
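
To make the approximation concrete, here is the formula applied to the example reading of `2707` mV from the findings. The symbol names are introduced only for this illustration.

```latex
\[
\text{battery}_{\%} = \frac{V_{\text{mV}} - 2100}{3000 - 2100} \times 100,
\qquad
V_{\text{mV}} = 2707 \;\Rightarrow\; \frac{607}{900} \times 100 \approx 67.4
\]
```

The result is somewhat higher than the ~40-60% estimate above, a reminder that a straight-line mapping between the "dead" and "fresh" voltages is only a coarse approximation. As written, the formula also does not clamp readings outside the 2100-3000 mV range to 0-100.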

### 3. Alert on sensor staleness instead
Create a Prometheus alert based on `hass_last_updated_time_seconds` going stale (e.g., no temperature update in 1 hour). This detects dead sensors regardless of battery reporting accuracy.
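
A staleness rule along these lines might be declared through the NixOS Prometheus module as sketched below. The alert name, the `domain="sensor"` selector, the `for` duration, and the severity label are all assumptions for illustration. The selector in particular would need narrowing to the temperature entities actually exported, since many sensor entities legitimately update less than once an hour.

```nix
# Hypothetical alert rule for monitoring01; names and labels are placeholders.
{ ... }:
{
  services.prometheus.rules = [
    # Rule files are YAML, and JSON is valid YAML, so the rule group can be
    # written as a Nix attribute set and serialized.
    (builtins.toJSON {
      groups = [
        {
          name = "zigbee-sensors";
          rules = [
            {
              alert = "ZigbeeSensorStale";
              # Seconds since the entity last updated, compared to 1 hour.
              # The label selector is an assumption; adjust to real entities.
              expr = ''time() - hass_last_updated_time_seconds{domain="sensor"} > 3600'';
              for = "10m";
              labels.severity = "warning";
              annotations.summary = "{{ $labels.entity }} has not reported for over 1 hour";
            }
          ];
        }
      ];
    })
  ];
}
```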