zigbee-battery-fix #21
docs/plans/completed/zigbee-sensor-battery-monitoring.md
@@ -0,0 +1,109 @@
# Zigbee Sensor Battery Monitoring

**Status:** Completed
**Branch:** `zigbee-battery-fix`
**Commit:** `c515a6b home-assistant: fix zigbee sensor battery reporting`

## Problem

Three Aqara Zigbee temperature sensors report `battery: 0` in their MQTT payload, making the `hass_sensor_battery_percent` Prometheus metric useless for battery monitoring on these devices.

Affected sensors:
- **Temp Living Room** (`0x54ef441000a54d3c`): WSDCGQ12LM
- **Temp Office** (`0x54ef441000a547bd`): WSDCGQ12LM
- **temp_server** (`0x54ef441000a564b6`): WSDCGQ12LM

The **Temp Bedroom** sensor (`0x00124b0025495463`) is a SONOFF SNZB-02 and reports battery correctly.

## Findings

- All three sensors are actively reporting temperature, humidity, and pressure data, so they are not dead.
- The Zigbee2MQTT payload includes a `voltage` field in millivolts (e.g., `2707` = 2.707 V), which indicates a healthy battery: about 67% for a CR2032 coin cell under the linear mapping used in the Implementation section.
- CR2032 voltage reference: ~3.0 V fresh, ~2.7 V mid-life, ~2.1 V dead.
- The `voltage` field is not exposed as a Prometheus metric; it exists only in the MQTT payload.
- This is a known firmware quirk with some Aqara WSDCGQ12LM sensors that always report 0% battery.
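
As a minimal sketch of the symptom described above, the snippet below parses a payload shaped like the one these sensors publish and flags the mismatch between a reported `battery` of 0 and a healthy `voltage`. The sample payload contains only the two fields named in this plan. The function name `battery_looks_stuck` and the 2100 mV cutoff (taken from the CR2032 reference above) are assumptions for illustration, not part of the deployed configuration.

```python
import json

DEAD_MV = 2100  # assumed cutoff, taken from the ~2.1 V "dead" reference above


def battery_looks_stuck(payload: str) -> bool:
    """Return True when battery reads 0% but the cell voltage is above the dead cutoff."""
    data = json.loads(payload)
    return data.get("battery") == 0 and data.get("voltage", 0) > DEAD_MV


# Hypothetical payload limited to the fields discussed in this plan.
sample = '{"battery": 0, "voltage": 2707}'
volts = json.loads(sample)["voltage"] / 1000  # Zigbee2MQTT reports millivolts
print(volts, battery_looks_stuck(sample))
```

A sensor that reports a non-zero battery, or one whose voltage really has dropped below the cutoff, is not flagged.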

## Device Inventory

Full list of Zigbee devices on ha1 (12 total):

| Device | IEEE Address | Model | Type |
|--------|-------------|-------|------|
| temp_server | 0x54ef441000a564b6 | WSDCGQ12LM | Temperature sensor (battery fix applied) |
| (Temp Living Room) | 0x54ef441000a54d3c | WSDCGQ12LM | Temperature sensor (battery fix applied) |
| (Temp Office) | 0x54ef441000a547bd | WSDCGQ12LM | Temperature sensor (battery fix applied) |
| (Temp Bedroom) | 0x00124b0025495463 | SNZB-02 | Temperature sensor (battery works) |
| (Water leak) | 0x54ef4410009ac117 | SJCGQ12LM | Water leak sensor |
| btn_livingroom | 0x54ef441000a1f907 | WXKG13LM | Wireless mini switch |
| btn_bedroom | 0x54ef441000a1ee71 | WXKG13LM | Wireless mini switch |
| (Hue bulb) | 0x001788010dc35d06 | 9290024688 | Hue E27 1100lm (Router) |
| (Hue bulb) | 0x001788010dc5f003 | 9290024688 | Hue E27 1100lm (Router) |
| (Hue ceiling) | 0x001788010e371aa4 | 915005997301 | Hue Infuse medium (Router) |
| (Hue ceiling) | 0x001788010d253b99 | 915005997301 | Hue Infuse medium (Router) |
| (Hue wall) | 0x001788010d1b599a | 929003052901 | Hue Sana wall light (Router, transition=5) |

## Implementation

### Solution 1: Calculate battery from voltage in Zigbee2MQTT (Implemented)

Override the Home Assistant battery entity's `value_template` in the Zigbee2MQTT device configuration so that the battery percentage is calculated from the reported voltage.

**Formula:** `(voltage - 2100) / 9`, with `voltage` in mV (maps 2100-3000 mV to 0-100%; the template rounds the result and clamps it to 0-100)
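
The mapping can be checked by hand with a small Python sketch. It mirrors the formula and the 0-100 clamp, but it is not the Jinja template itself. The function name `battery_percent` is made up for this example, and the input voltages are the three readings listed in this plan.

```python
def battery_percent(voltage_mv: float) -> int:
    """Linear map of 2100-3000 mV onto 0-100 %, clamped at both ends."""
    percent = round((voltage_mv - 2100) / 9)
    return max(0, min(100, percent))


readings = [("Temp Living Room", 2710), ("Temp Office", 2658), ("temp_server", 2765)]
for name, mv in readings:
    print(f"{name}: {mv} mV -> {battery_percent(mv)}%")
```

A reading above 3000 mV clamps to 100 and one below 2100 mV clamps to 0, so a fresh cell that measures slightly over 3.0 V never reports more than 100%.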

**Changes in `services/home-assistant/default.nix`:**
- Device configuration moved from the external `devices.yaml` to inline NixOS config
- The three affected sensors have a `homeassistant.sensor_battery.value_template` override
- All 12 devices are now managed declaratively

**Expected battery values based on current voltages:**

| Sensor | Voltage | Expected Battery |
|--------|---------|------------------|
| Temp Living Room | 2710 mV | ~68% |
| Temp Office | 2658 mV | ~62% |
| temp_server | 2765 mV | ~74% |

### Solution 2: Alert on sensor staleness (Implemented)

Added the Prometheus alert `zigbee_sensor_stale` in `services/monitoring/rules.yml`, which fires when a Zigbee temperature sensor has not updated in over 1 hour. This provides defense in depth for detecting dead sensors regardless of battery reporting accuracy.

**Alert details:**
- Expression: `(time() - hass_last_updated_time_seconds{entity=~"sensor\\.(0x[0-9a-f]+|temp_server)_temperature"}) > 3600`
- Severity: warning
- For: 5m
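
To see which entities the selector covers and when the condition becomes true, here is a rough Python equivalent of the expression. Prometheus anchors label regexes at both ends, which `re.fullmatch` imitates. The entity IDs and timestamps below are invented for illustration, and the `for: 5m` hold period is not modeled.

```python
import re

# Same pattern as the alert's entity matcher, with the YAML/PromQL escaping removed.
ENTITY_RE = re.compile(r"sensor\.(0x[0-9a-f]+|temp_server)_temperature")
STALE_AFTER_S = 3600


def is_stale(entity: str, last_updated: float, now: float) -> bool:
    """True when the entity is covered by the selector and its last update is over an hour old."""
    return bool(ENTITY_RE.fullmatch(entity)) and (now - last_updated) > STALE_AFTER_S


now = 1_000_000.0  # hypothetical "current" Unix time
print(is_stale("sensor.temp_server_temperature", now - 4000, now))
print(is_stale("sensor.0x54ef441000a547bd_temperature", now - 600, now))
print(is_stale("sensor.outdoor_temperature", now - 4000, now))
```

Only the first call meets both conditions. The third shows that a sensor named something other than its IEEE address or `temp_server` falls outside the pattern and would never trigger the alert.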

## Pre-Deployment Verification

### Backup Verification

Before deployment, verified the ha1 backup configuration and ran a manual backup:

**Backup paths:**
- `/var/lib/hass` ✓
- `/var/lib/zigbee2mqtt` ✓
- `/var/lib/mosquitto` ✓

**Manual backup (2026-02-05 22:45:23):**
- Snapshot ID: `59704dfa`
- Files: 77 total (0 new, 13 changed, 64 unmodified)
- Data: 62.635 MiB processed, 6.928 MiB stored (compressed)

### Other directories reviewed

- `/var/lib/vault`: contains AppRole credentials; not backed up (can be re-provisioned via Ansible)
- `/var/lib/sops-nix`: legacy; ha1 uses Vault now

## Post-Deployment Steps

After deploying to ha1:

1. Restart the zigbee2mqtt service (automatic on NixOS rebuild)
2. In Home Assistant, the battery entities may need to be re-discovered:
   - Go to Settings → Devices & Services → MQTT
   - The new `value_template` should take effect after entity re-discovery
   - If not, try disabling and re-enabling the battery entities

## Notes

- Device configuration is now declarative in NixOS. Devices added later via the Zigbee2MQTT frontend must also be added to the NixOS config to persist.
- The `devices.yaml` file on ha1 will be overwritten on service start but can be removed after confirming the new config works.
- The NixOS zigbee2mqtt module defaults to `devices = "devices.yaml"`, but our explicit inline config overrides this.
@@ -1,31 +0,0 @@
# Zigbee Sensor Battery Monitoring

## Problem

Three Aqara Zigbee temperature sensors report `battery: 0` in their MQTT payload, making the `hass_sensor_battery_percent` Prometheus metric useless for battery monitoring on these devices.

Affected sensors:
- **Temp Living Room** (`0x54ef441000a54d3c`), area: living_room
- **Temp Office** (`0x54ef441000a547bd`), area: office
- **temp_server**, area: server_room

The **Temp Bedroom** sensor (`0x00124b0025495463`) is a different model and reports battery correctly (69% at time of investigation).

## Findings

- All three sensors are actively reporting temperature, humidity, and pressure data, so they are not dead.
- The Zigbee2MQTT payload includes a `voltage` field (e.g., `2707` = 2.707V), which indicates healthy battery levels (~40-60% for a CR2032 coin cell).
- CR2032 voltage reference: ~3.0V fresh, ~2.7V mid-life, ~2.1V dead.
- The `voltage` field is not exposed as a Prometheus metric; it exists only in the MQTT payload.
- This is a known firmware quirk with some Aqara sensors that always report 0% battery.

## Possible Solutions

### 1. Expose voltage as a Prometheus metric
Enable the voltage sensor entities in Home Assistant (they may exist but be disabled by default). The HA Prometheus integration would then export them automatically.

### 2. Calculate battery from voltage in Zigbee2MQTT
Override the battery calculation using the voltage field. Approximate formula: `(voltage - 2100) / (3000 - 2100) * 100`.

### 3. Alert on sensor staleness instead
Create a Prometheus alert based on `hass_last_updated_time_seconds` going stale (e.g., no temperature update in 1 hour). This detects dead sensors regardless of battery reporting accuracy.
@@ -69,6 +69,44 @@
      frontend = true;
      permit_join = false;
      serial.port = "/dev/ttyUSB0";

      # Inline device configuration (replaces devices.yaml)
      # This allows declarative management and homeassistant overrides
      devices = {
        # Temperature sensors with battery fix
        # WSDCGQ12LM sensors report battery: 0 due to firmware quirk
        # Override battery calculation using voltage (mV): (voltage - 2100) / 9
        "0x54ef441000a547bd" = {
          friendly_name = "0x54ef441000a547bd";
          homeassistant.sensor_battery.value_template = "{{ (((value_json.voltage | float) - 2100) / 9) | round(0) | int | min(100) | max(0) }}";
        };
        "0x54ef441000a54d3c" = {
          friendly_name = "0x54ef441000a54d3c";
          homeassistant.sensor_battery.value_template = "{{ (((value_json.voltage | float) - 2100) / 9) | round(0) | int | min(100) | max(0) }}";
        };
        "0x54ef441000a564b6" = {
          friendly_name = "temp_server";
          homeassistant.sensor_battery.value_template = "{{ (((value_json.voltage | float) - 2100) / 9) | round(0) | int | min(100) | max(0) }}";
        };

        # Other sensors
        "0x00124b0025495463".friendly_name = "0x00124b0025495463"; # SONOFF temp sensor (battery works)
        "0x54ef4410009ac117".friendly_name = "0x54ef4410009ac117"; # Water leak sensor

        # Buttons
        "0x54ef441000a1f907".friendly_name = "btn_livingroom";
        "0x54ef441000a1ee71".friendly_name = "btn_bedroom";

        # Philips Hue lights
        "0x001788010d1b599a" = {
          friendly_name = "0x001788010d1b599a";
          transition = 5;
        };
        "0x001788010d253b99".friendly_name = "0x001788010d253b99";
        "0x001788010e371aa4".friendly_name = "0x001788010e371aa4";
        "0x001788010dc5f003".friendly_name = "0x001788010dc5f003";
        "0x001788010dc35d06".friendly_name = "0x001788010dc35d06";
      };
    };
  };
}
@@ -226,6 +226,14 @@ groups:
        annotations:
          summary: "Mosquitto not running on {{ $labels.instance }}"
          description: "Mosquitto has been down on {{ $labels.instance }} more than 5 minutes."
      - alert: zigbee_sensor_stale
        expr: (time() - hass_last_updated_time_seconds{entity=~"sensor\\.(0x[0-9a-f]+|temp_server)_temperature"}) > 3600
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Zigbee sensor {{ $labels.friendly_name }} is stale"
          description: "Zigbee temperature sensor {{ $labels.entity }} has not reported data for over 1 hour. The sensor may have a dead battery or connectivity issues."
  - name: smartctl_rules
    rules:
      - alert: smart_critical_warning