# Oubliette - SSH Honeypot

A fun SSH honeypot that logs login attempts, presents fake shells to "successful" logins, and tries to detect when a real human is poking around.

The name comes from the medieval dungeon concept - a place you throw people into and forget about them.

## Tech Stack

- **Language:** Go
- **SSH:** golang.org/x/crypto/ssh
- **Database:** SQLite
- **Web UI:** Go templates + htmx
- **Deployment:** Single binary with embedded assets

## Core Concepts

### Shell Profiles
Logins that "succeed" are routed to a fake shell, chosen from a registry by weighted random selection. Every shell implements a common interface, so adding a new one is easy.

```go
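// Shell is the contract every fake shell implements.
type Shell interface {
    // Name identifies the shell (stored as shell_name on the session).
    Name() string
    // Description is a short human-readable summary.
    Description() string
    // Handle drives one session over ch and returns when the session ends.
    // Section 1.4 replaces the bare channel with a SessionContext.
    Handle(ctx context.Context, ch ssh.Channel) error
}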
```
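
Weighted random selection can be sketched as a walk over cumulative weights. This is a minimal illustration, not the project's registry: `Registry`, `Register`, `Select`, and the integer weights are hypothetical names, and shells are represented by their names only to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// entry pairs a shell name with its selection weight.
// A real registry would store a Shell value; a name keeps the sketch small.
type entry struct {
	name   string
	weight int
}

// Registry holds the shells eligible for weighted random selection.
type Registry struct {
	entries []entry
	total   int
}

// Register adds a shell with a positive weight.
func (r *Registry) Register(name string, weight int) error {
	if weight <= 0 {
		return errors.New("weight must be positive")
	}
	r.entries = append(r.entries, entry{name, weight})
	r.total += weight
	return nil
}

// pick maps n in [0, total) onto an entry by walking cumulative weights.
func (r *Registry) pick(n int) string {
	for _, e := range r.entries {
		if n < e.weight {
			return e.name
		}
		n -= e.weight
	}
	return "" // unreachable when 0 <= n < total
}

// Select returns a shell name chosen with probability weight/total.
func (r *Registry) Select(rng *rand.Rand) (string, error) {
	if r.total == 0 {
		return "", errors.New("no shells registered")
	}
	return r.pick(rng.Intn(r.total)), nil
}

func main() {
	var r Registry
	r.Register("bash", 8)
	r.Register("cisco", 1)
	r.Register("fridge", 1)
	name, _ := r.Select(rand.New(rand.NewSource(1)))
	fmt.Println("selected shell:", name)
}
```

With these example weights, `bash` would be chosen roughly 80% of the time. Splitting `pick` from `Select` keeps the mapping deterministic and easy to test.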

### Smart Storage
To avoid the database growing unbounded on a small VPS:
- **Deduplication:** Store unique (username, password, IP) combinations with a count + first_seen/last_seen timestamps instead of one row per attempt.
- **Retention policy:** Configurable auto-pruning of records older than N days.
- **Aggregation:** Optionally roll up old raw data into daily summary tables before pruning.
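
The deduplication and retention rules above can be illustrated with an in-memory model. This is only a sketch of the intended semantics: the real store is SQLite, and `AttemptStore`, `Record`, `Prune`, and the `day` helper are hypothetical names introduced here.

```go
package main

import (
	"fmt"
	"time"
)

// key identifies one unique (username, password, IP) combination.
type key struct{ Username, Password, IP string }

// row mirrors the deduplicated columns: count, first_seen, last_seen.
type row struct {
	Count               int
	FirstSeen, LastSeen time.Time
}

// AttemptStore is an in-memory stand-in for the deduplicated attempts table.
type AttemptStore struct{ rows map[key]*row }

func NewAttemptStore() *AttemptStore { return &AttemptStore{rows: map[key]*row{}} }

// Record inserts a new combination or bumps the counter of an existing one.
func (s *AttemptStore) Record(user, pass, ip string, at time.Time) {
	k := key{user, pass, ip}
	r, ok := s.rows[k]
	if !ok {
		s.rows[k] = &row{Count: 1, FirstSeen: at, LastSeen: at}
		return
	}
	r.Count++
	if at.After(r.LastSeen) {
		r.LastSeen = at
	}
}

// Count returns how often a combination was seen (0 if never).
func (s *AttemptStore) Count(user, pass, ip string) int {
	if r, ok := s.rows[key{user, pass, ip}]; ok {
		return r.Count
	}
	return 0
}

// Prune removes combinations last seen more than retentionDays before now
// and reports how many were removed.
func (s *AttemptStore) Prune(now time.Time, retentionDays int) int {
	cutoff := now.AddDate(0, 0, -retentionDays)
	removed := 0
	for k, r := range s.rows {
		if r.LastSeen.Before(cutoff) {
			delete(s.rows, k)
			removed++
		}
	}
	return removed
}

// day is a demo helper: midnight UTC on the n-th day of January 2024.
func day(n int) time.Time { return time.Date(2024, 1, n, 0, 0, 0, 0, time.UTC) }

func main() {
	s := NewAttemptStore()
	s.Record("root", "123456", "203.0.113.7", day(1))
	s.Record("root", "123456", "203.0.113.7", day(2))
	s.Record("admin", "admin", "203.0.113.7", day(9))
	fmt.Println("root/123456 count:", s.Count("root", "123456", "203.0.113.7"))
	fmt.Println("pruned:", s.Prune(day(10), 7))
}
```

In SQLite the same effect would typically come from a unique index on the three columns plus an upsert, but that statement is not shown here because the schema details are not fixed yet.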

### Human Detection
Score sessions based on signals that distinguish humans from bots:
- Keystroke timing (variable delays vs instant paste)
- Typos and backspace usage
- Tab completion and arrow key usage
- Adaptive behavior (commands that respond to previous output)
- Command diversity
- Session duration

Sessions crossing a human-likelihood threshold get flagged for review and can trigger webhook notifications.
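
One possible way to turn these signals into a single score is a weighted sum of per-signal values in [0, 1]. The sketch below is an assumption-heavy illustration: the `Signals` fields, the weights, the 50 ms cut-off, and the normalisation constants (5 commands, 60 seconds) are all invented for the example and would need tuning against real sessions.

```go
package main

import (
	"fmt"
	"math"
)

// Signals summarises one session. All field names are illustrative.
type Signals struct {
	KeyGapsMs      []float64 // delays between consecutive keystrokes, in ms
	Backspaces     int
	TabsOrArrows   int
	UniqueCommands int
	DurationSec    float64
}

// clamp01 limits v to the range [0, 1].
func clamp01(v float64) float64 { return math.Max(0, math.Min(1, v)) }

// timingVariability returns the coefficient of variation (stddev / mean) of
// the keystroke gaps, clamped to [0, 1] and scaled down when the mean gap is
// below 50 ms (an assumed cut-off), so pasted input scores near 0.
func timingVariability(gaps []float64) float64 {
	if len(gaps) < 2 {
		return 0
	}
	var sum float64
	for _, g := range gaps {
		sum += g
	}
	mean := sum / float64(len(gaps))
	if mean == 0 {
		return 0
	}
	var sq float64
	for _, g := range gaps {
		sq += (g - mean) * (g - mean)
	}
	cv := math.Sqrt(sq/float64(len(gaps))) / mean
	return clamp01(cv) * clamp01(mean/50)
}

// flag converts a boolean signal into 0 or 1.
func flag(b bool) float64 {
	if b {
		return 1
	}
	return 0
}

// Score combines the signals into a human-likelihood value in [0, 1].
// The weights sum to 1 and are placeholders.
func Score(s Signals) float64 {
	return 0.35*timingVariability(s.KeyGapsMs) +
		0.20*flag(s.Backspaces > 0) +
		0.15*flag(s.TabsOrArrows > 0) +
		0.15*clamp01(float64(s.UniqueCommands)/5) +
		0.15*clamp01(s.DurationSec/60)
}

func main() {
	human := Signals{KeyGapsMs: []float64{120, 340, 90, 800}, Backspaces: 2,
		TabsOrArrows: 1, UniqueCommands: 6, DurationSec: 120}
	bot := Signals{KeyGapsMs: []float64{0, 0, 0, 0}, UniqueCommands: 1, DurationSec: 2}
	fmt.Printf("human-like: %.2f, bot-like: %.2f\n", Score(human), Score(bot))
}
```

A session would then be flagged when `Score` meets the configured threshold.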

### Login Realism
- Don't accept every attempt. Most attempts should fail. Bots commonly try thousands of combinations from a single IP (20k+ is not unusual), so the acceptance threshold should be high and configurable.
- **Credential memory:** When a credential is accepted, store it as a "valid" credential for a configurable TTL (e.g. 24-72 hours). If the same bot returns with the same username/password, it gets in immediately - making the credential appear legitimate and encouraging further interaction.
- Acceptance strategy is configurable: after N failed attempts from an IP, accept the next attempt (whatever the credentials are) and remember that combo.
- Optionally also support a static list of always-accepted credentials for testing.
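
The "accept after N failures, then remember the credential" strategy might look like the sketch below. `Authenticator`, `Attempt`, and the `hour` helper are hypothetical names. Keying remembered credentials by username and password only (not by IP) is an assumption, and the static always-accepted list is left out for brevity.

```go
package main

import (
	"fmt"
	"time"
)

// cred is a username/password pair.
type cred struct{ Username, Password string }

// Authenticator decides whether a login attempt "succeeds".
type Authenticator struct {
	acceptAfter int                // failed attempts per IP before accepting
	ttl         time.Duration      // how long an accepted credential stays valid
	failures    map[string]int     // failed attempts per source IP
	remembered  map[cred]time.Time // accepted credential -> expiry time
}

func NewAuthenticator(acceptAfter, ttlHours int) *Authenticator {
	return &Authenticator{
		acceptAfter: acceptAfter,
		ttl:         time.Duration(ttlHours) * time.Hour,
		failures:    map[string]int{},
		remembered:  map[cred]time.Time{},
	}
}

// Attempt reports whether the login should be accepted.
func (a *Authenticator) Attempt(ip, user, pass string, now time.Time) bool {
	c := cred{user, pass}
	// A remembered credential gets in immediately until its TTL expires.
	if exp, ok := a.remembered[c]; ok {
		if now.Before(exp) {
			return true
		}
		delete(a.remembered, c)
	}
	// After enough failures from this IP, accept whatever comes next
	// and remember that combination.
	if a.failures[ip] >= a.acceptAfter {
		a.remembered[c] = now.Add(a.ttl)
		a.failures[ip] = 0
		return true
	}
	a.failures[ip]++
	return false
}

// hour is a demo helper: n hours after midnight UTC on 2024-01-01.
func hour(n int) time.Time { return time.Date(2024, 1, 1, n, 0, 0, 0, time.UTC) }

func main() {
	a := NewAuthenticator(3, 24)
	for i := 1; i <= 4; i++ {
		ok := a.Attempt("203.0.113.7", "admin", fmt.Sprintf("guess%d", i), hour(0))
		fmt.Printf("attempt %d accepted: %v\n", i, ok)
	}
	fmt.Println("returning bot:", a.Attempt("203.0.113.7", "admin", "guess4", hour(5)))
}
```

In practice `acceptAfter` would be far higher than 3, given how many combinations a single bot tries.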

---

## Phase 1 - Foundation

Goal: A working SSH honeypot that logs attempts, stores them in SQLite, and can present a basic fake shell. Minimal but functional.

### 1.1 Project Setup ✅
- Go module, directory structure, basic configuration (YAML or TOML)
- Configuration for: listen address, SSH host key path/auto-generation, database path, web UI listen address
- Nix flake with devshell and package output
- NixOS module for easy deployment (listen address, config path, state directory, etc.)

### 1.2 SSH Server ✅
- Listen for SSH connections using x/crypto/ssh
- Handle authentication callbacks
- Log all login attempts (username, password, source IP, timestamp)
- Configurable credential list that triggers "successful" login
- Basic login realism: reject first N attempts before accepting

### 1.3 SQLite Storage ✅
- Schema: login_attempts table with deduplication (username, password, ip, count, first_seen, last_seen)
- Schema: sessions table for successful logins (id, ip, username, shell_name, connected_at, disconnected_at, human_score)
- Schema: session_logs table for command logging (session_id, timestamp, input, output)
- Retention policy: background goroutine that prunes old records on a schedule
- **Database migrations:** Version-tracked migrations using embedded SQL files. Store current schema version in a `schema_version` table, apply pending migrations on startup. Keep it simple - no external migration tool, just sequential numbered `.sql` files embedded in the binary.
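
Working out which migrations are pending can be sketched as follows. The `NNN_description.sql` naming scheme and the `PendingMigrations` function are assumptions made for the example. In the real binary the file names would come from reading the embedded directory (for instance with `fs.ReadDir` over an `embed.FS`), and each pending file would be applied in a transaction that also updates `schema_version`.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// Migration is one numbered SQL file.
type Migration struct {
	Version int
	Name    string
}

// PendingMigrations returns the migrations with a version greater than
// current, sorted by version. Files without a .sql suffix are ignored.
func PendingMigrations(names []string, current int) ([]Migration, error) {
	var pending []Migration
	seen := map[int]string{}
	for _, name := range names {
		if !strings.HasSuffix(name, ".sql") {
			continue
		}
		prefix, _, ok := strings.Cut(name, "_")
		if !ok {
			return nil, fmt.Errorf("migration %q: want NNN_description.sql", name)
		}
		v, err := strconv.Atoi(prefix)
		if err != nil || v <= 0 {
			return nil, fmt.Errorf("migration %q: invalid version prefix", name)
		}
		if prev, dup := seen[v]; dup {
			return nil, fmt.Errorf("version %d used by both %q and %q", v, prev, name)
		}
		seen[v] = name
		if v > current {
			pending = append(pending, Migration{Version: v, Name: name})
		}
	}
	sort.Slice(pending, func(i, j int) bool { return pending[i].Version < pending[j].Version })
	return pending, nil
}

func main() {
	files := []string{"003_keystrokes.sql", "001_init.sql", "002_sessions.sql"}
	pending, err := PendingMigrations(files, 1)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range pending {
		fmt.Printf("apply v%d from %s\n", m.Version, m.Name)
	}
}
```

Rejecting duplicate version numbers at startup avoids silently applying migrations in an ambiguous order.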

### 1.4 Shell Interface & Registry ✅
- Shell interface definition
- Registry with weighted random selection
- Basic bash-like shell:
  - Prompt that looks like `user@hostname:~$`
  - Handful of commands: `ls`, `cd`, `cat`, `pwd`, `whoami`, `uname`, `id`, `exit`
  - Fake filesystem with a few interesting-looking files
  - Log all input/output to the session_logs table

#### Session Context
Shells receive a `SessionContext` struct instead of just `ssh.Channel`, providing:
- `SessionID` (storage UUID)
- `Username` (authenticated user, from `ssh.ConnMetadata`)
- `RemoteAddr` (client IP, from `ssh.ConnMetadata`)
- `ClientVersion` (SSH client version string)
- `Store` (for session logging)

This lets shells build realistic prompts (`username@hostname:~$`) and log activity without needing direct access to the SSH connection.

#### Shell Configuration
- Define a `ShellConfig` sub-struct in the config with common fields: hostname, banner/MOTD, fake username
- Per-shell overrides via `map[string]map[string]any` (e.g. `[shell.bash]`, `[shell.cisco]`) so each Phase 3 shell can have its own knobs
- Shells receive the relevant config section, not the entire project config — keeps a clean boundary
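
A small sketch of how common fields and per-shell overrides could be merged into the view a single shell receives. The `For` method and the key names `hostname`, `banner`, and `fake_user` are assumptions for illustration; the struct shape follows the bullets above.

```go
package main

import "fmt"

// ShellConfig holds fields shared by all shells plus per-shell overrides.
type ShellConfig struct {
	Hostname  string
	Banner    string
	FakeUser  string
	Overrides map[string]map[string]any // keyed by shell name, e.g. "cisco"
}

// For returns the settings visible to one shell: the common fields,
// with that shell's overrides applied on top. Other shells' sections
// are never exposed.
func (c ShellConfig) For(shell string) map[string]any {
	view := map[string]any{
		"hostname":  c.Hostname,
		"banner":    c.Banner,
		"fake_user": c.FakeUser,
	}
	for k, v := range c.Overrides[shell] {
		view[k] = v
	}
	return view
}

func main() {
	cfg := ShellConfig{
		Hostname: "ubuntu-server",
		FakeUser: "root",
		Overrides: map[string]map[string]any{
			"cisco": {"hostname": "rtr-01", "ios_version": "15.2"},
		},
	}
	fmt.Println(cfg.For("bash")["hostname"], cfg.For("cisco")["hostname"])
}
```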

#### Transparent I/O Recording (designed for 2.3 Session Replay)
- Wrap `ssh.Channel` in a `RecordingChannel` before passing it to the shell
- `RecordingChannel` intercepts every `Read` (client input) and `Write` (server output), logging raw byte chunks with precise timestamps to storage
- Shells don't need to know about recording — they just read/write normally
- This ensures consistent, complete capture regardless of shell implementation, and avoids needing to refactor shells when session replay is added in Phase 2.3
- The current `session_logs` schema (input/output text pairs) may need a companion `session_keystrokes` table with `(session_id, timestamp, direction, data)` for byte-level replay fidelity — evaluate when implementing
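
The wrapping idea can be sketched against a plain `io.ReadWriter`, which `ssh.Channel` satisfies. This is a simplified illustration: `Chunk`, `Recorder`, `memRecorder`, `fakeConn`, and `newDemo` are hypothetical names, and a real wrapper would also forward the remaining `ssh.Channel` methods and write chunks to storage instead of memory.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"time"
)

// Direction says whether a chunk came from the client or the server.
type Direction string

const (
	In  Direction = "in"  // client input (Read)
	Out Direction = "out" // server output (Write)
)

// Chunk is one raw, timestamped piece of session I/O.
type Chunk struct {
	At   time.Time
	Dir  Direction
	Data []byte
}

// Recorder receives chunks; storage would implement this.
type Recorder interface{ Record(Chunk) }

// RecordingChannel forwards I/O to inner and records every byte that passes.
type RecordingChannel struct {
	inner io.ReadWriter
	rec   Recorder
}

// capture copies p before recording, since callers reuse their buffers.
func (r *RecordingChannel) capture(dir Direction, p []byte) {
	if len(p) == 0 {
		return
	}
	r.rec.Record(Chunk{At: time.Now(), Dir: dir, Data: append([]byte(nil), p...)})
}

func (r *RecordingChannel) Read(p []byte) (int, error) {
	n, err := r.inner.Read(p)
	r.capture(In, p[:n])
	return n, err
}

func (r *RecordingChannel) Write(p []byte) (int, error) {
	n, err := r.inner.Write(p)
	r.capture(Out, p[:n])
	return n, err
}

// memRecorder keeps chunks in memory for the demo.
type memRecorder struct{ chunks []Chunk }

func (m *memRecorder) Record(c Chunk) { m.chunks = append(m.chunks, c) }

// fakeConn stands in for an SSH channel: scripted input, buffered output.
type fakeConn struct {
	in  *strings.Reader
	out bytes.Buffer
}

func (f *fakeConn) Read(p []byte) (int, error)  { return f.in.Read(p) }
func (f *fakeConn) Write(p []byte) (int, error) { return f.out.Write(p) }

// newDemo builds a recording channel around scripted client input.
func newDemo(input string) (*RecordingChannel, *memRecorder) {
	rec := &memRecorder{}
	return &RecordingChannel{inner: &fakeConn{in: strings.NewReader(input)}, rec: rec}, rec
}

func main() {
	ch, rec := newDemo("whoami\n")
	buf := make([]byte, 64)
	ch.Read(buf)               // the shell reads the command as usual
	fmt.Fprintf(ch, "root\n") // and writes its reply as usual
	for _, c := range rec.chunks {
		fmt.Printf("%s %q\n", c.Dir, c.Data)
	}
}
```

The shell code only ever sees an `io.ReadWriter`, which is what keeps recording independent of any particular shell.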

### 1.5 Minimal Web UI ✅
- Embedded static assets (Go embed)
- Dashboard: total attempts, attempts over time, unique IPs
- Tables: top usernames, top passwords, top source IPs
- List of active/recent sessions

---

## Phase 2 - Detection & Notification

Goal: Detect likely-human sessions and make the system smarter.

### 2.1 Human Detection Scoring ✅
- Keystroke timing analysis
- Track backspace, tab, arrow key usage
- Command diversity scoring
- Compute per-session human score, store in sessions table
- Flag sessions above configurable threshold

### 2.2 Notifications ✅
- Webhook support (generic HTTP POST, works with Slack/Discord/ntfy)
- Configurable triggers: human score threshold crossed, new session started
- Include session details in payload
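
A generic webhook notifier could be as small as the sketch below. The `Event` fields, the event type strings, and `NotifyConfig` are assumptions for illustration. Services such as Slack or Discord expect their own payload shapes, so a real implementation would likely need per-target formatting.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

// Event is the JSON payload sent to the webhook. Field names are illustrative.
type Event struct {
	Type       string  `json:"type"`
	SessionID  string  `json:"session_id"`
	IP         string  `json:"ip"`
	Username   string  `json:"username"`
	HumanScore float64 `json:"human_score"`
}

// NotifyConfig selects which events are sent and where.
type NotifyConfig struct {
	URL            string
	OnNewSession   bool
	HumanThreshold float64 // 0 disables the human-score trigger
}

// ShouldNotify applies the configured triggers to an event.
func ShouldNotify(cfg NotifyConfig, ev Event) bool {
	switch ev.Type {
	case "session_started":
		return cfg.OnNewSession
	case "human_detected":
		return cfg.HumanThreshold > 0 && ev.HumanScore >= cfg.HumanThreshold
	}
	return false
}

// Payload encodes the event as JSON.
func Payload(ev Event) ([]byte, error) { return json.Marshal(ev) }

// Send POSTs the event to the configured URL and treats any
// status of 300 or above as a failure.
func Send(ctx context.Context, client *http.Client, cfg NotifyConfig, ev Event) error {
	body, err := Payload(ev)
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, cfg.URL, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("webhook returned %s", resp.Status)
	}
	return nil
}

func main() {
	cfg := NotifyConfig{HumanThreshold: 0.7}
	ev := Event{Type: "human_detected", SessionID: "abc", IP: "203.0.113.7",
		Username: "root", HumanScore: 0.82}
	body, _ := Payload(ev)
	fmt.Println(ShouldNotify(cfg, ev), string(body))
}
```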

### 2.3 Session Replay ✅
- Store keystroke-by-keystroke data with timing information
- Web UI: replay a session in a terminal-like viewer, with commands playing back in real time
- Filter/sort sessions by human score

### 2.4 Adaptive Shell Routing
- If early keystrokes suggest a bot, route to basic shell or disconnect
- If keystrokes suggest a human, route to a more interesting shell

---

## Phase 3 - Fun Shells

Goal: Add the entertaining shell implementations.

### 3.1 Bash Shell Variations
- **Infinite sudo:** always asks for password, never works, logs every attempt
- **Slow decay:** shell gets progressively slower, commands take longer and longer
- **Haunted:** commands gradually return stranger output, files appear/disappear, `whoami` returns different users
- **Breadcrumbs:** fake `.bash_history`, `id_rsa` files, database configs pointing to other honeypots

### 3.2 Cisco IOS Shell ✅
- Realistic `>` and `#` prompts
- Common commands: `show running-config`, `show interfaces`, `enable`, `configure terminal`
- Fake device info that looks like a real router

### 3.3 Smart Fridge Shell ✅
- Samsung FridgeOS boot banner
- Inventory management commands
- Temperature warnings
- "WARNING: milk expires in 2 days"
- Per-credential shell routing via `shell` field in static credentials

### 3.4 Text Adventure ✅
- Zork-style dungeon crawler
- "You are in a dimly lit server room."
- Navigation, items, puzzles
- The dungeon is the oubliette itself

### 3.5 Banking TUI Shell ✅
- 80s-style green-on-black bank terminal

### 3.6 Other Shell Ideas (Future)
- **Nuclear launch terminal:** "ENTER LAUNCH AUTHORIZATION CODE"
- **ELIZA therapist:** every response is a therapy question
- **Pizza ordering terminal:** "Welcome to PizzaNet v2.3"
- **Haiku shell:** every response is a haiku

---

## Phase 4 - Polish

Goal: Make the web UI great and add operational niceties.

### 4.1 Enhanced Web UI
- GeoIP lookups and world map visualization of attack sources
- Charts: attempts over time, hourly patterns, credential trends
- Session detail view with full command log
- Filtering and search

### 4.2 Operational ✅
- Prometheus metrics endpoint ✅
- Structured logging (slog) ✅
- Graceful shutdown ✅
- Docker image (nix dockerTools) ✅
- Systemd unit file / deployment docs ✅

### 4.3 GeoIP ✅
- Embed a lightweight GeoIP database or use an API ✅
- Store country/city with each attempt ✅
- Aggregate stats by country ✅

### 4.4 Capture SSH Exec Commands
Many bots send a command directly via `ssh user@host <command>` (an SSH "exec" request) rather than requesting an interactive shell. Currently these are rejected and the command is lost. We should capture them.

- Handle `"exec"` request type in the server's request loop (alongside `"pty-req"` and `"shell"`)
- Parse the command string from the exec payload
- Add an `exec_command` column (nullable) to the `sessions` table via a new migration
- Store the command on the session record before closing the channel
- Optionally return plausible fake output for common commands (e.g. `uname`, `id`, `cat /etc/passwd`) to encourage further interaction
- Surface exec commands in the web UI (session detail view)
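
Parsing the command out of an exec request is straightforward: per RFC 4254 the payload is a single SSH string, that is, a big-endian `uint32` length followed by that many bytes. The sketch below decodes it with the standard library only. `ParseExecPayload` is a hypothetical helper; with `golang.org/x/crypto/ssh`, `ssh.Unmarshal` into a struct with one string field is the more usual route.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// ParseExecPayload extracts the command from an SSH "exec" request payload,
// which is encoded as a uint32 length prefix followed by the command bytes.
func ParseExecPayload(payload []byte) (string, error) {
	if len(payload) < 4 {
		return "", errors.New("exec payload shorter than its length prefix")
	}
	n := binary.BigEndian.Uint32(payload[:4])
	rest := payload[4:]
	if uint64(n) > uint64(len(rest)) {
		return "", fmt.Errorf("exec payload declares %d bytes but carries %d", n, len(rest))
	}
	return string(rest[:n]), nil
}

func main() {
	// Build a payload the way a client would for: ssh user@host uname -a
	cmd := "uname -a"
	payload := make([]byte, 4, 4+len(cmd))
	binary.BigEndian.PutUint32(payload, uint32(len(cmd)))
	payload = append(payload, cmd...)

	got, err := ParseExecPayload(payload)
	fmt.Printf("command=%q err=%v\n", got, err)
}
```

The parsed string is what would be stored in the `exec_command` column.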

#### 4.4.1 Fake Exec Output
Return plausible fake output for exec commands to encourage bots to interact further.

**Approach: regex-based output assembly.** Bots typically send a single long command that chains recon commands and then echoes a summary (e.g. `echo "UNAME:$uname"`). Rather than interpreting arbitrary shell pipelines, we scan the command string for known patterns and assemble fake output.

Implementation:
- A map of common command/variable patterns to fake output strings, e.g.:
  - `uname -a` / `uname -s -v -n -m` → `"Linux ubuntu-server 5.15.0-91-generic #101-Ubuntu SMP Tue Jan 2 15:13:10 UTC 2024 x86_64"`
  - `uname -m` / `arch` → `"x86_64"`
  - `cat /proc/uptime` → `"86432.71 172801.55"`
  - `nproc` / `grep -c "^processor" /proc/cpuinfo` → `"2"`
  - `cat /proc/cpuinfo` → fake cpuinfo block
  - `lspci` → empty (no GPU — discourages cryptominer targeting)
  - `id` → `"uid=0(root) gid=0(root) groups=0(root)"`
  - `cat /etc/passwd` → minimal fake passwd file
  - `last` → fake login entries
  - `cat --help`, `ls --help` → canned GNU coreutils help text
- Scan the exec command for `echo "KEY:$var"` patterns; for each key, find where `$var` was assigned earlier in the command and substitute the fake output of the command it was assigned from
- If we recognise echo patterns, assemble and return the expected output
- If we don't recognise the command at all, return empty output with exit 0 (current behaviour)
- Values should draw from the existing shell config where possible (hostname, fake_user) for consistency
- New package `internal/execfake` or a file in `internal/server/` — keep it simple
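
The regex-based assembly could be sketched as below. Several things here are assumptions: that bots assign variables with `var=$(command)`, the two regular expressions, and the `FakeExec` function name. The fake values are taken from the list above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// assignRe matches shell assignments of the form name=$(command).
	assignRe = regexp.MustCompile(`(\w+)=\$\(([^)]*)\)`)
	// echoRe matches summary lines of the form echo "KEY:$name".
	echoRe = regexp.MustCompile(`echo\s+"(\w+):\$(\w+)"`)
)

// fakeOutputs maps known recon commands to canned output.
var fakeOutputs = map[string]string{
	"uname -a": "Linux ubuntu-server 5.15.0-91-generic #101-Ubuntu SMP Tue Jan 2 15:13:10 UTC 2024 x86_64",
	"uname -m": "x86_64",
	"arch":     "x86_64",
	"nproc":    "2",
	"id":       "uid=0(root) gid=0(root) groups=0(root)",
}

// FakeExec returns plausible output for an exec command, or "" when
// nothing in the command is recognised.
func FakeExec(cmd string) string {
	// Resolve each assigned variable to the fake output of its command.
	vars := map[string]string{}
	for _, m := range assignRe.FindAllStringSubmatch(cmd, -1) {
		vars[m[1]] = fakeOutputs[strings.TrimSpace(m[2])]
	}
	// Assemble one output line per echo "KEY:$var" pattern.
	var out strings.Builder
	for _, m := range echoRe.FindAllStringSubmatch(cmd, -1) {
		fmt.Fprintf(&out, "%s:%s\n", m[1], vars[m[2]])
	}
	if out.Len() > 0 {
		return out.String()
	}
	// Fall back to a direct lookup for a single known command.
	if v, ok := fakeOutputs[strings.TrimSpace(cmd)]; ok {
		return v + "\n"
	}
	return ""
}

func main() {
	cmd := `cpus=$(nproc); arch=$(uname -m); echo "CPUS:$cpus"; echo "ARCH:$arch"`
	fmt.Print(FakeExec(cmd))
}
```

Unknown variables resolve to an empty value, so an unrecognised recon command still yields a well-formed `KEY:` line instead of breaking the bot's parser.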

Gather more real-world bot examples before implementing to ensure good coverage of common recon patterns.