# Oubliette - SSH Honeypot

A fun SSH honeypot that logs login attempts, presents fake shells to "successful" logins, and tries to detect when a real human is poking around.

The name comes from the medieval dungeon concept - a place you throw people into and forget about them.

## Tech Stack

- **Language:** Go
- **SSH:** golang.org/x/crypto/ssh
- **Database:** SQLite
- **Web UI:** Go templates + htmx
- **Deployment:** Single binary with embedded assets

## Core Concepts

### Shell Profiles

Logins that "succeed" are routed to a fake shell. Shells are selected by weighted random from a registry. Each shell implements a common interface, making it easy to add new ones.

```go
type Shell interface {
	Name() string
	Description() string
	Handle(ctx context.Context, ch ssh.Channel) error
}
```

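The weighted random selection described above could be sketched roughly as follows. This is only an illustration: the `Registry` type, its methods, and the example weights are hypothetical, and shell names stand in for real `Shell` values to keep the block self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// entry pairs a shell name with its selection weight. A real registry would
// hold Shell implementations; names are used here only for illustration.
type entry struct {
	name   string
	weight int
}

// Registry is a hypothetical weighted shell registry.
type Registry struct {
	entries []entry
	total   int
}

// Register adds a shell with a positive weight.
func (r *Registry) Register(name string, weight int) error {
	if weight <= 0 {
		return errors.New("weight must be positive")
	}
	r.entries = append(r.entries, entry{name: name, weight: weight})
	r.total += weight
	return nil
}

// pickAt maps n in [0, total) to a shell by walking the cumulative weights.
func (r *Registry) pickAt(n int) string {
	for _, e := range r.entries {
		if n < e.weight {
			return e.name
		}
		n -= e.weight
	}
	return ""
}

// Pick selects a shell at random, proportionally to its weight.
func (r *Registry) Pick() (string, error) {
	if r.total == 0 {
		return "", errors.New("no shells registered")
	}
	return r.pickAt(rand.Intn(r.total)), nil
}

func main() {
	r := &Registry{}
	r.Register("bash", 8)  // chosen roughly 80% of the time
	r.Register("cisco", 2) // chosen roughly 20% of the time
	name, err := r.Pick()
	fmt.Println(name, err)
}
```

Splitting the deterministic `pickAt` from the random `Pick` keeps the selection logic easy to unit test.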
### Smart Storage

To avoid the database growing unbounded on a small VPS:

- **Deduplication:** Store unique (username, password, IP) combinations with a count + first_seen/last_seen timestamps instead of one row per attempt.
- **Retention policy:** Configurable auto-pruning of records older than N days.
- **Aggregation:** Optionally roll up old raw data into daily summary tables before pruning.

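To make the deduplication and retention ideas concrete, here is a minimal in-memory model. It is a stand-in for the SQLite table, not the actual storage code: the type and method names are hypothetical, and timestamps are plain Unix seconds to keep the sketch small.

```go
package main

import (
	"fmt"
	"time"
)

// attemptKey identifies one unique (username, password, IP) combination.
type attemptKey struct{ Username, Password, IP string }

// attemptStats holds the counters kept per unique combination.
type attemptStats struct {
	Count     int
	FirstSeen int64 // Unix seconds
	LastSeen  int64 // Unix seconds
}

// dedupStore is a hypothetical in-memory model of the login_attempts table.
type dedupStore struct{ rows map[attemptKey]*attemptStats }

func newDedupStore() *dedupStore {
	return &dedupStore{rows: make(map[attemptKey]*attemptStats)}
}

// Record inserts a new row or bumps the count and last_seen of an existing one.
func (s *dedupStore) Record(username, password, ip string, at int64) {
	k := attemptKey{username, password, ip}
	st, ok := s.rows[k]
	if !ok {
		s.rows[k] = &attemptStats{Count: 1, FirstSeen: at, LastSeen: at}
		return
	}
	st.Count++
	if at > st.LastSeen {
		st.LastSeen = at
	}
}

// Prune drops rows last seen before cutoff and returns how many were removed.
func (s *dedupStore) Prune(cutoff int64) int {
	removed := 0
	for k, st := range s.rows {
		if st.LastSeen < cutoff {
			delete(s.rows, k)
			removed++
		}
	}
	return removed
}

func main() {
	s := newDedupStore()
	now := time.Now().Unix()
	s.Record("root", "123456", "203.0.113.5", now-10)
	s.Record("root", "123456", "203.0.113.5", now)
	fmt.Println("unique rows:", len(s.rows))
}
```

In SQLite the same effect would typically come from a unique index on the three columns plus an upsert, but the exact schema is left to the implementation.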
### Human Detection

Score sessions based on signals that distinguish humans from bots:

- Keystroke timing (variable delays vs instant paste)
- Typos and backspace usage
- Tab completion and arrow key usage
- Adaptive behavior (commands that respond to previous output)
- Command diversity
- Session duration

Sessions crossing a human-likelihood threshold get flagged for review and can trigger webhook notifications.

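One way to combine these signals is a simple additive score. The sketch below is an assumption-heavy illustration: the `Signals` struct, the point values, and the cut-offs are all invented for the example, and the adaptive-behavior signal is left out because it needs command/output analysis.

```go
package main

import (
	"fmt"
	"math"
)

// Signals is a hypothetical summary of one session's input behaviour.
type Signals struct {
	KeyDelaysMs     []float64 // delays between consecutive keystrokes
	Backspaces      int
	TabsOrArrows    int
	UniqueCommands  int
	DurationSeconds float64
}

// delayVariation returns the coefficient of variation (stddev / mean) of the
// keystroke delays. Pasted or scripted input tends to be close to zero.
func delayVariation(d []float64) float64 {
	if len(d) < 2 {
		return 0
	}
	var sum float64
	for _, v := range d {
		sum += v
	}
	mean := sum / float64(len(d))
	if mean == 0 {
		return 0
	}
	var sq float64
	for _, v := range d {
		sq += (v - mean) * (v - mean)
	}
	return math.Sqrt(sq/float64(len(d))) / mean
}

// HumanScore returns 0-100; weights and cut-offs are illustrative guesses.
func HumanScore(s Signals) int {
	score := 0
	if delayVariation(s.KeyDelaysMs) > 0.3 {
		score += 35
	}
	if s.Backspaces > 0 {
		score += 20
	}
	if s.TabsOrArrows > 0 {
		score += 20
	}
	if s.UniqueCommands >= 4 {
		score += 15
	}
	if s.DurationSeconds >= 60 {
		score += 10
	}
	return score
}

// Flagged reports whether a session reaches the configured threshold.
func Flagged(s Signals, threshold int) bool { return HumanScore(s) >= threshold }

func main() {
	s := Signals{KeyDelaysMs: []float64{120, 340, 90, 800, 210}, Backspaces: 2}
	fmt.Println(HumanScore(s), Flagged(s, 50))
}
```

Integer points avoid floating-point comparison surprises when the score is checked against a configured threshold.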
### Login Realism

- Don't accept every attempt. Most attempts should fail. Bots commonly try thousands of combinations from a single IP (20k+ is not unusual), so the acceptance threshold should be high and configurable.
- **Credential memory:** When a credential is accepted, store it as a "valid" credential for a configurable TTL (e.g. 24-72 hours). If the same bot returns with the same username/password, it gets in immediately - making the credential appear legitimate and encouraging further interaction.
- Acceptance strategy is configurable: after N failed attempts from an IP, accept the next attempt (whatever the credentials are) and remember that combo.
- Optionally also support a static list of always-accepted credentials for testing.

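The acceptance strategy and credential memory might look something like this. It is a sketch under assumptions: the `Authenticator` type is hypothetical, remembered credentials are keyed by username/password only (not by IP), time is passed in as Unix seconds, and concurrent access would need a mutex.

```go
package main

import "fmt"

// cred is a username/password pair.
type cred struct{ Username, Password string }

// Authenticator is a hypothetical model of the login realism rules.
type Authenticator struct {
	threshold  int            // failed attempts per IP before accepting
	ttl        int64          // seconds a remembered credential stays valid
	failures   map[string]int // failed attempts per IP
	remembered map[cred]int64 // credential -> expiry (Unix seconds)
}

func NewAuthenticator(threshold int, ttlSeconds int64) *Authenticator {
	return &Authenticator{
		threshold:  threshold,
		ttl:        ttlSeconds,
		failures:   make(map[string]int),
		remembered: make(map[cred]int64),
	}
}

// Attempt reports whether a login should "succeed".
func (a *Authenticator) Attempt(ip, username, password string, now int64) bool {
	c := cred{username, password}
	// A previously accepted credential gets in immediately until it expires.
	if exp, ok := a.remembered[c]; ok {
		if now < exp {
			return true
		}
		delete(a.remembered, c)
	}
	// After enough failures from this IP, accept whatever comes next
	// and remember that combination.
	if a.failures[ip] >= a.threshold {
		a.remembered[c] = now + a.ttl
		a.failures[ip] = 0
		return true
	}
	a.failures[ip]++
	return false
}

func main() {
	a := NewAuthenticator(2, 3600)
	for i, pw := range []string{"admin", "123456", "toor"} {
		fmt.Println(pw, a.Attempt("203.0.113.5", "root", pw, int64(i)))
	}
}
```

With a threshold of 2, the first two passwords are rejected and the third is accepted and remembered.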
---

## Phase 1 - Foundation

Goal: A working SSH honeypot that logs attempts, stores them in SQLite, and can present a basic fake shell. Minimal but functional.

### 1.1 Project Setup ✅

- Go module, directory structure, basic configuration (YAML or TOML)
- Configuration for: listen address, SSH host key path/auto-generation, database path, web UI listen address
- Nix flake with devshell and package output
- NixOS module for easy deployment (listen address, config path, state directory, etc.)

### 1.2 SSH Server ✅

- Listen for SSH connections using x/crypto/ssh
- Handle authentication callbacks
- Log all login attempts (username, password, source IP, timestamp)
- Configurable credential list that triggers "successful" login
- Basic login realism: reject first N attempts before accepting

### 1.3 SQLite Storage ✅

- Schema: login_attempts table with deduplication (username, password, ip, count, first_seen, last_seen)
- Schema: sessions table for successful logins (id, ip, username, shell_name, connected_at, disconnected_at, human_score)
- Schema: session_logs table for command logging (session_id, timestamp, input, output)
- Retention policy: background goroutine that prunes old records on a schedule
- **Database migrations:** Version-tracked migrations using embedded SQL files. Store current schema version in a `schema_version` table, apply pending migrations on startup. Keep it simple - no external migration tool, just sequential numbered `.sql` files embedded in the binary.

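The "apply pending migrations on startup" step boils down to picking the files whose number is above the stored schema version. Here is a rough sketch of that selection; the `NNN_name.sql` naming scheme and the `pendingMigrations` helper are assumptions, and reading the embedded files and executing the SQL are left out.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// migration pairs a parsed version number with its file name.
type migration struct {
	version int
	name    string
}

// pendingMigrations returns the file names whose numeric prefix is greater
// than the current schema version, ordered by version. File names are assumed
// to look like "002_sessions.sql".
func pendingMigrations(names []string, current int) ([]string, error) {
	var ms []migration
	for _, n := range names {
		prefix, _, ok := strings.Cut(n, "_")
		if !ok || !strings.HasSuffix(n, ".sql") {
			return nil, fmt.Errorf("unexpected migration file name %q", n)
		}
		v, err := strconv.Atoi(prefix)
		if err != nil {
			return nil, fmt.Errorf("bad version prefix in %q: %w", n, err)
		}
		if v > current {
			ms = append(ms, migration{version: v, name: n})
		}
	}
	sort.Slice(ms, func(i, j int) bool { return ms[i].version < ms[j].version })
	out := make([]string, len(ms))
	for i, m := range ms {
		out[i] = m.name
	}
	return out, nil
}

func main() {
	files := []string{"003_exec.sql", "001_init.sql", "002_sessions.sql"}
	pending, err := pendingMigrations(files, 1)
	fmt.Println(pending, err)
}
```

Each pending file would then be executed in a transaction that also updates `schema_version`, so a failed migration does not leave the version out of step with the schema.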
### 1.4 Shell Interface & Registry ✅

- Shell interface definition
- Registry with weighted random selection
- Basic bash-like shell:
  - Prompt that looks like `user@hostname:~$`
  - Handful of commands: `ls`, `cd`, `cat`, `pwd`, `whoami`, `uname`, `id`, `exit`
  - Fake filesystem with a few interesting-looking files
- Log all input/output to the session_logs table

#### Session Context

Shells receive a `SessionContext` struct instead of just `ssh.Channel`, providing:

- `SessionID` (storage UUID)
- `Username` (authenticated user, from `ssh.ConnMetadata`)
- `RemoteAddr` (client IP, from `ssh.ConnMetadata`)
- `ClientVersion` (SSH client version string)
- `Store` (for session logging)

This lets shells build realistic prompts (`username@hostname:~$`) and log activity without needing direct access to the SSH connection.

#### Shell Configuration

- Define a `ShellConfig` sub-struct in the config with common fields: hostname, banner/MOTD, fake username
- Per-shell overrides via `map[string]map[string]any` (e.g. `[shell.bash]`, `[shell.cisco]`) so each Phase 3 shell can have its own knobs
- Shells receive the relevant config section, not the entire project config, which keeps a clean boundary

#### Transparent I/O Recording (designed for 2.3 Session Replay)

- Wrap `ssh.Channel` in a `RecordingChannel` before passing it to the shell
- `RecordingChannel` intercepts every `Read` (client input) and `Write` (server output), logging raw byte chunks with precise timestamps to storage
- Shells don't need to know about recording; they just read/write normally
- This ensures consistent, complete capture regardless of shell implementation, and avoids needing to refactor shells when session replay is added in Phase 2.3
- The current `session_logs` schema (input/output text pairs) may need a companion `session_keystrokes` table with `(session_id, timestamp, direction, data)` for byte-level replay fidelity; evaluate when implementing

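A wrapper along these lines would give the transparent recording described above. It is a simplified sketch: it wraps a plain `io.ReadWriter` rather than the full `ssh.Channel` interface (which also has `Close`, `Stderr`, and so on), and the `Event`, `Recorder`, and `memRecorder` names are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"time"
)

// Direction marks whether a chunk came from the client or the server.
type Direction string

const (
	In  Direction = "in"  // bytes read from the client
	Out Direction = "out" // bytes written to the client
)

// Event is one recorded chunk with its timestamp.
type Event struct {
	At   time.Time
	Dir  Direction
	Data []byte
}

// Recorder is a hypothetical sink, e.g. backed by a session_keystrokes table.
type Recorder interface{ Record(Event) }

// memRecorder keeps events in memory; useful for tests and demos.
type memRecorder struct{ events []Event }

func (m *memRecorder) Record(e Event) { m.events = append(m.events, e) }

// RecordingChannel passes reads and writes through and records each chunk.
type RecordingChannel struct {
	inner io.ReadWriter
	rec   Recorder
}

func (c *RecordingChannel) Read(p []byte) (int, error) {
	n, err := c.inner.Read(p)
	if n > 0 {
		// Copy the data: the caller may reuse p after Read returns.
		c.rec.Record(Event{At: time.Now(), Dir: In, Data: append([]byte(nil), p[:n]...)})
	}
	return n, err
}

func (c *RecordingChannel) Write(p []byte) (int, error) {
	n, err := c.inner.Write(p)
	if n > 0 {
		c.rec.Record(Event{At: time.Now(), Dir: Out, Data: append([]byte(nil), p[:n]...)})
	}
	return n, err
}

// newBufferedChannel wraps an in-memory buffer preloaded with client input.
// Writes land in the same buffer, which is fine for a demo.
func newBufferedChannel(input string, rec Recorder) *RecordingChannel {
	return &RecordingChannel{inner: bytes.NewBufferString(input), rec: rec}
}

func main() {
	rec := &memRecorder{}
	ch := newBufferedChannel("whoami\n", rec)
	buf := make([]byte, 64)
	n, _ := ch.Read(buf)
	fmt.Fprintf(ch, "root\n")
	fmt.Printf("read %q, recorded %d events\n", buf[:n], len(rec.events))
}
```

Because shells only see `Read` and `Write`, any shell written against the interface is recorded without changes.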
### 1.5 Minimal Web UI ✅

- Embedded static assets (Go embed)
- Dashboard: total attempts, attempts over time, unique IPs
- Tables: top usernames, top passwords, top source IPs
- List of active/recent sessions

---

## Phase 2 - Detection & Notification

Goal: Detect likely-human sessions and make the system smarter.

### 2.1 Human Detection Scoring ✅

- Keystroke timing analysis
- Track backspace, tab, arrow key usage
- Command diversity scoring
- Compute per-session human score, store in sessions table
- Flag sessions above configurable threshold

### 2.2 Notifications ✅

- Webhook support (generic HTTP POST, works with Slack/Discord/ntfy)
- Configurable triggers: human score threshold crossed, new session started
- Include session details in payload

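A generic webhook sender could be as small as the sketch below. The `Notification` struct and its JSON field names are assumptions made for the example, not the project's actual payload format; the demo posts to a local test server from `net/http/httptest`.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Notification is a hypothetical payload; the field names are assumptions.
type Notification struct {
	Event      string  `json:"event"`
	SessionID  string  `json:"session_id"`
	HumanScore float64 `json:"human_score"`
}

// buildPayload encodes the notification as JSON.
func buildPayload(n Notification) ([]byte, error) { return json.Marshal(n) }

// send POSTs the notification and treats non-2xx responses as errors.
func send(ctx context.Context, client *http.Client, url string, n Notification) error {
	body, err := buildPayload(n)
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return fmt.Errorf("webhook returned %s", resp.Status)
	}
	return nil
}

func main() {
	// Local stand-in for a real webhook endpoint.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	n := Notification{Event: "human_detected", SessionID: "abc123", HumanScore: 0.9}
	fmt.Println("send error:", send(ctx, srv.Client(), srv.URL, n))
}
```

The context timeout keeps a slow or unreachable webhook endpoint from stalling session handling.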
### 2.3 Session Replay ✅

- Store keystroke-by-keystroke data with timing information
- Web UI: replay a session in a terminal-like viewer, watching commands play back in real-time
- Filter/sort sessions by human score

### 2.4 Adaptive Shell Routing

- If early keystrokes suggest a bot, route to basic shell or disconnect
- If keystrokes suggest a human, route to a more interesting shell

---

## Phase 3 - Fun Shells

Goal: Add the entertaining shell implementations.

### 3.1 Bash Shell Variations

- **Infinite sudo:** always asks for password, never works, logs every attempt
- **Slow decay:** shell gets progressively slower, commands take longer and longer
- **Haunted:** commands gradually return stranger output, files appear/disappear, `whoami` returns different users
- **Bread crumbs:** fake .bash_history, id_rsa files, database configs pointing to other honeypots

### 3.2 Cisco IOS Shell ✅

- Realistic `>` and `#` prompts
- Common commands: `show running-config`, `show interfaces`, `enable`, `configure terminal`
- Fake device info that looks like a real router

### 3.3 Smart Fridge Shell ✅

- Samsung FridgeOS boot banner
- Inventory management commands
- Temperature warnings
- "WARNING: milk expires in 2 days"
- Per-credential shell routing via `shell` field in static credentials

### 3.4 Text Adventure ✅

- Zork-style dungeon crawler
- "You are in a dimly lit server room."
- Navigation, items, puzzles
- The dungeon is the oubliette itself

### 3.5 Banking TUI Shell ✅

- 80s-style green-on-black bank terminal

### 3.6 Other Shell Ideas (Future)

- **Nuclear launch terminal:** "ENTER LAUNCH AUTHORIZATION CODE"
- **ELIZA therapist:** every response is a therapy question
- **Pizza ordering terminal:** "Welcome to PizzaNet v2.3"
- **Haiku shell:** every response is a haiku

---

## Phase 4 - Polish

Goal: Make the web UI great and add operational niceties.

### 4.1 Enhanced Web UI

- GeoIP lookups and world map visualization of attack sources
- Charts: attempts over time, hourly patterns, credential trends
- Session detail view with full command log
- Filtering and search

### 4.2 Operational ✅

- Prometheus metrics endpoint ✅
- Structured logging (slog) ✅
- Graceful shutdown ✅
- Docker image (nix dockerTools) ✅
- Systemd unit file / deployment docs ✅

### 4.3 GeoIP ✅

- Embed a lightweight GeoIP database or use an API ✅
- Store country/city with each attempt ✅
- Aggregate stats by country ✅

### 4.4 Capture SSH Exec Commands

Many bots send a command directly via `ssh user@host <command>` (an SSH "exec" request) rather than requesting an interactive shell. Currently these are rejected and the command is lost. We should capture them.

- Handle `"exec"` request type in the server's request loop (alongside `"pty-req"` and `"shell"`)
- Parse the command string from the exec payload
- Add an `exec_command` column (nullable) to the `sessions` table via a new migration
- Store the command on the session record before closing the channel
- Optionally return plausible fake output for common commands (e.g. `uname`, `id`, `cat /etc/passwd`) to encourage further interaction
- Surface exec commands in the web UI (session detail view)

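For the payload-parsing step: an SSH exec request carries the command as a single length-prefixed string (a big-endian uint32 length followed by the bytes, per RFC 4254). The sketch below decodes that by hand so it stays dependency-free; `parseExecPayload` is a hypothetical helper, and in the server `ssh.Unmarshal` from x/crypto/ssh could likely do the same job.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// parseExecPayload extracts the command from an SSH "exec" request payload:
// a 4-byte big-endian length followed by that many bytes of command text.
func parseExecPayload(payload []byte) (string, error) {
	if len(payload) < 4 {
		return "", errors.New("exec payload too short")
	}
	n := binary.BigEndian.Uint32(payload[:4])
	rest := payload[4:]
	if uint64(n) > uint64(len(rest)) {
		return "", errors.New("exec payload length exceeds data")
	}
	return string(rest[:n]), nil
}

func main() {
	// Build a payload the way an SSH client would encode "uname -a".
	cmd := "uname -a"
	payload := make([]byte, 4+len(cmd))
	binary.BigEndian.PutUint32(payload, uint32(len(cmd)))
	copy(payload[4:], cmd)

	got, err := parseExecPayload(payload)
	fmt.Printf("%q %v\n", got, err)
}
```

Validating the declared length against the actual data avoids a slice-bounds panic on malformed payloads, which matters for a service that is expected to receive hostile input.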
#### 4.4.1 Fake Exec Output

Return plausible fake output for exec commands to encourage bots to interact further.

**Approach: regex-based output assembly.** Bots typically send a single long command that chains recon commands and then echoes a summary (e.g. `echo "UNAME:$uname"`). Rather than interpreting arbitrary shell pipelines, we scan the command string for known patterns and assemble fake output.

Implementation:

- A map of common command/variable patterns to fake output strings, e.g.:
  - `uname -a` / `uname -s -v -n -m` → `"Linux ubuntu-server 5.15.0-91-generic #101-Ubuntu SMP Tue Jan 2 15:13:10 UTC 2024 x86_64"`
  - `uname -m` / `arch` → `"x86_64"`
  - `cat /proc/uptime` → `"86432.71 172801.55"`
  - `nproc` / `grep -c "^processor" /proc/cpuinfo` → `"2"`
  - `cat /proc/cpuinfo` → fake cpuinfo block
  - `lspci` → empty (no GPU, which discourages cryptominer targeting)
  - `id` → `"uid=0(root) gid=0(root) groups=0(root)"`
  - `cat /etc/passwd` → minimal fake passwd file
  - `last` → fake login entries
  - `cat --help`, `ls --help` → canned GNU coreutils help text
- Scan the exec command for `echo "KEY:$var"` patterns; for each key, look up the corresponding fake value from the variable assignment earlier in the command
- If we recognise echo patterns, assemble and return the expected output
- If we don't recognise the command at all, return empty output with exit 0 (current behaviour)
- Values should draw from the existing shell config where possible (hostname, fake_user) for consistency
- New package `internal/execfake` or a file in `internal/server/`; keep it simple

Gather more real-world bot examples before implementing to ensure good coverage of common recon patterns.

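As a starting point, the regex-based assembly could look roughly like this. Everything here is a sketch under assumptions: the function and variable names are hypothetical, only `var=$(cmd)` assignments and `echo "KEY:$var"` lines are recognised, nested substitutions and backticks are ignored, and the pattern table covers just a few of the commands listed above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

const fakeUname = "Linux ubuntu-server 5.15.0-91-generic #101-Ubuntu SMP Tue Jan 2 15:13:10 UTC 2024 x86_64"

// knownCommands maps command patterns to canned output (a small subset).
var knownCommands = []struct {
	re  *regexp.Regexp
	out string
}{
	{regexp.MustCompile(`^uname\s+(-a|-s\s+-v\s+-n\s+-m)$`), fakeUname},
	{regexp.MustCompile(`^(uname\s+-m|arch)$`), "x86_64"},
	{regexp.MustCompile(`^(nproc|grep\s+-c\s+"\^processor"\s+/proc/cpuinfo)$`), "2"},
	{regexp.MustCompile(`^id$`), "uid=0(root) gid=0(root) groups=0(root)"},
}

var (
	// assignRe matches simple assignments such as arch=$(uname -m).
	assignRe = regexp.MustCompile(`(\w+)=\$\(([^)]*)\)`)
	// echoRe matches summary lines such as echo "ARCH:$arch".
	echoRe = regexp.MustCompile(`echo\s+"(\w+):\$(\w+)"`)
)

// fakeFor returns the canned output for a single recognised command.
func fakeFor(cmd string) (string, bool) {
	for _, k := range knownCommands {
		if k.re.MatchString(cmd) {
			return k.out, true
		}
	}
	return "", false
}

// FakeExecOutput assembles output for the echo lines found in command.
// Unrecognised variables expand to an empty value, as in a real shell;
// a command with no echo patterns yields empty output.
func FakeExecOutput(command string) string {
	vars := map[string]string{}
	for _, m := range assignRe.FindAllStringSubmatch(command, -1) {
		if out, ok := fakeFor(strings.TrimSpace(m[2])); ok {
			vars[m[1]] = out
		}
	}
	var b strings.Builder
	for _, m := range echoRe.FindAllStringSubmatch(command, -1) {
		fmt.Fprintf(&b, "%s:%s\n", m[1], vars[m[2]])
	}
	return b.String()
}

func main() {
	cmd := `uname=$(uname -s -v -n -m); arch=$(uname -m); echo "UNAME:$uname"; echo "ARCH:$arch"`
	fmt.Print(FakeExecOutput(cmd))
}
```

Anchoring each command pattern with `^` and `$` keeps `uname -m` from being mistaken for the longer `uname -s -v -n -m` form. The canned strings would ideally be filled in from the shell config (hostname, fake user) rather than hard-coded.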