The Guide

Everything you need to deploy DrainCtl, configure it for your environment, and start getting real-time drain mode visibility across your RDSH farm.

1. Installation

DrainCtl ships as a single MSI. Deploy it however you deploy software — your RMM, SCCM, Intune, GPO, or just double-click it.

Interactive install

Double-click the MSI to launch the interactive installer. It walks you through choosing an install mode (dashboard, registration, or standalone) and setting notification URLs, dashboard address, grace period, and other options.

Silent install

msiexec /i LISSTech.DrainCtl.msi /qn

Unattended MSI properties

Pass these properties on the command line (or in your RMM/SCCM transform) to pre-configure the installation:

Property	Values / Example	Description
`INSTALL_MODE`	`dashboard` | `registration` | `standalone`	Sets the operational mode. `dashboard` enables the web UI, `registration` registers with an existing dashboard, and `standalone` runs independently.
`DASHBOARD_URL`	`https://dash:49470`	Dashboard URL to register with (used with `registration` mode)
`DASHBOARD_PORT`	`49470`	Port for the dashboard listener (used with `dashboard` mode)
`DASHBOARD_GROUP`	`Domain Admins`	AD group authorized for dashboard access
`GRACE_PERIOD`	`120`	Grace period in minutes before drain mode becomes an alert
`POLL_INTERVAL`	`60`	Service poll interval in seconds
`SESSION_THRESHOLD`	`80`	Session utilization warning threshold percentage; `0` disables it
`ENABLE_PERF`	`1` | `0`	Enable host performance monitoring
`ENABLE_RFX`	`1` | `0`	Enable optional RemoteFX counter collection
`LOG_FILE_LEVEL`	`info`	Daily file-log level: `debug`, `info`, `warn`, or `error`
`LOG_EVENT_LEVEL`	`info`	Windows Event Log sink level
`DASHBOARD_ONLY`	`1` | `0`	Run only the dashboard service on this host; skip local drain monitoring

Notification targets are intentionally not MSI properties. Add webhook, ntfy, or email targets after install with drainctl notify add-* or through the dashboard Configuration modal.

Example — register with a dashboard, no UI:

msiexec /i LISSTech.DrainCtl.msi /qn INSTALL_MODE=registration DASHBOARD_URL=https://dash:49470
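
If you generate install commands from a script, the property table above maps naturally onto a dictionary. The following is a minimal Python sketch, not part of DrainCtl: the helper name build_msiexec_args is hypothetical, and it assumes property values contain no spaces (values such as an AD group name with spaces need msiexec-style quoting that this sketch does not attempt).

```python
def build_msiexec_args(msi_path, properties):
    """Return an argument list for a silent install with the given MSI properties."""
    args = ["msiexec", "/i", msi_path, "/qn"]
    for name, value in properties.items():
        # MSI public properties are passed on the command line as NAME=value pairs.
        args.append(f"{name}={value}")
    return args


args = build_msiexec_args(
    "LISSTech.DrainCtl.msi",
    {"INSTALL_MODE": "registration", "DASHBOARD_URL": "https://dash:49470"},
)
print(" ".join(args))
```

On a Windows host the resulting list can be handed to subprocess.run(args, check=True); the print call only shows the command that would be executed.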

What the MSI does

Installs drainctl.exe and drainctl.dll to C:\Program Files\LISS Technologies\LISSTech DrainCtl\bin\
Adds the bin folder to system PATH — drainctl is available immediately in any terminal
Registers the LISSTech.DrainCtl PowerShell module in C:\Program Files\WindowsPowerShell\Modules\
Creates the data directory at C:\ProgramData\LISS Technologies\LISSTech DrainCtl\
Installs and auto-starts the DrainCtl Windows Service (runs as LocalSystem)
Registers the Event Log source with a custom message file
Writes default configuration to %ProgramData%\LISS Technologies\LISSTech DrainCtl\config.json

💡 No reboot required. The service starts immediately. Open a new terminal and drainctl check will work right away.

PowerShell module only (no service)

If you just want the PowerShell cmdlets without the Windows Service, install directly from PSGallery:

Install-Module -Name LISSTech.DrainCtl -Scope AllUsers -AllowPrerelease

This gives you Get-RDSHDrainMode, Test-RDSHDrainMode, and friends — but without the service, queries go directly to the registry instead of the named pipe (slightly slower, no persistent audit store).

📦 PSGallery vs MSI. The MSI is the full package: service + CLI + PowerShell module + Event Log + notifications. PSGallery is the module alone, which suits quick checks, scripting, or machines where you don't need continuous monitoring.

Silent uninstall

msiexec /x LISSTech.DrainCtl.msi /qn

This stops the service, removes the installed program files, cleans up PATH, and deregisters the Event Log source. Your audit data in ProgramData is preserved (delete it manually if you want a clean slate).

Upgrading

Run the new MSI over an existing installation. The installer detects the previous version and skips all configuration dialogs — you go straight from the welcome screen to install.

config.json and servers.json are preserved
Configuration actions are skipped (existing settings are not overwritten)
ETW provider, Event Log permissions, and the Windows Service are re-registered
Audit trail and TLS certificates in ProgramData are preserved
The Windows Firewall rule is always maintained across upgrades

Silent upgrades work the same way — just run the new MSI with /qn.

2. First Run

After installation, verify everything is working:

# Is the service running?
Get-Service DrainCtl

# Quick status check (instant — talks to the service via named pipe)
drainctl check

# PowerShell way
Get-RDSHDrainMode | Format-List

You should see status=Healthy and connections_allowed=true. If drain mode is active on this server, you'll see the current state and how long it's been active.

⚡ Instant response? If drainctl check responds in under 100ms, it's talking to the service over the named pipe. If it takes a second or two, the service isn't running and it's falling back to a direct registry read. Check Get-Service DrainCtl.

3. Configuration

All settings live in a single JSON file and hot-reload automatically when you save it. No service restart needed.

%ProgramData%\LISS Technologies\LISSTech DrainCtl\config.json

Full config.json structure

{
  "grace_period": 60,
  "retention_days": 90,
  "poll_interval": 60,
  "audit_path": "C:\\ProgramData\\LISS Technologies\\LISSTech DrainCtl\\drainctl.db",
  "memory_limit_mb": 256,
  "log_file_level": "info",
  "log_event_level": "info",
  "notifications": [
    {
      "type": "webhook",
      "url": "https://hooks.example.com/drainctl",
      "triggers": ["drain_on", "drain_off", "alert", "healthy"],
      "repeat_minutes": 30
    },
    {
      "type": "ntfy",
      "url": "https://ntfy.sh/my-rdsh-alerts",
      "triggers": ["alert", "session_warning"],
      "repeat_minutes": 0
    },
    {
      "type": "email",
      "url": "smtp://smtp.example.com:587",
      "to": ["ops@example.com"],
      "from": "drainctl@example.com",
      "secret": "smtp-password",
      "triggers": ["drain_on", "drain_off", "alert"]
    }
  ],
  "dashboard": {
    "enabled": false,
    "port": 49470,
    "group": "Domain Admins",
    "fetch_interval": 300
  },
  "session_warning_threshold": 80,
  "performance": {
    "enabled": true,
    "sample_interval_sec": 30,
    "cpu_warn_pct": 70,
    "cpu_crit_pct": 85,
    "mem_warn_pct": 20,
    "mem_crit_pct": 10,
    "input_delay_warn_ms": 50,
    "input_delay_crit_ms": 100,
    "input_delay_percentile": "p95",
    "load_alert_delay_sec": 60,
    "input_delay_alert_delay_sec": 90,
    "collect_remotefx": false,
    "collect_per_session": true
  },
  "evtspike": {
    "enabled": false
  },
  "retention": {
    "metrics_days": 30,
    "audit_days": 365
  },
  "telemetry": {
    "aggregator_interval_seconds": 60,
    "retention_interval_minutes": 15
  },
  "update": {
    "enabled": false,
    "channel": "stable",
    "poll_interval": "24h"
  }
}

💡 Dashboard-authoritative. On servers joined to a dashboard, most of these fields are pulled from the dashboard every few minutes and overwrite the local file. Edit them in the dashboard Configuration modal, not here. The table below is the reference; the modal is the steering wheel. See Dashboard-authoritative config for the details.

Key	Default	What it does
`grace_period`	60	Minutes drain mode must be active before it becomes an alert. During the grace period, the status is "Grace" and exit code is 0. Performance thresholds may independently promote the status to "Warning" (any warning) or "Alert" (any critical).
`retention_days`	90	Days to keep audit records. Maximum 365. Older records are pruned daily. Legacy key, still accepted for compatibility (see `retention`).
`poll_interval`	60	Seconds between polls. The service also uses real-time registry notifications for drain mode changes; this interval governs performance monitoring and session checks.
`audit_path`	ProgramData path	SQLite database path. Legacy `audit.jsonl` values are normalized to `drainctl.db` in the same directory; the live audit trail, telemetry tiers, server registry, and event-spike history live in SQLite WAL mode.
`memory_limit_mb`	256	Go runtime soft memory limit for the service process. Dashboard installs raise this to 512 MB unless explicitly configured. Valid range: 32–4096 MB.
`log_file_level` / `log_event_level`	`info`	Minimum level for the daily file log and Windows Event Log sinks.
`notifications`	[]	Array of notification targets (see Notifications).
`dashboard`	see above	Dashboard server, registration, TLS pinning, and config-fetch settings (see Dashboard).
`dashboard_only`	false	Run the dashboard on this host without local drain monitoring.
`session_warning_threshold`	80	Session utilization percentage that triggers a `session_warning` notification.
`performance`	see above	Performance monitoring settings. Set `enabled: true` to collect CPU, memory, disk I/O, input delay, and optional RemoteFX metrics on each sample tick. Thresholds default to industry baselines (CPU 70/85%, memory 20/10% free, input delay 50/100ms). Set any threshold to `-1` to disable it. Additional fields: `sample_interval_sec` (collection interval, default 30, range 10–300), `input_delay_percentile` (`"p50"` or `"p95"`, default `"p95"`), `load_alert_delay_sec` (seconds CPU/memory must breach before firing, default 60), `input_delay_alert_delay_sec` (seconds input-delay must breach before firing, default 90).
`retention`	see above	SQLite retention windows: `metrics_days` defaults to 30, and `audit_days` defaults to 365. Legacy `retention_days` is still accepted for compatibility.
`telemetry`	see above	Background worker cadence: metric aggregation defaults to 60 seconds; retention sweeps default to every 15 minutes.
`update`	disabled	Opt-in self-update settings (see Auto-update).
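
To make the threshold semantics in the performance row concrete, here is a small Python sketch of how a single sample could be classified. It is an illustration only, not DrainCtl's implementation: the function name classify and the lower_is_worse flag are hypothetical, and the sketch ignores the sustain windows (load_alert_delay_sec, input_delay_alert_delay_sec) that delay real alerts.

```python
def classify(value, warn, crit, lower_is_worse=False):
    """Return 'ok', 'warning', or 'critical' for one metric sample.

    A threshold of -1 is treated as disabled, as described in the table above.
    """
    def breached(threshold):
        if threshold == -1:
            return False
        # CPU and input delay breach at or above; free memory breaches at or below.
        return value <= threshold if lower_is_worse else value >= threshold

    if breached(crit):
        return "critical"
    if breached(warn):
        return "warning"
    return "ok"


print(classify(72, warn=70, crit=85))                      # CPU percent
print(classify(8, warn=20, crit=10, lower_is_worse=True))  # memory percent free
print(classify(120, warn=50, crit=-1))                     # input delay ms, critical disabled
```

Memory is the odd one out because its thresholds are expressed as percent free, so a lower reading is the worse one.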

Edit the file with any text editor:

%ProgramData%\LISS Technologies\LISSTech DrainCtl\config.json

The service picks up changes within seconds (it watches config.json with a file system watcher). No restart needed. On dashboard-joined agents, edit through the dashboard Configuration modal instead — runtime fields you change locally will be overwritten on the next pull. The local file still owns bootstrap settings like dashboard.url, dashboard.tls_fingerprint, log_file_level, and memory_limit_mb.
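
Because the file hot-reloads as soon as it is saved, it can be worth checking an edited copy before it lands. The Python sketch below checks only the three numeric ranges quoted in the reference table (memory_limit_mb, performance.sample_interval_sec, and the retention_days maximum). The helper name check_ranges is hypothetical, and how the service itself reacts to out-of-range values is not covered here.

```python
import json

# Ranges quoted in the reference table above; other keys are not checked.
DOCUMENTED_RANGES = {
    ("memory_limit_mb",): (32, 4096),
    ("performance", "sample_interval_sec"): (10, 300),
    ("retention_days",): (None, 365),
}


def check_ranges(config):
    """Return a list of human-readable problems for out-of-range values."""
    problems = []
    for path, (low, high) in DOCUMENTED_RANGES.items():
        node = config
        for key in path:
            node = node.get(key) if isinstance(node, dict) else None
        if node is None:
            continue  # key absent, nothing to check
        if (low is not None and node < low) or (high is not None and node > high):
            problems.append(f"{'.'.join(path)}={node} is outside {low}..{high}")
    return problems


sample = json.loads('{"memory_limit_mb": 16, "performance": {"sample_interval_sec": 30}}')
print(check_ranges(sample))
```

To check a real file, load it with json.load from the config.json path shown above and pass the result to check_ranges.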

🔄 Config normalization. On every service start, DrainCtl validates config.json and writes it back with all fields populated at their effective defaults. New fields (like sample_interval_sec, load_alert_delay_sec, or evtspike) appear automatically after upgrading, with no manual editing required. If you are migrating from a registry-based version, settings are imported from HKLM\...\DrainCtl\Parameters on first run; the old values are left in place but no longer read.

4. Audit Setup (Change Attribution)

DrainCtl can tell you who changed drain mode — but it needs Windows to record that information first. Run this once (as admin):

drainctl audit-setup
# or
Install-RDSHDrainAudit

This does two things:

Enables the Registry audit subcategory via auditpol
Sets a SACL on the Terminal Server registry key so that Windows writes Event ID 4657 whenever TSServerDrainMode is modified

⚠️ Domain-joined machines. On domain-joined servers, Group Policy refresh (~90 min) can overwrite local auditpol settings. For persistence, configure the equivalent GPO:

Computer Configuration > Policies > Windows Settings > Security Settings > Advanced Audit Policy Configuration > Audit Policies > Object Access > Audit Registry > Success

Without audit setup, DrainCtl still detects changes instantly — it just can't tell you who made them. The changed_by field will be empty.

5. CLI Usage

Check current state

# Plain text (default — great for terminal and RMM scripts)
drainctl check

# JSON (great for parsing)
drainctl check --format json

# Table
drainctl check --format table

# Quiet mode (only final status line)
drainctl --quiet check

View audit history

# Last 50 records (table format)
drainctl history

# Only state transitions
drainctl history --changes-only

# Last 10 records as JSON
drainctl history --limit 10 --format json

# Time-bounded queries (RFC 3339 timestamps)
drainctl history --since 2025-01-01T00:00:00Z
drainctl history --since 2025-06-01T00:00:00Z --until 2025-06-30T23:59:59Z
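
When a script needs a rolling window rather than fixed dates, the RFC 3339 timestamps can be generated instead of typed. This Python sketch builds the argument list for a window ending at a given moment; the helper names rfc3339 and history_args are hypothetical, and only the --since and --until flags shown above are used.

```python
from datetime import datetime, timedelta, timezone


def rfc3339(moment):
    """Format an aware datetime as an RFC 3339 UTC timestamp with a Z suffix."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def history_args(until, hours):
    """Build drainctl history arguments covering the `hours` before `until`."""
    since = until - timedelta(hours=hours)
    return ["drainctl", "history", "--since", rfc3339(since), "--until", rfc3339(until)]


end = datetime(2025, 6, 30, 23, 59, 59, tzinfo=timezone.utc)
print(" ".join(history_args(end, hours=24)))
```

For a live query, pass datetime.now(timezone.utc) as the end of the window.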

List current sessions

# Table view (default) — session ID, user, station, state + summary line
drainctl sessions

# JSON — structured output with sessions array and summary object
drainctl sessions --format json

# CSV — for spreadsheet import or RMM custom fields
drainctl sessions --format csv

Inspect current configuration

# Print all settings at a glance — no prompts, no side-effects
drainctl configure show

Prints version, config path, grace period, poll interval, retention days, session warning threshold, dashboard settings, and all notification targets. Useful for confirming what is active on a machine without opening config.json or running the interactive wizard.

Reset the event-spike baseline

# Wipe in-memory evtspike detectors + baseline.json, in sync with the running service
drainctl baseline reset

Use after confirming the event-log anomaly baseline is poisoned (e.g., a real incident fired during training and got absorbed as "normal"). The command routes through the service, wipes every subscribed channel's detector state, and deletes baseline.json. Scoring resumes from the prior on the next 10-second tick — subscriptions stay alive. See Event-Log Anomaly Detection for background.

Exit codes

Code	Meaning	When
0	Healthy / Grace	Connections allowed, or drain mode active but within grace period
1	Alert	Drain mode active beyond grace period — new connections are blocked
2	Error	Registry unreadable or other failure
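
For monitoring wrappers written outside PowerShell, the table above translates into a small lookup. The Python sketch below is one way to do it; the function names are hypothetical, and run_check assumes drainctl is on PATH, as the MSI arranges.

```python
import subprocess

# Meanings taken from the exit-code table above.
EXIT_MEANINGS = {0: "Healthy / Grace", 1: "Alert", 2: "Error"}


def describe_exit(code):
    """Map a drainctl exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"Unexpected exit code {code}")


def run_check():
    """Run `drainctl check` and return (exit_code, meaning)."""
    result = subprocess.run(["drainctl", "check"], capture_output=True, text=True)
    return result.returncode, describe_exit(result.returncode)


print(describe_exit(1))
```

Note that code 0 covers both Healthy and Grace, so a wrapper that needs to tell them apart has to read the output as well as the exit code.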

Output formats

The --format flag accepts plain (default for check), table (default for history and sessions), csv, and json.

The JSON output from drainctl check --format json includes session data — active sessions, total capacity, and utilization percentage — useful for building custom dashboards or feeding into monitoring systems.

Notification management

# View notification status
drainctl notify status

# Add a webhook target
drainctl notify add-webhook https://hooks.example.com/drainctl

# Add webhook with HMAC signing secret and custom trigger filter
drainctl notify add-webhook https://hooks.example.com/drainctl `
  --secret MY_SIGNING_SECRET `
  --triggers drain_on,alert,healthy `
  --repeat-minutes 60

# Update the first webhook target; fields not supplied are preserved
drainctl notify set-webhook https://hooks.example.com/drainctl --target-index 0 --secret NEW_SECRET

# Add an ntfy target
drainctl notify add-ntfy https://ntfy.sh/my-rdsh-alerts

# ntfy with session-warning notifications and 30-minute repeat throttle
drainctl notify add-ntfy https://ntfy.sh/my-rdsh-alerts `
  --triggers drain_on,alert,healthy,session_warning `
  --repeat-minutes 30

# Send a test notification to all configured targets
drainctl notify test

💡 Flag behaviour. Use add-webhook, add-ntfy, or add-email to create a target. Use set-webhook, set-ntfy, or set-email with --target-index to update an existing target; omitted optional flags preserve the existing secret, triggers, and repeat interval. Valid trigger names: drain_on, drain_off, grace_entered, alert, healthy, session_warning, cpu_warning, cpu_critical, memory_warning, memory_critical, input_delay_warning, input_delay_critical, event_spike.

Interactive configuration wizard

Run drainctl configure without flags to step through every setting interactively. The current value is shown in brackets — press Enter to keep it.

drainctl configure

When called with flags (e.g., by the MSI installer during silent install), it applies the values directly and writes config.json without prompting:

drainctl configure --grace-period 120 --mode dashboard --dashboard-port 49470

# Agent registration with session warning threshold and custom poll interval
drainctl configure --mode registration --dashboard-url https://dash:49470 `
  --session-warning-threshold 90 --poll-interval 30 --retention-days 90

Inspect current configuration

To review every active setting at a glance without prompting or making changes, use the read-only configure show subcommand:

drainctl configure show

Output covers: version, config file path, grace period, poll interval, retention days, session warning threshold, dashboard server status, agent registration URL, TLS pin status, and all notification targets (with HMAC signing status for webhook targets).

6. PowerShell Module

The module is auto-registered by the MSI. It works on both PowerShell 5.1 and 7+.

# Rich status object
Get-RDSHDrainMode

# Boolean check — perfect for scripts and alerts
if (Test-RDSHDrainMode) { "All good" } else { "ALERT: drain mode active!" }

# Audit history
Get-RDSHDrainHistory -Limit 20

# Only transitions (who changed what, when)
Get-RDSHDrainHistory -ChangesOnly | Format-Table Timestamp, DrainMode, ChangedBy

# Pipeline magic
Get-RDSHDrainHistory -ChangesOnly |
    Where-Object { $_.DrainMode -ne "ALLOW_ALL_CONNECTIONS" } |
    Select-Object Timestamp, ChangedBy, DrainMode

All cmdlets

Cmdlet	Returns	Description
`Get-RDSHDrainMode`	PSObject	Full drain mode state with audit data
`Test-RDSHDrainMode`	bool	`$true` if connections allowed
`Get-RDSHDrainHistory`	PSObject[]	Audit trail records
`Install-RDSHDrainAudit`	—	One-time audit configuration
`Get-RDSHDrainNotificationTarget`	PSObject[]	Lists all notification targets with type, URL, triggers, repeat interval
`Add-RDSHDrainNotificationTarget`	—	Adds a notification target (`-Type`, `-URL`, `-Triggers`, `-RepeatMinutes`)
`Set-RDSHDrainNotificationTarget`	—	Updates a notification target by type and index
`Remove-RDSHDrainNotificationTarget`	—	Removes a notification target by `-URL`
`Test-RDSHDrainNotificationTarget`	—	Sends a test notification to configured targets
`Enable-RDSHDrainDashboard`	—	Enable the multi-server dashboard on this server
`Disable-RDSHDrainDashboard`	—	Disable the dashboard on this server
`Install-RDSHDrainCertificate`	—	Install a custom TLS certificate for the dashboard

7. Notifications

Get pushed when something happens. DrainCtl supports multiple notification targets — any combination of webhooks (Slack, Teams, PagerDuty, custom HTTP endpoints), ntfy.sh topics, and email via SMTP. Each target has its own triggers and repeat interval.

Setup

# Add a webhook target (Slack, Teams, PagerDuty, custom endpoint, etc.)
drainctl notify add-webhook https://hooks.slack.com/services/T.../B.../xxx

# Add webhook with HMAC signing and only alert/healthy triggers
drainctl notify add-webhook https://hooks.slack.com/services/T.../B.../xxx `
  --secret MY_SIGNING_SECRET `
  --triggers drain_on,alert,healthy

# Add an ntfy target (push notifications on your phone!)
drainctl notify add-ntfy https://ntfy.sh/my-rdsh-alerts

# ntfy with session-warning alerts, repeat at most every 30 minutes
drainctl notify add-ntfy https://ntfy.sh/my-rdsh-alerts `
  --triggers drain_on,alert,healthy,session_warning `
  --repeat-minutes 30

# Add an email target (any SMTP relay — Gmail, Mailgun, O365, SendGrid)
drainctl notify add-email smtp://smtp.example.com:587 `
  --to ops@example.com --from drainctl@example.com `
  --secret smtp-password `
  --triggers drain_on,drain_off,alert

# View notification status
drainctl notify status

# Send a test to make sure it works
drainctl notify test

Optional flags on add-webhook, add-ntfy, set-webhook, and set-ntfy (add-email also takes --secret, as the SMTP password, and --triggers):

--secret TEXT — HMAC-SHA256 signing secret sent as X-DrainCtl-Signature (webhook only)
--triggers LIST — comma-separated events that fire this target; valid names: drain_on, drain_off, grace_entered, alert, healthy, session_warning, cpu_warning, cpu_critical, memory_warning, memory_critical, input_delay_warning, input_delay_critical, event_spike
--repeat-minutes N — minimum minutes between repeated alert / session_warning notifications; 0 means once per state change (default)

For set-* commands, omitted flags preserve the existing value, so you can update a single field without touching the rest. To manage multiple targets with per-target settings, use the dashboard UI or the CLI --target-index flag.
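
On the receiving side, a webhook endpoint can use the signing secret to reject forged requests. The Python sketch below shows the general HMAC-SHA256 check. The header name X-DrainCtl-Signature comes from the flag description above, but the encoding is an assumption: the sketch assumes the header carries the hex-encoded HMAC of the raw request body, so confirm the actual format against a real delivery before relying on it.

```python
import hashlib
import hmac


def verify_signature(secret, body, header_value):
    """Check an X-DrainCtl-Signature value against the raw request body.

    Assumption: the header carries the hex-encoded HMAC-SHA256 of the body.
    """
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so it does not leak partial matches.
    return hmac.compare_digest(expected, header_value.strip().lower())


body = b'{"event": "transition", "host": "RDS01"}'
good = hmac.new(b"MY_SIGNING_SECRET", body, hashlib.sha256).hexdigest()
print(verify_signature("MY_SIGNING_SECRET", body, good))
print(verify_signature("MY_SIGNING_SECRET", body, "0" * 64))
```

Always verify against the raw bytes received, not a re-serialized copy of the parsed JSON, since any change in whitespace or key order changes the digest.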

PowerShell target management

# List all targets
Get-RDSHDrainNotificationTarget

# Add a webhook for alerts only
Add-RDSHDrainNotificationTarget -Type webhook -URL 'https://hooks.example.com/drain' -Triggers alert -RepeatMinutes 15

# Add ntfy for all events
Add-RDSHDrainNotificationTarget -Type ntfy -URL 'https://ntfy.sh/drainctl'

# Add email target
Add-RDSHDrainNotificationTarget -Type email -URL 'smtp://smtp.example.com:587' `
  -To 'ops@example.com' -From 'drainctl@example.com' -Secret 'smtp-password' `
  -Triggers drain_on,alert

# Update the first webhook target in place
Set-RDSHDrainNotificationTarget -Type webhook -TargetIndex 0 `
  -URL 'https://hooks.example.com/new-drain' -RepeatMinutes 30

# Send a test notification
Test-RDSHDrainNotificationTarget

# Remove a target
Remove-RDSHDrainNotificationTarget -URL 'https://hooks.example.com/drain'

Granular triggers

Each notification target can subscribe to specific triggers. Assign them with --triggers when adding a target, or edit config.json directly.

Trigger	Description
`drain_on`	Drain mode activated
`drain_off`	Drain mode deactivated
`grace_entered`	Entered grace period
`alert`	Grace period exceeded
`healthy`	Returned to healthy
`session_warning`	Session utilization threshold exceeded (default 80%)
`cpu_warning`	CPU usage at or above warning threshold (default 70%)
`cpu_critical`	CPU usage at or above critical threshold (default 85%)
`memory_warning`	Available memory at or below warning threshold (default 20% free)
`memory_critical`	Available memory at or below critical threshold (default 10% free)
`input_delay_warning`	Input delay P95 at or above warning threshold (default 50ms)
`input_delay_critical`	Input delay P95 at or above critical threshold (default 100ms)
`event_spike`	Event-log anomaly detector (evtspike) confirmed a rate spike on a subscribed channel. Requires `evtspike.enabled`.

All three target types share the same natural-language subjects, e.g. "RDS01 — New remote connections disabled by DOMAIN\admin", "RDS01 — Remote connections disabled for 2h 15m (exceeds 1h grace period)", or "RDS01 — Session utilization at 85% (17/20 sessions)". For webhooks the subject appears in the message field; for ntfy it's the notification title; for email it's the Subject header and the HTML body heading.

Each target also has a repeat_minutes setting — set it to re-send alerts periodically while the condition persists (0 = notify once).
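
The repeat behaviour can be pictured as a simple per-target throttle. The Python sketch below illustrates the rule as described here (0 means notify once, otherwise wait at least repeat_minutes between re-sends); the function name should_send is hypothetical and this is not DrainCtl's actual scheduling code.

```python
from datetime import datetime, timedelta


def should_send(now, last_sent, repeat_minutes):
    """Decide whether to send again while a condition persists."""
    if last_sent is None:
        return True   # first notification for this state
    if repeat_minutes == 0:
        return False  # 0 means notify once per state change
    return now - last_sent >= timedelta(minutes=repeat_minutes)


start = datetime(2026, 4, 3, 14, 30)
print(should_send(start, None, 30))
print(should_send(start + timedelta(minutes=10), start, 30))
print(should_send(start + timedelta(minutes=30), start, 30))
print(should_send(start + timedelta(hours=5), start, 0))
```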

Webhook payload

{
  "event": "transition",
  "host": "MDS-LDC1-RDS5",
  "drain_mode": "ALLOW_RECONNECTIONS_PREVENT_NEW_LOGONS",
  "previous_mode": "ALLOW_ALL_CONNECTIONS",
  "status": "Grace",
  "message": "Drain mode active, within grace period (45m remaining).",
  "changed_by": "MDS\\lissadmin",
  "state_duration_seconds": 900,
  "grace_period_seconds": 3600,
  "connections_allowed": false,
  "version": "26.95.0",
  "timestamp": "2026-04-03T14:30:00-04:00",
  "sessions": {
    "active_sessions": 12,
    "disconnected_sessions": 3,
    "total_sessions": 15,
    "max_sessions": 20,
    "utilization_pct": 75
  }
}

previous_mode is only present on transition events. sessions is only present when session monitoring is active (see Session Tracking). The connections_allowed field is false when drain mode is blocking new connections, true when all connections are allowed — use this instead of parsing status to gate automation.
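
As a sketch of what a consumer might do with this payload, the Python below gates on connections_allowed and derives the time left in the grace period from the two duration fields. The field names are taken from the example payload above; the function name summarize is hypothetical.

```python
import json


def summarize(payload):
    """Return (connections_allowed, seconds_left_in_grace) for one webhook payload."""
    allowed = payload["connections_allowed"]
    remaining = max(0, payload["grace_period_seconds"] - payload["state_duration_seconds"])
    return allowed, remaining


payload = json.loads("""
{
  "event": "transition",
  "status": "Grace",
  "state_duration_seconds": 900,
  "grace_period_seconds": 3600,
  "connections_allowed": false
}
""")
allowed, remaining = summarize(payload)
print(allowed, remaining // 60)
```

With the example values the remaining time works out to 45 minutes, which matches the message field in the payload above.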

ntfy messages arrive with priority high for alerts (red notification on your phone) and default for transitions.

Email (SMTP)

Email notifications are sent over SMTP. Use smtp:// for STARTTLS (port 587) or smtps:// for implicit TLS (port 465). Any SMTP relay works, such as Gmail, Mailgun, SendGrid, or O365. The secret field is the SMTP password. The from and to fields are required.

{
  "type": "email",
  "url": "smtp://smtp.example.com:587",
  "to": ["ops@example.com", "oncall@example.com"],
  "from": "drainctl@example.com",
  "secret": "smtp-password",
  "triggers": ["drain_on", "drain_off", "alert"]
}

8. Session Tracking

DrainCtl monitors active RDS sessions via WTSEnumerateSessionsW and exposes utilization data alongside drain mode state.

drainctl sessions

The drainctl sessions command lists every RDS session on the local server in real time, active or disconnected, including session ID, username, station name, and connection state. A summary line shows totals and utilization when a session cap is configured.

drainctl sessions

Example output (table format):

ID   USER            STATION    STATE
---  ----            -------    -----
1    alice           RDP-Tcp#0  Active
2    bob             RDP-Tcp#1  Active
3    carol           RDP-Tcp#2  Disconnected
Summary: 3/50 sessions (6% utilization)

Use --format json for structured output (a sessions array plus a summary object), or --format csv for spreadsheet import.

Session data in CLI output

Session counts and utilization appear automatically in drainctl check --format json output, including active sessions, total capacity, and utilization percentage. This is useful for feeding into monitoring systems or custom dashboards.

Session utilization alerts

When session utilization exceeds the configured threshold, DrainCtl fires a session_warning notification. The default threshold is 80% — change it via the CLI or in config.json:

# Via CLI (0 = disabled)
drainctl configure --session-warning-threshold 90

# Or in config.json directly
{
  "session_warning_threshold": 90
}

Add session_warning to a notification target's triggers to receive these alerts (see Notifications).
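
The utilization figure itself is simple arithmetic: total sessions divided by the session cap. The Python sketch below reproduces the numbers used in this guide (3/50 is 6%, 17/20 is 85%). Two details are assumptions rather than documented behaviour: rounding to a whole percent, and treating "exceeds" as strictly greater than the threshold.

```python
def utilization_pct(total_sessions, max_sessions):
    """Percent of the session cap in use, rounded to a whole number."""
    return round(100 * total_sessions / max_sessions)


def session_warning(total_sessions, max_sessions, threshold):
    """True when utilization is above the threshold; a threshold of 0 disables it."""
    if threshold == 0:
        return False
    return utilization_pct(total_sessions, max_sessions) > threshold


print(utilization_pct(3, 50))
print(session_warning(17, 20, 80))
print(session_warning(17, 20, 90))
print(session_warning(20, 20, 0))
```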

Dashboard session gauges

The multi-server dashboard displays per-server session gauges showing current utilization at a glance.

9. Multi-Server Dashboard

Managing multiple RDSH servers? The dashboard gives you a single, live view of drain mode state across your entire farm — with Windows Authentication so only the right people see it.

How it works

One server runs the dashboard: enable it in config.json and the DrainCtl service serves the dashboard over HTTPS
Other servers register: run drainctl register once and the agent starts reporting its state automatically
The dashboard aggregates: a live grid of all servers, color-coded by status, updated by Server-Sent Events with periodic polling as a safety net

Enable the dashboard (on one server)

# Enable the dashboard server (writes to config.json)
drainctl dashboard enable --port 49470 --group "Domain Admins"

# Or with a custom AD group
drainctl dashboard enable --port 49470 --group "RDS Admins"

You can also edit config.json directly:

{
  "dashboard": {
    "enabled": true,
    "port": 49470,
    "group": "Domain Admins"
  }
}

Changes are hot-reloaded — no service restart needed.

Register servers

On each RDSH server you want to monitor, run this once:

# Register with explicit URL
drainctl register https://dashboard-server:49470

# Or auto-discover via DNS SRV record
drainctl register --auto

This does two things:

Writes dashboard.url to the local config.json (so the service starts reporting automatically on startup)
Sends a registration request to the dashboard (so it knows about this server)

Auto-discovery (DNS SRV)

Instead of configuring dashboard.url on every agent, create a DNS SRV record and agents will find the dashboard automatically. The service checks for _drainctl._tcp.<domain> on startup when no URL is configured.

# PowerShell — create the SRV record in AD-integrated DNS
Add-DnsServerResourceRecord -ZoneName "contoso.com" `
  -Name "_drainctl._tcp" -Srv `
  -DomainName "dashboard.contoso.com" `
  -Port 49470 -Priority 0 -Weight 0

# Verify
Resolve-DnsName -Name "_drainctl._tcp.contoso.com" -Type SRV

Once the SRV record exists, agents discover the dashboard without any per-machine config. Just install the MSI and the service registers itself.

💡 Self-maintaining. After registration, the service reports its state on every check cycle. No cron jobs, no scripts, no manual syncing. If the dashboard server is down, reports are silently skipped and resume when it's back. With SRV discovery, even registration is automatic — install the MSI and you're done.

View the dashboard

# Open in your browser
drainctl dashboard

# Or navigate directly
https://dashboard-server:49470

The dashboard shows a live grid with each server's hostname, drain mode, status, state duration, who last changed it, and when it was last seen. Color-coded: green for healthy, amber for grace, red for alert, gray for offline. Each server card includes a session gauge showing current utilization, plus an Overview LOAD chart that overlays CPU, CPU P95, Memory, and Sessions for the entire fleet on one dual-axis uPlot canvas.

Clicking a server opens the Server Detail panel. The Host Load tile on that panel is a single dual-axis chart showing CPU, CPU P95, Memory, and Sessions for that host with its own 5M/1H/1D/3D/5D window pill (replacing the earlier layout of three separate small charts). A swimlane tile below it plots confirmed event-log spikes per channel, with a HEALTHY / TRAINING / DISABLED / ERROR state chip in the header sourced from the host's evtspike detector. See Event-Log Anomaly Detection for how those states are derived.

The Configuration modal (gear icon) is the single place operators edit runtime settings. It exposes: CPU warn/critical and memory warn/critical thresholds, input-delay warn/critical thresholds and percentile (P50/P95), the grace period, session warning threshold, agent poll interval, sample interval, performance-monitoring toggle (including RemoteFX and per-session CPU sub-toggles), the evtspike detector toggle, and every notification target (each with its own triggers — including event_spike — and repeat interval). The Alert Sensitivity row (Chill / Anxious / Twitchy presets) sets thresholds, percentile, and sustain windows in one click.

Manage servers

# List all registered servers
drainctl dashboard list-servers

# Remove a decommissioned server
drainctl dashboard remove-server RDSH-OLD

Authentication

The dashboard supports two login methods:

Windows SSO (primary) — Kerberos/SPNEGO via SSPI. Your browser sends domain credentials automatically (Integrated Windows Auth). Works out of the box in the Intranet zone.
Credential login (fallback) — username and password form for non-domain machines, remote access, or browsers that don't support Negotiate. Accepts DOMAIN\username or username@domain format. Credentials are validated against Active Directory via LogonUserW; passwords are never stored or logged.

Both methods create a session cookie (drainctl_session) with an 8-hour inactivity timeout and 24-hour absolute lifetime. Sessions are stored in memory and do not survive service restarts.
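
The two session limits interact: a session ends when either one is hit. The Python sketch below models that rule with the 8-hour and 24-hour values stated above. It illustrates the policy only; the function name is hypothetical and this is not the dashboard's session code.

```python
from datetime import datetime, timedelta

INACTIVITY_TIMEOUT = timedelta(hours=8)   # from the text above
ABSOLUTE_LIFETIME = timedelta(hours=24)   # from the text above


def session_is_valid(created, last_seen, now):
    """True while neither the inactivity nor the absolute limit has been reached."""
    return (now - last_seen < INACTIVITY_TIMEOUT) and (now - created < ABSOLUTE_LIFETIME)


login = datetime(2026, 4, 3, 8, 0)
print(session_is_valid(login, login + timedelta(hours=7), login + timedelta(hours=10)))
print(session_is_valid(login, login, login + timedelta(hours=9)))
print(session_is_valid(login, login + timedelta(hours=23), login + timedelta(hours=25)))
```

The third case shows that staying active does not extend a session past the absolute lifetime.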

Agent reporting — each server authenticates with its machine Kerberos ticket. Automatic in a domain, zero config.
Authorization — only members of the configured AD group (DashboardGroup) can view the dashboard. Agents can report but not view.
Unregistered hosts — the dashboard rejects reports from servers that haven't been registered (403).

HTTPS & TLS Certificates

The dashboard always runs over HTTPS. On first start, the service auto-generates a self-signed TLS certificate (ECDSA P-256, valid 1 year). The certificate and private key are stored in the DrainCtl data directory:

%ProgramData%\LISS Technologies\LISSTech DrainCtl\dashboard-tls.crt   # world-readable
%ProgramData%\LISS Technologies\LISSTech DrainCtl\dashboard-tls.key   # SYSTEM + Admins only

The private key is ACL-restricted to SYSTEM and Administrators. The certificate auto-renews 24 hours before expiry.

Certificate fingerprint

To view the dashboard's current TLS certificate fingerprint:

drainctl dashboard fingerprint

Certificate pinning (optional)

Certificate pinning lets agents verify they're talking to the real dashboard, not an impersonator. Pinning is opt-in — without it, agents use trust-on-first-use (TOFU) for TLS, which is fine for most environments.

Auto-pin on registration: Pass --pin when registering to automatically capture and save the dashboard's certificate fingerprint:

# Register and auto-pin the dashboard's TLS certificate
drainctl register --auto --pin
drainctl register https://dashboard:49470 --pin

The fingerprint is saved to dashboard.tls_fingerprint in the agent's config.json. All subsequent reports will verify the dashboard's certificate matches.
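Conceptually, a pin check hashes the certificate the server presents and compares the result with the stored value. The Python sketch below shows the general technique only; the function names are hypothetical, and the exact text format that drainctl writes to dashboard.tls_fingerprint (hex case, colon separators) is an assumption.

```python
import hashlib
import hmac


def cert_fingerprint(der_bytes: bytes) -> str:
    """Return the SHA-256 digest of a DER-encoded certificate as lowercase hex."""
    return hashlib.sha256(der_bytes).hexdigest()


def normalize(fingerprint: str) -> str:
    """Drop colons and spaces and lowercase, so common notations compare equal."""
    return fingerprint.replace(":", "").replace(" ", "").lower()


def matches_pin(der_bytes: bytes, pinned: str) -> bool:
    """Compare the presented certificate against the pin in constant time."""
    return hmac.compare_digest(cert_fingerprint(der_bytes), normalize(pinned))
```

Because the digest covers the whole certificate, any renewal produces a different fingerprint, which is why a pinned agent must re-pin after the certificate changes.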

Auto-pin via config (for service auto-registration): Set "auto_pin": true in config.json and the service will capture the fingerprint on its next registration:

{
  "dashboard": {
    "url": "https://dashboard:49470",
    "auto_pin": true
  }
}

MSI deployment: the MSI can set DASHBOARD_URL for registration, but it does not currently expose an AUTO_PIN property. For pinned deployments, set dashboard.auto_pin in config management after install, or run drainctl register ... --pin once on each agent.

Manual pinning: If you prefer to set the fingerprint yourself (e.g., distributed via GPO), run drainctl dashboard fingerprint on the dashboard server and set dashboard.tls_fingerprint in each agent's config. A manually-set fingerprint is never overwritten by auto-pin.

⚠️ Certificate renewal and pinning. The auto-generated certificate is valid for 1 year and renews automatically. When it renews, the fingerprint changes. Agents with the old fingerprint pinned will silently fail to report until they re-register. To re-pin after renewal, run drainctl register --auto --pin on each agent, or set "auto_pin": true in config and restart the service. Without pinning enabled, cert renewal is seamless.

Bring your own certificate

To use a CA-issued or internal PKI certificate instead of the auto-generated self-signed one:

# CLI
drainctl dashboard install-cert C:\certs\dashboard.pem C:\certs\dashboard-key.pem

# PowerShell
Install-RDSHDrainCertificate -CertPath C:\certs\dashboard.pem -KeyPath C:\certs\dashboard-key.pem

This copies the PEM files into the DrainCtl data directory and updates config.json with the paths. Restart the service to use the new certificate.

Both files must be PEM-encoded. The service loads them on startup and skips auto-generation when both are set. If the custom cert is invalid or expired, the service falls back to the auto-generated self-signed certificate and logs a warning in the event log.

You can also set the paths directly in config.json:

{
  "dashboard": {
    "tls_cert": "C:\\certs\\dashboard.pem",
    "tls_key": "C:\\certs\\dashboard-key.pem"
  }
}

Dashboard configuration (config.json)

Key	Default	Description
`dashboard.enabled`	false	Enable the HTTPS dashboard (set to true)
`dashboard.port`	49470	Port the dashboard listens on
`dashboard.group`	Domain Admins	AD group authorized to view the dashboard
`dashboard.url`	empty	Agent-side: URL of the dashboard to report to (set by `drainctl register`)
`dashboard.tls_cert`	empty	Path to PEM certificate file. If empty, a self-signed cert is auto-generated.
`dashboard.tls_key`	empty	Path to PEM private key file. Required when `tls_cert` is set.
`dashboard.tls_fingerprint`	empty	SHA-256 certificate fingerprint for agent-side pinning. Set manually or via `--pin`.
`dashboard.auto_pin`	false	Auto-capture dashboard cert fingerprint on service registration.
`dashboard.fetch_interval`	300	Seconds between dashboard-authoritative config pulls by registered agents.

Dashboard-authoritative config (the pull model)

The dashboard is the single source of truth for runtime settings. When an agent has a dashboard.url configured, it pulls notification targets, thresholds, the grace period, the session warning threshold, the performance block, and the evtspike enabled flag from the dashboard server. Edit settings once in the Configuration modal and every agent picks up the change on its next pull; the new values take effect as soon as the agent writes them to its local config.json.

How it works:

On startup, the agent fetches GET /api/v1/config from the dashboard using its machine Kerberos ticket
The fetched settings replace the corresponding fields in local config.json (an atomic write); the service's config-file watcher sees the change and hot-reloads live
The agent re-polls on a short cadence, so edits made in the dashboard UI roll out to every farm member within minutes without any per-agent action
If the dashboard is unreachable, the agent keeps using the last successfully pulled config
If no pull has ever succeeded, the agent falls back to whatever is already in its local config.json

No dashboard URL? Standalone agents (no dashboard.url) keep using their local config.json — nothing changes for single-server deployments.
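The precedence rules above can be sketched in a few lines of Python. This is an illustration under assumptions, not the agent's code: the key names in `PULLED_KEYS` are hypothetical stand-ins for the runtime fields the guide lists, and `effective_config` is an invented helper.

```python
import copy
from typing import Optional

# Runtime keys that the dashboard owns. The exact JSON key names are
# illustrative assumptions, not DrainCtl's documented schema.
PULLED_KEYS = ("notifications", "thresholds", "grace_period",
               "session_threshold", "performance")


def effective_config(local: dict, pulled: Optional[dict]) -> dict:
    """Overlay pulled runtime keys on the local config; bootstrap keys stay local."""
    if not local.get("dashboard", {}).get("url"):
        return local  # standalone agent: the local file is authoritative
    if pulled is None:
        return local  # dashboard unreachable: keep what is already on disk
    merged = copy.deepcopy(local)
    for key in PULLED_KEYS:
        if key in pulled:
            merged[key] = copy.deepcopy(pulled[key])
    return merged
```

Since each successful pull is written back to config.json, "keep what is on disk" covers both the last-good-pull case and the never-pulled case.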

⚠️ Do not hand-edit per-agent config.json. Once an agent is joined to a dashboard, any runtime field you change locally will be overwritten on the next pull. Use the dashboard Configuration modal instead. The local file is useful for bootstrap settings (dashboard.url, dashboard.tls_fingerprint, log levels, memory_limit_mb) and for admin-only evtspike knobs (channel lists, thresholds, baseline path) that the dashboard modal intentionally does not expose.

Tip: For farms with dozens of servers, configure notifications and thresholds entirely in the dashboard UI. Leave notifications empty in each agent's config.json — the dashboard owns them.

⚠️ Firewall. The dashboard listens on the configured port. If you're running Windows Firewall, allow inbound TCP on that port for the agent servers and admin workstations.

10. Performance Monitoring

DrainCtl collects host-level and per-session performance metrics through the Windows PDH API. Enable it with one config key:

"performance": { "enabled": true }

With defaults, DrainCtl collects CPU, available memory, pages/sec, disk queue length, TCP retransmits, and per-session input delay on the configured sample interval. Metrics appear in the dashboard, audit trail, webhook payloads, and CLI JSON output.

Thresholds & Triggers

Trigger	Default	Fires when
`cpu_warning`	70%	CPU ≥ threshold, sustained for `load_alert_delay_sec` (default 60s)
`cpu_critical`	85%	CPU ≥ threshold, sustained for `load_alert_delay_sec` (default 60s)
`memory_warning`	20% free	Available memory ≤ threshold, sustained for `load_alert_delay_sec` (default 60s)
`memory_critical`	10% free	Available memory ≤ threshold, sustained for `load_alert_delay_sec` (default 60s)
`input_delay_warning`	50ms	Input delay ≥ threshold, sustained for `input_delay_alert_delay_sec` (default 90s)
`input_delay_critical`	100ms	Input delay ≥ threshold, sustained for `input_delay_alert_delay_sec` (default 90s)

All performance triggers require a sustained breach before firing, eliminating flap noise from transient spikes. CPU and memory default to a 60-second sustain window (load_alert_delay_sec); input delay defaults to 90 seconds (input_delay_alert_delay_sec) because it is inherently volatile. Both windows are expressed in seconds and translate to a whole number of polls internally based on sample_interval_sec (default 30). Input delay is evaluated against the 95th percentile (P95) by default; set "input_delay_percentile": "p50" for median-based thresholds.
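As a worked example of the seconds-to-polls translation, the Python sketch below converts a sustain window into a poll count and checks for a sustained breach. Rounding up is an assumption (the guide only says the result is a whole number), and both function names are hypothetical.

```python
import math


def polls_required(delay_sec: int, sample_interval_sec: int = 30) -> int:
    """Convert a sustain window in seconds to a whole number of polls.

    Rounding up is an assumption; the guide does not state the direction.
    """
    return max(1, math.ceil(delay_sec / sample_interval_sec))


def sustained_breach(samples: list, threshold: float, polls: int) -> bool:
    """True when the most recent `polls` samples all meet or exceed the threshold."""
    if len(samples) < polls:
        return False
    return all(value >= threshold for value in samples[-polls:])
```

With the defaults, the 60-second CPU and memory window spans 2 polls and the 90-second input-delay window spans 3, so a single high sample never fires on its own.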

Set any threshold to 0 for the default, or -1 to disable.

RemoteFX Counters

Set "collect_remotefx": true to collect RemoteFX Graphics and Network counters: output FPS, encoding time, frame quality, RTT, packet loss, and skip rates. These counters are only available when the RemoteFX role is installed; absence is not an error.

Dashboard Display

Each server card in the Overview grid shows live CPU/Memory/Sessions values color-coded by severity (green/amber/red). The fleet-wide LOAD chart overlays CPU, CPU P95, Memory, and Sessions on one dual-axis canvas; a P95 toggle surfaces spikes the average smooths over.

Opening a server reveals the Server Detail panel — a single dual-axis Host Load chart scoped to that host plots CPU, CPU P95, Memory, and Sessions together, with its own 5M/1H/1D/3D/5D window pill. Input delay, disk queue, and RemoteFX counters (when the role is installed) are surfaced in the metrics tiles alongside.

Server Status Levels

The dashboard shows a composite status for each server reflecting both drain mode and performance health. The worst severity wins:

Status	Color	Condition
Healthy	Green	Drain off, no performance threshold breaches
Warning	Amber	Drain off, but one or more perf metrics at warning level (CPU, memory, or input delay)
Grace	Amber	Drain mode active, within configured grace period
Alert	Red	Drain mode exceeded grace period, or any performance metric at critical level
Offline	Grey	No report received in the last 10 minutes
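The table can be read as a small decision function. The Python sketch below is one plausible encoding: checking Offline first and reporting Grace ahead of Warning when both apply are assumptions, and `server_status` is a hypothetical name.

```python
def server_status(drain_on: bool, drain_minutes: float, grace_minutes: float,
                  perf_level: str, minutes_since_report: float) -> str:
    """Return the composite status; perf_level is 'ok', 'warning', or 'critical'."""
    if minutes_since_report > 10:
        return "Offline"   # assumed to take precedence over stale data
    if perf_level == "critical" or (drain_on and drain_minutes > grace_minutes):
        return "Alert"     # worst severity wins
    if drain_on:
        return "Grace"
    if perf_level == "warning":
        return "Warning"
    return "Healthy"
```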

Notification Cooldown Model

Each notification target’s repeat_minutes acts as a cooldown window. Once a notification fires, the same trigger type will not fire again for that target until the interval elapses. If the metric is still breaching when the interval expires, a reminder is sent.

Cooldowns are never reset by metric oscillation. A metric that briefly dips below threshold and rises again does not produce a new notification — the original cooldown holds. This prevents notification storms from volatile metrics like input delay.
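A minimal Python sketch of this cooldown behavior follows. The `CooldownGate` class is hypothetical and only models the rules stated above: one timer per target and trigger type, reminders after the interval, and no reset when the metric dips.

```python
class CooldownGate:
    """Tracks the last send time per (target, trigger) pair, in seconds."""

    def __init__(self, repeat_minutes: float):
        self.repeat_sec = repeat_minutes * 60
        self.last_sent = {}

    def should_send(self, target: str, trigger: str, breaching: bool, now: float) -> bool:
        """Send on the first breach, then again only after the cooldown elapses."""
        if not breaching:
            return False  # a dip does NOT clear last_sent, so the cooldown holds
        key = (target, trigger)
        last = self.last_sent.get(key)
        if last is not None and now - last < self.repeat_sec:
            return False
        self.last_sent[key] = now
        return True
```

A metric that oscillates around its threshold therefore produces at most one notification per repeat_minutes window for each target.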

Real-Time Event Streaming (SSE)

The dashboard streams real-time updates to connected browsers via Server-Sent Events. When an agent reports or settings change, every open dashboard tab updates within 2 seconds — no manual refresh needed.

Endpoint: GET /api/v1/events (requires session cookie)
Event types: server_update (agent reports, local checks), server_deleted (server removed via dashboard), and settings_update (config changes)
Reconnection: automatic (~3 seconds, handled by the browser’s EventSource API)
Fallback: the existing periodic poll continues as a full-sync safety net
Capacity: up to 100 concurrent browser sessions with <1 MB memory per subscriber
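For integrators who want to consume the stream outside a browser, the Python sketch below parses a text/event-stream body following the standard SSE framing rules. It is not a DrainCtl client: the JSON payload shape in the usage note is an assumption, and a real client would also need to send the session cookie.

```python
def parse_sse(stream_text: str):
    """Yield (event_type, data) pairs from a text/event-stream body."""
    event_type, data_lines = "message", []
    for line in stream_text.splitlines():
        if line == "":                      # a blank line dispatches the event
            if data_lines:
                yield event_type, "\n".join(data_lines)
            event_type, data_lines = "message", []
        elif line.startswith(":"):          # comment or keep-alive line
            continue
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips at most one leading space from the value.
            data_lines.append(value[1:] if value.startswith(" ") else value)
```

Feeding it a body such as `event: server_update` followed by a `data:` line and a blank line yields one `("server_update", ...)` pair per event.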

Dashboard UI Features

Multi-expand: open multiple server detail panels simultaneously
Page size: choose 15, 30, or 50 servers per page (persisted in browser)
Alert presets: Chill, Anxious, or Twitchy — one click sets thresholds, percentile, and consecutive poll counts
CPU P95 overlay: toggle a P95 line on the fleet LOAD chart to surface spikes the average smooths over

REST API Documentation

The dashboard serves an OpenAPI 3.1 spec at /api/v1/openapi.yaml and a Swagger UI at /api/docs. Both are public (no authentication required). The spec covers all API endpoints including the SSE event stream.

11. Event-Log Anomaly Detection

DrainCtl includes an opt-in event-log anomaly detector (evtspike). It subscribes to a configured set of Windows event-log channels, learns a per-channel, time-of-day rate baseline with a robust-cap Bayesian model, and fires a notification on the event_spike trigger when a confirmed rate anomaly is detected. It is designed to flag things like authentication-failure floods, application-crash storms, or a misbehaving service that starts spewing warnings — without needing a SIEM.

Enable the detector

Open the dashboard, click the gear icon, scroll to the Event Log Anomaly Detection group, and tick Enable detector. The change propagates to every connected agent on the next pull and takes effect without a service restart.

Subscribed channels default to a curated set that includes Application, System, and Microsoft-Windows-TerminalServices-LocalSessionManager/Operational. Channel lists, thresholds, cooldowns, and the optional Security-channel opt-in are admin-only knobs and live under "evtspike" in config.json on the dashboard server.

Dashboard display

Each Server Detail panel carries an Event Spikes swimlane tile: one row per active channel, with confirmed spikes plotted as markers on a time axis (5M / 1H / 1D / 3D / 5D window pill). The tile header carries a state chip:

State	Meaning
`HEALTHY`	Detector enabled; at least half of the subscribed channels have accumulated enough per-slot observations to score reliably.
`TRAINING`	Detector enabled and subscribed, but fewer than half of the channels are mature yet. Spikes can still fire for channels that have matured; the chip stays amber until the majority are in a usable state.
`DISABLED`	`evtspike.enabled` is false on this host.
`ERROR`	Zero channels subscribed successfully, or a startup error surfaced. Check the DrainCtl file log for `evtspike` WARN lines — typically a missing channel, access-denied, or the Security channel enabled without `SeSecurityPrivilege` on the service token.
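The chip logic in the table reduces to a short function. The Python sketch below is an illustration with hypothetical names; the order in which the conditions are checked is an assumption.

```python
def detector_state(enabled: bool, subscribed: int, mature: int,
                   startup_error: bool = False) -> str:
    """Map detector facts to the state chip shown in the tile header."""
    if not enabled:
        return "DISABLED"
    if startup_error or subscribed == 0:
        return "ERROR"
    if mature * 2 >= subscribed:   # at least half of the channels are mature
        return "HEALTHY"
    return "TRAINING"
```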

Configuration knobs

Every numeric knob below lives under "evtspike" in config.json on the dashboard server and propagates to every connected agent on the next pull. Numeric fields are clamped to the ranges shown; setting a value outside the range silently falls back to the default. The only runtime knob also editable from the dashboard UI is enabled; everything else is an admin-only file edit.

A complete block with defaults:

"evtspike": {
  "enabled": true,
  "min_count": 10,
  "threshold": 1e-4,
  "cooldown_minutes": 10,
  "slot_maturity_observations": 90,
  "persist_interval_seconds": 900,
  "half_life_buckets": 360,
  "prior_strength": 60.0,
  "mean_per_bucket_prior": 0.1,
  "security_channel_enabled": false,
  "disabled_channels": [],
  "added_channels": []
}

Detection knobs — how loudly the detector fires

These three work together. The detector emits a candidate spike when (a) the 10-second bucket count is at least min_count, and (b) the Bayesian tail probability of seeing that count under the learned baseline is below threshold. Two-of-three consecutive candidates confirm a spike, and cooldown_minutes then suppresses repeats on the same channel.

Key	Default	Range	What it does — and when to tune
`min_count`	10	1 – 10000	Floor on the bucket count required to even consider scoring an anomaly. Rare channels where a handful of events per 10 s is already catastrophic want `3`–`5`. Very chatty channels where background is 50+/10 s want `25`–`50` to avoid scoring normal noise. Raising this is the single most effective lever for cutting false positives on a noisy channel without retraining the baseline.
`threshold`	`1e-4`	`1e-9` – `0.1`	Maximum tail probability for a count to qualify as anomalous. Lower = stricter. At `1e-4` the detector flags roughly one-in-ten-thousand buckets under a well-learned baseline. Drop to `1e-6`–`1e-8` if you trust the baseline deeply and only want the most extreme deviations. Raise to `1e-3`–`1e-2` for exploratory fleets where you want to see borderline activity.
`cooldown_minutes`	10	1 – 1440	After a confirmed spike fires on a channel, further spikes on the same channel are suppressed for this many minutes. Pair with your notification target's `repeat_minutes`: the detector cooldown gates the first fire; the target cooldown gates reminder notifications. Short cooldowns (`2`–`5`) are useful on channels where distinct incidents can arrive back-to-back (auth failure storms). Long cooldowns (`60`+) fit channels where a single incident produces a multi-hour burst (patch storms, scheduled backup windows).
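The candidate and confirmation rules can be sketched as follows in Python. This reads "two-of-three" as "at least two of the last three buckets were candidates", which is an interpretation; `SpikeConfirmer` is a hypothetical class, the tail probability is supplied by the caller, and the per-channel cooldown is left out for brevity.

```python
from collections import deque


class SpikeConfirmer:
    """Confirms a spike when 2 of the last 3 buckets were candidates."""

    def __init__(self, min_count: int = 10, threshold: float = 1e-4):
        self.min_count = min_count
        self.threshold = threshold
        self.recent = deque(maxlen=3)  # candidate flags for the last three buckets

    def observe(self, count: int, tail_probability: float) -> bool:
        """Feed one 10-second bucket; return True when a spike is confirmed."""
        candidate = count >= self.min_count and tail_probability < self.threshold
        self.recent.append(candidate)
        return sum(self.recent) >= 2
```

A single extreme bucket never confirms on its own, and a bucket below min_count is never a candidate no matter how improbable it looks.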

Learning knobs — how the baseline adapts

The baseline is a per-channel Gamma-Poisson posterior, kept separately for each 15-minute time-of-day slot (96 slots/day) alongside a global posterior. These knobs control how fast the baseline learns, how confidently it starts, and when a slot is considered mature enough for its own scoring.

Key	Default	Range	What it does — and when to tune
`slot_maturity_observations`	90	1 – 100	Number of bucket observations a single 15-minute slot must accumulate before scoring prefers it over the global posterior. The default of `90` means a slot is declared mature only after a full 15-minute visit (90 ten-second buckets) in that time-of-day window — eliminating the "first morning back after a long weekend" false positive class. Lower values accept less evidence before trusting a slot's own statistics; if you find the detector spends too long in TRAINING on quiet channels, `30`–`60` is a reasonable softer setting. Pre-009 the default was `7` (≈70 s of coverage); persisted `7` values are NOT auto-migrated, so existing installs keep their tuning.
`half_life_buckets`	360	60 – 10000	Exponential-forgetting half-life in 10-second buckets. At `360` the baseline weights the last hour heavily and older data fades; after one half-life the contribution of a given observation drops to half. Shorter (`120`–`180`, 20–30 minutes) adapts faster to channel-behaviour drift but is more easily dragged by a sustained background-rate shift. Longer (`720`+, two-plus hours) is more stable across short incidents but slower to re-home on a permanent rate change (e.g., a newly deployed application that changed the channel's idle rate).
`prior_strength`	60.0	1.0 – 10000.0	How many bucket-equivalents of "pretend evidence" the initial prior is worth. With the default, the baseline starts with the equivalent of 60 observations already seen at the prior mean, so scoring is meaningful from minute one rather than exquisitely sensitive. Raise this to `300`+ to make the detector slower and more skeptical on freshly installed agents; lower to `10`–`20` to let a real baseline take over faster. Rarely worth touching outside a staging environment.
`mean_per_bucket_prior`	0.1	0.0 – 1000.0	Expected events per 10-second bucket under "normal" conditions. Used only to seed the prior before real data arrives. `0.1` reflects the assumption that most curated channels are mostly quiet. For a known-chatty channel (SMB audit, Kerberos on a DC with many short-lived tickets) you can seed a larger value so the first few minutes of observation aren't flagged as anomalous simply because the prior was too low.
`persist_interval_seconds`	900	60 – 86400	How often the in-memory detector state is written to `baseline.json` on disk. Shorter = less learning lost on a hard service kill; longer = fewer disk writes on constrained hosts. The default of 15 minutes loses at most one slot's worth of observations on a crash. A clean service stop always forces a final save regardless of cadence.
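To make the learning knobs concrete, here is a textbook Gamma-Poisson sketch in Python. It is not DrainCtl's implementation: the robust cap and per-slot bookkeeping are omitted, every function name is hypothetical, and mapping prior_strength and mean_per_bucket_prior onto Gamma(alpha, beta) this way is an assumption.

```python
import math


def seed_prior(prior_strength: float = 60.0, mean_per_bucket: float = 0.1):
    """Gamma(alpha, beta) prior worth `prior_strength` buckets at the prior mean."""
    return prior_strength * mean_per_bucket, prior_strength


def decay_factor(half_life_buckets: int = 360) -> float:
    """Per-bucket multiplier chosen so evidence halves after one half-life."""
    return 0.5 ** (1.0 / half_life_buckets)


def update(alpha: float, beta: float, count: int, decay: float):
    """Fade old evidence, then add one bucket's observed count."""
    return alpha * decay + count, beta * decay + 1.0


def tail_probability(count: int, alpha: float, beta: float) -> float:
    """P(X >= count) under the negative binomial predictive; requires alpha > 0."""
    p = beta / (beta + 1.0)
    below = 0.0
    for k in range(count):
        log_pmf = (math.lgamma(k + alpha) - math.lgamma(k + 1) - math.lgamma(alpha)
                   + alpha * math.log(p) + k * math.log1p(-p))
        below += math.exp(log_pmf)
    return max(0.0, 1.0 - below)
```

Under the default prior (alpha = 6, beta = 60), ten events in one bucket has a tail probability far below 1e-4, while a single event has roughly a 9% chance. That gap is why min_count and threshold work as a pair.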

Channel selection

The detector subscribes to a curated list of 54 default channels — Application, System, FSLogix, RemoteApp/RDP, SMB client/server, Terminal Services subsystems, auth (NTLM / Kerberos / LSA), infrastructure (DNS / TCPIP / CAPI2), user experience (GroupPolicy / User Profile Service / PrintService), and stability (WHEA / Windows Defender / Crashdump). Three knobs shape that list:

Key	Default	What it does
`security_channel_enabled`	false	Opt-in subscription to the `Security` channel. Disabled by default because subscribing requires `SeSecurityPrivilege` on the service account — the DrainCtl service runs as `LocalSystem`, which has this privilege, but if you run it under a different service account you must grant the right first. Enabling this can generate a lot of noise on a domain controller; pair with higher `min_count`.
`disabled_channels`	`[]`	Case-insensitive list of default channels to remove from the subscription set. Use this to silence a default channel without editing the curated list. Does not affect `added_channels` — so an admin who disables a default can still re-add it explicitly via the added list if they want custom per-channel tuning later.
`added_channels`	`[]`	Extra channel names to subscribe to beyond the curated defaults. Any channel you actually want the detector watching and that isn't already in the default set goes here — application-specific channels (`Microsoft-Windows-Hyper-V-Worker/Operational`, vendor-specific subsystems) are the typical case. Non-existent channels are logged as skipped at service start and do not block other subscriptions.
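The way the three knobs combine can be sketched in Python. The `effective_channels` helper is hypothetical; it follows the rules in the table: disabled_channels filters only the defaults, matching is case-insensitive, and added_channels can re-add a disabled default.

```python
def effective_channels(defaults, disabled, added, security_enabled=False):
    """Return the subscription list: (defaults - disabled) + added [+ Security]."""
    blocked = {name.lower() for name in disabled}
    result = [name for name in defaults if name.lower() not in blocked]
    seen = {name.lower() for name in result}
    extras = list(added) + (["Security"] if security_enabled else [])
    for name in extras:
        if name.lower() not in seen:   # avoid duplicate subscriptions
            result.append(name)
            seen.add(name.lower())
    return result
```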

💡 Discovering channel names. List the full set of channels currently registered on a host with wevtutil el. The case used in that output is the exact string DrainCtl expects in added_channels. For a channel's log path, use wevtutil gl <channel-name>.

Tuning for your fleet

Defaults ship for a general-purpose Terminal Services host running well-behaved line-of-business apps. Four common scenarios warrant different postures:

Cautious (exploratory, new fleet)

You just turned the detector on. You don't yet know what your baseline looks like. You want to see borderline activity so you can decide what to tune. Prefer false positives to false negatives.

"evtspike": {
  "enabled": true,
  "min_count": 5,
  "threshold": 1e-3,
  "cooldown_minutes": 30,
  "slot_maturity_observations": 30
}

Lower min_count and looser threshold catch more. Longer cooldown prevents a single noisy channel from pager-bombing you while you figure out if it's signal or normal. Lower slot_maturity_observations than the default (90) lets per-slot scoring engage sooner so you see borderline activity earlier — at the cost of more false positives in the first few hours.

Strict (mature fleet, high-signal alerts only)

The baseline has been learning for a month. Operators are tired of informational spikes. You want only real emergencies to fire. Defaults already give you the strict-direction slot maturity; tighten the detection knobs:

"evtspike": {
  "enabled": true,
  "min_count": 25,
  "threshold": 1e-6,
  "cooldown_minutes": 10
}

Higher min_count filters out small deviations. Tighter threshold demands extreme tail events. The default slot_maturity_observations = 90 already means the detector refuses to call a slot mature until it has seen a full 15-minute visit in that time-of-day window — eliminates the "first morning back after a three-day weekend" false positive class.

Noisy-channel surgery (keep the rest, tame the one)

Global defaults work for 50+ channels but one specific channel (often Microsoft-Windows-SMBClient/Audit or Microsoft-Windows-Kerberos/Operational on a DC) keeps firing and you trust it's operational noise, not signal. Disable just that channel:

"evtspike": {
  "enabled": true,
  "disabled_channels": [
    "Microsoft-Windows-SMBClient/Audit"
  ]
}

Prefer this to globally raising min_count — it preserves sensitivity on the other 53 channels. Reinstate the channel after the workload stabilises.

DC-heavy fleet with Security auditing

You're running DrainCtl on domain controllers and want auth-failure flood detection. Enable Security, tighten thresholds since the baseline on auth channels is naturally high:

"evtspike": {
  "enabled": true,
  "min_count": 50,
  "threshold": 1e-6,
  "cooldown_minutes": 5,
  "security_channel_enabled": true,
  "mean_per_bucket_prior": 2.0,
  "half_life_buckets": 720
}

Higher mean_per_bucket_prior stops the first few hours of runtime from flagging normal Security churn. Longer half_life_buckets gives the baseline two hours of memory so short spiky authentications don't nudge the baseline upward. Short cooldown is appropriate because auth storms (password-spraying, service-account lockouts) are often distinct incidents arriving minutes apart.

⚠️ After changing learning knobs, reset the baseline. prior_strength, mean_per_bucket_prior, and half_life_buckets only affect the initial seeding of a detector — an already-running detector keeps its current posterior. Run drainctl baseline reset after changing any of these so the new values actually take effect. Detection knobs (min_count, threshold, cooldown) and channel knobs apply immediately on the next config pull; no reset needed.

Baseline persistence

The detector stores its per-channel sufficient statistics (Gamma-Poisson posteriors, robust-cap state, last-alert timestamps) to a single file:

%ProgramData%\LISS Technologies\LISSTech DrainCtl\baseline.json

The service rewrites this file on a fixed persistence cadence and once more on clean shutdown, so a restart resumes scoring from the same learned distribution instead of starting over.

Resetting a poisoned baseline

If a real operational incident occurred during the training window (for example, a Patch Tuesday burst that was learned as normal), the baseline can become desensitised. Reset it via the CLI:

drainctl baseline reset

This routes through the running service so both halves stay in sync: the in-memory detectors for every subscribed channel are wiped, and baseline.json is deleted. Subscriptions and the scoring loop keep running; a fresh baseline starts accumulating from the next 10-second scoring tick.

⚠️ Do not delete baseline.json by hand while the service runs. The file is rewritten from in-memory state on the next persistence tick, so a manual delete without the in-memory wipe achieves nothing. Always use drainctl baseline reset.

Routing spikes to a notification target

In the Configuration modal's notification target editor, add event_spike to any target's trigger list. The same target can combine drain events and event spikes (e.g., PagerDuty gets alert and event_spike; Slack gets everything). The target's repeat_minutes cooldown applies per trigger type, so an event_spike cooldown is independent of the drain-mode alert cooldown.

12. RMM Integration

DrainCtl is designed to integrate with any RMM platform that can run a script and check an exit code.

How it works

Your RMM runs a monitoring script at its polling interval
The script executes drainctl check
DrainCtl connects to the service via named pipe — instant response
The RMM captures stdout (structured key=value lines) and the exit code
Threshold: exit code 0 = Normal, 1 = Alert, 2 = Error
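If your RMM collector is Python-based rather than PowerShell, the capture step might look like the sketch below. The key names in the usage note are invented for illustration, since the guide does not list the actual keys, and the parser assumes values contain no spaces.

```python
EXIT_MEANING = {0: "Normal", 1: "Alert", 2: "Error"}


def parse_check_output(stdout: str) -> dict:
    """Collect whitespace-separated key=value tokens into a dict."""
    fields = {}
    for line in stdout.splitlines():
        for token in line.split():
            key, sep, value = token.partition("=")
            if sep and key:
                fields[key] = value
    return fields


def classify(exit_code: int) -> str:
    """Map the exit code to the RMM threshold; unknown codes count as Error."""
    return EXIT_MEANING.get(exit_code, "Error")
```

For example, hypothetical output of `status=alert minutes=135` parses to a two-key dict, and an exit code of 1 classifies as Alert.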

🎯 Pro tip: Even if your RMM only polls every 15 minutes, the service detected the change in real time. The audit trail has the exact timestamp and attribution. Your RMM is just sampling the service's state.

Minimal monitoring script

drainctl check --quiet
exit $LASTEXITCODE

That's it. --quiet suppresses intermediate log lines — only the final status line is emitted. Works with any RMM that supports PowerShell or CMD scripts with exit code thresholds.

JSON output for advanced integrations

# Parse JSON output for custom dashboards or APIs
$result = drainctl check --format json | ConvertFrom-Json
if ($result.exit_code -ne 0) {
    # Send to your ticketing system, dashboard, etc.
}

13. Auto-update

The agent can self-update by polling GitHub releases on a daily cadence, verifying the release manifest, verifying the MSI Authenticode signature against the LISS code-signing certificate, and running msiexec silently. Auto-update is off by default — it does not contact GitHub at all until the operator opts in.

Enabling auto-update

Edit config.json and add (or modify) the update object:

"update": {
  "enabled": true,
  "channel": "prerelease",
  "poll_interval": "24h"
}

The service watches config.json and reloads within seconds — no restart needed. The first poll fires 5–15 minutes after the change (jittered); subsequent polls run every poll_interval ± 8% (so 24h ± 2h on the default).

Field	Default	Notes
`enabled`	`false`	When `false`, no goroutine, no GitHub call. Default is opt-in by design — auto-installing software without explicit consent on a host violates the principle of least surprise.
`channel`	`"stable"`	Recognized values: `"stable"` and `"prerelease"`. `stable` polls `/releases/latest` (skips prereleases). `prerelease` polls `/releases?per_page=1` (most recent regardless of flag). Unknown values clamp to `"stable"` with a warn log.
`poll_interval`	`"24h"`	Go-duration string (`"24h"`, `"6h30m"`, `"72h"`). Minimum `"1h"`; lower values clamp.
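The scheduling arithmetic can be sketched in Python as follows. A uniform jitter distribution is an assumption, as are the function names; the 5 to 15 minute first poll, the ±8% jitter, and the one-hour floor come from the text above.

```python
import random

MIN_INTERVAL_S = 3600  # "1h" floor; lower values clamp


def first_poll_delay(rng: random.Random) -> float:
    """Delay before the first poll: somewhere between 5 and 15 minutes."""
    return rng.uniform(5 * 60, 15 * 60)


def next_poll_delay(poll_interval_s: float, rng: random.Random) -> float:
    """Delay before a later poll: the clamped interval with +/- 8% jitter."""
    base = max(poll_interval_s, MIN_INTERVAL_S)
    return base * rng.uniform(0.92, 1.08)
```

Jitter spreads a fleet's polls out so that agents enabled at the same moment do not all hit GitHub together.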

⚠ Today, every release ships as “prerelease”. Until the project graduates a non-prerelease build, channel: "stable" finds nothing on GitHub and the agent logs update=no_stable_release on each poll. To actually receive updates today, both "enabled": true AND "channel": "prerelease" are required. Once a stable release is published, "channel": "stable" becomes operationally useful.

How an update lands

Poll api.github.com for the latest release on the configured channel.
Compare CalVer tags numerically. If the remote version is not newer, log update=up_to_date and reschedule.
Download release.json and release.json.sig, then verify the Ed25519 signature against the public key embedded in the binary.
Download the signed MSI to %TEMP%\drainctl-update-<random>.msi and verify its SHA-256 matches the signed manifest.
Verify the Authenticode signature is valid AND the signing-certificate Subject CN equals exactly LISS Consulting, Corp. Anything else — missing signature, untrusted chain, wrong subject — is refused and the temp file is deleted.
Spawn msiexec /i <path> /quiet /norestart as a detached process so it survives the service's exit.
Shut the service down cleanly so the running binary does not block file replacement. The MSI's custom action restarts the new version after install.
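Two of the checks in this sequence, the numeric CalVer comparison and the SHA-256 match, are easy to illustrate in Python. The tag format (`v2025.10.1`) is an assumption and the helper names are hypothetical; the Ed25519 and Authenticode checks are not shown.

```python
import hashlib
import hmac


def parse_calver(tag: str) -> tuple:
    """Split a tag such as 'v2025.10.1' into integers for numeric comparison."""
    return tuple(int(part) for part in tag.lstrip("vV").split("."))


def is_newer(remote_tag: str, local_tag: str) -> bool:
    """Numeric comparison, so 2025.10.1 is newer than 2025.9.3."""
    return parse_calver(remote_tag) > parse_calver(local_tag)


def sha256_matches(payload: bytes, expected_hex: str) -> bool:
    """Compare the download's SHA-256 with the value from the signed manifest."""
    actual = hashlib.sha256(payload).hexdigest()
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

Comparing tags as integers matters because a plain string comparison would rank `2025.9.3` above `2025.10.1`.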

Verifying it’s working

Set log_file_level to "debug" in config.json to see every poll attempt. Then tail the daily file log:

Get-Content "$env:ProgramData\LISS Technologies\LISSTech DrainCtl\drainctl*.log" |
    Select-String 'update='

You'll see lines like:

update=poll_start — one per scheduled poll.
update=up_to_date — remote is the same version we're running.
update=not_modified — GitHub returned 304 against the cached ETag.
update=no_stable_release — channel is "stable" and no non-prerelease release exists. Steady-state idle, not a failure.
update=installing old=… new=… — install path triggered. Persists 7 days in the rotated daily logs.
update=refused reason=… — manifest or Authenticode verification failed. Stays at the configured cadence (deliberate refusal, not a transient failure).

Opting back out

Set "enabled": false in config.json. The next poll tick (within poll_interval) will see the change and skip the GitHub call. To stop all activity immediately, restart the service after the edit.

14. Service Management

The MSI handles service installation. These commands are for manual management or troubleshooting.

# Check service status (CLI — shows Running / Stopped)
drainctl service status

# Check service status (PowerShell — richer output)
Get-Service DrainCtl

# Stop / Start / Restart
Stop-Service DrainCtl
Start-Service DrainCtl
Restart-Service DrainCtl

# Manual install (if not using MSI)
drainctl service install
drainctl service start

# Manual uninstall
drainctl service stop
drainctl service uninstall

What the service does

Subscribes to registry changes on TSServerDrainMode via RegNotifyChangeKeyValue — instant detection
Subscribes to Security Event ID 4657 via EvtSubscribe — knows who made the change
Polls every poll interval (default 60 seconds) as a safety net
Persists drain-mode transitions and performance metrics to a single SQLite database (drainctl.db, WAL mode) under the DrainCtl data directory — the audit trail, metrics tiers, server registry, and event-spike history all live here
Serves queries over a named pipe (\\.\pipe\drainctl) for instant CLI/PS responses
Writes to the Windows Event Log (Applications and Services Logs > DrainCtl) on state changes
Sends webhook/ntfy/email notifications when configured
Tracks active RDS sessions via WTSEnumerateSessionsW for utilization alerts
Hot-reloads configuration when config.json changes

Event Log

Open Event Viewer → Applications and Services Logs → DrainCtl:

ID	Level	What happened
1000	Info	Service started
1001	Info	Service stopped
1002	Info	Check: healthy
1004	Info	State transition detected
2000	Warning	Drain mode in grace period
3000	Error	Drain mode alert (grace exceeded)

File log

Alongside the Event Log sink, the service writes a slog-formatted text log to:

%ProgramData%\LISS Technologies\LISSTech DrainCtl\drainctl.log

The file rotates at local midnight: the previous day's file is renamed to drainctl-YYYY-MM-DD.log in the same directory, and a fresh drainctl.log is opened. Archives older than 7 days are pruned automatically at rotation time. The per-sink level is controlled by log_file_level (file) and log_event_level (Windows Event Log) in config.json; both default to info. The CLI is independent — pass --log-level debug for one-shot verbosity.
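The retention rule can be sketched in Python. The `archives_to_prune` helper is hypothetical, and treating an archive as stale only when its date stamp is more than 7 days before today is an assumption about the exact boundary.

```python
import re
from datetime import date, timedelta

ARCHIVE_RE = re.compile(r"^drainctl-(\d{4})-(\d{2})-(\d{2})\.log$")


def archives_to_prune(filenames, today: date, keep_days: int = 7):
    """Return archive names whose date stamp is more than keep_days old."""
    cutoff = today - timedelta(days=keep_days)
    stale = []
    for name in filenames:
        match = ARCHIVE_RE.match(name)
        if not match:
            continue  # skips the live drainctl.log and unrelated files
        stamp = date(*map(int, match.groups()))
        if stamp < cutoff:
            stale.append(name)
    return stale
```

If you need logs for longer than the retention window, copy the archives elsewhere before they age out.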

15. Troubleshooting

"changed_by" is empty

Run drainctl audit-setup as admin. On domain-joined machines, also configure the GPO (see Audit Setup).

Service won't start

Check the DrainCtl event log for errors (Event ID 3002). Common causes:

Another instance holds the SQLite database or named-pipe endpoint — restart the machine or find the process
Permissions on the ProgramData directory

CLI is slow (1-2 seconds instead of instant)

The service isn't running. drainctl check falls back to direct registry read + file-based audit. Start the service: Start-Service DrainCtl.

Notifications not arriving

drainctl notify status — verify URLs are set
drainctl notify test — sends a test message
Check the DrainCtl event log for "webhook notification failed", "ntfy notification failed", or "email notification failed" warnings
Verify network connectivity from the RDSH server to the webhook/ntfy/SMTP endpoint
For email: confirm the SMTP password is correct, the from address is authorized to send, and the relay allows connections from the server's IP

Dashboard shows "Offline" for a registered server

The server's DrainCtl service isn't reporting. Check:

Is the service running on that server? Get-Service DrainCtl
Is dashboard.url set in %ProgramData%\LISS Technologies\LISSTech DrainCtl\config.json?
Can the server reach the dashboard? Test-NetConnection dashboard-server -Port 49470
Check the DrainCtl event log on the reporting server for DrainCtl errors

Dashboard returns 401 or "Access Denied"

Windows Authentication (Kerberos) issue:

Make sure the dashboard URL is in the browser's Intranet zone (or Trusted Sites)
Try using the FQDN (e.g., https://server.domain.com:49470) — Kerberos needs the full hostname
Verify your account is in the configured DashboardGroup (default: Domain Admins)
For cross-domain scenarios, ensure Kerberos delegation is configured

Dashboard returns 403 on agent report

The reporting server isn't registered. Run drainctl register https://dashboard:49470 on that server.

Need help?

Open an issue on GitHub or contact LISS Technologies support.