OpsDash User Manual

Version 0.4.0 | For: Ops Engineers, System Administrators | Updated: 2026-04-01

1. Quick Start

1.1 Requirements

Dependency	Minimum Version	Notes
Docker & Docker Compose	Docker 24+, Compose v2	Runs all infrastructure containers
Python	3.11+	Backend runtime
Node.js	18+	Frontend build tooling
OS	Linux / macOS / WSL2	Ubuntu 22.04+ recommended

Hardware: 4-core CPU, 8 GB RAM, 50 GB disk (sized for VictoriaMetrics' 90-day metric retention).

1.2 Deployment Steps

Step 1: Start Infrastructure Containers

docker compose up -d

This launches 7 services:

Container	Image	Host Port
opsdash-postgres	postgres:16	5433
opsdash-redis	redis:7	6381
opsdash-zabbix-db	postgres:16	5434
opsdash-zabbix-server	zabbix/zabbix-server-pgsql:7.0	10051
opsdash-zabbix-web	zabbix/zabbix-web-nginx-pgsql:7.0	8080
opsdash-emqx	emqx/emqx:5.8	11883 (MQTT) / 28083 (dashboard & REST API)
opsdash-victoriametrics	victoriametrics/victoria-metrics:v1.106.1	8428
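You can check that the containers are reachable before moving on with a small script. The following is a minimal sketch. It only tests whether each host port in the table above accepts a TCP connection. It does not verify that the service behind the port is healthy. The script and its function names are illustrative and are not part of OpsDash.

```python
import socket

# Host ports from the container table; opsdash-emqx publishes two ports.
EXPECTED_PORTS = {
    "opsdash-postgres": [5433],
    "opsdash-redis": [6381],
    "opsdash-zabbix-db": [5434],
    "opsdash-zabbix-server": [10051],
    "opsdash-zabbix-web": [8080],
    "opsdash-emqx": [11883, 28083],
    "opsdash-victoriametrics": [8428],
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost", ports: dict = EXPECTED_PORTS) -> dict:
    """Map each container name to a list of (port, reachable) pairs."""
    return {name: [(p, port_open(host, p)) for p in plist] for name, plist in ports.items()}


if __name__ == "__main__":
    for name, results in check_services().items():
        for port, ok in results:
            print(f"{name:<24} {port:>6}  {'OK' if ok else 'UNREACHABLE'}")
```

If the containers run on another machine, pass that machine's host name to check_services().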

Step 2: Configure Environment Variables

Create a .env file in the backend/ directory:

DATABASE_URL=postgresql+asyncpg://opsdash:opsdash123@localhost:5433/opsdash
REDIS_URL=redis://localhost:6381/0
SECRET_KEY=your-production-secret-key-change-me
CREDENTIAL_KEY=your-32-byte-encryption-key
ZABBIX_URL=http://localhost:8080/api_jsonrpc.php
ZABBIX_API_TOKEN=
EMQX_API_URL=http://localhost:28083/api/v5
EMQX_API_KEY=
EMQX_API_SECRET=
VM_QUERY_URL=http://localhost:8428
VM_WRITE_URL=http://localhost:8428
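In this template, ZABBIX_API_TOKEN, EMQX_API_KEY, and EMQX_API_SECRET are empty, and SECRET_KEY and CREDENTIAL_KEY hold placeholders. The API credentials are presumably generated in the Zabbix web UI (port 8080) and the EMQX dashboard (port 28083). The sketch below flags empty or placeholder values before you start the backend. It makes two assumptions: the file is plain KEY=VALUE, and CREDENTIAL_KEY is a literal 32-byte string. If OpsDash expects another encoding (for example base64), adjust the length check. The key list comes from the example above; the script itself is hypothetical.

```python
from pathlib import Path

# Keys from the example .env file.
REQUIRED_KEYS = [
    "DATABASE_URL", "REDIS_URL", "SECRET_KEY", "CREDENTIAL_KEY",
    "ZABBIX_URL", "ZABBIX_API_TOKEN", "EMQX_API_URL", "EMQX_API_KEY",
    "EMQX_API_SECRET", "VM_QUERY_URL", "VM_WRITE_URL",
]
PLACEHOLDERS = {"your-production-secret-key-change-me", "your-32-byte-encryption-key"}


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def find_problems(env: dict) -> list:
    """List required keys that are empty, missing, or still set to a placeholder."""
    problems = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty")
        elif value in PLACEHOLDERS:
            problems.append(f"{key} still holds the example placeholder")
    # Assumption: CREDENTIAL_KEY is used as a raw 32-byte string.
    cred = env.get("CREDENTIAL_KEY", "")
    if cred and cred not in PLACEHOLDERS and len(cred.encode()) != 32:
        problems.append("CREDENTIAL_KEY is not 32 bytes long")
    return problems


if __name__ == "__main__":
    # Run from the directory that contains backend/.
    for problem in find_problems(parse_env(Path("backend/.env").read_text())):
        print("WARN:", problem)
```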

Step 3: Install Backend Dependencies & Initialize Database

cd backend
pip install -r requirements.txt
python3 -m app seed

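The seed script creates all database tables and populates initial data, including the admin account (admin / admin123) and sample departments, devices, and alerts. It connects to PostgreSQL through DATABASE_URL, so the containers from Step 1 must already be running.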

Step 4: Start the Backend API

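uvicorn app.main:app --reload --host 0.0.0.0 --port 8001

The --reload flag restarts the server whenever the code changes. It is intended for development, so omit it in production.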

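Step 5: Start the Frontend Dev Server

The backend keeps running in the foreground. Open a second terminal and run the following from the directory that contains backend/ and frontend/: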

cd frontend
npm install
npm run dev

The frontend runs at http://localhost:3000 and proxies /api requests to the backend on port 8001.

1.3 First Login

Open http://localhost:3000 in your browser
Log in with default admin credentials: admin / admin123
You will be redirected to the Operations Dashboard
Important: Change the default admin password immediately via User Management

2. Feature Guide

2.1 Operations Dashboard

Navigation: Sidebar → Daily Ops → Operations Dashboard

The dashboard is the system homepage. It gives an at-a-glance view of your infrastructure, organized into 5 tabs:

Tab	Content
Overview	KPI cards + department summary + live alert ticker
Charts	Device status pie chart + alert trend bar chart
Resource Usage	Top N device utilization + global trend graphs
Health Heatmap	Device health matrix (grouped by location/type/department)
Ops Timeline	Alert / maintenance / change event timeline

Dashboard data is updated in real time via WebSocket. When the monitoring engine detects a device status change or a new alert, the page refreshes automatically.

Filters: Six multi-select dimensions (department, device type, subtype, location, tags, manager). All tabs refresh in sync when filters change.

Custom Layout: Click the gear icon to show/hide modules per tab and adjust display order. Layout preferences are saved to browser local storage.

2.2 Alert Monitoring

Navigation: Sidebar → Daily Ops → Alerts → Live Alerts

The live alerts page supports multi-dimensional filtering: severity level, target type, alert source (zabbix/emqx/opsdash), resolution status, suppressed/excluded toggle, keyword search, and time range. All filter state is synced to the URL for easy sharing.
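Because filter state is synced to the URL, anyone can reproduce a filtered view from its query string alone. The sketch below shows one way to do that round trip with Python's standard urllib.parse. Multi-select filters become repeated query keys, and empty filters are dropped. The parameter names (severity, source, q) are hypothetical; this manual does not document the query keys OpsDash actually uses.

```python
from urllib.parse import parse_qs, urlencode


def filters_to_query(filters: dict) -> str:
    """Encode filter state as a query string; lists become repeated keys, empty values are dropped."""
    pairs = []
    for key in sorted(filters):  # sorted keys give a stable, shareable URL
        value = filters[key]
        if value is None or value == "" or value == []:
            continue
        values = value if isinstance(value, list) else [value]
        pairs.extend((key, str(v)) for v in values)
    return urlencode(pairs)


def query_to_filters(query: str) -> dict:
    """Decode a query string; keys that appear once map to a single value."""
    return {k: v if len(v) > 1 else v[0] for k, v in parse_qs(query).items()}
```

A one-item multi-select comes back as a single value rather than a list. A real implementation would need to track which keys are multi-valued.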

Alert Actions

Acknowledge: Records who acknowledged the alert and when
Resolve: Marks the alert as resolved, removing it from the active list
Batch Operations: Select multiple alerts for bulk acknowledge or resolve

Alert Convergence Model

The system uses a "persistent open" model -- when the same target triggers the same rule condition continuously, no new alert record is created. Instead, the existing alert's repeat count is incremented. When the metric recovers, the open alert is automatically resolved.
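Think of the persistent-open model as a table keyed by (target, rule) that holds at most one open alert per key. The in-memory sketch below illustrates the behavior. Repeated triggers increment repeat_count on the existing record, recovery resolves it, and a later trigger opens a fresh record. Class and field names are hypothetical. Letting the latest severity overwrite the stored one is an assumption, not documented OpsDash behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    target: str
    rule: str
    severity: str
    repeat_count: int = 1
    resolved: bool = False


class AlertStore:
    """Persistent-open sketch: at most one open alert per (target, rule)."""

    def __init__(self):
        self.open = {}     # (target, rule) -> currently open Alert
        self.history = []  # every alert record ever created

    def fire(self, target: str, rule: str, severity: str) -> Alert:
        key = (target, rule)
        alert = self.open.get(key)
        if alert is None:
            # No open alert for this pair yet: create a new record.
            alert = Alert(target, rule, severity)
            self.open[key] = alert
            self.history.append(alert)
        else:
            # Same condition still active: bump the counter instead of adding a record.
            alert.repeat_count += 1
            alert.severity = severity  # assumption: latest severity wins
        return alert

    def recover(self, target: str, rule: str) -> Optional[Alert]:
        """Metric recovered: resolve and close the open alert, if any."""
        alert = self.open.pop((target, rule), None)
        if alert is not None:
            alert.resolved = True
        return alert
```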

Alert Correlation Groups

When a network device failure causes downstream devices to go offline, the system automatically groups related alerts for efficient root cause analysis.

Topology Cascade: When a network device goes down, the system walks its topology links breadth-first (BFS) to find affected downstream devices and groups their alerts together
Location Correlation: Multiple devices down at the same location are auto-grouped
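Both correlation strategies fit in a few lines of code. The example below is an illustration, not OpsDash's implementation. It assumes topology links are stored as directed upstream-to-downstream adjacency lists. It also assumes the cascade follows only devices that are themselves down. A downstream switch that is still up, for example through a redundant uplink, therefore stops the traversal on its branch. Location grouping buckets down devices by location and keeps buckets with at least two members. The min_size threshold is a hypothetical parameter.

```python
from collections import defaultdict, deque


def cascade_group(links: dict, failed: str, down: set) -> list:
    """BFS from a failed device along upstream->downstream links.

    Only devices that are currently down are added and expanded, so a device
    that is still up stops the cascade on its branch. Returns devices in BFS order.
    """
    group, seen, queue = [], {failed}, deque([failed])
    while queue:
        node = queue.popleft()
        for child in links.get(node, []):
            if child in down and child not in seen:
                seen.add(child)
                group.append(child)
                queue.append(child)
    return group


def group_by_location(down_devices: list, location_of: dict, min_size: int = 2) -> dict:
    """Bucket down devices by location; keep buckets with at least min_size devices."""
    buckets = defaultdict(list)
    for device in down_devices:
        buckets[location_of[device]].append(device)
    return {loc: devs for loc, devs in buckets.items() if len(devs) >= min_size}
```

For example, if core-sw is down and acc-sw2 is still up, cascade_group returns acc-sw1 and the servers behind it, but not the servers behind acc-sw2.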

Root Cause Analysis

Critical and warning alerts offer root cause analysis, showing upstream link alerts, correlation group lead alerts, and co-located alerts to help pinpoint the source of the issue.

2.3 Alert Rules

Navigation: Sidebar → Daily Ops → Alerts → Alert Rules

Alert rules define threshold-based automatic alerting. All enabled rules are evaluated every 60 seconds against metric data from VictoriaMetrics.

Key rule settings:

Scope: Global / condition-based matching (9 dimensions) / specific targets
Metric Name: Auto-discovered from VictoriaMetrics, searchable dropdown
Comparison: gt / lt / gte / lte / eq / neq
Dual Thresholds: Warning and Critical levels
Duration: How long (in seconds) the condition must hold continuously before the alert fires
Cooldown: Minimum interval between repeat notifications for the same alert (default 300 s)
Notification Channels: Select specific channels or broadcast to all enabled ones
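The interplay of comparison, dual thresholds, duration, and cooldown is easiest to see in code. The sketch below evaluates one rule against one target's latest value, once per cycle (every 60 seconds, per the text above). It makes three assumptions: the critical threshold is checked before the warning threshold, the duration timer starts at the first cycle in which either level is breached, and cooldown counts from the last notification. Names such as Rule and evaluate() are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Comparison names from the rule editor mapped to Python operators.
OPS = {"gt": operator.gt, "lt": operator.lt, "gte": operator.ge,
       "lte": operator.le, "eq": operator.eq, "neq": operator.ne}


@dataclass
class Rule:
    comparison: str
    warning: Optional[float]
    critical: Optional[float]
    duration: int = 0     # seconds the condition must hold before firing
    cooldown: int = 300   # minimum seconds between repeat notifications


@dataclass
class RuleState:
    breach_since: Optional[float] = None
    last_notified: Optional[float] = None


def level_for(rule: Rule, value: float) -> Optional[str]:
    """Return 'critical', 'warning', or None; critical is checked first."""
    cmp = OPS[rule.comparison]
    if rule.critical is not None and cmp(value, rule.critical):
        return "critical"
    if rule.warning is not None and cmp(value, rule.warning):
        return "warning"
    return None


def evaluate(rule: Rule, state: RuleState, value: float, now: float) -> Optional[str]:
    """Run one evaluation cycle; return the level to notify, or None to stay silent."""
    level = level_for(rule, value)
    if level is None:
        # Metric recovered: the open alert resolves, so both timers reset.
        state.breach_since = None
        state.last_notified = None
        return None
    if state.breach_since is None:
        state.breach_since = now
    if now - state.breach_since < rule.duration:
        return None  # condition has not held long enough yet
    if state.last_notified is not None and now - state.last_notified < rule.cooldown:
        return None  # still inside the cooldown window
    state.last_notified = now
    return level
```

Take a rule with gt, warning 80, critical 90, duration 120, and the default 300 s cooldown. Suppose the value sits above 80 from t = 0 s and rises above 90 by t = 120 s. The rule notifies at critical level at t = 120 s, stays silent at t = 180 s because of the cooldown, and notifies again at t = 420 s.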

Alert Exclusion Rules

Navigation: Sidebar → Daily Ops → Alerts → Alert Exclusions

Alert exclusion rules filter out alerts that do not require attention (e.g., during planned maintenance). Matched alerts are still recorded but auto-resolved without sending notifications.

Three time modes: Permanent, One-time (start/end time), and Recurring (periodic maintenance windows).
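All three time modes come down to one question: is the exclusion active at this moment? The sketch below answers that question and then applies the documented effect. The alert is still recorded, but it is marked resolved and its notification is suppressed. The sketch assumes recurring windows are expressed as weekdays plus a daily start and end time that does not cross midnight. It leaves out matching on targets or alert fields. All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional


@dataclass
class Exclusion:
    mode: str                            # "permanent", "one_time", or "recurring"
    start: Optional[datetime] = None     # one_time: window start (inclusive)
    end: Optional[datetime] = None       # one_time: window end (exclusive)
    weekdays: frozenset = frozenset()    # recurring: 0 = Monday ... 6 = Sunday
    daily_start: Optional[time] = None   # recurring: same-day window start
    daily_end: Optional[time] = None     # recurring: same-day window end


def is_active(rule: Exclusion, now: datetime) -> bool:
    """Return True if the exclusion window covers `now`."""
    if rule.mode == "permanent":
        return True
    if rule.mode == "one_time":
        return rule.start <= now < rule.end
    if rule.mode == "recurring":
        return now.weekday() in rule.weekdays and rule.daily_start <= now.time() < rule.daily_end
    raise ValueError(f"unknown exclusion mode: {rule.mode}")


def handle_alert(alert: dict, exclusions: list, now: datetime) -> dict:
    """Excluded alerts are still recorded, but auto-resolved and not notified."""
    excluded = any(is_active(rule, now) for rule in exclusions)
    return {**alert, "resolved": excluded, "notify": not excluded}
```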

2.4 Device Management

Navigation: Sidebar → Asset Management → Devices

Device List Features

Multi-field search: Searches device name, hostname, IP, MAC address, and serial number simultaneously
Rich filters: Department, device type, subtype, brand, status, zone, rack, tags, favorites
Agent status column: Shows Zabbix Agent installation state per device
URL state persistence: Filter state is synced to the URL and restored on navigation
Import/Export: Bulk import/export in CSV/Excel format with automatic MAC/IP deduplication
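Deduplication only works if equivalent MAC addresses compare equal, so imported values are normally normalized first. The sketch below normalizes MACs written as aa:bb:..., AA-BB-..., or aabb.ccdd.eeff and canonicalizes IPs with the standard ipaddress module. It then keeps the first row for each MAC or IP. The manual does not say whether OpsDash skips later duplicates or merges them into the existing record; this sketch assumes it skips them. Function names are hypothetical.

```python
import ipaddress
import re


def normalize_mac(mac: str) -> str:
    """Return aa:bb:cc:dd:ee:ff form, or '' if the value is not a 48-bit MAC."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac or "").lower()
    if len(digits) != 12:
        return ""
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def normalize_ip(ip: str) -> str:
    """Canonicalize an IPv4/IPv6 address; fall back to the stripped input if it does not parse."""
    ip = (ip or "").strip()
    try:
        return str(ipaddress.ip_address(ip))
    except ValueError:
        return ip


def _keys(record: dict) -> tuple:
    return normalize_mac(record.get("mac", "")), normalize_ip(record.get("ip", ""))


def dedupe_rows(rows, existing=()):
    """Keep the first row per MAC or IP; skip rows that clash with earlier rows or existing devices."""
    seen_macs, seen_ips = set(), set()

    def remember(mac, ip):
        if mac:
            seen_macs.add(mac)
        if ip:
            seen_ips.add(ip)

    for device in existing:
        remember(*_keys(device))
    kept, skipped = [], []
    for row in rows:
        mac, ip = _keys(row)
        if (mac and mac in seen_macs) or (ip and ip in seen_ips):
            skipped.append(row)
        else:
            kept.append(row)
            remember(mac, ip)
    return kept, skipped
```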

Device Types

10 supported types: network, server, security, database, storage, industrial, iot, ups, hvac, custom.

Device Detail Page

9 tabs: Basic Info, Link Relations, Software, Alerts, Monitoring Data, IoT Location, Agent Management, Change Log, and Custom Fields.

Device status updates in real time via WebSocket. Top-level stats include 30-day availability (SLA), downtime duration, and status reason.
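Suppose the 30-day availability is the share of a rolling 30-day window during which the device was not down. It can then be computed from outage intervals as 100 × (1 − downtime / 30 days). The sketch below clips outages to the window and merges overlaps so that no downtime is counted twice. The manual does not say whether OpsDash excludes planned maintenance or uses calendar months. Treat this sketch as an illustration of the formula only.

```python
from datetime import datetime, timedelta


def availability(outages, window_end: datetime, window: timedelta = timedelta(days=30)):
    """Return (availability %, total downtime) over the window ending at window_end.

    Outages are (start, end) datetime pairs; they are clipped to the window and
    overlapping intervals are merged so no downtime is counted twice.
    """
    window_start = window_end - window
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in outages
        if e > window_start and s < window_end
    )
    downtime = timedelta(0)
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this outage: close the previous merged interval.
            if cur_end is not None:
                downtime += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlaps the current interval: extend it.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        downtime += cur_end - cur_start
    return 100 * (1 - downtime / window), downtime
```

For scale, 30 days is 43,200 minutes, so 99.9% availability allows about 43.2 minutes of downtime in the window.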

Auto-Discovery

Enter a management IP and click "Auto Detect" -- the system probes the device (TCP scan, banner grab, SNMP query, fingerprint fusion) and fills in device type, brand/model, and more.
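The manual does not describe how the probe results are fused. One common approach is a weighted vote, shown here purely as a hypothetical illustration. Each probe reports candidate values for fields such as device type or brand, each with a confidence score. Scores for the same value are summed, and the highest total wins. The probe names, fields, and confidence numbers are made up.

```python
from collections import defaultdict


def fuse(observations):
    """Weighted vote over probe results.

    observations: iterable of (probe, field, value, confidence) tuples.
    Returns {field: (winning_value, total_confidence)}.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for _probe, field, value, confidence in observations:
        scores[field][value] += confidence
    return {field: max(votes.items(), key=lambda kv: kv[1]) for field, votes in scores.items()}
```

A weighted vote lets a strong signal outweigh a weaker one. For example, an SNMP sysObjectID match can outweigh an open port that several device types share.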

Batch Operations

Select multiple devices to apply bulk actions: delete, update, agent install/upgrade/uninstall/health check, config update, and template bind/unbind.

Agent Installation

One-click remote Zabbix Agent installation is supported on Ubuntu, Debian, CentOS, RHEL, Rocky Linux, and AlmaLinux, across amd64, arm64, and armhf architectures. Installation works offline: target machines do not need internet access. Batch installation is also supported.
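To choose the right offline package, the installer has to identify the target's distribution and CPU architecture. The sketch below reads the ID and VERSION_ID fields from the contents of /etc/os-release. It takes the architecture from the machine string reported by uname -m or platform.machine(). It then maps kernel names such as x86_64 and aarch64 onto the amd64/arm64/armhf names this manual uses. The function and its return format are hypothetical; OpsDash's installer may detect the platform differently.

```python
import platform
from pathlib import Path

# Distribution IDs (the ID field of /etc/os-release) grouped by package format.
DEB_FAMILY = {"ubuntu", "debian"}
RPM_FAMILY = {"centos", "rhel", "rocky", "almalinux"}
# Kernel machine names (uname -m) mapped to the architecture names used in this manual.
ARCH_ALIASES = {"x86_64": "amd64", "aarch64": "arm64", "armv7l": "armhf"}


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines from /etc/os-release, stripping optional quotes."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and not key.startswith("#"):
            info[key.strip()] = value.strip().strip('"').strip("'")
    return info


def select_package(os_release_text: str, machine: str) -> tuple:
    """Return (package_format, distro_id, version_id, arch) or raise ValueError if unsupported."""
    info = parse_os_release(os_release_text)
    distro = info.get("ID", "").lower()
    arch = ARCH_ALIASES.get(machine)
    if arch is None:
        raise ValueError(f"unsupported architecture: {machine}")
    if distro in DEB_FAMILY:
        fmt = "deb"
    elif distro in RPM_FAMILY:
        fmt = "rpm"
    else:
        raise ValueError(f"unsupported distribution: {distro or 'unknown'}")
    return fmt, distro, info.get("VERSION_ID", ""), arch


if __name__ == "__main__":
    # Run on the target host to see which package it would need.
    print(select_package(Path("/etc/os-release").read_text(), platform.machine()))
```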

2.5 Software Assets

Software instances are attached to server-type devices. The software list can be filtered by department and type. Each software detail page has 5 tabs: Basic Info, Dependencies, Alerts, Monitoring Data, and Change Log.