
OpsDash User Manual

Version 0.4.0  |  For: Ops Engineers, System Administrators  |  Updated: 2026-04-01

1. Quick Start

1.1 Requirements

Dependency | Minimum Version | Notes
Docker & Docker Compose | Docker 24+, Compose v2 | Runs all infrastructure containers
Python | 3.11+ | Backend runtime
Node.js | 18+ | Frontend build tooling
OS | Linux / macOS / WSL2 | Ubuntu 22.04+ recommended

Hardware: 4-core CPU, 8 GB RAM, 50 GB disk (VictoriaMetrics retains 90 days of data).

1.2 Deployment Steps

Step 1: Start Infrastructure Containers

docker compose up -d

This launches 7 services:

Container | Image | Host Port
opsdash-postgres | postgres:16 | 5433
opsdash-redis | redis:7 | 6381
opsdash-zabbix-db | postgres:16 | 5434
opsdash-zabbix-server | zabbix/zabbix-server-pgsql:7.0 | 10051
opsdash-zabbix-web | zabbix/zabbix-web-nginx-pgsql:7.0 | 8080
opsdash-emqx | emqx/emqx:5.8 | 11883 / 28083
opsdash-victoriametrics | victoriametrics/victoria-metrics:v1.106.1 | 8428
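After docker compose up -d, you can confirm that each container accepts connections on its host port. The sketch below uses only the Python standard library. The port list is copied from the container table above, and check_port is an illustrative helper, not part of OpsDash. A successful TCP connect only shows that the port is listening. It does not show that the service behind it has finished initializing; Zabbix Server in particular can take several minutes on first start.

```python
import socket

# Hypothetical pre-flight helper; not part of OpsDash.
# Host ports published by `docker compose up -d`, copied from the table above.
EXPECTED_PORTS = {
    "opsdash-postgres": 5433,
    "opsdash-redis": 6381,
    "opsdash-zabbix-db": 5434,
    "opsdash-zabbix-server": 10051,
    "opsdash-zabbix-web": 8080,
    "opsdash-emqx (MQTT)": 11883,
    "opsdash-emqx (HTTP API)": 28083,
    "opsdash-victoriametrics": 8428,
}


def check_port(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in EXPECTED_PORTS.items():
        status = "open" if check_port("localhost", port) else "NOT reachable"
        print(f"{name:<26} {port:>5}  {status}")
```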

Step 2: Configure Environment Variables

Create a .env file in the backend/ directory:

DATABASE_URL=postgresql+asyncpg://opsdash:opsdash123@localhost:5433/opsdash
REDIS_URL=redis://localhost:6381/0
SECRET_KEY=your-production-secret-key-change-me
CREDENTIAL_KEY=your-32-byte-encryption-key
ZABBIX_URL=http://localhost:8080/api_jsonrpc.php
ZABBIX_API_TOKEN=
EMQX_API_URL=http://localhost:28083/api/v5
EMQX_API_KEY=
EMQX_API_SECRET=
VM_QUERY_URL=http://localhost:8428
VM_WRITE_URL=http://localhost:8428
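Before starting the backend, it can help to confirm that .env is complete. The sketch below assumes the simple KEY=VALUE format shown, with no quoting and no export prefixes. It flags three kinds of problem:

- keys that are missing;
- keys left empty, such as ZABBIX_API_TOKEN and the EMQX key/secret until those API tokens are created;
- keys still set to the sample placeholders.

The helper names are hypothetical.

```python
from pathlib import Path

# Hypothetical pre-flight check; not part of OpsDash.
# Keys taken from the sample .env above.
REQUIRED_KEYS = [
    "DATABASE_URL", "REDIS_URL", "SECRET_KEY", "CREDENTIAL_KEY",
    "ZABBIX_URL", "ZABBIX_API_TOKEN",
    "EMQX_API_URL", "EMQX_API_KEY", "EMQX_API_SECRET",
    "VM_QUERY_URL", "VM_WRITE_URL",
]
# Sample placeholder values that must be replaced before production use.
PLACEHOLDERS = {"your-production-secret-key-change-me", "your-32-byte-encryption-key"}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        env[key.strip()] = value.strip()
    return env


def check_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for key in REQUIRED_KEYS:
        value = env.get(key)
        if value is None:
            problems.append(f"{key}: missing")
        elif value == "":
            problems.append(f"{key}: empty")
        elif value in PLACEHOLDERS:
            problems.append(f"{key}: still set to the sample placeholder")
    return problems


if __name__ == "__main__":
    env_path = Path("backend/.env")
    if not env_path.exists():
        print(f"{env_path} not found")
    else:
        for problem in check_env(parse_env(env_path.read_text())) or ["OK"]:
            print(problem)
```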

Step 3: Install Backend Dependencies & Initialize Database

cd backend
pip install -r requirements.txt
python3 -m app seed

The seed script creates all database tables and populates initial data, including the admin account (admin / admin123), sample departments, devices, and alerts.

Step 4: Start the Backend API

uvicorn app.main:app --reload --host 0.0.0.0 --port 8001

Step 5: Start the Frontend Dev Server

cd frontend
npm install
npm run dev

Frontend runs at http://localhost:3000 and proxies /api requests to the backend on port 8001.

1.3 First Login

  1. Open http://localhost:3000 in your browser
  2. Log in with default admin credentials: admin / admin123
  3. You will be redirected to the Operations Dashboard
  4. Important: Change the default admin password immediately via User Management

2. Feature Guide

2.1 Operations Dashboard

Navigation: Sidebar → Daily Ops → Operations Dashboard

The dashboard is the system homepage, providing a comprehensive view of your infrastructure. It uses tabbed navigation with 5 tabs:

Tab | Content
Overview | KPI cards + department summary + live alert ticker
Charts | Device status pie chart + alert trend bar chart
Resource Usage | Top N device utilization + global trend graphs
Health Heatmap | Device health matrix (grouped by location/type/department)
Ops Timeline | Alert / maintenance / change event timeline

Dashboard data is updated in real-time via WebSocket. When the monitoring engine detects device status changes or new alerts, the page refreshes automatically.

Filters: Six multi-select dimensions (department, device type, subtype, location, tags, manager). All tabs refresh in sync when filters change.
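The manual does not say how selections combine. A common convention, assumed here rather than confirmed for OpsDash, is OR within a dimension and AND across dimensions. The Device fields and the matches helper are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    # Hypothetical device record; fields mirror the six filter dimensions.
    department: str
    device_type: str
    subtype: str
    location: str
    manager: str
    tags: set[str] = field(default_factory=set)


def matches(device: Device, filters: dict[str, set[str]]) -> bool:
    """OR within a dimension, AND across dimensions (assumed semantics).

    An empty selection for a dimension matches every device.
    """
    for dimension, selected in filters.items():
        if not selected:
            continue
        if dimension == "tags":
            # Multi-valued field: match if the device carries any selected tag.
            if not (device.tags & selected):
                return False
        elif getattr(device, dimension) not in selected:
            return False
    return True
```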

Custom Layout: Click the gear icon to show/hide modules per tab and adjust display order. Layout preferences are saved to browser local storage.

2.2 Alert Monitoring

Navigation: Sidebar → Daily Ops → Alerts → Live Alerts

The live alerts page supports multi-dimensional filtering: severity level, target type, alert source (zabbix/emqx/opsdash), resolution status, suppressed/excluded toggle, keyword search, and time range. All filter state is synced to the URL for easy sharing.
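Syncing filters to the URL usually means encoding each selection as query parameters, so that opening a link reproduces the same view. This sketch uses urllib.parse. The parameter names (severity, source, q, status) are hypothetical, not OpsDash's actual URL scheme. Multi-select values become repeated keys, and keys are sorted so the same filter state always produces the same URL.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical helpers; parameter names are illustrative only.


def filters_to_query(filters: dict[str, list[str]]) -> str:
    """Encode filter state as a query string; multi-select values become repeated keys."""
    # Drop empty selections and sort keys/values for a stable, shareable URL.
    present = {k: sorted(v) for k, v in sorted(filters.items()) if v}
    return urlencode(present, doseq=True)


def query_to_filters(query: str) -> dict[str, list[str]]:
    """Decode a query string produced by filters_to_query back into filter state."""
    return {k: sorted(v) for k, v in parse_qs(query).items()}
```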

Alert Actions

Alert Convergence Model

The system uses a "persistent open" model: when the same target continuously meets the same rule condition, no new alert record is created; instead, the existing open alert's repeat count is incremented. When the metric recovers, the open alert is resolved automatically.

Alert Correlation Groups

When a network device failure causes downstream devices to go offline, the system automatically groups related alerts for efficient root cause analysis.
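One way to form correlation groups assumes each device records the upstream device it connects through. You then walk up the topology and attach every alerting device to its highest alerting ancestor. The upstream map and the correlate function are hypothetical; OpsDash's actual grouping criteria are not described in this manual.

```python
def correlate(alerting: set[str], upstream: dict[str, str]) -> dict[str, list[str]]:
    """Group alerting devices under their highest alerting ancestor (hypothetical sketch).

    `upstream` maps each device to the device it connects through. Devices with
    no alerting ancestor lead their own group.
    """
    groups: dict[str, list[str]] = {}
    for device in sorted(alerting):
        root = device
        node = upstream.get(device)
        seen = {device}  # guards against cycles in bad topology data
        while node is not None and node not in seen:
            seen.add(node)
            if node in alerting:
                root = node  # keep climbing: the highest alerting ancestor wins
            node = upstream.get(node)
        groups.setdefault(root, []).append(device)
    return groups
```

For example, if an access switch and the two servers behind it are all alerting, all three alerts land in one group led by the switch.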

Root Cause Analysis

For critical and warning alerts, root cause analysis shows upstream link alerts, the lead alert of the correlation group, and co-located alerts to help pinpoint the source of the issue.

2.3 Alert Rules

Navigation: Sidebar → Daily Ops → Alerts → Alert Rules

Alert rules define threshold-based automatic alerting. All enabled rules are evaluated every 60 seconds against metric data from VictoriaMetrics.

Key rule settings:

  1. Scope: Global / condition-based matching (9 dimensions) / specific targets
  2. Metric Name: Auto-discovered from VictoriaMetrics, searchable dropdown
  3. Comparison: gt / lt / gte / lte / eq / neq
  4. Dual Thresholds: Warning and Critical levels
  5. Duration: How long (in seconds) the condition must hold continuously before the alert triggers
  6. Cooldown: Controls repeat notification frequency (default 300s)
  7. Notification Channels: Select specific channels or broadcast to all enabled ones

Alert Exclusion Rules

Navigation: Sidebar → Daily Ops → Alerts → Alert Exclusions

Alert exclusion rules filter out alerts that do not require attention (e.g., during planned maintenance). Matched alerts are still recorded but auto-resolved without sending notifications.

Three time modes: Permanent, One-time (start/end time), and Recurring (periodic maintenance windows).

2.4 Device Management

Navigation: Sidebar → Asset Management → Devices

Device List Features

Device Types

10 supported types: network, server, security, database, storage, industrial, iot, ups, hvac, custom.

Device Detail Page

9 tabs: Basic Info, Link Relations, Software, Alerts, Monitoring Data, IoT Location, Agent Management, Change Log, and Custom Fields.

Device status updates in real-time via WebSocket. Top-level stats include 30-day availability (SLA), downtime duration, and status reason.

Auto-Discovery

Enter a management IP and click "Auto Detect": the system probes the device (TCP scan, banner grab, SNMP query, fingerprint fusion) and fills in the device type, brand/model, and other fields.

Batch Operations

Select multiple devices for: bulk delete, bulk update, bulk agent install/upgrade/uninstall/health check, config update, bulk template bind/unbind.

Agent Installation

One-click remote Zabbix Agent installation for Ubuntu/Debian/CentOS/RHEL/Rocky/Alma on amd64/arm64/armhf architectures. Installation works offline: target machines do not need internet access. Batch installation is also supported.
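Choosing the right offline package comes down to the target's distribution family and CPU architecture. This sketch assumes the distribution ID is read from /etc/os-release and the architecture from uname -m. The alias table and return values are illustrative, not OpsDash's actual package layout.

```python
# Hypothetical mapping for illustration; OpsDash's package layout is not documented here.
# Keys are `uname -m` outputs; values are the architecture labels used in this manual.
ARCH_ALIASES = {
    "x86_64": "amd64", "amd64": "amd64",
    "aarch64": "arm64", "arm64": "arm64",
    "armv7l": "armhf", "armhf": "armhf",
}
# /etc/os-release ID values for the supported distributions.
DEB_FAMILY = {"ubuntu", "debian"}
RPM_FAMILY = {"centos", "rhel", "rocky", "almalinux"}


def select_package(os_id: str, machine: str) -> tuple[str, str]:
    """Map an os-release ID and `uname -m` output to (package format, architecture)."""
    arch = ARCH_ALIASES.get(machine.lower())
    if arch is None:
        raise ValueError(f"unsupported architecture: {machine}")
    os_id = os_id.lower()
    if os_id in DEB_FAMILY:
        return "deb", arch
    if os_id in RPM_FAMILY:
        return "rpm", arch
    raise ValueError(f"unsupported distribution: {os_id}")
```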

2.5 Software Assets

Software instances are attached to server-type devices. Supports department/type filtering. Detail page includes 5 tabs: Basic Info, Dependencies, Alerts, Monitoring Data, Change Log.

For full details, see the Help Documentation within OpsDash.

2.6 Topology

Interactive network topology powered by AntV G6 5.0. Features node/link visualization, toolbar operations (layout switching, search, export), fault impact analysis, path analysis, historical playback, and multi-version comparison.

For full details, see the Help Documentation within OpsDash.

2.7 IoT Device Management

Deep IoT management module: device models (thing models), telemetry dashboards (gauges + historical charts), alert templates, geofencing (circle/polygon + enter/exit events), device command dispatch, batch configuration, firmware management and OTA upgrades, gateway topology, data quality dashboard.

For full details, see the Help Documentation within OpsDash.

2.8 Metrics Viewer

ECharts-based MetricChart component auto-discovers all device metrics in VictoriaMetrics and groups them by category. Supports line chart, bar chart, and gauge display modes.
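Metric auto-discovery can be approximated with VictoriaMetrics' Prometheus-compatible label-values endpoint, which lists every metric name. In this sketch, metrics are grouped by the first underscore-separated token of the name. That grouping is an assumed convention for illustration; the categories MetricChart actually uses are not documented here. Pass the value of VM_QUERY_URL as the base URL.

```python
import json
from collections import defaultdict
from urllib.request import urlopen

# Hypothetical helpers; not part of OpsDash.


def fetch_metric_names(vm_query_url: str) -> list[str]:
    """List all metric names via the Prometheus-compatible label-values API."""
    with urlopen(f"{vm_query_url}/api/v1/label/__name__/values", timeout=10) as resp:
        return json.load(resp)["data"]


def group_by_category(names: list[str]) -> dict[str, list[str]]:
    """Group metric names by their first underscore-separated token (assumed convention)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in sorted(names):
        groups[name.split("_", 1)[0]].append(name)
    return dict(groups)
```

Usage: group_by_category(fetch_metric_names("http://localhost:8428")).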

For full details, see the Help Documentation within OpsDash.

2.9 Remote Operations

Batch command/script execution (10 vendor CLI profiles), script library, execution strategies (rolling/canary/grouped), risk assessment and approval workflows, DAG workflow orchestration, config backup with diff comparison, IPMI out-of-band management, execution statistics.

For full details, see the Help Documentation within OpsDash.

2.10 Network Discovery

Automated network scanning based on Zabbix Discovery: create scan tasks, auto-detect devices, review discovery results, one-click or batch onboarding. Supports TCP port scanning, SNMP probing, and ICMP ping.
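Zabbix Discovery performs the actual scanning. The standard-library sketch below illustrates what an IP range plus a TCP port check involves. ICMP ping is omitted because it needs raw sockets and elevated privileges. The helper names are illustrative.

```python
import ipaddress
import socket

# Hypothetical helpers illustrating a TCP discovery check; not part of OpsDash.


def scan_targets(cidr: str) -> list[str]:
    """Expand a subnet into usable host addresses (network/broadcast excluded)."""
    return [str(ip) for ip in ipaddress.ip_network(cidr, strict=False).hosts()]


def tcp_probe(ip: str, ports: list[int], timeout: float = 0.5) -> list[int]:
    """Return the subset of `ports` that accept a TCP connection."""
    open_ports = []
    for port in ports:
        try:
            with socket.create_connection((ip, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            pass  # refused, filtered, or timed out
    return open_ports
```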

For full details, see the Help Documentation within OpsDash.

2.11 Department Management

Multi-tenant department management: create, edit, and delete departments; devices and software are isolated per department; roles can be assigned at the department level.

For full details, see the Help Documentation within OpsDash.

2.12 Users & Permissions

RBAC permission system: 5 built-in roles (admin/engineer/viewer/dept_engineer/dept_viewer), 43 permission strings across 21 resources. Supports global and department-scoped roles (one user can have different roles in different departments). SSO integration with Feishu, DingTalk, and WeCom.

For full details, see the Help Documentation within OpsDash.

2.13 Credential Management

Navigation: Sidebar → Operations → Remote Ops → Credentials

Centralized credential storage for all monitoring protocols. Supports 10 credential types (SNMP/SSH/Database/JMX/IPMI/WMI/ONVIF/K8s/HTTP/API Key), 4 scopes (global/subnet/device_type/device), encrypted at rest.

Key features:

For full details, see the Help Documentation within OpsDash.

2.14 Tag Management

Lightweight cross-resource tagging: centralized tag color/category/description management, smart auto-tagging rules, multi-tag AND/OR filtering, batch tagging, tag merging, and usage analytics.
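Multi-tag AND/OR filtering comes down to a subset test versus an intersection test over each resource's tag set. Here is a minimal sketch with illustrative, hypothetical names:

```python
# Hypothetical helper; not part of OpsDash.


def filter_by_tags(resources: dict[str, set[str]], tags: set[str], mode: str = "and") -> list[str]:
    """Return sorted resource IDs whose tags match: 'and' = all tags, 'or' = any tag."""
    if mode == "and":
        hits = [rid for rid, rtags in resources.items() if tags <= rtags]
    elif mode == "or":
        hits = [rid for rid, rtags in resources.items() if tags & rtags]
    else:
        raise ValueError(f"mode must be 'and' or 'or', got {mode!r}")
    return sorted(hits)
```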

For full details, see the Help Documentation within OpsDash.

2.15 Notification Channels

4 notification channels: Email, DingTalk Webhook, Feishu Webhook, WeCom Webhook. Supports escalation policies (multi-step progressive notification).
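A multi-step escalation policy can be modeled as a list of steps. Each step has a delay measured from when the alert opened and a list of channels to notify. This sketch assumes that acknowledging an alert stops further escalation. The field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EscalationStep:
    # Hypothetical escalation step; not OpsDash's schema.
    after_s: int         # seconds since the alert opened
    channels: list[str]  # channel names, e.g. "dingtalk", "email"


def due_steps(steps: list[EscalationStep], opened_at: float, now: float,
              acknowledged: bool, already_sent: set[int]) -> list[int]:
    """Return indexes of steps that should fire now; acknowledged alerts stop escalating."""
    if acknowledged:
        return []
    elapsed = now - opened_at
    return [i for i, s in enumerate(steps) if elapsed >= s.after_s and i not in already_sent]
```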

For full details, see the Help Documentation within OpsDash.

2.16 - 2.27 More Features

OpsDash also includes:

For full details, see the Help Documentation within OpsDash.

3. External Systems

Used during initial deployment setup and advanced troubleshooting. For daily monitoring, use the Engine Status page within OpsDash.

Full content with PromQL examples and integration configuration is available in OpsDash's Help Documentation.

4. Configuration

Includes environment variable reference, port map, and Docker container management. Key environment variables:

Full environment variable table and port map available in OpsDash's Help Documentation.

5. FAQ

Forgot admin password?

From the backend/ directory, generate a new password hash with python3 -c "from app.auth import get_password_hash; print(get_password_hash('new_password'))", then connect to PostgreSQL and UPDATE the admin user's record with the new hash. Alternatively, delete the admin user and re-run python3 -m app seed to restore the default password admin123.

Device status stuck on "unknown"?

Verify the monitoring engine is running, the device has a monitoring protocol and management IP configured, and SNMP credentials are correct. New devices may take up to 60 seconds to start monitoring.

Zabbix / EMQX connection failed?

Check that containers are running, .env is configured correctly, and API tokens have been created. Zabbix Server may take 2-5 minutes to initialize on first start.

No data in VictoriaMetrics?

Verify the container is running, VM_QUERY_URL and VM_WRITE_URL in .env are correct, and the monitoring engine is actively collecting data.

IoT device can't connect via MQTT?

Check that the EMQX container is running, MQTT port 11883 is reachable, and device authentication credentials are correct.

More FAQs available in OpsDash's Help Documentation.